Being part of CTWUG, we are always on the lookout for new services to run on our network that could prove useful to its users.
The latest idea was to provide users with a local copy of the Wikipedia website. This would let CTWUG users view Wikipedia content at local network speeds without needing an internet connection.
After extensive research I came across a nice tutorial on how to run your own copy of Wikipedia from a database dump of the real Wikipedia website. Who would have thought that Wikipedia would offer a monthly database dump of their website for users to download? The good news is that they do.
I’ll provide the steps to follow to set up your own Wikipedia mirror. Please note that basic knowledge of Linux, bash, Apache and MySQL is required. The steps are for installing the mirror on an Ubuntu machine.
So here are the steps for setting up your own Wikipedia mirror from their database dump.
- Install LAMP: Linux Apache MySQL PHP
apt-get update
apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
- Setup MySQL: You need to set your mysql root password
$ mysql
mysql> USE mysql;
mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
mysql> FLUSH PRIVILEGES;
You also need to create a database for your incoming Wikipedia. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser and click check all, then Go.
- Download the MediaWiki software: This is the software Wikipedia runs on. Go to the MediaWiki download page. On the right, download the .tar.gz file.
wget http://download.wikimedia.org/mediawiki/1.15/mediawiki-1.15.1.tar.gz
Decompress it and move it to /var/www/
tar xf mediawiki-1.15.1.tar.gz
mv mediawiki-1.15.1 wikipedia
sudo mv wikipedia /var/www/
I am installing it under the directory wikipedia.
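The unpack-and-rename step can be wrapped in a small helper. This is only a sketch: `unpack_as` is a name I made up, and it assumes the archive unpacks to a single directory named after the file minus ".tar.gz", which is how the MediaWiki release tarballs are laid out.

```shell
# unpack_as TARBALL TARGET: extract TARBALL in the current directory and
# rename the unpacked directory to TARGET.
# Assumption: the archive unpacks to one directory named like the file
# without its ".tar.gz" suffix.
unpack_as() {
    tarball=$1
    target=$2
    unpacked=$(basename "$tarball" .tar.gz)
    tar xf "$tarball" && mv "$unpacked" "$target"
}

# Usage for the release in this post:
# unpack_as mediawiki-1.15.1.tar.gz wikipedia
```

Note that it is the unpacked directory that gets renamed, not the .tar.gz file itself.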
Make the config directory writable so the installer can save its settings there:
cd /var/www/wikipedia/
chmod a+w config/
Now navigate to http://localhost/wikipedia/. From here, the only things you need to fill in are:
- Site name (I chose WikiMirror)
- WikiSysop’s password (The administrator password)
- DB password
Now you need to move LocalSettings.php out of config.
mv config/LocalSettings.php .
Now you can go to http://localhost/wikipedia/ and you should see your virgin MediaWiki install!
- Get Wikipedia’s database dump: You can get the latest version of Wikipedia’s database dump by subscribing to http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2-rss.xml. The latest file I got was http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
wget http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
The file is 5.2GB so it should take a while to download.
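Since the dump URLs follow a fixed pattern, the address can be built from the wiki name and the dump date. A minimal sketch, assuming the pattern seen in the two URLs above holds for other dates too; `dump_url` is a hypothetical helper name, and the host is a parameter because it may change over time.

```shell
# dump_url WIKI DATE [HOST]: print the URL of a pages-articles dump.
# DATE is either "latest" or a dump date such as 20091009.
# HOST defaults to the host used in this post.
dump_url() {
    wiki=$1
    date=$2
    host=${3:-download.wikimedia.org}
    printf 'http://%s/%s/%s/%s-%s-pages-articles.xml.bz2\n' \
        "$host" "$wiki" "$date" "$wiki" "$date"
}

# Prints the URL of the dump used in this post.
dump_url enwiki 20091009
```

With a file this size it is worth letting wget resume an interrupted transfer, for example `wget -c "$(dump_url enwiki 20091009)"`.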
After downloading, decompress the file:
bunzip2 enwiki-20091009-pages-articles.xml.bz2
The uncompressed size is almost 20 GB so be sure you have enough disk space available.
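A quick way to check the free space before decompressing is to ask `df`. This is a rough sketch: the helper names are mine, and it assumes the filesystem name reported by `df` contains no spaces.

```shell
# free_kb DIR: print the free space on DIR's filesystem in kilobytes.
# -P forces the portable one-line-per-filesystem format, -k reports KB;
# the fourth column of the second line is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR GB: succeed if DIR has at least GB gigabytes free.
has_room() {
    need_kb=$(( $2 * 1024 * 1024 ))
    [ "$(free_kb "$1")" -ge "$need_kb" ]
}

# Check the current directory against the 20 GB mentioned above.
if has_room . 20; then
    echo "enough space for the uncompressed dump"
else
    echo "less than 20 GB free here"
fi
```

Keep in mind that the MySQL import needs room of its own on top of the uncompressed XML.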
Now for the lengthy part of the process: you need to import the file into your MySQL database.
Download mwimport.sh, save it, make it executable, and run it like this:
cat enwiki-<date>.xml | ./mwimport.sh | mysql -f -u <admin name> -p <database name>
This process takes a while: roughly 7-12 hours, depending on your HDD speed and processor.
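The import pipeline only needs a stream of XML on its input, so in principle it can read straight from the compressed dump and skip the 20 GB intermediate file. A small sketch of picking the right reader by file extension; `dump_reader` is a made-up helper, and it assumes `bzcat` (part of the bzip2 package) is installed.

```shell
# dump_reader FILE: print the command that streams FILE as plain XML.
# Compressed dumps go through bzcat, anything else through cat.
dump_reader() {
    case $1 in
        *.bz2) echo bzcat ;;
        *)     echo cat ;;
    esac
}

# Usage (placeholders as in the command above):
# f=enwiki-<date>-pages-articles.xml.bz2
# "$(dump_reader "$f")" "$f" | ./mwimport.sh | mysql -f -u <admin name> -p <database name>
```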
That should be all. If all went well, you will have a complete working copy of Wikipedia on your local machine. CTWUG members can look forward to this service very soon.

Just curious, how much memory was on the system that you did this on, and did you modify any of the settings for mySQL?
Couple things:
1) Yes, a tweak to /etc/mysql/my.cnf was required; I changed max_allowed_packet to 128 MB. (It was 16 MB. 128 was probably overkill… but hey, better safe than sorry.)
2) No need to de-compress the .bz2 — and I don’t know if you could even *do* that with tar, since a .bz2 is a bzip’d file, and not a tar archive. Instead, I used the following:
bzcat enwiki-[...]-pages-articles.xml.bz2 | mwimport | mysql -p -f -u
3) Note that you can’t just do a “wget” on the mwimport link above — that’s a link to a mediawiki page that, in turn, has text you need to stuff into an executable, and then chmod +x on.
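On the first point, it can help to see what the config file currently sets before editing it. A rough sketch, assuming the packet limit is written as max_allowed_packet (MySQL's name for the option) with underscores and one assignment per line; `show_setting` is just a name I picked.

```shell
# show_setting NAME FILE: list every line in FILE that assigns NAME,
# prefixed with its line number. The same option can appear in more
# than one section (for example [mysqld] and [mysqldump]), so all
# matches are shown rather than just one.
show_setting() {
    grep -n "^[[:space:]]*$1[[:space:]]*=" "$2"
}

# Usage (path as given in point 1):
# show_setting max_allowed_packet /etc/mysql/my.cnf
```

The entry under [mysqld] is the server-side limit, and MySQL has to be restarted for a change there to take effect.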
excellent walk through, and the only definitive guide that i could find. thank you very much.
a pretty consistent link to download the latest wikipedia pages would be:
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
in the end i had a little bit of trouble trying to get mwimport to be recognized/found. make sure to use the whole file name “mwimport.sh”
Does this include all the images as well?
unfortunately no.. i looked around for an answer to this problem but couldn’t find one.. if you find a way to get all the images, please do share!
For the two users above me, the images aren’t available. Longer explanation here:
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_are_images_and_uploaded_files
Is anybody hosting a wikipedia mirror that is accessible on-line?
Dumps of images are no longer available, but you can use this automated script to download them: http://meta.wikimedia.org/wiki/Wikix