As part of CTWUG, we are always on the lookout for new services to run on our network that could prove useful to its users.
The latest idea was to provide users with a local copy of the Wikipedia website. This would enable CTWUG users to view Wikipedia content at local network speeds without needing an internet connection.
After extensive research I came across a nice tutorial on how to run your own copy of Wikipedia from a database dump of the real Wikipedia website. Who would have thought that Wikipedia offers a monthly database dump of their website for users to download? The good news is that they do.
I’ll provide the steps to set up your own Wikipedia mirror. Please note that basic knowledge of Linux, bash, Apache and MySQL is required. The steps are for installing the mirror on an Ubuntu machine.
So here are the steps for setting up your own Wikipedia mirror from their database dump.
- Install LAMP: Linux Apache MySQL PHP
apt-get update
apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
- Set up MySQL: You need to set your MySQL root password
$ mysql
mysql> USE mysql;
mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
mysql> FLUSH PRIVILEGES;
You also need to create a database for your incoming Wikipedia. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser and click check all, then Go.
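If you would rather not click through phpMyAdmin, the same database and user can probably be created from the command line. The following is only a minimal sketch, assuming MySQL 5.x, where GRANT ... IDENTIFIED BY creates the user if it does not exist yet (newer MySQL versions need a separate CREATE USER statement). The function name wikidb_sql is made up for this example, and the password shown is a placeholder.

```shell
#!/bin/sh
# Print the SQL that creates the wiki database and a dedicated user.
# Hypothetical helper: arguments are database name, user name, password.
# The password must not contain a single quote, since it is not escaped here.
wikidb_sql() {
    db="$1"; user="$2"; pass="$3"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost' IDENTIFIED BY '$pass';
FLUSH PRIVILEGES;
EOF
}

# Usage (mysql prompts for the root password set earlier):
#   wikidb_sql wikidb wikiuser 'choose-a-password' | mysql -u root -p
```

Printing the SQL first and piping it into mysql as a separate step lets you read the statements before anything is run against the server.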
- Download the MediaWiki software: This is the software Wikipedia runs on. Go to the MediaWiki download page. On the right, download the .tar.gz file.
wget http://download.wikimedia.org/mediawiki/1.15/mediawiki-1.15.1.tar.gz
Decompress it and move it to /var/www/
tar xf mediawiki-1.15.1.tar.gz
mv mediawiki-1.15.1 wikipedia
sudo mv wikipedia /var/www/
I am installing it under the directory wikipedia.
Make the config directory writable so the installer can create LocalSettings.php in it
cd /var/www/wikipedia/
chmod a+w config/
Now navigate to http://localhost/wikipedia/. From here, the only things you need to fill in are:
- Site name (I chose WikiMirror)
- WikiSysop’s password (The administrator password)
- DB password
Now you need to move LocalSettings.php out of config and into the wikipedia directory (you should still be in /var/www/wikipedia/).
mv config/LocalSettings.php .
Now you can go to http://localhost/wikipedia/ and you should see your fresh MediaWiki install!
- Get Wikipedia’s database dump: You can keep track of the latest version of Wikipedia’s database dump by subscribing to http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2-rss.xml. The latest file I got was http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
wget http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
The file is 5.2GB so it should take a while to download.
After downloading, decompress the file. It is a plain bzip2 file, not a tar archive, so use bunzip2
bunzip2 enwiki-20091009-pages-articles.xml.bz2
The uncompressed size is almost 20 GB so be sure you have enough disk space available.
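Before decompressing, it may be worth checking the free space with a small script rather than by eye. The sketch below is one possible way to do that. The function name has_free_kb and the 25 GB threshold are assumptions for this example: the threshold is simply the roughly 20 GB XML file plus the 5.2 GB archive, which is still on disk while the XML is being written.

```shell
#!/bin/sh
# Succeed if the filesystem holding directory $1 has at least $2 KB available.
# Hypothetical helper, not part of any tool mentioned in this post.
has_free_kb() {
    dir="$1"; need_kb="$2"
    # df -P prints one POSIX-format line per filesystem; with -k, column 4 is
    # the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Usage: require about 25 GB in the current directory before decompressing.
#   has_free_kb . $((25 * 1024 * 1024)) && echo "enough space to decompress"
```

The function only reports success or failure through its exit status, so it can be chained in front of the decompression command with &&.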
Now for the lengthy part of the process: you need to import the file into your MySQL database.
Download mwimport.sh, save it next to the dump, make it executable (chmod +x mwimport.sh) and run it like this
cat enwiki-<date>.xml | ./mwimport.sh | mysql -f -u <admin name> -p <database name>
This process takes a long time: expect roughly 7 to 12 hours, depending on your HDD speed and processor.
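Since the import runs for hours, a rough way to judge progress is to compare the number of pages in the dump with the number of rows already in the database. The sketch below assumes the dump has each opening <page> tag on its own line, which is how these dumps are normally laid out, and that your install uses the default MediaWiki table name page without a prefix. The function name count_pages is made up for this example.

```shell
#!/bin/sh
# Count the <page> elements in a MediaWiki XML dump by counting the lines
# that contain an opening <page> tag (assumes one such tag per line).
count_pages() {
    grep -c '<page>' "$1"
}

# Usage: total pages in the dump, then rows imported so far.
#   count_pages enwiki-<date>.xml
#   mysql -u <admin name> -p <database name> -e 'SELECT COUNT(*) FROM page;'
```

Keep in mind that scanning a file of around 20 GB takes a while by itself, so it is best to run the count once and note the result.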
That should be all. If all went well, you will have a complete working copy of Wikipedia on your local machine. CTWUG members can look forward to this service very soon.