How to mirror Wikipedia

Posted on October 16, 2009 in Applications, Code, Internet, Linux by Gerhard

Being part of CTWUG, we are always on the lookout for new services to run on our network that could prove useful to the users of the CTWUG network.

The latest idea we had was to provide users with a local copy of the Wikipedia website. This would enable CTWUG users to view Wikipedia content at local network speeds without needing an internet connection.

After extensive research I came across this nice tutorial on how to run your own copy of Wikipedia from a database dump of the real Wikipedia website. Who would have thought that Wikipedia would offer a monthly database dump of its website for users to download? The good news is that it does.

I’ll provide the steps for you to follow to set up your own Wikipedia mirror. Please note that basic knowledge of Linux, bash, Apache and MySQL is required. The steps are for installing the mirror on an Ubuntu machine.

So here are the steps for setting up your own Wikipedia mirror from their database dump.

  1. Install LAMP: Linux Apache MySQL PHP
    sudo apt-get update
    sudo apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
  2. Set up MySQL: You need to set your MySQL root password
    $ mysql
    mysql> USE mysql;
    mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
    mysql> FLUSH PRIVILEGES;

    You also need to create a database for the incoming Wikipedia data. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser, click Check All, then Go.

  3. Download the MediaWiki software: This is the software Wikipedia runs on. Go to the MediaWiki download page and, on the right, download the .tar.gz file.
    wget http://download.wikimedia.org/mediawiki/1.15/mediawiki-1.15.1.tar.gz

    Decompress it and move it to /var/www/

    tar xf mediawiki-1.15.1.tar.gz
    mv mediawiki-1.15.1 wikipedia
    sudo mv wikipedia /var/www/

    I am installing it under the directory wikipedia.
    Change the file permissions of the config directory so that the installer can write to it

    cd /var/www/wikipedia/
    chmod a+w config/

    Now navigate to http://localhost/wikipedia/. From here, the only things you need to fill in are:

    • Site name (I chose WikiMirror)
    • WikiSysop’s password (The administrator password)
    • DB password

    Now you need to move LocalSettings.php out of config and into the wikipedia directory.

    mv config/LocalSettings.php .

    Now you can go to http://localhost/wikipedia/ and you should see your fresh MediaWiki install!

  4. Get Wikipedia’s database dump: You can keep track of the latest version of Wikipedia’s database dump by subscribing to http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2-rss.xml. The latest file I got was http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
    wget http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2

    The file is 5.2 GB, so it should take a while to download.
    After downloading, decompress the file. It is a plain bzip2 file, not a tar archive.

    bunzip2 enwiki-20091009-pages-articles.xml.bz2

    The uncompressed size is almost 20 GB, so be sure you have enough disk space available.
    Now for the lengthy part of the process: you need to import the file into your MySQL database.
    Download mwimport.sh, save it next to the dump, make it executable and run it like this

    cat enwiki-20091009-pages-articles.xml | ./mwimport.sh | mysql -f -u <admin name> -p <database name>

    This process takes several hours to complete, roughly 7 to 12 hours depending on your HDD speed and processor.
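
Since step 4 needs room for both the 5.2 GB download and the almost 20 GB uncompressed file, it may be worth checking free disk space before starting. The following is only a minimal sketch: the helper name have_space_gb and the 26 GB figure (both sizes added together and rounded up) are my own assumptions, not part of the original tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not from the tutorial): succeed only if the
# filesystem that holds DIR has at least NEEDED_GB gigabytes available.
have_space_gb() {
    dir=$1
    needed_gb=$2
    # df -P gives POSIX-format output; with -k, column 4 of the second
    # line is the available space in 1 KB blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    needed_kb=$((needed_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Assumed requirement: 5.2 GB compressed + almost 20 GB uncompressed,
# rounded up to 26 GB for the current directory.
if have_space_gb . 26; then
    echo "enough space for the dump"
else
    echo "not enough space for the dump" >&2
fi
```

The import into MySQL needs additional space on the partition that holds the database files, so the same check could be pointed at that directory as well.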

That should be all. If all went well, you will have a complete working copy of Wikipedia on your local machine. CTWUG members can look forward to this service very soon.
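
If you would rather not use phpMyAdmin in step 2, the wikidb database and the wikiuser account can probably be created from the command line instead. This is a sketch under assumptions, not what I actually ran: the helper name wiki_db_sql and the password change-me are placeholders, and it assumes the password contains no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that creates a database and a
# local user with full privileges on that database.
wiki_db_sql() {
    db=$1
    user=$2
    pass=$3
    # Unquoted heredoc: variables are expanded, \` yields a literal backtick.
    cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\`;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost';
EOF
}

# Print the statements so they can be reviewed first.
wiki_db_sql wikidb wikiuser 'change-me'

# When they look right, pipe them into the server as root:
# wiki_db_sql wikidb wikiuser 'change-me' | mysql -u root -p
```

Printing the SQL before piping it to mysql keeps the step easy to check, and the same user name and password are then what the MediaWiki installer asks for in step 3.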


Article by Gerhard

Gerhard is the owner and founder of iGeek. He spends most of his time writing poetic PHP code, spreading technology news and thinking of new tech startup ideas.


One Response to “How to mirror Wikipedia”

  1. Dan, on December 14th, 2009 at 06:30 Said:

    Just curious, how much memory was on the system that you did this on, and did you modify any of the settings for mySQL?


Copyright 2009-2010 www.igeek.co.za