How to mirror Wikipedia

Being part of CTWUG, we are always on the lookout for new services to run on our network that could prove useful to its users.

The latest idea was to provide users with a local copy of the Wikipedia website. This would let CTWUG users view Wikipedia content at local network speeds without needing an internet connection.

After extensive research I came across a nice tutorial on how to run your own copy of Wikipedia from a database dump of the real Wikipedia website. Who would have thought that Wikipedia would offer a monthly database dump of their website for users to download? The good news is that they do.

I’ll provide the steps to follow to set up your own Wikipedia mirror. Please note that basic knowledge of Linux, bash, Apache and MySQL is required. The steps are for installing the mirror on an Ubuntu machine.

So here are the steps for setting up your own Wikipedia mirror from their database dump.

  1. Install LAMP: Linux Apache MySQL PHP
    apt-get update
    apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
  2. Set up MySQL: You need to set your MySQL root password
    $ mysql
    mysql> USE mysql;
    mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
    mysql> FLUSH PRIVILEGES;

    You also need to create a database for your incoming Wikipedia. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser and click check all, then Go.

  3. Download the MediaWiki software: This is the software Wikipedia runs on. Go to the MediaWiki download page. On the right, download the .tar.gz file.
    wget http://download.wikimedia.org/mediawiki/1.15/mediawiki-1.15.1.tar.gz

    Decompress it and move it to /var/www/

    tar xf mediawiki-1.15.1.tar.gz
    mv mediawiki-1.15.1 wikipedia
    sudo mv wikipedia /var/www/

    I am installing it under the directory wikipedia.
    Make the config directory writable so the web installer can save LocalSettings.php there

    cd /var/www/wikipedia/
    chmod a+w config/

    Now navigate to http://localhost/wikipedia/. From here, the only things you need to fill in are

    • Site name (I chose WikiMirror)
    • WikiSysop’s password (The administrator password)
    • DB password

    Now you need to move LocalSettings.php out of config and into the wiki’s root directory.

    mv config/LocalSettings.php .

    Now you can go to http://localhost/wikipedia/ and you should see your virgin MediaWiki install!

  4. Get Wikipedia’s database dump: You can keep track of the latest version of Wikipedia’s database dump by subscribing to http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2-rss.xml. The latest file I got was http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2
    wget http://download.wikimedia.org/enwiki/20091009/enwiki-20091009-pages-articles.xml.bz2

    The file is 5.2GB so it should take a while to download.
    After downloading, decompress the file

    bunzip2 enwiki-20091009-pages-articles.xml.bz2

    The uncompressed size is almost 20 GB so be sure you have enough disk space available.
    Now for the lengthy part of the process: you need to import the file into your MySQL database.
    Download mwimport.sh, save it and run it like this

    cat enwiki-<date>.xml | mwimport | mysql -f -u <admin name> -p <database name>

    This process should take from 7 to 12 hours to complete, depending on your HDD speed and processor.
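
Before decompressing the dump in step 4, it may be worth checking that the disk really has room for the roughly 20 GB file. The following is a minimal sketch of such a check and not part of the original tutorial: has_free_kb is a hypothetical helper name, and the 20 GB figure is simply the one quoted above.

```shell
#!/bin/sh
# Sketch: check free disk space before decompressing the dump.
# has_free_kb is a hypothetical helper, not part of any Wikipedia tooling.

# Succeeds if the filesystem holding directory $1 has at least $2 KB available.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# About 20 GB uncompressed (the figure quoted in step 4), expressed in KB.
NEED_KB=$((20 * 1024 * 1024))

if has_free_kb . "$NEED_KB"; then
    echo "enough space to decompress the dump here"
else
    echo "not enough free space in $(pwd)" >&2
fi
```

Run it from the directory that holds the downloaded .bz2 file. Keep in mind that the MySQL data directory also needs plenty of room for the imported tables, so the same check could be pointed at that directory too.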

That should be all. If all went well, you will have a complete working copy of Wikipedia on your local machine. CTWUG members can look forward to this service very soon.
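
If you would rather not click through phpMyAdmin, the database and user from step 2 could probably be created from the shell as well. This is only a sketch under assumptions: wiki_db_sql is a hypothetical helper, the password is a placeholder, and the GRANT ... IDENTIFIED BY form assumes a MySQL 5.x server like the one installed in step 1 (newer MySQL versions dropped that form and want a separate CREATE USER statement).

```shell
#!/bin/sh
# Sketch: print the SQL that creates the wiki database and user from step 2.
# wiki_db_sql is a hypothetical helper; it does no escaping, so keep quotes
# out of the database name, user name and password.

wiki_db_sql() {
    db=$1
    user=$2
    pass=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost' IDENTIFIED BY '$pass';
FLUSH PRIVILEGES;
EOF
}

# Print the statements so they can be reviewed first.
wiki_db_sql wikidb wikiuser 'choose-a-password'

# Once they look right, pipe them into the mysql client, for example:
#   wiki_db_sql wikidb wikiuser 'choose-a-password' | mysql -u root -p
```

The database and user names match the ones entered in phpMyAdmin above, so the MediaWiki installer in step 3 should accept them unchanged.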

CentOS’ lead developer disappeared

CentOS, the Red Hat based Linux operating system, is in jeopardy after the lead CentOS developer, Lance Davis, disappeared into thin air. Normally this would not affect a project by that big a margin, but in CentOS’ case it does because Lance Davis has sole control over the CentOS domain, the IRC channel and, I believe, the bank accounts as well.

Fellow project developers have written an open letter to Lance Davis and posted it on the CentOS website.

July 30, 2009 04:39 UTC

This is an Open Letter to Lance Davis from fellow CentOS Developers

It is regrettable that we are forced to send this letter but we are left with no other options. For some time now we have been attempting to resolve these problems:

You seem to have crawled into a hole … and this is not acceptable.

You have long promised a statement of CentOS project funds; to this date this has not appeared.

You hold sole control of the centos.org domain with no deputy; this is not proper.

You have, it seems, sole ‘Founders’ rights in the IRC channels with no deputy ; this is not proper.

When I (Russ) try to call the phone numbers for UK Linux, and for you individually, I get a telco intercept ‘Lines are temporarily busy’ for the last two weeks. Finally yesterday, a voicemail in your voice picked up, and I left a message urgently requesting a reply. Karanbir also reports calling and leaving messages without your reply.

Please do not kill CentOS through your fear of shared management of the project.

Clearly the project dies if all the developers walk away.

Please contact me, or any other signer of this letter at once, to arrange for the required information to keep the project alive at the ‘centos.org’ domain.

Sincerely,

Russ Herrold
Ralph Angenendt
Karanbir Singh
Jim Perrin
Donavan Nelson
Tim Verhoeven
Tru Huynh
Johnny Hughes

Introducing Google Chrome OS

First Google took on the browser arena with their Chrome Browser and now they are ready to take on the Operating System arena with their new OS called Google Chrome OS.

Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year Google will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010. Because Google is already talking to partners about the project, and they’ll soon be working with the open source community, Google wanted to share their vision now so everyone understands what they are trying to achieve.

Speed, simplicity and security are the key aspects of Google Chrome OS. Google is designing the OS to be fast and lightweight, to start up and get you onto the web in a few seconds. As Google did for the Google Chrome browser, they are going back to the basics and completely redesigning the underlying security architecture of the OS so that users don’t have to deal with viruses, malware and security updates. It should just work.

Google Chrome OS will run on both x86 and ARM chips, and Google is working with multiple OEMs to bring a number of netbooks to market next year.

Google Chrome OS is a new project, separate from Android. Android was designed from the beginning to work across a variety of devices from phones to set-top boxes to netbooks. Google Chrome OS is being created for people who spend most of their time on the web, and is being designed to power computers ranging from small netbooks to full-size desktop systems.

Google still has a lot of work to do, and they’re definitely going to need a lot of help from the open source community to accomplish this vision.

Google Chrome OS seems to me like another entrant specifically for the netbook arena, and I for one would like to see a better lightweight operating system than the currently popular Linpus Lite. Not only do I think it will perform well, but with a name like Chrome it should probably have the nice look of the Chrome browser.
