This version (2014/04/14 11:52) is a draft.

MediaWiki

MediaWiki is free and open source software for building and maintaining a wiki website. Using the MediaWiki Download function and the MediaWikiPlugin, you can mirror a MediaWiki website in a Greenstone collection.

MediaWiki Download

See the download page for general information on downloading records through Greenstone.

Greenstone can download HTML pages and associated files, such as stylesheets, from a given MediaWiki website, either through the GLI (in the Download panel) or on the command line (using the downloadfrom.pl script). Either way, the following options are available:

Option | Description
Source URL (-url <string>) | (REQUIRED) Source URL. In case of HTTP redirects, this value may change.
Download Depth (-depth <int>) | How many hyperlinks deep to go when downloading. (Default: 0)
Only files below URL (-below) | Only mirror files below this URL.
Only files within site (-within) | Only mirror files within the same site.
Ignore URL patterns (-reject_files <string>) | Comma-separated list of URL patterns to ignore, e.g. *cgi-bin*,*.ppt ignores hyperlinks that contain either 'cgi-bin' or '.ppt'. (Default: *action=*,*diff=*,*oldid=*,*printable*,*Recentchangeslinked*, Userlogin*,*Whatlinkshere*, *redirect*, *Special:*,Talk:*,Image:*,*.ppt,*.pdf,*.zip,*.doc)
Exclude directories (-exclude_directories <string>) | Comma-separated list of directories to exclude (each must be an absolute path to the directory), e.g. /people,/documentation will exclude the 'people' and 'documentation' subdirectories under the site currently being crawled. (Default: /wiki/index.php/Special:Recentchangeslinked, /wiki/index.php/Special:Whatlinkshere, /wiki/index.php/Talk:Creating_CD)
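
To make the behaviour of the "Ignore URL patterns" option more concrete, here is a minimal Python sketch of how a comma-separated wildcard list could be applied to a URL. It is an illustration only, not Greenstone's actual code: it assumes each pattern is a shell-style wildcard matched against the whole URL, case-sensitively, and the function names parse_reject_list and is_rejected, as well as the example.org URLs, are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_reject_list(reject_files):
    """Split a comma-separated pattern string, dropping spaces and empty items."""
    return [p.strip() for p in reject_files.split(",") if p.strip()]


def is_rejected(url, reject_files):
    """Return True if the URL matches any wildcard pattern in the list.

    Assumption: patterns are shell-style wildcards applied to the full URL.
    """
    return any(fnmatchcase(url, p) for p in parse_reject_list(reject_files))


if __name__ == "__main__":
    patterns = "*cgi-bin*,*.ppt"
    for url in (
        "http://example.org/cgi-bin/search",
        "http://example.org/slides.ppt",
        "http://example.org/wiki/Main_Page",
    ):
        print(url, is_rejected(url, patterns))
```

Under these assumptions, the first two URLs are rejected and the third is kept. Note that a pattern without a leading asterisk, such as Userlogin*, would only match a string that begins with 'Userlogin'.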

If downloading via the GLI, you can view the downloaded files in the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. The subfolders are named after the URL. These files are physically stored in a temporary cache directory. You can build a collection from these downloaded files by dragging them across to the Collection section on the right-hand side of the Gather panel.

An example MediaWiki download on the command line would be:

 perl -S downloadfrom.pl -document_mode MediaWiki -url http://en.wikipedia.org/ -depth 1 -reject_files "*Recentchangeslinked*,Userlogin*,*Whatlinkshere*"

This would download files starting from the URL http://en.wikipedia.org/, following hyperlinks one level deep, rejecting files whose URLs match the Recentchangeslinked, Userlogin, or Whatlinkshere patterns, and excluding the default directories.
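
The reject patterns contain asterisks, which a shell may try to expand as file name wildcards, and the list must reach the script as a single argument. One way to sidestep both problems when scripting the download is to build the command as an argument list. The sketch below does this in Python. It assumes that downloadfrom.pl is on the PATH and that the Greenstone environment has been set up; the helper name build_download_command is hypothetical, and the flags are only those shown on this page.

```python
import shlex


def build_download_command(url, depth=0, reject_files=None, below=False, within=False):
    """Build the downloadfrom.pl argument list for a MediaWiki download."""
    cmd = [
        "perl", "-S", "downloadfrom.pl",
        "-document_mode", "MediaWiki",
        "-url", url,
        "-depth", str(depth),
    ]
    if below:
        cmd.append("-below")
    if within:
        cmd.append("-within")
    if reject_files:
        # Join the patterns with commas only, so they form one argument.
        cmd += ["-reject_files", ",".join(reject_files)]
    return cmd


if __name__ == "__main__":
    cmd = build_download_command(
        "http://en.wikipedia.org/",
        depth=1,
        reject_files=["*Recentchangeslinked*", "Userlogin*", "*Whatlinkshere*"],
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it (requires a working Greenstone installation):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run without a shell hands each argument to perl unchanged, so no quoting is needed; shlex.join is used here only to display an equivalent, safely quoted command line.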

Files downloaded from MediaWiki sites are processed by the MediaWikiPlugin.