This version (2014/04/14 11:52) is a draft.


MediaWiki is free and open source software for building and maintaining a wiki website. Using the MediaWiki Download function and the MediaWikiPlugin, you can mirror a MediaWiki website in a Greenstone collection.

MediaWiki Download

See the download page for general information on downloading records through Greenstone.

Greenstone can download HTML pages and associated files, such as stylesheets, from a given MediaWiki website, either in the GLI (on the Download panel) or from the command line (using the script). Either way, the following options are available:

Source URL (-url <string>): (REQUIRED) The source URL. In case of HTTP redirects, this value may change.
Download Depth (-depth <int>): How many hyperlinks deep to go when downloading. (Default: 0)
Only files below URL (-below): Only mirror files below this URL.
Only files within site (-within): Only mirror files within the same site.
Ignore URL patterns (-reject_files <string>): A comma-separated list of URL patterns to ignore; e.g. *cgi-bin*,*.ppt ignores hyperlinks that contain either 'cgi-bin' or '.ppt'. (Default: *action=*,*diff=*,*oldid=*,*printable*,*Recentchangeslinked*, Userlogin*,*Whatlinkshere*, *redirect*, *Special:*,Talk:*,Image:*,*.ppt,*.pdf,*.zip,*.doc)
Exclude directories (-exclude_directories <string>): A comma-separated list of directories to exclude, each given as an absolute path on the site; e.g. /people,/documentation excludes the 'people' and 'documentation' subdirectories of the site currently being crawled. (Default: /wiki/index.php/Special:Recentchangeslinked, /wiki/index.php/Special:Whatlinkshere, /wiki/index.php/Talk:Creating_CD)
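The -reject_files and -exclude_directories options combine a comma-separated list with either wildcard matching or directory matching. The Python sketch below shows one plausible reading of those rules; it is not Greenstone's own code. It assumes that each wildcard pattern is matched against the whole URL (so *cgi-bin* means "contains cgi-bin", as in the example in the option description), that spaces around the commas are ignored, and that an excluded directory covers its own path and everything beneath it. The function names parse_list, is_rejected and is_excluded are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlparse


def parse_list(value: str) -> List[str]:
    """Split a comma-separated option value, trimming spaces and dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_rejected(url: str, reject_files: str) -> bool:
    """True if the whole URL matches any shell-style wildcard pattern (* and ?).

    Assumption: patterns are matched case-sensitively against the full URL.
    """
    return any(fnmatchcase(url, pattern) for pattern in parse_list(reject_files))


def is_excluded(url: str, exclude_directories: str) -> bool:
    """True if the URL's path is one of the absolute directories or lies beneath one.

    Assumption: only the path component of the URL is compared, not the host.
    """
    path = urlparse(url).path
    for directory in parse_list(exclude_directories):
        directory = directory.rstrip("/")
        # Require a "/" boundary so that /people does not also exclude /peoplefinder.
        if path == directory or path.startswith(directory + "/"):
            return True
    return False


if __name__ == "__main__":
    patterns = "*cgi-bin*,*.ppt"
    print(is_rejected("http://wiki.example.org/cgi-bin/search", patterns))
    print(is_rejected("http://wiki.example.org/wiki/index.php/Main_Page", patterns))
    print(is_excluded("http://wiki.example.org/people/staff.html", "/people,/documentation"))
```

The "/" boundary check in is_excluded is a design choice of this sketch: a plain prefix test would also exclude unrelated paths that merely start with the same letters.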

If you download via the GLI, you can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand it; its subfolders are named after the URL. These files are physically stored in a temporary cache directory. To build a collection from them, drag them across to the Collection section on the right-hand side of the Gather panel.

An example MediaWiki download on the command line would be:

 perl -S -download_mode MediaWiki -url -depth 1 -reject_files "*Recentchangeslinked*,*Userlogin*,*Whatlinkshere*"

This would download files starting from the source URL and following hyperlinks to a depth of one, rejecting files with Recentchangeslinked, Userlogin, or Whatlinkshere in their URL, and excluding the default directories.
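To make the effect of Download Depth more concrete, here is a minimal breadth-first crawl in Python over a hypothetical in-memory link graph: the links dictionary stands in for fetched pages, and all URLs are made up. It assumes that depth 0 fetches only the source page and that each further level follows one more hyperlink. -within is modelled as "same host" and -below as a plain string-prefix test on the source URL. The function crawl is hypothetical, and Greenstone's real downloader may apply these rules differently.

```python
from collections import deque
from fnmatch import fnmatchcase
from typing import Dict, List
from urllib.parse import urlparse


def crawl(start: str, links: Dict[str, List[str]], depth: int,
          reject: List[str], within: bool = False, below: bool = False) -> List[str]:
    """Return the URLs a depth-limited breadth-first crawl would fetch, in order."""
    start_host = urlparse(start).netloc
    seen = {start}
    fetched: List[str] = []
    queue = deque([(start, 0)])  # (URL, number of hyperlinks followed to reach it)
    while queue:
        url, level = queue.popleft()
        fetched.append(url)
        if level >= depth:
            continue  # depth limit reached: keep this page but do not follow its links
        for target in links.get(url, []):
            if target in seen:
                continue
            if within and urlparse(target).netloc != start_host:
                continue  # -within: stay on the same site
            if below and not target.startswith(start):
                continue  # -below: stay under the source URL
            if any(fnmatchcase(target, pattern) for pattern in reject):
                continue  # -reject_files: skip URLs matching any wildcard pattern
            seen.add(target)
            queue.append((target, level + 1))
    return fetched


if __name__ == "__main__":
    base = "http://wiki.example.org/wiki/index.php/"
    links = {
        base + "Main_Page": [base + "Help",
                             base + "Special:Whatlinkshere/Main_Page",
                             "http://other.example.com/"],
        base + "Help": [base + "FAQ"],
    }
    reject = ["*Recentchangeslinked*", "*Userlogin*", "*Whatlinkshere*"]
    print(crawl(base + "Main_Page", links, 1, reject, within=True))
```

With depth 1, the three reject patterns from the example command and within=True, the sketch fetches Main_Page and Help, skips the Whatlinkshere link and the off-site link, and does not follow Help's link to FAQ, which would be two hyperlinks deep.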

Files downloaded from MediaWiki sites are processed by the MediaWikiPlugin.