MediaWiki is free and open-source software for building and maintaining a wiki website. Using the MediaWiki Download function and the MediaWikiPlugin, you can mirror a MediaWiki website in a Greenstone collection.
See the download page for general information on downloading records through Greenstone.
Greenstone can download HTML pages and associated files, such as stylesheets, from a given MediaWiki website, either from the GLI (in the Download panel) or from the command line (using the downloadfrom.pl script). Either way, the following options are available:
|Option|Description|
|Source URL|(REQUIRED) The URL to download from. If HTTP redirects occur, this value may change.|
|Download Depth|How many hyperlinks deep to follow when downloading (Default: 0)|
|Only files below URL|Only mirror files below this URL|
|Only files within site|Only mirror files within the same site|
|Ignore URL patterns|Comma-separated list of URL patterns to ignore, e.g. *cgi-bin*,*.ppt ignores hyperlinks that contain either 'cgi-bin' or '.ppt' (Default: *action=*,*diff=*,*oldid=*,*printable*,*Recentchangeslinked*,Userlogin*,*Whatlinkshere*,*redirect*,*Special:*,Talk:*,Image:*,*.ppt,*.pdf,*.zip,*.doc)|
|Exclude directories|Comma-separated list of directories to exclude (each must be an absolute path), e.g. /people,/documentation excludes the 'people' and 'documentation' subdirectories of the site being crawled (Default: /wiki/index.php/Special:Recentchangeslinked,/wiki/index.php/Special:Whatlinkshere,/wiki/index.php/Talk:Creating_CD)|
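To get a feel for which hyperlinks a given Ignore URL patterns value would skip, the following Python sketch may help. It is only an illustration, not Greenstone's own code: it assumes the patterns behave like shell-style wildcards matched against the whole URL, which may differ in detail from what the downloader actually does. The function names and the example.org URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_patterns(option_value: str) -> List[str]:
    """Split a comma-separated pattern list, dropping stray spaces and empty items."""
    return [p.strip() for p in option_value.split(",") if p.strip()]


def is_ignored(url: str, patterns: List[str]) -> bool:
    """Return True if the URL matches any pattern.

    ASSUMPTION: shell-style wildcard matching against the full URL.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


patterns = parse_patterns("*cgi-bin*,*.ppt")

# Hypothetical URLs, used only to exercise the patterns.
for url in (
    "http://example.org/cgi-bin/search",
    "http://example.org/slides.ppt",
    "http://example.org/wiki/Main_Page",
):
    print(url, "ignored" if is_ignored(url, patterns) else "kept")
```

Under these assumptions the first two URLs would be ignored and the third kept. Checking a pattern list this way before a long download can catch a pattern that is broader than intended.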
If downloading via the GLI, you can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. The subfolders are named after the URL. These files are physically stored in a temporary cache directory. You can build a collection from the downloaded files by dragging them across to the Collection section on the right-hand side of the Gather panel.
An example MediaWiki download on the command line would be:
perl -S downloadfrom.pl -document_mode MediaWiki -url http://en.wikipedia.org/ -depth 1 -reject_files "*Recentchangeslinked*,Userlogin*,*Whatlinkshere*"
This would download files below the URL http://en.wikipedia.org/ one hyperlink deep, rejecting files with Recentchangeslinked, Userlogin, or Whatlinkshere in their URL, and excluding the default directories.
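If the download is launched from a script rather than typed at a prompt, passing the arguments as a list avoids the shell expanding the * wildcards before downloadfrom.pl sees them. The sketch below is a minimal illustration under stated assumptions: it uses only the flags shown in the example command, the helper names are hypothetical, and it assumes perl and the Greenstone scripts are already on the PATH (that is, the Greenstone environment has been set up).

```python
import subprocess
from typing import List, Sequence


def build_download_command(url: str, depth: int,
                           reject_patterns: Sequence[str]) -> List[str]:
    """Build the argument list for the example command (hypothetical helper).

    Only the flags that appear in the example command are used.
    """
    return [
        "perl", "-S", "downloadfrom.pl",
        "-document_mode", "MediaWiki",
        "-url", url,
        "-depth", str(depth),
        # Patterns are joined with commas and no spaces, so the whole
        # list reaches the script as a single argument.
        "-reject_files", ",".join(reject_patterns),
    ]


def run_download(command: Sequence[str]) -> None:
    """Run the command without a shell, so wildcards are passed through unchanged."""
    subprocess.run(list(command), check=True)


command = build_download_command(
    "http://en.wikipedia.org/",
    1,
    ["*Recentchangeslinked*", "Userlogin*", "*Whatlinkshere*"],
)
print(" ".join(command))
# run_download(command)  # uncomment only in a configured Greenstone environment
```

Because no shell is involved, the patterns need no quoting here, whereas on an interactive command line they are best wrapped in quotes.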
Files downloaded from MediaWiki sites are processed by the MediaWikiPlugin.