MediaWiki is free and open source software for building and maintaining a wiki website. Using the MediaWiki Download function and the MediaWikiPlugin, you can mirror a MediaWiki website in a Greenstone collection.
See the download page for general information on downloading records through Greenstone.
Greenstone can download HTML pages and associated files, such as stylesheets, from a given MediaWiki website, either from the GLI (in the Download panel) or from the command line (using the downloadfrom.pl script). Either way, you are presented with the following options:
Option | Description |
---|---|
Source URL (-url <string>) | (REQUIRED) Source URL. In case of HTTP redirects, this value may change. |
Download Depth (-depth <int>) | How many hyperlinks deep to go when downloading (Default: 0) |
Only files below URL (-below) | Only mirror files below this URL |
Only files within site (-within) | Only mirror files within the same site |
Ignore URL patterns (-reject_files <string>) | Comma-separated list of URL patterns to ignore, e.g. *cgi-bin*,*.ppt ignores hyperlinks that contain either 'cgi-bin' or '.ppt' (Default: *action=*,*diff=*,*oldid=*,*printable*,*Recentchangeslinked*,Userlogin*,*Whatlinkshere*,*redirect*,*Special:*,Talk:*,Image:*,*.ppt,*.pdf,*.zip,*.doc) |
Exclude directories (-exclude_directories <string>) | Comma-separated list of directories to exclude (each must be an absolute path to the directory), e.g. /people,/documentation excludes the 'people' and 'documentation' subdirectories of the site currently being crawled. (Default: /wiki/index.php/Special:Recentchangeslinked,/wiki/index.php/Special:Whatlinkshere,/wiki/index.php/Talk:Creating_CD) |
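The table describes the URL filters only informally. As a rough illustration, the Python sketch below shows one way the -below, -within, -reject_files and -exclude_directories options could decide whether a candidate URL is mirrored. It is not Greenstone's code: the helper names are hypothetical, and it assumes that each -reject_files pattern is a shell-style wildcard matched against the whole URL (so *.ppt matches URLs ending in .ppt) and that -exclude_directories entries are matched as path prefixes.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def split_list(value):
    """Split a comma-separated option value, dropping blank entries and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_rejected(url, reject_files):
    """True if the URL matches any wildcard pattern.

    Assumption: patterns are matched against the whole URL, case-sensitively.
    """
    return any(fnmatchcase(url, pattern) for pattern in split_list(reject_files))


def is_excluded(url, exclude_directories):
    """True if the URL's path is, or lies under, one of the absolute directory paths."""
    path = urlparse(url).path
    for directory in split_list(exclude_directories):
        directory = directory.rstrip("/")
        # Match whole path segments, so /people does not exclude /people2.
        if path == directory or path.startswith(directory + "/"):
            return True
    return False


def should_mirror(url, source_url, reject_files="", exclude_directories="",
                  below=False, within=False):
    """Hypothetical helper combining the filters for a single candidate URL."""
    if below and not url.startswith(source_url):
        return False
    if within and urlparse(url).netloc != urlparse(source_url).netloc:
        return False
    if is_rejected(url, reject_files):
        return False
    if is_excluded(url, exclude_directories):
        return False
    return True


if __name__ == "__main__":
    source = "http://en.wikipedia.org/"
    rejects = "*Recentchangeslinked*,*Whatlinkshere*,*.ppt"
    excludes = "/wiki/index.php/Talk:Creating_CD"
    for candidate in [
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://en.wikipedia.org/wiki/Special:Whatlinkshere/Main_Page",
        "http://en.wikipedia.org/wiki/index.php/Talk:Creating_CD",
        "http://example.org/wiki/Page",
    ]:
        print(candidate, should_mirror(candidate, source, rejects, excludes, within=True))
```

Checking whole path segments in is_excluded is a design choice of this sketch, made so that excluding /people leaves an unrelated /people2 directory alone.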
If downloading via the GLI, you can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. The subfolders are named after the source URL. These files are physically stored in a temporary cache directory. You can build a collection from these downloaded files by dragging them across to the Collection section on the right-hand side of the Gather panel.
An example MediaWiki download on the command line would be:
perl -S downloadfrom.pl -download_mode MediaWiki -url http://en.wikipedia.org/ -depth 1 -reject_files "*Recentchangeslinked*,Userlogin*,*Whatlinkshere*"
This downloads files from http://en.wikipedia.org/ to a depth of one hyperlink, rejecting files with Recentchangeslinked, Userlogin, or Whatlinkshere in their URL, and excluding the default directories.
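When the download is scripted, passing the arguments as a list rather than through a shell keeps the wildcard patterns in -reject_files from being expanded by the shell. The Python sketch below builds the argument list for the example command; the helper names are hypothetical, and it assumes that perl is installed and that downloadfrom.pl can be found on the PATH once the Greenstone environment has been set up.

```python
import subprocess


def build_download_command(url, depth, reject_patterns):
    """Build argv for a MediaWiki download with downloadfrom.pl.

    Each list element reaches the program verbatim, so patterns such as
    *Whatlinkshere* are never expanded by a shell.
    """
    return [
        "perl", "-S", "downloadfrom.pl",
        "-download_mode", "MediaWiki",
        "-url", url,
        "-depth", str(depth),
        # The option expects one comma-separated string with no spaces.
        "-reject_files", ",".join(reject_patterns),
    ]


def run_download(argv):
    """Run the command without a shell; raises CalledProcessError if it fails."""
    return subprocess.run(argv, check=True)
```

For example, run_download(build_download_command("http://en.wikipedia.org/", 1, ["*Recentchangeslinked*", "Userlogin*", "*Whatlinkshere*"])) would start the same download as the command-line example.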
Files downloaded from MediaWiki sites are processed by the MediaWikiPlugin.