Downloading from the command line
Greenstone allows you to download files from the internet using a variety of protocols:
- Web: downloads web pages and files via HTTP and FTP.
- MediaWiki: downloads web pages and files via HTTP from a MediaWiki website.
- OAI: downloads records (and optionally documents) from an Open Archives Initiative (OAI) server.
- Z39.50: downloads MARC records that match a particular search criterion from a Z39.50 server.
- SRW/SRU: downloads MARCXML records that match a particular search criterion from a Search/Retrieve via URL (SRU) server.
This can be done from either the Download panel of the GLI, or directly from the command line.
GLI uses a Perl script, downloadfrom.pl, to download files. This script can also be run on the command line,
outside of GLI. The following options are available for all
methods of download (Web, MediaWiki, OAI, Z3950, and SRW) using downloadfrom.pl:
- (REQUIRED) The type of server to download from; allowable values: Web, MediaWiki, OAI, Z3950, SRW
- -cache_dir: The location of the cache directory
- Print information about the server, rather than downloading from it
This information is also available from the command line:
perl -S downloadfrom.pl -h
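The -S switch in these commands tells Perl to search the directories on your PATH for the named script, so the commands only work once Greenstone's script directory is on PATH (in a typical Greenstone 2 installation, sourcing setup.bash from the install folder does this, but check your own setup). The bash sketch below defines a hypothetical helper, which_gs_script, that roughly approximates that lookup. It is not part of Greenstone; it is only a way to check which copy of downloadfrom.pl or downloadinfo.pl would be picked up.

```shell
# Hypothetical helper (not part of Greenstone): roughly approximate how
# `perl -S NAME` finds a script, by checking each PATH directory in order.
which_gs_script() {
  local name="$1" dir
  # -S only searches PATH for bare names; a name containing a slash is used as given.
  case "$name" in
    */*) printf '%s\n' "$name"; return 0 ;;
  esac
  local IFS=:
  for dir in $PATH; do
    # Treat an empty PATH entry as the current directory (POSIX convention).
    dir="${dir:-.}"
    if [ -f "$dir/$name" ] && [ -r "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1   # not found on PATH
}
```

For instance, which_gs_script downloadfrom.pl prints the path of the first matching file, or returns a non-zero status if the script is not on PATH.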
There are also several options available if you are using a proxy:
- Indicates that you are using a proxy connection
Each download mode also has its own set of additional options, which are outlined on
their respective pages in the documentation (Web, MediaWiki, OAI, Z39.50, SRW/SRU). These options are the same as
those available on the GLI Download panel and can also be viewed by running
perl -S downloadinfo.pl <download-module>. For example, to get information
and options for downloading via the OAI protocol, you would run:
perl -S downloadinfo.pl OAIDownload
The download module names are:
- WebDownload
- MediaWikiDownload
- OAIDownload
- Z3950Download
- SRWDownload (for SRU/SRW downloads)
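Because each download mode has a matching module, a small lookup can save retyping module names. The bash sketch below is a hypothetical helper, gs_module_for_mode, that maps a download mode to the module name to pass to downloadinfo.pl. OAIDownload and SRWDownload come from this page; the other module names are assumptions based on the same <Mode>Download pattern, so check them against your Greenstone installation.

```shell
# Hypothetical helper (not part of Greenstone): map a download mode to the
# downloader module name that downloadinfo.pl expects.
# OAIDownload and SRWDownload are documented; the other names assume the
# same <Mode>Download naming pattern.
gs_module_for_mode() {
  case "$1" in
    Web)          echo WebDownload ;;
    MediaWiki)    echo MediaWikiDownload ;;
    OAI)          echo OAIDownload ;;
    Z3950|z3950)  echo Z3950Download ;;
    SRW|SRU)      echo SRWDownload ;;   # one module serves both SRW and SRU
    *)
      echo "unknown download mode: $1" >&2
      return 1
      ;;
  esac
}

# Usage (needs a working Greenstone environment):
#   perl -S downloadinfo.pl "$(gs_module_for_mode OAI)"
```

Unknown modes are rejected with a message on standard error rather than silently producing a module name that does not exist.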
Once you know which options you need, you can run the download script. For example:
perl -S downloadfrom.pl -document_mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5
This will try to download 5 records from the set demo on nzdl.org's OAI server.
The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the
-cache_dir <full-path-to-folder> option.
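Since -cache_dir expects a full path, it can help to resolve a relative folder name first. The bash sketch below uses a hypothetical helper, gs_cache_dir, that creates the folder if needed and prints its absolute path; the folder name in the usage comment is also just an example. The downloadfrom.pl options in that comment are the ones from the OAI example above, with -cache_dir added.

```shell
# Hypothetical helper (not part of Greenstone): create a cache folder if it
# does not exist yet and print its absolute path for use with -cache_dir.
gs_cache_dir() {
  mkdir -p -- "$1" || return 1
  # CDPATH is cleared so cd prints nothing extra; the subshell keeps the
  # caller's working directory unchanged.
  (CDPATH= cd -- "$1" && pwd)
}

# Usage (needs a working Greenstone environment; ./oai-demo-cache is a
# hypothetical folder name):
#   cache="$(gs_cache_dir ./oai-demo-cache)"
#   perl -S downloadfrom.pl -document_mode OAI \
#     -url http://www.nzdl.org/cgi-bin/oaiserver.cgi \
#     -set demo -max_records 5 -cache_dir "$cache"
```

Running the resolution in a subshell means the helper never changes the directory your shell is in, so later relative paths keep working as expected.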