This version (2014/04/14 11:52) is a draft.

Downloading from the command line

Greenstone allows you to download files from the internet using a variety of protocols:

  • Web: downloads web pages and files via HTTP and FTP.
  • MediaWiki: downloads web pages and files via HTTP from a MediaWiki website.
  • OAI: downloads metadata records (and optionally documents) from an OAI-PMH (Open Archives Initiative) server.
  • Z39.50: downloads MARC records that match a particular search criterion from a Z39.50 server.
  • SRW/SRU: downloads MARCXML records that match a particular search criterion from a Search/Retrieve via URL (SRU) server.

This can be done from either the Download panel of the GLI, or directly from the command line.

GLI uses a Perl script, downloadfrom.pl, to download files. The same script can be run directly on the command line, outside of GLI. The following options apply to every download mode (Web, MediaWiki, OAI, Z3950, and SRW) when using downloadfrom.pl:

Option                   Description
-download_mode <enum>    (REQUIRED) The type of server to download from; allowable values: Web, MediaWiki, OAI, Z3950, and SRW
-cache_dir <string>      The location of the cache directory
-gli
-info                    Print information about the server, rather than downloading

This information is also available from the command line: perl -S downloadfrom.pl -h

There are also several options available if you are using a proxy:

Option                   Description
-proxy_on                Indicates you are using a proxy connection
-proxy_host <string>     Proxy host
-proxy_port <string>     Proxy port
-user_name <string>      Proxy username
-user_password <string>  Proxy password
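
As a rough illustration of how the proxy options combine with a download, the sketch below only prints a command line rather than running it. The proxy host and port are hypothetical placeholders, the helper name build_proxy_cmd is invented for this example, and the -url option is borrowed from the OAI example later on this page; credentials are left out.

```shell
#!/bin/sh
# Hypothetical proxy settings (assumptions, not taken from the documentation).
PROXY_HOST="proxy.example.org"
PROXY_PORT="8080"

# Print a downloadfrom.pl command line that adds the proxy options
# to a given download mode and server URL. Nothing is executed.
build_proxy_cmd() {
    mode="$1"
    url="$2"
    printf 'perl -S downloadfrom.pl -download_mode %s -url %s -proxy_on -proxy_host %s -proxy_port %s\n' \
        "$mode" "$url" "$PROXY_HOST" "$PROXY_PORT"
}

# Dry run: show the command for the nzdl.org OAI server.
build_proxy_cmd OAI http://www.nzdl.org/cgi-bin/oaiserver.cgi
```

Once the printed command looks right, it can be pasted into a terminal where the Greenstone environment is set up.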

Each download mode also has its own set of additional options, which are outlined on their respective pages in the documentation (Web, MediaWiki, OAI, Z39.50, SRW/SRU). These options are the same as those available on the GLI Download panel and can also be viewed by running perl -S downloadinfo.pl <download-module>. For example, to get information and options for downloading via the OAI protocol, you would run:

perl -S downloadinfo.pl OAIDownload

The download module names are:

  • WebDownload
  • MediaWikiDownload
  • OAIDownload
  • Z3950Download
  • SRWDownload (for SRU/SRW downloads)
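
To review the options for every module in one pass, a small loop over the names above may help. This is a minimal sketch: the DRY_RUN variable and the show_module_info helper are inventions for this example, and by default the loop only prints each command instead of running it.

```shell
#!/bin/sh
# Module names copied from the list above.
MODULES="WebDownload MediaWikiDownload OAIDownload Z3950Download SRWDownload"

# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually call downloadinfo.pl.
DRY_RUN="${DRY_RUN:-1}"

show_module_info() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "perl -S downloadinfo.pl $1"
    else
        perl -S downloadinfo.pl "$1"
    fi
}

for module in $MODULES; do
    show_module_info "$module"
done
```

Running it with DRY_RUN=0 assumes that perl can find downloadinfo.pl on the PATH, as the -S switch requires.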

Once you know which options you need, you can run the download script. An example download would be:

 perl -S downloadfrom.pl -download_mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5

This attempts to download 5 records from the set demo on the nzdl.org OAI server.

The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the -cache_dir <full-path-to-folder> option.
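
As a sketch of the -cache_dir option in use, the snippet below prints the OAI example with a cache directory appended, using -download_mode as listed in the options table. The directory path and the build_cached_cmd helper are hypothetical; substitute any full path you can write to.

```shell
#!/bin/sh
# Hypothetical cache location (an assumption); -cache_dir expects a full path.
CACHE_DIR="${CACHE_DIR:-$HOME/greenstone_downloads/oai_demo}"

# Print the OAI example command with -cache_dir appended. Nothing is executed.
build_cached_cmd() {
    printf 'perl -S downloadfrom.pl -download_mode OAI -url %s -set demo -max_records 5 -cache_dir %s\n' \
        "$1" "$2"
}

build_cached_cmd http://www.nzdl.org/cgi-bin/oaiserver.cgi "$CACHE_DIR"
```

This page does not say whether the script creates the directory itself, so creating it beforehand with mkdir -p is a cautious choice.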

Additional Resources