Downloading from the command line

Greenstone allows you to download files from the internet using a variety of protocols: Web, MediaWiki, OAI, Z39.50, and SRW/SRU.

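This can be done from either the Download panel of the GLI, or directly from the command line.

To work from the command line, first set up the Greenstone environment by running the setup script for your Greenstone version and operating system: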

Greenstone3

On Windows:

gs3-setup


On Linux/Mac:

source gs3-setup.bash 

Greenstone2

On Windows:

setup


On Linux/Mac:

source setup.bash 
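Before running any download commands, it can help to confirm that the setup script actually took effect in the current shell. The following is a minimal sketch, assuming the setup script exports the GSDLHOME environment variable; the function name check_greenstone_env is made up for illustration and is not part of Greenstone.

```shell
# Hypothetical helper: reports whether the Greenstone environment looks set up.
# Assumption: the setup script exports GSDLHOME in the current shell.
check_greenstone_env() {
    if [ -z "${GSDLHOME:-}" ]; then
        echo "Greenstone environment not set; source the setup script first"
        return 1
    fi
    echo "Greenstone environment found at $GSDLHOME"
    return 0
}
```

Calling check_greenstone_env in a fresh terminal should report that the environment is not set; after sourcing the setup script it should print the installation path.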

GLI uses a Perl script, downloadfrom.pl, to download files. This script can also be run from the command line, outside of GLI. The following options are available for all download methods (Web, MediaWiki, OAI, Z3950, and SRW):

Option                   Description
-download_mode <enum>    (REQUIRED) The type of server to download from; allowable values: Web, MediaWiki, OAI, Z3950, and SRW
-cache_dir <string>      The location of the cache directory
-gli
-info                    Print information about the server, rather than downloading
This information is also available from the command line: perl -S downloadfrom.pl -h

There are also several options available if you are using a proxy:

Option                     Description
-proxy_on                  Indicates you are using a proxy connection
-proxy_host <string>       Proxy host
-proxy_port <string>       Proxy port
-user_name <string>        Proxy username
-user_password <string>    Proxy password
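As an illustration of how the proxy options combine with the general ones, the sketch below assembles a command line and prints it instead of running it, so it can be checked first. The proxy host and port are placeholder values, and the proxy_args function is a hypothetical helper, not part of Greenstone; the -url option and server address are taken from the OAI example elsewhere on this page.

```shell
# Hypothetical proxy settings; replace with the values for your network.
PROXY_HOST="proxy.example.org"
PROXY_PORT="8080"

# Assemble the proxy-related options listed in the table above.
proxy_args() {
    printf '%s' "-proxy_on -proxy_host $PROXY_HOST -proxy_port $PROXY_PORT"
}

# Print the full command rather than executing it (a dry run).
echo "perl -S downloadfrom.pl -download_mode OAI $(proxy_args) -url http://www.nzdl.org/cgi-bin/oaiserver.cgi"
```

Once the printed command looks right, it can be copied into the terminal and run as is.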

Each download mode also has its own set of additional options, which are outlined on their respective pages in the documentation (Web, MediaWiki, OAI, Z39.50, SRW/SRU). These options are the same as those available in the GLI Download panel and can also be viewed by running perl -S downloadinfo.pl <download-module>. For example, to get information and options for downloading via the OAI protocol, you would run:

perl -S downloadinfo.pl OAIDownload

The download module names are:

Once you are familiar with the options, you can run the download script. For example:

perl -S downloadfrom.pl -download_mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5

This will try to download 5 records from the set demo on the nzdl.org OAI server.

The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the -cache_dir <full-path-to-folder> option.
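To show where -cache_dir fits, the sketch below builds the OAI example command with a cache folder appended and prints it for inspection. It uses -download_mode, the option name given in the table of general options; the folder /tmp/oai-cache and the download_cmd function are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: prints the OAI example command with a cache folder added.
# $1 is the full path to the folder that should receive the downloads.
download_cmd() {
    cache_dir=$1
    echo "perl -S downloadfrom.pl -download_mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5 -cache_dir $cache_dir"
}

# Dry run: show the command for a placeholder folder.
download_cmd "/tmp/oai-cache"
```

Printing the command first makes it easy to confirm the path is a full path before any records are fetched.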

Additional Resources

There are several tutorials on downloading files using various protocols: