Greenstone allows you to download files from the internet using a variety of protocols:
- Web: downloads web pages and files via HTTP and FTP.
- MediaWiki: downloads web pages and files via HTTP from a MediaWiki website.
- Z39.50: downloads MARC records that match a particular search criterion from a Z39.50 server.
- SRW/SRU: downloads MARCXML records that match a particular search criterion from a Search/Retrieve via URL (SRU) server.
Downloads can be started either from the Download panel of the Greenstone Librarian Interface (GLI) or directly from the command line.
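To make the SRU entry above more concrete, the following minimal sketch shows what a search request to an SRU server generally looks like. It is not Greenstone's own code. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `recordSchema`) come from the SRU standard, while the function name `build_sru_url`, the server address and the query are hypothetical. Whether a given server accepts `marcxml` as the schema name is also an assumption, so check the server's explain record.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU searchRetrieve URL (illustrative helper, not part of Greenstone)."""
    params = {
        "version": "1.1",               # SRU protocol version; servers may require another
        "operation": "searchRetrieve",  # the SRU operation that runs a search
        "query": cql_query,             # search criterion, written in CQL
        "startRecord": start_record,    # 1-based position of the first record to return
        "maximumRecords": max_records,  # upper bound on records in the response
        "recordSchema": "marcxml",      # assumed schema name; varies between servers
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server address and query, for illustration only.
url = build_sru_url(
    "http://sru.example.org/catalog",
    'dc.title = "greenstone"',
    max_records=5,
)
print(url)
```

Opening a URL like this in a browser is a quick way to check that the server address and search criterion are valid before entering them in the Download panel.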
Downloading from the GLI
In the Librarian Interface, switch to the Download panel. Select the protocol you want to use on the left-hand side; the right-hand side of the panel then displays the arguments available for that protocol.
If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface, which can be accessed using the Configure Proxy… button. When you initiate a download, a popup will ask for your user name and password.
Clicking Server Information will download some information about the server and determine whether a connection can be made. The response is shown in a popup window.
Clear Cache will delete all previously downloaded metadata files.
To start downloading the metadata records, click the Download button. A download progress panel appears. If it shows something like "Downloaded 0 of 0 files…", you have very likely specified an invalid argument. Click the View Log button for more information.
The download list has an entry for each download processed. Each entry has a text region that gives details of the task along with a progress bar showing current activity. Three buttons appear to the right of each entry. Pause is used to pause a task. Log opens a window showing the download log file. Close terminates the download and removes the task from the list.
You can view the downloaded files in the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. Its subfolders are named after the URL or host from which the files were downloaded, and the metadata files sit at the lowest level of each subfolder. These files are physically stored in a temporary cache directory. You can now build a collection from them.
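As a rough illustration of a host-based folder layout like the one described above, the sketch below groups a list of download URLs by host. It does not reproduce how Greenstone actually names its cache folders. The function name `group_by_host`, the example URLs and the `index.html` fallback for URLs without a file name are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(urls):
    """Map each host to the file names downloaded from it (illustrative only)."""
    tree = defaultdict(list)
    for url in urls:
        parts = urlparse(url)
        # Take the last path segment as the file name; the fallback name is an assumption.
        filename = parts.path.rsplit("/", 1)[-1] or "index.html"
        tree[parts.netloc].append(filename)
    return dict(tree)


# Hypothetical download URLs.
downloads = [
    "http://a.example.org/records/rec1.xml",
    "http://a.example.org/",
    "http://b.example.org/rec2.xml",
]

for host, files in sorted(group_by_host(downloads).items()):
    print(host)
    for name in files:
        print("  " + name)
```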
Behind the scenes, GLI uses a script called downloadfrom.pl, which can be run from the command line.