====== OAI ======

The [[http://www.openarchives.org/pmh/|Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)]] allows for interoperability between document repositories.

  * //Data Providers// expose their metadata using OAI-PMH.
  * //Service Providers//, or //harvesters//, access this metadata by making service requests.

Greenstone allows you both to harvest metadata provided by others and to make the metadata for your own collections accessible via OAI-PMH.

===== Harvesting OAI records using Greenstone =====

==== Downloading records from an OAI repository ====

Greenstone can download records from an OAI repository and build them into a collection. The downloading can be done either from the [[en:gli:download_panel|Download panel]] of the GLI or from the [[command_line_download|command line]]. The following options are available for downloading via OAI:

^Option^Description^
|Source URL (''-url'')|(REQUIRED) OAI repository URL|
|Metadata prefix (''-metadata_prefix'')|The metadata format used in the exported metadata, e.g. oai_dc, qdc, etc. The formats available depend on what the OAI server offers. All repositories must offer oai_dc. (//Default: oai_dc//)|
|Restrict to set (''-set'')|Restrict the download to the specified set in the repository|
|Get document (''-get_doc'')|Download the source document if one is specified in the record|
|Only include file types (''-get_doc_exts'')|If downloading source documents, only download those whose file extensions match this list. (//Default: ''doc,pdf,ppt''//)|
|Max records (''-max_records'')|Maximum number of records to download. If not specified, all records are downloaded.|

In the GLI, clicking **Server Information** sends the following request to the OAI data provider specified by the ''Source URL'' argument:

  ?verb=Identify

The response is shown in a popup window. You can use the returned server information to help fill out the arguments, for example the set name and metadata prefix.
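The download options above correspond to arguments of OAI-PMH requests: ''-metadata_prefix'' becomes ''metadataPrefix'', ''-set'' becomes ''set'', and ''-max_records'' limits how many records are collected. The following is a minimal Python sketch of a ListRecords harvest based only on the OAI-PMH 2.0 protocol; it is not Greenstone's implementation (downloadfrom.pl is written in Perl). The names ''build_request'' and ''harvest'' are hypothetical, and the caller supplies a ''fetch'' function that returns the response body for a URL.

```python
# Minimal OAI-PMH harvesting sketch (illustrative; not Greenstone's own code).
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Return base_url plus an OAI-PMH query string, e.g. '?verb=Identify'."""
    query: Dict[str, str] = {"verb": verb}
    query.update(params)
    return f"{base_url}?{urlencode(query)}"


def harvest(base_url: str,
            fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc",
            set_spec: Optional[str] = None,
            max_records: Optional[int] = None) -> Iterator[ET.Element]:
    """Yield <record> elements from ListRecords, following resumption tokens.

    `fetch` (hypothetical, caller-supplied) returns the response body for a URL.
    """
    args = {"metadataPrefix": metadata_prefix}   # cf. -metadata_prefix
    if set_spec is not None:
        args["set"] = set_spec                   # cf. -set
    url = build_request(base_url, "ListRecords", **args)
    count = 0
    while True:
        root = ET.fromstring(fetch(url))
        page = root.find(f"{OAI_NS}ListRecords")
        if page is None:
            return  # e.g. an <error code="noRecordsMatch"> response
        for record in page.findall(f"{OAI_NS}record"):
            if max_records is not None and count >= max_records:
                return
            count += 1
            yield record
        if max_records is not None and count >= max_records:
            return  # cf. -max_records: stop without requesting another page
        token = page.find(f"{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the final page.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument, so it is sent on its own.
        url = build_request(base_url, "ListRecords",
                            resumptionToken=token.text.strip())
```

''build_request(url, "Identify")'' produces the same ''?verb=Identify'' request that **Server Information** sends. To harvest from a live server, pass something like ''lambda u: urllib.request.urlopen(u).read().decode("utf-8")'' as ''fetch''; injecting the fetch function keeps the paging logic testable without network access.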
If you are using the GLI, you can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. The subfolders are named after the OAI server URL. At the lowest level of each subfolder are the metadata files, which are organized by the specified set name. These metadata files are physically stored in a temporary cache directory. You can build a collection from these downloaded metadata files using the [[en:plugin:oaiplugin|OAIPlugin]].

==== Downloading source documents ====

If you select the **Get document** option (''-get_doc''), Greenstone checks the value of dc.identifier. If it is a URL (starting with http, https or ftp), Greenstone checks whether its file extension matches one of those listed in the ''Only include file types'' argument. If so, the file is downloaded. If the extension does not match and the URL points to an HTML page, Greenstone downloads the page, scans it for ''href'' links whose targets match the specified file extensions, and downloads those files.

==== Downloading on the command line ====

You can also download OAI records on the command line, using **downloadfrom.pl**, the Perl script that the GLI uses in the background. The [[en:user_advanced:command_line_download|downloading from the command line]] page has much more information about command-line downloading.

  * **Set up the Greenstone environment** in the terminal by running one of the following:
    * Greenstone 3, Linux/macOS: ''source gs3-setup.sh''
    * Greenstone 3, Windows: ''gs3-setup''
    * Greenstone 2, Linux/macOS: ''source setup.bash''
    * Greenstone 2, Windows: ''setup''
  * **To see the available options**, run ''perl -S downloadinfo.pl OAIDownload''. The options are the same as those in the GLI OAI download panel, listed above.
  * **An example usage would be:** ''%%perl -S downloadfrom.pl -mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5%%''. This will try to download 5 records from the set //demo// at the nzdl.org OAI server.
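When harvesting several sets or repositories, it can be convenient to assemble the downloadfrom.pl command from a script. The following is a minimal Python sketch; ''oai_download_command'' is a hypothetical helper name, it uses only the flags documented on this page, and it assumes the Greenstone environment has already been set up so that ''perl -S'' can find downloadfrom.pl.

```python
# Hypothetical helper: build a downloadfrom.pl OAI command line (sketch only).
import shlex
from typing import List, Optional, Sequence


def oai_download_command(url: str,
                         metadata_prefix: Optional[str] = None,
                         set_spec: Optional[str] = None,
                         get_doc: bool = False,
                         get_doc_exts: Optional[Sequence[str]] = None,
                         max_records: Optional[int] = None,
                         cache_dir: Optional[str] = None) -> List[str]:
    """Return argv for 'perl -S downloadfrom.pl -mode OAI ...'.

    Options left as None/False are omitted, so downloadfrom.pl's own
    defaults (e.g. oai_dc, doc,pdf,ppt) apply.
    """
    argv = ["perl", "-S", "downloadfrom.pl", "-mode", "OAI", "-url", url]
    if metadata_prefix:
        argv += ["-metadata_prefix", metadata_prefix]
    if set_spec:
        argv += ["-set", set_spec]
    if get_doc:
        argv.append("-get_doc")
        # The extension filter only matters when documents are fetched.
        if get_doc_exts:
            argv += ["-get_doc_exts", ",".join(get_doc_exts)]
    if max_records is not None:
        argv += ["-max_records", str(max_records)]
    if cache_dir:
        argv += ["-cache_dir", cache_dir]  # full path to the download folder
    return argv


if __name__ == "__main__":
    cmd = oai_download_command("http://www.nzdl.org/cgi-bin/oaiserver.cgi",
                               set_spec="demo", max_records=5)
    # Print a shell-quoted version; pass `cmd` to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Run as a script, this prints the same command as the example above. Building an argument list rather than a single string avoids shell-quoting problems when URLs or paths contain special characters.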
The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the ''-cache_dir'' option followed by the full path to the target folder.

===== Serving OAI Data using Greenstone =====

Greenstone comes with a built-in OAI data provider called **oaiserver**. A configuration file provides options for setting up the server. Collections can opt in to or out of the server, and each collection is advertised as an OAI set. Multiple collections can be grouped into a single OAI set using Greenstone's OAI super set mechanism.

  * [[oai_server_gs3|Greenstone 3 OAI server setup and configuration]]
  * [[oai_server_gs2|Greenstone 2 OAI server setup and configuration]]

===== Additional Resources =====

There are several tutorials on using OAI in Greenstone 3:

  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs3-current/en/OAI_collection.htm|Open Archives Initiative (OAI) collection]]
  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs3-current/en/GS_OAI_server.htm|Setting up your Greenstone OAI Server]]
  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs3-current/en/OAI_downloading.htm|Downloading over OAI]]

The corresponding tutorials for Greenstone 2 are:

  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs2-current/en/OAI_collection.htm|Open Archives Initiative (OAI) collection]]
  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs2-current/en/GS_OAI_server.htm|Setting up your Greenstone OAI Server]]
  * [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs2-current/en/OAI_downloading.htm|Downloading over OAI]]

The [[http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=oai-e|OAI example collection]] demonstrates a library built from OAI records.