old:oai_support
old:oai_support [2015/08/13 02:01] – external edit 127.0.0.1 | old:oai_support [2018/07/30 23:19] (current) – deleted as all info is in the main wiki kjdon | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | //**This page is in the ' | ||
- | We recommend checking for more up-to-date information using the search box.**// | ||
- | ====== OAI support ====== | ||
- | ====== Creating a collection from an OAI repository ====== | ||
- | |||
- | Greenstone can download records from an OAI repository and build them into a collection. The downloading can be done in two ways: | ||
- | |||
- | ===== From the GLI ===== | ||
- | |||
- | Start the Greenstone Librarian Interface. On the left-hand side of the Librarian Interface' | ||
- | < | ||
- | | ||
- | </ | ||
- | where you can specify the proxy information for your connection if necessary. Clicking Server Information sends the following request to the OAI data provider specified by the url argument: | ||
- | < | ||
- | < | ||
- | </ | ||
- | The response is shown in a popup window. You can use the returned server information to help fill out the arguments, for example, the set name. Clear Cache will delete all previously downloaded metadata files. To start downloading the metadata records, click the Download button. A download progress panel will show up. If you see something like " | ||
- | |||
- | Behind the scenes, GLI uses a script called // | ||
- | |||
- | You can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. The subfolders are named after the OAI server URL. At the lowest level of each subfolder are the metadata files, organized by the specified set name. These metadata files are physically stored in a temporary cache directory. | ||
- | |||
- | You can build a collection using these downloaded metadata files. OAIPlug must be included in the collection plugin list. | ||
- | |||
- | The [[http:// | ||
- | === Downloading source documents === | ||
- | |||
- | If the get document checkbox is selected, then Greenstone will check the value of dc.identifier. | ||
- | If it is a URL (starts with http, https, ftp), then: | ||
- | |||
- | There is an option, include filetype, which lists the file extensions to fetch and defaults to doc,pdf,ppt. | ||
- | |||
- | Greenstone checks the URL's file extension against this list and, if it matches, downloads the file. | ||
- | If there is no file extension, or if the file extension is html, Greenstone downloads the page, scans through it for hrefs that match the specified file extensions, and downloads those. | ||
- | Also, apparently it can cope with handle URLs, eg **< | ||
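The decision described above can be sketched in Python. This is only an illustration of the rules as stated on this page, not Greenstone's actual Perl code: the function name classify_identifier and the returned labels are hypothetical, and treating any other extension as "skip" is an assumption, since the page does not say what happens in that case.

```python
import posixpath
from urllib.parse import urlparse

# Default extension list, taken from the "include filetype" default above.
DEFAULT_EXTS = ("doc", "pdf", "ppt")


def classify_identifier(identifier, exts=DEFAULT_EXTS):
    """Return 'download', 'scan' or 'skip' for a dc.identifier value.

    Hypothetical helper: the labels are illustrative, not Greenstone's.
    """
    parsed = urlparse(identifier)
    # Only http, https and ftp URLs are considered at all.
    if parsed.scheme not in ("http", "https", "ftp"):
        return "skip"
    # Extension of the URL path, without the dot, lower-cased.
    ext = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    if ext in exts:
        return "download"   # fetch the file itself
    if ext in ("", "html"):
        return "scan"       # fetch the page and look for matching hrefs
    return "skip"           # assumption: other extensions are ignored
```

For example, classify_identifier("http://example.org/paper.pdf") gives "download", while a URL with no extension gives "scan".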
- | |||
- | ===== Through the command line ===== | ||
- | |||
- | GLI uses a perl script, **downloadfrom.pl**, | ||
- | Go to your Greenstone folder, and run //source setup.bash// | ||
- | |||
- | downloadfrom.pl can download using several different protocols. These are specified using the **-mode** option. | ||
- | |||
- | To see the available options for download mode, run | ||
- | < | ||
- | perl -S downloadfrom.pl -h | ||
- | </ | ||
- | |||
- | The current options are | ||
- | * **Web**: download a website using http | ||
- | * **MediaWiki**: | ||
- | * **OAI**: | ||
- | * **Z3950**: download using z3950 | ||
- | * **SRW**: download using a SearchRetrieve Webservice | ||
- | |||
- | For OAI downloading, | ||
- | |||
- | To see the options for OAI downloading, | ||
- | < | ||
- | perl -S downloadinfo.pl OAIDownload | ||
- | </ | ||
- | The options are the same as you can see in the GLI OAI download panel. They are: | ||
- | |||
- | * **-url < | ||
- | * **-metadata_prefix < | ||
- | * **-set < | ||
- | * **-get_doc**: | ||
- | * **-get_doc_exts < | ||
- | * **-max_records < | ||
- | |||
- | An example usage would be: | ||
- | < | ||
- | perl -S downloadfrom.pl -mode OAI -url http:// | ||
- | </ | ||
- | |||
- | This will try to download 5 records from the set //demo// at the nzdl.org' | ||
- | |||
- | The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the **-cache_dir full-path-to-folder** option. | ||
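If you call the script from another program, the command line can be assembled from the options listed above. The following Python sketch only builds the argument list; it does not run anything. The function name and the example URL http://example.org/oai are hypothetical, and passing the extension list as a single comma-separated string is an assumption based on the doc,pdf,ppt default.

```python
def build_oai_download_command(url, metadata_prefix=None, set_name=None,
                               get_doc=False, get_doc_exts=None,
                               max_records=None, cache_dir=None):
    """Assemble the argument list for an OAI download (hypothetical helper)."""
    cmd = ["perl", "-S", "downloadfrom.pl", "-mode", "OAI", "-url", url]
    if metadata_prefix is not None:
        cmd += ["-metadata_prefix", metadata_prefix]
    if set_name is not None:
        cmd += ["-set", set_name]
    if get_doc:
        cmd.append("-get_doc")  # flag without a value
    if get_doc_exts is not None:
        # Assumption: extensions are given as one comma-separated string.
        cmd += ["-get_doc_exts", get_doc_exts]
    if max_records is not None:
        cmd += ["-max_records", str(max_records)]
    if cache_dir is not None:
        cmd += ["-cache_dir", cache_dir]
    return cmd
```

The resulting list can be handed to subprocess.run() from a shell in which setup.bash has already been sourced.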
- | |||
- | **NOTE**, this description is valid for Greenstone 2.85 with patched OAIDownload file, see [[2.85_Release_Notes# | ||
- | |||
- | =====The Greenstone OAI server===== | ||
- | |||
- | Greenstone comes with a built-in OAI data provider. This runs as a CGI program called " | ||
- | |||
- | Configuration of the server is done via the //oai.cfg// file in the Greenstone //etc// directory. Please edit this file and set the repositoryName and repositoryId fields. If you are not using the standard Apache setup that comes with Greenstone, you may need to set oaiserverPath, | ||
- | |||
- | This file specifies general information about the repository, and lists collections to be made accessible to OAI clients. By default, collections are not accessible. To enable a collection, add its name to the // | ||
- | |||
- | Greenstone' | ||
- | |||
- | ==== To add a new metadata set for use with oaiserver ==== | ||
- | |||
- | You need to do the following: | ||
- | * Create a schema (or find an existing one) for the metadata set. See [[http:// | ||
- | * Put the new schema somewhere web accessible | ||
- | * Coding in GSDLHOME/ | ||
- | * Create a new metaformat class for the metadata set. See dublincore.h/ | ||
- | * edit Makefile.in, | ||
- | * Edit recordaction.cpp to include the new header file and instantiate the new class (in recordaction()) | ||
- | |||
- | * Tell the server to use the new set: edit etc/oai.cfg and add the set name to the oaimetadata line. You may also need to add oaimapping information. | ||
- | * Recompile and test. | ||
- | |||
- | =====The Greenstone 3 OAI Server===== | ||
- | |||
- | The Greenstone3 OAI data provider facility is available with versions 3.03 and later. This runs as a servlet called " | ||
- | |||
- | You can see a demonstration one at [[http:// | ||
- | |||
- | ==== Configuration ==== | ||
- | Configuration is done via the two files: **OAIConfig.xml** for repository wide configuration, | ||
- | |||
- | ===OAIConfig.xml=== | ||
- | |||
- | This resides in $GSDL3HOME/ | ||
- | |||
- | Please modify this file and enter the correct values for repositoryName, | ||
- | |||
- | The configurations provided in this file are described as follows: | ||
- | < | ||
- | < | ||
- | </ | ||
- | The human-readable name of this OAI repository. | ||
- | < | ||
- | < | ||
- | </ | ||
- | The base URL used to access this repository. | ||
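In OAI-PMH, every request is the base URL followed by a query string whose verb argument names the command. The Python sketch below shows how a harvester might build such requests; the helper name and the example base URL are hypothetical, and the set name demo is used only for illustration.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL (hypothetical helper).

    base_url is the repository's base URL; verb is an OAI-PMH verb
    such as Identify or ListRecords; other arguments are appended
    in the order given.
    """
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL, used only for illustration.
BASE = "http://example.org/greenstone3/oaiserver"
identify_url = oai_request_url(BASE, "Identify")
records_url = oai_request_url(BASE, "ListRecords",
                              metadataPrefix="oai_dc", set="demo")
```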
- | < | ||
- | < | ||
- | </ | ||
- | The version of OAI specification this repository supports. The Greenstone 3 OAI server supports both version 1.1 and 2.0, although the support for registration for version 1.1 of the protocol was discontinued on 1 September 2002 by the OAI organization, | ||
- | | ||
- | < | ||
- | < | ||
- | </ | ||
- | The manner in which the repository supports the notion of deleted records. | ||
- | |||
- | < | ||
- | < | ||
- | </ | ||
- | The granularity of the datestamp. The format of the string is defined by ISO 8601. The only other legal value, which is coarser than this one, is YYYY-MM-DD. | ||
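To make the two granularities concrete, here is a small Python sketch that formats the same moment both ways. The granularity strings are the two values defined by the OAI-PMH 2.0 specification, not copied from this page, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def format_datestamp(moment, granularity):
    """Format an aware datetime as an OAI-PMH datestamp (hypothetical helper)."""
    utc = moment.astimezone(timezone.utc)  # OAI-PMH datestamps are in UTC
    if granularity == "YYYY-MM-DD":
        return utc.strftime("%Y-%m-%d")
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unsupported granularity: {granularity}")


built = datetime(2015, 8, 13, 2, 1, 31, tzinfo=timezone.utc)
day_stamp = format_datestamp(built, "YYYY-MM-DD")
second_stamp = format_datestamp(built, "YYYY-MM-DDThh:mm:ssZ")
```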
- | < | ||
- | < | ||
- | </ | ||
- | The repository maintainer email address. There can be more than one email address here, one element for each. | ||
- | |||
- | The information that goes into the response to the Identify verb request along with the above also includes: | ||
- | < | ||
- | < | ||
- | </ | ||
- | which is the earliest time stamp among the built times of all collections in the repository. It is not provided here because it has to be dynamically generated by going through all collections to find whichever collection was built the earliest. | ||
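The calculation amounts to taking the minimum over the build times of all collections. A minimal Python sketch, assuming the build times are already available as a mapping from collection name to datetime (the function name and the collection names are hypothetical, and this is not the servlet's actual code):

```python
from datetime import datetime, timezone


def earliest_datestamp(build_times):
    """Return the earliest build time, or None if there are no collections."""
    if not build_times:
        return None
    return min(build_times.values())


# Hypothetical collections and build times, for illustration only.
times = {
    "demo": datetime(2015, 8, 13, tzinfo=timezone.utc),
    "papers": datetime(2012, 3, 1, tzinfo=timezone.utc),
}
earliest = earliest_datestamp(times)
```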
- | |||
- | The following information also must be specified in the OAIConfig.xml file: | ||
- | < | ||
- | < | ||
- | </ | ||
- | This value will decide whether or not the selective harvesting is allowed for a repository. In OAI, the commands ListSets, ListRecords, | ||
- | < | ||
- | < | ||
- | </ | ||
- | The time period in which a newly generated resumption token will remain valid, specified in seconds. Hence, the default value 7200 is equivalent to 2 hours. The use of this property depends on the value of resumeAfter. If the resumeAfter parameter is specified to be negative (any value less than 0), there won't be any token issued. | ||
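The interaction between the two settings can be sketched as follows. This is an illustration in Python under stated assumptions, not the server's implementation: times are epoch seconds, and the parameter name token_expiration is hypothetical (only resumeAfter and the 7200 default come from the text above).

```python
def token_expiry(issued_at, resume_after, token_expiration=7200):
    """Return when a resumption token expires, in epoch seconds.

    Returns None when resume_after is negative, because no token
    is issued in that case.
    """
    if resume_after < 0:
        return None
    return issued_at + token_expiration


def token_is_valid(now, issued_at, resume_after, token_expiration=7200):
    """True while a token issued at issued_at can still be used."""
    expiry = token_expiry(issued_at, resume_after, token_expiration)
    return expiry is not None and now < expiry
```

With the default of 7200 seconds, a token issued at time 1000 stays valid until time 8200.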
- | < | ||
- | < | ||
- | </ | ||
- | A list of metadata formats supported by this repository. Since the Dublin Core metadata format is mandatory according to the OAI specification, | ||
- | |||
- | An element containing the standard Dublin Core metadata names is also provided here, instead of hard-coded in the program, in case a repository supports only a modification or extension of the Dublin Core standard. | ||
- | |||
- | ===collectionConfig.xml=== | ||
- | |||
- | Resides in the /etc directory of each collection. The only information relating to the OAI configuration of the collection specified in this file is a list of metadata formats that this particular collection supports, along with some metadata field mappings. | ||
- | |||
- | ====Metadata Field Mapping==== | ||
- | |||
- | Metadata mapping is necessary if the metadata fields used in your collections are not the ones you claim to support in the above two configuration files. For example, the Dublin Core metadata format is mandatory for any repository (and hence for all collections in the repository). If a particular collection uses a field name such as Title instead of the Dublin Core name dc.Title (or whichever supported field name is specified in the two configuration files), the field Title must be mapped to dc.Title for your metadata to be accessible to metadata harvesters. | ||
- | |||
- | Field mapping is done at two levels: globally for all collections in a repository, and specifically for one collection. For a particular collection, the mapping specification in collectionConfig.xml takes precedence over that in OAIConfig.xml. Hence, the metadata mappings will be first looked for in each collection' | ||
- | |||
- | Mapping in the configuration files takes the following format: | ||
- | < | ||
- | < | ||
- | |||
- | In this case, the first name dc.Title is the publicly accessible field, and the second is the field name that is used in the collection, i.e., the value of the field Title will be returned as the value of dc.Title (if the field dc.Title is requested). | ||
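The precedence rule (collection mapping first, then the repository-wide mapping) can be illustrated with a short Python sketch. The mappings are modelled as plain dictionaries from public field name to collection field name; the function name is hypothetical, and falling back to the public name when no mapping exists is an assumption.

```python
def resolve_source_field(public_name, collection_mapping, repository_mapping):
    """Find which collection field supplies the value of a public field.

    collection_mapping stands for the mappings in collectionConfig.xml,
    repository_mapping for those in OAIConfig.xml.
    """
    if public_name in collection_mapping:
        return collection_mapping[public_name]   # collection level wins
    if public_name in repository_mapping:
        return repository_mapping[public_name]   # global fallback
    return public_name  # assumption: unmapped fields are used as-is


# Hypothetical mappings, for illustration only.
repository_map = {"dc.Title": "Title", "dc.Creator": "Author"}
collection_map = {"dc.Title": "Headline"}
```

Here dc.Title resolves to Headline because the collection-level mapping overrides the repository-wide one, while dc.Creator falls back to Author.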
- | |||
- | There is another mapping format that is possible if you have created your own metadata fields and want to make them available for harvesting. For example, the following mapping | ||
- | < | ||
- | < | ||
- | </ | ||
- | means the collection supports a metadata format with the prefix oai_gs, and one of the metadata fields used in the collection is called gs.Title. | ||
- | |||
- | ====Enabling Collections==== | ||
- | |||
- | The concept of ' | ||
- | |||
- | ====Testing==== | ||
- | |||
- | Once you have your OAI service in place, testing can be done via the following online validation facilities http:// | ||
- | |||
- | The former only verifies the Identify command, while extensive testing can be performed via the latter one (called // |
old/oai_support.1439431291.txt.gz · Last modified: 2018/07/30 23:19 (external edit)