Differences

This shows you the differences between two versions of the page.

en:user_advanced:oai [2014/11/03 13:27]
127.0.0.1 external edit
en:user_advanced:oai [2018/07/31 11:11]
kjdon [Downloading source documents]
Line 46: Line 46:
 then Greenstone downloads the page and scans through it looking for ''href'''s that match the specified file extensions, and downloads these.  then Greenstone downloads the page and scans through it looking for ''href'''s that match the specified file extensions, and downloads these. 
  
-===== Serving OAI Data using Greenstone===== +==== Downloading ​on the command line ====
- +
-Greenstone comes with a built-in OAI data provider, called **oaiserver**. A configuration file provides options for the set up of the server. Collections can opt in or out of the server, and each collection will be advertised as an OAI set. Multiple collections can be grouped into a single OAI set using Greenstone'​s OAI super set mechanism. +
- +
-<TABAREA tabs="​Greenstone3,​Greenstone2">​ +
-<​TAB>​ +
-====The Greenstone3 OAI server==== +
-The Greenstone3 OAI data provider facility comes enabled by default. It runs as a servlet called "​oaiserver",​ and can be accessed using the same URL as the library,  +
-by replacing library with oaiserver. For example, ''<​nowiki>​http://​localhost:​8080/​greenstone3/​oaiserver</​nowiki>''​. You can see a demonstration OAI server at [[http://​www.greenstone.org/​greenstone3/​oaiserver?​verb=Identify|http://​www.greenstone.org/​greenstone3/​oaiserver?​verb=Identify]]. +
- +
-==== Configuration ====  +
-Configuration is done via the files: **''​OAIConfig.xml''​** for repository wide configuration,​ and **''​collectionConfig.xml''​** for collection specific configuration. +
- +
-===OAIConfig.xml=== +
- +
-This file specifies general information about the repository and can be found in ''​greenstone3/​resources/​oai''​. Please edit the file here. When the server starts up, this file will be copied to ''​greenstone3/​web/​WEB-INF/​classes/''​.  +
- +
-Please modify this file and enter the correct values for repositoryName and repositoryIdentifier.  +
-Other values may be modified as needed. The following table lists important configuration +
-options in ''​OAIConfig.xml''​. +
- +
-^  Configurations in OAIConfig.xml ​ ^^ +
-|''<​repositoryName>​repository-name</​repositoryName>''​|The name of this oai repository, which is human readable.| +
-|''<​repositoryIdentifier>​repository-identifier</​repositoryIdentifier>''​|The unique id of this oai repository. If using OAI 2.0, this should be the same as your domain name.| +
-|''<​baseURL>​your-web-server-domain-name/​greenstone3/​oaiserver </​baseURL>''​|The base url to access this repository.| +
-|''<protocolVersion>2.0</protocolVersion>''|The version of the OAI specification this repository supports. The Greenstone 3 OAI server supports both versions 1.1 and 2.0. Although the OAI organization discontinued registration support for version 1.1 of the protocol on 1 September 2002, some repositories may still be using it (for example, the http://rocky.dlib.vt.edu/~jcdlpix/cgi-bin/OAI/jcdlpix.pl OAI server used in the Greenstone tutorial exercises).| +
-|''<deletedRecord>no</deletedRecord>''|The manner in which the repository supports the notion of deleted records.| +
-|''<​granularity>​yyyy-MM-ddTHH:​mm:​ssZ </​granularity>''​|The granularity of the datestamp. The meaning of the string is defined in the specification ISO8601. The other legitimate value of the datestamp which is less fine than this is YYYY-MM-DD.| +
-|''<​adminEmail>​maintainer-email-address</​adminEmail>''​|The repository maintainer email address. There can be more than one email address here, one element for each. | +
-|''<​oaiInfo><​metadata name="​meta-name">​meta-value</​metadata>​...</​oaiInfo>''​|Metadata describing the repository. Any user defined metadata can go here.| +
-|''<​oaiSuperSet>''​|In the Greenstone OAI server, each collection is presented as an OAI set. You can use the oaiSuperSet to group several collections together to be presented as a single set. See [[#​OAI_Super_Sets|below]]| +
-|''<​useOAIStylesheet>​yes</​useOAIStylesheet>''​|A stylesheet will be specified for the result - enables a nice view of the XML when viewing a response in a browser. Set to '​no'​ if you don't want the stylesheet specified.| +
-|''<​OAIStylesheet>​url</​OAIStylesheet>''​|Set the url here if you want to use a different stylesheet to the default one.| +
-|''<earliestDatestamp> 2001-06-24T18:09:47-05:00Z </earliestDatestamp>''|The Identify response includes earliestDatestamp, which is the earliest datestamp that is valid for the repository. Generally it is generated by looking at the earliestDatestamp of each collection. If, for some reason, the collections don't have valid datestamps, then this value from the config file will be used. | +
-|''<​resumeAfter>​250</​resumeAfter>''​|This value will decide whether or not the selective harvesting is allowed for a repository. In OAI, the commands ListSets, ListRecords,​ and ListIdentifiers are collectively called list requests. In some cases, these lists may be large and it may be practical to partition them among a series of requests and responses. This value decides how many sets/​identifiers/​records to send for the request before issuing a resumption token. A value less than 0 (e.g. -1) indicates that a complete list of items will be returned. See the OAI specification for how flow control is accomplished by using resumption tokens.| +
-|''<​resumptionTokenExpiration>​7200 </​resumptionTokenExpiration>''​|The time period in which a newly generated resumption token will remain valid, specified in seconds. Hence, the default value 7200 is equivalent to 2 hours. The use of this property depends on the value of resumeAfter. If the resumeAfter parameter is specified to be negative (any value less than 0), there won't be any token issued.| +
-|''<​ListMetadataFormats>''​|A list of metadata formats supported by this repository. Since the Dublin Core metadata format is mandatory according to the OAI specification,​ there must be a metadataFormat element with the oai_dc prefix specified here, along with the metadata name mappings if necessary. See [[#​Metadata_Formats|below]] for more info.| +
- +
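The ''resumeAfter'' and ''resumptionTokenExpiration'' rows above describe flow control from the server's side. Seen from a harvester, the request sequence might be sketched as below. This is a minimal illustration and not Greenstone code: the helper name ''list_request_url'', the set name and the token string are placeholders, and the base URL is the localhost example used on this page. It relies on the OAI-PMH 2.0 rule that ''resumptionToken'' is an exclusive argument, so a follow-up request carries only the verb and the token.

```python
from urllib.parse import urlencode

# Example base URL taken from this page; adjust host and port for a real server.
BASE_URL = "http://localhost:8080/greenstone3/oaiserver"


def list_request_url(base_url, verb, resumption_token=None, **first_request_args):
    """Build the URL for an OAI-PMH list request (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is exclusive: no other arguments apart from the verb.
        params = {"verb": verb, "resumptionToken": resumption_token}
    else:
        params = {"verb": verb, **first_request_args}
    return base_url + "?" + urlencode(params)


# First request: states the metadata format and (optionally) a set.
first = list_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="demo")

# Follow-up request: sent only if the previous response contained a token.
follow_up = list_request_url(
    BASE_URL, "ListRecords", resumption_token="token-from-previous-response"
)

print(first)
print(follow_up)
```

A harvester would repeat the follow-up request until a response arrives without a resumption token, and would need to do so before the token expires.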
- +
-===collectionConfig.xml===+
  
-Resides in the ''/etc'' directory of each collection. A serviceRackList contains services which are not defined by the collection building process. (These would end up in the buildConfig.xml file.) The OAIPMH ServiceRack enables the collection to be served by the OAI server. It contains information about which metadata formats the collection supports. A mapping list may be provided for each format, mapping Greenstone metadata fields into the fields available for the format. +You can also download OAI records on the command line, using the Perl script that GLI uses in the background: **downloadfrom.pl**. There is lots of information about command line downloading on the [[en:user_advanced:command_line_download|downloading from the command line]] page.
-If the collection is part of a super set, then this information is added here too.+
  
-Here is a sample OAIPMH ServiceRack element.+  * Set up the Greenstone environment in the terminal by running one of the following:
 <​code>​ <​code>​
-    <serviceRack name="OAIPMH"> +source gs3-setup.sh (Linux/MacOS, gs3) 
-      <​setName>​Lucene demo collection</​setName>​ +gs3-setup (Windows, gs3) 
-      <​setDescription>​A demo collection for greenstone</​setDescription>​ +source setup.bash (Linux/MacOS, gs2) 
-      <!-- States that this collection is part of the humanity super set, which needs to be defined +setup (Windows, gs2)
-       in the OAIConfig.xml file.  --> +
-      <​oaiSuperSet name="​humanity"​/> +
-      <​ListMetadataFormats>​ +
- <!-- This collection supports the DC metadata set. --> +
- <​metadataFormat metadataPrefix="​oai_dc">​ +
-   <​!-- ​  a custom mapping as this collection doesn'​t have exclusive dc metadata --> +
-          <!-- this will replace the dc:​publisher element from the main set --> +
-   <element name="​dc:​publisher">​ +
-     <mapping elements="dls.Organization"/> +
-   </​element>​ +
- </​metadataFormat>​ +
-      </​ListMetadataFormats>​ +
-    </​serviceRack>​ +
 </​code>​ </​code>​
  
-==== OAI super sets ==== +  * To see the options available, run:
- +
-In the Greenstone OAI server, each collection is presented as an OAI set. You can use the oaiSuperSet to group several collections together to be presented as a single set.  +
- +
-The format for a super set specification is like the following:+
  
 <​code>​ <​code>​
-<​oaiSuperSet>​ +perl -S downloadinfo.pl OAIDownload
-    <​setSpec>​oai set identifier</​setSpec>​ +
-    <​setName>​Human readable set name</​setName>​ +
-    <​setDescription>​Set description</​setDescription>​ +
-  </​oaiSuperSet>​+
 </​code>​ </​code>​
  
-There can be more than one super set specified in OAIConfig.xml. Collections themselves state which super set they belong to. The format is the following, where xxx must match the setSpec of the set it belongs to. This line must be added into the OAIPMH serviceRack element in the collectionConfig.xml file. +The options are the same as you can see in the GLI OAI download panel. They are:
  
-<code> +  * **-url <string>**: (Required) The OAI repository URL. 
-<oaiSuperSet name="xxx"/> +  * **-metadata_prefix <string>**: The metadata format to be used in the downloaded records, e.g. oai_dc, qdc, etc. Formats available depend on what is offered by the OAI server. All repositories must offer oai_dc. Default: oai_dc 
-</code> +  * **-set <string>**: Restrict the download to the specified set in the repository 
-==== Metadata Formats ====  +  * **-get_doc**: Download source documents too, if available 
- +  * **-get_doc_exts <string>**: If downloading source documents, only download those whose file extensions match this list. Default: doc,pdf,ppt 
-A repository must support the Dublin Core format, and may support others. Each format must be listed in the ListMetadataFormats element. Currently, only the Dublin Core format is defined. It looks like this: +  * **-max_records <int>**: Maximum number of records to download. If not specified, will download all records.
  
 +An example usage would be:
 <​code>​ <​code>​
-  <​ListMetadataFormats>​ + perl -S downloadfrom.pl -mode OAI -url http://www.nzdl.org/cgi-bin/oaiserver.cgi -set demo -max_records 5
-    <​metadataFormat>​ +
-      <​metadataPrefix>​oai_dc</​metadataPrefix>​ +
-      <​schema>​http://​www.openarchives.org/​OAI/​2.0/​oai_dc.xsd</​schema>​ +
-      <​metadataNamespace>​http://www.openarchives.org/OAI/2.0/​oai_dc/</​metadataNamespace>​ +
-      <​elementList>​ +
- <​element name="​dc:​title"><​mapping select="​firstvalidmetadata"​ elements="​dc.Title,​Title"/></​element>​ +
- <​element name="​dc:​creator"><​mapping elements="​dc.Creator"/></​element>​ +
- <​element name="​dc:​subject"><​mapping elements="​dc.Subject"/></​element>​ +
- <​element name="​dc:​description"><​mapping elements="​dc.Description"/></​element>​ +
- <​element name="​dc:​publisher"><​mapping elements="​dc.Publisher"/></​element>​ +
- <​element name="​dc:​contributor"><​mapping elements="​dc.Contributor"/></​element>​ +
- <​element name="​dc:​date"><​mapping elements="​dc.Date"/></​element>​ +
- <​element name="​dc:​type"><​mapping elements="​dc.Type"/></​element>​ +
- <​element name="​dc:​format"><​mapping elements="​dc.Format"/></​element>​ +
- <​element name="​dc:​identifier"><​mapping elements="​dc.Identifier,​Identifier"​ select="​firstvalue"/></​element>​ +
- <​element name="​dc:​source"><​mapping elements="​dc.Source"/></​element>​ +
- <​element name="​dc:​language"><​mapping elements="​dc.Language"/></​element>​ +
- <​element name="​dc:​relation"><​mapping elements="​dc.Relation"/></​element>​ +
- <​element name="​dc:​coverage"><​mapping elements="​dc.Coverage"/></​element>​ +
- <​element name="​dc:​rights"><​mapping elements="​dc.Rights"/></​element>​ +
-      </​elementList>​ +
-    </​metadataFormat>​ +
-  </​ListMetadataFormats>​+
 </​code>​ </​code>​
  
-The elementList lists all elements for the metadata format. Mapping rules dictate which elements in the Greenstone collection get output for these OAI elements. +This will try to download 5 records from the set //demo// at nzdl.org's OAI server.
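
When the download is driven from another script rather than typed by hand, the options listed above can be assembled into an argument list first. The sketch below is one hedged way to do that: the function ''oai_download_command'' and its parameter names are invented for illustration, while the flags themselves are the ones documented in the option list. It only builds and prints the command; actually running it assumes the Greenstone environment has already been set up in the calling shell.

```python
import shlex


def oai_download_command(url, metadata_prefix="oai_dc", set_name=None,
                         get_doc=False, get_doc_exts=None, max_records=None):
    """Assemble a downloadfrom.pl call for OAI mode (hypothetical helper)."""
    argv = ["perl", "-S", "downloadfrom.pl", "-mode", "OAI",
            "-url", url, "-metadata_prefix", metadata_prefix]
    if set_name:
        argv += ["-set", set_name]
    if get_doc:
        argv.append("-get_doc")
        if get_doc_exts:
            # The option takes a comma-separated list, e.g. doc,pdf,ppt
            argv += ["-get_doc_exts", ",".join(get_doc_exts)]
    if max_records is not None:
        argv += ["-max_records", str(max_records)]
    return argv


# Same request as the example above, with the default metadata prefix spelled out.
argv = oai_download_command("http://www.nzdl.org/cgi-bin/oaiserver.cgi",
                            set_name="demo", max_records=5)
print(shlex.join(argv))
```

Keeping the command as a list (rather than one string) means it can be handed to ''subprocess.run(argv)'' without extra quoting.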
  
-The OAIConfig.xml file defines the metadata formats supported by the repository, and each collection must specify which of these formats it supports. The format is like: +The records (and optionally documents) will be downloaded into the folder the script is run from. To change this, use the **-cache_dir full-path-to-folder** option. 
 +===== Serving OAI Data using Greenstone=====
  
-<code> +Greenstone comes with a built-in OAI data provider, called **oaiserver**. A configuration file provides options for the set up of the server. Collections can opt in or out of the server, and each collection will be advertised as an OAI set. Multiple collections can be grouped into a single OAI set using Greenstone's OAI super set mechanism.
-  <​ListMetadataFormats>​ +
- <​metadataFormat metadataPrefix="​oai_dc">​ +
-   <!--   a custom mapping as this collection doesn't have exclusive dc metadata --><!-- this will replace the dc:publisher element from the main set --> +
-   <element name="​dc:​publisher">​ +
-     <mapping elements="​dls.Organization"/>​ +
-   </​element>​ +
- </​metadataFormat>​ +
-  </​ListMetadataFormats>​ +
-</​code>​  +
-The collection can specify a custom mapping. +
- +
-Field mapping is done in two levels: globally for all collections in a repository +
-and specifically for each collection. For a particular collection, the mapping specification  +
-in the ''​collectionConfig.xml''​ takes precedence over that in the ''​OAIConfig.xml''​. Hence,  +
-the metadata mappings will be first looked for in each collection'​s ''​collectionConfig.xml'';​  +
-if not found, go to the ''​OAIConfig.xml'';​ if not specified there either, the standard Dublin Core +
- field names will be used to retrieve the metadata of the collection. +
- +
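The two-level lookup described above can be pictured with a small sketch. It is not the server's implementation: the function name ''find_mapping'' is made up, and plain dictionaries stand in for the parsed ''<element>''/''<mapping>'' entries of the two XML files. The sample values reuse the ''dc:publisher'' mappings shown on this page.

```python
def find_mapping(oai_element, collection_config, oai_config):
    """Return (source, greenstone_fields) for one OAI element.

    Both config arguments are plain dicts used as stand-ins for the
    mappings read from collectionConfig.xml and OAIConfig.xml.
    """
    # 1. A collection-specific mapping takes precedence.
    if oai_element in collection_config:
        return "collectionConfig.xml", collection_config[oai_element]
    # 2. Otherwise use the repository-wide mapping.
    if oai_element in oai_config:
        return "OAIConfig.xml", oai_config[oai_element]
    # 3. No mapping at either level: the server uses its default lookup.
    return "default", None


oai_config = {"dc:publisher": ["dc.Publisher"], "dc:creator": ["dc.Creator"]}
collection_config = {"dc:publisher": ["dls.Organization"]}

publisher = find_mapping("dc:publisher", collection_config, oai_config)
creator = find_mapping("dc:creator", collection_config, oai_config)
print(publisher)
print(creator)
```

Here the collection overrides only ''dc:publisher''; every other element still falls through to the repository-wide definition.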
-Mappings take the following format: +
-<​code>​ +
-  <element name="​oai-name"/>​ +
-  <element name="​oai-name"><​mapping select="​allvalues|firstvalue|firstvalidmetadata"​  +
-    elements="​comma-separated-list-of-gs-metadata"/></​element>​ +
-</​code>​ +
- +
-In the first case, the server will look for '​oai-name'​ metadata in the collection. No mapping will be done.\\ +
-In the second case, the server will look for any metadata in the elements list and map it to oai-name in the output. The select attribute determines how many values are output. The default setting is '​allvalues'​. +
-  ​* **allvalues**: will display all values of each metadata element +
-  * **firstvalue**:​ will go through each metadata element until it finds a value, and will return only one value. +
-  * **firstvalidmetadata**:​ will go through each element until it finds one that has a value, then output all values of that element.  +
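
The difference between the three ''select'' settings is easiest to see on a small document. The following sketch mirrors the descriptions in the list above; the function ''select_values'' and the sample metadata are illustrative assumptions, not code or data from Greenstone.

```python
def select_values(metadata, elements, select="allvalues"):
    """Pick the values to output for one OAI element.

    metadata: dict mapping a Greenstone field name to its list of values.
    elements: Greenstone field names, in the order given in the mapping.
    """
    if select not in ("allvalues", "firstvalue", "firstvalidmetadata"):
        raise ValueError("unknown select setting: " + select)
    if select == "allvalues":
        # Every value of every listed field.
        return [v for name in elements for v in metadata.get(name, [])]
    for name in elements:
        values = metadata.get(name, [])
        if values:
            if select == "firstvalue":
                return values[:1]   # a single value only
            return list(values)     # firstvalidmetadata: all values of this field
    return []


# Hypothetical document: no dc.Title, two Title values, two kinds of date.
doc = {"dc.Title": [], "Title": ["A", "B"], "dc.Date": ["2001"], "gs.Date": ["2002"]}

titles = select_values(doc, ["dc.Title", "Title"], "firstvalidmetadata")
one_title = select_values(doc, ["dc.Title", "Title"], "firstvalue")
dates = select_values(doc, ["dc.Date", "gs.Date"], "allvalues")
print(titles, one_title, dates)
```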
-  +
-Some examples: +
- +
-<​code><​element name="​dc:​title"/></​code>​ +
-The server will look for dc:title metadata and output it if found. Note that standard Greenstone metadata uses '​.'​ for namespaces, not ':',​ so this will not find anything.  +
-<​code><​element name="​dc:​title"><​mapping select="​firstvalidmetadata"​ elements="​dc.Title,​Title"/></​element></​code>​ +
-This will output all dc.Titles as dc:title metadata. If no dc.Title is found, then any Titles will be output. +
-<​code><​element name="​dc:​date"><​mapping select="​allvalues"​ elements="​dc.Date,​gs.Date"/></​element></​code>​ +
-This will output all dc.Dates and gs.Dates as dc:date. +
- +
- +
- +
-====Disabling Collections====  +
- +
-By default, in Greenstone3,​ all collections are enabled for the OAI server, and each collection is mapped into an OAI set. +
-A new collection in Greenstone contains the OAIPMH ServiceRack in its collectionConfig.xml file. To disable the collection in the oaiserver, comment out this ServiceRack. Note that this has to be done by hand, as GLI has not been set up to modify this part of the collectionConfig.xml file. +
-The collectionConfig.xml file lives in //​greenstone3/​web/​sites/​localsite/​collect/<​colname>/​etc/​collectionConfig.xml//​. Make sure the collection is not open in GLI when you are editing this file by hand. +
- +
-==== Disabling the OAI Server ==== +
- +
-If you do not want to provide an OAI server ​alongside your Greenstone3 library you'll need to remove the servlet information from the Greenstone3 web.xml. +
-Make sure Tomcat is not running (by closing the Greenstone3 server program, or running **ant stop** ​in the greenstone3 folder on the command line). Open up //​greenstone3/​web/​WEB-INF/​web.xml//​. There are two parts you'll need to remove, ​or comment ​out. The first one is the servlet specification for the oaiserver. It looks like this: +
-<​code>​ +
-        <​servlet>​ +
-                <​servlet-name>​oaiserver</​servlet-name>​ +
-                <​description>​an oai servlet</​description>​ +
-                <​servlet-class>​org.greenstone.gsdl3.OAIServer</​servlet-class>​ +
-                <​init-param>​ +
-                        <​param-name>​default_lang</​param-name>​ +
-                        <​param-value>​en</​param-value>​ +
-                </​init-param>​ +
-                <​init-param>​ +
-                        <​param-name>​site_name</​param-name>​ +
-                        <​param-value>​localsite</​param-value>​ +
-                </​init-param>​ +
-        </​servlet>​ +
- +
-</​code>​ +
- +
-The second part to remove or comment out is the servlet mapping, which maps the url to the servlet. +
-It looks like this: +
-<​code>​ +
-        <​servlet-mapping>​ +
-                <​servlet-name>​oaiserver</​servlet-name>​ +
-                <​url-pattern>/​oaiserver</​url-pattern>​ +
-        </​servlet-mapping>​ +
-</​code>​ +
- +
-Once these two sections are gone or commented out, Tomcat will no longer provide the oaiserver. You can add them back in if you wish to reinstate it later. +
- +
-==== OAI Datestamps ====  +
- +
-The "​datestamp"​ tag for a record comes from the "​oailastmodified"​ metadata that is added automatically by Greenstone when you build a collection. This value is obtained from the operating system, and is usually the last time the file was edited. However, if the file has been copied (for example if you used the GLI to add the file into your collection) then the oailastmodified value will probably ​be the time the file was copied. +
- +
-To manually ​set the OAI datestamp for a document, add gs.OAIDateStamp metadata. This must be in the form YYYY-MM-DD. This will be used instead of oailastmodified if it exists.  +
- +
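The choice between the two datestamp sources can be sketched as follows. This is an illustration under stated assumptions: the function name ''record_datestamp'' is hypothetical, the ''oailastmodified'' value is treated as an opaque string because its format is not described here, and rejecting a badly formed ''gs.OAIDateStamp'' with an error is a choice made for the sketch, not documented server behaviour.

```python
import re
from datetime import date


def record_datestamp(metadata):
    """Pick the datestamp source for a record (hypothetical helper)."""
    manual = metadata.get("gs.OAIDateStamp")
    if manual is not None:
        # The manual value must be in the form YYYY-MM-DD.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", manual):
            raise ValueError("gs.OAIDateStamp must be YYYY-MM-DD: " + manual)
        date.fromisoformat(manual)  # also rejects impossible dates such as 2018-02-30
        return manual
    # Otherwise fall back to the value added automatically at build time.
    return metadata.get("oailastmodified")


print(record_datestamp({"gs.OAIDateStamp": "2018-07-31", "oailastmodified": "placeholder"}))
print(record_datestamp({"oailastmodified": "placeholder"}))
```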
-==== Resetting the server ==== +
- +
-If you have rebuilt ​collections ​then you need to reset the server. This can either ​be done by restarting it (stop the server and run gs3-server.sh/​bat again, or run 'ant restart'​),​ or you can reset the server using the reset command: http://​localhost:​8383/​greenstone3/​oaiserver?​reset. This will make it reload all the collection information again. (Make sure you use the correct host name and port number.) +
-</TAB> +
-<!-- #######################################################################################​ +
-###########################################################################################​ +
-##########################################################################################​-->​ +
-<​TAB>​ +
-====The Greenstone2 OAI server==== +
-Greenstone comes with a built-in OAI data provider. This runs as a CGI program called +
- ''oaiserver.cgi'', and is installed in the Greenstone cgi-bin directory.  +
-It can be accessed via the same URL as the Greenstone library (replacing  +
-''library.cgi'' with ''oaiserver.cgi''). On Windows, you must be using a web server (e.g. Apache), not the local library server.  +
- +
-Configuration of the server is done via the ''​oai.cfg''​ file in the Greenstone ''​etc''​ directory. This file specifies general information about the repository, lists collections to be made accessible and may include metadata mapping information. **Important:​** the ''​oai.cfg''​ file must be utf-8 encoded. +
- +
-Please edit ''​oai.cfg''​ and set the repositoryName and repositoryId fields.  +
-If you are not using the standard Apache setup that comes with Greenstone,  +
-you may need to set oaiserverPath,​ libraryPath,​ docRootPath. Optionally,  +
-you can set baseServerURL to use a domain name instead of IP address in URLs. +
- +
-By default, collections are not accessible. To enable a collection, add its name to the ''​oaicollection''​ list.  +
- +
-Greenstone'​s OAI server currently supports Dublin Core, Qualified Dublin Core, and  +
-RFC1807 metadata. For collections that use other metadata sets, including extracted  +
-metadata, metadata mapping rules should be provided to map the existing metadata  +
-to Dublin Core. ''​oai.cfg''​ has more details. +
- +
-==== To add a new metadata set for use with oaiserver ==== +
  
-You need to do the following:​ +  ​* [[oai_server_gs3|Greenstone ​3 OAI server setup and configuration]] 
-  * Create a schema (or find an existing one) for the metadata set. See [[http://www.greenstone.org/namespaces/gsdl_qdc/1.0/gsdl.qdc.xsd|Greenstone's qualified Dublin Core]] schema and the OAI standard [[http://www.openarchives.org/OAI/1.1/dc.xsd|Dublin Core]] files for examples. +  * [[oai_server_gs2|Greenstone 2 OAI server setup and configuration]]
-    * Put the new schema somewhere web accessible +
-  * Coding in GSDLHOME/​runtime-src/​src/​oaiservr:​ +
-    * Create a new metaformat class for the metadata set. See dublincore.h/​cpp,​ qualified_dublincore.h/​cpp,​ rfc1807.h/​cpp for examples. +
-    * edit Makefile.in,​ Makefile and win32.mak to use the new files +
-    * Edit recordaction.cpp to include the new header file and instantiate the new class (in recordaction())+
  
-  * Tell the server to use the new set: edit etc/oai.cfg and add the set name to the oaimetadata line. You may also need to add oaimapping information. 
-  * Recompile and test. 
-</​TAB></​TABAREA>​ 
  
 ====Testing==== ​ ====Testing==== ​