Greenstone tutorial exercise

Back to wiki
Back to index
Prerequisite: A simple image collection
Devised for Greenstone version: 2.85|3.06
Modified for Greenstone version: 2.87|3.08

Setting up your Greenstone OAI Server

Greenstone 2 collections are not enabled for OAI out of the box. To make a collection available for serving up over OAI, some minor adjustments need to be made first. This tutorial will look at how to make an existing collection available over OAI and testing its accessibility by getting it validated against the Open Archives validator.

  1. Use a text editor to open the file etc/oai.cfg located in your Greenstone installation folder. The oai.cfg configuration file contains properties that control the behaviour and features of your Greenstone OAI server.

    The basic properties to edit in order to get your collection served by the inbuilt OAI server are the repositoryName, repositoryID and oaicollection. Look up these properties in the file.

    For repositoryName and repositoryID, type in some values that make sense for your digital library. For example:

    repositoryName "Greenstone"
    repositoryID "greenstone"

  1. For this tutorial, we'll make the backdrop collection created in the simple image tutorial available over OAI. Therefore, add this collection's name to the end of the oaicollection property:

    oaicollection demo documented-examples/oai-e backdrop

    If you have a great many documents and do not want the OAI server to return all of them in one go, you could set the resumeafter property to something lower than the default 250 value in the oai.cfg file. Like:

    resumeafter 50

  1. If you're on Windows, it's best to be using the Apache web server. So if you're using the Local Library Server, stop the web server by exiting the little white dialog (the Greenstone Server Interface). Use a file browser to go into your Greenstone installation directory and rename the server.exe there to server.not to disable it. Now re-launch the Greenstone Server from the Start menu, so that this time, the included Apache web server will be used instead, launching its own little white dialog.

  1. You are now ready to visit your oaiserver home page to check that it's all looking good. Start up the Greenstone Server by going to Windows Start → All Programs → Greenstone 2.87 → Greenstone Server.

    Press the Enter Library button and you will end up on your Digital Library home page as usual. Adjust the URL so that instead of the library.cgi suffix, it says oaiserver.cgi.

    The page that loads now will contain an error message (badVerb) saying that you've provided an illegal OAI verb. This is because the OAI specification requires you to provide more instruction in the URL as to what you want. The specification defines verbs and possible arguments to them.

    A basic verb is Identify, which requests the OAI server to return some information about the OAI repository that it's serving. Adjust the URL once more by suffixing ?verb=Identify, so that your URL now looks like:

    http://<domain>/greenstone/cgi-bin/oaiserver.cgi?verb=Identify

    Visiting this page now gives some information about your Greenstone OAI repository.

  1. Although the data transmitted over OAI is in the form of XML, Greenstone uses a stylesheet to transform that XML response into a user-friendly, structured web page that you see when you perform the Identify request (as happens when you visit the verb=Identify response page). This allows Identify and other verbs in the OAI specification to be shown in the main Greenstone OAI Server pages as link buttons. You can see these verbs represented in the main Greenstone oaiserver.cgi (or oaiserver.cgi?verb=Identify) page as a row of links, starting with "Identify" at the top and in the lower end of the page.

    Clicking on the links will execute that verb as a request and return the response from your Greenstone OAI server as a structured web page. Try clicking on all the links.

  1. OAI defines a concept called a Set. In Greenstone, the OAI Set concept is mapped to the practical Greenstone collection. The link to the ListSets verb will therefore request the Greenstone OAI server to list all the collections that have been enabled for OAI.

    Click on the ListSets link and have a look.

    The response page for the ListSets verb will show you that your backdrop collection (created in the Simple image collection tutorial) is one of the collections available over OAI in your Greenstone repository.

  1. You will see a couple of buttons next to each collection (or Set) listed here. The first is Identifiers and the second Records. Click on the Identifiers button for the backdrop Set. This will list all the IDs of the documents contained in your OAI collection.

    If you look at the IDs, they look similar enough to Greenstone's internal document IDs, but with an additional prefix (oai:<repositoryID>:<setname>, where repositoryID was set by you in the oai.cfg configuration file, and setname is the name of the collection).

  1. Click the browser Back button to get back to the ListSets page and press the Records button located next to the backdrop collection.

    If you had specified some Dublin Core (dc) metadata for each of the images in the backdrop collection, then the page that loads will display this information for each document in the collection (Set).

    Greenstone's OAI at present supports 3 metadata formats, as is explained in the instructive comments in the oai.cfg file. Of these three, the OAI standard for Dublin Core, oai_dc, is the one pertinent to this tutorial. If your collection specifies metadata for a different metadata set format, you can use the oai.cfg file to tell Greenstone how to map the metadata fields of your chosen metadata set format into the Dublin Core metadata set supported by the Greenstone OAI server (or one of the other metadata sets it supports).

    Look in the oai.cfg file again and scroll down to the section on oaimapping, which will explain and provide examples for how to specify such mappings from your metadata format to one that Greenstone's OAI server uses. For instance, the demo collection comes enabled for OAI upon installation, and specifies some mappings from its DLS metadata format to OAI DC. Its dls.Title metadata is mapped to oai_dc.title using the following line in the oai.cfg configuration file (note the use of case):

    oaimapping dls.Title oai_dc.title

    Because the backdrop collection uses DC metadata already, no mapping is required.

Validating the Greenstone OAI server

In this section, you'll be testing that you've set up your Greenstone OAI server correctly so that it's accessible over OAI. For this part of the exercise, you need to be on a networked computer and your host computer needs to be visible to the outside world. (That is, when you provide the full name of your computer, someone else in the world should be able to find that computer by typing its URL into their browser's address field.)

We'll be using an external OAI client to access our up-and-running Greenstone OAI server. It's not just any OAI client either, but an OAI Server validator.

  1. You will want to be running the included Apache web server. So if you're on Windows and using the Local Library Server, quit it and rename the server.exe application in your Greenstone installation folder to server.not. Then use the Start menu shortcut to the Greenstone Server once more, to now launch the Apache web server.

  1. For this exercise, we will be visiting the Open Archives Validator, for which your OAIserver needs to provide a valid email address. In a text editor, open up your greenstone installation's etc/oai.cfg file and set the value of the maintainer field to your email address.

    Note that by default, your Greenstone installation will make the demo collection available over OAI. This collection has been set up with a dummy (and invalid) email address for the creator and maintainer fields in the collection's collect.cfg file. You will need to open up collect/demo/etc/collect.cfg and clear the email values for the creator and maintainer properties (or else set these to a valid email again). Otherwise the OpenArchives validator will resort to using the demo collection's default dummy email to send the initial validation results to. Alternatively, you can simply remove the demo collection from being listed in the oai.cfg file's oaicollection property, which will cease to make the demo collection available over OAI.

    Note also that, if you wish to specify contact emails at a collection level, you will need to edit your greenstone installation's collect/<collection-name>/etc/collect.cfg file for those collections and set the creator and maintainer fields to the desired email address.

  1. If your collection contains document items for which you have not assigned any (Dublin Core, dc) metadata, the OAI validation can fail because it is dependent on having Metadata Formats listed even on a per record (per document) basis. Therefore, if your document has no dc metadata assigned, Greenstone won't know what OAI-supported metadata format is used by that document in order to list it.

    In practice, this means that you either have to assign one or more dc.* metadata to each document in your OAI collection, or you will have to set up an oaimapping in the oai.cfg file to map existing metadata of whichever format to dc.* metadata.

    For instance, if you created an image collection without assigning any metadata and are happy to use the Title or Source metadata that Greenstone extracted for each image (ex.Title, ex.SourceFile) as the image document's "title", you could map either of these metadata to dc.Title in the file oai.cfg. To do so, you'd open up oai.cfg in an editor, go down to the section specifying the oaimapping properties and add a new line:

    oaimapping Title oai_dc.title

    (Or: oaimapping SourceFile oai_dc.title).

    This step will not be not necessary for the backdrop collection if you had assigned any dc.* metadata for each image in the collection.

    Note: If the demo collection that comes with a Greenstone installation is not built, it will either need to be built before submitting your OAI server for inspection by the Open Archives validator, or you will need to adjust the oai.cfg file once more by removing the mention of

    demo

    from the

    oaicollection

    property. This is because the demo collection is mentioned as being set up for OAI in the oai.cfg file. However, if this collection is unbuilt, it will not be accessible to the OAI validator and so your oaiserver may fail tests due to this oversight.

  1. If you are working with legacy collections (built before Greenstone version 2.85) you may have to rebuild them if you plan to make them available over OAI and be compliant with the Open Archives validator. Rebuilding old collections will recalculate the earliest datestamp value for the repository. This calculation is different from Greenstone 2.85 onwards.

  1. Next you will need to set up your Greenstone server to be accessible from outside, so that external OAI clients can access it.

    Go to the File → Settings menu of your Greenstone server interface dialog and check the Allow External Connections option and also check the Get local IP and resolve to a name option (or the Get local IP option) as its address resolution method.

  1. Press the button in the Greenstone Server Interface dialog that says Enter Library (or it may say Restart Library). Your Digital Library home page will open up in a browser tab. Adjust this URL to have a suffix of oaiserver.cgi in place of the terminating library.cgi, then copy the resulting URL and visit http://www.openarchives.org/Register/ValidateSite.

  1. The Open Archives Validator page will request the URL to your Greenstone OAI server. Paste the URL you have in your copy buffer into the field provided for this, and press the Validate baseURL button to start running the tests. You will be told to check the adminEmail address you provided to continue the remaining tests and to get the validation report.

    If the validator does not recognise the URL, make sure you have given the full domain of your host machine rather than just the host name. If that URL is still not accepted, visit the oaiserver.cgi?verb=Identify page again and check this works. If it doesn't, it may be that your machine is not set up to be accessible to outside networks. Check your proxy settings, make sure you've set up port forwarding and that your firewall is not interfering.


Copyright © 2005-2016 by the New Zealand Digital Library Project at the University of Waikato, New Zealand
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”