Greenstone tutorial exercise

Prerequisite: Downloading files from the web
Devised for Greenstone version: 2.60|3.06
Modified for Greenstone version: 2.86|3.08

Pointing to documents on the web

  1. Open up your tudor collection, and in the Gather panel inspect the files you dragged into it. The first folder is englishhistory.net, which opens up to reveal tudor, and so on. The files represent a complete sweep of the pages (and supporting images) that constitute the Tudor citizens section of the englishhistory.net web site. They were downloaded from the web in a way that preserved the structure of the original site. This allows any page's original URL to be reconstructed from the folder hierarchy.

  1. In the Design panel, select the Document Plugins section, then select the HTMLPlugin line and click <Configure Plugin...>. A popup window appears. Locate the file_is_url option (about halfway down the first block of items) and switch it on. Click <OK>.

    Switching this option on for HTMLPlugin makes Greenstone set an additional piece of metadata, called URL, for each document. It records the document's original URL.

    It is important that the folder hierarchy gathered into the collection starts with the web domain name (englishhistory.net in this case). The conversion will not work if you dragged over a subfolder instead, for example the tudor folder, because this sets the URL metadata to something like

    http://tudor/citizens/...

    rather than

    http://englishhistory.net/tudor/citizens/...

    If you previously copied over a subfolder, delete it and make a fresh copy. Drag the folder on the right-hand side of the Gather panel onto the trash can in the lower right corner. Then obtain a fresh copy of the files by dragging across the englishhistory.net folder from the sample_files → tudor folder (or the Downloaded Files folder if you have done the exercise Downloading files from the web) on the left-hand side.

  1. To make use of the new URL metadata, the icon link must be changed to serve up the original URL rather than the copy stored in the digital library. Go to the Format panel, select the Format Features section and edit the documentNode template of the Browse format statement by replacing

    <gsf:link type="document">
    <gsf:icon type="document"/>
    </gsf:link>

    with

    <gsf:link type="web">
    <gsf:icon type="web"/>
    </gsf:link>

  1. Switch to the Create panel and build and preview the collection. Note that the document icons have changed. Try clicking on boleyn.html. The collection behaves exactly as before, except that when you click a document icon your web browser retrieves the original document from the web (assuming it is still there by the time you do this exercise!). If you are working offline you will be unable to retrieve the document.
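
To see why the top-level folder must be the web domain name when file_is_url is switched on, the reconstruction of a URL from the folder hierarchy can be sketched in a few lines of Python. This is only an illustration of the idea, not Greenstone's own code: the function name reconstruct_url and the file name example.html are hypothetical, and the sketch assumes the URL is simply the scheme followed by the file's path relative to the collection's top-level folder.

```python
def reconstruct_url(relative_path, scheme="http"):
    """Rebuild a page's original URL from its path inside the gathered folder.

    Assumption (illustrative only): the first path segment is treated as the
    web domain name, which is why the gathered folder must start with it.
    """
    # Accept both Windows and Unix separators and drop empty segments.
    segments = [s for s in relative_path.replace("\\", "/").split("/") if s]
    if not segments:
        raise ValueError("path has no folder or file name")
    return scheme + "://" + "/".join(segments)


# Folder hierarchy starting at the domain name: the URL comes out right.
print(reconstruct_url("englishhistory.net/tudor/citizens/example.html"))

# Only the tudor subfolder was dragged over: the domain name is lost.
print(reconstruct_url("tudor/citizens/example.html"))
```

Under these assumptions the first call yields a URL that begins with http://englishhistory.net/tudor/citizens/, while the second begins with http://tudor/citizens/, which matches the broken form described in the file_is_url step.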


Copyright © 2005-2016 by the New Zealand Digital Library Project at the University of Waikato, New Zealand
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”