Greenstone tutorial exercise

Prerequisite: A collection of Word and PDF files
Devised for Greenstone version: 2.70w|3.06
Modified for Greenstone version: 2.86|3.07

Formatting the Word and PDF collection

In this exercise, we play around with the format statements in the Word and PDF collection.

  1. Open the reports collection in the Librarian Interface and go to the Format Features section of the Format panel.

Tidying up the default format statement

  1. In this part of the exercise, we make the format statement simpler without changing the resulting display.

    Greenstone's default format statement is complex because it is designed to produce something reasonable under almost any conditions, and also because for practical reasons it needs to be backwards compatible with legacy collections. For this collection, we don't need all of the complexity.

    Make sure that the Browse format statement is selected in the list of formats.

    An excerpt from the default Browse format statement for documentNode looks like the following:

    <td valign="top">
    <gsf:link type="source">
    <gsf:choose-metadata>
    <gsf:metadata name="thumbicon"/>
    <gsf:metadata name="srcicon"/>
    </gsf:choose-metadata>
    </gsf:link>
    </td>

    This format statement is the default used for documentNodes in the vertical lists under classifiers.

    <gsf:choose-metadata>
    <gsf:metadata name="thumbicon"/>
    <gsf:metadata name="srcicon"/>
    </gsf:choose-metadata>

    chooses ex.thumbicon metadata if it is present, and otherwise ex.srcicon metadata. If neither is present, nothing is displayed. This collection has no ex.thumbicon metadata, so the choice is not needed.

    Replace the longer excerpt above with:

    <td valign="top">
    <gsf:link type="source">
    <gsf:metadata name="srcicon"/>
    </gsf:link>
    </td>

    Next, edit the global format statement. This collection has no exp.Title metadata, so remove that element from the following:

    <gsf:choose-metadata>
    <gsf:metadata name="dc.Title"/>
    <gsf:metadata name="exp.Title"/>
    <gsf:metadata name="ex.dc.Title"/>
    <gsf:metadata name="Title"/>
    <gsf:default>Untitled</gsf:default>
    </gsf:choose-metadata>
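    After the edit, the global fragment should contain the same elements as above, minus exp.Title. As with thumbicon and srcicon, gsf:choose-metadata takes the first of these metadata elements that is present and falls back to the gsf:default text.

    ```xml
    <gsf:choose-metadata>
      <gsf:metadata name="dc.Title"/>
      <gsf:metadata name="ex.dc.Title"/>
      <gsf:metadata name="Title"/>
      <!-- shown only if none of the metadata above is present -->
      <gsf:default>Untitled</gsf:default>
    </gsf:choose-metadata>
    ```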

    Preview the collection to make sure the display hasn't changed. You shouldn't notice any difference when looking at search results, classifiers etc.

Linking to the Greenstone version or the original version of documents

  1. For collections with documents that undergo a conversion process during import (e.g. Word, PDF and PowerPoint documents, but not text or HTML documents), the original file is stored in the collection along with the converted version. The default Browse format statement links to both versions, but the format statement for Search links only to the converted version of the original file:

    <gsf:link type="document">
    <gsf:icon type="document"/>
    </gsf:link>

    links to the Greenstone HTML version, while

    <gsf:link type="source">
    <gsf:metadata name="srcicon"/>
    </gsf:link>

    links to the original.

    Choose Search in Format Features. Experiment with removing either of the two links from the format statement.

    To see the results of your changes, preview the collection and do a search. You are making changes to documentNodes under Search, which means the changes will only apply to search results.

    Storing and linking to the original lets users see the document in its original format, but they need the relevant program (such as a word processor or PDF viewer) installed to open it. Storing the original also increases the size of the collection. The Greenstone HTML version can be viewed in any web browser, but may not look as nice.
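    If you keep both links, you could make the trade-off visible to users by labelling each link with text instead of an icon. This is only a sketch. It assumes that plain text placed inside gsf:link is displayed as the link content, the same way the icon and srcicon metadata are in the fragments above. The label wording is made up for illustration.

    ```xml
    <td valign="top">
      <!-- Greenstone HTML version: opens in any web browser -->
      <gsf:link type="document">HTML version</gsf:link>
      <!-- original Word/PDF file: needs the matching program installed -->
      <gsf:link type="source">Original file</gsf:link>
    </td>
    ```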

Making bookshelves show how many items they contain

  1. Next, we'll customize the format statement for the creators list. Classifier bookshelves have only a few pieces of metadata to display: ex.Title and numleafdocs. Whatever metadata the classifier has been built on, the bookshelf label is always stored as ex.Title. This is why a Creator is printed out for each bookshelf even though dc.Creator is not specified in the format statement.

    Make each bookshelf in the Creator classifier show how many entries it contains. In the Format Features section of the Format panel, select the Browse format statement. This consists of three parts: the first gsf:template is the format statement defining the display of a documentNode, the second one is the format statement that controls the appearance of VList classifierNodes (which appear as bookshelves here), while the final gsf:template block is the format statement defining the display of HList classifierNodes.
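    Stripped of their contents, the three parts of the Browse format statement look roughly like the outline below. The "..." lines stand for the parts left out. The documentNode match shown for the first template is assumed to follow the same gsf:template pattern as the two classifierNode templates.

    ```xml
    <!-- 1. how each document entry is displayed -->
    <gsf:template match="documentNode">
      ...
    </gsf:template>
    <!-- 2. VList classifierNodes (the bookshelves in this collection) -->
    <gsf:template match="classifierNode[@classifierStyle = 'VList']">
      ...
    </gsf:template>
    <!-- 3. HList classifierNodes (horizontal lists) -->
    <gsf:template match="classifierNode[@classifierStyle = 'HList']">
      ...
    </gsf:template>
    ```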

    Scroll down to the end of the second format statement. This is the one for the VList classifiers, and it appears just before the start of the format statement for HList classifiers. Then insert the line containing numleafdocs, shown below, which displays the number of leaf documents inside a classifier bookshelf:

    <gsf:template match="classifierNode[@classifierStyle = 'VList']">
    ...
    <td valign="top">(<gsf:metadata name="numleafdocs"/>)</td>
    </gsf:template>
    <gsf:template match="classifierNode[@classifierStyle = 'HList']">
    <gsf:link type="classifier">
    <gsf:metadata name="Title"/>
    </gsf:link>
    </gsf:template>

    Preview the collection. Click on the creators list and notice that each bookshelf now shows, in parentheses, how many documents it contains.

Displaying multi-valued metadata

  1. Next, we modify the document entries in the Creator classifier to display all authors. Back in Format Features, select the Browse format in the list of assigned formats. Edit the format statement for documentNode after the part where it displays the Title metadata, so that it also contains the new <gsf:metadata name="dc.Creator" /> line shown below, preceded by a line break. This will display the dc.Creator metadata.

    <td valign="top">
    <gsf:link type="document">
    <xsl:call-template name="choose-title"/>
    <gsf:switch>
    <gsf:metadata name="Source"/>
    <gsf:when test="exists">
    <br/>
    <i>(<gsf:metadata name="Source"/>)</i>
    </gsf:when>
    </gsf:switch>
    </gsf:link>
    <br/>
    <gsf:metadata name="dc.Creator" />

    </td>

    With this change, the documentNode format statement displays the Greenstone link, the link to the original, and the Title (with any Source metadata) as before, followed by all the authors (Creators) of the document. Because the format statement is defined for documentNodes, this applies to every document entry in the Browse classifiers. Preview the creators list and make sure that all authors are displayed for each document.

    The additional line <gsf:metadata name="dc.Creator" /> displays all the Creator metadata values for the document, separated by a comma and a space (", "). All the values of dc.Creator are returned by default. To retrieve only the first, last or nth value of a metadata element, use the pos attribute. For example, <gsf:metadata name="dc.Creator" pos="first"/> (or alternatively, <gsf:metadata name="dc.Creator" pos="1"/>) displays only the first author.
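    The pos attribute can be sketched for the other cases the text mentions as follows. The values "last" and "2" are assumptions based on the "first, last or nth" description above, so check the result in the preview.

    ```xml
    <!-- first author only (same as pos="1") -->
    <gsf:metadata name="dc.Creator" pos="first"/>
    <!-- last author only -->
    <gsf:metadata name="dc.Creator" pos="last"/>
    <!-- second author only: a number selects the nth value -->
    <gsf:metadata name="dc.Creator" pos="2"/>
    ```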

    In sectioned documents, you can allow users to browse and search on the section level. In this case, you may want to request metadata for the parent, ancestors, or root (i.e. document) of the current section. For instance, if you want to display the title of the section's parent you can use <gsf:metadata name="Title" select="parent"/>.
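    The select attribute can be sketched for the other relatives mentioned above in the same way. Only select="parent" comes from the text. The values "root" and "ancestors", and combining select with a separator, are assumptions to confirm against the Greenstone format-statement documentation for your version.

    ```xml
    <!-- title of the section's parent -->
    <gsf:metadata name="Title" select="parent"/>
    <!-- title of the whole document (the root) -->
    <gsf:metadata name="Title" select="root"/>
    <!-- titles of all ancestor sections, joined with ": " -->
    <gsf:metadata name="Title" select="ancestors" separator=": "/>
    ```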

  1. You can change the separator between the authors. Modify the format statement, replacing <gsf:metadata name="dc.Creator" /> with <gsf:metadata name="dc.Creator" separator=" "/>. This separates the authors with a single space instead of a comma and a space. Preview the creators list. If you want each author on a new line, the separator has to be the HTML line-break element (<br />). An element cannot be placed inside an attribute value, so give the separator as a child element instead:

    <gsf:metadata name="dc.Creator"><separator><br /></separator></gsf:metadata>

    If you have done the Enhanced Word document handling exercise, the collection will have both dc.Creator and ex.Creator metadata. (A metadata name without a namespace prefix, such as Creator, refers to ex. metadata.) To display the values of both, you can use:

    <gsf:metadata name="dc.Creator" />, <gsf:metadata name="Creator" />

    To display dc.Creator if it is present, and ex.Creator otherwise, use:

    <gsf:choose-metadata>
    <gsf:metadata name="dc.Creator" />
    <gsf:metadata name="Creator" />
    </gsf:choose-metadata>
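    The gsf:default element used earlier in the Title fragment can also give the creators a fallback for documents that have neither kind of Creator metadata. In this sketch, the text "Unknown author" is a hypothetical placeholder.

    ```xml
    <gsf:choose-metadata>
      <gsf:metadata name="dc.Creator" />
      <gsf:metadata name="Creator" />
      <!-- hypothetical fallback: shown when neither dc.Creator nor ex.Creator exists -->
      <gsf:default>Unknown author</gsf:default>
    </gsf:choose-metadata>
    ```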

Advanced multi-valued metadata

  1. You may notice that the AZCompactList classifier's configuration dialog has two options after the metadata option: firstvalueonly and allvalues. When the metadata option lists more than one field (e.g. dc.Creator,ex.Creator), manually added metadata can replace or enhance automatically extracted metadata. These two options control exactly which metadata values a document is classified by.

    For example, say we have two documents. Document 1 has four Creators specified (dc.Creator = dcA, dc.Creator = dcB, ex.Creator = exA, ex.Creator = exB), while document 2 has three (ex.Creator = exA, ex.Creator = exB, ex.Creator = exC). The following table shows which metadata values each document is classified by, for the different classifier options:

    AZCompactList options                              Document 1            Document 2
    -metadata dc.Creator,ex.Creator                    dcA, dcB              exA, exB, exC
    -metadata dc.Creator,ex.Creator -firstvalueonly    dcA                   exA
    -metadata dc.Creator,ex.Creator -allvalues         dcA, dcB, exA, exB    exA, exB, exC

  1. We'll now set the firstvalueonly option for the creators classifier. Switch to the Browsing Classifiers section of the Design panel, select the AZCompactList for dc.Creator metadata in the Assigned Classifiers box and click <Configure Classifier...>. Select the firstvalueonly option.

    Rebuild and preview the collection. Now the creators list classifies documents based on the first author appearing in the dc.Creator metadata.

    If you set the metadata field of AZCompactList to dc.Creator,ex.Creator in the "A collection of Word and PDF files" exercise, the creators list now classifies each document by the first author in its dc.Creator metadata. If a document has no dc.Creator metadata, it is classified by the first author in its ex.Creator metadata.
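    GLI writes these settings into the collection's configuration file for you. For reference, a Greenstone 3 collectionConfig.xml entry for this classifier might look roughly like the sketch below. The layout is an assumption, so compare it with your own file. In Greenstone 2, the equivalent collect.cfg line would be classify AZCompactList -metadata dc.Creator,ex.Creator -firstvalueonly.

    ```xml
    <!-- assumed Greenstone 3 layout: classify each document by a single value,
         the first value of the first listed field that the document has -->
    <classifier name="AZCompactList">
      <option name="-metadata" value="dc.Creator,ex.Creator"/>
      <option name="-firstvalueonly"/>
    </classifier>
    ```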


Copyright © 2005-2015 by the New Zealand Digital Library Project at the University of Waikato, New Zealand
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”