Building Collections

Part of the Greenstone Beginner's Guide

Greenstone Librarian Interface

The simplest way to build new collections is to use the Greenstone Librarian Interface (GLI), a graphical tool for building new collections, altering or deleting existing collections, and exporting existing collections. It lets you import or assign metadata and includes an interactive collection design module. To launch the GLI under Windows, select Greenstone from the Programs section of the Start menu and choose Librarian Interface. Under Linux, run gli.sh from the gsdl/gli directory. GLI supports six basic activities, each with its own panel: Download, Gather, Enrich, Design, Create, and Format.

In addition, GLI offers a few other options and features in its File menu.

We're going to take a quick look at each of these activities here. For more help with the GLI, see the GLI Help, which is available here on the wiki and from within the GLI by clicking Help in the upper-right corner of the interface.

To view all of the panels in the GLI, you need to have a collection open. Go to File → New… to create a new collection. The new collection can either use the Greenstone defaults or be based on any other collection in your library, in which case it will have the same metadata sets, design, and formatting as the base collection. The collections page outlines the basics of creating, opening, saving, and deleting collections.

Downloading Files from the Internet

The Download panel allows you to download files from the internet using a variety of protocols.

Collecting Files for Your Collection

The Gather panel is where you determine which files will be in your collection. You can add any files on your computer, and they will be copied into your collection. You can also rename and remove files from your collection here.

Finally, some files (metadata database file types such as MARC, OAI, CDS/ISIS, BibTeX, Refer, and ProCite) can contain data for many different documents. Because of this, their metadata cannot be viewed or edited immediately in the Librarian Interface. However, these files can be 'exploded' into individual records for editing.

Enriching Your Collection with Metadata

After adding documents to your collection, you can manually add metadata for the documents in the Enrich panel. Metadata can be added for individual documents, multiple documents, and folders. You can add or remove metadata sets (by default the Dublin Core, Greenstone, and extracted metadata sets are selected). From this panel, you can also access GEMS to build your own metadata sets.

Once you have built your collection (see Producing Your Collection below), you will also be able to view any metadata Greenstone has extracted from the files (this is the metadata that begins with ex.).

If you don't want to enter metadata using the GLI, there are other options, like having the metadata in a CSV file. You can read more about metadata in Greenstone on the metadata page.
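
As a rough illustration of the CSV route, the Python sketch below writes a small metadata file. The column layout (a Filename column followed by one column per metadata field, such as dc.Title) is an assumption about what Greenstone's CSV metadata plugin expects, and the file names and values are made up. Check the metadata page for the exact format before relying on it.

```python
import csv
import io

# Hypothetical documents and metadata values, for illustration only.
# Assumed layout: one "Filename" column, then one column per metadata field.
ROWS = [
    {"Filename": "report1.pdf", "dc.Title": "Annual Report 2019", "dc.Creator": "A. Smith"},
    {"Filename": "notes.doc", "dc.Title": "Meeting Notes", "dc.Creator": "B. Jones"},
]


def metadata_csv(rows):
    """Return CSV text with a header row and one row per document."""
    fieldnames = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the CSV text; save it next to the documents it describes.
    print(metadata_csv(ROWS))
```

The csv module handles quoting for you, so titles that contain commas or quotation marks are still written as a single field.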

Configuring Your Collection

The Design panel dictates how your documents will be handled (using document plugins), and how users will interact with your collection (using search indexes, partition indexes and browsing classifiers).

Plugins

Plugins tell Greenstone how to process the files in your collection. Every document must be processed by a plugin. There are two types of plugins: Document Plugins and Metadata Plugins. As their names suggest, document plugins handle the documents that comprise your collection. The plugin name is often a very good indicator of what documents it will process: the WordPlugin processes Microsoft Word documents and the ImagePlugin processes image files (e.g. PNG, JPG, GIF). Metadata plugins handle files containing metadata about documents in your collection (e.g. CSV).

For more information on how Greenstone interacts with specific file types, including which plugins can process them, visit the document types page.

Searching

When determining how users will be able to perform search queries on your collection, you have three things to consider: the search indexer, the search index(es), and the partition index(es).

For every collection, you can decide which search indexer to use: MG, MGPP, or Lucene. The search indexer parses and indexes the text. It determines how search indexes will be built, and each indexer works a bit differently. Depending on the indexer you select, you can decide to index word stems, ignore case (case folding), ignore accents (accent folding), and index at the document level, section level, or both (and choose which is the default level).
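
To make case folding and accent folding concrete, here is a minimal Python sketch of what each option means for a search term. It only illustrates the idea; it is not how MG, MGPP, or Lucene implement these options, and the function names are made up for this example. Stemming is left out because it depends on language-specific rules.

```python
import unicodedata


def fold_case(term):
    """Case folding: treat upper-case and lower-case letters as the same."""
    return term.casefold()


def fold_accents(term):
    """Accent folding: strip combining marks so that 'é' matches 'e'."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalise(term):
    """Apply both kinds of folding to a term."""
    return fold_accents(fold_case(term))


if __name__ == "__main__":
    for word in ["Café", "CAFE", "cafe"]:
        print(word, "->", normalise(word))
```

With both kinds of folding switched on, a query for any of the three spellings would match documents containing any of the others.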

Search indexes specify which parts of the text are searchable. You can assign any number of search indexes to a collection. You can build indexes on the full text of the documents, on specific metadata fields (like titles or authors), and on any combination of fields. Indexes can be searched for particular words, combinations of words, or phrases, and results are ordered according to how relevant they are to the query.
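
The idea of separate indexes on different fields can be sketched in a few lines of Python. This is a toy model under simple assumptions (whitespace-separated words, made-up documents, no relevance ranking), not a description of how Greenstone's indexers store their data.

```python
from collections import defaultdict

# Hypothetical documents: each has a Title metadata field and some full text.
DOCS = {
    "d1": {"Title": "Bird Guide", "text": "A guide to common garden birds"},
    "d2": {"Title": "Tree Guide", "text": "Native trees and the birds that nest in them"},
}


def build_index(docs, field):
    """Map each lower-cased word in `field` to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc.get(field, "").lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the ids of documents whose indexed field contains `word`."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    title_index = build_index(DOCS, "Title")
    text_index = build_index(DOCS, "text")
    print(search(title_index, "bird"))
    print(search(text_index, "birds"))
```

A word is found only through an index built on the field that contains it, which is why the choice of search indexes determines what users can search.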

Partition indexes can be used to split your collection into subsections for search purposes. If your collection includes documents in multiple languages, you can create subsections based on language. You can also create partitions based on the value of any metadata field(s).
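
Partitioning by a metadata value amounts to grouping documents before they are indexed. The Python sketch below shows that grouping with made-up documents and a hypothetical Language field; it is an illustration of the concept rather than Greenstone's own mechanism.

```python
from collections import defaultdict

# Hypothetical documents with a Language metadata field.
DOCS = [
    {"id": "d1", "Language": "en", "Title": "Bird Guide"},
    {"id": "d2", "Language": "fr", "Title": "Guide des oiseaux"},
    {"id": "d3", "Language": "en", "Title": "Tree Guide"},
]


def partition(docs, field):
    """Group document ids by the value of one metadata field."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without the field fall into an "unknown" partition.
        groups[doc.get(field, "unknown")].append(doc["id"])
    return dict(groups)


if __name__ == "__main__":
    print(partition(DOCS, "Language"))
```

Each group could then be searched on its own, so a user could restrict a query to, say, the French documents.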

Cross-collection search (searching multiple collections at once) is automatically enabled for Greenstone3, and can be specified on the Format panel in Greenstone2.

Browsing

Browsing involves lists that the user can examine: lists of authors, titles, dates, hierarchical classification structures, and so on. Users can browse interactively around lists and hierarchical structures that are generated from the metadata associated with each document in the collection.

The ability to browse collections is handled by browsing classifiers. You choose which browsing classifiers will be created for each collection. You can create a browsing classifier for any metadata field (or combination of metadata fields) in your collection. Every classifier you create results in an additional tab on the navigation bar of your collection's website.

Configuring classifiers allows you to specify whether the documents will be displayed on one page or several (and how they will be split into sections, e.g. a specific number of documents per page or a page for each letter).
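
The two splitting strategies mentioned above (a page per letter, or a fixed number of documents per page) can be sketched in Python as follows. The titles and function names are made up, and this is only a model of the behaviour, not the classifiers' actual code.

```python
from itertools import groupby

# Hypothetical document titles.
TITLES = ["Apples", "apricots", "Bananas", "Cherries", "cranberries"]


def by_letter(titles):
    """One page per initial letter, ignoring case."""
    ordered = sorted(titles, key=str.lower)
    return {
        letter: list(group)
        for letter, group in groupby(ordered, key=lambda title: title[0].upper())
    }


def by_page(titles, per_page):
    """Fixed number of titles per page, in alphabetical order."""
    ordered = sorted(titles, key=str.lower)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


if __name__ == "__main__":
    print(by_letter(TITLES))
    print(by_page(TITLES, 2))
```

Splitting by letter gives uneven page sizes but predictable navigation, while a fixed page size keeps pages short in collections where many titles share an initial letter.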

Greenstone3

In the Browsing Classifier section of the Design Panel, you can choose between two flat-file collection databases. (Alternatively, it is possible to manually create a collection database with MS-SQL.)

Greenstone2

In the Browsing Classifier section of the Design Panel, you can choose the database in use for the collection: JDBM, GDBM, or SQLite. (Alternatively, it is possible to manually create a collection database with MS-SQL.)

Producing Your Collection

After adding in your documents, providing metadata, and configuring plugins, indexes, and classifiers, you are ready to build your collection. This is done in the Create panel. As the collection builds, information about the build (including what plugin processes each document) will be displayed.

The Create panel presents many build options (covered in depth on the collection building page). One of the most used is the ability to schedule builds (only available in expert mode).

Any time you make changes on the Gather, Enrich, or Design panels, you must rebuild the collection before the changes will take effect.

Customizing Your Collection's Appearance

Greenstone3

Finally, we have the Format panel, which provides a certain amount of control over how your collection looks. You can:

  • write the description of your collection
  • provide names for search indexes
  • choose pictures to represent the collection on the library home page and the header image
  • translate pieces of text

Most of your time in the Format panel, though, will likely be spent in the Format features section, where you write format statements for your collection. Format statements dictate the format of the content of individual document pages, of documents in the list of search results, and of documents in the browsing classifiers. They allow you to specify what metadata is displayed for each document and how it is displayed. They also determine which search pages are enabled (plain, simple, and/or advanced).

Changes in the Format panel do not require the collection to be rebuilt to take effect. However, for Greenstone3, the Preview Collection button must be pressed to view any changes made in the Format panel.

Greenstone2

Finally, we have the Format panel, which provides a certain amount of control over how your collection looks. You can:

  • write the description of your collection
  • provide names for search indexes
  • choose pictures to represent the collection on the library home page and the header image
  • translate pieces of text
  • choose collections for cross-collection searching
  • decide which metadata fields should be used to describe new documents added through the Depositor

In addition, if you want to change the style of this collection's website specifically, you can use the Collection Specific Macros, where you can define your own macros for this collection, including macros to add CSS and script. You can read more about macros to get a better understanding of what they are, how they work in Greenstone, and how they can be used to customize your collection.

Most of your time in the Format panel, though, will likely be spent in the Format features section, where you write format statements for your collection. Format statements dictate the format of the content of individual document pages, of documents in the list of search results, and of documents in the browsing classifiers. They allow you to specify what metadata is displayed for each document and how it is displayed.
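
Conceptually, a format statement is a template with placeholders that are filled in from each document's metadata. The Python sketch below models that idea with square-bracket placeholders such as [Title]. It is a toy under that assumption: the render function is hypothetical, and the real format statement syntax supports far more than simple substitution.

```python
import re


def render(template, metadata):
    """Replace each [Field] placeholder with its metadata value (empty if missing)."""
    return re.sub(
        r"\[([^\[\]]+)\]",
        lambda match: metadata.get(match.group(1), ""),
        template,
    )


if __name__ == "__main__":
    # Hypothetical metadata for one document.
    doc = {"Title": "Bird Guide", "Creator": "A. Smith"}
    print(render("[Title] by [Creator]", doc))
```

Because the template is applied to every document in a list, changing one statement changes how the whole list of search results or classifier entries is displayed.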

Changes in the Format panel do not require the collection to be rebuilt to take effect.

Additional Options and Features

The GLI File menu provides some important options and features. Preferences allows you to change the interface language; the mode, which affects what functions are accessible in the GLI; and connection settings (like the web path to Greenstone and proxy settings).

File Associations determine which application is used to open each document type when you open documents in the GLI.

Greenstone3

Export allows you to export collections into different metadata formats (METS, DSpace, and MARCXML).

Greenstone2

Export and Export to CD/DVD allow you to export collections into different metadata formats (METS, DSpace, and MARCXML) and into a format that can be used to write an executable CD-ROM/DVD, respectively.

Note to Mac users: Keyboard shortcuts for cut, copy, and paste within the GLI are Ctrl-x, Ctrl-c, and Ctrl-v, respectively, the same as the shortcuts on Windows and Linux. So, to copy some text from another application you would use Apple-c as usual, but to paste it into the GLI you would use Ctrl-v. Alternatively, use the Edit menu to cut, copy, and paste in the GLI.

Other Collection Building Options

Besides the GLI, there are several other ways to build Greenstone collections.

You now know the basics of using Greenstone to build a digital library! But we aren't done quite yet. It is also important to know how to customize your collection, so it looks like your own.