Command Line Building

It is possible to create and build collections directly from the command line. This page provides the basic information on building Greenstone collections on the command line.

The first section shows how to rebuild a collection that has been created and edited in GLI. GLI doesn't do proper incremental building, so for large collections, it may save time to set up a collection using GLI and build it on the command line.

The second part shows how to create, edit and build a collection entirely using the command line.

Using GLI to create a collection, then using command line for building

If your collection will grow very large, building it with the command line tools will save you time. Start in GLI by doing the following:

  • Create a new collection
  • Add a few documents and metadata
  • Configure your collection. What indexes, plugin options, classifiers, etc. do you need?
  • Build it in GLI and preview. Do you need to change configuration settings?

Once you have the collection set up the way you want, you can start adding the bulk of your documents and their metadata. You can do both using GLI.

When it's time to build, you can build either in GLI or on the command line. A command line build is useful if you want to schedule building overnight, for example, or if you want to build incrementally. The sections below detail full building and incremental building.

Set up Greenstone environment

To begin, you will need to open a terminal window (see below) and set up the Greenstone environment. In the terminal, change directory to the Greenstone top-level folder. Run the following command to set up the environment:

Greenstone version | Windows | Linux/Mac
2 | setup | source setup.bash
3 | gs3-setup | source gs3-setup.sh

Note, if you close your terminal window and start another one, you will need to invoke the setup command again.
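
A quick way to confirm that the setup command has taken effect in the current terminal is to check the environment variables it exports. The sketch below is only an illustration: it assumes that the Greenstone 3 setup exports GSDL3HOME (the variable used later on this page) and that the Greenstone 2 setup exports GSDLHOME, which is an assumption not stated here. The function name gs_env_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a Greenstone environment appears to be
# set up in this shell.
# Assumption: gs3-setup exports GSDL3HOME and the Greenstone 2 setup script
# exports GSDLHOME.
gs_env_ready() {
    if [ -n "${GSDL3HOME:-}" ]; then
        echo "Greenstone 3 environment found: $GSDL3HOME"
    elif [ -n "${GSDLHOME:-}" ]; then
        echo "Greenstone 2 environment found: $GSDLHOME"
    else
        echo "No Greenstone environment found; run the setup command first." >&2
        return 1
    fi
}
```

Calling gs_env_ready at the top of a build script gives an early, readable failure if the terminal was reopened without running the setup command again.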

Build on the command line

Now you can build the collection.

The main command for rebuilding a collection is full-rebuild.pl.

Greenstone version | Windows | Linux/Mac
2 | perl -S full-rebuild.pl <collname> | full-rebuild.pl <collname>
3 | perl -S full-rebuild.pl -site localsite <collname> | full-rebuild.pl -site localsite <collname>

Notes:

  • Replace <collname> with the short collection identifier. This is the name of the collection's folder in the collect folder. You can also see it in GLI's title bar, in brackets after the collection title, e.g. "greenstone demo collection (demo)". In this case, the collname is demo.
  • If you have a custom site for Greenstone 3, replace 'localsite' with your sitename.
  • There are options for full-rebuild.pl. View the list of options by running [perl -S] full-rebuild.pl -h
  • For Linux and MacOS, you can leave off the perl -S for all the perl commands on this page. If your Windows environment is set up to associate the Perl application with files ending in .pl, you can also leave off perl -S on Windows.

Running full-rebuild.pl will reimport and index all the documents. You will need to do this if you have changed plugin options, or other configuration options. If the configuration hasn't changed, and you just want to add new documents or update modified documents, then you should use incremental building.

Incremental building

Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process. New and modified documents will be processed, and deleted documents will be removed from the collection. If metadata has changed, then documents will be reprocessed.

Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.
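
To see how exposed a collection is to this behaviour, you can count the documents that sit directly in the top level import folder. The following is a minimal sketch, not part of Greenstone: the function name top_level_import_files is hypothetical, and it relies on the -maxdepth option of find, which GNU and BSD find provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical check: counts files that sit directly in a collection's import
# folder rather than in a subfolder. metadata.xml itself is not counted.
# Usage: top_level_import_files <path-to-collection>
top_level_import_files() {
    find "$1/import" -maxdepth 1 -type f ! -name metadata.xml | wc -l | tr -d ' '
}
```

A large count suggests that any metadata change at the top level will cause that many documents to be reimported, so moving them into subfolders may be worthwhile.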

Note 2: An empty metadata file in an import folder (including the top level import folder) will trigger a full reimport of all documents in that folder. This is a bug in Greenstone 2.87, 3.08 and earlier. Empty metadata files will automatically get added by GLI. The solution is to add a piece of metadata to a document using the Enrich panel. Just one will do.
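
One way to look for such files before an incremental build is sketched below. It rests on an assumption that this page does not confirm: that an "empty" metadata file is a metadata.xml containing no Metadata elements at all. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: lists metadata.xml files under a collection's import
# folder that contain no <Metadata ...> element.
# Assumption: this is what "empty metadata file" means in the note above.
# Usage: metadata_files_without_entries <path-to-collection>
metadata_files_without_entries() {
    find "$1/import" -type f -name metadata.xml | while IFS= read -r f; do
        # Print the file name only when no Metadata element is found in it.
        grep -q '<Metadata[ >]' "$f" || printf '%s\n' "$f"
    done
}
```

Each file it lists is a candidate for the fix described above: assign one piece of metadata to a document in that folder using the Enrich panel.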

The main command for incremental rebuild is incremental-rebuild.pl. You can use this in place of full-rebuild.pl.

Greenstone version | Windows | Linux/Mac
2 | perl -S incremental-rebuild.pl <collname> | incremental-rebuild.pl <collname>
3 | perl -S incremental-rebuild.pl -site localsite <collname> | incremental-rebuild.pl -site localsite <collname>

Indexer Note: only the Lucene and Solr indexers can do incremental indexing. MG and MGPP cannot. If you do incremental-rebuild with MG or MGPP, indexing will be carried out over the entire collection. So we recommend Lucene or Solr if you will be doing incremental building.
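
The choice between a full and an incremental rebuild can be approximated with a rough heuristic: if the collection's configuration file has been modified more recently than its index folder, do a full rebuild. The sketch below is an illustration only. It assumes the configuration file lives in the collection's etc folder (collect.cfg for Greenstone 2, collectionConfig.xml for Greenstone 3) and that the index folder's modification time is a fair stand-in for the time of the last build. The function name choose_rebuild is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: suggests which rebuild script to run.
# Usage: choose_rebuild <path-to-collection> <config-file-name>
choose_rebuild() {
    coll=$1
    cfg=$coll/etc/$2
    if [ ! -d "$coll/index" ]; then
        # No index folder yet, so there is nothing to update incrementally.
        echo "full-rebuild.pl"
    elif [ -n "$(find "$cfg" -newer "$coll/index" 2>/dev/null)" ]; then
        # Config file modified after the index folder (heuristic only).
        echo "full-rebuild.pl"
    else
        echo "incremental-rebuild.pl"
    fi
}
```

The heuristic only looks at the one configuration file, so it will not notice other kinds of change that call for a full rebuild.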

Finer control of the build process

The build process actually consists of several stages:

  • importing the original documents into Greenstone's XML archive format
  • building the collection, which includes indexing the archive documents and generating a database of metadata and classifier structures
  • activating the collection in the live library (if necessary)

These stages can all be run separately. Note that the Greenstone environment must be set up in any terminal window before you can run these commands.
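
As a rough sketch of how the three stages chain together, the wrapper below runs the import.pl, buildcol.pl and activate.pl commands described in the following sections, stopping at the first stage that fails. It is written for Greenstone 3 (for Greenstone 2, the -site option would be dropped). The function names and the DRY_RUN variable are hypothetical; with DRY_RUN=1, the default here, the commands are printed rather than run.

```shell
#!/bin/sh
# Hypothetical wrapper around the three build stages (Greenstone 3 form).
# With DRY_RUN=1 each command is printed instead of being executed.
DRY_RUN=${DRY_RUN:-1}

run_stage() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: staged_build <collname> [site]
staged_build() {
    collname=$1
    site=${2:-localsite}   # default site name used on this page
    # Each stage runs only if the previous one succeeded.
    run_stage perl -S import.pl -site "$site" "$collname" &&
    run_stage perl -S buildcol.pl -site "$site" "$collname" &&
    run_stage perl -S activate.pl -site "$site" "$collname"
}
```

Running staged_build demo in dry-run mode prints the three commands in order. Setting DRY_RUN=0 in a terminal where the Greenstone environment is set up would execute them.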

Importing a collection

This is the process of converting the original documents, which might be a mixture of file types, into a standardised XML based format - the Greenstone archive format. Original source documents live in the import folder of a collection, while the archive documents live in the archives folder.

The command to import a collection is import.pl. Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here.

Greenstone version | Import command
2 | perl -S import.pl <collname>
3 | perl -S import.pl -site localsite <collname>

As before, you need to put in your own collection name, and change the site name if you are using a custom Greenstone 3 site.

Don't worry about all the text that scrolls past; it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because the Greenstone environment has been set up, the Greenstone software can work out where the necessary files are.

Incremental import

You can run just the import phase incrementally, using incremental-import.pl in place of import.pl.

Building a collection

The next phase is to “build” the collection, which creates all the indexes and databases that make the collection work. Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here. For now, stick to the defaults by typing

Greenstone version | Build command
2 | perl -S buildcol.pl <collname>
3 | perl -S buildcol.pl -site localsite <collname>

Again, don't worry about the “progress report” text that scrolls past.

Make the collection live

Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. For Greenstone 3, we also need to reload it in the library. We can do this in two ways:

Running activate.pl

Greenstone version | Activate command
2 | perl -S activate.pl <collname>
3 | perl -S activate.pl -site localsite <collname>

Or manually: Delete the index folder, rename building to index, then restart the Greenstone3 server. Note, the collection lives in the following location:

Greenstone version | Collection location
2 | path-to-greenstone2/collect/<collname>
3 | path-to-greenstone3/web/sites/localsite/collect/<collname>
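
The manual steps (delete the index folder, then rename building to index) could be scripted along these lines. This is a sketch only: the function name manual_activate is hypothetical, and it does not restart or reload the Greenstone 3 server, which still has to be done separately.

```shell
#!/bin/sh
# Hypothetical helper for the manual activation steps described above.
# Usage: manual_activate <path-to-collection>
manual_activate() {
    coll=$1
    # Refuse to delete the live index unless a freshly built one is ready.
    if [ -z "$coll" ] || [ ! -d "$coll/building" ]; then
        echo "no building folder found in '$coll'" >&2
        return 1
    fi
    rm -rf "$coll/index" &&
    mv "$coll/building" "$coll/index"
}
```

Checking for the building folder first avoids leaving the collection with no index at all if the build step has not been run.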

Passing import/buildcol options to rebuild scripts

Import or buildcol options can be passed to full-rebuild and incremental-rebuild. If the option is shared between import.pl and buildcol.pl then it can appear as is, such as -verbosity 5. This value will be passed to both programs. If an option is specific to one of the programs in particular, then prefix it with 'import:' or 'buildcol:' respectively, as in '-import:OIDtype hash_on_full_filename'.
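
The prefixing rule can be illustrated with a small helper that turns an option for one program into the form the rebuild scripts expect. The helper name stage_opt is hypothetical; only the 'import:' and 'buildcol:' prefixes come from the description above.

```shell
#!/bin/sh
# Hypothetical helper: rewrites an option so that a rebuild script passes it
# to just one of the two programs.
# Usage: stage_opt <import|buildcol> <-option> [value]
stage_opt() {
    stage=$1
    opt=${2#-}     # strip the leading dash; it is re-added before the prefix
    shift 2
    case $stage in
        import|buildcol) printf '%s\n' "-$stage:$opt${1:+ $1}" ;;
        *) echo "stage must be import or buildcol" >&2; return 1 ;;
    esac
}

stage_opt import -OIDtype hash_on_full_filename
# prints: -import:OIDtype hash_on_full_filename
```

A shared option such as -verbosity 5 needs no prefix, so a Greenstone 3 command combining both kinds might look like perl -S full-rebuild.pl -site localsite -verbosity 5 -import:OIDtype hash_on_full_filename <collname>.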

Creating and Editing a Collection on the command line

Create a collection

To create the skeleton of a collection we use mkcol.pl. This creates all the folders the collection needs, and sets up a default configuration. Typing perl -S mkcol.pl will provide the full list of options, which you can also view here.

To create a new collection:

Greenstone version | mkcol command
2 | perl -S mkcol.pl [options] <collname>
3 | perl -S mkcol.pl -site localsite [options] <collname>

For example, to create a collection named dlpeople in localsite with the creator's email address of me@cs.waikato.ac.nz, type

Greenstone version | mkcol command
2 | perl -S mkcol.pl -creator me@cs.waikato.ac.nz dlpeople
3 | perl -S mkcol.pl -site localsite -creator me@cs.waikato.ac.nz dlpeople

(Since Greenstone3 allows you to have multiple sites, you must always specify which site the collection is in. The default site is called localsite.)

To view the newly created files, move to the newly created collection directory by typing

Greenstone version | Windows | Linux/Mac
2 | cd %GSDLHOME%\collect\dlpeople | cd $GSDLHOME/collect/dlpeople
3 | cd %GSDL3HOME%\sites\localsite\collect\dlpeople | cd $GSDL3HOME/sites/localsite/collect/dlpeople

You can list the contents of this directory by typing dir (Windows) or ls (Linux/Mac). There should be several subdirectories, which differ slightly between Greenstone 2 & 3:

  • etc
  • images
  • import
  • macros (Greenstone2 only)
  • script
  • style
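
To check the result of mkcol.pl without reading the listing by eye, a small script can report which of the expected subdirectories are absent. This is a sketch under assumptions: it checks only the folders in the list above that are not marked as specific to one Greenstone version, and the function name missing_skeleton_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: prints the names of expected collection subdirectories
# that do not exist. macros is left out because it is Greenstone 2 only.
# Usage: missing_skeleton_dirs <path-to-collection>
missing_skeleton_dirs() {
    for d in etc images import script style; do
        [ -d "$1/$d" ] || echo "$d"
    done
}
```

No output means all five folders are present. Since the set of folders differs slightly between versions, a name printed here is a prompt to look closer and not necessarily an error.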

Add documents and metadata

To add documents into the collection, simply copy them into the import folder. You can manually add metadata by creating metadata.xml files or adding metadata databases. See metadata and specifying_filenames_manually_in_metadataxml for more details.

Edit the Config file

In the collection's etc directory there is a configuration file. This is collect.cfg for Greenstone 2 and collectionConfig.xml for Greenstone 3. Any modifications that you can make in GLI can also be achieved by manually editing this file. Simply open it using your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.

Build the Collection

Now you can build the collection using the rebuild commands, or using import/buildcol, as described in the earlier sections.

Additional information

Opening a terminal on Windows

On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:

  • Start → All Programs → Accessories → Command Prompt
  • Under the Start menu, type cmd into the search box and press Enter
  • Hold down your keyboard's Windows key and press the key for letter r. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type cmd in the textfield and press the OK button.
  • In any Windows Explorer, hold down Shift and right click in an empty area in the window. Select Open command window here from the menu.

Additional Resources

While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available.