It is possible to create and build collections directly from the command line. This page provides the basic information on building Greenstone collections on the command line.
The first section shows how to rebuild a collection that has been created and edited in GLI. GLI doesn't do proper incremental building, so for large collections, it may save time to set up a collection using GLI and build it on the command line.
The second part shows how to create, edit and build a collection entirely using the command line.
If your collection will grow very large, it will save you time to build it using the command line building tools. Initially, use GLI to set up the collection.
Once you have the collection set up the way you want, you can start adding the bulk of your documents and their metadata. You can do both of these in GLI.
When it's time to build, you can either build in GLI or on the command line. Building on the command line is useful if you want to schedule the build to run overnight, for example, or if you want to build incrementally. The sections below detail full build and incremental build.
To begin, you will need to open a terminal window (see below) and set up the Greenstone environment. In the terminal, change directory to the Greenstone top-level folder. Run the following command to set up the environment:
Greenstone version | Windows | Linux/Mac |
---|---|---|
2 | setup | source setup.bash |
3 | gs3-setup | source gs3-setup.sh |
Note, if you close your terminal window and start another one, you will need to invoke the setup command again.
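A forgotten setup step is an easy mistake in a new terminal, so a script can check for it before building. The following is a minimal sketch for Linux/Mac. It assumes the Greenstone 2 setup script exports GSDLHOME and the Greenstone 3 setup script exports GSDL3HOME (possibly along with GSDLHOME). The function name gs_env_version is made up for illustration.

```shell
# Report which Greenstone environment (2 or 3) is set up in this shell.
# Assumption: setup exports GSDLHOME (Greenstone 2) or GSDL3HOME (Greenstone 3).
gs_env_version() {
  if [ -n "${GSDL3HOME:-}" ]; then
    # Checked first, in case Greenstone 3 also sets GSDLHOME.
    printf '%s\n' 3
  elif [ -n "${GSDLHOME:-}" ]; then
    printf '%s\n' 2
  else
    echo "Greenstone environment not set up; run the setup command first" >&2
    return 1
  fi
}
```

Calling gs_env_version at the top of a build script lets it stop early with a clear message instead of failing later with "command not found".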
Now you can build the collection.
The main command for rebuilding a collection is full-rebuild.pl.
Greenstone version | Windows | Linux/Mac |
---|---|---|
2 | perl -S full-rebuild.pl <collname> | full-rebuild.pl <collname> |
3 | perl -S full-rebuild.pl -site localsite <collname> | full-rebuild.pl -site localsite <collname> |
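Notes:
- Replace <collname> with the name of your own collection.
- For Greenstone 3, change localsite if you are using a custom site.
- If Windows associates files ending in .pl with Perl, you can also leave off perl -S on Windows.
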
Running full-rebuild.pl will reimport and index all the documents. You will need to do this if you have changed plugin options, or other configuration options. If the configuration hasn't changed, and you just want to add new documents or update modified documents, then you should use incremental building.
Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process. New and modified documents will be processed, and deleted documents will be removed from the collection. If metadata has changed, then documents will be reprocessed.
Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.
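One way to follow this advice is to put each new batch of documents into its own subfolder of import, so that metadata added for the batch only affects that folder. The sketch below is an illustration, not a Greenstone tool. The function name add_batch and the batch folder names are hypothetical.

```shell
# Copy new documents into their own subfolder of a collection's import folder.
# usage: add_batch <import-dir> <batch-name> <file>...
add_batch() {
  import_dir="$1"
  batch="$2"
  shift 2
  # Create the batch subfolder if it does not exist yet.
  mkdir -p "$import_dir/$batch" || return 1
  for f in "$@"; do
    cp "$f" "$import_dir/$batch/" || return 1
  done
}
```

For example, `add_batch import batch-01 ~/scans/*.pdf` run inside the collection folder would leave the top level of import untouched and place the new files in import/batch-01.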
Note 2: An empty metadata file in an import folder (including the top level import folder) will trigger a full reimport of all documents in that folder. This is a bug in Greenstone 2.87, 3.08 and earlier. Empty metadata files will automatically get added by GLI. The solution is to add a piece of metadata to a document using the Enrich panel. Just one will do.
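To find the folders affected by this, you could list the metadata.xml files that assign no metadata. This sketch assumes that "empty" means the file contains no <FileSet> element. That is an assumption about how GLI writes these files, so check one of your own before relying on it. The function name empty_metadata_files is hypothetical.

```shell
# List metadata.xml files under a folder that contain no <FileSet> element.
# Assumption: such files are the "empty" metadata files described above.
# usage: empty_metadata_files <import-dir>
empty_metadata_files() {
  find "$1" -type f -name metadata.xml | while IFS= read -r f; do
    # Print the path only when no FileSet entry is found in the file.
    grep -q '<FileSet' "$f" || printf '%s\n' "$f"
  done
}
```

Each path printed is a folder where adding one piece of metadata in the Enrich panel should avoid the unnecessary reimport.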
The main command for incremental rebuild is incremental-rebuild.pl. You can use this in place of full-rebuild.pl.
Greenstone version | Windows | Linux/Mac |
---|---|---|
2 | perl -S incremental-rebuild.pl <collname> | incremental-rebuild.pl <collname> |
3 | perl -S incremental-rebuild.pl -site localsite <collname> | incremental-rebuild.pl -site localsite <collname> |
Indexer Note: only the Lucene and Solr indexers can do incremental indexing. MG and MGPP cannot. If you do incremental-rebuild with MG or MGPP, indexing will be carried out over the entire collection. So we recommend Lucene or Solr if you will be doing incremental building.
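The full and incremental commands differ only in the script name, and Greenstone 3 adds the -site option. The sketch below prints the command line for a given combination so that it can be checked before it is run, for instance from a scheduled overnight job. The function name rebuild_command is hypothetical, and the default site localsite is taken from the tables above.

```shell
# Print the rebuild command for a collection without running it.
# usage: rebuild_command <full|incremental> <2|3> <collname> [site]
rebuild_command() {
  mode="$1"
  version="$2"
  coll="$3"
  site="${4:-localsite}"
  case "$mode" in
    full|incremental) script="$mode-rebuild.pl" ;;
    *) echo "mode must be full or incremental" >&2; return 1 ;;
  esac
  if [ "$version" = 3 ]; then
    # Greenstone 3 needs to know which site the collection is in.
    printf '%s\n' "perl -S $script -site $site $coll"
  else
    printf '%s\n' "perl -S $script $coll"
  fi
}
```

For example, `rebuild_command incremental 3 dlpeople` prints `perl -S incremental-rebuild.pl -site localsite dlpeople`.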
The build process actually consists of several stages: importing the documents, building the indexes, and activating the collection.
These stages can all be run separately. Note that the Greenstone environment must be set up in any terminal window before you can run these commands.
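When the stages are run one after another, it makes sense to stop as soon as one fails. The sketch below runs import.pl, buildcol.pl and activate.pl in that order, passing the same arguments to each. The function name run_stages and its runner argument are hypothetical. Passing echo as the runner gives a dry run that only prints the commands.

```shell
# Run the three build stages in order, stopping at the first failure.
# usage: run_stages <runner> <args passed to every stage>...
#   runner is "perl -S" to run for real, or "echo" for a dry run.
run_stages() {
  runner="$1"
  shift
  for stage in import.pl buildcol.pl activate.pl; do
    # $runner is left unquoted on purpose so "perl -S" splits into two words.
    $runner "$stage" "$@" || return 1
  done
}
```

For a Greenstone 3 collection this might be called as `run_stages "perl -S" -site localsite dlpeople`, and for Greenstone 2 as `run_stages "perl -S" dlpeople`.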
This is the process of converting the original documents, which might be a mixture of file types, into a standardised XML based format - the Greenstone archive format. Original source documents live in the import folder of a collection, while the archive documents live in the archives folder.
The command to import a collection is import.pl. Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here.
Greenstone version | Import command |
---|---|
2 | perl -S import.pl <collname> |
3 | perl -S import.pl -site localsite <collname> |
As before, you need to put in your own collection name, and change the site name if you are using a custom Greenstone 3 site.
Don't worry about all the text that scrolls past—it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because the Greenstone environment has been set up, the Greenstone software can work out where the necessary files are.
You can run just the import phase incrementally, using incremental-import.pl in place of import.pl.
The next phase is to “build” the collection, which creates all the indexes and databases that make the collection work.
Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here.
For now, stick to the defaults by typing:
Greenstone version | Build command |
---|---|
2 | perl -S buildcol.pl <collname> |
3 | perl -S buildcol.pl -site localsite <collname> |
Again, don't worry about the “progress report” text that scrolls past.
Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. For Greenstone 3, we also need to reload it in the library. We can do this in two ways:
Running activate.pl:
Greenstone version | Activate command |
---|---|
2 | perl -S activate.pl <collname> |
3 | perl -S activate.pl -site localsite <collname> |
Or manually: delete the index folder, rename building to index, and, for Greenstone 3, restart the Greenstone 3 server. The collection lives in the following location:
Greenstone version | Collection location |
---|---|
2 | path-to-greenstone2/collect/<collname> |
3 | path-to-greenstone3/web/sites/localsite/collect/<collname> |
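The manual steps can be sketched as a small shell function for Linux/Mac. It covers only the folder swap and refuses to delete index unless a building folder is there to replace it. The function name activate_manually is hypothetical, and it is not a replacement for activate.pl.

```shell
# Make a freshly built collection live by swapping building into index.
# usage: activate_manually <collection-dir>
activate_manually() {
  coll="$1"
  if [ ! -d "$coll/building" ]; then
    echo "no building folder in $coll; run buildcol.pl first" >&2
    return 1
  fi
  # Remove the old index, then rename building to index.
  rm -rf "$coll/index" && mv "$coll/building" "$coll/index"
}
```

For Greenstone 3 the server still needs to be restarted afterwards so that the library picks up the new index.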
Import or buildcol options can be passed to full-rebuild and incremental-rebuild. If the option is shared between import.pl and buildcol.pl then it can appear as is, such as -verbosity 5. This value will be passed to both programs. If an option is specific to one of the programs in particular, then prefix it with 'import:' or 'buildcol:' respectively, as in '-import:OIDtype hash_on_full_filename'.
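To make the prefix rule concrete, the sketch below models which options each program would end up seeing. It is only an illustration of the rule as described here, not the code the rebuild scripts actually use. The function name route_options is hypothetical, and it assumes option values never start with a hyphen.

```shell
# Show which options would reach one program under the prefix rule.
# usage: route_options <import|buildcol> <options as given to the rebuild script>...
route_options() {
  target="$1"
  shift
  out=""
  keep=1
  for arg in "$@"; do
    case "$arg" in
      -import:*)
        # Keep only when routing for import.pl, and strip the prefix.
        if [ "$target" = import ]; then keep=1; arg="-${arg#-import:}"; else keep=0; fi ;;
      -buildcol:*)
        if [ "$target" = buildcol ]; then keep=1; arg="-${arg#-buildcol:}"; else keep=0; fi ;;
      -*)
        # An unprefixed option is shared by both programs.
        keep=1 ;;
    esac
    # Values (words not starting with "-") follow the option before them.
    if [ "$keep" -eq 1 ]; then out="$out $arg"; fi
  done
  printf '%s\n' "${out# }"
}
```

With the options `-verbosity 5 -import:OIDtype hash_on_full_filename`, routing for import gives `-verbosity 5 -OIDtype hash_on_full_filename`, while routing for buildcol gives just `-verbosity 5`.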
To create the skeleton of a collection we use mkcol.pl. This creates all the folders the collection needs, and sets up a default configuration. Typing perl -S mkcol.pl will provide the full list of options, which you can also view here.
To create a new collection:
Greenstone version | mkcol command |
---|---|
2 | perl -S mkcol.pl [options] <collname> |
3 | perl -S mkcol.pl -site localsite [options] <collname> |
For example, to create a collection named dlpeople in localsite with the creator's email address of [email protected], type
Greenstone version | mkcol command |
---|---|
2 | perl -S mkcol.pl -creator [email protected] dlpeople |
3 | perl -S mkcol.pl -site localsite -creator [email protected] dlpeople |
(Since Greenstone 3 allows you to have multiple sites, you must always specify which site the collection is in. The default site is called localsite.)
To view the newly created files, move to the newly created collection directory by typing
Greenstone version | Windows | Linux/Mac |
---|---|---|
2 | cd %GSDLHOME%\collect\dlpeople | cd $GSDLHOME/collect/dlpeople |
3 | cd %GSDL3HOME%\sites\localsite\collect\dlpeople | cd $GSDL3HOME/sites/localsite/collect/dlpeople |
You can list the contents of this directory by typing dir (Windows) or ls (Linux/Mac).
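In a script, the collection folder can be worked out from the environment instead of being typed each time. This is a minimal sketch for Linux/Mac. It assumes GSDLHOME is set by the Greenstone 2 setup script and GSDL3HOME by the Greenstone 3 one. The function name coll_dir is hypothetical.

```shell
# Print the folder of a collection for Greenstone 2 or 3.
# usage: coll_dir <2|3> <collname> [site]
coll_dir() {
  version="$1"
  coll="$2"
  site="${3:-localsite}"
  if [ "$version" = 3 ]; then
    [ -n "${GSDL3HOME:-}" ] || { echo "GSDL3HOME is not set" >&2; return 1; }
    printf '%s\n' "$GSDL3HOME/sites/$site/collect/$coll"
  else
    [ -n "${GSDLHOME:-}" ] || { echo "GSDLHOME is not set" >&2; return 1; }
    printf '%s\n' "$GSDLHOME/collect/$coll"
  fi
}
```

For example, `cd "$(coll_dir 3 dlpeople)" && ls` moves into the collection and lists its contents.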
There should be several subdirectories, which differ slightly between Greenstone 2 & 3:
To add documents to the collection, simply copy them into the import folder. You can manually add metadata by creating metadata.xml files or adding metadata databases. See metadata and specifying_filenames_manually_in_metadataxml for more details.
In the collection's etc directory there is a configuration file. This is collect.cfg for Greenstone 2, collectionConfig.xml for Greenstone 3.
Any modifications that you can make in GLI can also be achieved by manually editing this file. Simply open it using your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.
Now you can build the collection using the rebuild commands, or using import/buildcol, as described in the earlier sections.
On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:
- Go to Start → All Programs → Accessories → Command Prompt.
- Open the Start menu, type cmd into the search box and press Enter.
- Open the Run dialog (Start → Run), type cmd in the textfield and press the OK button.
- Hold Shift, right-click on a folder, and choose Open command window here from the menu.

While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available.