This is an old revision of the document: a draft dated 2018/03/11 23:41, since superseded by a newer approved version.


Command Line Building

It is possible to create and build collections directly from the command line. This page provides the basic information on building Greenstone collections on the command line. Full instructions are given for Windows users. On MacOS/Linux the steps are the same, but some of the commands are slightly different; these differences are listed in the MacOSX/Linux section.

The first part of this page shows how to rebuild a collection that has been created and edited in GLI. GLI doesn't do proper incremental building, so for large collections, it may save time to set up a collection using GLI and build it on the command line.

The second part shows how to create, edit and build a collection entirely using the command line.

Using GLI to create a collection, using command line for building

If your collection will grow very large, it will save you time to build it using the command line building tools. Initially, using GLI, you will want to:

  • Create a new collection
  • Add a few documents and metadata
  • Configure your collection. What indexes, plugin options, classifiers etc do you need?
  • Build it in GLI and preview. Do you need to change configuration settings?

Once you have the collection set up the way you want, you can start adding the bulk of your documents and their metadata. You can do both using GLI.

Setup Greenstone environment

To begin, you will need to open a terminal window and set up the Greenstone environment. In the terminal, change directory to the Greenstone top level folder. Run the following command to set up the environment:

Greenstone version    Windows      Linux
2                     setup        source setup.bash
3                     gs3-setup    source gs3-setup.sh

Note, if you close your terminal window and start another one, you will need to invoke the setup command again.
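
Since every later command depends on the environment being set, it can help to check for it before building. The following is a minimal sketch for a bash terminal. It assumes, as the text on this page suggests, that the Greenstone 2 setup script sets GSDLHOME and the Greenstone 3 script sets GSDL3SRCHOME; the function name gs_env_ready is made up for this example.

```shell
# Hypothetical helper: succeeds if a Greenstone setup script appears to have
# been sourced in this terminal.
# Assumption: setup.bash sets GSDLHOME (Greenstone 2) and gs3-setup.sh sets
# GSDL3SRCHOME (Greenstone 3).
gs_env_ready() {
    [ -n "${GSDLHOME:-}" ] || [ -n "${GSDL3SRCHOME:-}" ]
}

if gs_env_ready; then
    echo "Greenstone environment is set"
else
    echo "Run the setup command for your Greenstone version first"
fi
```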

Build on the command line

Now you can build the collection.

The main command for rebuilding a collection is full-rebuild.pl.

Greenstone version    Windows                                               Linux
2                     perl -S full-rebuild.pl <collname>                    full-rebuild.pl <collname>
3                     perl -S full-rebuild.pl -site localsite <collname>    full-rebuild.pl -site localsite <collname>

Notes:

  • Replace <collname> with the short collection identifier. This is the name of the collection's folder in the collect folder. You can also see it in GLI's title bar, in brackets after the collection title, e.g. "greenstone demo collection (demo)". In this case, the collname is demo.
  • If you have a custom site for Greenstone 3, replace 'localsite' with your sitename.
  • There are options for full-rebuild.pl. View the list of options by running [perl -S] full-rebuild.pl -h
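
The table and notes above can be condensed into a small bash helper that prints the Linux form of the command for a given Greenstone version. This is only a sketch: the function name rebuild_command is hypothetical, and it prints the command rather than running it so you can inspect it first.

```shell
# Hypothetical helper: print the full-rebuild.pl command line for a collection.
# Usage: rebuild_command <greenstone-version> <collname> [sitename]
rebuild_command() {
    local version="$1" collname="$2" site="${3:-localsite}"
    case "$version" in
        2) echo "full-rebuild.pl $collname" ;;
        3) echo "full-rebuild.pl -site $site $collname" ;;
        *) echo "unknown Greenstone version: $version" >&2; return 1 ;;
    esac
}

rebuild_command 3 demo    # prints: full-rebuild.pl -site localsite demo
```

The site name defaults to localsite and only needs to be given for a custom Greenstone 3 site.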

Running full-rebuild.pl will reimport and index all the documents. This is useful to do every so often and especially if you have changed plugin options, or other configuration options. If the configuration hasn't changed, and you just want to add new documents or update modified documents, then you should use incremental building.

Incremental building

Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process. New and modified documents will be processed, and deleted documents will be removed from the collection. If metadata has changed, then documents will be reprocessed.

Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.
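
To get a feel for how much a single metadata change can cost, you can count the files that share a folder. The sketch below is illustrative only: the helper name files_directly_in and the example folder names are made up, and the count is only a rough guide because it includes every file in the folder, not just documents.

```shell
# Hypothetical helper: count the files that sit directly in a folder
# (subfolders are not descended into). Per the note above, this is roughly
# how many files a metadata change in that folder can cause to be reimported.
files_directly_in() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example layout built in a scratch directory, for illustration only.
scratch=$(mktemp -d)
mkdir -p "$scratch/import/reports" "$scratch/import/letters"
touch "$scratch/import/reports/a.pdf" "$scratch/import/reports/b.pdf"
touch "$scratch/import/letters/c.pdf"

files_directly_in "$scratch/import"           # prints 0: nothing at the top level
files_directly_in "$scratch/import/reports"   # prints 2
```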

The main command for incremental rebuild is incremental-rebuild.pl. You can use this in place of full-rebuild.pl.

Greenstone version    Windows                                                      Linux
2                     perl -S incremental-rebuild.pl <collname>                    incremental-rebuild.pl <collname>
3                     perl -S incremental-rebuild.pl -site localsite <collname>    incremental-rebuild.pl -site localsite <collname>

Indexer Note: only the Lucene and Solr indexers can do incremental indexing. MG and MGPP cannot. If you run incremental-rebuild.pl with MG or MGPP, indexing will be carried out over the entire collection. We therefore recommend Lucene or Solr if you will be doing incremental building.
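
The indexer note can be written down as a simple lookup. This is a sketch only; the function name supports_incremental_indexing is hypothetical and is not part of Greenstone.

```shell
# Hypothetical helper: succeed if the named indexer can index incrementally.
# Per the note above: Lucene and Solr can; MG and MGPP cannot.
supports_incremental_indexing() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        lucene|solr) return 0 ;;
        *)           return 1 ;;
    esac
}

for indexer in MG MGPP Lucene Solr; do
    if supports_incremental_indexing "$indexer"; then
        echo "$indexer: incremental indexing"
    else
        echo "$indexer: full index rebuild on every build"
    fi
done
```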

Finer control of the build process

The build process actually consists of several stages:

  • importing the original documents into greenstone's XML archive format
  • building the collection: indexing the archive documents and generating a database of metadata and classifier structures
  • activating the collection in the live library (if necessary)

These stages can all be run separately. Note that the Greenstone environment must be set up in any terminal window before you can run these commands.
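
When running the stages by hand on Linux, it is worth making sure a later stage never runs after an earlier one has failed. The following bash sketch shows one way to do that; run_stages is a made-up name, and each stage is passed as a plain command string.

```shell
# Hypothetical helper: run each command string in order and stop at the
# first one that fails, so a failed import is never followed by a build.
run_stages() {
    local stage
    for stage in "$@"; do
        echo "running: $stage"
        sh -c "$stage" || { echo "stage failed: $stage" >&2; return 1; }
    done
}

# Example (needs the Greenstone environment, so it is left commented out):
# run_stages "import.pl -site localsite dlpeople" \
#            "buildcol.pl -site localsite dlpeople"
```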

Import the collection

Now you are ready to “import” the collection. This is the process of bringing the documents into the Greenstone system, standardizing the document format, the way that metadata is specified, and the file structure in which the documents are stored. Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here.

perl -S import.pl -site localsite dlpeople

Don't worry about all the text that scrolls past—it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because %GSDL3SRCHOME% is already set, the Greenstone software can work out where the necessary files are.

Build the collection

The next phase is to “build” the collection, which creates all the indexes and files that make the collection work. Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here. For now, stick to the defaults by typing

perl -S buildcol.pl -site localsite dlpeople

Again, don't worry about the “progress report” text that scrolls past.

Make the collection live

Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. We can do this in two ways:

In an explorer window (i.e. outside of the terminal) simply select the contents of the dlpeople collection's building directory and drag them into the index directory.

Alternatively, you can remove the index directory (and all its contents) by typing the command

rd /s index            # on Windows NT/2000
deltree /Y index       # on Windows 95/98

and then change the name of the building directory to index with

ren building index

It is important that these commands are issued from the correct directory (unlike the Greenstone commands mkcol.pl, import.pl and buildcol.pl). If the current working directory is not dlpeople, type cd %GSDL3HOME%\sites\localsite\collect\dlpeople before going through the rd and ren sequence above.

If your Greenstone server is already running, you should be able to access the newly built collection from your Greenstone homepage. You will have to reload the page if you already had it open in your browser, or perhaps even close the browser and restart it (to prevent caching problems).

In summary then, the commands typed to produce the dlpeople collection are:

cd C:\Users\jsmith\Greenstone3 # assuming default location
gs3-setup
perl -S mkcol.pl -site localsite -creator [email protected] dlpeople
cd %GSDL3HOME%\sites\localsite\collect\dlpeople
xcopy /s C:\Users\jsmith\dldocuments\* import
perl -S import.pl -site localsite dlpeople
perl -S buildcol.pl -site localsite dlpeople
rd /s index           # on Windows NT/2000
deltree /Y index      # on Windows 95/98
ren building index

Creating and Editing a Collection on the command line

Create a collection

The first program we will look at is the Perl program mkcol.pl, whose name stands for “make a collection.” Typing perl -S mkcol.pl will provide the full list of options, which you can also view here.

(If your Windows environment is set up to associate the Perl application with files ending in .pl, you can leave off the perl -S for all of these scripts.)

To create a new collection:

perl -S mkcol.pl [options] collection-name

For example, to create a collection named dlpeople in localsite with the creator's email address of [email protected], type

perl -S mkcol.pl -site localsite -creator [email protected] dlpeople


(Since Greenstone3 allows you to have multiple sites, you must always specify which site the collection is in. The default site is called localsite.)

To view the newly created files, move to the newly created collection directory by typing

cd %GSDL3HOME%\sites\localsite\collect\dlpeople
cd $GSDL3HOME/sites/localsite/collect/dlpeople

You can list the contents of this directory by typing dir. The following subdirectories should be present:

  • etc
  • images
  • import
  • script
  • style

Add documents

Now we must populate the collection with sample documents. To do this, we copy documents into the collection's import folder. Assuming your documents are in the folder C:\Users\jsmith\dldocuments, you can either:

Select the contents of the dldocuments directory and drag them into the dlpeople collection's import directory.

Or, you can type the command

xcopy /s C:\Users\jsmith\dldocuments\* import

Edit the Config file

In the collection's etc directory there is a file called collectionConfig.xml. Any modifications that you can make in GLI can also be achieved by manually editing the collectionConfig.xml file. Simply open it in your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.

Additional information

Opening a terminal on Windows

On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:

  • Start → All Programs → Accessories → Command Prompt
  • Under the Start menu, type cmd into the search box and press Enter
  • Hold down your keyboard's Windows key and press the key for letter r. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type cmd in the textfield and press the OK button.
  • In any Windows Explorer, hold down Shift and right click in an empty area in the window. Select Open command window here from the menu.


<TABAREA tabs="Greenstone3,Greenstone2"> <TAB>

MacOSX/Linux


To create a collection:

mkcol.pl -site localsite -creator [email protected] dlpeople


To move to the newly created collection directory:

cd $GSDL3HOME/sites/localsite/collect/dlpeople


You can list the contents of this directory by typing ls. In the collection's etc directory there is a file called collectionConfig.xml. You can open and edit this using your favorite text editor (emacs is a popular editor on Linux).

To copy the contents of the /home/documents/dldocuments directory into the $GSDL3HOME/sites/localsite/collect/dlpeople/import directory, type the command

cp -r /home/documents/dldocuments/*   import/


To “import” the collection:

import.pl -site localsite dlpeople


Next, “build” the collection:

buildcol.pl -site localsite dlpeople


Finally, make the collection “live” by putting all the material that has just been put in the collection's building directory into the index directory. First, remove the old index:

rm -r index/*

(assuming you are in the dlpeople directory)


And move the contents of the building directory into index:

mv building/* index/
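
The two commands above delete the old index before the new one is in place, so it is worth guarding against an empty or missing building directory. The bash function below is a minimal sketch of such a guard; the name make_live is hypothetical and is not a Greenstone command.

```shell
# Hypothetical helper: replace the contents of index/ with those of building/.
# Usage: make_live <collection-directory>
make_live() {
    local coll="$1"
    # Refuse to touch index/ unless building/ exists and is non-empty.
    if [ ! -d "$coll/building" ] || [ -z "$(ls "$coll/building")" ]; then
        echo "nothing to activate: $coll/building is missing or empty" >&2
        return 1
    fi
    mkdir -p "$coll/index"
    rm -rf "$coll/index"/*
    mv "$coll/building"/* "$coll/index"/
}

# Example (left commented out because it needs a real collection):
# make_live "$GSDL3HOME/sites/localsite/collect/dlpeople"
```

Because the collection directory is passed as an argument, this version does not depend on the current working directory.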


In summary then, the commands typed to produce the dlpeople collection are:

cd /home/jsmith/Greenstone3 # assuming default Greenstone in user directory
source ./gs3-setup.sh
mkcol.pl -site localsite -creator [email protected] dlpeople
cd $GSDL3HOME/sites/localsite/collect/dlpeople
cp -r /home/documents/dldocuments/*   import/
import.pl -site localsite dlpeople
buildcol.pl -site localsite dlpeople
rm -r index/*
mv building/* index

Additional Resources

While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available. </TAB> <TAB>

Windows

Open a terminal

On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:

  • Start → All Programs → Accessories → Command Prompt
  • Under the Start menu, type cmd into the search box and press Enter
  • Hold down your keyboard's Windows key and press the key for letter r. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type cmd in the textfield and press the OK button.
  • In any Windows Explorer, hold down Shift and right click in an empty area in the window. Select Open command window here from the menu.

Setup the Environment

In order to build collections in Greenstone (or run any other Greenstone scripts from the command line), you must first set up the terminal's environment for Greenstone. To do this, first change into the directory where Greenstone has been installed. Assuming Greenstone was installed in its default location (and your username is "jsmith"), you can move there by typing:

cd C:\Users\jsmith\Greenstone

Note that if the path to your Greenstone installation includes spaces (e.g. Program Files), you must put quotation marks around the path. For example: cd "C:\Program Files\Greenstone" and cd "%GSDLHOME%\collect\dlpeople".

Next, at the prompt type:

setup

This batch file (which you can read if you like) tells the system where to look for Greenstone programs.

Note: On Windows 95/98 systems running setup.bat may fail with an Out of environment space error. If this happens, you should edit your system's config.sys file (normally found at C:\config.sys) and add the line shell=C:\command.com /e:4096 /p (where C: is your system drive letter) to expand the size of the environment table. You'll need to reboot for this change to take effect, and then repeat the steps above for Greenstone.

If, later on in your interactive session at the DOS prompt, you wish to return to the top level Greenstone directory you can accomplish this by typing cd %GSDLHOME%.

If you close your DOS window and start another one, you will need to invoke setup.bat again.

Now you are in a position to make, build and rebuild collections.

Create a collection

The first program we will look at is the Perl program mkcol.pl, whose name stands for “make a collection.” Typing perl -S mkcol.pl will provide the full list of options, which you can also view here.

(If your Windows environment is set up to associate the Perl application with files ending in .pl, you can leave off the perl -S for all of these scripts.)

To create a new collection:

perl -S mkcol.pl [options] collection-name

For example, to create a collection named dlpeople with the creator's email address of [email protected], type

perl -S mkcol.pl -creator [email protected] dlpeople


Please substitute your email address for mine!

To view the newly created files, move to the newly created collection directory by typing

cd %GSDLHOME%\collect\dlpeople

You can list the contents of this directory by typing dir. There should be six subdirectories:

  • etc
  • images
  • import
  • macros
  • script
  • style

Add documents

Now we must populate the collection with sample documents. To do this, we copy documents into the collection's import folder. Assuming your documents are in the folder C:\Users\jsmith\dldocuments, you can either:

Select the contents of the dldocuments directory and drag them into the dlpeople collection's import directory.

Or, you can type the command

xcopy /s C:\Users\jsmith\dldocuments\* import

Edit the Config file

In the collection's etc directory there is a file called collect.cfg. Any modifications that you can make in GLI can also be achieved by manually editing this collection configuration file. Simply open it in your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.

Build the collection

Building a collection consists of two main stages, importing and building. Importing is the process of bringing the documents into the Greenstone system, standardizing the document format, the way that metadata is specified, and the file structure in which the documents are stored. The building stage generates the indexes, databases and other auxiliary files that are needed to make the collection work in Greenstone.

These processes can be run separately, or, in later Greenstone versions, a single script can be run which invokes both processes (see below).

Importing

Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here.

perl -S import.pl dlpeople

Don't worry about all the text that scrolls past—it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because %GSDLHOME% is already set, the Greenstone software can work out where the necessary files are.

Building

Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here. For now, stick to the defaults by typing:

perl -S buildcol.pl dlpeople

Again, don't worry about the “progress report” text that scrolls past.

Make the collection live

Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. We can do this in two ways:

In an explorer window (i.e. outside of the terminal) simply select the contents of the dlpeople collection's building directory and drag them into the index directory.

Alternatively, you can remove the index directory (and all its contents) by typing the command

rd /s index            # on Windows NT/2000
deltree /Y index       # on Windows 95/98

and then change the name of the building directory to index with

ren building index

It is important that these commands are issued from the correct directory (unlike the Greenstone commands mkcol.pl, import.pl and buildcol.pl). If the current working directory is not dlpeople, type cd %GSDLHOME%\collect\dlpeople before going through the rd and ren sequence above.

If your Greenstone server is already running, you should be able to access the newly built collection from your Greenstone homepage. You will have to reload the page if you already had it open in your browser, or perhaps even close the browser and restart it (to prevent caching problems). Alternatively, if you are using the “local library” version of Greenstone you will have to restart the library program.

Build the collection in one easy step

An alternative to running import, then build, then deleting the old index and renaming building to index, is to run a single command, full-rebuild.pl.

perl -S full-rebuild.pl dlpeople

This will run import.pl, buildcol.pl and then remove the old indexes and copy the new ones into the index folder.

Import or buildcol options can be passed to full-rebuild. If the option is shared between import.pl and buildcol.pl then it can appear as is, such as -verbosity 5. This value will be passed to both programs. If an option is specific to one of the programs in particular, then prefix it with 'import:' or 'buildcol:' respectively, as in '-import:OIDtype hash_on_full_filename'
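
The prefixing rule can be expressed as a short bash helper that builds the prefixed form of an option. This is a sketch: stage_option is a made-up name, and the helper only formats the text. It does not check that the option exists for that script.

```shell
# Hypothetical helper: turn a stage name and an option into the prefixed form
# that the paragraph above describes for full-rebuild.pl.
# Usage: stage_option <import|buildcol> <option-name> [value]
stage_option() {
    local stage="$1" name="$2" value="${3:-}"
    case "$stage" in
        import|buildcol) ;;
        *) echo "stage must be import or buildcol" >&2; return 1 ;;
    esac
    if [ -n "$value" ]; then
        printf '%s\n' "-$stage:$name $value"
    else
        printf '%s\n' "-$stage:$name"
    fi
}

stage_option import OIDtype hash_on_full_filename
# prints: -import:OIDtype hash_on_full_filename
```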

Remember, you can run 'perl -S import.pl' or 'perl -S buildcol.pl' from the command line with no arguments to see the specific options they take.

Summary

In summary then, the commands typed to produce the dlpeople collection are:

To set up the collection:

cd C:\Users\jsmith\Greenstone # assuming default location
setup.bat
perl -S mkcol.pl -creator [email protected] dlpeople
cd %GSDLHOME%\collect\dlpeople
xcopy   /s   d:\collect\dlpeople\*   import # assuming D drive

To build the collection:

perl -S full-rebuild.pl dlpeople

or

perl -S import.pl dlpeople
perl -S buildcol.pl dlpeople
rd /s index           # on Windows NT/2000
deltree /Y index      # on Windows 95/98
ren building index

MacOSX/Linux

Running Greenstone from the command line on MacOSX and Linux is very similar to doing it on Windows; only some of the commands are a bit different. Please read through the Windows section for more information about the steps mentioned here.

First change into the directory where Greenstone has been installed. For example, if Greenstone is installed under its default name at the top level of your user account you can move there by typing

cd /home/jsmith/Greenstone


To set up the Greenstone environment:

source ./setup.bash 

If you are unsure which shell you are using, enter echo $0 at your command-line prompt; it will print the name of the shell. If it is not bash, contact your system administrator for advice.


To create a collection:

mkcol.pl -creator [email protected] dlpeople


To move to the newly created collection directory:

cd $GSDLHOME/collect/dlpeople


You can list the contents of this directory by typing ls. In the collection's etc directory there is a file called collect.cfg. You can open and edit this using your favorite text editor — emacs is a popular editor on Linux.

To copy the contents of the /home/documents/dldocuments directory into the $GSDLHOME/collect/dlpeople/import directory, type the command

cp -r /home/documents/dldocuments/*   import/


To build the collection in one step:

full-rebuild.pl dlpeople

Or, to build it step by step manually:

To “import” the collection:

import.pl dlpeople


Next, “build” the collection:

buildcol.pl dlpeople


Finally, make the collection “live” by putting all the material that has just been put in the collection's building directory into the index directory. First, remove the old index:

rm -r index/*

(assuming you are in the dlpeople directory)


And move the contents of the building directory into index:

mv building/* index/


In summary then, the commands typed to produce the dlpeople collection are:

cd /home/jsmith/Greenstone # assuming default Greenstone in user directory
source ./setup.bash 
mkcol.pl -creator [email protected] dlpeople
cd $GSDLHOME/collect/dlpeople
cp -r /home/documents/dldocuments/*   import/

To build the collection:

full-rebuild.pl dlpeople

or

import.pl dlpeople
buildcol.pl dlpeople
rm -r index/*
mv building/* index

Incremental Building

Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process.

Incremental importing: New documents will be imported. Modified documents will be re-imported. Deleted documents will be removed from the collection. If metadata has changed, then documents will be reimported.

Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.

Incremental indexing: Currently only the Lucene indexer (and Solr indexer included with Greenstone 3) can do incremental indexing. If you are using MG/MGPP then a full buildcol pass will be done, even if incremental-buildcol.pl is used.

If collection design has changed, then you will need to do a full rebuild. Changes to plugin options, and some import options will necessitate a full import. Changes to search indexes, partition indexes, browsing classifiers will necessitate a full buildcol.

If you are doing incremental building, a full rebuild every now and then can be a good idea, in case something hasn't gone quite right in the incremental process. Once we've finished retesting incremental building, this shouldn't be a problem any more. In the meantime, if you notice anything weird after an incremental build, then a full rebuild is a good idea then too.

On the command line, you can run building/importing incrementally by using the scripts incremental-rebuild.pl, incremental-import.pl and incremental-buildcol.pl instead of full-rebuild.pl, import.pl and buildcol.pl, respectively.

Note that running incremental-buildcol.pl when you are not using Lucene for your indexer will be the same as running buildcol.pl. Without any -builddir option, incremental-buildcol.pl will do the indexing into the existing index directory, so you don't need to rename building to index.
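
The script substitutions described above amount to a simple one-to-one mapping, sketched here as a bash function. The name incremental_script is hypothetical; the script names themselves are the ones listed on this page.

```shell
# Hypothetical helper: print the incremental counterpart of a build script,
# following the substitutions listed above.
incremental_script() {
    case "$1" in
        full-rebuild.pl) echo "incremental-rebuild.pl" ;;
        import.pl)       echo "incremental-import.pl" ;;
        buildcol.pl)     echo "incremental-buildcol.pl" ;;
        *) echo "no incremental counterpart known for: $1" >&2; return 1 ;;
    esac
}

incremental_script import.pl    # prints: incremental-import.pl
```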

Additional Resources

While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available. </TAB> </TABAREA>

en/user_advanced/command_line_building.1520811667.txt.gz · Last modified: 2018/03/11 23:41 by kjdon