Command Line Building
It is possible to create and build collections directly from the command line. This page provides the basic information on building Greenstone collections on the command line. The full instructions are provided for Windows users. If you are on MacOSX or Linux, the steps are the same, but some of the commands are slightly different. These differences are listed in the MacOSX/Linux section.
The first part of this page shows how to rebuild a collection that has been created and edited in GLI. GLI doesn't do proper incremental building, so for large collections, it may save time to set up a collection using GLI and build it on the command line.
The second part shows how to create, edit and build a collection entirely using the command line.
Using GLI to create a collection, using command line for building
If your collection will grow very large, building it with the command line tools will save you time. Initially, using GLI, you will want to:
- Create a new collection
- Add a few documents and metadata
- Configure your collection. What indexes, plugin options, classifiers etc do you need?
- Build it in GLI and preview. Do you need to change configuration settings?
Once you have the collection set up the way you want, you can start adding the bulk of your documents and their metadata. You can do both using GLI.
Setup Greenstone environment
To begin, you will need to open a terminal window and set up the Greenstone environment. In the terminal, change directory to the Greenstone top level folder. Run the following command to set up the environment:
Greenstone version | Windows | Linux |
---|---|---|
2 | setup | source setup.bash |
3 | gs3-setup | source gs3-setup.sh |
Note, if you close your terminal window and start another one, you will need to invoke the setup command again.
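A small guard at the top of a build script can catch a terminal in which the setup command was not run. The sketch below assumes that the setup scripts export GSDLHOME (Greenstone 2) or GSDL3SRCHOME (Greenstone 3), the variables this page refers to; the function name gs_env_ready is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Greenstone home variable is set in this shell.
# Assumption: the setup scripts export GSDLHOME (GS2) or GSDL3SRCHOME (GS3).
gs_env_ready() {
    [ -n "${GSDLHOME:-}" ] || [ -n "${GSDL3SRCHOME:-}" ]
}

if gs_env_ready; then
    echo "Greenstone environment is set up"
else
    echo "Greenstone environment not set: run the setup command first" >&2
fi
```

This only tests that a variable is present; it does not check that the path it holds is a working installation.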
Build on the command line
Now you can build the collection.
The main command for rebuilding a collection is full-rebuild.pl.
Greenstone version | Windows | Linux |
---|---|---|
2 | perl -S full-rebuild.pl <collname> | full-rebuild.pl <collname> |
3 | perl -S full-rebuild.pl -site localsite <collname> | full-rebuild.pl -site localsite <collname> |
Notes:
- Replace <collname> with the short collection identifier. This is the name of the collection's folder in the collect folder. You can also see it in GLI's title bar, in brackets after the collection title, e.g. "greenstone demo collection (demo)". In this case, the collname is demo.
- If you have a custom site for Greenstone 3, replace 'localsite' with your sitename.
- There are options for full-rebuild.pl. View the list of options by running [perl -S] full-rebuild.pl -h
Running full-rebuild.pl will reimport and index all the documents. This is useful to do every so often and especially if you have changed plugin options, or other configuration options. If the configuration hasn't changed, and you just want to add new documents or update modified documents, then you should use incremental building.
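The only difference between the two Greenstone versions in the table above is the -site option. As a rough sketch for Linux, a wrapper function can hide that difference; the function name full_rebuild_cmd is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the Linux full-rebuild command for a collection.
# Usage: full_rebuild_cmd <greenstone-version> <collname> [site]
full_rebuild_cmd() {
    version=$1
    collname=$2
    site=${3:-localsite}   # default site name used on this page
    case "$version" in
        2) echo "full-rebuild.pl $collname" ;;
        3) echo "full-rebuild.pl -site $site $collname" ;;
        *) echo "unsupported Greenstone version: $version" >&2; return 1 ;;
    esac
}

full_rebuild_cmd 2 demo   # prints: full-rebuild.pl demo
full_rebuild_cmd 3 demo   # prints: full-rebuild.pl -site localsite demo
```

On Windows the same commands would be prefixed with perl -S, as in the table.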
Incremental building
Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process. New and modified documents will be processed, and deleted documents will be removed from the collection. If metadata has changed, then documents will be reprocessed.
Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.
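One possible way to follow this advice is to sort loose files in the import folder into numbered subfolders before assigning metadata. The sketch below is only an illustration: the function name organize_import and the batchNNN folder naming are invented here, and any grouping that suits your material (by year, by source, by topic) would serve equally well. It leaves metadata.xml files where they are, so it is best run before metadata has been assigned to the files being moved.

```shell
#!/bin/sh
# Hypothetical helper: move loose files in an import folder into batch subfolders.
# Usage: organize_import <import-dir> [files-per-folder]
organize_import() {
    import_dir=$1
    per_folder=${2:-100}
    count=0
    for f in "$import_dir"/*; do
        # Skip subfolders and anything that is not a regular file.
        if [ ! -f "$f" ]; then continue; fi
        # Leave metadata files in place.
        if [ "$(basename "$f")" = "metadata.xml" ]; then continue; fi
        # Files 1..N go to batch001, N+1..2N to batch002, and so on.
        batch=$(printf 'batch%03d' $((count / per_folder + 1)))
        mkdir -p "$import_dir/$batch"
        mv "$f" "$import_dir/$batch/"
        count=$((count + 1))
    done
}
```

For example, organize_import import 200, run from the collection folder, would place at most 200 documents in each subfolder, so a metadata change in one subfolder triggers a reimport of at most 200 documents.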
The main command for incremental rebuild is incremental-rebuild.pl. You can use this in place of full-rebuild.pl.
Greenstone version | Windows | Linux |
---|---|---|
2 | perl -S incremental-rebuild.pl <collname> | incremental-rebuild.pl <collname> |
3 | perl -S incremental-rebuild.pl -site localsite <collname> | incremental-rebuild.pl -site localsite <collname> |
Indexer Note: only the Lucene and Solr indexers can do incremental indexing. MG and MGPP cannot. If you run incremental-rebuild with MG or MGPP, indexing will be carried out over the entire collection. We therefore recommend Lucene or Solr if you will be doing incremental building.
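The indexer rule can be written down as a small check for use in build scripts. This is a sketch only: the function name supports_incremental_indexing is made up, and the indexer name is assumed to be passed in by hand rather than read from the collection configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named indexer can index incrementally,
# following the note above (Lucene and Solr can; MG and MGPP cannot).
supports_incremental_indexing() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        lucene|solr) return 0 ;;
        *)           return 1 ;;
    esac
}

for indexer in lucene solr mg mgpp; do
    if supports_incremental_indexing "$indexer"; then
        echo "$indexer: incremental indexing"
    else
        echo "$indexer: whole collection re-indexed on every build"
    fi
done
```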
Finer control of the build process
The build process actually consists of several stages:
- importing the original documents into Greenstone's XML archive format
- building the collection: indexing the archive documents and generating a database of metadata and classifier structures
- activating the collection in the live library (if necessary)
These stages can all be run separately. Note that the Greenstone environment must be set up in any terminal window before you can run these commands.
Import the collection
Now you are ready to “import” the collection. This is the process of bringing the documents into the Greenstone system, standardizing the document format, the way that metadata is specified, and the file structure in which the documents are stored.
Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here.
To import the example collection dlpeople, type
perl -S import.pl -site localsite dlpeople
Don't worry about all the text that scrolls past: it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because %GSDL3SRCHOME% is already set, the Greenstone software can work out where the necessary files are.
Build the collection
The next phase is to “build” the collection, which creates all the indexes and files that make the collection work. Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here.
For now, stick to the defaults by typing
perl -S buildcol.pl -site localsite dlpeople
Again, don't worry about the “progress report” text that scrolls past.
Make the collection live
Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. We can do this in two ways:
In an explorer window (i.e. outside of the terminal) simply select the contents of the dlpeople collection's building directory and drag them into the index directory.
Alternatively, you can remove the index directory (and all its contents) by typing the command
rd /s index # on Windows NT/2000
deltree /Y index # on Windows 95/98
and then change the name of the building directory to index with
ren building index
It is important that these commands are issued from the correct directory (unlike the Greenstone commands mkcol.pl, import.pl and buildcol.pl). If the current working directory is not dlpeople, type
cd %GSDL3HOME%\sites\localsite\collect\dlpeople
before going through the rd and ren sequence above.
If your Greenstone server is already running, you should be able to access the newly built collection from your Greenstone homepage. You will have to reload the page if you already had it open in your browser, or perhaps even close the browser and restart it (to prevent caching problems).
In summary then, the commands typed to produce the dlpeople collection are:
cd C:\Users\jsmith\Greenstone3 # assuming default location
gs3-setup
perl -S mkcol.pl -site localsite -creator [email protected] dlpeople
cd %GSDL3HOME%\sites\localsite\collect\dlpeople
xcopy /s C:\Users\jsmith\dldocuments\* import
perl -S import.pl -site localsite dlpeople
perl -S buildcol.pl -site localsite dlpeople
rd /s index # on Windows NT/2000
deltree /Y index # on Windows 95/98
ren building index
Creating and Editing a Collection on the command line
Create a collection
The first program we will look at is the Perl program mkcol.pl, whose name stands for “make a collection.” Typing perl -S mkcol.pl will provide the full list of options, which you can also view here. (If your Windows environment is set up to associate the Perl application with files ending in .pl, you can leave off the perl -S for all of these scripts.)
To create a new collection:
perl -S mkcol.pl [options] collection-name
For example, to create a collection named dlpeople in localsite with the creator's email address of [email protected], type
perl -S mkcol.pl -site localsite -creator [email protected] dlpeople
(Since Greenstone3 allows you to have multiple sites, you must always specify which site the collection is in. The default site is called localsite.)
To view the newly created files, move to the newly created collection directory by typing
cd %GSDL3HOME%\sites\localsite\collect\dlpeople
on Windows, or
cd $GSDL3HOME/sites/localsite/collect/dlpeople
on MacOSX/Linux.
You can list the contents of this directory by typing dir (ls on MacOSX/Linux).
There should be the following subdirectories:
- etc
- images
- import
- script
- style
Add documents
Now we must populate the collection with sample documents. To do this, we copy documents into the collection's import folder. Assuming your documents are in the folder C:\Users\jsmith\dldocuments, you can either select the contents of the dldocuments directory and drag them into the dlpeople collection's import directory, or type the command
xcopy /s C:\Users\jsmith\dldocuments\* import
Edit the Config file
In the collection's etc directory there is a file called collectionConfig.xml. Any modifications that you can make in the GLI can also be achieved by manually editing the collectionConfig.xml file. Simply open it using your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.
Additional information
Opening a terminal on Windows
On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:
- Select Start → All Programs → Accessories → Command Prompt.
- Under the Start menu, type cmd into the search box and press Enter.
- Hold down your keyboard's Windows key and press the key for letter r. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type cmd in the text field and press the OK button.
- In any Windows Explorer window, hold down Shift and right click in an empty area in the window. Select Open command window here from the menu.
<TABAREA tabs="Greenstone3,Greenstone2"> <TAB>
MacOSX/Linux
To create a collection:
mkcol.pl -site localsite -creator [email protected] dlpeople
To move to the newly created collection directory:
cd $GSDL3HOME/sites/localsite/collect/dlpeople
You can list the contents of this directory by typing ls. In the collection's etc directory there is a file called collectionConfig.xml. You can open and edit this using your favorite text editor; emacs is a popular editor on Linux.
To copy the contents of the /home/documents/dldocuments directory into the $GSDL3HOME/sites/localsite/collect/dlpeople/import directory, type the command
cp -r /home/documents/dldocuments/* import/
To “import” the collection:
import.pl -site localsite dlpeople
Next, “build” the collection:
buildcol.pl -site localsite dlpeople
Finally, make the collection “live” by moving all the material that has just been put in the collection's building directory into the index directory. First, remove the old index (assuming you are in the dlpeople directory):
rm -r index/*
Then move the contents of the building directory into index:
mv building/* index/
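Because the two commands above delete the old index before the new one is in place, it can be worth wrapping them in a function that first checks there is something to move. The following is a minimal sketch of that idea, not part of Greenstone; the function name activate_collection is hypothetical, and it takes the collection directory as an argument so it does not depend on the current directory.

```shell
#!/bin/sh
# Hypothetical helper: replace the contents of index with the contents of building.
# Usage: activate_collection <collection-dir>
activate_collection() {
    coll_dir=$1
    if [ -z "$coll_dir" ]; then
        echo "usage: activate_collection <collection-dir>" >&2
        return 1
    fi
    # Refuse to touch the old index unless a non-empty building directory exists.
    if [ ! -d "$coll_dir/building" ] || [ -z "$(ls -A "$coll_dir/building")" ]; then
        echo "nothing to activate in $coll_dir/building" >&2
        return 1
    fi
    mkdir -p "$coll_dir/index"
    rm -rf "$coll_dir/index"/*                     # same effect as: rm -r index/*
    mv "$coll_dir/building"/* "$coll_dir/index/"   # same effect as: mv building/* index/
}
```

It could be called as activate_collection "$GSDL3HOME/sites/localsite/collect/dlpeople" after buildcol.pl has finished.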
In summary then, the commands typed to produce the dlpeople collection are:
cd /home/jsmith/Greenstone3 # assuming default Greenstone in user directory
source ./gs3-setup.sh
mkcol.pl -site localsite -creator [email protected] dlpeople
cd $GSDL3HOME/sites/localsite/collect/dlpeople
cp -r /home/documents/dldocuments/* import/
import.pl -site localsite dlpeople
buildcol.pl -site localsite dlpeople
rm -r index/*
mv building/* index/
Additional Resources
While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available. </TAB> <TAB>
Windows
Open a terminal
On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:
- Select Start → All Programs → Accessories → Command Prompt.
- Under the Start menu, type cmd into the search box and press Enter.
- Hold down your keyboard's Windows key and press the key for letter r. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type cmd in the text field and press the OK button.
- In any Windows Explorer window, hold down Shift and right click in an empty area in the window. Select Open command window here from the menu.
Setup the Environment
In order to build collections in Greenstone (or run any other Greenstone scripts from the command line), you must first set up the terminal's environment for Greenstone. To do this, first change into the directory where Greenstone has been installed. Assuming Greenstone was installed in its default location (and your username is "jsmith"), you can move there by typing:
cd C:\Users\jsmith\Greenstone
Note that if the path to your Greenstone installation includes spaces (e.g. Program Files), you must put quotation marks around the path. For example: cd "C:\Program Files\Greenstone" and cd "%GSDLHOME%\collect\dlpeople".
Next, at the prompt type:
setup
This batch file (which you can read if you like) tells the system where to look for Greenstone programs.
Note: On Windows 95/98 systems running setup.bat may fail with an Out of environment space error. If this happens, you should edit your system's config.sys file (normally found at C:\config.sys) and add the line shell=C:\command.com /e:4096 /p (where C: is your system drive letter) to expand the size of the environment table. You'll need to reboot for this change to take effect, and then repeat the steps above for Greenstone.
If, later on in your interactive session at the DOS prompt, you wish to return to the top level Greenstone directory, you can accomplish this by typing cd %GSDLHOME%.
If you close your DOS window and start another one, you will need to invoke setup.bat again.
Now you are in a position to make, build and rebuild collections.
Create a collection
The first program we will look at is the Perl program mkcol.pl, whose name stands for “make a collection.” Typing perl -S mkcol.pl will provide the full list of options, which you can also view here. (If your Windows environment is set up to associate the Perl application with files ending in .pl, you can leave off the perl -S for all of these scripts.)
To create a new collection:
perl -S mkcol.pl [options] collection-name
For example, to create a collection named dlpeople with the creator's email address of [email protected], type
perl -S mkcol.pl -creator [email protected] dlpeople
Please substitute your email address for mine!
To view the newly created files, move to the newly created collection directory by typing
cd %GSDLHOME%\collect\dlpeople
You can list the contents of this directory by typing dir. There should be six subdirectories:
- etc
- images
- import
- macros
- script
- style
Add documents
Now we must populate the collection with sample documents. To do this, we copy documents into the collection's import folder. Assuming your documents are in the folder C:\Users\jsmith\dldocuments, you can either select the contents of the dldocuments directory and drag them into the dlpeople collection's import directory, or type the command
xcopy /s C:\Users\jsmith\dldocuments\* import
Edit the Config file
In the collection's etc directory there is a file called collect.cfg. Any modifications that you can make in the GLI can also be achieved by manually editing this collection configuration file. Simply open it using your favorite text editor, e.g. Notepad or Wordpad, make changes and save it. You can learn more about the Collection configuration file here.
Build the collection
Building a collection consists of two main stages, importing and building. Importing is the process of bringing the documents into the Greenstone system, standardizing the document format, the way that metadata is specified, and the file structure in which the documents are stored. The building stage generates the indexes, databases and other auxiliary files that are needed to make the collection work in Greenstone.
These processes can be run separately, or, in later Greenstone versions, a single script can be run which invokes both processes (see below).
Importing
Type perl -S import.pl at the prompt to get a list of all the options for the import program, or view them here. To import the dlpeople collection, type
perl -S import.pl dlpeople
Don't worry about all the text that scrolls past: it's just reporting the progress of the import. Note that you do not have to be in either the collect or dlpeople directories when this command is entered; because %GSDLHOME% is already set, the Greenstone software can work out where the necessary files are.
Building
Type perl -S buildcol.pl at the command prompt for a list of collection-building options, which are also listed here.
For now, stick to the defaults by typing:
perl -S buildcol.pl dlpeople
Again, don't worry about the “progress report” text that scrolls past.
Make the collection live
Finally, we need to make the collection "live" by replacing the collection's old index folder with the contents of the building folder. We can do this in two ways:
In an explorer window (i.e. outside of the terminal) simply select the contents of the dlpeople collection's building directory and drag them into the index directory.
Alternatively, you can remove the index directory (and all its contents) by typing the command
rd /s index # on Windows NT/2000
deltree /Y index # on Windows 95/98
and then change the name of the building directory to index with
ren building index
It is important that these commands are issued from the correct directory (unlike the Greenstone commands mkcol.pl, import.pl and buildcol.pl). If the current working directory is not dlpeople, type
cd %GSDLHOME%\collect\dlpeople
before going through the rd and ren sequence above.
If your Greenstone server is already running, you should be able to access the newly built collection from your Greenstone homepage. You will have to reload the page if you already had it open in your browser, or perhaps even close the browser and restart it (to prevent caching problems). Alternatively, if you are using the “local library” version of Greenstone you will have to restart the library program.
Build the collection in one easy step
An alternative to running import, then build, then deleting the old index and renaming building to index, is to run a single command, full-rebuild.pl.
perl -S full-rebuild.pl dlpeople
This will run import.pl, buildcol.pl and then remove the old indexes and copy the new ones into the index folder.
Import or buildcol options can be passed to full-rebuild. If the option is shared between import.pl and buildcol.pl then it can appear as is, such as -verbosity 5. This value will be passed to both programs. If an option is specific to one of the programs in particular, then prefix it with 'import:' or 'buildcol:' respectively, as in '-import:OIDtype hash_on_full_filename'
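As an illustration of the prefix rule, the sketch below assembles a full-rebuild.pl command line from one shared option and one import-only option, using the two options mentioned above. The function name stage_option is invented for this example, and the script only prints the command; on Windows the printed command would be run with perl -S in front.

```shell
#!/bin/sh
# Hypothetical helper: format an option that applies to one stage only.
# Usage: stage_option <import|buildcol> <option-name> <value>
stage_option() {
    stage=$1
    name=$2
    value=$3
    case "$stage" in
        import|buildcol) echo "-$stage:$name $value" ;;
        *) echo "unknown stage: $stage" >&2; return 1 ;;
    esac
}

# -verbosity is shared, so it is written as is and reaches both programs.
# -OIDtype is given to import.pl only, so it gets the import: prefix.
echo "full-rebuild.pl -verbosity 5 $(stage_option import OIDtype hash_on_full_filename) dlpeople"
```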
Remember, you can run 'perl -S import.pl' or 'perl -S buildcol.pl' from the command line with no arguments to see the specific options they take.
Summary
In summary then, the commands typed to produce the dlpeople collection are:
To set up the collection:
cd C:\Users\jsmith\Greenstone # assuming default location
setup.bat
perl -S mkcol.pl -creator [email protected] dlpeople
cd %GSDLHOME%\collect\dlpeople
xcopy /s d:\collect\dlpeople\* import # assuming D drive
To build the collection:
perl -S full-rebuild.pl dlpeople
or
perl -S import.pl dlpeople
perl -S buildcol.pl dlpeople
rd /s index # on Windows NT/2000
deltree /Y index # on Windows 95/98
ren building index
MacOSX/Linux
Running Greenstone from the command line on MacOSX and Linux is very similar to doing it on Windows. Some of the commands are just a bit different. Please read through the Windows section for more information about the steps mentioned here.
First change into the directory where Greenstone has been installed. For example, if Greenstone is installed under its default name at the top level of your user account you can move there by typing
cd /home/jsmith/Greenstone
To set up the Greenstone environment:
source ./setup.bash
If you are unsure of the shell type you are using, enter echo $0 at your command-line prompt; it will print out the sought information. If you are using a different shell, contact your system administrator for advice.
To create a collection:
mkcol.pl -creator [email protected] dlpeople
To move to the newly created collection directory:
cd $GSDLHOME/collect/dlpeople
You can list the contents of this directory by typing ls. In the collection's etc directory there is a file called collect.cfg. You can open and edit this using your favorite text editor; emacs is a popular editor on Linux.
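To confirm that mkcol.pl produced the expected layout, the directory listing can be checked in a script instead of by eye. This sketch assumes the six subdirectory names given for Greenstone 2 in the Windows section; the function name missing_subdirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the expected subdirectories that are missing.
# The names are the six listed for Greenstone 2 in the Windows section.
# Usage: missing_subdirs <collection-dir>
missing_subdirs() {
    coll_dir=$1
    for d in etc images import macros script style; do
        [ -d "$coll_dir/$d" ] || echo "$d"
    done
}
```

Running missing_subdirs "$GSDLHOME/collect/dlpeople" prints nothing when all six are present.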
To copy the contents of the /home/documents/dldocuments directory into the $GSDLHOME/collect/dlpeople/import directory, type the command
cp -r /home/documents/dldocuments/* import/
To build the collection in one step:
full-rebuild.pl dlpeople
Or, to build it step by step manually:
To “import” the collection:
import.pl dlpeople
Next, “build” the collection:
buildcol.pl dlpeople
Finally, make the collection “live” by moving all the material that has just been put in the collection's building directory into the index directory. First, remove the old index (assuming you are in the dlpeople directory):
rm -r index/*
Then move the contents of the building directory into index:
mv building/* index/
In summary then, the commands typed to produce the dlpeople collection are:
cd /home/jsmith/Greenstone # assuming default Greenstone in user directory
source ./setup.bash
mkcol.pl -creator [email protected] dlpeople
cd $GSDLHOME/collect/dlpeople
cp -r /home/documents/dldocuments/* import/
To build the collection:
full-rebuild.pl dlpeople
or
import.pl dlpeople
buildcol.pl dlpeople
rm -r index/*
mv building/* index/
Incremental Building
Incremental building is where you only process the new or changed documents each time you build, thereby speeding up the build process.
Incremental importing: New documents will be imported. Modified documents will be re-imported. Deleted documents will be removed from the collection. If metadata has changed, then documents will be reimported.
Important note for collection design: Greenstone can notice that metadata in a folder has been added/changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to. Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding new metadata or changing any metadata for any document will result in all documents being reimported. If you intend to do incremental import, then please organize your documents into subfolders. That way modifying metadata for some documents won't result in all other documents being reimported.
Incremental indexing: Currently only the Lucene indexer (and Solr indexer included with Greenstone 3) can do incremental indexing. If you are using MG/MGPP then a full buildcol pass will be done, even if incremental-buildcol.pl is used.
If collection design has changed, then you will need to do a full rebuild. Changes to plugin options, and some import options will necessitate a full import. Changes to search indexes, partition indexes, browsing classifiers will necessitate a full buildcol.
If you are doing incremental building, a full rebuild every now and then can be a good idea, in case something hasn't gone quite right in the incremental process. Once we've finished retesting incremental building, this shouldn't be a problem any more. In the meantime, if you notice anything weird after an incremental build, then a full rebuild is a good idea then too.
On the command line, you can run building/importing incrementally by using the scripts incremental-rebuild.pl, incremental-import.pl and incremental-buildcol.pl instead of full-rebuild.pl, import.pl and buildcol.pl, respectively.
Note that running incremental-buildcol.pl when you are not using Lucene or Solr for your indexer will be the same as running buildcol.pl. Without any -builddir option, incremental-buildcol.pl will do the indexing into the existing index directory, so you don't need to rename building to index.
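The correspondence between the full and incremental scripts can be captured in a lookup, which is handy when an existing build script should be switched over. This is a sketch for illustration; the function name incremental_counterpart is not part of Greenstone.

```shell
#!/bin/sh
# Hypothetical helper: map a full build script to its incremental counterpart,
# using the three pairs of script names given on this page.
incremental_counterpart() {
    case "$1" in
        full-rebuild.pl) echo "incremental-rebuild.pl" ;;
        import.pl)       echo "incremental-import.pl" ;;
        buildcol.pl)     echo "incremental-buildcol.pl" ;;
        *) echo "no incremental counterpart for: $1" >&2; return 1 ;;
    esac
}

for script in full-rebuild.pl import.pl buildcol.pl; do
    echo "$script -> $(incremental_counterpart "$script")"
done
```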
Additional Resources
While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (like downloading documents). You can take a look at the scripts and their options to get an idea of what else is available. </TAB> </TABAREA>