====== Command Line Building ======

It is possible to create and build collections directly from the command line. This page provides the basic information on building Greenstone collections on the command line.

The first section shows how to rebuild a collection that has been created and edited in GLI. GLI doesn't do proper incremental building, so for large collections it may save time to set up a collection using GLI and build it on the command line.

The second section shows how to create, edit and build a collection entirely on the command line.

===== Using GLI to create a collection, then using command line for building =====

If your collection will grow very large, it will save you time to build it using the command line building tools.

Initially, using GLI, you want to:

  * Create a new collection
  * Add a few documents and metadata
  * Configure your collection. What indexes, plugin options, classifiers etc. do you need?
  * Build it in GLI and preview. Do you need to change configuration settings?

Once you have the collection set up the way you want, you can start adding the bulk of your documents and their metadata. You can do both in GLI. When it's time to build, you can build either in GLI or on the command line. A command line build is useful if you want to schedule the build to run overnight, for example, or if you want to build incrementally.

The sections below detail full build and incremental build.

==== Set up Greenstone environment ====

To begin, you will need to open a terminal window (see [[#opening_a_terminal_on_windows | below]]) and set up the Greenstone environment.

In the terminal, change directory to the Greenstone top level folder. Run the following command to set up the environment:

^Greenstone version^Windows^Linux/Mac^
|2|setup|source setup.bash|
|3|gs3-setup|source gs3-setup.sh|

Note: if you close your terminal window and start another one, you will need to invoke the setup command again.
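
Because the setup step has to be repeated in every new terminal, it can help to check whether the current shell already has a Greenstone environment. The following is a minimal sketch for Linux/Mac, not part of Greenstone itself. The function name ''gs_env_status'' is hypothetical, and the sketch assumes that ''setup.bash'' exports ''GSDLHOME'' and that ''gs3-setup.sh'' exports ''GSDL3HOME''.

```shell
# Report which Greenstone environment, if any, is active in this shell.
# Assumption: gs3-setup.sh exports GSDL3HOME and setup.bash exports GSDLHOME.
gs_env_status() {
    if [ -n "${GSDL3HOME:-}" ]; then
        echo "greenstone3"      # checked first, in case both variables are set
    elif [ -n "${GSDLHOME:-}" ]; then
        echo "greenstone2"
    else
        echo "unset"
        return 1                # signal that the setup command still needs to be run
    fi
}
```

A build script could start with ''gs_env_status >/dev/null || exit 1'' so that it stops early when the setup command has not been run in that terminal.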
==== Build on the command line ====

Now you can build the collection. The main command for rebuilding a collection is ''full-rebuild.pl''.

^Greenstone version^Windows^Linux/Mac^
|2|perl -S full-rebuild.pl //collname//|full-rebuild.pl //collname//|
|3|perl -S full-rebuild.pl -site localsite //collname//|full-rebuild.pl -site localsite //collname//|

Notes:

  * Replace //collname// with the short collection identifier. This is the name of the collection's folder in the collect folder. You can also see it in GLI's title bar, in brackets after the collection title, e.g. "greenstone demo collection (demo)". In this case, the collname is demo.
  * If you have a custom site for Greenstone 3, replace 'localsite' with your site name.
  * There are options for ''full-rebuild.pl''. View the list of options by running ''[perl -S] full-rebuild.pl -h''.
  * On Linux and MacOS, you can leave off ''perl -S'' for all the perl commands on this page. If your Windows environment is set up to associate the Perl application with files ending in ''.pl'', you can leave off ''perl -S'' on Windows too.

Running ''full-rebuild.pl'' will reimport and index all the documents. You will need to do this if you have changed plugin options or other configuration options. If the configuration hasn't changed, and you just want to add new documents or update modified documents, then you should use incremental building.

==== Incremental building ====

Incremental building processes only the new or changed documents each time you build, which speeds up the build process. New and modified documents will be processed, and deleted documents will be removed from the collection. If metadata has changed, then documents will be reprocessed.

Important note for collection design: Greenstone can notice that metadata in a folder has been added or changed, but it is not smart enough to tell which documents in the folder the changed metadata belongs to.
Therefore, if metadata in a folder has changed (including new metadata being added), then all documents in that folder will be reimported. This means that if you have all your documents in the top level import folder, adding or changing metadata for any document will result in all documents being reimported. If you intend to do incremental import, please organize your documents into subfolders. That way, modifying metadata for some documents won't result in all other documents being reimported.

Note 2: An empty metadata file in an import folder (including the top level import folder) will trigger a full reimport of all documents in that folder. This is a bug in Greenstone 2.87, 3.08 and earlier. GLI adds empty metadata files automatically. The solution is to add a piece of metadata to a document using the Enrich panel. Just one will do.

The main command for incremental rebuild is ''incremental-rebuild.pl''. You can use this in place of ''full-rebuild.pl''.

^Greenstone version^Windows^Linux/Mac^
|2|perl -S incremental-rebuild.pl //collname//|incremental-rebuild.pl //collname//|
|3|perl -S incremental-rebuild.pl -site localsite //collname//|incremental-rebuild.pl -site localsite //collname//|

Indexer note: only the Lucene and Solr indexers can do incremental indexing. MG and MGPP cannot. If you run ''incremental-rebuild.pl'' with MG or MGPP, indexing will be carried out over the entire collection. We therefore recommend Lucene or Solr if you will be doing incremental building.

===== Finer control of the build process =====

The build process actually consists of several stages:

  * **importing** the original documents into Greenstone's XML archive format
  * **building** the collection, which includes **indexing** the archive documents and generating a **database** of metadata and classifier structures
  * **activating** the collection in the live library (if necessary)

These stages can all be run separately.
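
Run separately, the three stages amount to three commands issued in order. The sketch below only prints those commands so that they can be inspected before anything is run. The function name ''gs_stage_commands'' is hypothetical, and the script names and argument order follow the command tables on this page.

```shell
# Print the import, build and activate commands for one collection, in order.
# Usage: gs_stage_commands COLLNAME [SITE]
# Pass a site name (e.g. localsite) for Greenstone 3; omit it for Greenstone 2.
gs_stage_commands() {
    coll=$1
    site=${2:-}
    for script in import.pl buildcol.pl activate.pl; do
        if [ -n "$site" ]; then
            echo "perl -S $script -site $site $coll"
        else
            echo "perl -S $script $coll"
        fi
    done
}
```

For example, ''gs_stage_commands demo localsite'' prints the three Greenstone 3 commands for the demo collection. Piping the output to ''sh -e'' should run them in order and stop at the first stage that fails, provided the Greenstone environment has been set up in that terminal.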
Note: the Greenstone environment must be set up in any terminal window before you can run these commands.

==== Importing a collection ====

This is the process of converting the original documents, which might be a mixture of file types, into a standardised XML-based format: the Greenstone archive format. Original source documents live in the ''import'' folder of a collection, while the archive documents live in the ''archives'' folder.

The command to import a collection is ''import.pl''. Type ''perl -S import.pl'' at the prompt to get a list of all the options for the import program, or view them [[script_options#import.pl|here]].

^Greenstone version^Import command^
|2|perl -S import.pl //collname//|
|3|perl -S import.pl -site localsite //collname//|

As before, you need to put in your own collection name, and change the site name if you are using a custom Greenstone 3 site.

Don't worry about all the text that scrolls past; it's just reporting the progress of the import. Note that you do not have to be in either the //collect// or //dlpeople// directories when this command is entered. Because the Greenstone environment has been set up, the Greenstone software can work out where the necessary files are.

=== Incremental import ===

You can run just the import phase incrementally, using ''incremental-import.pl'' in place of ''import.pl''.

==== Building a collection ====

The next phase is to "build" the collection, which creates all the indexes and databases that make the collection work. Type ''perl -S buildcol.pl'' at the command prompt for a list of collection-building options, which are also listed [[script_options#buildcol.pl|here]]. For now, stick to the defaults by typing:

^Greenstone version^Build command^
|2|perl -S buildcol.pl //collname//|
|3|perl -S buildcol.pl -site localsite //collname//|

Again, don't worry about the "progress report" text that scrolls past.
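
If you would rather review the progress text afterwards than watch it scroll past, ordinary shell redirection can capture it in a file. This is a generic sketch for Linux/Mac rather than a Greenstone feature. The function name ''run_logged'' and the log file names in the usage note are hypothetical.

```shell
# Run a command, saving everything it prints (stdout and stderr) to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        # Keep the terminal quiet on success, but say where to look on failure.
        echo "FAILED ($status): $* (see $log)" >&2
    fi
    return "$status"
}
```

With this in place, ''run_logged import.log perl -S import.pl demo && run_logged buildcol.log perl -S buildcol.pl demo'' would import and then build the demo collection, writing each stage's output to its own file and skipping the build if the import fails.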
==== Make the collection live ====

Finally, we need to make the collection "live" by replacing the collection's old ''index'' folder with the contents of the ''building'' folder. For Greenstone 3, we also need to reload the collection in the library. We can do this in two ways.

By running ''activate.pl'':

^Greenstone version^Activate command^
|2|perl -S activate.pl //collname//|
|3|perl -S activate.pl -site localsite //collname//|

Or manually: delete the ''index'' folder, rename ''building'' to ''index'', then restart the Greenstone3 server.

Note: the collection lives in the following location:

^Greenstone version^Collection location^
|2|path-to-greenstone2/collect///collname//|
|3|path-to-greenstone3/web/sites/localsite/collect///collname//|

==== Passing import/buildcol options to rebuild scripts ====

Import or buildcol options can be passed to ''full-rebuild.pl'' and ''incremental-rebuild.pl''. If an option is shared between ''import.pl'' and ''buildcol.pl'', it can appear as is, such as ''-verbosity 5''. This value will be passed to both programs. If an option is specific to one of the programs, prefix it with ''import:'' or ''buildcol:'' respectively, as in ''-import:OIDtype hash_on_full_filename''.

===== Creating and Editing a Collection on the command line =====

==== Create a collection ====

To create the ''skeleton'' of a collection we use ''mkcol.pl''. This creates all the folders the collection needs and sets up a default configuration. Typing ''perl -S mkcol.pl'' will provide the full list of options, which you can also view [[script_options#mkcol.pl|here]].
To create a new collection:

^Greenstone version^mkcol command^
|2|perl -S mkcol.pl [options] //collname//|
|3|perl -S mkcol.pl -site localsite [options] //collname//|

For example, to create a collection named //dlpeople// in ''localsite'' with the creator's email address of //me@cs.waikato.ac.nz//, type:

^Greenstone version^mkcol command^
|2|perl -S mkcol.pl -creator me@cs.waikato.ac.nz dlpeople|
|3|perl -S mkcol.pl -site localsite -creator me@cs.waikato.ac.nz dlpeople|

//(Since Greenstone3 allows you to have multiple [[en:user:sites]], you must always specify which site the collection is in. The default site is called ''localsite''.)//

To view the newly created files, move to the newly created collection directory by typing:

^Greenstone version^Windows^Linux/Mac^
|2|cd %GSDLHOME%\collect\dlpeople|cd $GSDLHOME/collect/dlpeople|
|3|cd %GSDL3HOME%\sites\localsite\collect\dlpeople|cd $GSDL3HOME/sites/localsite/collect/dlpeople|

You can list the contents of this directory by typing ''dir'' (Windows) or ''ls'' (Linux/Mac). There should be several subdirectories, which differ slightly between Greenstone 2 and 3:

  * //etc//
  * //images//
  * //import//
  * //macros// (Greenstone2 only)
  * //script//
  * //style//

==== Add documents and metadata ====

To add documents to the collection, simply copy them into the ''import'' folder. You can manually add metadata by creating ''metadata.xml'' files or adding metadata databases. See [[en:user:metadata]] and [[en:user_advanced:metadata#specifying_filenames_manually_in_metadataxml]] for more details.

==== Edit the Config file ====

In the collection's ''etc'' directory there is a configuration file. This is ''collect.cfg'' for Greenstone 2 and ''collectionConfig.xml'' for Greenstone 3. Any modification that you can make in GLI can also be made by manually editing this file. Simply open it in your favorite text editor, e.g. Notepad or Wordpad, make changes and save it.
You can learn more about the collection configuration file [[en:user:configuration files#Collection configuration files|here]].

==== Build the Collection ====

Now you can build the collection using the rebuild commands, or using import/buildcol, as described in the earlier sections.

===== Additional information =====

==== Opening a terminal on Windows ====

On Windows, there are several different ways to open a DOS terminal (a black console screen known as the DOS Prompt). Do one of the following:

  * ''Start -> All Programs -> Accessories -> Command Prompt''
  * Under the Start menu, type ''cmd'' into the search box and press Enter.
  * Hold down your keyboard's Windows key and press the R key. (The Windows key is located between the Ctrl and Alt keys on your keyboard.) In the Run dialog that appears, type ''cmd'' in the text field and press the OK button.
  * In any Windows Explorer window, hold down Shift and right-click in an empty area of the window. Select ''Open command window here'' from the menu.

===== Additional Resources =====

While this page only goes through the basics of building collections, there are many other scripts that can be run from the command line (such as [[en:user_advanced:command_line_download|downloading]] documents). You can take a look at the [[script_options|scripts and their options]] to get an idea of what else is available.