List of Greenstone 2 collect.cfg file options

This page lists the options available in the Greenstone 2 collection configuration file, collect.cfg. The file is generated by GLI, but if you build collections from the command line you may want to edit it manually. A few of the options listed here are not available through GLI.

Options only available in the configuration file

Options marked M (multiple) can be specified more than once.

| Option | Value | Multiple | Description |
|---|---|---|---|
| creator | single email address | | Email address of the collection creator. |
| maintainer | multiple email addresses | | Email address(es) of the collection maintainer(s). |
| public | true/false | | If true, the collection is displayed on the home page; if false, it is not. Even when false, the collection is still available to anyone who knows its URL. |
| plugin | plugin-name [plugin-options] | M | A list of plugins (and their options) to use for the collection. This determines which kinds of documents can be included in the collection. |
| buildtype | mg/mgpp/lucene | | Determines which indexer is used for the collection. |
| indexes | list of indexes | | A list of indexes to build. For MG collections, the format of each index is level:fields, where level is one of 'document', 'section', 'paragraph', and fields is a comma-separated list of 'text' or metadata names. For example, document:text,Title. For MGPP/Lucene collections, each index is a comma-separated list of 'text' or metadata names. For example, text dc.Subject dc.Title,Title. The order given is the order in which the indexes appear in the drop-down box on the interface. |
| defaultindex | one of the indexes | | The index that is selected by default on the search page. |
| levels | one or more of 'document', 'section', 'paragraph' | | MGPP/Lucene collections only. Specifies which levels to index at. The order given is the order in which the levels appear in the drop-down box on the interface. |
| subcollection | id pattern | M | Subcollection definitions. The id is a name that can be used in the indexsubcollections option. The pattern has the form "[!]field/expression/[i]", where ! and i are optional. Field can be 'filename' or any metadata name; this is the metadata that is tested. The expression is a (Perl) regular expression that defines the matching pattern. Documents whose 'field' metadata matches 'expression' are included in the subcollection. A leading ! negates the test, so only documents that don't match are included. A trailing i makes the match case insensitive. |
| indexsubcollections | list of subcollection ids | | A list of subcollections to index, where each entry is a comma-separated list of subcollection identifiers. The order given is the order in which they appear in the drop-down box on the interface. |
| languages | list of language identifiers | | Like subcollections, but based on the language of the documents. For example, "languages en fr en,fr" provides subindexes for English documents, French documents, and both together. The order given is the order in which they appear in the drop-down box on the interface. |
| language_metadata | a single metadata name | | The metadata element used to determine the language of each document. Default is ex.Language. |
| classify | classifier-name classifier-options | M | Specifications for the classifiers used for browsing. |
| format | option-name option-value | M | Formatting options for the collection. |
| collectionmeta | key [l=xx] value | M | Specifies language-specific strings for some components of the interface. |
| supercollection | a space-separated list of collection names | | A list of collections that should be searched together (cross-collection searching). The user is given this list on the preferences page and can change which collections are included in a search. |
| supercollectionoptions | uniform_search_results_formatting | | By default, each search result in a cross-collection search is formatted according to the collection it came from. Setting this option makes all search results use the format statement of the collection the user is currently in. |
| maxnumeric | integer | | The maximum number of digits a 'word' can have in the index dictionary. Default is 4. Larger numbers are split into several words for indexing. For example, if maxnumeric is 4, "1342663" is split into "1342" and "663". |
| mirror | "interval N" | | Specifies that the collection is mirrored, and the interval (in days) at which the update should be done. Requires some wget/w3mirror config files to be in the etc directory of the collection. |
| acquire | "OAI [-getdoc] -src <url-to-oai-repository>" | M | Specifies the repository or repositories to download records from. Currently only the OAI protocol is supported. If -getdoc is specified, the document is downloaded too; otherwise only the metadata is downloaded. |
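As an illustration, several of the options above might be combined as in the following sketch of part of a collect.cfg for an MGPP collection. It is not a complete working configuration. The email address, the dc.Title metadata name, and the HTMLPlug and AZList entries are assumptions chosen for the example (plugin and classifier names vary between Greenstone releases); the languages line is the example given in the table.

```
creator            librarian@example.org
maintainer         librarian@example.org
public             true

buildtype          mgpp
indexes            text dc.Title
defaultindex       text
levels             document section

languages          en fr en,fr
language_metadata  ex.Language

maxnumeric         4

plugin             HTMLPlug
classify           AZList -metadata dc.Title
```

With these settings the search page would offer the indexes in the order text, dc.Title, and the language subindexes in the order en, fr, en,fr.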

Options also available as command-line options to the import and build scripts

Some options can also be specified on the command line of the import and/or build scripts. In general, the syntax is the same in both cases, except for on/off options: in the config file they must have a value (true), while on the command line they are just flags (-optionname), where setting the flag makes the option true and omitting it makes it false.

| Option | Value | Description |
|---|---|---|
| archivedir | full path to a directory | Produce the archives in this directory instead of the default gsdl/collect/<collection-name>/archives. |
| maxdocs | integer | Maximum number of documents to import/build. |
| verbosity | 0-5 | Indicates the level of output desired. The higher the number, the more verbose the output. |
| debug | true | Run import/build in debug mode. |
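To make the on/off rule concrete, here is a small sketch of how these options might look in collect.cfg. The values 3 and 100 are arbitrary illustrations. Note that debug, being an on/off option, carries the explicit value true.

```
verbosity  3
maxdocs    100
debug      true
```

Following the rule described above, the command-line form of the same settings would presumably be -verbosity 3 -maxdocs 100 -debug, with debug reduced to a bare flag.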

Options also available as command-line options to the import script

| Option | Value | Description |
|---|---|---|
| importdir | full path to directory | Use a different import directory instead of the default gsdl/collect/<collection-name>/import. |
| removeold | true | Remove the current contents of the archives directory. |
| keepold | false | Don't keep the current contents of the archives directory. |
| gzip | true | Use gzip to compress archive files. ZIPPlug must then be added to the plugin list to enable building from compressed documents. |
| OIDtype | hash/incremental/assigned/dirname | Use this type of identifier generation scheme (default hash). |
| groupsize | integer | Group this many documents into a single archive file. Useful for bibliographic collections where there are many very small documents. |
| sortmeta | metadata name | Sort documents by this metadata for building. Search results for boolean queries will be displayed in this order. |
| saveas | METS/GA | Generate the archives in this format (default GA). |
| separate_cjk | true | Insert spaces between Chinese/Japanese/Korean characters to make each character a word. (These languages don't have spaces, so entire sentences can end up as 'words' in the index.) |
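The following fragment sketches how some of these import-stage options might be set together in collect.cfg. The group size of 50 and the dc.Title metadata name are assumptions made for the example. The ZIPPlug line is included because gzip is turned on, as the gzip entry above requires.

```
removeold     true
gzip          true
plugin        ZIPPlug
OIDtype       incremental
groupsize     50
sortmeta      dc.Title
```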

Options also available as command-line options to the build script

| Option | Value | Description |
|---|---|---|
| builddir | full path to directory | Produce the indexes in this directory instead of the default gsdl/collect/<collection-name>/building. |
| cachedir | full path to directory | ?? |
| keepold | true | Keep the contents of the old building directory (useful when used with the mode option). |
| textcompress | comma-separated list of 'text' and/or metadata names | Use the specified fields in the compressed text (default text). MGPP collections only. |
| no_text | true | Don't store any compressed text. |
| no_strip_html | true | Don't strip HTML tags from indexed text (MGPP/Lucene collections only). |
| remove_empty_classifications | true | Remove empty classifiers and empty nodes from other classifiers. |
| mode | all/compress_text/build_index/infodb | Carry out only a certain part of the build process (default all). |
| create_images | true | Attempt to create collection images. Relies on Gimp and Perl Gimp support being available. |
| dontbuild | list of indexes | Don't build the specified indexes (instead of building all those specified in indexes). |
| index | one index name | Only build this one index (instead of building all those specified in indexes). |
| dontgdbm | list of metadata fields | Don't store the specified metadata fields in the GDBM database. |
| sections_index_document_metadata | never/always/unless_section_metadata_exists | Index document-level metadata in each section. |
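As a rough sketch, a few of these build-stage options might appear in the collect.cfg of an MGPP collection as follows. The dc.Title metadata name is an assumption made for the example, and the combination shown is only one possibility.

```
textcompress                      text,dc.Title
no_strip_html                     true
remove_empty_classifications      true
sections_index_document_metadata  unless_section_metadata_exists
```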

Collectionmeta options

There are some standard collectionmeta options:

| Key | Description |
|---|---|
| collectionname | The full name of the collection. |
| collectionextra | A short description of the collection. |
| iconcollection | The icon to be used on the collection home page. |
| iconcollectionsmall | The icon to be used on the library home page. iconcollection is used if this is not specified. |
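Using the collectionmeta syntax key [l=xx] value, the standard keys might be set as in the sketch below. The collection name, the description text and the icon path are hypothetical values; the exact form expected for icon locations is not specified on this page.

```
collectionmeta collectionname  [l=en] "Demo Collection"
collectionmeta collectionname  [l=fr] "Collection de démonstration"
collectionmeta collectionextra [l=en] "A small collection of sample documents."
collectionmeta iconcollection  [l=en] "images/demo.gif"
```

Repeating a key with a different [l=xx] value, as done for collectionname here, is how a string would be given in more than one interface language.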

Other collection metadata is based on indexes, levels, subcollections and languages. Each key must match the corresponding index, level, subcollection or language name, preceded by a dot ('.'). Here are some examples:

  • indexes document:text,Title section:text
    • collectionmeta .document:text,Title [l=en] "text and titles"
    • collectionmeta .section:text [l=en] "section text"
  • indexes text Title Subject (MGPP indexes)
    • collectionmeta .text [l=en] "full text"
    • collectionmeta .Title [l=en] "titles"
    • collectionmeta .Subject [l=en] "subjects"
  • levels document section
    • collectionmeta .document [l=en] "document"
    • collectionmeta .section [l=en] "chapter"
  • languages en fr es en,fr,es
    • collectionmeta .en [l=en] "english"
    • collectionmeta .fr [l=en] "french"
    • collectionmeta .es [l=en] "spanish"
    • collectionmeta .en,fr,es [l=en] "all"
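The examples above do not cover subcollections. By analogy with the languages case, display names for subcollection partitions might be given as in this sketch. The ids reports and other and their patterns are hypothetical, and the dotted keys are assumed to follow the same convention as for languages.

```
subcollection        reports "filename/report/i"
subcollection        other   "!filename/report/i"
indexsubcollections  reports other reports,other

collectionmeta .reports       [l=en] "reports"
collectionmeta .other         [l=en] "other documents"
collectionmeta .reports,other [l=en] "all"
```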
en/user/list_of_configuration_file_options.txt · Last modified: 2023/03/13 01:46 by