This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.

Configuring Your Collection

Once your files are marked up with metadata, you next decide how the documents should be accessed by the end users. What kind of information is searchable? What ways are provided to browse through the documents? These things can be customized; this section describes how to do it.

The Design View

This section introduces you to the design view and explains how to navigate between the various views within this pane.

With the Librarian Interface, you can configure how the documents are processed, and how the collection is accessed by the user. The configuration options are divided into different sections, each associated with a particular stage of collection customization.

On the left is a list of different views, and on the right are the controls associated with the current one. To change to a different view, click its name in the list.

To understand the stages and terms involved in designing a collection, first read Chapters 1 and 2 of the Greenstone Developer's Guide.

Document Plugins

This section describes how to configure the Document Plugins the collection uses. It explains how to specify which plugins to use, what parameters to pass to them, and in what order they run. Under the "Design" tab, click "Document Plugins".

To add a plugin, select it using the "Select plugin to add" pull-down list near the bottom and then click "Add Plugin". A window appears entitled "Configuring Arguments"; it is described later. Once you have configured the new plugin, it is added to the end of the "Assigned Plugins" list. Generally, you would only have one instance of each plugin. However, you can add the same plugin more than once; in that case, the multiple instances would normally be configured differently to provide a useful result (for example by setting the process_exp argument, see http://wiki.greenstone.org/gsdoc/tutorial/en/enhanced_pdf.htm ).

To see a short description of a plugin, select it in the "Select plugin to add" pull-down list, then hover the mouse over it. A tool-tip displaying the description will appear.

To remove a plugin, select it in the list and click "Remove Plugin".

Plugins are configured by providing arguments. To alter them, select the plugin from the list and click "Configure Plugin" (or double-click the plugin). A "Configuring Arguments" dialog appears with various controls for specifying arguments.

There are different kinds of controls. Some are checkboxes, and clicking one adds the appropriate option to the plugin. Others are text strings, with a checkbox and a text field: click the box to enable the argument, then type the appropriate text (regular expression, file path, etc.) in the field. Others are pull-down menus from which you select one of a given set of values. To learn what an argument does, let the mouse hover over its name for a moment and a description will appear.

When you have changed the configuration, click "OK" to commit the changes and close the dialog, or "Cancel" to close the dialog without changing any plugin arguments.

The Document Plugins in the list are executed in order, and the ordering is sometimes important. Select a plugin in the list and use the "Move Up" and "Move Down" buttons to change its place in the list.
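Why the order matters can be sketched in a few lines of Python. This is an illustration only, not Greenstone code: it assumes that each plugin claims the files whose names match its process_exp pattern and that the first matching plugin in the list handles the file. The list contents and the "CatchAllPlugin" name are hypothetical.

```python
import re

# Hypothetical assigned plugin list: (name, process_exp pattern), in order.
ASSIGNED_PLUGINS = [
    ("PDFPlugin", r"(?i)\.pdf$"),
    ("TextPlugin", r"(?i)\.txt$"),
    ("CatchAllPlugin", r"."),  # illustrative fallback that accepts any name
]

def choose_plugin(filename, plugins=ASSIGNED_PLUGINS):
    """Return the name of the first plugin whose pattern matches, or None."""
    for name, process_exp in plugins:
        if re.search(process_exp, filename):
            return name
    return None
```

Under these assumptions, moving the catch-all entry to the top of the list would make it claim every file before the more specific plugins are tried.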

Search Indexes

Indexes specify what parts of the collection are searchable. This section explains how to add and remove indexes, and set a default index. Under the "Design" tab, click "Search Indexes".

The top right of the "Search Indexes" panel shows which indexer the collection currently uses. To change it, click "Change". A popup window appears with the list of options: MG, MGPP, and Lucene. Changing the indexer affects how the indexes are built, and may affect search functionality.

The "Assigned Indexes" list shows what indexes are currently assigned to the collection.

To add an index, click "New Index…". A popup window appears with a list of sources, which includes text and metadata. Select which sources you want to index. The "Select All" and "Select None" buttons check or uncheck all of the items in the list, respectively. Once a new index has been defined, click "Add Index" to add it to the collection. "Add Index" only becomes active once the settings describe a new index that is not already assigned to the collection.

For MG indexes, you also need to choose the granularity of the index, using the glidict::CDM.IndexManager.Level menu.

For MGPP and Lucene indexes, index granularity is determined globally, not per index. The possible levels are displayed on the main "Search Indexes" pane, and can be added to the collection by ticking the checkboxes.

A special index is available for MGPP and Lucene: an "allfields" index, which provides combined searching over all specified indexes without the need to define a separate index that contains all sources. To add this index, check the glidict::CDM.IndexManager.Allfields_Index check box and click "Add Index".
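The difference between a per-source index and "allfields" searching can be sketched as follows. This is a toy Python model under stated assumptions, not the way MGPP or Lucene store their data: each source gets a simple term-to-document map, terms are split on whitespace and lowercased, and "allfields" is treated as a union over every assigned index. The function names and sample data are hypothetical.

```python
from collections import defaultdict

def build_indexes(documents, sources):
    """Build one inverted index (term -> set of document ids) per source."""
    indexes = {source: defaultdict(set) for source in sources}
    for doc_id, fields in documents.items():
        for source in sources:
            for term in fields.get(source, "").lower().split():
                indexes[source][term].add(doc_id)
    return indexes

def search(indexes, term, index_name):
    """Search one named index, or every index when index_name is 'allfields'."""
    term = term.lower()
    if index_name == "allfields":
        hits = set()
        for index in indexes.values():
            hits |= index.get(term, set())
        return hits
    return set(indexes[index_name].get(term, set()))

# Hypothetical two-document collection with a text and a title source.
DOCS = {
    "d1": {"text": "Greenstone digital library", "dc.Title": "Introduction"},
    "d2": {"text": "An introduction to indexing", "dc.Title": "Indexing"},
}
INDEXES = build_indexes(DOCS, ["text", "dc.Title"])
```

Here a search for "introduction" finds only d2 in the text index and only d1 in the title index, while the combined search finds both.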

For MGPP and Lucene, an "Add All" button is also provided, as a shortcut for adding all metadata and text sources as individual indexes.

To edit an index, select it and click "Edit Index". A dialog similar to the "New Index" one is shown.

To remove an index, select it from the "Assigned Indexes" list and click "Remove Index".

The order in which the indexes are specified in the "Assigned Indexes" list is the order in which they appear in the drop-down menu on the search page. Use the "Move Up" and "Move Down" buttons to change this ordering.

The one that is selected by default on the search page is called the "default index". This can be set by selecting an index from the list and clicking "Set Default". The default index is tagged with "[Default Index]" in the "Assigned Indexes" list. If no default index is set, the first one in the list will be used as the default.
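The fallback rule for the default index can be stated as a small Python function. It is a sketch of the rule described above, not Greenstone's implementation; the function name, argument names and index names are hypothetical.

```python
def default_index(assigned_indexes, explicit_default=None):
    """Return the explicitly set default, else the first assigned index."""
    if explicit_default is not None and explicit_default in assigned_indexes:
        return explicit_default
    # No default set: fall back to the first entry, if there is one.
    return assigned_indexes[0] if assigned_indexes else None
```

For example, with the assigned indexes "text" and "dc.Title" and no default set, the function returns "text".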

The names used for the drop-down list of indexes on the search page can be set in the "Search" panel of the "Format" view (see Search).

Search Index Options

There are some additional options controlling how the indexes are built. Some of them may not be available for a particular indexer, in which case they are greyed out.

Stemming and case-folding may be enabled or disabled for MG and MGPP indexes. If enabled, stemmed and case-folded indexes will be created, and the user will have the option of searching with case folding and stemming on or off. If disabled, searching will be case-sensitive and unstemmed, and the options will not be displayed on the Preferences page of the collection.

Accent-folding is available for MGPP indexes. This works in a similar way to case-folding, but instead of lower and upper case letters matching, letters with diacritics match those without. A Lucene index is always accent-folded; no option to switch this on and off will be displayed to the user on the collection's Preferences page.
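What case-folding and accent-folding do to a search term can be illustrated in Python. This is only a sketch of the general idea using the standard library; it is an assumption that it resembles what the indexers do, and the function names are hypothetical.

```python
import unicodedata

def case_fold(term):
    """Make upper and lower case letters compare equal."""
    return term.casefold()

def accent_fold(term):
    """Remove diacritics so that accented letters match unaccented ones."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With both folds applied, a query for "cafe" would match a document containing "Café".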

Chinese, Japanese and Korean text is often not segmented into individual words. As indexing relies on word breaks being present in the text, this results in an unsearchable index. Setting the glidict::CDM.IndexingManager.Separate_cjk option will add spaces between each Chinese/Japanese/Korean character in the text and in search terms, so that character level searching is carried out.
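The effect of separating CJK characters can be sketched in Python as follows. The character ranges used here (hiragana and katakana, the main CJK Unified Ideographs block, and Hangul syllables) are a simplifying assumption and do not cover every CJK code point; the function name is hypothetical and the code is not taken from Greenstone.

```python
import re

# Assumed ranges: kana, CJK Unified Ideographs, Hangul syllables.
_CJK_CHAR = re.compile(r"([\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af])")

def separate_cjk(text):
    """Put a space around every CJK character, then normalise whitespace."""
    spaced = _CJK_CHAR.sub(r" \1 ", text)
    # Collapse the doubled spaces introduced above (and any other runs).
    return " ".join(spaced.split())
```

Applying the same function to both the indexed text and the query terms means that each character is treated as a word, which is what makes character-level searching possible.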

Partition Indexes

Indexes are built on particular text or metadata sources. The search space can be further controlled by partitioning the indexes, either by language or by a predetermined filter. This section describes how to do this. Under the "Design" tab, click "Partition Indexes".

The "Partition Indexes" view has three tabs: "Define Filters", "Assign Partitions" and "Assign Languages". To learn more about partitions, read about sub-collections and sub-indexes in Chapter 2 of the Greenstone Developer's Guide.

Note that for MG collections, the total number of partitions generated is a combination of all indexes, sub-collection filters and languages chosen. Two indexes with two sub-collection filters in two languages would yield eight index partitions. For MGPP, all indexes are created in one physical index, so there would only be four index partitions. For Lucene, the number of physical indexes is determined by the number of levels assigned to the collection, one index per level. So for the above situation, one level would result in four physical indexes, while two levels would result in eight.
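The counting rules in the paragraph above can be written out as a short Python function. It only restates the arithmetic given in the text; the function name and arguments are hypothetical, and treating "no filters" or "no languages" as a single unpartitioned set is an assumption.

```python
def partition_count(indexer, indexes, filters, languages, levels=1):
    """Number of physical index partitions, following the rules above."""
    # Assumption: zero filters or languages means one unpartitioned set.
    subsets = max(filters, 1) * max(languages, 1)
    if indexer == "MG":
        return indexes * subsets   # one partition per index and subset
    if indexer == "MGPP":
        return subsets             # all indexes share one physical index
    if indexer == "Lucene":
        return levels * subsets    # one physical index per level
    raise ValueError(f"unknown indexer: {indexer}")
```

For two indexes, two filters and two languages this gives 8 for MG, 4 for MGPP, and 4 or 8 for Lucene with one or two levels, matching the figures in the text.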

Define Filters

Filters allow you to group together into a sub-collection all documents in an index for which a metadata value matches a given pattern.

To create a filter, click the "Define Filters" tab and enter a name for the new filter into the "Subcollection filter name:" field. Next, choose a document attribute to match against, either a metadata element or the name of the file in question. Enter a regular expression to use during the matching. You can toggle between "Including" documents that match the filter and "Excluding" them. You can also specify any of the standard Perl regular expression flags to use when matching (e.g. "i" for case-insensitive matching). Finally, click "Add Filter" to add the filter to the "Defined Subcollection Filters" list.
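How a filter decides whether a document belongs can be sketched in Python. Note the assumptions: Greenstone uses Perl regular expressions, whereas this sketch uses Python's re module, which behaves the same for simple patterns but is not identical; the function name and the flag mapping are hypothetical.

```python
import re

# Map a few Perl-style flag letters to their Python equivalents.
_FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}

def passes_filter(value, pattern, include=True, flags=""):
    """Return True if a document attribute value passes the filter."""
    re_flags = 0
    for letter in flags:
        re_flags |= _FLAG_MAP[letter]
    matched = re.search(pattern, value, re_flags) is not None
    # "Including" keeps matching documents; "Excluding" keeps the rest.
    return matched if include else not matched
```

For example, the pattern "report" with the "i" flag accepts a title of "Annual Report 2020", and the same filter set to exclude would reject it.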

To remove a filter, select it from the list and click "Remove Filter".

To alter a filter, select it from the list, change any of the values that appear in the editing controls and click "Replace Filter" to commit the changes.

Defining filters does not create sub-collections. Sub-collections are specified in the "Assign Partitions" tab, based on the filters you have just defined.

Assign Partitions

Having defined one or more sub-collection filters, use the "Assign Partitions" tab to build indexes for a single filter or for a group of filters. Select the desired filter or filters from the "Defined Subcollection Filters" list and click "Add Partition". Each partition results in a sub-collection containing the documents that match any of the filters associated with that partition.
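The "match any of the filters" rule for a partition can be sketched in Python. This is an illustration only: documents are modelled as dictionaries of attribute values, each filter as a hypothetical (attribute, pattern) pair, and Python's re module stands in for Perl regular expressions.

```python
import re

def in_partition(document, filters):
    """True if the document matches at least one filter of the partition."""
    return any(
        re.search(pattern, document.get(attribute, "")) is not None
        for attribute, pattern in filters
    )

# Hypothetical partition built from two filters.
REPORTS_OR_PDFS = [("dc.Title", r"Report"), ("filename", r"\.pdf$")]
```

A document needs to satisfy only one of the two filters to be included in this partition.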

To alter a partition, select it from the list, modify the filters, and click "Replace Partition".

To remove a partition, select it from the list and click "Remove Partition".

The order in which the partitions are specified in the "Assigned Partitions" list is the order in which they appear in the drop-down menu on the search page. Use the "Move Up" and "Move Down" buttons to change this ordering.

To make a partition the default one, select it from the list and click "Set Default".

The names used for the drop-down list of partitions on the search page can be set in the "Search" part of the "Format" panel (see Search).

Assign Languages

This section details how to restrict search indexes to particular languages. You do this by generating a partition using the "Assign Languages" tab of the "Partition Indexes" panel.

Language partitions use metadata to determine which documents are in the specified languages and therefore should be included in the partition. Greenstone generates "ex.Language" metadata for most documents, and this is the default metadata to use. However, this can be changed by setting glidict::CDM.LanguageManager.LanguageMetadata to the correct metadata element.

To add a new language partition, select one or more languages from the "Languages to add" list, and click "Add Partition".

To change an existing partition, select it from the "Assigned Language Partitions" list, modify the selected languages in the "Languages to add" list below, and click "Replace Partition".

To remove a language partition, select it from the "Assigned Language Partitions" list and click "Remove Partition".

The order in which the language partitions are specified in the "Assigned Language Partitions" list is the order in which they appear in the drop-down menu on the search page. Use the "Move Up" and "Move Down" buttons to change this ordering.

To set the default language partition, select it from the list and click "Set Default".

The names used for the drop-down list of language partitions on the search page can be set in the "Search" part of the "Format" panel (see Search).

Browsing Classifiers

This section explains how to assign "Browsing Classifiers", which are used for browsing, to the collection. Under the "Design" tab, click "Browsing Classifiers".

To add a classifier, select it using the "Select classifier to add" pull-down list near the bottom and then click "Add Classifier…". A window appears entitled "Configuring Arguments"; instructions for this dialog are just the same as for Document Plugins (see Document Plugins). Once you have configured the new classifier, it is added to the end of the "Assigned Classifiers" list.

To see a short description of a classifier, select it in the "Select classifier to add" pull-down list, then hover the mouse over it. A tool-tip displaying the description will appear.

Each classifier has several arguments that can be configured. Important arguments include "metadata", which specifies the metadata that documents will be classified on, and "buttonname", which is the name that will appear in the navigation bar.
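The role of the "metadata" argument can be illustrated with a toy Python classifier that groups documents by the value of one metadata element. This is a sketch of the general idea only; real Greenstone classifiers build richer structures, and the function name and sample data here are hypothetical.

```python
from collections import defaultdict

def classify(documents, metadata):
    """Group document ids by their value for the given metadata element."""
    buckets = defaultdict(list)
    for doc_id, fields in documents.items():
        if metadata in fields:  # documents without the element are skipped
            buckets[fields[metadata]].append(doc_id)
    # Sort both the bucket names and their contents for stable browsing order.
    return {value: sorted(ids) for value, ids in sorted(buckets.items())}

# Hypothetical documents with Dublin Core creator metadata.
SAMPLE = {
    "d1": {"dc.Creator": "Witten"},
    "d2": {"dc.Creator": "Bainbridge"},
    "d3": {"dc.Creator": "Witten"},
    "d4": {},
}
```

Classifying the sample on "dc.Creator" yields one browsing bucket per creator, with d4 left out because it has no value for that element.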

To remove a classifier, select it from the list and click "Remove Classifier".

To change the arguments for a classifier, select it from the list and click "Configure Classifier" (or double-click on the classifier in the list).

The order of the Browsing Classifiers here determines their order in the collection's navigation bar. To change it, select the classifier you want to move and click "Move Up" or "Move Down".

For further information on Browsing Classifiers read Chapter 2, Greenstone Developer's Guide – Getting the most out of your documents.

old/configuring_your_collection.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1