Indexing Using Lucene

Lucene can be used as the collection indexer instead of MG/MGPP. You can select Lucene in GLI from the Search part of the Design Pane.

See the Searching page for more information about the differences between MG, MGPP and Lucene.

Editing the collection's configuration file

Many of the advanced features for Lucene searching are not yet available in GLI, so you must edit the collection's configuration file directly. Make sure that GLI does not have your collection open before modifying the configuration file; otherwise GLI will overwrite your changes when it saves the file.

Greenstone3

A collection's configuration file is called 'collectionConfig.xml' and lives in the collection's etc folder: <greenstone home folder>/web/sites/localsite/collect/<collname>/etc/collectionConfig.xml.

Greenstone2

A collection's configuration file is called 'collect.cfg' and lives in the collection's etc folder: <greenstone home folder>/collect/<collname>/etc/collect.cfg.

Sorting search results

Lucene can sort search results by indexed fields. In versions 2.85 and 3.05, the search result sort options were derived from the indexes specified. If a collection had text, Title and Subject indexes, then the search results could be sorted by Title and Subject. The text and allfields indexes were ignored for sorting purposes.

From 3.06 and 2.86 onwards, the list of search sort options is specified separately from the list of indexes. For example, searching can be offered on Titles and Subjects, with sorting by Date and Author. The user is also offered a choice of ascending or descending sort order.

If you are searching at section level, then you may want the sections to inherit document level metadata for sorting purposes. For example, if each document has a Date, and you want to sort search results by Date, then each section needs to be given that date in the index. The build option sections_sort_on_document_metadata controls this inheritance. It is just like the indexing option sections_index_document_metadata, but is used with the sort fields.

The possible values for this option are:

never: do not include document level metadata for the section
always: always include document level metadata in the section
unless_section_metadata_exists: include document level metadata only if the section does not already have a value
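
As a small illustration, here is how the always value might be set in a Greenstone3 collectionConfig.xml. This is only a sketch: it assumes the same buildOption syntax as the sample search section later on this page, with just the value changed.

```xml
<!-- Sketch: every section inherits document level metadata for the sort fields -->
<buildOption name="sections_sort_on_document_metadata" value="always"/>
```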

GLI has not been updated yet to offer a graphical interface for search result sorting, so you will need to edit the collection configuration file by hand (see above).

Greenstone3

Fields to sort search results by are specified using <sort> elements. They go into the <search> element, and their format is just like <index> elements. Values can be any metadata element, or two special values: rank and none.

The following is a sample search section, where the user can search in all fields, text, titles, subjects and organisations. Search results can be sorted by rank or by date, or left unsorted in natural (build) order. Sections inherit document level metadata for sorting, unless they already have their own value.

<buildOption name="sections_sort_on_document_metadata" value="unless_section_metadata_exists"/>
<search type="lucene">
   <level name="section">
      <displayItem lang="en" name="name">chapter</displayItem>
   </level>
   <level name="document">
      <displayItem lang="en" name="name">book</displayItem>
   </level>
   <defaultLevel name="section"/>
   <index name="allfields">
      <displayItem lang="en" name="name">all fields</displayItem>
   </index>
   <index name="text">
      <displayItem lang="en" name="name">text</displayItem>
   </index>
   <index name="dc.Title,Title">
      <displayItem lang="en" name="name">titles</displayItem>
   </index>
   <index name="dc.Subject">
      <displayItem lang="en" name="name">subjects</displayItem>
   </index>
   <index name="dls.Organization">
      <displayItem lang="en" name="name">organisations</displayItem>
   </index>
   <sort name="rank">
      <displayItem lang="en" name="name">rank</displayItem>
   </sort>
   <sort name="dc.Date">
      <displayItem lang="en" name="name">date</displayItem>
   </sort>
   <sort name="none">
      <displayItem lang="en" name="name">natural (build) order</displayItem>
   </sort>
   <searchType name="plain"/>
   <searchType name="simpleform"/>
   <searchType name="advancedform"/>
</search>
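
The case mentioned earlier, searching on Titles and Subjects while sorting by Date and Author, could be sketched along the same lines. This is a minimal illustration rather than a tested configuration: dc.Creator is an assumed metadata name standing in for "Author", and your collection may use a different element.

```xml
<!-- Sketch: indexes and sort fields are listed independently of each other -->
<search type="lucene">
   <level name="document">
      <displayItem lang="en" name="name">book</displayItem>
   </level>
   <defaultLevel name="document"/>
   <!-- Fields the user can search in -->
   <index name="dc.Title">
      <displayItem lang="en" name="name">titles</displayItem>
   </index>
   <index name="dc.Subject">
      <displayItem lang="en" name="name">subjects</displayItem>
   </index>
   <!-- Fields the user can sort results by; neither is a search index -->
   <sort name="dc.Date">
      <displayItem lang="en" name="name">date</displayItem>
   </sort>
   <!-- dc.Creator is an assumption; substitute your collection's author metadata -->
   <sort name="dc.Creator">
      <displayItem lang="en" name="name">author</displayItem>
   </sort>
   <searchType name="plain"/>
</search>
```

Because only the document level is defined here, the sections_sort_on_document_metadata build option is not needed.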

Greenstone2

For Greenstone2, add a sortfields line to collect.cfg, similar to the indexes line.

levels document section
indexes dc.Title dc.Subject 
sortfields dc.Date 
sections_sort_on_document_metadata unless_section_metadata_exists # (a buildcol option) 
en/user_advanced/lucene.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1