====== Indexing Using Lucene ======

Lucene can be used as the collection indexer instead of MG/MGPP. In GLI, you can select Lucene in the Search section of the Design pane.

See the [[en:user:searching|Searching]] page for more information about the differences between MG, MGPP and Lucene.

==== Editing the collection's configuration file ====

Many of Lucene's advanced search features are not yet available in GLI; to use them, you need to edit the collection's configuration file directly. Make sure that GLI does not have your collection open while you modify the configuration file, otherwise GLI will overwrite your changes when it saves the file.

<tabbox Greenstone3>
A collection's configuration file is called ''collectionConfig.xml'' and is located in the collection's ''etc'' folder: ''<greenstone home folder>/web/sites/localsite/collect/<collname>/etc/collectionConfig.xml''.
<tabbox Greenstone2>
A collection's configuration file is called ''collect.cfg'' and is located in the collection's ''etc'' folder: ''<greenstone home folder>/collect/<collname>/etc/collect.cfg''.
</tabbox>
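
Because a hand-edited file is easy to break, it can help to check it before rebuilding the collection. The following is a minimal Python sketch, not a Greenstone tool: ''config_path'' and ''is_well_formed_xml'' are hypothetical helper names, and the Greenstone 3 path assumes the default ''localsite'' site shown above.

<code python>
from pathlib import Path
import xml.etree.ElementTree as ET


def config_path(greenstone_home, collname, version=3):
    """Return the path of a collection's configuration file (hypothetical helper).

    For Greenstone 3 this assumes the default 'localsite' site.
    """
    home = Path(greenstone_home)
    if version == 3:
        return (home / "web" / "sites" / "localsite" / "collect"
                / collname / "etc" / "collectionConfig.xml")
    if version == 2:
        return home / "collect" / collname / "etc" / "collect.cfg"
    raise ValueError("version must be 2 or 3")


def is_well_formed_xml(path):
    """Return True if the file parses as XML, False on a syntax error.

    This checks XML syntax only, not whether Greenstone accepts the elements.
    A missing file raises FileNotFoundError.
    """
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True
</code>

For example, ''is_well_formed_xml(config_path("/opt/greenstone3", "demo"))'' (both arguments are placeholders) returns ''False'' if an edit left an unclosed element in ''collectionConfig.xml''. The check only makes sense for Greenstone 3, since ''collect.cfg'' is not XML.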

==== Sorting search results ====

Lucene indexes can sort search results on the indexed fields. In Greenstone 2.85 and 3.05, the sort options offered were derived from the collection's indexes: if a collection had text, Title and Subject indexes, search results could be sorted by Title or Subject. The text and allfields indexes were ignored for sorting purposes.

From Greenstone 2.86 and 3.06 onwards, the sort options are specified separately from the indexes. For example, a collection can offer searching on titles and subjects, with sorting by date and author. Users can also now choose ascending or descending sort order.

If you search at section level, you may want sections to inherit document-level metadata for sorting. For example, if each document has a Date and you want to sort search results by Date, then each section needs to be given that date in the index. The build option ''sections_sort_on_document_metadata'' controls this inheritance. It works just like the indexing option ''sections_index_document_metadata'', but applies to the sort fields.

The possible values for this option are:

^ Value ^ Effect ^
| ''never'' | Don't include document-level metadata in the section. |
| ''always'' | Include document-level metadata in the section. |
| ''unless_section_metadata_exists'' | Include document-level metadata only if the section does not already have a value. |
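
To make the three values concrete, here is a small Python model of the rule for a single sort field. It is a sketch of the behaviour described in the table, not Greenstone's own code: ''section_sort_values'' is a hypothetical name, and it assumes that ''always'' adds the document-level value alongside any value the section already has.

<code python>
def section_sort_values(section_values, document_values, mode):
    """Return the values a section gets for one sort field (hypothetical model).

    section_values:  values of the field set on the section itself (may be empty)
    document_values: values of the same field at document level
    mode:            a value of sections_sort_on_document_metadata
    """
    if mode == "never":
        # Only the section's own metadata is used.
        return list(section_values)
    if mode == "always":
        # Assumption: document-level values are added after any section values.
        return list(section_values) + list(document_values)
    if mode == "unless_section_metadata_exists":
        # Fall back to document-level metadata only when the section has none.
        return list(section_values) if section_values else list(document_values)
    raise ValueError(f"unknown mode: {mode!r}")
</code>

With a document Date of ''2001'' and no Date on the section, both ''always'' and ''unless_section_metadata_exists'' give the section ''2001'', while ''never'' leaves it with no sort value.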

GLI does not yet offer a graphical interface for search result sorting, so you need to edit the collection's configuration file by hand (see [[#editing_the_collection_s_configuration_file|above]]).

<tabbox Greenstone3>

Sort fields are specified using ''<sort>'' elements. These go inside the ''<search>'' element and use the same format as ''<index>'' elements. The ''name'' can be any metadata element, or one of two special values: ''rank'' and ''none'' (no sorting, which leaves results in natural build order).

The following is a sample search section in which the user can search all fields, text, titles, subjects or organisations. Search results can be sorted by rank, by date, or not at all. Sections inherit document-level metadata for sorting unless they have their own.

<code>
<!-- Build option: sections inherit document-level sort metadata unless they have their own -->
<buildOption name="sections_sort_on_document_metadata" value="unless_section_metadata_exists"/>
<search type="lucene">
   <!-- Search levels: sections are labelled "chapter", documents "book"; section is the default -->
   <level name="section">
      <displayItem lang="en" name="name">chapter</displayItem>
   </level>
   <level name="document">
      <displayItem lang="en" name="name">book</displayItem>
   </level>
   <defaultLevel name="section"/>
   <!-- Indexes the user can search in -->
   <index name="allfields">
      <displayItem lang="en" name="name">all fields</displayItem>
   </index>
   <index name="text">
      <displayItem lang="en" name="name">text</displayItem>
   </index>
   <index name="dc.Title,Title">
      <displayItem lang="en" name="name">titles</displayItem>
   </index>
   <index name="dc.Subject">
      <displayItem lang="en" name="name">subjects</displayItem>
   </index>
   <index name="dls.Organization">
      <displayItem lang="en" name="name">organisations</displayItem>
   </index>
   <!-- Sort options, listed separately from the indexes; "rank" and "none" are special values -->
   <sort name="rank">
      <displayItem lang="en" name="name">rank</displayItem>
   </sort>
   <sort name="dc.Date">
      <displayItem lang="en" name="name">date</displayItem>
   </sort>
   <sort name="none">
      <displayItem lang="en" name="name">natural (build) order</displayItem>
   </sort>
   <!-- Search types offered to the user -->
   <searchType name="plain"/>
   <searchType name="simpleform"/>
   <searchType name="advancedform"/>
</search>
</code>
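
To check which sort options a configuration will offer, the ''<sort>'' elements can be read with Python's standard ''xml.etree.ElementTree'' module. This is a minimal sketch rather than part of Greenstone: ''sort_options'' is a hypothetical helper, and it assumes the elements are not in a default XML namespace. The sample above is a fragment with two top-level elements, so pass either the ''<search>'' element on its own or a complete ''collectionConfig.xml''.

<code python>
import xml.etree.ElementTree as ET

# The two special sort values described above
SPECIAL_SORTS = {"rank", "none"}


def sort_options(config_xml, lang="en"):
    """Return (name, display name, is_special) for each <sort> in the <search> element.

    config_xml may be the <search> element itself or a document containing it.
    Assumes the elements are not in a default XML namespace.
    """
    root = ET.fromstring(config_xml)
    search = root if root.tag == "search" else root.find(".//search")
    if search is None:
        return []
    options = []
    for sort in search.findall("sort"):
        name = sort.get("name")
        label = None
        # Pick the displayItem used as the option's label in the requested language
        for item in sort.findall("displayItem"):
            if item.get("lang") == lang and item.get("name") == "name":
                label = (item.text or "").strip()
                break
        options.append((name, label, name in SPECIAL_SORTS))
    return options
</code>

Run on the ''<search>'' element above, it lists ''rank'' and ''none'' as special values and ''dc.Date'' (labelled "date") as a metadata sort field.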

<tabbox Greenstone2>

For Greenstone 2, add a ''sortfields'' line to ''collect.cfg'', similar to the ''indexes'' line.

<code>
levels document section
indexes dc.Title dc.Subject
sortfields dc.Date
# sections_sort_on_document_metadata is a buildcol option
sections_sort_on_document_metadata unless_section_metadata_exists
</code>
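
Assuming ''sortfields'' takes a whitespace-separated list of fields in the same way as ''indexes'', both lines can be read the same way. The Python sketch below is a simplified reader for lines like those above, not Greenstone's own ''collect.cfg'' parser; ''read_cfg'' is a hypothetical name, and it ignores quoting and repeated directives such as ''collectionmeta''.

<code python>
def read_cfg(text):
    """Parse simple collect.cfg lines into {keyword: [value, ...]} (hypothetical, simplified).

    Assumes one directive per line with whitespace-separated values and
    '#' marking a comment line; quoting and line continuations are not handled,
    and a repeated keyword overwrites the earlier line.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *values = line.split()
        config[keyword] = values
    return config
</code>

Applied to the sample above, it maps ''sortfields'' to ''["dc.Date"]'' and ''indexes'' to ''["dc.Title", "dc.Subject"]''.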
</tabbox>
  
en/user_advanced/lucene.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1