en:user:searching

en:user:searching [2018/07/31 01:36] – [Cross-collection searching] kjdon
en:user:searching [2023/03/13 01:46] (current) – external edit 127.0.0.1

====== Searching ======

In Greenstone, you can dictate how users will be able to search each
collection. You can select from several indexers to determine how documents are indexed, and
you can create indexes based on any number of metadata fields and the text of the documents.
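
As a rough illustration of indexes built on metadata fields and on document text, the following Python sketch builds one inverted index per source (a hypothetical ''Title'' field and the body ''text'') and searches only the chosen one. The documents, field names and functions are assumptions for illustration; this is not how Greenstone actually stores its indexes.

<code python>
from collections import defaultdict

# Hypothetical documents: each has "Title" metadata and body "text".
DOCS = {
    "D1": {"Title": "Managing Gigabytes", "text": "compressing and indexing documents"},
    "D2": {"Title": "Digital libraries", "text": "building collections of documents"},
}


def build_indexes(docs, sources):
    """Build one inverted index (term -> set of document ids) per source."""
    indexes = {source: defaultdict(set) for source in sources}
    for doc_id, fields in docs.items():
        for source in sources:
            for term in fields.get(source, "").lower().split():
                indexes[source][term].add(doc_id)
    return indexes


def search(indexes, source, term):
    """Return the sorted ids of documents whose chosen source contains the term."""
    return sorted(indexes[source].get(term.lower(), set()))


indexes = build_indexes(DOCS, ["Title", "text"])
print(search(indexes, "text", "documents"))   # ['D1', 'D2']
print(search(indexes, "Title", "documents"))  # []
</code>

The same query term gives different answers depending on which index is searched, which is why the choice of indexes shapes what users can find.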
  
These indexers are available:
  * **MG**: MG is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http://www.cs.mu.oz.au/mg/|Managing Gigabytes]]. It does section-level indexing, and searches can be either boolean or ranked (not both at once). For each index specified in the collection, a separate physical index is created. For phrase searching, Greenstone does an "AND" search on all the terms, then scans the resulting hits to see if the phrase is present. It has been extensively tested on very large collections (many GB of text).
    * [[http://www.nzdl.org/html/mg.html|MG website]]
    * [[nzdl:mg|More info about MG]]
  * **MGPP**: MGPP (MG plus plus), the new version of MG, was developed by the New Zealand Digital Library Project. It does word-level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/section levels and text/metadata fields are all handled by the one index. For collections with many indexes, this results in a smaller collection size than using MG. For large collections, searching may be a bit slower because the index is word-level rather than section-level.
    * [[http://files.greenstone.org/technical/mgpp_user.pdf|MGPP user guide]]
  * **Lucene**: Lucene was developed by the Apache Software Foundation. It handles fielded and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore, document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP, with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate [[incremental building|incremental collection building]], which MG and MGPP can't provide.
    * [[http://lucene.apache.org/core/|Lucene website]]
    * [[en:user_advanced:lucene|More info about Lucene]]
  * **SOLR**: Available in Greenstone3. SOLR is a search server built on Lucene; in Greenstone3 it also supports faceted searching (see below).
    * [[en:user_advanced:solr|More info about SOLR]]

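
The MG and MGPP descriptions differ in how phrase searching is done. The Python sketch below contrasts the two strategies on a few hypothetical sections: an MG-style section-level index, where an "AND" search is followed by a scan of each hit's text, and an MGPP-style word-level index, where stored word positions let the index answer the phrase query directly. It is a simplified illustration of the two techniques, not Greenstone's code; the section texts and function names are made up.

<code python>
from collections import defaultdict

# Hypothetical sections (id -> text). Real indexers also handle case folding,
# stemming and compression, which this sketch omits.
SECTIONS = {
    "S1": "new zealand digital library",
    "S2": "digital library software from new zealand",
    "S3": "a library of digital music",
}


def tokens(text):
    return text.lower().split()


# MG-like section-level index: term -> set of section ids.
section_index = defaultdict(set)
# MGPP-like word-level index: term -> {section id: [word positions]}.
word_index = defaultdict(lambda: defaultdict(list))
for sid, text in SECTIONS.items():
    for pos, term in enumerate(tokens(text)):
        section_index[term].add(sid)
        word_index[term][sid].append(pos)


def contains_sequence(words, seq):
    """True if seq occurs as consecutive items in words."""
    n = len(seq)
    return any(words[i:i + n] == seq for i in range(len(words) - n + 1))


def phrase_and_then_scan(phrase):
    """AND all terms at section level, then rescan each hit's text for the phrase."""
    terms = tokens(phrase)
    hits = set.intersection(*(section_index.get(t, set()) for t in terms))
    return sorted(sid for sid in hits if contains_sequence(tokens(SECTIONS[sid]), terms))


def phrase_with_positions(phrase):
    """Answer the phrase from the index alone: term k must sit at position start + k."""
    terms = tokens(phrase)
    postings = [word_index.get(t, {}) for t in terms]
    candidates = set.intersection(*(set(p) for p in postings))
    results = []
    for sid in candidates:
        starts = set(postings[0][sid])
        for k, p in enumerate(postings[1:], start=1):
            starts &= {pos - k for pos in p[sid]}
        if starts:
            results.append(sid)
    return sorted(results)


print(phrase_and_then_scan("digital library"))   # ['S1', 'S2']
print(phrase_with_positions("digital library"))  # ['S1', 'S2']
</code>

Both functions return the same answer, but S3 (which contains both words, not as a phrase) is handled differently: the "AND" step keeps it and the text scan discards it, while the word-level index rejects it from the stored positions alone.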
Changing the indexer affects how the indexes are built, and may affect search functionality.
The following table compares the indexers' features, which are explained in the sections below.

===== Partition Indexes =====
Indexes are built on particular text or metadata sources. The search space can be further controlled by partitioning the indexes, either by language or by a predetermined filter. Partition indexes can be set in the "Partition Indexes" section under the Design panel. Partition indexes are a way to create a "sub-collection" within the collection for search purposes.
  
The "Partition Indexes" view has three tabs: "Define Filters", "Assign Partitions" and "Assign Languages".
For more on how to create partitions, visit the [[en:gli:design_panel#partition_indexes|partition indexes]] page.
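
Partitioning can be pictured as splitting the collection's documents into named groups and searching only one group. The Python sketch below does this with hypothetical documents, once by ''Language'' metadata and once with a filter that, as an assumption for illustration, matches ''Title'' metadata against a regular expression. It is not how GLI or Greenstone implements partitions; in particular, a real partition gets its own index rather than the linear scan used here.

<code python>
import re
from collections import defaultdict

# Hypothetical documents with Language and Title metadata plus body text.
DOCS = [
    {"id": "D1", "Language": "en", "Title": "Annual report 2001", "text": "water supply report"},
    {"id": "D2", "Language": "fr", "Title": "Rapport annuel 2001", "text": "rapport sur l'eau"},
    {"id": "D3", "Language": "en", "Title": "Field notes", "text": "water quality notes"},
]


def build_partitions(docs, key):
    """Group documents into named partitions; key(doc) returns a name, or None to skip."""
    partitions = defaultdict(list)
    for doc in docs:
        name = key(doc)
        if name is not None:
            partitions[name].append(doc)
    return partitions


def search(partition, term):
    """Search only the documents in one partition (a linear scan, for brevity)."""
    return [d["id"] for d in partition if term in d["text"].split()]


# Partition by language metadata.
by_language = build_partitions(DOCS, lambda d: d["Language"])
# Filter-style partition: documents whose Title matches a regular expression.
reports = build_partitions(
    DOCS, lambda d: "reports" if re.search(r"report", d["Title"], re.IGNORECASE) else None
)

print(search(by_language["en"], "water"))   # ['D1', 'D3']
print(search(reports["reports"], "water"))  # ['D1']
</code>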

===== Searching a collection =====
<tabbox Greenstone3>
Greenstone3 offers three search page options, with increasing
levels of granularity:
  * **Form Search** presents all search options available for the collection, and multiple text boxes for query words/phrases, so you are able to look for different words/phrases in different metadata fields.
  * **Advanced Search** provides multiple text boxes for query words/phrases, and all available search options can be set individually for each box.
<tabbox Greenstone2>
{{  :en:search-prefs-gs2.png?direct&500|}}
Searching can be performed from the about page (depending on the search preference settings) and the search page of a collection.

</tabbox>
  
  
In Greenstone 3, if you use SOLR as your search indexer, you can have faceted searching. This means you can filter search results based on other metadata. The facet options need to be set up manually in the collectionConfig.xml file, as GLI does not yet allow you to enter them.

Add <facet> elements into the <search> element, in a similar fashion to the index elements:

<code>
</code>
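
Conceptually, faceted searching counts the values of chosen metadata fields across the current result set and lets the user narrow the results to one value. The Python sketch below shows that idea on hypothetical results with ''Subject'' and ''Date'' metadata. In Greenstone3 the counting is done by SOLR; this code is only an illustration of the technique, not the SOLR or Greenstone API, and the function names are made up.

<code python>
from collections import Counter

# Hypothetical search results carrying the metadata used as facets.
RESULTS = [
    {"id": "D1", "Subject": ["water", "health"], "Date": "2001"},
    {"id": "D2", "Subject": ["water"], "Date": "2003"},
    {"id": "D3", "Subject": ["education"], "Date": "2001"},
]


def field_values(record, field):
    """Return a field's values as a list, whether it is single- or multi-valued."""
    value = record.get(field, [])
    return [value] if isinstance(value, str) else list(value)


def facet_counts(results, field):
    """Count how many results carry each value of the field."""
    counts = Counter()
    for record in results:
        counts.update(set(field_values(record, field)))
    return counts


def apply_facet(results, field, value):
    """Keep only the results whose field contains the chosen facet value."""
    return [r for r in results if value in field_values(r, field)]


print(dict(facet_counts(RESULTS, "Date")))  # {'2001': 2, '2003': 1}
water = apply_facet(RESULTS, "Subject", "water")
print([r["id"] for r in water])  # ['D1', 'D2']
</code>

Recomputing the counts on the filtered results (here, only D1 and D2) is what lets a facet list update after each selection.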
  
===== Cross-collection searching =====
  
Greenstone has a facility for “cross-collection searching,” which allows several collections to be searched at once, with the results combined behind the scenes as though you were searching a single unified collection. Any subset of the collections can be searched.
  
<tabbox Greenstone3>
Cross-collection searching is enabled by default in Greenstone3. The search box on the home page
of your library allows you to search all collections at once. Collections do not have to be built with
the same indexer; however, if collections //are// built with different indexers, results may not be ranked correctly,
<code>
    <serviceRack name="CrossCollectionSearch"/>
</code>
<tabbox Greenstone2>
Cross-collection searching is done by specifying a list of other collections
to be searched along with the current one.

The Preferences page allows you to choose which collections are included in the searches.

Cross-collection searching is enabled by a line in the collection configuration file:

<code>
</code>

where the collections involved are called //col_1//, //col_2//, … The same line should appear in the configuration file of every collection that is involved.

</tabbox>
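
The caveat that results may not be ranked correctly when collections use different indexers comes down to score scales: each indexer computes relevance scores in its own way, so scores from different collections are not directly comparable. The Python sketch below merges two hypothetical ranked result lists with ''heapq.merge''. The ''normalise'' helper is a hypothetical, crude rescaling added for illustration; nothing here describes what Greenstone itself does.

<code python>
import heapq

# Hypothetical ranked results from two collections, each sorted by descending
# score. The two indexers score on very different scales.
COLLECTION_A = [("A:doc3", 12.5), ("A:doc1", 7.0)]
COLLECTION_B = [("B:doc9", 0.91), ("B:doc2", 0.40)]


def merge_ranked(*result_lists):
    """Merge per-collection lists (each sorted by descending score) into one ranking."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[1], reverse=True)
    return [doc_id for doc_id, _score in merged]


def normalise(results):
    """Crude, hypothetical fix: rescale scores by the collection's top score."""
    if not results:
        return []
    top = results[0][1]
    return [(doc_id, score / top) for doc_id, score in results]


# Raw scores: every hit from collection A outranks every hit from B.
print(merge_ranked(COLLECTION_A, COLLECTION_B))  # ['A:doc3', 'A:doc1', 'B:doc9', 'B:doc2']
# Rescaled scores interleave the two collections.
print(merge_ranked(normalise(COLLECTION_A), normalise(COLLECTION_B)))
</code>

With the rescaled scores the two top hits tie at 1.0, and A:doc1 (0.56) ranks above B:doc2 (about 0.44). Rescaling by the top score is only one simple option and does not make scores from different ranking models truly comparable.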
  
===== Formatting cross-collection search results =====

<tabbox Greenstone3>
In Greenstone3, the Cross Collection Search service has its own format statement. It cannot currently be edited via GLI, so you will need to edit it by hand.

</code>

You can modify this format statement in much the same way as any other format statement, and you can use //<gsf:metadata>// as normal. The main difference is in how links (to documents and source files) are constructed: currently, //<gsf:link>// does not give the right result for a cross-collection search result, so you need to construct the link by hand.

Links to the Greenstone version of the document look like:
</code>
This outputs thumbicon metadata if it exists; otherwise, it outputs srcicon metadata.
<tabbox Greenstone2>
In Greenstone2, search results are formatted according to the format statement of the collection the result comes from. (This has not been confirmed.)
</tabbox>
===== SQL Search forms =====
There are two SQL search forms: simple and advanced.

<tabbox Greenstone3>
The [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs3-current/en/indexers.htm|tutorial on indexers]] demonstrates some of the differences between the MGPP and Lucene indexers.
<tabbox Greenstone2>
The [[http://wiki.greenstone.org/wiki/gsdoc/tutorial/gs2-current/en/indexers.htm|tutorial on indexers]] demonstrates some of the differences between the MGPP and Lucene indexers.
</tabbox>
en/user/searching.1533001006.txt.gz · Last modified: 2018/07/31 01:36 by kjdon