en:user:searching

Differences

This shows you the differences between two versions of the page.

en:user:searching [2017/04/09 21:55] – [Formatting cross-collection search results] kjdon
en:user:searching [2020/06/15 01:49] – [Formatting cross-collection search results] kjdon
Line 2: Line 2:
  
 In Greenstone, you can dictate how users will be able to search each  In Greenstone, you can dictate how users will be able to search each 
-collection. You can select one of three indexers to determine how documents are indexed, and +collection. You can select different indexers to determine how documents are indexed, and 
 you can create indexes based on any number of metadata fields and the text of the documents. you can create indexes based on any number of metadata fields and the text of the documents.
  
Line 17: Line 17:
  
 These indexers are available: These indexers are available:
-  * **[[http://www.nzdl.org/html/mg.html|MG]]**: MG is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http://www.cs.mu.oz.au/mg/|Managing Gigabytes]]. It does section-level indexing, and searches can be boolean or ranked (not both at once). For each index specified in the collection, a separate physical index is created. For phrase searching, Greenstone does an "AND" search on all the terms, then scans the resulting hits to see if the phrase is present. It has been extensively tested on very large collections (many GB of text).  +  * **MG**: MG is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http://www.cs.mu.oz.au/mg/|Managing Gigabytes]]. It does section-level indexing, and searches can be boolean or ranked (not both at once). For each index specified in the collection, a separate physical index is created. For phrase searching, Greenstone does an "AND" search on all the terms, then scans the resulting hits to see if the phrase is present. It has been extensively tested on very large collections (many GB of text).  
-  * **[[http://www.greenstone.org/docs/mgpp_user.pdf|MGPP]]**: MGPP (MG plus plus), the new version of MG, was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/section levels and text/metadata fields are all handled by the one index. For collections with many indexes, this results in a smaller collection size than using MG. For large collections, searching may be a bit slower due to the index being word level rather than section level.  +     * [[http://www.nzdl.org/html/mg.html|MG website]] 
-  * **[[http://lucene.apache.org/core/|Lucene]]**: Lucene was developed by the Apache Software Foundation. It handles field and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate [[incremental building | incremental collection building]], which MG and MGPP can't provide. +     * [[nzdl:mg | More info about MG ]] 
 +  * **MGPP**: MGPP (MG plus plus), the new version of MG, was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/section levels and text/metadata fields are all handled by the one index. For collections with many indexes, this results in a smaller collection size than using MG. For large collections, searching may be a bit slower due to the index being word level rather than section level.  
 +      * [[http://files.greenstone.org/technical/mgpp_user.pdf|MGPP user guide]] 
 +  * **Lucene**: Lucene was developed by the Apache Software Foundation. It handles field and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate [[incremental building | incremental collection building]], which MG and MGPP can't provide.  
 +    * [[  |lucene web site]] 
 +    * [[en:user_advanced:lucene| More info about Lucene]]
  
 +  * **SOLR**: Available in Greenstone3.
 +      * [[en:user_advanced:solr| More info about SOLR ]]
 +    * 
 Changing the indexer affects how the indexes are built, and may affect search functionality.  Changing the indexer affects how the indexes are built, and may affect search functionality. 
 The following table compares the indexers' features, which are explained in the sections below. The following table compares the indexers' features, which are explained in the sections below.
Line 206: Line 214:
 In Greenstone 3, if you use SOLR as your search indexer, you can have faceted searching. This means you can filter search results based on other metadata. The facet options need to be set up manually in the collectionConfig.xml file, as GLI does not yet let you enter them. In Greenstone 3, if you use SOLR as your search indexer, you can have faceted searching. This means you can filter search results based on other metadata. The facet options need to be set up manually in the collectionConfig.xml file, as GLI does not yet let you enter them.
  
-Add <facet> elements into the <search> element, in a simlar fashion to the index elements:+Add <facet> elements into the <search> element, in a similar fashion to the index elements:
  
 <code> <code>
Line 230: Line 238:
 as each indexer ranks results differently.  as each indexer ranks results differently. 
  
-If you create a new [[en:sites|site]] for your installation, enable cross-collection search by adding+If you create a new [[en:user:sites|site]] for your installation, enable cross-collection search by adding
 the CrossCollectionSearch serviceRack to the site's ''siteConfig.xml'' file: the CrossCollectionSearch serviceRack to the site's ''siteConfig.xml'' file:
 <code> <code>
Line 283: Line 291:
 </code> </code>
  
-Modify the format statement in the same way you do any format statement. Just keep the collname variable there, and take note of how the links are constructed.+You can modify the format statement in almost the same way as any other format statement, and you can use //<gsf:metadata>// as usual. The main difference is in how links (to documents and source files) are constructed. Currently //<gsf:link>// won't give the right result for a cross-collection search result, so you need to construct the link by hand. 
 + 
 +Links to the Greenstone version of the document look like: 
 +  * library/collection/collname/document/docID 
 + 
 +You can construct this using, e.g.: 
 +<code> 
 +<a><xsl:attribute name='href'><xsl:value-of select="$library_name"/>/collection/ 
 +    <xsl:value-of select='@collection'/>/document/<xsl:value-of select='@nodeID'/> 
 +    </xsl:attribute>...</a> 
 +</code> 
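 + 
 +One caveat with the snippet above: XSLT keeps the line breaks and indentation that sit next to literal text inside //<xsl:attribute>//, so they may end up in the generated href. A minimal whitespace-safe sketch, assuming the same //$library_name// variable and the //@collection// and //@nodeID// attributes are in scope, builds the whole path with the standard XPath //concat()// function: 
 +<code> 
 +<!-- Sketch only: concat() joins the pieces into one string, so no stray whitespace is emitted --> 
 +<a> 
 +  <xsl:attribute name='href'> 
 +    <xsl:value-of select="concat($library_name, '/collection/', @collection, '/document/', @nodeID)"/> 
 +  </xsl:attribute>...</a> 
 +</code> 
 +Here the only text left inside //<xsl:attribute>// is whitespace-only, which XSLT strips from the stylesheet. This has not been verified against a particular Greenstone release. 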
 + 
 +Links to a source file look like: 
 +  * sites/sitename/collect/collname/index/assoc/assocfilepath/srclinkFile 
 + 
 +You can construct this using, e.g.: 
 +<code> 
 +<a><xsl:attribute name='href'>sites/localsite/collect/ 
 +      <xsl:value-of select='@collection'/>/index/assoc/<gsf:metadata name="assocfilepath"/>/ 
 +     <gsf:metadata name="srclinkFile"/></xsl:attribute>...</a> 
 +</code> 
 + 
 +To add a link to the PDF (or other source file) only when such a file exists, use the following: 
 +<code> 
 +<td><gsf:if-metadata-exists><gsf:metadata name="srclinkFile"/> 
 +       <gsf:if><a><xsl:attribute name='href'>sites/localsite/collect/ 
 +                  <xsl:value-of select='@collection'/>/index/assoc/ 
 +                  <gsf:metadata name="assocfilepath"/>/<gsf:metadata name="srclinkFile"/>    
 +                  </xsl:attribute><gsf:metadata name="srcicon"/></a> 
 +       </gsf:if></gsf:if-metadata-exists></td> 
 +</code> 
 + 
 +This uses <gsf:if-metadata-exists> to test for the existence of srclinkFile metadata, and only outputs the link if it is present. srclinkFile metadata is the name of the source file that should be linked to. Note that this may not be the same as the original file name, as some plugins rename the files (e.g. PDFPlugin uses doc.pdf by default). 
 + 
 +If you have collections with images too, you may like to add in the thumbnail, linked to the original image. In this case, replace <gsf:metadata name="srcicon"/> with:  
 +<code> 
 +<gsf:choose-metadata><gsf:metadata name="thumbicon"/><gsf:metadata name="srcicon"/> 
 +</gsf:choose-metadata> 
 +</code> 
 +This outputs thumbicon metadata if it exists; otherwise it outputs srcicon metadata.
 </TAB> </TAB>
 <TAB> <TAB>
en/user/searching.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1