old:more_about_indexing

Differences

This shows you the differences between two versions of the page.

old:more_about_indexing [2015/08/13 01:55] – external edit 127.0.0.1
old:more_about_indexing [2017/05/08 01:58] kjdon
Line 27:
  
   * **MG**: This is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http://www.cs.mu.oz.au/mg/|Managing Gigabytes]]. It does section level indexing, and searches can be boolean or ranked (not both at once). For each index specified in the collection, a separate physical index is created. For phrase searching, Greenstone does an "AND" search on all the terms, then scans the resulting hits to see if the phrase is present. It has been extensively tested on very large collections (many GB of text).  [[http://www.nzdl.org/html/mg.html|MG in Greenstone]].
-  * **MGPP**: This new version of MG (MG plus plus) was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/section levels and text/metadata fields are all handled by the one index. For collections with many indexes, this results in a smaller collection size than using MG. For large collections, searching may be a bit slower due to the index being word level rather than section level. [[http://www.greenstone.org/docs/mgpp_user.pdf|MGPP user guide]]
+  * **MGPP**: This new version of MG (MG plus plus) was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/section levels and text/metadata fields are all handled by the one index. For collections with many indexes, this results in a smaller collection size than using MG. For large collections, searching may be a bit slower due to the index being word level rather than section level. [[http://files.greenstone.org/technical/mgpp_user.pdf|MGPP user guide]]
   * **Lucene**: Lucene was developed by the Apache Software Foundation. It handles field and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate incremental collection building, which MG and MGPP can't provide. [[http://lucene.apache.org/|Lucene home page]]
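
The MG entry describes phrase searching as a two-step process: an "AND" search on all the terms, followed by a scan of the resulting hits for the phrase itself. The sketch below illustrates that idea only. It is not Greenstone or MG code: the function names (`tokenize`, `build_index`, `phrase_search`), the whitespace tokenizer, and the in-memory dictionary index are all assumptions made for the example.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(sections):
    """Build a section-level index: each term maps to the ids of sections containing it."""
    index = {}
    for section_id, text in sections.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(section_id)
    return index


def phrase_search(phrase, sections, index):
    """Return the ids of sections that contain the phrase as consecutive words."""
    terms = tokenize(phrase)
    if not terms:
        return []

    # Step 1: boolean AND. Keep only sections that contain every term.
    candidates = set.intersection(*(index.get(term, set()) for term in terms))

    # Step 2: scan each candidate's text to check the terms appear
    # next to each other, in order. A section-level index cannot answer
    # this on its own because it stores no word positions.
    hits = []
    n = len(terms)
    for section_id in sorted(candidates):
        words = tokenize(sections[section_id])
        if any(words[i:i + n] == terms for i in range(len(words) - n + 1)):
            hits.append(section_id)
    return hits
```

A word-level index, as described for MGPP, would instead record word positions, so the indexer could resolve the phrase without the second scanning step.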
  
old/more_about_indexing.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1