old:more_about_indexing
Differences
This shows you the differences between two versions of the page.
old:more_about_indexing [2015/08/13 01:55] – external edit 127.0.0.1 | old:more_about_indexing [2023/03/13 01:46] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
+ | |||
+ | |||
//**This page is in the ' | //**This page is in the ' | ||
We recommend checking for more up-to-date information using the search box.**// | We recommend checking for more up-to-date information using the search box.**// | ||
Line 27: | Line 30: | ||
* **MG**: This is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http:// | * **MG**: This is the original indexer used by Greenstone, developed mainly by Alistair Moffat and described in the classic book [[http:// | ||
- | * **MGPP**: This new version of MG (MG plus plus) was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/ | + | * **MGPP**: This new version of MG (MG plus plus) was developed by the New Zealand Digital Library Project. It does word level indexing, which allows fielded, phrase and proximity searching to be handled by the indexer. Boolean searches can be ranked. Only a single index is created for a Greenstone collection: document/ |
* **Lucene**: Lucene was developed by the Apache Software Foundation. It handles field and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate incremental collection building, which MG and MGPP can't provide. [[http:// | * **Lucene**: Lucene was developed by the Apache Software Foundation. It handles field and proximity searching, but only at a single level (e.g. complete documents or individual sections, but not both). Therefore document and section indexes for a collection require two separate indexes. It provides a similar range of search functionality to MGPP with the addition of single-character wildcards and range searching. It was added to Greenstone to facilitate incremental collection building, which MG and MGPP can't provide. [[http:// | ||
Line 56: | Line 59: | ||
The issue is that when the data is stored in presentation form, the words will not be matched when doing a search. This is understandable once you realise that the underlying Unicode is very different (even if the word searched for is presented identically). | The issue is that when the data is stored in presentation form, the words will not be matched when doing a search. This is understandable once you realise that the underlying Unicode is very different (even if the word searched for is presented identically). | ||
- | |||
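The presentation-form mismatch described above can be illustrated with a minimal sketch. This is not Greenstone's own code: it assumes the stored text uses Unicode compatibility characters (such as the Arabic Presentation Forms or Latin ligatures) and that folding them with NFKC normalisation before indexing and querying is acceptable for the collection. The helper name `normalize_for_search` is hypothetical.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Fold compatibility (presentation-form) characters to their base letters.

    Hypothetical helper for illustration; assumes NFKC folding is acceptable.
    """
    return unicodedata.normalize("NFKC", text)


# U+FEDF is ARABIC LETTER LAM INITIAL FORM (a presentation-form code point);
# U+0644 is the ordinary ARABIC LETTER LAM that a user would normally type.
stored = "\uFEDF"
query = "\u0644"

# The two render alike in context, but the code points differ,
# so a plain string comparison fails.
print(stored == query)

# After normalisation both sides use the same code point and compare equal.
print(normalize_for_search(stored) == normalize_for_search(query))

# The same applies to Latin ligatures: U+FB01 is LATIN SMALL LIGATURE FI.
print(normalize_for_search("\uFB01le") == "file")
```

If this approach is used, the same normalisation has to be applied both to the text before it is indexed and to the query string; otherwise the two sides still will not match.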
old/more_about_indexing.1439430901.txt.gz · Last modified: 2017/05/08 01:58 (external edit)