en:user_advanced:solr [2015/02/09 10:59] – external edit 127.0.0.1 / en:user_advanced:solr [2023/03/13 01:46] (current) – external edit 127.0.0.1
======Indexing Using SOLR======
  
See http://lucene.apache.org/solr/ for more details about SOLR.
  
There is only rudimentary, partial support in //GLI// for building a Greenstone collection with solr as the indexer. At present, GLI will preserve solr-specific elements, such as the ''option'' subelement of ''index'' mentioned below, as well as ''solr'' and ''facet'' elements. Building a SOLR collection in GLI has 2 drawbacks in Greenstone 3.06rc1:
  * Building a solr collection in GLI will stop the Greenstone server before building the collection and restart it when the collection has been rebuilt. This is no longer a problem in more recent versions of GS3, which no longer stop and restart the server when building a solr collection.
  * Java 7 is needed to successfully build a solr collection in GLI, at least on Windows. For this to work, you first need to have JDK 7 installed on your machine (with the JAVA_HOME environment variable set up, and with JAVA_HOME/bin added to your PATH environment variable). Secondly, you need to move your Greenstone 3 installation's ''packages/jre'' out of the way before running GLI, so that GLI finds your Java 7 instead.
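
Before running GLI, it may help to confirm which Java will be picked up. The following Python sketch is only an illustration and not part of Greenstone; the function names are hypothetical. It runs the first ''java'' found on the ''PATH'' and extracts the major version from the ''java -version'' banner, where Java 7 reports itself as ''1.7.x'' and Java 9 and later report the major number first (e.g. ''11.0.2'').

```python
import re
import shutil
import subprocess
from typing import Optional

# Hypothetical helpers for illustration; not part of Greenstone or GLI.
VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Old releases report "1.x" (e.g. "1.7.0_80" means Java 7);
    Java 9 and later report the major version first (e.g. "11.0.2").
    """
    match = VERSION_RE.search(version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def java_on_path_major() -> Optional[int]:
    """Run the first `java` on PATH and return its major version, if any."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` traditionally writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

Calling ''java_on_path_major()'' should return ''7'' once the ''bin'' directory of JDK 7 comes first on your ''PATH''. The sketch only looks at the ''PATH''; it does not detect the bundled ''packages/jre'', which still has to be moved aside as described above.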
  
====== Accessing Solr Admin ======
  * If the Greenstone server and client run on the same PC, open http://127.0.0.1:8383/solr in your browser.
  * If the Greenstone server is remote, you need to forward the port over SSH. Assuming the default GS3 port 8383, on a Linux terminal you would run:\\ ''ssh -L 8383:127.0.0.1:8383 <greenstone-server-machine>''\\ and then open http://127.0.0.1:8383/solr in your local browser.
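
The two cases above can be scripted. The following Python sketch is illustrative only (the function names are hypothetical, not part of Greenstone): it builds the same ''ssh -L'' command shown above, builds the local admin URL for the default GS3 port 8383, and checks whether an HTTP server answers at that URL.

```python
import urllib.error
import urllib.request
from typing import List

GS3_PORT = 8383  # default GS3 port, as assumed above


def tunnel_command(server: str, port: int = GS3_PORT) -> List[str]:
    """Build the `ssh -L port:127.0.0.1:port server` command as an argument list."""
    return ["ssh", "-L", f"{port}:127.0.0.1:{port}", server]


def solr_admin_url(port: int = GS3_PORT) -> str:
    """Local URL of the Solr admin page (direct, or through the SSH tunnel)."""
    return f"http://127.0.0.1:{port}/solr"


def admin_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, albeit with an error status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Nothing listening, tunnel not up, or the request timed out.
        return False
```

For example, ''" ".join(tunnel_command("<greenstone-server-machine>"))'' reproduces the command above; with the tunnel running, ''admin_responds(solr_admin_url())'' should return ''True''.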
  
==== Using analyzers specifically suited to different languages  ====
The analyzers that will be used for each language are defined in the file ''ext/solr/conf/schema.xml(.in)'' located in your Greenstone 3.06 installation. For instance, Japanese uses the Kuromoji analyzer by default, which is optimised to allow natural searching in Japanese. Spanish by default has been set up to use the SnowballPorterFilter.
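
To see which tokenizer and filters each field type in ''schema.xml'' uses, a small script can list them. This is a minimal Python sketch, not a Greenstone tool; the function name is hypothetical, and it assumes the file parses as plain XML. If the ''.in'' template does not parse, try the generated ''schema.xml'' instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def analyzers_by_field_type(schema_xml) -> Dict[str, List[str]]:
    """Map each <fieldType name="..."> to the tokenizer/filter classes it uses.

    Accepts the schema as str or bytes. Index-time and query-time analyzers
    of a field type are listed together, in document order.
    """
    root = ET.fromstring(schema_xml)
    result: Dict[str, List[str]] = {}
    for field_type in root.iter():
        # Older schemas may spell the element "fieldtype".
        if field_type.tag.lower() != "fieldtype":
            continue
        classes = [
            element.get("class", "")
            for element in field_type.iter()
            if element.tag in ("tokenizer", "filter")
        ]
        result[field_type.get("name", "")] = classes
    return result
```

For example, ''analyzers_by_field_type(open("ext/solr/conf/schema.xml", "rb").read())'' returns a dictionary from field type names to class names, which makes it easy to check whether a language is using, say, the SnowballPorterFilter or a Hunspell filter.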
  
Diego Spano has investigated the stemming abilities of this Spanish analyzer and found that it does not always produce the expected results. Diego has read that ''Hunspell'' may be a better analyzer for Spanish. Hunspell is also available for many other languages. Instructions on how to modify the ''ext/solr/conf/schema.xml(.in)'' file to use Hunspell for a language instead are at http://wiki.apache.org/solr/HunspellStemFilterFactory. An example for Polish is at http://solr.pl/en/2012/04/02/solr-4-0-and-polish-language-analysis/

You will need to modify the ''ext/solr/conf/schema.xml(.in)'' file before building any new solr collection that is to use the modifications.

If you want to use your own analyzer, you need to package its FilterFactory and Filter Solr classes in a jar archive. The jar file should be placed in the ''WEB-INF/lib'' directory of the ''solr.war'' archive located at ''./packages/tomcat/webapps/solr.war''.\\
You also need to describe your analyzer in ''ext/solr/conf/schema.xml.in''. An example of implementing the Russian morphology analyzer:
<code>
 <fieldType name="text_ru_morph" class="solr.TextField" positionIncrementGap="100">
   <!-- ... (tokenizer and filter definitions omitted) ... -->
 </fieldType>
</code>
In the example above, the search string is first split into words (tokens) by the tokenizer.\\
Then all characters in each token are converted to lowercase by the LowerCaseFilterFactory filter, which simplifies further analysis.\\
After that, the StopFilterFactory filter removes tokens that represent common words.\\
Finally, the custom Filter produces the normalized form of each word (token).
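
The four stages can be mimicked in plain Python to show how a string flows through such an analyzer chain. This is a conceptual sketch only: it does not use Solr or Lucene, and the tiny stopword list and the word-form dictionary are hypothetical stand-ins for the StopFilterFactory configuration and the custom morphology Filter.

```python
import re
from typing import Dict, List, Set

# Illustrative data only: a real setup takes stopwords from the
# StopFilterFactory configuration and word forms from the custom Filter.
STOP_WORDS: Set[str] = {"и", "в", "на", "не"}
LEMMAS: Dict[str, str] = {"книги": "книга", "книгу": "книга", "книгой": "книга"}


def tokenize(text: str) -> List[str]:
    """Stage 1 (tokenizer): slice the string into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Stage 2 (like LowerCaseFilterFactory): lowercase every token."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Stage 3 (like StopFilterFactory): drop common words."""
    return [token for token in tokens if token not in STOP_WORDS]


def normalize(tokens: List[str]) -> List[str]:
    """Stage 4 (stand-in for the custom Filter): map each word form to its
    normalized form; unknown words pass through unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


def analyze(text: str) -> List[str]:
    """Run the four stages in the same order as the analyzer chain."""
    return normalize(remove_stop_words(lowercase(tokenize(text))))
```

Here ''analyze("Книги и книгу")'' yields ''["книга", "книга"]'': both word forms reduce to the same token, so, if the analyzer is applied at both index and query time, a search for one form can match documents containing the other.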
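
Since a war file is a zip archive, the step of placing your analyzer's jar in the ''WEB-INF/lib'' directory of ''solr.war'' can be done with any zip tool. A minimal Python sketch follows; the function name and ''my-analyzer.jar'' are hypothetical, and it is safest to run it while the Greenstone server is stopped. It keeps a ''.bak'' copy of the original war and refuses to add a jar that is already present.

```python
import os
import shutil
import zipfile


def add_jar_to_war(war_path: str, jar_path: str, backup: bool = True) -> str:
    """Add jar_path to war_path under WEB-INF/lib/ and return the entry name."""
    entry = "WEB-INF/lib/" + os.path.basename(jar_path)
    if backup:
        # Keep an untouched copy of the original archive.
        shutil.copy2(war_path, war_path + ".bak")
    # Mode "a" appends to the existing archive without rewriting other entries.
    with zipfile.ZipFile(war_path, "a") as war:
        if entry in war.namelist():
            raise ValueError(f"{entry} already exists in {war_path}")
        war.write(jar_path, arcname=entry)
    return entry
```

For example, ''add_jar_to_war("packages/tomcat/webapps/solr.war", "my-analyzer.jar")'' adds ''WEB-INF/lib/my-analyzer.jar'' to the archive.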
  
en/user_advanced/solr.1423479584.txt.gz · Last modified: 2016/08/12 07:38 (external edit)