en:plugin:unknownconverterplugin

en:plugin:unknownconverterplugin [2021/08/19 03:34] – [Download JRE 8 and install locally into your GS3] anupama
en:plugin:unknownconverterplugin [2023/03/13 01:46] (current) – external edit 127.0.0.1
Line 1: Line 1:
 +
 +
 +
 ====== The UnknownConverterPlugin ======
  
Line 9: Line 12:
 Apache Tika is Apache's open-source software to extract text from countless different (textual) document types, one of which is docx. While one can write code to make calls on Apache-Tika's API, their ready made jar file contained everything that we needed to get Greenstone to index text in docx files.
  
-All that's necessary is to drop an Apache-Tika jar file into your ''GS3/gs2build/ext'' and then configure an UnknownConverterPlugin instance to make use of it. Building the collection with this will  allow Greenstone to process and index docx files to make them searchable without requiring users to install libreoffice.
+The steps involve putting a JRE 8 into your Greenstone 3, dropping an Apache-Tika jar file into your ''GS3/gs2build/ext'' and then configuring an UnknownConverterPlugin instance to make use of it. Building the collection this way allows Greenstone to process and index docx files, making them searchable without requiring users to install LibreOffice.
 + 
 +==== Steps for users of Greenstone 3 versions after 3.10 ==== 
 +Your Greenstone 3, whether running on Windows or Unix systems, is ready to process docx files out of the box. 
 + 
 +Run GLI, drag and drop docx files into your collection and after building, full text searching for your docx files will be available. 
 + 
 +==== Steps for 3.10 users ==== 
 +1. [[http://wiki.greenstone.org/doku.php?id=en:plugin:unknownconverterplugin#download_jre_8_and_install_locally_into_your_gs3|Instructions to quickly get a JRE 8 and install it locally into your GS3]] 
 + 
 +Linux users can now start up GLI and drag and drop docx files into a collection. After building, your collection will have full text search for your docx files.
 + 
 +2. **Extra step for Windows users:**\\ 
 +Use a text editor to edit ''<your-GS3>/gs2build/collect/modelcol/etc/collectionConfig.xml'' as follows.\\ 
 +Locate the line that says: 
 +<code>java -jar $GSDLHOME/ext/tika/tika-app-*.jar --html --pretty-print --encoding=UTF-8 %%INPUT_FILE > %%OUTPUT</code> 
 +and change it to say: 
 +<code>java -jar %GSDLHOME%/ext/tika/tika-app-1.24.1.jar --html --pretty-print --encoding=UTF-8 %%INPUT_FILE > %%OUTPUT</code> 
 +Save the ''modelcol'''s ''collectionConfig.xml'' file before closing. 
 + 
 +You can now run GLI and drag and drop docx files into your collections. After building, you'll have full text search for your docx files.
 + 
 + 
 +==== Steps for 3.09 users ====
  
 **The UnknownConverterPlugin has been officially available since Greenstone 3.09, so that 3.09 users can also start using Tika with the plugin, by**
Line 34: Line 60:
  
 ===== Download JRE 8 and install locally into your GS3 =====
-GS3 comes bundled with JRE 7, but tika-app-1.24.1.jar needs JRE 8+.\\ The following steps will have you quickly set up with a JRE 8 local to your Greenstone 3 installation.
+GS3 comes bundled with JRE 7, but the bundled ''tika-app-1.24.1.jar'' needs JRE 8+.\\ The following steps for your Operating System will have you quickly set up with a JRE 8 local to your Greenstone 3 installation.
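The JRE 8+ requirement above can be checked with a small script. This is a minimal sketch, not part of Greenstone: the function names ''jre_major_version'' and ''jre_is_tika_ready'' are hypothetical, and it assumes the usual Java version strings, where releases up to 8 are written as ''1.x'' (for example ''1.8.0_301'') and later ones start with the major number (for example ''11.0.2'').

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Greenstone): print the major Java
# version for a version string. Legacy strings such as "1.8.0_301" map to 8;
# newer strings such as "11.0.2" map to 11.
jre_major_version() {
    case "$1" in
        1.*)
            v=${1#1.}            # drop the legacy "1." prefix
            echo "${v%%.*}"      # keep everything before the next dot
            ;;
        *)
            echo "${1%%[!0-9]*}" # keep the leading run of digits
            ;;
    esac
}

# Hypothetical helper: succeed only if the version is 8 or newer,
# which is what tika-app-1.24.1.jar needs according to this page.
jre_is_tika_ready() {
    [ "$(jre_major_version "$1")" -ge 8 ]
}

# Example use (commented out; assumes "java -version" prints a line such as
#   java version "1.8.0_301"   on stderr):
# ver=$("<your-GS3>/packages/jre/bin/java" -version 2>&1 \
#       | sed -n 's/.*version "\([^"]*\)".*/\1/p')
# jre_is_tika_ready "$ver" && echo "JRE is new enough for Tika"
```

Running the check against ''<your-GS3>/packages/jre/bin/java'' before and after the steps below is one way to confirm that the local JRE was actually replaced.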
  
-1. **For Windows:**
+**For Windows users:**
  
 a. Use a File Explorer to do the following on the file system:
-Rename <your-GS3>\packages\jre to <your-GS3>\packages\jre.orig
+  * Rename ''<your-GS3>\packages\jre'' to ''<your-GS3>\packages\jre.orig''
-- If you're on Windows: create folder <your-GS3>\packages\jre
+  * Create folder ''<your-GS3>\packages\jre''
  
 b. Visit: https://www.java.com/en/download/manual.jsp
Line 55: Line 81:
  
  
-2. **For Linux users:**
+**For Linux users:**
  
-a. Rename <your-GS3>\packages\jre to <your-GS3>\packages\jre.orig
+a. Rename ''<your-GS3>/packages/jre'' to ''<your-GS3>/packages/jre.orig''
  
 b. Visit: https://www.java.com/en/download/manual.jsp
Line 72: Line 98:
 Then rename the ''jre...'' folder to just ''jre''.
  
-You want to end up with this structure: ''<your-GS3>/packages/jre/bin''
+You want to end up with this file structure: ''<your-GS3>/packages/jre/bin''
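For Linux users who prefer the command line, the rename steps can be sketched as a shell function. This is only an illustration under assumptions: the function name ''install_local_jre'' is hypothetical, the download and extraction steps that this comparison view elides are assumed to be already done, and the extracted folder name used in the example (''jre1.8.0_301'') is a placeholder for whatever ''jre...'' folder your download produced.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Greenstone): move an already
# extracted JRE 8 folder into a Greenstone 3 installation.
#   $1 = Greenstone 3 root folder (<your-GS3>)
#   $2 = extracted "jre..." folder from the downloaded archive
install_local_jre() {
    gs3=$1
    new_jre=$2

    # The extracted folder must look like a JRE (it needs a bin/ folder).
    if [ ! -d "$new_jre/bin" ]; then
        echo "no bin/ folder found in $new_jre" >&2
        return 1
    fi

    # Refuse to overwrite a backup left by an earlier run.
    if [ -e "$gs3/packages/jre.orig" ]; then
        echo "$gs3/packages/jre.orig already exists" >&2
        return 1
    fi

    # Keep the bundled JRE as jre.orig, if there is one.
    if [ -d "$gs3/packages/jre" ]; then
        mv "$gs3/packages/jre" "$gs3/packages/jre.orig" || return 1
    fi

    # Rename the "jre..." folder to just "jre" inside packages/.
    mv "$new_jre" "$gs3/packages/jre" || return 1

    # Succeed only if the expected structure <your-GS3>/packages/jre/bin exists.
    [ -d "$gs3/packages/jre/bin" ]
}

# Example use (commented out; both paths are placeholders):
# install_local_jre "$HOME/Greenstone3" "$HOME/Downloads/jre1.8.0_301"
```

The bundled JRE is kept as ''jre.orig'' rather than deleted, so the change can be undone by removing the new ''jre'' folder and renaming ''jre.orig'' back to ''jre''.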
  
en/plugin/unknownconverterplugin.1629344065.txt.gz · Last modified: 2021/08/19 03:34 by anupama