Differences
This shows you the differences between two versions of the page.
| en:user_advanced:ice_cite [2019/02/21 06:26] – [Using the UnknownConverterPlugin to launch Icecite from GLI to do the PDF to text conversion] anupama | en:user_advanced:ice_cite [2023/03/13 01:46] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | |||
| + | |||
| + | |||
| ====== Processing PDFs with Icecite and the UnknownConverterPlugin ====== | ====== Processing PDFs with Icecite and the UnknownConverterPlugin ====== | ||
| The contents of this page were originally created as the final section of the Greenstone 3 tutorial "Using the UnknownConverterPlugin to make unsupported document formats searchable", | The contents of this page were originally created as the final section of the Greenstone 3 tutorial "Using the UnknownConverterPlugin to make unsupported document formats searchable", | ||
| Line 8: | Line 11: | ||
| ==== Using the Icecite' | ==== Using the Icecite' | ||
| - | // | + | // |
| //As Icecite needs Java 8, you need to have either a JDK8 or a JRE8 installed in order to proceed with this portion of the tutorial.// | //As Icecite needs Java 8, you need to have either a JDK8 or a JRE8 installed in order to proceed with this portion of the tutorial.// | ||
| Line 42: | Line 45: | ||
| - Select the **UnknownConverterPlugin** in the list of plugins and keep pressing the **<Move Up>** button to shift it upwards, until it appears in the plugin pipeline above the existing **PDFPlugin**, | - Select the **UnknownConverterPlugin** in the list of plugins and keep pressing the **<Move Up>** button to shift it upwards, until it appears in the plugin pipeline above the existing **PDFPlugin**, | ||
| - Move to the **Create** pane and build the collection. Once more, when the Icecite conversion utility is called by Greenstone' | - Move to the **Create** pane and build the collection. Once more, when the Icecite conversion utility is called by Greenstone' | ||
| + | |||
| + | |||
| + | <!-- | ||
| + | USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT | ||
| + | |||
| + | 1. Need Java 8 for compiling and probably also for running Icecite | ||
| + | < | ||
| + | export JAVA_HOME=/ | ||
| + | export PATH=$JAVA_HOME/ | ||
| + | </ | ||
| + | |||
| + | 2. Get and compile icecite, following the instructions at https:// | ||
| + | < | ||
| + | git clone https:// | ||
| + | cd icecite | ||
| + | git pull --recurse-submodules | ||
| + | cd pdf-parent/ | ||
| + | mvn install | ||
| + | </ | ||
| + | |||
| + | 3. Run icecite, general instructions at https:// | ||
| + | < | ||
| + | cd ../../ | ||
| + | cd icecite/ | ||
| + | java -jar target/ | ||
| + | </ | ||
| + | Examples: | ||
| + | greenstone@bedrock: | ||
| + | |||
| + | greenstone@bedrock: | ||
| + | |||
| + | greenstone@bedrock: | ||
| + | |||
| + | (Also tried with input file pdf01.pdf from the Reports collection) | ||
| + | |||
| + | |||
| + | 4. If you see the exception | ||
| + | --- | ||
| + | Exception in thread " | ||
| + | at org.apache.pdfbox.pdmodel.encryption.PDEncryption.< | ||
| + | at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java: | ||
| + | at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java: | ||
| + | at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java: | ||
| + | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
| + | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
| + | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
| + | at parser.pdfbox.core.PdfStreamEngine.processFile(PdfStreamEngine.java: | ||
| + | at parser.pdfbox.PdfBoxParser.parse(PdfBoxParser.java: | ||
| + | at cli.PdfParserCommandLine.parse(PdfParserCommandLine.java: | ||
| + | at cli.PdfParserCommandLine.processFile(PdfParserCommandLine.java: | ||
| + | at cli.PdfParserCommandLine.process(PdfParserCommandLine.java: | ||
| + | at cli.PdfParserCommandLine.main(PdfParserCommandLine.java: | ||
| + | Caused by: java.lang.ClassNotFoundException: | ||
| + | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
| + | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
| + | at java.security.AccessController.doPrivileged(Native Method) | ||
| + | at java.net.URLClassLoader.findClass(URLClassLoader.java: | ||
| + | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
| + | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: | ||
| + | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
| + | ... 13 more | ||
| + | |||
| + | --- | ||
| + | |||
| + | Then: | ||
| + | a. Obtain bouncycastle (encryption? | ||
| + | |||
| + | Download both jar files listed under the " | ||
| + | |||
| + | b. Then see https:// | ||
| + | for how to run a Java programme when you have multiple jar files on the classpath, as java ignores -cp when -jar is given. | |||
| + | |||
| + | greenstone@bedrock: | ||
| + | --> | ||
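The workaround in step 4b of the notes above can be sketched as follows: put every jar on the classpath with `-cp` and name the main class explicitly, rather than using `-jar`. The main class `cli.PdfParserCommandLine` is taken from the stack trace in the notes. The jar paths below are hypothetical placeholders, not the real file names, and the program arguments are left as whatever was passed in the original `java -jar` invocation.

```shell
#!/bin/sh
# Sketch only: all jar paths here are hypothetical placeholders.
ICECITE_JAR="target/icecite.jar"   # assumption: the jar built by "mvn install"
BC_JAR_ONE="lib/bc-one.jar"        # assumption: first downloaded bouncycastle jar
BC_JAR_TWO="lib/bc-two.jar"        # assumption: second downloaded bouncycastle jar
MAIN_CLASS="cli.PdfParserCommandLine"  # from the stack trace in the notes

# Join all arguments with ':' (the classpath separator on Linux and macOS).
# The function body runs in a subshell, so the IFS change does not leak out.
build_classpath() (
  IFS=':'
  printf '%s' "$*"
)

CP="$(build_classpath "$ICECITE_JAR" "$BC_JAR_ONE" "$BC_JAR_TWO")"

# Print the command instead of running it, so it can be checked first.
# Append the same arguments used with the original "java -jar" call.
CMD="java -cp $CP $MAIN_CLASS"
echo "$CMD"
```

On Windows the classpath separator is `;` rather than `:`, so the join character would need to change there.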
en/user_advanced/ice_cite.1550730390.txt.gz · Last modified: 2019/02/21 06:26 by anupama
