en:user_advanced:ice_cite
en:user_advanced:ice_cite [2019/03/13 05:57] – anupama | en:user_advanced:ice_cite [2023/03/13 01:46] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
+ | |||
+ | |||
====== Processing PDFs with Icecite and the UnknownConverterPlugin ======
The contents of this page were originally created as the final section of the Greenstone 3 tutorial "Using the UnknownConverterPlugin to make unsupported document formats searchable",
Line 42: | Line 45: | ||
- Select the **UnknownConverterPlugin** in the list of plugins and keep pressing the **<Move Up>** button to shift it upwards, until it appears in the plugin pipeline above the existing **PDFPlugin**,
- Move to the **Create** pane and build the collection. Once more, when the Icecite conversion utility is called by Greenstone'
+ | |||
+ | |||
+ | <!-- | ||
+ | USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT | ||
+ | |||
+ | 1. Java 8 is needed to compile Icecite, and probably to run it as well | |||
+ | < | ||
+ | export JAVA_HOME=/ | ||
+ | export PATH=$JAVA_HOME/ | ||
+ | </ | ||
+ | |||
+ | 2. Get and compile icecite, following the instructions at https:// | ||
+ | < | ||
+ | git clone https:// | ||
+ | cd icecite | ||
+ | git pull --recurse-submodules | ||
+ | cd pdf-parent/ | ||
+ | mvn install | ||
+ | </ | ||
+ | |||
+ | 3. Run icecite, general instructions at https:// | ||
+ | < | ||
+ | cd ../../ | ||
+ | cd icecite/ | ||
+ | java -jar target/ | ||
+ | </ | ||
+ | Examples: | ||
+ | greenstone@bedrock: | ||
+ | |||
+ | greenstone@bedrock: | ||
+ | |||
+ | greenstone@bedrock: | ||
+ | |||
+ | (Also tried with input file pdf01.pdf from the Reports collection) | ||
+ | |||
+ | |||
+ | 4. If you see the exception | ||
+ | --- | ||
+ | Exception in thread " | ||
+ | at org.apache.pdfbox.pdmodel.encryption.PDEncryption.< | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at parser.pdfbox.core.PdfStreamEngine.processFile(PdfStreamEngine.java: | ||
+ | at parser.pdfbox.PdfBoxParser.parse(PdfBoxParser.java: | ||
+ | at cli.PdfParserCommandLine.parse(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.processFile(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.process(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.main(PdfParserCommandLine.java: | ||
+ | Caused by: java.lang.ClassNotFoundException: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.security.AccessController.doPrivileged(Native Method) | ||
+ | at java.net.URLClassLoader.findClass(URLClassLoader.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | ... 13 more | ||
+ | |||
+ | --- | ||
+ | |||
+ | Then: | ||
+ | a. Obtain bouncycastle (encryption? | ||
+ | |||
+ | Download both jar files listed under the " | ||
+ | |||
+ | b. Then see https:// | ||
+ | for how to run a Java programme when you have multiple jar files on the classpath, as java ignores -cp when -jar is given. | |||
+ | |||
+ | greenstone@bedrock: | ||
+ | --> |
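The classpath workaround described in the notes above can be sketched in shell. Because the ''java'' launcher ignores ''-cp'' when ''-jar'' is given, the usual alternative is to put every jar on the classpath and name the main class explicitly. The helper below is only an illustration: the function name ''build_classpath'' and the ''./lib'' directory are hypothetical, and the main class ''cli.PdfParserCommandLine'' is inferred from the stack trace rather than confirmed against the Icecite documentation.

```shell
#!/bin/sh
# Minimal sketch: join every .jar file in a directory into one
# colon-separated classpath string (Unix separator; Windows uses ';').
# build_classpath is a hypothetical helper, not part of Icecite or Greenstone.
build_classpath() {
    jar_dir=$1
    classpath=""
    for jar in "$jar_dir"/*.jar; do
        # If the directory holds no jars, the glob stays unexpanded; skip it.
        [ -e "$jar" ] || continue
        # Prepend the separator only when the string is already non-empty.
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Hypothetical usage, assuming the Icecite jar and both bouncycastle jars
# have been copied into ./lib, and that cli.PdfParserCommandLine (the class
# named in the stack trace) is the entry point:
#
#   java -cp "$(build_classpath ./lib)" cli.PdfParserCommandLine <same arguments as with -jar>
```

Naming the main class with ''-cp'' instead of using ''-jar'' means the manifest of the Icecite jar is no longer consulted, so any jars it would have pulled in through its manifest ''Class-Path'' entry must also be present in the directory.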
en/user_advanced/ice_cite.1552456628.txt.gz · Last modified: 2019/03/13 05:57 by anupama