en:user_advanced:ice_cite
Differences
This shows you the differences between two versions of the page.
en:user_advanced:ice_cite [2019/02/21 06:20] – [Using the UnknownConverterPlugin to launch Icecite from GLI to do the PDF to text conversion] anupama | en:user_advanced:ice_cite [2019/04/24 09:29] – anupama | ||
---|---|---|---|
Line 8: | Line 8: | ||
==== Using the Icecite' | ==== Using the Icecite' | ||
- | // | + | // |
//As Icecite needs Java 8, you need to have either a JDK8 or a JRE8 installed in order to proceed with this portion of the tutorial.// | //As Icecite needs Java 8, you need to have either a JDK8 or a JRE8 installed in order to proceed with this portion of the tutorial.// | ||
Line 22: | Line 22: | ||
// | // | ||
- Run GLI | - Run GLI | ||
- | - Create a new collection called Icecite. In the Gather pane, drop the sample PDF file into your collection. | + | - Create a new collection called Icecite. In the **Gather** pane, drop the sample PDF file into your collection. |
- In the **Design** pane, select **Document Plugins** from the list on the left. Add the **UnknownConverterPlugin**. Having tried out the Icecite conversion command manually in the previous part of this tutorial, we're now ready to use it when configuring the **UnknownConverterPlugin**. Click **< | - In the **Design** pane, select **Document Plugins** from the list on the left. Add the **UnknownConverterPlugin**. Having tried out the Icecite conversion command manually in the previous part of this tutorial, we're now ready to use it when configuring the **UnknownConverterPlugin**. Click **< | ||
* set '' | * set '' | ||
* set '' | * set '' | ||
* set '' | * set '' | ||
- | * set '' | + | * set '' |
* set the '' | * set the '' | ||
* on Windows:\\ '' | * on Windows:\\ '' | ||
* on Unix systems:\\ ''/ | * on Unix systems:\\ ''/ | ||
- | Note: When filling in the '' | + | Note: When filling in the '' |
However, you will need to adjust the above value of exec_cmd by finding out where your Java 8 is installed and replacing ''/ | However, you will need to adjust the above value of exec_cmd by finding out where your Java 8 is installed and replacing ''/ | ||
Line 39: | Line 39: | ||
//The above command uses the Java executable to run the Icecite Java program, which does the actual PDF-to-text conversion. Greenstone runs the given command after first filling in the %%GSDL3SRCHOME, | //The above command uses the Java executable to run the Icecite Java program, which does the actual PDF-to-text conversion. Greenstone runs the given command after first filling in the %%GSDL3SRCHOME, | ||
- | | + | |
- | | + | |
- | | + | |
+ | |||
+ | |||
+ | <!-- | ||
+ | USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT | ||
+ | |||
+ | 1. Java 8 is needed to compile Icecite, and probably also to run it | ||
+ | < | ||
+ | export JAVA_HOME=/ | ||
+ | export PATH=$JAVA_HOME/ | ||
+ | </ | ||
+ | |||
+ | 2. Get and compile Icecite, following the instructions at https:// | ||
+ | < | ||
+ | git clone https:// | ||
+ | cd icecite | ||
+ | git pull --recurse-submodules | ||
+ | cd pdf-parent/ | ||
+ | mvn install | ||
+ | </ | ||
+ | |||
+ | 3. Run Icecite; general instructions are at https:// | ||
+ | < | ||
+ | cd ../../ | ||
+ | cd icecite/ | ||
+ | java -jar target/ | ||
+ | </ | ||
+ | Examples: | ||
+ | greenstone@bedrock: | ||
+ | |||
+ | greenstone@bedrock: | ||
+ | |||
+ | greenstone@bedrock: | ||
+ | |||
+ | (Also tried with the input file pdf01.pdf from the Reports collection.) | ||
+ | |||
+ | |||
+ | 4. If you see the exception | ||
+ | --- | ||
+ | Exception in thread " | ||
+ | at org.apache.pdfbox.pdmodel.encryption.PDEncryption.< | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
+ | at parser.pdfbox.core.PdfStreamEngine.processFile(PdfStreamEngine.java: | ||
+ | at parser.pdfbox.PdfBoxParser.parse(PdfBoxParser.java: | ||
+ | at cli.PdfParserCommandLine.parse(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.processFile(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.process(PdfParserCommandLine.java: | ||
+ | at cli.PdfParserCommandLine.main(PdfParserCommandLine.java: | ||
+ | Caused by: java.lang.ClassNotFoundException: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.security.AccessController.doPrivileged(Native Method) | ||
+ | at java.net.URLClassLoader.findClass(URLClassLoader.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | ... 13 more | ||
+ | |||
+ | --- | ||
+ | |||
+ | Then: | ||
+ | a. Obtain Bouncy Castle (encryption? | ||
+ | |||
+ | Download both jar files listed under the " | ||
+ | |||
+ | b. Then see https:// | ||
+ | for how to run a Java program with multiple jar files on the classpath, since java ignores -cp whenever -jar is used. | ||
+ | |||
+ | greenstone@bedrock: | ||
+ | --> |
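Building and running Icecite both need Java 8. Before going further, it can help to confirm which Java a given `JAVA_HOME` actually points to. The following is a minimal sketch. It assumes the first line of `java -version` output looks like `java version "1.8.0_…"` or `openjdk version "1.8.0_…"`, which is the usual form for Java 8 releases. The function name `is_java8` is made up for illustration.

```shell
#!/bin/sh
# is_java8 is a hypothetical helper, not part of Greenstone or Icecite.
# It takes the first line of `java -version` output and succeeds only when
# that line reports a Java 8 runtime, i.e. a version string starting "1.8".
is_java8() {
  case "$1" in
    *' version "1.8'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Here is an example call:

`is_java8 "$("$JAVA_HOME/bin/java" -version 2>&1 | head -n 1)" && echo "Java 8 found"`

It prints `Java 8 found` only if that `JAVA_HOME` holds Java 8. The `2>&1` is needed because `java -version` writes to standard error.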
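`java` ignores `-cp` (and the `CLASSPATH` variable) whenever `-jar` is given. One way to add the Bouncy Castle jars is therefore to drop `-jar`, list every jar with `-cp`, and name Icecite's main class directly.

The sketch below rests on these assumptions:

* The main class is `cli.PdfParserCommandLine`, the entry point in the stack trace above.
* The extra jars sit together in one directory.
* The helper names `build_classpath` and `run_icecite`, the variables `ICECITE_JAR` and `BC_DIR`, and their default values are hypothetical placeholders; replace them with your own paths.

On Windows, the classpath separator is `;` rather than `:`.

```shell
#!/bin/sh
# Hypothetical locations: point these at your own Icecite jar and at the
# directory holding the downloaded Bouncy Castle jars.
ICECITE_JAR="${ICECITE_JAR:-target/icecite.jar}"
BC_DIR="${BC_DIR:-lib}"

# build_classpath MAIN_JAR JAR_DIR
# Prints MAIN_JAR followed by every *.jar file in JAR_DIR, joined with ':'.
build_classpath() {
  classpath="$1"
  for jar in "$2"/*.jar; do
    # If JAR_DIR has no jars, the glob stays unexpanded; skip it.
    if [ -e "$jar" ]; then
      classpath="$classpath:$jar"
    fi
  done
  printf '%s\n' "$classpath"
}

# run_icecite ARGS...
# Runs Icecite's command-line entry point with all jars on the classpath.
# -jar is deliberately not used, since it would make java ignore -cp.
run_icecite() {
  java -cp "$(build_classpath "$ICECITE_JAR" "$BC_DIR")" cli.PdfParserCommandLine "$@"
}
```

You would call `run_icecite` with the arguments you would otherwise pass after the jar path in `java -jar …`.

Alternatively, since Java 6 a classpath entry ending in `*` matches every jar in that directory. So `java -cp "$ICECITE_JAR:$BC_DIR/*" cli.PdfParserCommandLine …` should behave the same. The quotes stop the shell from expanding the `*` itself.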
en/user_advanced/ice_cite.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1