en:tutorials
Differences
This shows you the differences between two versions of the page.
en:tutorials [2017/10/05 02:42] – anupama | en:tutorials [2019/04/24 09:30] – anupama | ||
---|---|---|---|
Line 443: | Line 443: | ||
</ | </ | ||
- | <!-- | ||
- | USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT | ||
- | 1. Java 8 is needed to compile Icecite, and probably also to run it. | ||
- | < | ||
- | export JAVA_HOME=/ | ||
- | export PATH=$JAVA_HOME/ | ||
- | </ | ||
- | |||
- | 2. Get and compile icecite, following the instructions at https:// | ||
- | < | ||
- | git clone https:// | ||
- | cd icecite | ||
- | git pull --recurse-submodules | ||
- | cd pdf-parent/ | ||
- | mvn install | ||
- | </ | ||
- | |||
- | 3. Run Icecite; general instructions are at https:// | ||
- | < | ||
- | cd ../../ | ||
- | cd icecite/ | ||
- | java -jar target/ | ||
- | </ | ||
- | Examples: | ||
- | greenstone@bedrock: | ||
- | |||
- | greenstone@bedrock: | ||
- | |||
- | greenstone@bedrock: | ||
- | |||
- | (Also tried with input file pdf01.pdf from the Reports collection) | ||
- | |||
- | |||
- | 4. If you see the exception | ||
- | --- | ||
- | Exception in thread " | ||
- | at org.apache.pdfbox.pdmodel.encryption.PDEncryption.< | ||
- | at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java: | ||
- | at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java: | ||
- | at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java: | ||
- | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
- | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
- | at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java: | ||
- | at parser.pdfbox.core.PdfStreamEngine.processFile(PdfStreamEngine.java: | ||
- | at parser.pdfbox.PdfBoxParser.parse(PdfBoxParser.java: | ||
- | at cli.PdfParserCommandLine.parse(PdfParserCommandLine.java: | ||
- | at cli.PdfParserCommandLine.processFile(PdfParserCommandLine.java: | ||
- | at cli.PdfParserCommandLine.process(PdfParserCommandLine.java: | ||
- | at cli.PdfParserCommandLine.main(PdfParserCommandLine.java: | ||
- | Caused by: java.lang.ClassNotFoundException: | ||
- | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
- | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
- | at java.security.AccessController.doPrivileged(Native Method) | ||
- | at java.net.URLClassLoader.findClass(URLClassLoader.java: | ||
- | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
- | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: | ||
- | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
- | ... 13 more | ||
- | |||
- | --- | ||
- | |||
- | Then: | ||
- | a. Obtain bouncycastle (encryption? | ||
- | |||
- | Download both jar files listed under the " | ||
- | |||
- | b. Then see https:// | ||
- | for how to run a Java programme when you have multiple jar files on the classpath, as java cannot combine -cp and -jar (when -jar is given, the -cp setting is ignored). | ||
- | |||
- | greenstone@bedrock: | ||
- | --> |
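Step 4b in the removed notes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the command from the original notes: the jar file names are placeholders (the real Icecite and Bouncy Castle jar names are not given here), `build_classpath` is a hypothetical helper, and the main class `cli.PdfParserCommandLine` is taken from the stack trace above. The script prints the command instead of running it, and the program's own arguments are left out because they are not shown in the notes.

```shell
# Hypothetical helper: join all arguments with ':' (the classpath
# separator on Linux and macOS) to form a single classpath string.
# The function body runs in a subshell so the IFS change stays local.
build_classpath() (
  IFS=':'
  printf '%s\n' "$*"
)

# Placeholder jar names; substitute the paths to the real Icecite jar
# and the two Bouncy Castle jars.
CP=$(build_classpath icecite.jar bcprov.jar bcmail.jar)

# With -cp, the main class must be named explicitly instead of using -jar.
echo "java -cp $CP cli.PdfParserCommandLine"
```

Naming the main class explicitly is what makes the extra jars visible: with `-jar`, only the jar's own manifest determines the classpath.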
en/tutorials.txt · Last modified: 2023/11/29 00:21 by kjdon