en:tutorials

Differences

This shows you the differences between two versions of the page.

en:tutorials [2017/10/05 02:42] anupama
en:tutorials [2025/06/12 09:03] (current) – [Greenstone3] anupama
Line 1: Line 1:
-<TABAREA tabs="Greenstone3,Greenstone2"
+<tabbox Greenstone3>
-<TAB>======Greenstone 3.08 tutorial exercises (26 Aug 2016)======
+**Greenstone tutorial exercises (Dec 2024, Jun 2025)**
  
-  * These work with Greenstone 3.08. [[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/all_tutorials.html|Print version]]
+  * These work with the tested Greenstone 3.12 rc1 binaries available on the [[https://www.greenstone.org/snapshots|snapshots]] page. [[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/all_tutorials.html | Print version]]
-  * Each tutorial that requires sample files has a link to a zip download of the files. Alternatively, you can download all of the sample files for all of the tutorials in a single [[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/sample_files/sample_files.zip|sample_files.zip]].
+
 +  * Each tutorial that requires sample files has a link to a zip download of the files. Alternatively, you can download all of the sample files for all of the tutorials in a single [[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/sample_files/sample_files.zip | sample_files.zip]]. 
  
   * Tutorials for older versions of Greenstone can be found at the [[legacy:tutorials | Old Tutorials ]] page.
 +
 +
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/small_html_collection.htm|Building a small collection of HTML files]]**
Line 28: Line 31:
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/images_gps.htm|An image collection with GPS metadata]]**
 +
   * Extracting embedded metadata
   * Adding in a map view to browsing
Line 45: Line 49:
  
   * Tidying up the default format statement
-  * Linking to Greenstone version or original version of documents
+  * Linking to the Greenstone version or original version of documents
   * Making bookshelves show how many items they contain
   * Displaying multi-valued metadata
   * Advanced multi-valued metadata
- 
- 
-**[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/pdfbox-extension.htm|Processing newer versions of PDF with PDFBox]]** 
- 
  
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/enhanced_pdf.htm|Enhanced PDF handling]]**
  
-  * Modes in the Librarian Interface 
-  * Splitting PDFs into sections 
   * Using image format
   * Using process_exp to control document processing (advanced)
 +  * Customising the table of contents section heading display
   * Opening PDF files with query terms highlighted
  
Line 71: Line 70:
   * Removing pre-defined table of contents
   * Extracting document properties as metadata
 +  * Processing docx files
  
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/associated_files.htm|Associated files: combining different versions of the same document together]]**
  
-//This tutorial demonstrates how to link different versions of the same document together in Greenstone.//
+''This tutorial demonstrates how to link different versions of the same document together in Greenstone.''
   * Associating one document with another
   * Linking to associated documents
Line 138: Line 138:
   * Using different icons for different media types
   * Building a full-size version of the collection
 +  * Adding an image collage browser
  
  
Line 168: Line 169:
   * Downloading using the command line
   * Building the downloaded documents in GLI
 +
 +
 +**[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/unknown_converter_plugin.htm|Using the UnknownConverterPlugin to make unsupported document formats searchable]]**
 +
 +  * Working with DjVu documents in Greenstone
 +  * Extracting the text from DjVu documents with DjVuLibre's djvutxt
 +  * Processing DjVu documents with the UnknownConverterPlugin
 +  * Associating an icon with DjVu documents in Greenstone
  
  
Line 199: Line 208:
   * Use search mode hotkeys with query term
   * A quick reference of the search mode hotkeys in MGPP
 +
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/incremental_cmdline.htm|Incrementally building a collection using the command line]]**
Line 205: Line 215:
   * Incrementally deleting some documents from a collection
   * Editing a document's text and metadata, and then incrementally rebuilding the collection
-  * Incrementally indexing automatically
+  * Automatic incremental indexing
  
-===== Customization ===== 
  
 **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/using_themes.htm|Customization: Themes]]**
Line 259: Line 268:
   * Adding functionality to the quick search box
   * Adding the library name and login links
-  * Interface language files 
  
  
-</TAB>
+**[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/webswing_gli.htm|Using WebSwing GLI (Web GLI)]]**
-<!-- ##############################################################################################
+
-#################################################################################################
+
-##################################################-->
+
-<TAB>
+
-====== Greenstone 2.87 tutorial exercises (September 2017======
+
  
 +  * Creating a user account
 +  * Accessing WebSwing GLI: a Greenstone Librarian Interface (GLI) application accessible over your browser
 +  * Setting up and working with the tutorial sample files through Webswing GLI
 +
 +<tabbox Greenstone2>
 +** Greenstone 2.87 tutorial exercises (September 2017) **
   * These work with Greenstone 2.87. [[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/all_tutorials.html|Print version]]
   * For installation and setup instructions, and for patches, refer to the [[en:release:2.87_release_notes | 2.87 Release Notes]].
Line 439: Line 448:
   * Use the Depositor to do incremental addition
   * Batch addition with the Depositor
-</TAB> 
  
  
-</TABAREA> 
-<!-- 
-USING THE ICECITE TOOL TO CONVERT FROM PDF TO TXT 
  
-1. Need Java 8 for compiling and probably also for running Icecite +</tabbox>
-<code> +
-export JAVA_HOME=/opt/java8/ +
-export PATH=$JAVA_HOME/bin:$PATH +
-</code>+
  
-2. Get and compile icecite, following the instructions at https://github.com/ckorzen/icecite 
-<code> 
-git clone https://github.com/ckorzen/icecite.git --recursive 
-cd icecite 
-git pull --recurse-submodules 
-cd pdf-parent/ 
-mvn install 
-</code> 
- 
-3. Run icecite, general instructions at https://github.com/ckorzen/icecite 
-<code> 
-cd ../../ 
-cd icecite/pdf-cli 
-java -jar target/pdf-cli-*-jar-with-dependencies.jar [options] <input> [<output>] 
-</code> 
-Examples: 
- greenstone@bedrock:~/icecite/pdf-cli$ java -jar target/pdf-cli-0.0.1-SNAPSHOT-jar-with-dependencies.jar --format txt --feature words ~/Downloads/A9-access-best-practices.pdf ~/Desktop/iceciteconverted1.txt 
- 
- greenstone@bedrock:~/icecite/pdf-cli$ java -jar target/pdf-cli-0.0.1-SNAPSHOT-jar-with-dependencies.jar --format txt --feature lines ~/Downloads/A9-access-best-practices.pdf ~/Desktop/iceciteconverted2.txt 
- 
- greenstone@bedrock:~/icecite/pdf-cli$ java -jar target/pdf-cli-0.0.1-SNAPSHOT-jar-with-dependencies.jar --format txt --feature paragraphs ~/Downloads/A9-access-best-practices.pdf ~/Desktop/iceciteconverted3.txt 
- 
-(Also tried with input file pdf01.pdf from the Reports collection) 
- 
- 
-4. If you see the exception 
---- 
-Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider 
- at org.apache.pdfbox.pdmodel.encryption.PDEncryption.<init>(PDEncryption.java:96) 
- at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java:282) 
- at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:199) 
- at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249) 
- at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847) 
- at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:803) 
- at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:757) 
- at parser.pdfbox.core.PdfStreamEngine.processFile(PdfStreamEngine.java:120) 
- at parser.pdfbox.PdfBoxParser.parse(PdfBoxParser.java:44) 
- at cli.PdfParserCommandLine.parse(PdfParserCommandLine.java:268) 
- at cli.PdfParserCommandLine.processFile(PdfParserCommandLine.java:247) 
- at cli.PdfParserCommandLine.process(PdfParserCommandLine.java:233) 
- at cli.PdfParserCommandLine.main(PdfParserCommandLine.java:168) 
-Caused by: java.lang.ClassNotFoundException: org.bouncycastle.jce.provider.BouncyCastleProvider 
- at java.net.URLClassLoader$1.run(URLClassLoader.java:372) 
- at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 
- at java.security.AccessController.doPrivileged(Native Method) 
- at java.net.URLClassLoader.findClass(URLClassLoader.java:360) 
- at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
- at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 
- at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
- ... 13 more 
- 
---- 
- 
-Then: 
-a. Obtain bouncycastle (encryption?) jar files from https://www.bouncycastle.org/latest_releases.html 
- 
-Download both jar files listed under the "Provider" column for row "JDK 1.5 - JDK 1.8" (not sure that both are necessary) and put them in icecite/pdf-cli folder (for example) 
- 
-b. Then see https://stackoverflow.com/questions/15930782/call-java-jar-myfile-jar-with-additional-classpath-option 
-for how to run a java programme when you have multiple jar files on classpath, as you can't run java with both -cp and -jar. 
- 
-greenstone@bedrock:~/icecite/pdf-cli$ java -classpath '.:/home/greenstone/icecite/pdf-cli/*:target/pdf-cli-0.0.1-SNAPSHOT-jar-with-dependencies.jar' cli.PdfParserCommandLine --format txt --feature words ~/Desktop/24.pdf ~/Desktop/24converted.txt 
---> 
en/tutorials.1507171325.txt.gz · Last modified: 2017/10/05 02:42 by anupama