Greenstone Extensions

OpenOffice

The OpenOffice extension provides a document conversion facility, provided that OpenOffice or LibreOffice is already installed on the system. In order to use the OpenOffice extension,

User-contributed notes

Tesseract

Tesseract is an open-source OCR (optical character recognition) engine.

The Tesseract extension contains the tesseract program, plus Greenstone plugins that use it during a collection build.

You can get the Tesseract extension in two ways:

  1. Download the tar or zip file from https://trac.greenstone.org/browser/gs2-extensions/tesseract/trunk and place it in ext (for Greenstone 2) or gs2build/ext (for Greenstone 3), then extract all files.
  2. Check out the source and compile it:
 
  cd greenstone3/gs2build/ext
  svn co https://svn.greenstone.org/gs2-extensions/tesseract/trunk/src tesseract
  cd tesseract
  ./CASCADE-MAKE.sh

Once the extension is installed (by either method), open a new terminal and source gs3-setup.sh so that the extension's environment variables are set.
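As a rough sketch of that step, the following shell snippet sources the setup script and then checks whether the tesseract binary is reachable. It assumes that the extension's setup puts tesseract on the PATH, which this page does not state explicitly. GS3_HOME and check_on_path are hypothetical names used only for illustration.

```shell
# Assumption: GS3_HOME points at the Greenstone 3 install directory;
# it defaults to ./greenstone3, matching the checkout commands above.
GS3_HOME="${GS3_HOME:-greenstone3}"

# Hypothetical helper (not part of Greenstone): report whether a command is on PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH"
    return 1
  fi
}

if [ -f "$GS3_HOME/gs3-setup.sh" ]; then
  # "." is the portable spelling of "source"; run it from the install directory.
  cd "$GS3_HOME" && . ./gs3-setup.sh
  # Assumption: the extension's environment setup makes tesseract reachable.
  check_on_path tesseract || echo "Check that the extension was extracted or compiled correctly."
else
  echo "gs3-setup.sh not found under $GS3_HOME" >&2
fi
```

If the check fails in a terminal that was already open before the extension was installed, opening a fresh terminal and sourcing the script again is the first thing to try.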

TesseractPlugins

The Tesseract extension comes with two plugins: TesseractTextExtractor and TesseractImagePlugin. TesseractTextExtractor is a helper plugin that runs Tesseract on an image, producing a text file. TesseractImagePlugin can replace ImagePlugin, adding Tesseract OCR capability to it.
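To illustrate the image-to-text step, here is a minimal sketch of running Tesseract on an image by hand from the shell. It is not the plugin's actual implementation. It relies only on the standard command-line form `tesseract <image> <outputbase>`, which writes `<outputbase>.txt`. The helper names ocr_output_path and ocr_image, and the file name page1.png, are hypothetical.

```shell
# Hypothetical helper: derive the text file name Tesseract will write for an image
# by swapping the image's extension for .txt.
ocr_output_path() {
  printf '%s.txt\n' "${1%.*}"
}

# Hypothetical helper: OCR one image, writing the text file next to it.
ocr_image() {
  image="$1"
  base="${image%.*}"   # tesseract appends .txt to this output base
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found; source the Greenstone setup script first" >&2
    return 1
  fi
  tesseract "$image" "$base" && echo "wrote $(ocr_output_path "$image")"
}

# Usage (scans/page1.png is a placeholder file name):
# ocr_image scans/page1.png
```

The extension stripping assumes the image file name has an extension; a name without one would need separate handling.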