The Open Office extension provides a document conversion facility if Open Office or LibreOffice is already installed on the system. In order to use the Open Office extension,
Tesseract is an open-source OCR (optical character recognition) engine.
The Tesseract extension contains the tesseract program, plus Greenstone plugins that use it during a build.
You can get the Tesseract extension in two ways:
    cd greenstone3/gs2build/ext
    svn co https://svn.greenstone.org/gs2-extensions/tesseract/trunk/src tesseract
    cd tesseract
    ./CASCADE-MAKE.sh
Once installed (by either method), you will need to open a new terminal and source gs3-setup.sh so that the extension's environment variables are set.
The Tesseract extension comes with two plugins: TesseractTextExtractor and TesseractImagePlugin. TesseractTextExtractor is a helper plugin that runs Tesseract on an image, producing a text file. TesseractImagePlugin can replace ImagePlugin, adding Tesseract OCR capability to it.
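The image-to-text step described above can be tried by hand with the tesseract command-line tool, which takes an image and an output base name and writes the recognised text to the base name plus ".txt". The following is only a rough sketch of that step, not the plugin's actual code. The helper name ocr_image is hypothetical, and the sketch assumes that the tesseract program is on your PATH after sourcing gs3-setup.sh.

```shell
# Sketch only: run Tesseract on one image and report where the text went.
# ocr_image is a hypothetical helper name, not part of the Greenstone extension.
ocr_image() {
    image="$1"
    dir=$(dirname -- "$image")
    name=$(basename -- "$image")
    # Tesseract appends ".txt" itself, so pass the output name without an extension.
    outbase="$dir/${name%.*}"
    # Assumes tesseract is on PATH; fail quietly if it is missing or reports an error.
    tesseract "$image" "$outbase" >/dev/null 2>&1 || return 1
    # Print the path of the text file Tesseract produced.
    printf '%s\n' "$outbase.txt"
}
```

For example, calling ocr_image scans/page1.png (a hypothetical path) would leave the recognised text in scans/page1.txt and print that path.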