en:user_advanced:greenstone_extensions

Differences

This shows you the differences between two versions of the page.

en:user_advanced:greenstone_extensions [2025/07/25 02:47] kjdon
en:user_advanced:greenstone_extensions [2025/07/25 02:55] (current) – [Tesseract] kjdon
Line 20:
 
 ===== Tesseract =====
 +
 +Tesseract is an open source OCR (optical character recognition) engine.
 +
 +The Tesseract extension contains the tesseract program, plus Greenstone plugins that use it when building a collection.
 +
 +You can get the Tesseract extension in two ways:
 +
 +  - Download the tar or zip file from [[https://trac.greenstone.org/browser/gs2-extensions/tesseract/trunk]], and place it in //ext// (for Greenstone 2) or //gs2build/ext// (for Greenstone 3). Extract all files.
 +
 +  - Check out the source and compile it:
 +<code> 
 +  cd greenstone3/gs2build/ext
 +  svn co https://svn.greenstone.org/gs2-extensions/tesseract/trunk/src tesseract
 +  cd tesseract
 +  ./CASCADE-MAKE.sh
 +</code>
 +
 +Once installed (by either method), open a new terminal and source gs3-setup.sh so that the extension's environment variables are set.
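As a rough check that the new terminal picked up the environment, the shell sketch below sources the setup script and then looks for the tesseract program on the PATH. It rests on assumptions not stated on this page: ''GS3_HOME'' is a hypothetical variable naming the greenstone3 directory, ''check_tool'' is a helper invented for this example, and the extension's setup is assumed to add the tesseract binary to the PATH.

```shell
# Assumption: GS3_HOME is a hypothetical variable pointing at the greenstone3 directory.
GS3_HOME="${GS3_HOME:-$HOME/greenstone3}"

# check_tool NAME: report whether NAME can be found on the PATH.
# (Illustrative helper, not part of Greenstone.)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Source the setup script from inside the greenstone3 directory, if it exists.
# "." is the portable spelling of "source".
if [ -f "$GS3_HOME/gs3-setup.sh" ]; then
  cd "$GS3_HOME" && . ./gs3-setup.sh
fi

# Assumption: the extension's setup puts the tesseract binary on the PATH.
check_tool tesseract
```

If this prints "missing: tesseract", it may simply mean the terminal was opened before the extension was installed, or that the extension exposes the program some other way.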
 +
 +
 +==== Tesseract Plugins ====
 +
 +The tesseract extension comes with two plugins: TesseractTextExtractor and TesseractImagePlugin.
 +TesseractTextExtractor is a helper plugin that will run Tesseract on an image, producing a text file.
 +TesseractImagePlugin can be used in place of ImagePlugin, adding Tesseract OCR support.
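To give a feel for the image-to-text step, here is a minimal shell sketch that runs the standalone tesseract command on one image. It is not how TesseractTextExtractor is implemented (this page does not say how the plugin invokes the program); ''text_path_for'' and ''ocr_image'' are hypothetical helpers, and the only behaviour relied on is that ''tesseract IMAGE OUTPUTBASE'' writes its text to ''OUTPUTBASE.txt''.

```shell
# text_path_for IMAGE: print the text file name tesseract would produce
# when the image's own name (minus extension) is used as the output base.
# (Illustrative helper, not part of Greenstone.)
text_path_for() {
  printf '%s.txt\n' "${1%.*}"
}

# ocr_image IMAGE: run tesseract on IMAGE and print the path of the text file.
# Returns non-zero if tesseract is not on the PATH or the OCR run fails.
ocr_image() {
  img="$1"
  base="${img%.*}"
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi
  # tesseract appends ".txt" to the output base itself.
  tesseract "$img" "$base" && text_path_for "$img"
}
```

For example, ''ocr_image scans/page1.png'' would be expected to leave the recognised text in ''scans/page1.txt''; the file name here is made up for illustration.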
en/user_advanced/greenstone_extensions.1753411672.txt.gz · Last modified: 2025/07/25 02:47 by kjdon