en:user_advanced:greenstone_extensions
en:user_advanced:greenstone_extensions [2025/07/25 02:47] – kjdon | en:user_advanced:greenstone_extensions [2025/07/25 02:55] (current) – [Tesseract] kjdon | ||
---|---|---|---|
Line 20: | Line 20: | ||
===== Tesseract ===== | ===== Tesseract ===== | ||
+ | |||
+ | Tesseract is an open-source OCR engine. | ||
+ | |||
+ | The tesseract extension contains the tesseract program, plus Greenstone plugins that use it during the build. | ||
+ | |||
+ | You can get the Tesseract extension in two ways: | ||
+ | |||
+ | - Download the tar or zip file from [[https:// | ||
+ | |||
+ | - Check out the source and compile it: | ||
+ | < | ||
+ | cd greenstone3/ | ||
+ | svn co https:// | ||
+ | cd tesseract | ||
+ | ./ | ||
+ | </ | ||
+ | |||
+ | Once installed (by either method), you will need to open a new terminal and source gs3-setup.sh so that the extension's environment variables are set. | ||
+ | |||
+ | |||
+ | ==== Tesseract Plugins ==== | ||
+ | |||
+ | The tesseract extension comes with two plugins: TesseractTextExtractor and TesseractImagePlugin. | ||
+ | TesseractTextExtractor is a helper plugin that will run Tesseract on an image, producing a text file. | ||
+ | TesseractImagePlugin can replace ImagePlugin, |
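The post-install step described in the Tesseract section (open a new terminal and source gs3-setup.sh) could be wrapped in a small helper along these lines. This is only a sketch: the function name `source_gs3_setup` is hypothetical, the Greenstone directory is passed in by the caller, and it assumes gs3-setup.sh is meant to be sourced from inside its own directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenstone): source gs3-setup.sh from a
# given Greenstone 3 directory so its environment variables are exported
# into the current shell.
source_gs3_setup() {
    local gs3_dir="${1:-.}"
    local prev_dir="$PWD"
    local rc

    if [ ! -f "$gs3_dir/gs3-setup.sh" ]; then
        echo "gs3-setup.sh not found in $gs3_dir" >&2
        return 1
    fi

    # Clear this function's positional parameters so they are not
    # passed on to the sourced script by accident.
    set --

    # Assumption: the script expects to be sourced from its own directory.
    cd "$gs3_dir" || return 1
    . ./gs3-setup.sh
    rc=$?

    # Return to where the caller started, keeping the script's exit status.
    cd "$prev_dir" || return 1
    return "$rc"
}
```

Called as `source_gs3_setup /path/to/greenstone3` (a placeholder path), it leaves the working directory unchanged and returns non-zero if the script is missing, so it can be used in a build script before running anything that depends on the extension.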
en/user_advanced/greenstone_extensions.1753411672.txt.gz · Last modified: 2025/07/25 02:47 by kjdon