en:user_advanced:greenstone_extensions
Differences
This shows you the differences between two versions of the page.
| en:user_advanced:greenstone_extensions [2025/07/25 02:47] – kjdon | en:user_advanced:greenstone_extensions [2025/07/25 02:55] (current) – [Tesseract] kjdon | ||
|---|---|---|---|
| Line 20: | Line 20: | ||
| ===== Tesseract ===== | ===== Tesseract ===== | ||
| + | |||
| + | Tesseract is an open-source OCR engine. | ||
| + | |||
| + | The Tesseract extension contains the tesseract program, plus Greenstone plugins that use it during a build. | ||
| + | |||
| + | You can get the Tesseract extension in two ways: | ||
| + | |||
| + | - Download the tar or zip file from [[https:// | ||
| + | |||
| + | - Check out the source and compile it: | ||
| + | < | ||
| + | cd greenstone3/ | ||
| + | svn co https:// | ||
| + | cd tesseract | ||
| + | ./ | ||
| + | </ | ||
| + | |||
| + | Once installed (by either method), you will need to open a new terminal and source gs3-setup.sh to have the extension's environment variables set. | ||
| + | |||
| + | |||
| + | ==== Tesseract Plugins ==== | ||
| + | |||
| + | The Tesseract extension comes with two plugins: TesseractTextExtractor and TesseractImagePlugin. | ||
| + | TesseractTextExtractor is a helper plugin that will run Tesseract on an image, producing a text file. | ||
| + | TesseractImagePlugin can replace ImagePlugin, | ||
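The post-install step described in the revision (open a new terminal and source gs3-setup.sh) might look something like the following. This is only a minimal sketch: `GS3_HOME` and `load_gs3_env` are hypothetical names introduced here for illustration, and the default install path is an assumption. The script is assumed to sit in the top-level greenstone3 directory and to be sourced from there.

```shell
# Hypothetical install location (an assumption); adjust to your greenstone3 directory.
GS3_HOME="${GS3_HOME:-$HOME/greenstone3}"

# Hypothetical helper: source gs3-setup.sh from the given directory so that
# the extension's environment variables are set in the current shell.
load_gs3_env() {
    if [ ! -f "$1/gs3-setup.sh" ]; then
        echo "gs3-setup.sh not found in $1" >&2
        return 1
    fi
    # Change into the directory first, since the script is assumed to be
    # sourced from the greenstone3 top level.
    cd "$1" && . ./gs3-setup.sh
}

load_gs3_env "$GS3_HOME" || echo "Greenstone environment not loaded"
```

Because the script is sourced rather than executed, any variables it exports stay in the calling shell. Running it as a child process would discard them, which is presumably why a fresh terminal plus `source` is recommended.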
en/user_advanced/greenstone_extensions.1753411672.txt.gz · Last modified: 2025/07/25 02:47 by kjdon
