This version (2014/04/14 11:52) is a draft.

List of Plugins

Greenstone incorporates plugins for many different file formats, listed below. The tables include the current plugin names, as well as names prior to the 2.81 Greenstone release and a short description of the plugin.

Each plugin has different metadata fields available for it. "Default" metadata fields will be automatically assigned (or extracted if possible), while the "Available fields" are other items of metadata that the plugin may be able to assign based on any arguments given to that plugin in the collect.cfg file.
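
As a rough illustration of where those plugin arguments come from, the sketch below pulls the plugin names and their arguments out of collect.cfg-style text. It assumes each plugin is declared on its own line as `plugin <Name> <arguments>`; the sample fragment and the `parse_plugin_lines` helper are hypothetical, written only for this example.

```python
def parse_plugin_lines(cfg_text):
    """Return a list of (plugin_name, [arguments]) found in collect.cfg-style text.

    Assumption: a plugin is declared as "plugin <Name> <args...>" on one line.
    """
    plugins = []
    for raw in cfg_text.splitlines():
        tokens = raw.split()
        # Skip blank lines, comments, and lines that are not plugin declarations.
        if len(tokens) < 2 or tokens[0] != "plugin":
            continue
        plugins.append((tokens[1], tokens[2:]))
    return plugins


# Hypothetical collect.cfg fragment, for illustration only.
SAMPLE_CFG = """\
# plugins used by this collection
plugin HTMLPlugin -metadata_fields Title,Author
plugin TextPlugin
plugin DirectoryPlugin
"""

for name, args in parse_plugin_lines(SAMPLE_CFG):
    print(name, args)
```

Plugins listed with no arguments would only assign their default fields; any "available" fields depend on the arguments found on the plugin's line.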

All top-level file plugins are derived from BasePlugin, and have the following metadata fields:

Plugin name | Default fields | Available fields
BasePlugin | Language, Encoding, Source | FirstNNNN, Keyphrases, Acronym

Top level plugins

File plugins

Plugin name (old name) | Description | Default fields | Available fields
BibTexPlugin (BibTexPlug) | Plugin that imports BibTeX files. Inherits from SplitTextFile. | Title, Creator, Abstract, Author, Booktitle, Chapter, Copyright, Date, Edition, Editor, EntryType, Journal, Keywords, Month, Note, Number, Pages, Publisher, PublisherAddress, Volume, Year
BookPlugin (BookPlug) | Plugin that imports Humanity Library collection files. A simplification of HBPlugin. Inherits from AutoExtractMetadata.
CONTENTdmPlugin (CONTENTdmPlug) | Plugin that imports RDF files in exported CONTENTdm collections. Inherits from ConvertBinaryFile, ReadXMLFile.
ConvertToRogPlugin (ConvertToRogPlug) | ?? Inherits from RogPlugin.
CSVPlugin (CSVPlug) | Plugin that imports files in comma-separated value format. A new document will be created for each line of the file. Inherits from SplitTextFile.
DatabasePlugin (DBPlug) | Plugin that extracts records from databases (requires additional Perl setup). Inherits from AutoExtractMetadata. | | (arbitrary metadata field names based on the database configuration file)
DSpacePlugin (DSpacePlug) | Plugin that imports the DSpace archive format. Inherits from BasePlugin.
EmailPlugin (EMAILPlug) | Plugin that imports saved email files (but not MS Outlook format). Inherits from SplitTextFile. | Date, DateText, From, FromAddr, FromName, Headers, Subject, Title (based on subject, from, and date), To
ExcelPlugin (ExcelPlug) | Plugin that imports Microsoft Excel files. Inherits from ConvertBinaryFile. | | (all fields as in HTMLPlug)
FavouritesPlugin (FavouritesPlug) | Plugin that imports Internet Explorer Favourites files. Inherits from ReadTextFile.
FOXPlugin (FOXPlug) | Plugin that imports FOX database files. Inherits from BasePlugin.
HBPlugin (HBPlug) | Plugin that imports an HTML book directory. Used by the Humanity Library collection. Inherits from BasePlugin.
HTMLPlugin (HTMLPlug) | Plugin that imports HTML files. Inherits from ReadTextFile, HBPlugin. | Title, URL | Author, Creator, Email (others as found in the -metadata_fields option)
HTMLImagePlugin (W3ImgPlug) | Plugin that imports HTML files, creating a Greenstone document for each image in the web page. Inherits from HTMLPlugin.
ImagePlugin (ImagePlug) | Plugin that imports JPEG, GIF, etc. Inherits from BasePlugin, ImageConverter. | Image, ImageHeight, ImageSize, ImageType, ImageWidth, ScreenHeight, screenicon, ScreenSize, ScreenType, ScreenWidth, Source, srclink, srcicon, Thumb, ThumbHeight, ThumbType, ThumbWidth
IndexPlugin (IndexPlug) | Plugin that processes an index.txt file, which lists all files to be included in the collection, plus additional metadata for those documents. Inherits from | in the index.txt file | (use metadata.xml files instead of using this plugin)
ISISPlugin (ISISPlug) | Plugin that imports CDS/ISIS database files. Inherits from SplitTextFile.
LaTeXPlugin (LaTeXPlug) | Plugin that imports LaTeX files. Inherits from ReadTextFile.
LOMPlugin (LOMPlug) | Plugin that imports LOM (Learning Object Metadata) files. Inherits from ReadTextFile.
MARCPlugin (MARCPlug) | Plugin that imports MARC metadata. Inherits from SplitTextFile. | Creator, Description, MarcIdentifier, MarcSource, URL, Publisher, Relation, Rights, Subject, Title, Type | (Metadata fields as in the marctodc.txt file)
MARCXMLPlugin (MARCXMLPlug) | Plugin that imports MARC metadata in XML format. Inherits from ReadXMLFile, ReadTextFile.
MediaInfoOGVPlugin | Plugin for importing OGV movie files. Requires MediaInfo to be installed to extract metadata.
MediaWikiPlugin (MediaWikiPlug) | Plugin that imports MediaWiki web pages. Inherits from HTMLPlugin.
MetadataCSVPlugin (MetadataCSVPlug) | Plugin that imports metadata in CSV (comma-separated value) format. The Filename field in the CSV file is used to determine which document the metadata belongs to. Inherits from BasePlugin.
MP3Plugin (MP3Plug) | Plugin that imports MP3 audio files. Inherits from BasePlugin.
NulPlugin (NULPlug) | Plugin that imports dummy files (.nul). These may be generated when bibliographic databases are 'exploded'. Inherits from BasePlugin.
OAIPlugin (OAIPlug) | Plugin that imports Open Archives Initiative (OAI) data. Inherits from ReadXMLFile, ReadTextFile. | URL, (all metadata in .oai markup file)
OggVorbisPlugin (OggVorbisPlug) | Plugin that imports Ogg Vorbis files. Inherits from BasePlugin.
OpenDocumentPlugin (OpenDocumentPlug) | Plugin that imports OASIS OpenDocument format documents (used by OpenOffice 2.0). Inherits from ReadXMLFile.
PagedImagePlugin (PagedImgPlug) | Plugin that imports sequences of image files (formats as for ImagePlug), with optional associated plain text. Each document requires an item file listing the image/text files that make up the document. Inherits from ReadXMLFile, ReadTextFile, ImageConverter. | Image, ImageHeight, ImageSize, ImageType, ImageWidth, ScreenHeight, screenicon, ScreenSize, ScreenType, ScreenWidth, Source, srclink, srcicon, Thumb, ThumbHeight, ThumbType, ThumbWidth
PDFPlugin (PDFPlug) | Plugin that imports PDF files. Inherits from ConvertBinaryFile. | | (all fields in HTMLPlug)
PostScriptPlugin (PSPlug) | Plugin that imports PostScript files. Inherits from ConvertBinaryFile. | Title | Date, Pages, (all fields in TextPlug)
PowerPointPlugin (PPTPlug) | Plugin that imports Microsoft PowerPoint files. Inherits from ConvertBinaryFile. | | (all fields in HTMLPlug)
ProCitePlugin (ProCitePlug) | Plugin that imports ProCite files. Inherits from SplitTextFile.
RealMediaPlugin (RealMediaPlug) | Plugin that imports RealMedia files. Inherits from BasePlugin.
ReferPlugin (ReferPlug) | Plugin that imports Refer files. Inherits from SplitTextFile. | Abstract, BookConfOnly, Booktitle, Copyright, Creator, Date, Editor, Keywords, Journal, JournalsOnly, Number, Pages, Publisher, Publisheraddr, Report, Title, Volume
RogPlugin (RogPlug) | Plugin that imports .rog or .mdb files. Inherits from BasePlugin.
RTFPlugin (RTFPlug) | Plugin that imports RTF files. Inherits from ConvertBinaryFile. | | (all fields in HTMLPlug)
SourceCodePlugin (SRCPlug) | Plugin that imports source code (C/C++, Perl, Shell). Inherits from ReadTextFile. | Title, filename, includes, class, classdecl
StructuredHTMLPlugin (StructuredHTMLPlug) | Plugin that imports structured HTML documents, splitting them into sections based on style information. Inherits from HTMLPlugin.
TextPlugin (TEXTPlug) | Plugin that imports plain text files. Inherits from ReadTextFile. | Title
UnknownPlugin (UnknownPlug) | Plugin that imports files with a user-specified file extension. No processing is done on the file. Instead, a fictional document is created and the file is attached to that document. Used to import files that Greenstone can't otherwise handle. Inherits from BasePlugin. | (as given in the -assoc_field plugin argument)
WordPlugin (WordPlug) | Plugin that imports Microsoft Word documents. Inherits from ConvertBinaryFile. | | (all fields in HTMLPlug)
ZIPPlugin (ZIPPlug) | Plugin that unpacks compressed or archive file formats and sends the contents down the plugin pipeline. Handled formats include gzip (.gz, .z, .tgz, .taz), bzip (.bz), bzip2 (.bz2), zip (.zip, .jar) and tar (.tar). Relies on the appropriate utility being present: gunzip, bunzip, bunzip2, unzip, tar. Inherits from BasePlugin.
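
The ZIPPlugin entry lists file suffixes and the utilities it relies on. The sketch below is not Greenstone code; it only illustrates that dependency by mapping each suffix to a utility and reporting which utilities are missing from the PATH. Pairing the formats with the utilities in the order they are listed (gzip with gunzip, bzip with bunzip, and so on) is an assumption, and the helper names are hypothetical.

```python
import os
import shutil

# Suffix -> utility, paired in the order the ZIPPlugin description lists them
# (an assumption; the plugin's real dispatch logic may differ).
UTILITY_BY_SUFFIX = {
    ".gz": "gunzip", ".z": "gunzip", ".tgz": "gunzip", ".taz": "gunzip",
    ".bz": "bunzip",
    ".bz2": "bunzip2",
    ".zip": "unzip", ".jar": "unzip",
    ".tar": "tar",
}


def required_utility(filename):
    """Return the utility for the file's final suffix, or None if it is not handled."""
    suffix = os.path.splitext(filename.lower())[1]
    return UTILITY_BY_SUFFIX.get(suffix)


def missing_utilities():
    """Return the sorted names of utilities that cannot be found on the PATH."""
    needed = set(UTILITY_BY_SUFFIX.values())
    return sorted(u for u in needed if shutil.which(u) is None)


print(required_utility("reports.tar"))
print(missing_utilities())
```

Only the final suffix is inspected, so a file such as `a.tar.gz` maps to gunzip; unpacking the inner tar archive would be a second step.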

Special plugins

Plugin name | Description
DirectoryPlugin (RecPlug) | Processes directories: recurses through a directory, passing each file it finds to the plugin pipeline. Used during collection importing and building. Inherits from PrintInfo.
MetadataXMLPlugin (MetadataXMLPlug) | Processes metadata.xml files which are generated by GLI. Used during collection importing. Inherits from BasePlugin.
OAIMetadataXMLPlugin | Version of MetadataXMLPlugin that processes metadata.xml files. Additionally, it uses the "dc.Identifier" field and extracts OAI metadata from the specified OAI server (-oai_server_http_path).
ArchivesInfPlugin (ArcPlug) | Processes the archives.inf file generated during importing. Used during collection building only. Inherits from PrintInfo.
GreenstoneXMLPlugin (GAPlug) | Processes the Greenstone archive documents. Used during collection building only. Inherits from ReadXMLFile.
GreenstoneMETSPlugin (METSPlug) | Processes Greenstone archive documents in METS form. Used during collection building only. Inherits from ReadXMLFile.
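
Because the tables pair each current plugin name with its pre-2.81 name, a configuration written for an older release can be updated mechanically. The sketch below shows one way to do that for a handful of the names listed on this page. It assumes plugins are declared in collect.cfg as `plugin <Name> <arguments>`; the `modernise_line` helper is hypothetical and the mapping is deliberately partial.

```python
# Old name -> current name, copied from the tables on this page (partial list).
OLD_TO_NEW = {
    "HTMLPlug": "HTMLPlugin",
    "W3ImgPlug": "HTMLImagePlugin",
    "TEXTPlug": "TextPlugin",
    "PSPlug": "PostScriptPlugin",
    "PPTPlug": "PowerPointPlugin",
    "DBPlug": "DatabasePlugin",
    "RecPlug": "DirectoryPlugin",
    "ArcPlug": "ArchivesInfPlugin",
    "GAPlug": "GreenstoneXMLPlugin",
    "METSPlug": "GreenstoneMETSPlugin",
}


def modernise_line(line):
    """Rewrite an old-style plugin line to use the current plugin name.

    Lines that are not plugin declarations, or that already use a current
    name, are returned unchanged. Rewritten lines are re-joined with single
    spaces, so the original spacing is not preserved.
    """
    tokens = line.split()
    if len(tokens) >= 2 and tokens[0] == "plugin" and tokens[1] in OLD_TO_NEW:
        tokens[1] = OLD_TO_NEW[tokens[1]]
        return " ".join(tokens)
    return line


print(modernise_line("plugin HTMLPlug -metadata_fields Title"))
print(modernise_line("plugin RecPlug"))
```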

Base Plugins

Plugin name | Description
PrintInfo (was part of BasPlug) | Base class for all plugins and helper plugins. Contains code for generating the output for, and for parsing the plugin arguments.
BasePlugin (was part of BasPlug) | Base class for all standard document plugins. Contains code for file blocking, handling filename encoding, associating related files, and assigning doc identifiers. Inherits from PrintInfo.
AutoExtractMetadata (was part of BasPlug) | Base class for plugins that process documents with text. Uses all the helper plugins to add extra functionality to BasePlugin, such as automatic metadata extraction. Inherits from BasePlugin and all helper plugins.
ReadTextFile (BasPlug) | Base class for plugins that process plain textual files. Contains code for reading in the file and working out the language and encoding. Inherits from AutoExtractMetadata.
ReadXMLFile (XMLPlug) | Base class for plugins that process XML files. Contains code for generating and running an XML parser. Inherits from BasePlugin.
ConvertBinaryFile (ConvertToPlug) | Base class for plugins that process binary files which are converted to text/HTML/images by running an external conversion program. Contains code for calling that program, setting up the secondary plugins which will process the converted file, and passing the file to those plugins. Inherits from AutoExtractMetadata.
SplitTextFile (SplitPlug) | Base class for plugins that process files containing many records. Contains code that splits up the text into segments, which then get processed by the top-level plugin. Inherits from ReadTextFile.
MetadataPass | On-the-side base class to BasePlugin that supports metadata plugins that utilise the metadata_read pass of
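
The "Inherits from" notes on this page form a chain from each top-level plugin back to PrintInfo. The sketch below records a few of those relationships in a dictionary and walks the chain. It is a simplification: only one parent is kept per class, so the helper plugins that AutoExtractMetadata also inherits from are left out, and the `ancestry` helper is hypothetical.

```python
# Child -> parent, taken from the "Inherits from" notes on this page.
# Simplification: one parent per class; helper plugins are omitted.
PARENT = {
    "BasePlugin": "PrintInfo",
    "AutoExtractMetadata": "BasePlugin",
    "ReadTextFile": "AutoExtractMetadata",
    "ReadXMLFile": "BasePlugin",
    "ConvertBinaryFile": "AutoExtractMetadata",
    "SplitTextFile": "ReadTextFile",
    # Two top-level plugins as examples.
    "BibTexPlugin": "SplitTextFile",
    "PDFPlugin": "ConvertBinaryFile",
}


def ancestry(name):
    """Return the inheritance chain starting at `name` and ending at the root."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(" -> ".join(ancestry("BibTexPlugin")))
# BibTexPlugin -> SplitTextFile -> ReadTextFile -> AutoExtractMetadata -> BasePlugin -> PrintInfo
```

Reading the chain this way shows why a record-splitting plugin such as BibTexPlugin also gets the language and encoding handling of ReadTextFile and the BasePlugin metadata fields.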

Helper Plugins

Plugin name | Description
BaseMediaConverter (was part of ImagePlug) | Helper plugin that provides base functionality such as file caching for media conversion. Inherits from PrintInfo.
ImageConverter (was part of ImagePlug) | Helper plugin that converts images using ImageMagick. Inherits from BaseMediaConverter.
AcronymExtractor (was part of BasPlug) | Helper plugin that locates and marks up acronyms in text. Inherits from PrintInfo.
DateExtractor (was part of BasPlug) | Helper plugin that extracts historical date information from text. Inherits from PrintInfo.
EmailAddressExtractor (was part of BasPlug) | Helper plugin that extracts email addresses from text. Inherits from PrintInfo.
GISExtractor (GISBasPlug) | Helper plugin that extracts placenames from text. Requires GIS extension to Greenstone. Inherits from PrintInfo.
KeyphraseExtractor (was part of BasPlug) | Helper plugin that generates keyphrases from text. Uses Kea keyphrase extraction system. Inherits from PrintInfo.
AutoLoadConverters | Helper plugin that dynamically loads extension converter plugins.

Extension Plugins

Extension plugins are those that do not come with Greenstone by default and must be downloaded separately. To use these plugins, download the compressed this

Plugin name | Description | Link
PDFBoxConverter | Converter plugin that runs PDFBox, which can convert PDF files of versions greater than 1.4. | Download here
OpenOfficeConverter | Converter plugin that runs Open Office to convert various source documents to HTML. | Download Open Office extension here
OpenOfficePlugin | Top-level plugin that uses Open Office to convert various types of documents.