This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.

Plugins

[2.81 and later]

Greenstone incorporates plugins for many different file formats, listed below. Plugin names changed for the 2.81 release, and the previous name is also shown. File formats for potential plugins are shown on the PotentialPlugins page.

Top-level File Plugins

Plugin name (Old name) Description
BibTexPlugin (BibTexPlug) Plugin that imports BibTeX files. Inherits from SplitTextFile.
BookPlugin (BookPlug) Plugin that imports Humanity Library collection files. A simplification of HBPlugin. Inherits from AutoExtractMetadata.
CONTENTdmPlugin (CONTENTdmPlug) Plugin that imports RDF files in exported CONTENTdm collections. Inherits from ConvertBinaryFile, ReadXMLFile.
ConvertToRogPlugin (ConvertToRogPlug) ?? Inherits from RogPlugin.
CSVPlugin (CSVPlug) Plugin that imports files in comma-separated value format. A new document is created for each line of the file. Inherits from SplitTextFile.
DatabasePlugin (DBPlug) Plugin that extracts records from databases (requires additional Perl setup). Inherits from AutoExtractMetadata.
DSpacePlugin (DSpacePlug) Plugin that imports the DSpace archive format. Inherits from BasePlugin.
EmailPlugin (EMAILPlug) Plugin that imports saved email files (but not MS Outlook format). Inherits from SplitTextFile.
ExcelPlugin (ExcelPlug) Plugin that imports Microsoft Excel files. Inherits from ConvertBinaryFile.
FavouritesPlugin (FavouritesPlug) Plugin that imports Internet Explorer Favourites files. Inherits from ReadTextFile.
FOXPlugin (FOXPlug) Plugin that imports FOX database files. Inherits from BasePlugin.
HBPlugin (HBPlug) Plugin that imports an HTML book directory. Used by the Humanity Library collection. Inherits from BasePlugin.
HTMLPlugin (HTMLPlug) Plugin that imports HTML files. Inherits from ReadTextFile, HBPlugin.
HTMLImagePlugin (W3ImgPlug) Plugin that imports HTML files, creating a Greenstone document for each image in the web page. Inherits from HTMLPlugin.
ImagePlugin (ImagePlug) Plugin that imports image files (JPEG, GIF, etc.; see http://www.imagemagick.org/www/formats.html). Inherits from BasePlugin, ImageConverter.
IndexPlugin (IndexPlug) Plugin that processes an index.txt file, which lists all files to be included in the collection, plus additional metadata for those documents. Inherits from BasePlugin.
ISISPlugin (ISISPlug) Plugin that imports CDS/ISIS database files. Inherits from SplitTextFile.
LaTeXPlugin (LaTeXPlug) Plugin that imports LaTeX files. Inherits from ReadTextFile.
LOMPlugin (LOMPlug) Plugin that imports LOM (Learning Object Metadata) files. Inherits from ReadTextFile.
MARCPlugin (MARCPlug) Plugin that imports MARC metadata. Inherits from SplitTextFile.
MARCXMLPlugin (MARCXMLPlug) Plugin that imports MARC metadata in XML format. Inherits from ReadXMLFile, ReadTextFile.
MediaWikiPlugin (MediaWikiPlug) Plugin that imports MediaWiki web pages. Inherits from HTMLPlugin.
MetadataCSVPlugin (MetadataCSVPlug) Plugin that imports metadata in CSV (comma separated value) format. The Filename field in the CSV file is used to determine which document the metadata belongs to. Inherits from BasePlugin.
MP3Plugin (MP3Plug) Plugin that imports MP3 audio files. Inherits from BasePlugin.
NulPlugin (NULPlug) Plugin that imports dummy files (.nul). These may be generated when bibliographic databases are 'exploded'. Inherits from BasePlugin.
OAIPlugin (OAIPlug) Plugin that imports Open Archives Initiative (OAI) data. Inherits from ReadXMLFile, ReadTextFile.
OggVorbisPlugin (OggVorbisPlug)
OpenDocumentPlugin (OpenDocumentPlug) Plugin that imports OASIS OpenDocument format documents (used by OpenOffice 2.0). Inherits from ReadXMLFile.
PagedImagePlugin (PagedImgPlug) Plugin that imports sequences of image files (formats as for ImagePlugin), with optional associated plain text. Each document requires an item file listing the image/text files that make up the document. Inherits from ReadXMLFile, ReadTextFile, ImageConverter.
PDFPlugin (PDFPlug) Plugin that imports PDF files. Inherits from ConvertBinaryFile.
PostScriptPlugin (PSPlug) Plugin that imports PostScript files. Inherits from ConvertBinaryFile.
PowerPointPlugin (PPTPlug) Plugin that imports Microsoft PowerPoint files. Inherits from ConvertBinaryFile.
ProCitePlugin (ProCitePlug) Plugin that imports ProCite files. Inherits from SplitTextFile.
RealMediaPlugin (RealMediaPlug) Plugin that imports RealMedia files. Inherits from BasePlugin.
ReferPlugin (ReferPlug) Plugin that imports Refer files. Inherits from SplitTextFile.
RogPlugin (RogPlug) Plugin that imports .rog or .mdb files. Inherits from BasePlugin.
RTFPlugin (RTFPlug) Plugin that imports RTF files. Inherits from ConvertBinaryFile.
SourceCodePlugin (SRCPlug) Plugin that imports source code (C/C++, Perl, Shell). Inherits from ReadTextFile.
StructuredHTMLPlugin (StructuredHTMLPlug) Plugin that imports structured HTML documents, splitting them into sections based on style information. Inherits from HTMLPlugin.
TextPlugin (TEXTPlug) Plugin that imports plain text files. Inherits from ReadTextFile.
UnknownPlugin (UnknownPlug) Plugin that imports files with a user-specified file extension. No processing is done on the file. Instead a fictional document is created and the file is attached to that document. Used to import files that Greenstone can't otherwise handle. Inherits from BasePlugin.
WordPlugin (WordPlug) Plugin that imports Microsoft Word documents. Inherits from ConvertBinaryFile.
ZIPPlugin (ZIPPlug) Plugin that unpacks compressed or archive file formats and sends the content down the plugin pipeline. Handled formats include gzip (.gz, .z, .tgz, .taz), bzip (.bz), bzip2 (.bz2), zip (.zip, .jar) and tar (.tar). Relies on the appropriate utility being present: gunzip, bunzip, bunzip2, unzip, tar. Inherits from BasePlugin.
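
The ZIPPlugin entry depends on external utilities being installed, so it can be useful to check which one a given archive would need. The following is a minimal Python sketch of that lookup, not Greenstone's own code (the plugins themselves are not written in Python). The function names `utility_for` and `is_available` are hypothetical, and the pairing of each extension with a utility is an assumption inferred from the order in which the formats and utilities are listed above.

```python
import os
import shutil

# Extension-to-utility table transcribed from the ZIPPlugin description.
# ASSUMPTION: each format is paired with the utility in the same position
# of the list (gzip -> gunzip, bzip -> bunzip, and so on).
UTILITY_FOR_EXTENSION = {
    ".gz": "gunzip", ".z": "gunzip", ".tgz": "gunzip", ".taz": "gunzip",
    ".bz": "bunzip",
    ".bz2": "bunzip2",
    ".zip": "unzip", ".jar": "unzip",
    ".tar": "tar",
}


def utility_for(filename):
    """Return the utility assumed to unpack `filename`, or None if unlisted."""
    extension = os.path.splitext(filename)[1].lower()
    return UTILITY_FOR_EXTENSION.get(extension)


def is_available(utility):
    """Report whether `utility` can be found on the current PATH."""
    return shutil.which(utility) is not None
```

For example, `utility_for("reports.tar.gz")` looks only at the final extension and returns `"gunzip"`, and `is_available("gunzip")` then tells you whether that program is installed on the machine doing the import.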

Top-level Special Plugins

Plugin name (Old name) Description
DirectoryPlugin (RecPlug) Processes directories: recurses through a directory, passing each file it finds to the plugin pipeline. Used during collection importing and building. Inherits from PrintInfo.
MetadataXMLPlugin (MetadataXMLPlug) Processes metadata.xml files which are generated by GLI. Used during collection importing. Inherits from BasePlugin.
ArchivesInfPlugin (ArcPlug) Processes the archives.inf file generated during importing. Used during collection building only. Inherits from PrintInfo.
GreenstoneXMLPlugin (GAPlug) Processes the Greenstone archive documents. Used during collection building only. Inherits from ReadXMLFile.
GreenstoneMETSPlugin (METSPlug) Processes Greenstone archive documents in METS form. Used during collection building only. Inherits from ReadXMLFile.
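
The DirectoryPlugin description (recurse through a directory and pass each file to the plugin pipeline) can be pictured with a short Python sketch. This is an illustration only, not Greenstone's implementation. `SuffixPlugin` and `import_directory` are hypothetical names, and the rule that the first plugin to accept a file stops the search is an assumption about how the pipeline behaves.

```python
import os


class SuffixPlugin:
    """Hypothetical stand-in for a file plugin that claims files by extension."""

    def __init__(self, name, suffixes):
        self.name = name
        self.suffixes = tuple(suffixes)
        self.seen = []  # paths this plugin has accepted

    def read(self, path):
        """Return True if this plugin accepted the file, False to pass it on."""
        if path.lower().endswith(self.suffixes):
            self.seen.append(path)
            return True
        return False


def import_directory(root, pipeline):
    """Walk `root` recursively and offer each file to the plugins in order.

    ASSUMPTION: the first plugin whose read() returns True consumes the file.
    Returns the list of files that no plugin accepted.
    """
    unprocessed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            # any() stops at the first plugin that accepts the file
            if not any(plugin.read(path) for plugin in pipeline):
                unprocessed.append(path)
    return unprocessed
```

In this sketch, a file that no plugin claims ends up in the returned list, which corresponds to the situation the UnknownPlugin entry is meant to handle.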

Base Plugins

Plugin name (Old name) Description
PrintInfo (was part of BasPlug) Base class for all plugins and helper plugins. Contains code for generating the output for pluginfo.pl, and for parsing the plugin arguments.
BasePlugin (was part of BasPlug) Base class for all standard document plugins. Contains code for file blocking, handling filename encoding, associating related files, and assigning doc identifiers. Inherits from PrintInfo.
AutoExtractMetadata (was part of BasPlug) Base class for plugins that process documents with text. Uses all the helper plugins to add extra functionality to BasePlugin, such as automatic metadata extraction. Inherits from BasePlugin and all helper plugins.
ReadTextFile (BasPlug) Base class for plugins that process plain textual files. Contains code for reading in the file and working out the language and encoding. Inherits from AutoExtractMetadata.
ReadXMLFile (XMLPlug) Base class for plugins that process XML files. Contains code for generating and running an XML parser. Inherits from BasePlugin.
ConvertBinaryFile (ConvertToPlug) Base class for plugins that process binary files which are converted to text/html/images by running gsConvert.pl. Contains code for calling gsConvert.pl, setting up the secondary plugins which will process the converted file, and passing the file to those plugins. Inherits from AutoExtractMetadata.
SplitTextFile (SplitPlug) Base class for plugins that process files containing many records. Contains code that splits up the text into segments, which then get processed by the top-level plugin. Inherits from ReadTextFile.
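
The inheritance chain in the table above is easier to follow when written out. The Python sketch below mirrors only the "Inherits from" relationships listed for the base plugins. It uses empty placeholder classes, not the real plugin code, and it leaves out the helper-plugin parents of AutoExtractMetadata for brevity. The `lineage` helper is a hypothetical name introduced for this illustration.

```python
# Placeholder classes mirroring the "Inherits from" column of the table.
class PrintInfo:
    pass


class BasePlugin(PrintInfo):
    pass


class AutoExtractMetadata(BasePlugin):  # helper-plugin parents omitted here
    pass


class ReadTextFile(AutoExtractMetadata):
    pass


class ReadXMLFile(BasePlugin):
    pass


class ConvertBinaryFile(AutoExtractMetadata):
    pass


class SplitTextFile(ReadTextFile):
    pass


def lineage(cls):
    """Return the class names from `cls` up to the root, nearest first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(" -> ".join(lineage(SplitTextFile)))
# prints: SplitTextFile -> ReadTextFile -> AutoExtractMetadata -> BasePlugin -> PrintInfo
```

Read this way, a plugin such as CSVPlugin (which inherits from SplitTextFile) sits at the end of the longest chain, while ReadXMLFile plugins skip the AutoExtractMetadata layer entirely.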

Helper Plugins

Plugin name (Old name) Description
BaseMediaConverter (was part of ImagePlug) Helper plugin that provides base functionality such as file caching for media conversion. Inherits from PrintInfo.
ImageConverter (was part of ImagePlug) Helper plugin that converts images using ImageMagick. Inherits from BaseMediaConverter.
Acronym (was part of BasPlug) Helper plugin that locates and marks up acronyms in text. Inherits from PrintInfo.
Date (was part of BasPlug) Helper plugin that extracts historical date information from text. Inherits from PrintInfo.
EmailAddress (was part of BasPlug) Helper plugin that extracts email addresses from text. Inherits from PrintInfo.
GIS (GISBasPlug) Helper plugin that extracts placenames from text. Requires GIS extension to Greenstone. Inherits from PrintInfo.
Keyphrase (was part of BasPlug) Helper plugin that generates keyphrases from text. Uses Kea keyphrase extraction system. Inherits from PrintInfo.
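
To give a feel for what a helper plugin such as EmailAddress does, here is a minimal Python sketch of extracting email addresses from text. It is not the pattern Greenstone uses. The regular expression is a deliberately simple assumption that will miss some valid addresses, and `extract_email_addresses` is a hypothetical name.

```python
import re

# ASSUMPTION: a simplified address pattern, chosen for readability only.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def extract_email_addresses(text):
    """Return the distinct addresses in `text`, in order of first appearance."""
    found = []
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()  # treat differently cased copies as duplicates
        if address not in found:
            found.append(address)
    return found
```

A helper of this kind would typically run over the extracted document text, with the resulting list stored as metadata alongside the document.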