When building collections, Greenstone processes each source document format by looking for a “plugin” that can handle that particular format.
Plugins are specified in the collection configuration file. Greenstone generally uses the filename to determine a document's format. For example, foo.txt is processed as a text file, foo.html as HTML, and foo.doc as a Word file.
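The filename-based choice described above can be pictured as a lookup from file extension to plugin. The sketch below is a simplified model, not Greenstone's actual mechanism. The extension table is hypothetical, and TextPlugin is an assumed name for the text-file plugin, which the text above does not name.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to plugin name. Real Greenstone
# plugins decide for themselves which files they accept; this table only
# illustrates the idea of choosing a plugin from the filename.
EXTENSION_TO_PLUGIN = {
    ".txt": "TextPlugin",   # assumed name for the plain-text plugin
    ".html": "HTMLPlugin",
    ".htm": "HTMLPlugin",
    ".doc": "WordPlugin",
    ".pdf": "PDFPlugin",
}


def guess_plugin(filename: str) -> Optional[str]:
    """Return a plugin name based only on the file's extension, or None."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "foo.HTML" -> ".html"
    return EXTENSION_TO_PLUGIN.get(suffix)


if __name__ == "__main__":
    for name in ["foo.txt", "foo.html", "foo.doc", "foo.xyz"]:
        print(name, "->", guess_plugin(name))
```

A file with an unrecognized extension gets no plugin at all, which mirrors the situation where no assigned plugin can handle a document.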
Plugins parse the imported documents and extract metadata from them. For example, HTMLPlugin converts HTML pages to the Greenstone Archive Format and extracts metadata that is explicit in the document format, such as titles enclosed in <title></title> tags.
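To make the title example concrete, here is a minimal sketch of pulling the text between <title> and </title> out of an HTML page using Python's standard html.parser module. It only illustrates the kind of explicit metadata extraction described above; it is not how HTMLPlugin is implemented, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleExtractor(HTMLParser):
    """Collects the text between the first <title> and </title> pair."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" becomes "&"
        self._in_title = False
        self._parts: List[str] = []
        self.title: Optional[str] = None  # set once </title> has been seen

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case, so <TITLE> also matches.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return the page title as a metadata value, or None if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    page = "<html><head><title> My Page </title></head><body>...</body></html>"
    print(extract_title(page))
```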
All plugins process files, but they differ in how files map to documents:
- some group several files into one document,
- some split one file into several documents (also called 'exploding'), and
- some have a one-to-one mapping.
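The three file-to-document mappings can be sketched with a toy document type. Everything here is hypothetical: the Document record, the "Source" and "Part" metadata keys, and the "---" separator used for exploding are illustrative choices, not Greenstone conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical minimal document: text plus a metadata dictionary."""
    text: str
    metadata: dict = field(default_factory=dict)


def one_to_one(filename: str, content: str) -> List[Document]:
    # One file -> one document.
    return [Document(content, {"Source": filename})]


def explode(filename: str, content: str, separator: str = "\n---\n") -> List[Document]:
    # One file -> several documents, split on a hypothetical separator line.
    parts = [p for p in content.split(separator) if p.strip()]
    return [Document(p, {"Source": filename, "Part": i}) for i, p in enumerate(parts, 1)]


def group(files: Dict[str, str]) -> List[Document]:
    # Several files -> one document; files are joined in name order.
    names = sorted(files)
    combined = "\n".join(files[name] for name in names)
    return [Document(combined, {"Source": names})]


if __name__ == "__main__":
    for doc in explode("records.txt", "first\n---\nsecond\n---\nthird"):
        print(doc.metadata["Part"], doc.text)
```

Whatever the mapping, each function returns a list of documents, so the rest of the pipeline does not need to know whether a file was grouped, exploded, or passed through unchanged.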
Greenstone includes a wide array of plugins. However, if you need to process document formats that existing plugins do not handle, format documents in some special way, or extract a new kind of metadata, you can develop new plugins.
Managing Plugins in the GLI
Plugins can be managed from the Document Plugins section of the Design panel. When you create a collection based on "New Collection", the Assigned Plugins list includes the commonly used plugins by default (e.g. HTMLPlugin, WordPlugin, PDFPlugin). If your collection will not include any document types that these plugins process, you can remove them by selecting the plugin and clicking the Remove Plugin button. For instance, if there are no PDFs in your collection, you can remove PDFPlugin. However, GreenstoneXMLPlugin is a special plugin that should not be removed unless you are changing the archive format.
If you are in Expert mode, you will also see three plugins at the bottom of the list, which can be configured but cannot be removed.
Plugins are processed in the order they appear in the list. If a document can be processed by more than one plugin in the Assigned Plugins list, the first of those plugins processes it.
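The first-match rule can be modelled as walking an ordered list and stopping at the first plugin that accepts the file. In this sketch each plugin is reduced to a name and a filename pattern. The patterns and the MyXMLPlugin entry are hypothetical, chosen only to create an overlap so the effect of ordering is visible; real plugins have their own acceptance logic.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical Assigned Plugins list, in list order: (plugin name, filename regex).
ASSIGNED_PLUGINS: List[Tuple[str, str]] = [
    ("GreenstoneXMLPlugin", r"doc\.xml$"),  # illustrative pattern only
    ("MyXMLPlugin", r"\.xml$"),             # hypothetical plugin that overlaps on .xml
    ("TextPlugin", r"\.txt$"),              # assumed name for the text plugin
]


def choose_plugin(filename: str,
                  plugins: List[Tuple[str, str]] = ASSIGNED_PLUGINS) -> Optional[str]:
    """Return the first plugin in list order whose pattern matches the filename."""
    for name, pattern in plugins:
        if re.search(pattern, filename, re.IGNORECASE):
            return name  # first match wins; later plugins never see this file
    return None  # no assigned plugin accepts the file


if __name__ == "__main__":
    print(choose_plugin("archives/doc.xml"))
    print(choose_plugin("notes.xml"))
```

Because "doc.xml" matches both XML patterns, moving MyXMLPlugin above GreenstoneXMLPlugin would change which plugin handles it, which is why the order of the Assigned Plugins list matters.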
Plugins on the command line
To find out more about any plugin, type pluginfo.pl plugin-name at the command prompt. You need to run the appropriate Greenstone setup script first, if you haven't already. On Windows, if your environment is not set up to associate files ending in .pl with Perl, type perl -S pluginfo.pl plugin-name instead. The command displays information about the plugin: which plugin-specific options it takes and which general options are allowed. For example:
perl -S pluginfo.pl PDFPlugin