Plugins are written in the Perl language. They all derive from a basic plugin called BasePlugin, which performs universally-required operations like creating a new Greenstone archive document to work with, assigning an object identifier (OID), and handling the sections in a document. Plugins are kept in the perllib/plugins directory.
An outline of program flow when using import.pl, for developers writing their own plugins:
import.pl calls the methods begin, read, then end.
The metadata_read method is only called from RecPlugin (and MetadataCSVPlugin).
All plugins inherit from BasePlugin.
Most plugins call the BasePlugin read method, then do the format-specific work in their own process method.
Plugins can implement either read or process (or both).
metadata.xml files are read by the metadata_read method.
A .csv text file whose first line contains field names is read by metadata_read.
Adding metadata
add_utf8_metadata adds metadata that is already in UTF-8.
add_metadata converts metadata that is not already in UTF-8 to UTF-8 before adding it.
Collection-specific plugins go in collect/colname/perllib/plugins, so any other collections can still use the standard ones.
Thanks to Wendy Osborn for most of this text.
If you select a plugin and press Configure Plugin…, you will see the configuration options available for the plugin. You might notice that the options are split into sections. The options at the very top are specific to the plugin; the remaining options are inherited from other plugins.
If you are creating your own plugin, you can choose to have it inherit from other, similar plugins (which, in turn, likely inherit from additional plugins). Top-level plugins (including those that you select to process documents) all inherit from other plugins.
Document processing plugins are used by the collection-building software to parse each source document in a way that depends on its format. A collection's configuration file lists all plugins that are used when building it. During the import operation, each file or directory is passed to each plugin in turn until one is found that can process it—thus earlier plugins take priority over later ones. If no plugin can process the file, a warning is printed (to standard error) and processing passes to the next file. (This is where the block_exp option can be useful—to prevent these error messages for files that might be present but don't need processing.) During building, the same procedure is used, but the archives directory is processed instead of the import directory.
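The first-match dispatch described above can be modelled in a few lines. This is only a language-neutral sketch in Python, not Greenstone's Perl code: the Plugin class, its read method's return values, and run_pipeline are hypothetical names, and the way block_exp silently claims a file is an assumption based on the description above.

```python
import re
import sys


class Plugin:
    """Hypothetical stand-in for a plugin that claims files by name pattern."""

    def __init__(self, name, process_exp, block_exp=None):
        self.name = name
        self.process_exp = re.compile(process_exp)
        self.block_exp = re.compile(block_exp) if block_exp else None

    def read(self, filename):
        # Assumption: a blocked file is claimed silently, so no warning appears.
        if self.block_exp and self.block_exp.search(filename):
            return "blocked"
        if self.process_exp.search(filename):
            return "processed"
        return None  # not ours; let the next plugin in the list try


def run_pipeline(plugins, filenames):
    """Offer each file to each plugin in order; the first to claim it wins."""
    results = {}
    for filename in filenames:
        for plugin in plugins:
            outcome = plugin.read(filename)
            if outcome is not None:
                results[filename] = (plugin.name, outcome)
                break
        else:
            # No plugin claimed the file: warn on standard error and move on.
            print(f"warning: no plugin could process {filename}", file=sys.stderr)
            results[filename] = None
    return results


pipeline = [
    Plugin("HTMLPlugin", r"\.html?$", block_exp=r"\.(gif|css)$"),
    Plugin("TEXTPlugin", r"\.(txt|html)$"),
]
results = run_pipeline(pipeline, ["a.html", "b.txt", "logo.gif", "c.xyz"])
```

Both plugins in this example match a.html, but the earlier one takes it, which is the priority rule in miniature. logo.gif is blocked without a warning, while c.xyz produces one.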
The standard Greenstone plugins are listed here. Recursion is necessary to traverse directory hierarchies. Although the import and build programs do not perform explicit recursion, some plugins cause indirect recursion by passing files or directory names into the plugin pipeline. For example, the standard way of recursing through a directory hierarchy is to specify RecPlugin, which does exactly this. If present, it should be the last element in the pipeline.
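The indirect recursion can be illustrated with a small model in which a directory-handling plugin feeds each child back into the same pipeline. This is a hedged sketch in Python rather than Greenstone's Perl implementation: walk_collection, text_plugin, rec_plugin and the nested-dict "directory tree" are all hypothetical stand-ins.

```python
def walk_collection(tree):
    """Model a directory hierarchy as nested dicts; string leaves are files."""
    processed = []

    def text_plugin(path, node):
        # Claims plain files whose name ends in .txt.
        if isinstance(node, str) and path.endswith(".txt"):
            processed.append(path)
            return True
        return False

    def rec_plugin(path, node):
        # Claims directories and passes every child back into the pipeline.
        if isinstance(node, dict):
            for name in sorted(node):
                dispatch(f"{path}/{name}", node[name])
            return True
        return False

    # The recursing plugin sits last, as recommended above.
    plugins = [text_plugin, rec_plugin]

    def dispatch(path, node):
        # any() stops at the first plugin that returns True.
        return any(plugin(path, node) for plugin in plugins)

    dispatch("import", tree)
    return processed


found = walk_collection({"a.txt": "x", "sub": {"b.txt": "y", "c.bin": "z"}})
```

The driver itself only calls dispatch once. All further traversal happens because the directory plugin re-enters the pipeline for each child.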
Some plugins are written for specific collections that have a document format not found elsewhere. These collection-specific plugins are found in the collection's perllib/plugins directory. Collection-specific plugins can be used to override general plugins with the same name.
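One way to picture the override is as a lookup that checks the collection's own plugin directory before the standard one. The sketch below is an assumption about how such a lookup could work, written in Python for illustration. resolve_plugin and its parameters are hypothetical; the only detail taken as given is that Perl modules are stored as .pm files.

```python
from pathlib import Path


def resolve_plugin(name, collection_plugin_dir, standard_plugin_dir):
    """Return the first matching plugin file, preferring the collection's copy."""
    for directory in (Path(collection_plugin_dir), Path(standard_plugin_dir)):
        candidate = directory / f"{name}.pm"
        if candidate.is_file():
            return candidate
    return None  # no plugin of that name in either location
```

Because the collection directory is searched first, a plugin there with the same name as a general one takes its place for that collection only.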
Some document-processing plugins use external programs that parse specific proprietary formats—for example, Microsoft Word—into either plain text, images, or HTML. A general plugin called ConvertToPlugin invokes the appropriate conversion program and passes the result to either TEXTPlugin or HTMLPlugin. We describe this in more detail shortly.
Some plugins have individual options, which control what they do in finer detail than the general options allow. Select a plugin from the list of plugins to view a complete list of all of its available options.
Proprietary formats pose difficult problems for any digital library system. Although documentation may be available about how they work, they are subject to change without notice, and it is difficult to keep up with changes. Greenstone has adopted the policy of using GPL (GNU General Public License) conversion utilities written by people dedicated to the task. Utilities to convert Word and PDF formats are included in the packages directory. These all convert documents to either text or HTML. HTMLPlugin and TEXTPlugin are then used to convert the result into the Greenstone archive format. ConvertToPlugin is the plugin that invokes the conversion utilities. Like BasePlugin, it is never called directly. Rather, plugins written for individual formats are derived from it: ConvertToPlugin uses Perl's dynamic inheritance scheme to inherit from either TEXTPlugin or HTMLPlugin, depending on the format to which a source document has been converted.
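The idea of choosing a parent class at run time can be shown with a short model. In Perl a package's parent list can be changed while the program runs. The Python sketch below imitates that by building the class on demand. It is an illustration under stated assumptions, not the ConvertToPlugin source: TextBase, HtmlBase and make_converting_plugin are hypothetical names, and the "conversion" is a placeholder string.

```python
class TextBase:
    """Hypothetical stand-in for the plain-text plugin."""

    def process(self, content):
        return f"text:{content}"


class HtmlBase:
    """Hypothetical stand-in for the HTML plugin."""

    def process(self, content):
        return f"html:{content}"


_BASES = {"text": TextBase, "html": HtmlBase}


def make_converting_plugin(convert_to):
    """Build a plugin class whose parent depends on the target format."""
    base = _BASES[convert_to]

    def process(self, source):
        # Placeholder for running an external conversion utility.
        converted = f"<{convert_to} from {source}>"
        # Hand the converted document to whichever parent was selected.
        return base.process(self, converted)

    return type(f"ConvertTo{base.__name__}", (base,), {"process": process})


plugin = make_converting_plugin("html")()
output = plugin.process("report.doc")
```

The derived class contains only the conversion step. Everything after conversion is inherited from the parent chosen at run time.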
When ConvertToPlugin receives a document, it calls gsConvert.pl (found in Greenstone3/gs2build/bin/scripts or GSDLHOME/bin/script, depending on the installation) to invoke the appropriate conversion utility. Once the document has been converted, it is returned to ConvertToPlugin, which invokes the text or HTML plugin as appropriate. Any plugin derived from ConvertToPlugin has an option convert_to, whose argument is either text or html, to specify which intermediate format is preferred. Text is faster, but HTML generally looks better and includes pictures.
Sometimes there are several conversion utilities for a particular format, and gsConvert may try different ones on a given document. For example, the preferred Word conversion utility, wvWare, does not cope with versions earlier than Word 6, so a program called AnyToHTML, which essentially just extracts whatever text strings can be found, is called to convert Word 5 documents.
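Trying several utilities in turn is a plain fallback chain. The sketch below models it in Python. It is not the logic of gsConvert.pl: the converter functions, the ConversionError exception and the dictionary that stands in for a document are all hypothetical.

```python
class ConversionError(Exception):
    """Raised by a converter that cannot handle the document."""


def primary_converter(doc):
    # Models a preferred utility that rejects older Word versions.
    if doc["word_version"] < 6:
        raise ConversionError("unsupported Word version")
    return f"<html>{doc['body']}</html>"


def strings_fallback(doc):
    # Models a last-resort utility that just extracts the text it can find.
    return doc["body"]


def convert(doc, converters):
    """Return (converter name, output) from the first converter that succeeds."""
    errors = []
    for converter in converters:
        try:
            return converter.__name__, converter(doc)
        except ConversionError as exc:
            errors.append(f"{converter.__name__}: {exc}")
    # Every converter failed; report all the reasons together.
    raise ConversionError("; ".join(errors))


chain = [primary_converter, strings_fallback]
old_doc = convert({"word_version": 5, "body": "hi"}, chain)
new_doc = convert({"word_version": 8, "body": "hi"}, chain)
```

Ordering the list from most to least capable means the crude extractor is used only when the preferred utility gives up.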
The steps involved in adding a new external document conversion utility are:
Greenstone incorporates plugins for many different file formats, listed on the Plugins page. But we are always looking for more! If there is a specific plugin you would like us to write on a contractual basis then contact us. Also, we welcome contributions of code to enable us to extend Greenstone. The following is a list of plugins we would like.
Documents/Office formats:
Video:
Audio:
Bibliographic:
Images:
Macintosh archives:
Others: