===== Explaining plugins =====

Plugins are written in the Perl language. They all derive from a basic plugin called //BasePlugin//, which performs universally-required operations like creating a new Greenstone archive document to work with, assigning an object identifier (OID), and handling the sections in a document. Plugins are kept in the //perllib/plugins// directory.

An outline of program flow when using ''import.pl'', for developers writing their own plugins:

  * ''import.pl'' calls the methods ''begin'', ''read'', then ''end''.
  * This starts at the import directory.
  * RecPlugin handles directories, and will look through a directory to see what files are there.
  * The ''metadata_read'' method only gets called from RecPlugin (and MetadataCSVPlugin).

All plugins inherit from BasePlugin.

  * BasePlugin implements the ''metadata_read'' and ''read'' methods.
  * The BasePlugin ''read'' method calls the ''process'' method. Most plugins call the BasePlugin ''read'' method, then do the format-specific work in their own ''process'' method.
  * Some plugins override ''read''. Plugins can implement either ''read'' or ''process'' (or both).

===== NOTE on methods - order called =====

  * **metadata_read**: first to be called, usually by RecPlugin but also by MetadataCSVPlugin
    * in RecPlugin, Greenstone ''metadata.xml'' files are read by the ''metadata_read'' method
    * in MetadataCSVPlugin, a ''.csv'' text file whose first line contains the field names is read by ''metadata_read''
  * **read**: called after ''metadata_read''
  * **process**: called last (the BasePlugin ''read'' method calls it)

Adding metadata:

  * ''add_utf8_metadata'' adds metadata that is already in utf8
  * ''add_metadata'' converts metadata that is not already in utf8 to utf8 before adding it

===== Collection-Specific Plugins =====

  * It's best to put modified plugins into **''collect///colname///perllib/plugins''**, so that other collections can still use the standard ones.
  * A collection-specific plugin must have the same name as an existing plugin if you are overriding the system-wide version of that plugin.
  * The collection-specific one is used instead of the system-wide one.
  * The collection-specific plugin appears in the GLI when you have that collection loaded.

//Thanks to Wendy Osborn for most of this text.//

===== Inheritance =====

If you select a plugin and press **Configure Plugin...**, you will see the configuration options available for the plugin. You might notice that the options are split into sections. The options at the very top are specific to the plugin; the remaining options are //inherited// from other plugins. If you are creating your own plugin, you can choose to have it inherit from other, similar plugins (which, in turn, likely inherit from additional plugins). Top-level plugins (including those that you select [[#Document processing plugins|to process documents]]) all inherit from other plugins.

==== Document processing plugins ====

Document processing plugins are used by the collection-building software to parse each source document in a way that depends on its format. A collection's configuration file lists all plugins that are used when building it.

During the import operation, each file or directory is passed to each plugin in turn until one is found that can process it, so earlier plugins take priority over later ones. If no plugin can process the file, a warning is printed (to standard error) and processing passes to the next file. (This is where the //block_exp// option can be useful: it prevents these warnings for files that might be present but don't need processing.) During building, the same procedure is used, but the //archives// directory is processed instead of the //import// directory.

The standard Greenstone plugins are [[en:plugin:index|listed here]].

Recursion is necessary to traverse directory hierarchies. Although the import and build programs do not perform explicit recursion, some plugins cause indirect recursion by passing files or directory names into the plugin pipeline.
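The dispatch rule and the indirect recursion described above can be modelled in a few lines. The following is only a conceptual sketch in Python, not Greenstone's Perl code: the names ''Plugin'', ''SuffixPlugin'', ''DirectoryWalker'', ''run_pipeline'' and ''demo'' are hypothetical, and only the behaviour is taken from the text (the first plugin that accepts a file wins, unclaimed files produce a warning on standard error, and a directory-handling plugin placed last feeds its entries back into the pipeline).

```python
import os
import sys
import tempfile


class Plugin:
    """Hypothetical stand-in for a document-processing plugin."""

    def read(self, pipeline, path, processed):
        """Return True if this plugin handled the path, False otherwise."""
        raise NotImplementedError


class SuffixPlugin(Plugin):
    """Claims regular files whose name ends with a given suffix."""

    def __init__(self, name, suffix):
        self.name = name
        self.suffix = suffix

    def read(self, pipeline, path, processed):
        if os.path.isfile(path) and path.endswith(self.suffix):
            # Record which plugin claimed which file.
            processed.append((self.name, os.path.basename(path)))
            return True
        return False


class DirectoryWalker(Plugin):
    """Claims directories and passes each entry back into the pipeline."""

    def read(self, pipeline, path, processed):
        if not os.path.isdir(path):
            return False
        for entry in sorted(os.listdir(path)):
            # Indirect recursion: the entry goes through every plugin again.
            run_pipeline(pipeline, os.path.join(path, entry), processed)
        return True


def run_pipeline(pipeline, path, processed):
    """Offer the path to each plugin in order; the first to accept it wins."""
    for plugin in pipeline:
        if plugin.read(pipeline, path, processed):
            return True
    print(f"warning: no plugin could process {path}", file=sys.stderr)
    return False


def demo():
    """Build a small import tree and run the pipeline over it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("a.txt", "b.html", "c.bin", os.path.join("sub", "d.txt")):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("x")
        pipeline = [
            SuffixPlugin("text", ".txt"),
            SuffixPlugin("html", ".html"),
            SuffixPlugin("fallback", ".txt"),  # never reached for .txt files
            DirectoryWalker(),                 # last, so it only sees leftovers
        ]
        processed = []
        run_pipeline(pipeline, root, processed)
        return processed
```

In this sketch ''demo()'' returns the ''.txt'' files claimed by the plugin named ''text'' rather than ''fallback'', because the earlier plugin takes priority, while ''c.bin'' is claimed by nothing and only triggers the warning.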
For example, the standard way of recursing through a directory hierarchy is to specify //RecPlugin//, which does exactly this. If present, it should be the last element in the pipeline.

Some plugins are written for specific collections that have a document format not found elsewhere. These collection-specific plugins are found in the collection's //perllib/plugins// directory. Collection-specific plugins can be used to override general plugins with the same name.

Some document-processing plugins use external programs that parse specific proprietary formats (for example, Microsoft Word) into plain text, images, or HTML. A general plugin called //ConvertToPlugin// invokes the appropriate conversion program and passes the result to either //TEXTPlugin// or //HTMLPlugin//. We describe this in more detail below.

Some plugins have individual options, which control what they do in finer detail than the general options allow. Select a plugin from the [[en:plugin:index|list of plugins]] to view a complete list of all of its available options.

==== Plugins to import proprietary formats ====

Proprietary formats pose difficult problems for any digital library system. Although documentation may be available about how they work, they are subject to change without notice, and it is difficult to keep up with changes. Greenstone has adopted the policy of using GPL (GNU General Public License) conversion utilities written by people dedicated to the task. Utilities to convert Word and PDF formats are included in the //packages// directory. These all convert documents to either text or HTML. Then //HTMLPlugin// and //TEXTPlugin// are used to further convert them to the Greenstone archive format.

//ConvertToPlugin// is used to include the conversion utilities. Like //BasePlugin//, it is never called directly.
Rather, plugins written for individual formats are derived from it. //ConvertToPlugin// uses Perl's dynamic inheritance scheme to inherit from either //TEXTPlugin// or //HTMLPlugin//, depending on the format to which a source document has been converted.

When //ConvertToPlugin// receives a document, it calls //gsConvert.pl// (found in ''Greenstone3/gs2build/bin/scripts'' or ''GSDLHOME/bin/script'', depending on the installation) to invoke the appropriate conversion utility. Once the document has been converted, it is returned to //ConvertToPlugin//, which invokes the text or HTML plugin as appropriate. Any plugin derived from //ConvertToPlugin// has an option //convert_to//, whose argument is either //text// or //html//, to specify which intermediate format is preferred. Text is faster, but HTML generally looks better and includes pictures.

Sometimes there are several conversion utilities for a particular format, and //gsConvert.pl// may try different ones on a given document. For example, the preferred Word conversion utility //wvWare// does not cope with documents older than Word 6, so a program called //AnyToHTML//, which essentially just extracts whatever text strings can be found, is called to convert Word 5 documents.

The steps involved in adding a new external document conversion utility are:

  * Install the new conversion utility so that it is accessible by Greenstone (put it in the //packages// directory).
  * Alter //gsConvert.pl// to use the new conversion utility. This involves adding a new clause to the //if// statement in the //main// function, and adding a function that calls the conversion utility.
  * Write a top-level plugin that inherits from //ConvertToPlugin// to catch the format and pass it on.

===== Potential Plugins =====

Greenstone incorporates plugins for many different file formats, listed on the [[en:plugin:index|Plugins]] page. But we are always looking for more! If there is a specific plugin you would like us to write on a contractual basis, please contact us. We also welcome contributions of code that help us extend Greenstone. The following is a list of plugins we would like.

**Documents/Office formats:**

  * AbiWord
  * Gnumeric Spreadsheet
  * KWord (all KOffice formats)
  * OpenOffice file formats:
    * Writer (.sxw)
    * Calc (.sxc)
    * Impress (.sxi)
    * Draw (.sxd)
  * StarOffice formats (.sdc, .sdw etc)
  * WordPerfect

**Video:**

  * MPEG
  * QuickTime (.mov)
  * AVI (Audio Video Interleave), Microsoft video

**Audio:**

  * Windows Media Audio (.wma)
  * Windows audio (.wav)
  * Sun Audio (.au)
  * Audio Interchange File Format (.aiff)
  * MIDI (.mid)
  * MIDI karaoke (.kar)
  * CD Audio (.cda)
  * Shorten (.shn)

**Bibliographic:**

  * EndNote

**Images:**

  * DjVu (.djvu)
  * Photoshop (.psd)
  * PaintShopPro (.psp)

**Macintosh archives:**

  * .hqx Mac archive
  * .sit
  * Self-extracting archive (.sea)

**Others:**

  * Scalable Vector Graphics (.svg)
  * Synchronized Multimedia Integration Language SMIL (.smil)
  * Macromedia Flash (.fla)
  * Macromedia Shockwave (.swf)
  * OpenGL
  * VRML/X3D
  * TrueType fonts (TTF)