Document Types

Greenstone can process a wide array of document types. The table below lists each document type, its file extensions, the Greenstone plugin(s) that can process it, and any notes on how that document type works with Greenstone.

Remember that every document in your collection must be processed by a plugin, or it will not be included in your collection. However, if Greenstone does not currently include a plugin that will process your document type, it is possible to create your own.

| Document Type | Extension | Plugin(s) | Notes |
|---|---|---|---|
| Word | .doc, .docx | WordPlugin | |
| Plain Text | .txt | TextPlugin | |
| Rich Text | .rtf | RTFPlugin | |
| Portable Document | .pdf | PDFPlugin | |
| PowerPoint | .ppt, .pptx | PowerPointPlugin | |
| Web pages | .html, .php, .cgi, .asp, etc. | HTMLPlugin | |
| MediaWiki | | MediaWikiPlugin | |
| Image | .jpg, .gif, .jpeg, .png, etc. | ImagePlugin | |
| Video | .avi, .mpeg, etc. | | |
| MP3 | .mp3 | MP3Plugin | |
| Scanned images with OCR | | | |
| CDS/ISIS | | ISISPlugin | |
| Comma-Separated Value | .csv | CSVPlugin, MetadataCSVPlugin | |
| ContentDM | | ContentDMPlugin | |
| DSpace | | DSpacePlugin | |
| MARC | .marc | MARCPlugin | |
| Compressed/Archived | .gz, .tgz, .z, .taz, .bz, .bz2, .zip, .jar, .tar | ZIPPlugin | |
| PMB | | ISISPlugin* | PMB ("PhpMyBibli") is an open-source integrated library management system. It supports the UNIMARC format (not MARC 21). Greenstone doesn't support PMB files. However, you can use WINISIS as a bridge: export records from PMB and import them with WINISIS, then reorganize the MARC tags to convert from UNIMARC to MARC 21, and integrate the records with Greenstone using the plugin available for CDS/ISIS. |
| Extensible Markup Language | .xml, .xsl | HTMLPlugin*, or create your own | XML files cannot be directly processed by Greenstone. The XML page provides options if you would like to include XML files in your collection. |
| Refer format bibliographies | | ReferPlugin | |

* The file type cannot be processed by this plugin directly; the file must be modified in some way first.
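
As a rough illustration of the rule that a file with no matching plugin is left out of the collection, the Python sketch below checks a list of filenames against a hand-copied subset of the table. It is not how Greenstone itself matches files to plugins; the dictionary `PLUGIN_BY_EXTENSION` and the function `split_by_plugin` are hypothetical names introduced only for this example.

```python
from pathlib import PurePath

# Assumption: a hand-copied subset of the table above. Greenstone itself
# decides which plugin handles a file; this lookup only mimics the idea.
PLUGIN_BY_EXTENSION = {
    ".doc": "WordPlugin",
    ".docx": "WordPlugin",
    ".txt": "TextPlugin",
    ".rtf": "RTFPlugin",
    ".pdf": "PDFPlugin",
    ".ppt": "PowerPointPlugin",
    ".pptx": "PowerPointPlugin",
    ".html": "HTMLPlugin",
    ".mp3": "MP3Plugin",
    ".marc": "MARCPlugin",
    ".zip": "ZIPPlugin",
}


def split_by_plugin(filenames):
    """Return (matched, unmatched): files with a known plugin, and the rest."""
    matched, unmatched = {}, []
    for name in filenames:
        # Compare extensions case-insensitively, e.g. ".PDF" and ".pdf".
        plugin = PLUGIN_BY_EXTENSION.get(PurePath(name).suffix.lower())
        if plugin is None:
            unmatched.append(name)
        else:
            matched[name] = plugin
    return matched, unmatched


matched, unmatched = split_by_plugin(["report.docx", "notes.txt", "data.xyz"])
print(matched)
print(unmatched)
```

Files that end up in `unmatched` are the ones that would need either a different format or a custom plugin.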

Associating Documents

You may want to have multiple versions of the same document in your collection, or a supplemental file for another document. For example, perhaps you want to have both the Word and PDF version of a document available, but you want them to be connected, since they have the same content. You can do this by associating documents using either the associate_ext or the associate_tail_re options, which are available for all document plugins in Greenstone.

First, you must decide which document will be the primary document. This is the document that will actually be processed by a plugin. In the plugin that will process the primary document (e.g. WordPlugin for Word documents), you can set one of the associated file options:

  • associate_ext: This option takes the file extension of the associated documents (e.g. pdf). More than one filename extension can be provided, separated by commas. For example, setting associate_ext in TextPlugin to avi,png would allow both an AVI video file (say an oral history interview) and a PNG image (say a picture of the interviewee taken at the time of the recording) to bind to a text version of the document (say a transcript of the interview). The AVI and PNG files can both be present, only one of them, or neither, and Greenstone will handle each case accordingly. This option requires that the associated document(s) have exactly the same name as the primary document apart from the file extension.
  • associate_tail_re: This more general option takes a regular expression and can group files that share a filename root but differ in the characters that follow it, before the filename extension. For instance, the Word version of the document might be my-article.doc while the PDF version is my-article-ver13.pdf. With associate_tail_re, the two files can still be processed automatically as different versions of the same document.
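
The grouping behaviour described in the two options above can be sketched in Python. This is only an illustration of the matching idea, not Greenstone's own implementation: the function names, the sample filenames, and the regular expression `-ver\d+\.pdf$` are assumptions made for this example, and the exact pattern Greenstone expects may differ.

```python
import re
from collections import defaultdict
from pathlib import PurePath


def group_by_associate_ext(filenames, primary_ext, associate_exts):
    """Pair each primary file with same-named files whose extension is listed.

    associate_exts is a comma-separated string such as "avi,png".
    """
    wanted = {ext.strip().lower() for ext in associate_exts.split(",")}
    by_stem = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        by_stem[path.stem].append(path)

    groups = {}
    for paths in by_stem.values():
        primaries = [p for p in paths if p.suffix.lower() == "." + primary_ext]
        if not primaries:
            continue  # no primary document with this name, nothing to bind to
        groups[primaries[0].name] = sorted(
            p.name for p in paths if p.suffix.lower().lstrip(".") in wanted
        )
    return groups


def group_by_associate_tail_re(filenames, primary_ext, tail_re):
    """Pair each primary file with files whose name is root + matching tail."""
    tail = re.compile(tail_re)
    primaries = {}  # filename root -> primary filename
    groups = {}     # primary filename -> associated filenames
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() == "." + primary_ext:
            primaries[path.stem] = name
            groups[name] = []

    for name in filenames:
        match = tail.search(name)
        if match and name not in groups:
            root = name[:match.start()]  # what is left once the tail is cut off
            if root in primaries:
                groups[primaries[root]].append(name)
    return groups


files = [
    "interview.txt", "interview.avi", "interview.png", "notes.txt",
    "my-article.doc", "my-article-ver13.pdf",
]
by_ext = group_by_associate_ext(files, "txt", "avi,png")
by_tail = group_by_associate_tail_re(files, "doc", r"-ver\d+\.pdf$")
print(by_ext)
print(by_tail)
```

In this sketch, interview.txt picks up both interview.avi and interview.png, notes.txt has no associated files, and my-article.doc picks up my-article-ver13.pdf because removing the matched tail leaves the same root.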