Document Types

Greenstone can process a wide array of document types. The table below lists each document type, its file extensions, the Greenstone plugin(s) that can process it, and any notes on how Greenstone handles that type.

Remember that every document in your collection must be processed by a plugin, or it will not be included in your collection. However, if Greenstone does not currently include a plugin that will process your document type, it is possible to create your own.

| Document Type | Extension | Plugin(s) | Notes |
|---|---|---|---|
| Word | .doc, .docx | WordPlugin | |
| Plain Text | .txt | TextPlugin | |
| Rich Text | .rtf | RTFPlugin | |
| Portable Document | .pdf | PDFPlugin | |
| PowerPoint | .ppt, .pptx | PowerPointPlugin | |
| Web pages | .html, .php, .cgi, .asp, etc. | HTMLPlugin | |
| MediaWiki | | MediaWikiPlugin | |
| Image | .jpg, .gif, .jpeg, .png, etc. | ImagePlugin | |
| Video | .avi, .mpeg, etc. | | |
| MP3 | .mp3 | MP3Plugin | |
| Scanned images with OCR | | | |
| CDS/ISIS | | ISISPlugin | |
| Comma-Separated Value | .csv | CSVPlugin, MetadataCSVPlugin | |
| ContentDM | | ContentDMPlugin | |
| DSpace | | DSpacePlugin | |
| MARC | .marc | MARCPlugin | |
| Compressed/Archived | .gz, .tgz, .z, .taz, .bz, .bz2, .zip, .jar, .tar | ZIPPlugin | |
| PMB | | ISISPlugin* | PMB ("PhpMyBibli") is an open source integrated library management system. It supports the UNIMARC format (not MARC 21). Greenstone doesn't support PMB files directly, but you can use WINISIS as a bridge: export the records from PMB and import them into WINISIS, reorganize the MARC tags to convert from UNIMARC to MARC 21, and then bring the records into Greenstone using the plugin available for CDS/ISIS. |
| Extensible Markup Language | .xml, .xsl | HTMLPlugin*, or create your own | XML files cannot be directly processed by Greenstone. The XML page provides options if you would like to include XML files in your collection. |
| Refer format bibliographies | | ReferPlugin | |
*The file type cannot be processed by this plugin directly; the file must be modified in some way first.
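
As a rough illustration of the rule that every document needs a plugin, the sketch below looks up a file's extension in a map transcribed from the table and reports files that no listed plugin would claim. This is not how Greenstone itself selects plugins; the names `PLUGIN_BY_EXTENSION`, `plugin_for`, and `unclaimed` are hypothetical, and only rows with an explicit extension and an unstarred plugin are included.

```python
from pathlib import PurePath

# Extension-to-plugin map copied from the table above (illustrative only).
# Rows without a listed extension or plugin, and rows marked with *, are left out.
PLUGIN_BY_EXTENSION = {
    ".doc": "WordPlugin", ".docx": "WordPlugin",
    ".txt": "TextPlugin",
    ".rtf": "RTFPlugin",
    ".pdf": "PDFPlugin",
    ".ppt": "PowerPointPlugin", ".pptx": "PowerPointPlugin",
    ".html": "HTMLPlugin", ".php": "HTMLPlugin",
    ".cgi": "HTMLPlugin", ".asp": "HTMLPlugin",
    ".jpg": "ImagePlugin", ".jpeg": "ImagePlugin",
    ".gif": "ImagePlugin", ".png": "ImagePlugin",
    ".mp3": "MP3Plugin",
    ".csv": "CSVPlugin",  # the table also lists MetadataCSVPlugin for .csv
    ".marc": "MARCPlugin",
    ".gz": "ZIPPlugin", ".tgz": "ZIPPlugin", ".z": "ZIPPlugin",
    ".taz": "ZIPPlugin", ".bz": "ZIPPlugin", ".bz2": "ZIPPlugin",
    ".zip": "ZIPPlugin", ".jar": "ZIPPlugin", ".tar": "ZIPPlugin",
}


def plugin_for(filename):
    """Return the plugin name listed for this file's extension, or None."""
    ext = PurePath(filename).suffix.lower()
    return PLUGIN_BY_EXTENSION.get(ext)


def unclaimed(filenames):
    """Return the files whose extension has no plugin in the map."""
    return [name for name in filenames if plugin_for(name) is None]


if __name__ == "__main__":
    files = ["thesis.docx", "minutes.pdf", "catalog.xml", "clip.mov"]
    for name in unclaimed(files):
        print(f"No plugin listed for {name}")
```

Files reported here would need either some preparation (as with XML) or a plugin of your own before they could appear in a collection.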

Associating Documents

You may want to have multiple versions of the same document in your collection, or a supplemental file for another document. For example, perhaps you want to have both the Word and PDF version of a document available, but you want them to be connected, since they have the same content. You can do this by associating documents using either the associate_ext or the associate_tail_re options, which are available for all document plugins in Greenstone.
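
To make the idea concrete, the sketch below pairs files that share the same root filename, treating one extension as the processed document and the others as attachments. It only models the concept of association by extension under that assumption; it is not Greenstone's code, and the function name `associate_by_extension` and its parameters are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def associate_by_extension(filenames, primary_ext, associate_exts):
    """Map each file with primary_ext to same-root files with an associated extension.

    Extensions are given with a leading dot, e.g. ".doc" and {".pdf"}.
    """
    # Group files by their path with the final extension removed.
    by_root = defaultdict(list)
    for name in filenames:
        by_root[PurePath(name).with_suffix("")].append(name)

    groups = {}
    for names in by_root.values():
        primaries = [n for n in names if PurePath(n).suffix.lower() == primary_ext]
        if not primaries:
            continue  # nothing in this group would be processed as the main document
        primary = primaries[0]
        groups[primary] = sorted(
            n for n in names
            if n != primary and PurePath(n).suffix.lower() in associate_exts
        )
    return groups


if __name__ == "__main__":
    files = ["report.doc", "report.pdf", "notes.doc", "photo.jpg"]
    for primary, attached in associate_by_extension(files, ".doc", {".pdf"}).items():
        print(primary, "->", attached)
```

In this model, `report.pdf` travels with `report.doc` instead of becoming a separate document, while `notes.doc` simply has nothing attached.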

First, you must decide which document will be the primary document. This is the document that will actually be processed by a plugin. In the plugin that will process the primary document (e.g. WordPlugin for Word documents), you can set one of the associated file options: