This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.
See this page.
(This applies to Greenstone 2.80 and earlier. Needs updating for 2.81)
"Default" means that the metadata fields will be automatically assigned (or extracted if possible), while the "Available fields" lists other items of metadata that the plugin may be able to assign based on any arguments given to that plugin in the collect.cfg file. All plugins are derived from BasPlug, and have following metadata fields:
Plugin name | Default fields | Available fields |
---|---|---|
BasPlug | Language, Encoding, Source | FirstNNNN, Keyphrases, Acronym |
In addition, many plugins have additional fields available:
Plugin name | Default fields | Available fields |
---|---|---|
BibTexPlug | Title, Creator, Abstract, Author, Booktitle, Chapter, Copyright, Date, Edition, Editor, EntryType, Journal, Keywords, Month, Note, Number, Pages, Publisher, PublisherAddress, Volume, Year | |
DBPlug | (arbitrary metadata field names based on Database configuration file) | |
EMAILPlug | Date, DateText, From, FromAddr, FromName, Headers, Subject, Title (based on subject, from, and date), To | |
ExcelPlug | (all fields as in HTMLPlug) | |
HTMLPlug | Title, URL | Author, Creator, Email (others as found in the -metadata_fields option) |
ImagePlug | Image, ImageHeight, ImageSize, ImageType, ImageWidth, ScreenHeight, screenicon, ScreenSize, ScreenType, ScreenWidth, Source, srclink, srcicon, Thumb, ThumbHeight, ThumbType, ThumbWidth | |
IndexPlug | as in the index.txt file | (use metadata.xml files instead of using this plugin) |
MARCPlug | Creator, Description, MarcIdentifier, MarcSource, URL, Publisher, Relation, Rights, Subject, Title, Type | (Metadata fields as in the marctodc.txt file) |
OAIPlug | URL, (all metadata in .oai markup file) | |
PagedImgPlug | Image, ImageHeight, ImageSize, ImageType, ImageWidth, ScreenHeight, screenicon, ScreenSize, ScreenType, ScreenWidth, Source, srclink, srcicon, Thumb, ThumbHeight, ThumbType, ThumbWidth | |
PDFPlug | (all fields in HTMLPlug) | |
PPTPlug | (all fields in HTMLPlug) | |
PSPlug | Title | Date, Pages, (all fields in TextPlug) |
ReferPlug | Abstract, BookConfOnly, Booktitle, Copyright, Creator, Date, Editor, Keywords, Journal, JournalsOnly, Number, Pages, Publisher, Publisheraddr, Report, Title, Volume | |
RTFPlug | (all fields in HTMLPlug) | |
SRCPlug | Title, filename, includes, class, classdecl | |
TEXTPlug | Title | |
UnknownPlug | (as given in the -assoc_field plugin argument) | |
WordPlug | (all fields in HTMLPlug) | |
See section two of the Developer's Guide for information about options to plugins, or run the pluginfo.pl command on the plugin name after setting up your environment for Greenstone. (For example, "perl -S pluginfo.pl BasPlug".)
In addition, every document can be manually assigned arbitrary metadata fields and values through use of metadata.xml files, as discussed in the manual.
The standard PDF Plugin can process PDF versions up to 1.4. To process later versions, you'll need to download the PDFBox extension. See 2.85 Release Notes.
Security settings in a PDF can prevent Greenstone from processing the file. You can check these in Acrobat Reader: open the File menu, choose Document Properties, and then go to the Security tab.
PDF is a "page description language". This means that the document contains objects and commands such as "draw this text here" and "draw this image here".
Greenstone uses an external program called "pdftohtml" to extract text out of PDF files. Sometimes, there is no text that can be extracted. This often depends on how the PDF was created.
format SearchVList "<td valign=top>[srclink][srcicon][/srclink]</td><td>[srclink][Title][/srclink]</td>"
UnknownPlugin is a simple plugin for importing files in formats that Greenstone doesn't know anything about. A dummy document will be created for every such file, and the file itself will be passed to Greenstone as the "associated file" of the document.
Here's an example where it is useful: A collection has pictures and includes a couple of quicktime movie files with names like DCP_0163.MOV. Rather than write a new plugin for quicktime movies, add this line to the collection configuration file:
plugin UnknownPlugin -process_extension "MOV" -assoc_field "movie"
A document is created for each movie, with the associated movie file's name in the "movie" metadata field. In the collection's format strings, use the {If} macro to output different text for each type of file, like this:
{If}{[movie],<HTML for displaying movie>} {If}{[Image],<HTML for displaying image>}
You can also add extra metadata, such as the Title, Subject, and Duration, using the Librarian Interface (or with metadata.xml files).
The -process_extension option tells UnknownPlugin which file extension it should look for. Alternatively, you can use the -process_exp option, which specifies a regular expression to match against entire filenames. You can have several UnknownPlugins specified for a collection, each processing a different kind of file.
The -assoc_field option is the name of the metadata field that will hold the associated file's name. This can be used to test for these files. You can also specify the MIME type of the files to be processed using the -mime_type option. To display the original file, use [srclink][/srclink] metadata.
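To make the difference between the two matching options concrete, here is a small Python sketch. It is an illustration only, not Greenstone code: the helper names are hypothetical, and the case-insensitive extension comparison is an assumption made for the example rather than documented plugin behaviour.

```python
import re


def matches_extension(filename: str, extension: str) -> bool:
    """Mimic the idea of -process_extension: look only at the file extension.

    Comparing case-insensitively is an assumption made for this sketch.
    """
    return filename.lower().endswith("." + extension.lower())


def matches_expression(filename: str, expression: str) -> bool:
    """Mimic the idea of -process_exp: search a regex in the whole filename."""
    return re.search(expression, filename) is not None


files = ["DCP_0163.MOV", "DCP_0164.jpg", "notes.txt"]

# Extension-based selection: simple, but can only look at the suffix.
movies_by_ext = [f for f in files if matches_extension(f, "MOV")]

# Expression-based selection: can also constrain the rest of the name.
movies_by_exp = [f for f in files if matches_expression(f, r"^DCP_\d+\.MOV$")]
```

The extension form is enough for the quicktime example above; the expression form is the one to reach for when only some files with a given extension should be picked up.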
Creating digital libraries based on CDS/ISIS databases (en Español) is a detailed guide for using CDS/ISIS databases in Greenstone.
PMB is an open source integrated library management system. The name stands for "PhpMyBibli". It supports the UNIMARC format (not MARC 21). Greenstone doesn't support PMB files. However, you can use WINISIS as a bridge: export the records from PMB and import them with WINISIS. Then you can reorganize the MARC tags to convert from UNIMARC to MARC 21, and integrate the records with Greenstone using the plugin available for CDS/ISIS (see above).
There are two main options for getting XML files into Greenstone: using XSL or writing a customised plugin.
Outside of Greenstone, you can use XSL (or other procedure) to generate either HTML, which can be processed by HTMLPlug, or Greenstone Archive files. If you generate archive files, you will not need to run the import phase of collection building. You will also not be able to build the collection in the Librarian Interface. You can use the Librarian Interface to configure your collection, but you will need to build it on the command line. See here for information about command line building.
The other option is to write a new plugin to process your particular XML format. This plugin will inherit from XMLPlug. You need to implement the "new" method (the constructor), as well as the XML parsing callback methods, such as xml_doctype, xml_start_tag, xml_end_tag, and xml_text.
The plugin will parse the source XML file and build up a doc object in memory, which gets written out as an archive file. greenstone/perllib/plugins/GreenstoneArchivesPlugin.pm is an example of a plugin that inherits from XMLPlug; you can use it as a starting point.
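The callback style described above can be sketched outside Greenstone. The following Python example is only an analogy: a real plugin would be written in Perl and inherit from XMLPlug, whereas this sketch uses Python's standard expat bindings. The class name, the sample XML, and the plain dictionary standing in for the doc object are all hypothetical.

```python
import xml.parsers.expat


class SketchXMLReader:
    """Collects metadata from a simple XML record via parser callbacks."""

    def __init__(self):
        self.metadata = {}    # stand-in for the in-memory doc object
        self._current = None  # name of the element currently being read
        self._buffer = []     # text fragments seen inside that element

    def start_tag(self, name, attrs):
        # Plays the role of xml_start_tag: remember which element we are in.
        self._current = name
        self._buffer = []

    def text(self, data):
        # Plays the role of xml_text: text may arrive in several chunks.
        if self._current is not None:
            self._buffer.append(data)

    def end_tag(self, name):
        # Plays the role of xml_end_tag: store the collected text as metadata.
        value = "".join(self._buffer).strip()
        if name == self._current and value:
            self.metadata.setdefault(name, []).append(value)
        self._current = None
        self._buffer = []

    def parse(self, xml_text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.start_tag
        parser.EndElementHandler = self.end_tag
        parser.CharacterDataHandler = self.text
        parser.Parse(xml_text, True)
        return self.metadata


reader = SketchXMLReader()
doc = reader.parse(
    "<record><Title>Annual report</Title><Creator>A. Author</Creator></record>"
)
```

The point of the pattern is that the parser drives the code: each callback sees one small event, and the plugin accumulates state between events until it has enough to add a piece of metadata to the document.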
DatabasePlugin uses Perl's DBI module to get records out of databases such as MySQL, PostgreSQL, comma-separated values (CSV), MS Excel, ODBC, Sybase, etc. You will need to have the DBI module installed, as well as the appropriate back end module(s).
Assuming you have got all the necessary modules installed, then the basic way to use DBPlug is:
Here is what I had to do to process a comma-separated-value file.
Here is what I had to do to use DBPlug to get records out of a MySQL database.
Here is what I had to do to process an Excel file.
Here is what I had to do to use DBPlug to get records out of an MS Access database.
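As a rough mental model of database import, each row returned by a query can be treated as one record whose column names become metadata field names. The Python sketch below illustrates that idea with an in-memory SQLite database. It is not how DBPlug is implemented (the plugin uses Perl's DBI and its own configuration file); the table, column names, and helper function are hypothetical.

```python
import sqlite3

# Build a tiny throwaway database so the example is self-contained.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # lets us read columns by name
connection.execute("CREATE TABLE books (Title TEXT, Creator TEXT, Date TEXT)")
connection.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Annual report", "A. Author", "2004"),
        ("Field notes", None, "2006"),
    ],
)


def rows_as_records(conn, query):
    """Turn each result row into a dict of metadata field name -> value."""
    records = []
    for row in conn.execute(query):
        # Skip empty (NULL) columns so they do not become blank metadata.
        records.append(
            {field: row[field] for field in row.keys() if row[field] is not None}
        )
    return records


records = rows_as_records(
    connection, "SELECT Title, Creator, Date FROM books ORDER BY Title"
)
```

Because the field names come straight from the query's column names, renaming a column in the SELECT (for example with AS) is enough to change the metadata name in this sketch.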
Please see the following tutorials:
CONTENTdm is a commercial digital library system (http://www.dimema.com/) that provides tools for organizing, managing and searching digital collections over the Internet.
Collections in the CONTENTdm digital library can be exported in RDF format. CONTENTdmPlugin processes the RDF file only. It treats each <rdf:Description> element in the RDF file as a document, transforms it into a Greenstone archive file, and collects its metadata along the way. CONTENTdmPlugin uses a modified XML::Parser class that can process both well-formed and not-well-formed RDF files; a warning message is output if the RDF file is not well formed. Image files are handled by PagedImgPlug, which is the secondary plugin of CONTENTdmPlugin.
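The "one <rdf:Description> element per document" idea can be illustrated with a short Python sketch. This is not the plugin's code: the sample RDF, the Dublin Core fields, and the function name are made up for the example, and unlike CONTENTdmPlugin this sketch only copes with well-formed input.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical export with two items.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/item/1">
    <dc:title>First item</dc:title>
    <dc:creator>A. Author</dc:creator>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/item/2">
    <dc:title>Second item</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def split_into_documents(rdf_text):
    """Return one metadata dict per <rdf:Description> element."""
    root = ET.fromstring(rdf_text)
    documents = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        metadata = {}
        for child in description:
            # ElementTree prefixes tags with "{namespace}"; keep the local name.
            field = child.tag.split("}", 1)[-1]
            metadata.setdefault(field, []).append((child.text or "").strip())
        documents.append(metadata)
    return documents


documents = split_into_documents(SAMPLE_RDF)
```

Each dictionary in the result corresponds to what would become one document, with the child elements of the description supplying its metadata.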
Four parameters are created in the CONTENTdm plugin:
-convert_to (html (default) | text | pagedimg): compulsory option
xslt file is applied on the RDF file to avoid some content
CONTENTdmPlugin only handles .rdf file by default
CONTENTdmPlugin blocks (jpg|jpeg|gif) files by default
For example:
plugin CONTENTdmPlugin -convert_to html -keep_original_filename
The MediaWikiPlugin processes the HTML pages, suppresses unnecessary fragments such as tabs, toolbox, and edit links, and converts files into Greenstone's internal format.
MediaWikiPlug has eight parameters:
Here is an example that uses all the options:
plugin MediaWikiPlug -show_toc -delete_toc -toc_exp <table([^>]*)id=(\"|')toc(\"|')(.|\n)*?</table> -nav_div_exp <div([^>]*)id=(\"|')p-navigation(\"|')(.|\n)*?</div> -delete_nav -searchbox_div_exp <div([^>]*)id=(\"|')p-search(\"|')(.|\n)*?</div> -delete_searchbox -remove_title_suffix_exp \s-(.*)$
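To see what two of the expressions in the example above match, they can be tried out in Python, whose regular expression syntax agrees with Perl's for these patterns. This is a sketch under assumptions: the sample page is invented, and the way the plugin itself applies the expressions (for instance, deleting the matched table of contents or stripping the matched title suffix) is inferred from the option names rather than taken from the plugin source.

```python
import re

# Expressions copied from the example plugin line.
TOC_EXP = r"<table([^>]*)id=(\"|')toc(\"|')(.|\n)*?</table>"
REMOVE_TITLE_SUFFIX_EXP = r"\s-(.*)$"

# Hypothetical fragment of a MediaWiki page.
page = (
    "<h1>Main Page - MyWiki</h1>\n"
    '<table id="toc" class="toc">\n'
    "<tr><td>Contents</td></tr>\n"
    "</table>\n"
    "<p>Body text.</p>"
)

# Removing the match of the TOC expression drops the whole contents table.
without_toc = re.sub(TOC_EXP, "", page)

# Removing the match of the suffix expression trims the site name from a title.
title = re.sub(REMOVE_TITLE_SUFFIX_EXP, "", "Main Page - MyWiki")
```

Note that the suffix expression matches from the first whitespace followed by a hyphen to the end of the string, so a title that itself contains " -" would be cut at that point.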