METS in Greenstone

In Greenstone we use METS in a very specific way: as an alternative archive format to the Greenstone Archive format. If the option '-saveas METS' is passed to import.pl (or export.pl), source documents are converted to the Greenstone METS profile, which uses Dublin Core for its metadata. The profile divides documents into sections, stores metadata at the section or document level, and uses XPointer syntax to locate the content of the source documents, which is stored in a temporary XML file. When the collection is built (indexed), the METS plugin reads the METS files back in. The plugin is therefore designed only to process METS documents that match the Greenstone METS profile.

If you want to see our METS format, you can import (or export) a collection with the '-saveas METS' option, using one of the following commands:

import.pl -saveas METS collection_name
export.pl -saveas METS collection_name

Then, in the archives (or export) directory of the collection, you will see two files: docmets.xml, which stores metadata (at the document or section level) and associated file pointers, and doctxt.xml, which stores the content of the source document in an XML format.
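For a quick overview of what a docmets.xml file contains, a small script can count its top-level METS parts. The following is a minimal sketch and not part of Greenstone. It assumes only that the file uses the standard METS namespace (http://www.loc.gov/METS/); the function name summarize_mets is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard METS namespace; assumed to be the one used in docmets.xml.
METS_NS = "http://www.loc.gov/METS/"


def summarize_mets(root):
    """Count the direct children of a METS root element, keyed by local name."""
    counts = Counter()
    for child in root:
        # ElementTree represents namespaced tags as "{namespace}localname".
        if child.tag.startswith("{" + METS_NS + "}"):
            counts[child.tag.split("}", 1)[1]] += 1
    return dict(counts)


# Example use (the path is a placeholder):
#   root = ET.parse("path/to/docmets.xml").getroot()
#   print(summarize_mets(root))
```

Elements outside the METS namespace are skipped, so the result only reflects the METS structure itself, not any embedded metadata.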

The Greenstone METS profile has been officially approved by the Library of Congress and you can view the relevant document here.

To add other kinds of METS documents to a collection, you will need to convert them to either our Greenstone Archive format or our METS Archive format. This can be done using XSLT. You could convert all the original METS documents into Greenstone METS, put them in the archives directory, and generate an archives.inf file that lists the document ids and their corresponding files. (Import a small collection, e.g. demo, into METS format and have a look at its archives.inf file to see what it's like.) Then build the collection using buildcol.pl.
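When preparing an archives.inf file by hand, it can help to first list the converted documents that are actually present in the archives directory. The sketch below is an illustration under stated assumptions, not a Greenstone tool: the function name find_mets_docs is hypothetical, and using the containing folder name as a stand-in for the document id is only a guess. It deliberately does not write archives.inf, because the exact layout of that file should be copied from one produced by a real import.

```python
import os


def find_mets_docs(archives_dir):
    """Return sorted (folder name, relative path) pairs for each docmets.xml found."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(archives_dir):
        if "docmets.xml" in filenames:
            full_path = os.path.join(dirpath, "docmets.xml")
            rel_path = os.path.relpath(full_path, archives_dir)
            # Assumption: the containing folder name stands in for the document id.
            folder = os.path.basename(dirpath)
            # Use forward slashes so the output looks the same on every platform.
            found.append((folder, rel_path.replace(os.sep, "/")))
    return sorted(found)


# Example use (the path is a placeholder):
#   for folder, rel_path in find_mets_docs("path/to/collection/archives"):
#       print(folder, rel_path)
```

The resulting pairs can be compared against the archives.inf of a small imported collection to work out what each entry needs to contain.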

Alternatively, the following should work in theory, but I have not tried it.

Put the original METS documents in the import directory and write an XSLT stylesheet to convert them to Greenstone METS format. Use METSPlug in your collection, set the process_exp option to match the files you want processed, and set the xslt option to the XSLT file that you created (relative to the Greenstone or collection directory). Then import and build as normal.