en:user_advanced:archive_formats
— | en:user_advanced:archive_formats [2023/03/13 01:46] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
+ | ====== Greenstone Archive Formats ====== | ||
+ | During the import phase of the build process, Greenstone stores all metadata (extracted metadata, extracted text, and assigned metadata) in an XML format. Then, during the actual collection building, this metadata is processed by a plugin and used to create browsing classifiers, | ||
+ | |||
+ | By default, Greenstone stores metadata in the Greenstone Archive Format (also referred to as GreenstoneXML), which is processed by **[[en: | ||
+ | * Select GreenstoneMETS as the storage format | ||
+ | * Include the **[[en: | ||
+ | |||
+ | ===== Greenstone XML Format ===== | ||
+ | During collection " | ||
+ | |||
+ | In XML, tags are enclosed in angle brackets for markup. The Greenstone archive format encodes documents that are already in HTML, and any embedded <, >, or " characters within the original text are escaped using the standard entities &lt;, &gt;, and &quot;. | ||
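The escaping convention described above can be illustrated with a minimal Python sketch. It is not Greenstone's own code: the helper names `escape_for_archive` and `unescape_from_archive` are hypothetical, and the sample HTML string is made up. It only shows how HTML markup becomes plain text that is safe to store inside an XML element, and how the original can be recovered. The standard library call also escapes `&`, as XML requires.

```python
from xml.sax.saxutils import escape, unescape

# escape() already handles &, < and >; double quotes are added explicitly.
QUOTE_ENTITIES = {'"': "&quot;"}


def escape_for_archive(html_text: str) -> str:
    """Hypothetical helper: make HTML safe to embed as XML text."""
    return escape(html_text, QUOTE_ENTITIES)


def unescape_from_archive(xml_text: str) -> str:
    """Hypothetical helper: recover the original HTML from escaped text."""
    return unescape(xml_text, {"&quot;": '"'})


# Illustrative input only, not taken from a real collection.
original = '<p class="intro">Fish & chips</p>'
escaped = escape_for_archive(original)
print(escaped)
print(unescape_from_archive(escaped) == original)
```

Because the ampersand is escaped first and unescaped last, the round trip is lossless even when the original text already contains entity-like sequences.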
+ | |||
+ | [[http:// | ||
+ | |||
+ | [[http:// | ||
+ | |||
+ | The < | ||
+ | |||
+ | Some metadata elements are special to Greenstone: | ||
+ | |||
+ | |Identifier | The Greenstone identifier for the document. Must be unique within the collection.| | ||
+ | |gsdlsourcefilename | The original file from which the archive file was generated (path relative to the import directory).| | ||
+ | |gsdldoctype | Generally set to indexed_doc.| | ||
+ | |gsdlassocfile | One or more associated files that belong to the document. These are copied into the index directory during collection building. They may include a cover image, images linked to by an HTML page, or the original file for a Word or PDF document.| | ||
+ | |assocfilepath | The sub directory of index/assoc in which this document' | ||
+ | |hascover | Set to 1 if the document has a cover image, which must be named cover.jpg.| | ||
+ | |Source | The original file name.| | ||
+ | |srclink | A link to the original source file, for file types that have been converted (such as Word, PDF) or binary file types (such as MP3, images).| | ||
+ | |srcicon | An appropriate icon for the source file type.| | ||
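As a rough illustration of how the special metadata listed above might be read back out of an archive file, the following Python sketch collects those elements from a small XML sample. Everything about the sample is an assumption made for the example: the `Archive`, `Section`, `Description`, `Metadata` and `Content` element names, the `name` attribute, and all of the values are illustrative and should be checked against a real file in your own archives directory. The function name `special_metadata` is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Names taken from the table of Greenstone-specific metadata above.
SPECIAL_NAMES = {
    "Identifier", "gsdlsourcefilename", "gsdldoctype", "gsdlassocfile",
    "assocfilepath", "hascover", "Source", "srclink", "srcicon",
}

# Assumed, simplified archive layout with made-up values.
SAMPLE = """<Archive>
  <Section>
    <Description>
      <Metadata name="Identifier">HASH0123abcd</Metadata>
      <Metadata name="gsdlsourcefilename">reports/annual.doc</Metadata>
      <Metadata name="gsdldoctype">indexed_doc</Metadata>
      <Metadata name="gsdlassocfile">cover.jpg</Metadata>
      <Metadata name="gsdlassocfile">annual.doc</Metadata>
      <Metadata name="hascover">1</Metadata>
      <Metadata name="Title">Annual Report</Metadata>
    </Description>
    <Content>&lt;p&gt;Escaped HTML goes here&lt;/p&gt;</Content>
  </Section>
</Archive>"""


def special_metadata(xml_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: group special metadata values by name."""
    root = ET.fromstring(xml_text)
    values: dict[str, list[str]] = {}
    for meta in root.iter("Metadata"):
        name = meta.get("name")
        if name in SPECIAL_NAMES:
            # A name such as gsdlassocfile may occur more than once.
            values.setdefault(name, []).append(meta.text or "")
    return values


result = special_metadata(SAMPLE)
for name, vals in sorted(result.items()):
    print(name, vals)
```

Values are kept in lists because a document can carry several associated files; ordinary metadata such as `Title` is skipped on purpose.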
+ | |||
+ | |||
+ | |||
+ | ===== Greenstone METS Format ===== | ||
+ | |||
+ | Greenstone uses METS in a very specific way: as an alternative to the Greenstone Archive format. If the option ' | ||
+ | |||
+ | If you want to see our METS format, you can import (or export) a collection and save as the " | ||
+ | |||
+ | To build a collection using the GreenstoneMETS format in the GLI: | ||
+ | * Switch to **Expert** mode ('' | ||
+ | * In the **Create** panel under **Import Options**, check '' | ||
+ | * In the **Design** panel under **Document Plugins**, remove **GreenstoneXMLPlugin** and add **GreenstoneMETSPlugin** | ||
+ | |||
+ | |||
+ | |||
+ | On the command line, | ||
+ | |||
+ | | ||
+ | |||
+ | Then, in the // | ||
+ | |||
+ | The Greenstone METS profile has been officially approved by the Library of Congress and you can view the relevant document [[http:// | ||
+ | |||
+ | To add a different kind of METS document to a collection, you will need to convert it to either the Greenstone Archive format or the Greenstone METS format. This can be done using XSLT. | ||
+ | You could convert all the original METS documents into Greenstone METS, put them in the archives directory, and generate an archives.inf file listing document IDs and their corresponding files. (Import a small collection, e.g. demo, into METS format and look at the archives.inf file to see what it looks like.) Then build the collection using buildcol.pl. | ||
+ | |||
+ | Alternatively, | ||
+ | |||
+ | Put the original METS documents in the import directory, and write an XSLT stylesheet to convert them to Greenstone METS format. Use **METSPlugin** in your collection, set the **process_exp** option to match the files you want processed, and set the **xslt** option to the XSLT file that you created (relative to the Greenstone or collection directory). | ||
+ | Then import and build as normal. |
en/user_advanced/archive_formats.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1