Metadata database files

Metadata database file types, such as MARC, OAI, CDS/ISIS, BibTeX, Refer and ProCite, generally contain multiple metadata records. These records can either be treated as documents themselves or, in some cases, can be used as metadata for other documents. In addition, a database file can be processed as a single file or can be "exploded" into individual records.

Documents or Metadata

The first thing you must determine is whether your database file will be providing metadata for other documents or will actually be documents in your collection. For example, if you want to create a collection of MARC records, every record in the MARC file will be considered a document in your collection. However, if you have created a CSV file (which can be created from a spreadsheet) with metadata for all of the documents in your collection, you would process the CSV file as metadata. To link metadata records to their corresponding documents, the records must have a metadata field containing the document's filename.
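As a rough illustration of the filename link described above, the Python sketch below reads a small CSV file and indexes each row by a filename column. The column name `Filename`, the sample rows and the helper functions are hypothetical. The sketch shows the idea of matching metadata records to documents by filename; it is not how Greenstone itself reads CSV files.

```python
import csv
import io

# Hypothetical CSV exported from a spreadsheet. The "Filename" column is an
# assumption for illustration: it names the document each row describes.
CSV_TEXT = """Filename,Title,Creator
report1.pdf,Annual Report,Smith
photo7.jpg,Harbour at Dusk,Jones
"""


def metadata_by_filename(csv_text, filename_field="Filename"):
    """Map each document filename to the remaining metadata in its row."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        filename = row.pop(filename_field, None)
        if not filename:
            # A record with no filename cannot be linked to any document.
            continue
        mapping[filename] = row
    return mapping


def link(documents, mapping):
    """Pair each document with its metadata record (empty dict if none)."""
    return {doc: mapping.get(doc, {}) for doc in documents}


if __name__ == "__main__":
    meta = metadata_by_filename(CSV_TEXT)
    print(link(["report1.pdf", "photo7.jpg", "notes.txt"], meta))
```

Documents with no matching row (here `notes.txt`) simply receive no metadata, which is why every record needs a field holding the exact filename.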

If you will be processing the database files as documents, you can either use the plugin for that file type (e.g. MARCPlugin for MARC files, OAIPlugin for OAI files, etc.) or you can "explode" the file into individual records so you can view/edit the records in the Enrich panel of the GLI (in this case, every record in the database will become a .nul file and will be processed by the NulPlugin).

If you would like to use a database file as metadata for other documents,

"Exploding" Metadata Files

Metadata database files can be imported into Greenstone, but their metadata cannot be viewed or edited in the Librarian Interface immediately. To view (and potentially edit) it there, you must first "explode" the file in the Librarian Interface. Alternatively, particularly if the master copy of the records is maintained in an external application, you can make your corrections in the program that created the file and reimport it.

"Exploding" a metadata database file splits it into individual records, with viewable metadata. This process is irreversible: the original metadata file is removed from the collection.
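Conceptually, exploding turns one multi-record file into one set of metadata per record. The sketch below shows that splitting step for the Refer format, in which records are separated by blank lines and each field line starts with `%` and a one-letter tag (for example `%A` for author, `%T` for title and `%D` for date). It is a simplified illustration with made-up sample data, not the parser Greenstone uses.

```python
def explode_refer(text):
    """Split Refer-format text into one dict per record.

    Repeated tags (such as several %A authors) are kept as lists. Lines
    without a '%' tag are treated as continuations of the previous field.
    """
    records = []
    current = {}
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
            continue
        if line.startswith("%") and len(line) >= 2:
            tag = line[1]
            current.setdefault(tag, []).append(line[2:].strip())
            last_tag = tag
        elif last_tag is not None:
            # Continuation line: extend the last value of the previous field.
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records


# Made-up sample data with two records.
SAMPLE = """%A Doe, J.
%A Roe, R.
%T First Sample Title
%D 2001

%T Second Sample Title
%D 2002
"""

for number, record in enumerate(explode_refer(SAMPLE), start=1):
    print(f"{number:06d}", record)
```

Each resulting dictionary corresponds to one exploded record whose fields can then be shown as ordinary metadata.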

Explodable files have a grey icon in the Collection tree. To explode one, right-click it and choose Explode Metadata Database. A popup window will appear with the following options for the exploding process:

^ Option ^ Description ^ Values ^
| plugin | Plugin to use for exploding | Drop-down menu of possible plugins, based on file extension |
| input_encoding | Encoding to use when reading in the database file | By default, the encoding of each document is identified automatically; for faster processing, select the specific encoding of the documents from the drop-down menu |
| metadata_set | Metadata set (namespace) to export all metadata as | Default: Exploded Metadata Set |
| document_field | The metadata element specifying the file name of documents to obtain and include in the collection | |
| document_prefix | A prefix for the document locations (for use with the document_field option) | |
| document_suffix | A suffix for the document locations (for use with the document_field option) | |
| records_per_folder | The number of records to put in each subfolder | Default: 100 |
| verbosity | Controls the quantity of output | 0-3 (0 = none; 3 = lots); Default: 1 |

The "plugin" option specifies the plugin to be used for exploding. In most cases, only one plugin will process a particular type of file, but in some cases, where different file types share the same filename extension, there may be two plugins that both process files with that extension. The "input_encoding" option can be used to specify the encoding of the database. The "metadata_set" option specifies the metadata set to which the new fields generated by exploding should be added. If none is specified, you will be prompted for what to do with each new field in the database: add it as a new element to an existing metadata set, merge with another element, or ignore.
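The metadata_set choice described above can be modelled as a small mapping step. In this sketch (all names hypothetical), passing `metadata_set` places every field in that set, while the `decisions` dictionary stands in for the per-field prompt: add the field as a new element of an existing set, merge it into an existing element, or ignore it. Treating fields with no recorded decision as ignored is an assumption of the sketch, not Greenstone behaviour; `dc.Title` and `dc.Creator` are used only as example target elements.

```python
def assign_fields(record, metadata_set=None, decisions=None):
    """Decide which metadata element each field of an exploded record goes to.

    decisions maps a field name to one of:
      ("add", set_name)  -> new element "<set_name>.<field>"
      ("merge", element) -> merged into an existing element
      ("ignore",)        -> dropped
    Fields missing from decisions are treated as ignored (an assumption).
    """
    decisions = decisions or {}
    result = {}
    for field, values in record.items():
        if metadata_set is not None:
            # A metadata set was given: every field goes into it.
            target = f"{metadata_set}.{field}"
        else:
            choice = decisions.get(field, ("ignore",))
            if choice[0] == "add":
                target = f"{choice[1]}.{field}"
            elif choice[0] == "merge":
                target = choice[1]
            else:
                continue  # ignored fields are not assigned to the document
        result.setdefault(target, []).extend(values)
    return result


record = {"T": ["Sample Title"], "A": ["Doe, J."], "X": ["internal note"]}
print(assign_fields(record, decisions={"T": ("merge", "dc.Title"),
                                       "A": ("merge", "dc.Creator")}))
```

Here the `X` field is dropped because no decision was recorded for it, mirroring the "ignore" option.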

When a file is exploded, a new empty document is created for each record, and the metadata from the record is assigned to the document. These are named using numbers such as 000001.nul, 000002.nul etc. If the "document_field" option is set (to a database field name), the value of this field, if present, will be used for the filename. The exploding process will also try to download the file and use it instead of an empty file. The "document_prefix" and "document_suffix" options can be used to make a valid URL or file path from the document_field value. The "records_per_folder" option can be used to group exploded records into sub-folders. If the database is very large, using this option will accelerate subsequent metadata editing.
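The naming rules above can be sketched as follows. Records are numbered from 1 and padded to six digits; when `document_field` is set and present in a record, a location is built as `document_prefix + value + document_suffix`. Taking the filename from the last path component of that location, and the `group0001`-style sub-folder names, are assumptions made for illustration; Greenstone's actual naming may differ.

```python
def exploded_names(records, document_field=None, document_prefix="",
                   document_suffix="", records_per_folder=100):
    """Return (folder, filename, location) for each exploded record."""
    if records_per_folder < 1:
        raise ValueError("records_per_folder must be at least 1")
    names = []
    for index, record in enumerate(records):
        # Hypothetical sub-folder naming: group0001 holds the first
        # records_per_folder records, group0002 the next, and so on.
        folder = f"group{index // records_per_folder + 1:04d}"
        value = record.get(document_field) if document_field else None
        if value:
            # Location (URL or path) the real document would be fetched from.
            location = f"{document_prefix}{value}{document_suffix}"
            # Assumption: the filename is the last component of the location.
            filename = location.rstrip("/").rsplit("/", 1)[-1]
        else:
            # No document_field value: an empty, numbered placeholder document.
            location = None
            filename = f"{index + 1:06d}.nul"
        names.append((folder, filename, location))
    return names


records = [{"file": "report1"}, {}, {"file": "report2"}]
for entry in exploded_names(records, document_field="file",
                            document_prefix="http://example.org/docs/",
                            document_suffix=".pdf", records_per_folder=2):
    print(entry)
```

With `records_per_folder=2`, the first two records share a sub-folder and the third starts a new one; the record without a `file` value keeps its numbered `.nul` name.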

Explodability is determined by file extension. In some cases, files may be incorrectly labelled as explodable if they have the same file extension as an explodable file. For example, the ProCite plugin processes files with a .txt extension, but most .txt files are plain text files, not ProCite files.
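Because explodability is decided by extension alone, any file with a matching extension is offered for exploding, whatever its contents. The sketch below illustrates this with a hypothetical extension-to-plugin table: `.txt` mapping to ProCitePlugin follows the text above, `.marc` mapping to MARCPlugin is an illustrative assumption, and `ExamplePluginA`/`ExamplePluginB` are made-up names standing in for two plugins that share an extension (the case in which the "plugin" drop-down offers a choice).

```python
import os

# Hypothetical extension-to-plugin table for illustration only.
# ".txt" -> ProCitePlugin follows the text; the other entries are assumptions.
EXPLODABLE_BY_EXTENSION = {
    ".txt": ["ProCitePlugin"],
    ".marc": ["MARCPlugin"],
    ".dat": ["ExamplePluginA", "ExamplePluginB"],  # made-up shared extension
}


def candidate_plugins(filename, table=EXPLODABLE_BY_EXTENSION):
    """Plugins offered for exploding, judged by file extension only."""
    extension = os.path.splitext(filename)[1].lower()
    return table.get(extension, [])


def looks_explodable(filename, table=EXPLODABLE_BY_EXTENSION):
    """True if any plugin claims the extension, regardless of file contents."""
    return bool(candidate_plugins(filename, table))


if __name__ == "__main__":
    # A plain-text note is flagged because it shares ProCite's .txt extension.
    print(looks_explodable("meeting_notes.txt"))
    print(candidate_plugins("records.dat"))
```

The file's contents are never inspected, which is exactly why ordinary `.txt` files can be wrongly marked as explodable.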


===== Time Taken to Explode Bibliographic Records =====

^ Number of records ^ Average size (chars) ^ GLI memory ^ Time to explode (min:sec) ^ Time to switch to Enrich panel (min:sec) ^ Time to modify a metadata entry (min:sec) ^
| 1000 | 200 - 300 | standard (128 MB) | 0:10 - 0:15 | 0:02 - 0:05 | 0:02 - 0:05 |
| 2000 | 350 - 450 | standard (128 MB) | 0:15 - 0:20 | 0:02 - 0:05 | 0:02 - 0:05 |
| 4000 | 500 - 600 | standard (128 MB) | 0:40 - 0:45 | 0:05 - 0:10 | 0:05 - 0:10 |
| 6000 | 200 - 300 | standard (128 MB) | 1:10 - 1:20 | 0:05 - 0:10 | 0:05 - 0:10 |
| 8000 | 400 - 500 | standard (128 MB) | 2:30 - 3:00 | 0:05 - 0:10 | 0:10 - 0:15 |
| 10000 | 300 - 400 | increased to 256 MB | 2:30 - 3:00 | 0:05 - 0:10 | 0:10 - 0:15 |
| 12000 | 300 - 400 | increased to 256 MB | 2:30 - 3:00 | 0:05 - 0:10 | 0:15 - 0:20 |
| 14000 | 400 - 500 | increased to 256 MB | 4:30 - 5:00 | 0:10 - 0:15 | 0:20 - 0:25 |
| 16000 | 400 - 500 | increased to 256 MB | 6:30 - 7:00 | 0:10 - 0:15 | 0:25 - 0:30 |
| 19000 | 350 - 450 | increased to 256 MB | 9:30 - 10:00 | 0:10 - 0:15 | 0:25 - 0:30 |
Note: This experiment was carried out on a 2.16 GHz computer with 1 GB RAM. Results will vary on different computers.
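To compare the rows, the sketch below converts each "Time to explode" range from the table above to seconds, takes the midpoint, and divides the number of records by it. The figures are only indicative: average record size and GLI memory differ between rows, so this is not a controlled comparison. On these midpoints the rate ranges from about 114 records per second (2000 records) down to about 32 records per second (19000 records).

```python
def to_seconds(mmss):
    """Convert a "min:sec" string such as "9:30" to seconds."""
    minutes, seconds = mmss.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def midpoint_seconds(time_range):
    """Midpoint of a "min:sec - min:sec" range, in seconds."""
    low, high = time_range.split("-")
    return (to_seconds(low) + to_seconds(high)) / 2


# (number of records, "Time to explode" range) copied from the table above.
EXPLODE_TIMES = [
    (1000, "0:10 - 0:15"),
    (2000, "0:15 - 0:20"),
    (4000, "0:40 - 0:45"),
    (6000, "1:10 - 1:20"),
    (8000, "2:30 - 3:00"),
    (10000, "2:30 - 3:00"),
    (12000, "2:30 - 3:00"),
    (14000, "4:30 - 5:00"),
    (16000, "6:30 - 7:00"),
    (19000, "9:30 - 10:00"),
]

for records, time_range in EXPLODE_TIMES:
    seconds = midpoint_seconds(time_range)
    print(f"{records:>6} records: ~{records / seconds:.0f} records/sec")
```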

We plan to add a progress bar during the exploding operation.

We hope to speed up the times taken to switch into the Enrich panel and edit metadata there.

en/filetype/metadata_database_files.txt · Last modified: 2023/03/13 01:46 by