This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.

More about Metadata

How to manually specify filenames in metadata.xml

If you write your own metadata.xml files to specify which metadata is attached to which folders and files, each <FileName> element must be a regular expression, and any file paths in it must be in URI format (that is, with forward slashes). Because these file paths are regular expressions, a backslash can be used to escape special characters; for example, "\." matches a literal full stop.

An example of a valid metadata.xml file is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DirectoryMetadata SYSTEM "http://greenstone.org/dtd/DirectoryMetadata/1.0/DirectoryMetadata.dtd">
<DirectoryMetadata>
    <FileSet>
        <FileName>pinky/golala/filename1\.txt</FileName>
        <Description>
            <Metadata name="dc.Title">Lala</Metadata>
        </Description>
    </FileSet>
    <FileSet>
        <FileName>pinky/nono/filename2\.txt</FileName>
        <Description>
            <Metadata name="dc.Title">Nono</Metadata>
        </Description>
    </FileSet>
    <FileSet>
        <FileName>pinky/toto/filename3\.txt</FileName>
        <Description>
            <Metadata name="dc.Title">Toto</Metadata>
        </Description>
    </FileSet>
</DirectoryMetadata>
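
To illustrate why the dot is escaped in the <FileName> values above, here is a minimal Python sketch that turns a literal path into such a pattern. The helper name to_filename_pattern is hypothetical and is not part of Greenstone. Python's re module is only a stand-in for the regular expression engine Greenstone uses, and whether Greenstone anchors the pattern to the whole path is not stated here, so the use of fullmatch is an assumption made for the demonstration.

```python
import re
from pathlib import PureWindowsPath


def to_filename_pattern(path: str) -> str:
    """Hypothetical helper: turn a literal relative path into a <FileName> regex."""
    # PureWindowsPath accepts either separator; as_posix() emits forward slashes.
    forward = PureWindowsPath(path).as_posix()
    # re.escape (Python 3.7+) leaves "/" untouched and escapes "." and other
    # characters that are special in regular expressions.
    return re.escape(forward)


pattern = to_filename_pattern(r"pinky\golala\filename1.txt")
print(pattern)

# Without escaping, "." matches any single character, so an unintended
# file name would also match. fullmatch is an assumption of this sketch.
loose = "pinky/golala/filename1.txt"
print(bool(re.fullmatch(loose, "pinky/golala/filename1_txt")))
print(bool(re.fullmatch(pattern, "pinky/golala/filename1_txt")))
```

The escaped pattern matches only the intended file name, while the unescaped one would also accept names that differ at the position of the dot.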

Can I get any information about the metadata coverage in my collection?

Metadata coverage statistics can be gathered during collection building by adding the line store_metadata_coverage true to the collection's etc/collect.cfg file. Rebuild the collection (there is no need to re-import); the collection's GDBM database will then contain the following information in its 'collection' entry. The examples are from the demo collection.

  • Which metadata sets have been used in the collection.
<metadataset>dls
<metadataset>ex
  • Which elements are present in each metadata set.
<metadatalist-ex>URL
<metadatalist-ex>Plugin
<metadatalist-ex>Encoding
<metadatalist-ex>Language
<metadatalist-ex>SourceFile
<metadatalist-ex>Source
<metadatalist-ex>FileSize
<metadatalist-ex>Title
<metadatalist-dls>Subject
<metadatalist-dls>Language
<metadatalist-dls>Keyword
<metadatalist-dls>Organization
<metadatalist-dls>Title
  • The frequency of each metadata element.
<metadatafreq-dls-Subject>17
<metadatafreq-dls-Title>11
<metadatafreq-dls-Organization>11
<metadatafreq-dls-Keyword>6
<metadatafreq-dls-Language>11
<metadatafreq-ex-SourceFile>11
<metadatafreq-ex-Plugin>11
<metadatafreq-ex-URL>11
<metadatafreq-ex-Title>11
<metadatafreq-ex-Encoding>11
<metadatafreq-ex-FileSize>11
<metadatafreq-ex-Language>11
<metadatafreq-ex-Source>11

Note: to view all the entries in the GDBM database, run:

 db2txt path-to-collection/index/text/collname.gdb > database.txt
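
Once the database has been dumped to a text file, the frequency lines can be collected into a small summary. The following is a minimal Python sketch, not a Greenstone tool: the function name read_metadata_frequencies is hypothetical, and it assumes that each line has exactly the <metadatafreq-SET-ELEMENT>COUNT shape shown in the examples above and that metadata set names contain no hyphen.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the examples above:
#   <metadatafreq-SET-ELEMENT>COUNT
# The set name is assumed to contain no hyphen.
FREQ_LINE = re.compile(r"^<metadatafreq-([^-<>]+)-([^<>]+)>(\d+)$")


def read_metadata_frequencies(lines):
    """Hypothetical helper: map each metadata set to {element: frequency}."""
    freqs = defaultdict(dict)
    for line in lines:
        match = FREQ_LINE.match(line.strip())
        if match:  # every other kind of line is ignored
            set_name, element, count = match.groups()
            freqs[set_name][element] = int(count)
    return dict(freqs)


# A few lines in the format shown above, standing in for database.txt.
sample = """\
<metadataset>dls
<metadataset>ex
<metadatalist-dls>Subject
<metadatafreq-dls-Subject>17
<metadatafreq-dls-Keyword>6
<metadatafreq-ex-Title>11
""".splitlines()

freqs = read_metadata_frequencies(sample)
for set_name, elements in sorted(freqs.items()):
    for element, count in sorted(elements.items()):
        print(f"{set_name}.{element}: {count}")
```

To run it against a real dump, pass an open file object for database.txt instead of the sample lines; the function only needs an iterable of strings.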
old/more_about_metadata.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1