IndexPlugin
This recursive plugin processes an index.txt file. The index.txt file should contain the list of files to be included in the collection followed by any extra metadata to be associated with each file.
The index.txt file should be formatted as follows. The first line may be a key line (beginning with key:) that names the metadata fields (e.g. key: Subject Organization Date). Each following line contains a filename followed by the values to assign to those metadata fields. For example, with the key line above, the line irma/iw097e 3.2 unesco 1993 associates the metadata Subject=3.2, Organization=unesco, and Date=1993 with the file irma/iw097e.
Note that if any of the metadata fields use the Hierarchy classifier plugin then the value they're set to should correspond to the first field (the descriptor) in the appropriate classification file.
Metadata values may be named separately using a tag (e.g. <Subject>3.2) and this will override any name given to them by the key line. If there's no key line any unnamed metadata value will be named 'Subject'.
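The rules above can be sketched as a small parser. This is only an illustration in Python, not the plugin's actual code (the plugin itself is written in Perl). The function name parse_index is hypothetical, and several details are assumptions rather than documented behaviour: fields are taken to be separated by whitespace, a tagged value is taken not to use up a position in the key line, and any value beyond the names in the key line is taken to fall back to 'Subject'.

```python
import re

# A tagged value looks like <Name>value (e.g. <Subject>3.2).
TAG_RE = re.compile(r"^<([^>]+)>(.*)$")


def parse_index(text):
    """Parse index.txt content into {filename: {field: [values]}}.

    Hypothetical helper; whitespace-separated fields are an assumption.
    """
    keys = []      # field names taken from the optional "key:" line
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("key:"):
            keys = line[len("key:"):].split()
            continue
        filename, *values = line.split()
        metadata = result.setdefault(filename, {})
        position = 0   # index of the next unnamed value within the key line
        for value in values:
            match = TAG_RE.match(value)
            if match:
                # An explicit tag overrides any name from the key line.
                name, value = match.group(1), match.group(2)
            else:
                # Assumption: values with no key name fall back to 'Subject'.
                name = keys[position] if position < len(keys) else "Subject"
                position += 1
            metadata.setdefault(name, []).append(value)
    return result


if __name__ == "__main__":
    sample = "key: Subject Organization Date\nirma/iw097e 3.2 unesco 1993\n"
    print(parse_index(sample))
```

Run on the example from the text, the sketch maps irma/iw097e to Subject=3.2, Organization=unesco and Date=1993. Values are kept in lists so that a file can carry more than one value for the same field.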
The following table lists all of the configuration options available for IndexPlugin.
Option | Description | Value |
---|---|---|
IndexPlugin Options | ||
No options come directly from this plugin | ||
Options Inherited from BasePlugin | ||
process_exp | A perl regular expression to match against filenames. Matching filenames will be processed by this plugin. For example, using '(?i).html?\$' matches all documents ending in .htm or .html (case-insensitive). | |
no_blocking | Don't do any file blocking. Any associated files (e.g. images in a web page) will be added to the collection as documents in their own right. | |
block_exp | Files matching this regular expression will be blocked from being passed to any later plugins in the list. | |
store_original_file | Save the original source document as an associated file. Note this is already done for files like PDF, Word etc. This option is only useful for plugins that don't already store a copy of the original file. | |
associate_ext | Causes files with the same root filename as the document being processed by the plugin AND a filename extension from the comma separated list provided by this argument to be associated with the document being processed rather than handled as a separate list. | |
associate_tail_re | A regular expression to match filenames against to find associated files. Used as a more powerful alternative to associate_ext. | |
OIDtype | The method to use when generating unique identifiers for each document. | Default: auto |
OIDmetadata | Specifies the metadata element that holds the document's unique identifier, for use with -OIDtype=assigned. | Default: dc.Identifier |
no_cover_image | Do not look for a prefix.jpg file (where prefix is the same prefix as the file being processed) to associate as a cover image. | |
filename_encoding | The encoding of the source file filenames. | Default: auto |
file_rename_method | The method to be used in renaming the copy of the imported file and associated files. | Default: url |
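
To illustrate how a process_exp pattern selects filenames, the sketch below tests the example pattern from the table against a few names. It uses Python's re module as a stand-in for Perl regular expressions; the two agree for this simple pattern, but that is not guaranteed in general. It assumes the \$ in the documented example is an escaped dollar sign, so that the effective pattern is (?i).html?$. The helper name matches_process_exp is hypothetical.

```python
import re

# Effective pattern, assuming '\$' in the documented example is an escaped '$'.
PROCESS_EXP = r"(?i).html?$"


def matches_process_exp(filename, pattern=PROCESS_EXP):
    """Return True if the pattern matches somewhere in the filename."""
    return re.search(pattern, filename) is not None


if __name__ == "__main__":
    for name in ["index.html", "PAGE.HTM", "notes.txt", "page.html.bak"]:
        print(name, matches_process_exp(name))
```

Because the dot in the pattern is unescaped, it matches any single character, so a name such as axhtml would also match. Writing the pattern as (?i)\.html?$ restricts it to names with a literal dot before the extension.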