HBPlugin
Plugin which processes an HTML book directory. This plugin is used by the Humanity Development Library collections and does not handle input encodings other than ASCII or extended ASCII. The code is not very clean and could no doubt be made to run faster; by leaving it in this state we hope to encourage the use of BookPlugin instead. When creating a new collection whose files are marked up like those in the Humanity Library collections, use BookPlugin. BookPlugin accepts all input encodings but expects the marked-up files to be cleaner than those used by the Humanity Library collections.
- Processes files with extensions: This plugin does not use a process_
The following table lists all of the configuration options available for HBPlugin.
Option | Description | Value |
---|---|---|
HBPlugin Options | ||
process_exp | A perl regular expression to match against filenames. Matching filenames will be processed by this plugin. For example, using '(?i).html?\$' matches all documents ending in .htm or .html (case-insensitive). | Default: This plugin does not use a process_exp |
input_encoding | The encoding of the source documents. Documents will be converted from these encodings and stored internally as utf8. | Default: iso_8859_1 List |
Options Inherited from BasePlugin | ||
process_exp | A perl regular expression to match against filenames. Matching filenames will be processed by this plugin. For example, using '(?i).html?\$' matches all documents ending in .htm or .html (case-insensitive). | |
no_blocking | Don't do any file blocking. Any associated files (e.g. images in a web page) will be added to the collection as documents in their own right. | |
block_exp | Files matching this regular expression will be blocked from being passed to any later plugins in the list. | |
store_original_file | Save the original source document as an associated file. Note this is already done for files like PDF, Word etc. This option is only useful for plugins that don't already store a copy of the original file. | |
associate_ext | Causes files with the same root filename as the document being processed by the plugin AND a filename extension from the comma separated list provided by this argument to be associated with the document being processed rather than handled as a separate list. | |
associate_tail_re | A regular expression to match filenames against to find associated files. Used as a more powerful alternative to associate_ext. | |
OIDtype | The method to use when generating unique identifiers for each document. | Default: auto List |
OIDmetadata | Specifies the metadata element that holds the document's unique identifier, for use with -OIDtype=assigned. | Default: dc.Identifier |
no_cover_image | Do not look for a prefix.jpg file (where prefix is the same prefix as the file being processed) to associate as a cover image. | |
filename_encoding | The encoding of the source file filenames. | Default: auto List |
file_rename_method | The method to be used in renaming the copy of the imported file and associated files. | Default: url List |
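The filename matching described for process_exp can be tried outside Greenstone. The following is a minimal Python sketch, not Greenstone code: Greenstone evaluates the expression in Perl, and the helper name `matches_process_exp` and the sample filenames are made up for illustration. It assumes that the `\$` in the documented example is simply an escaped `$` end-of-string anchor. It also shows that the unescaped `.` in that example matches any character, so a stricter pattern would escape it.

```python
import re

# Pattern as documented, with the escaped "\$" read as a plain "$" anchor
# (assumption: the backslash only protects "$" from string interpolation).
DOCUMENTED = r"(?i).html?$"
# Stricter variant with the dot escaped so it only matches a literal ".".
STRICT = r"(?i)\.html?$"


def matches_process_exp(pattern: str, filename: str) -> bool:
    """Return True if the pattern matches anywhere in the filename."""
    return re.search(pattern, filename) is not None


if __name__ == "__main__":
    # Hypothetical filenames, chosen to show the difference between the patterns.
    for name in ["index.html", "PAGE.HTM", "notes.txt", "my_xhtml"]:
        print(name,
              matches_process_exp(DOCUMENTED, name),
              matches_process_exp(STRICT, name))
```

In this sketch both patterns accept `index.html` and `PAGE.HTM` and reject `notes.txt`, while only the documented pattern accepts `my_xhtml`, because its `.` is a wildcard rather than a literal dot.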
input_encoding option values
Value | Description |
---|---|
ascii | Plain 7 bit ASCII. This may be a bit faster than using iso_8859_1. Beware of using this when the text may contain characters outside the plain 7 bit ASCII set (e.g. German or French text containing accents); use iso_8859_1 instead. |
iso_8859_1 | Latin1 (western languages) |
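To illustrate why the choice between the two values matters, here is a minimal Python sketch of converting source bytes to utf8. It is not how Greenstone implements the conversion: the function `to_utf8`, the `CODECS` mapping to Python codec names, and the sample bytes are all assumptions made for the example.

```python
# Hypothetical mapping from the option values listed above to Python codec names.
CODECS = {"ascii": "ascii", "iso_8859_1": "latin-1"}


def to_utf8(raw: bytes, input_encoding: str = "iso_8859_1") -> bytes:
    """Decode raw bytes using the given input encoding and re-encode as UTF-8."""
    text = raw.decode(CODECS[input_encoding])
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"caf\xe9"  # "café" in Latin-1; byte 0xE9 is outside 7 bit ASCII
    print(to_utf8(sample, "iso_8859_1"))
    try:
        to_utf8(sample, "ascii")
    except UnicodeDecodeError as err:
        # Python refuses the byte outright when decoding as ASCII.
        print("ascii cannot decode this input:", err.reason)
```

Python raises an error when the accented byte is decoded as ASCII; how HBPlugin itself reacts in that situation is not stated here, so treat the error as a property of the sketch only.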
en/plugin/hbplugin.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1