This version (2019/09/17 10:40) is a draft.

List classifier

A general and flexible list classifier with most of the abilities of AZCompactList, but with better Unicode, metadata and sorting capabilities.

The following table lists all of the configuration options available for List.

Option | Description | Values
List Options
metadata (REQUIRED) Metadata fields used for classification. Use '/' to separate the levels in the hierarchy and ';' or ',' to separate a list of metadata fields within each level.
For example, "dc.Title,Title" will make a titles classifier, using either or both of the titles (depending on -metadata_selection_mode_within_level). "Date/Title" will partition by Date and, inside each Date, partition by Title.
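The separator rules for -metadata can be illustrated with a short Python sketch. This is not the classifier's own code; the function name parse_metadata_spec is hypothetical and only shows how a value splits into levels and fields.

```python
import re

def parse_metadata_spec(spec):
    """Split a -metadata value into levels, each a list of field names.

    Hypothetical helper for illustration only.
    """
    levels = []
    for level in spec.split("/"):          # '/' separates hierarchy levels
        # ';' or ',' separates alternative fields within one level
        fields = [f.strip() for f in re.split(r"[;,]", level) if f.strip()]
        levels.append(fields)
    return levels

print(parse_metadata_spec("dc.Title,Title"))  # [['dc.Title', 'Title']]
print(parse_metadata_spec("Date/Title"))      # [['Date'], ['Title']]
```

The first call yields one level with two candidate fields; the second yields two levels with one field each.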
metadata_selection_mode_within_level Determines how many metadata values the document is classified by, within each level. Use '/' to separate the levels. firstvalue: Only classify by a single metadata value, the first one encountered.
firstvalidmetadata: (Default) Classify by all the metadata values of the first element in the list that has values.
allvalues: Classify by all metadata values found, from all elements in the list.
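The three selection modes can be compared with a minimal Python sketch. It assumes a document's metadata is held as a dict mapping each field name to a list of values; that shape and the name select_values are illustrative assumptions, not part of the classifier.

```python
def select_values(doc_metadata, fields, mode="firstvalidmetadata"):
    """Return the values a document would be classified by at one level.

    doc_metadata: dict of field name -> list of values (assumed shape).
    fields: the metadata fields listed for this level, in order.
    """
    if mode not in ("firstvalue", "firstvalidmetadata", "allvalues"):
        raise ValueError("unknown mode: " + mode)
    if mode == "allvalues":
        # every value from every listed field
        return [v for f in fields for v in doc_metadata.get(f, [])]
    for f in fields:
        values = doc_metadata.get(f, [])
        if values:
            # firstvalue keeps only one value; firstvalidmetadata keeps
            # all values of the first field that has any
            return values[:1] if mode == "firstvalue" else list(values)
    return []

doc = {"dc.Title": [], "Title": ["A", "B"], "Alt": ["C"]}
fields = ["dc.Title", "Title", "Alt"]
print(select_values(doc, fields, "firstvalue"))          # ['A']
print(select_values(doc, fields, "firstvalidmetadata"))  # ['A', 'B']
print(select_values(doc, fields, "allvalues"))           # ['A', 'B', 'C']
```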
metadata_sort_mode_within_level How to sort the values of metadata within each partition. Use '/' to separate the levels. unicode: Sort using the Unicode Collation Algorithm. Requires http://www.unicode.org/Public/UCA/latest/allkeys.txt file to be downloaded into perl's lib/Unicode/Collate folder.
alphabetic: Sort using alphabetical ordering, including for digits. E.g. 10 would sort before 9.
alphanumeric: (Default) Sort using a more natural sort, where digits are treated as numbers and sorted numerically. E.g. 10 would sort after 9.
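The difference between alphabetic and alphanumeric ordering is easy to see in a small Python sketch. The key function alphanumeric_key is a hypothetical approximation of a natural sort and may differ in detail from what the classifier does (for example in how it treats case).

```python
import re

def alphanumeric_key(value):
    """Sort key that compares runs of digits as integers.

    re.split with a capturing group returns text at even indices and
    digit runs at odd indices, so each position always holds one type.
    """
    parts = re.split(r"(\d+)", value)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

values = ["10", "9", "2"]
print(sorted(values))                        # alphabetic: ['10', '2', '9']
print(sorted(values, key=alphanumeric_key))  # alphanumeric: ['2', '9', '10']
```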
bookshelf_type Controls when to create bookshelves. This only applies to the last level. Other levels will get bookshelf_type = always. always: Create a bookshelf icon for each value, even if there is only one item in each group at the leaf nodes.
duplicate_only: Create a bookshelf icon only when there is more than one item in each group at the leaf nodes.
never: (Default) Never create a bookshelf icon even if there is more than one item in each group at the leaf nodes.
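The bookshelf_type rule for the last level amounts to a simple decision on the size of each group. A sketch in Python, with the hypothetical function name needs_bookshelf:

```python
def needs_bookshelf(bookshelf_type, group_size):
    """Decide whether a leaf-level group gets a bookshelf icon."""
    if bookshelf_type == "always":
        return True
    if bookshelf_type == "duplicate_only":
        return group_size > 1      # only when several items share a value
    if bookshelf_type == "never":
        return False
    raise ValueError("unknown bookshelf_type: " + bookshelf_type)

for mode in ("always", "duplicate_only", "never"):
    print(mode, needs_bookshelf(mode, 1), needs_bookshelf(mode, 3))
```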
classify_sections Classify sections instead of documents.
partition_type_within_level The type of partitioning done. Can be specified for each level. Separate levels by '/'. per_letter: (Default) Create a partition for each letter (word character).
approximate_size: Create a partition per letter, then group or split the letters to get approximately the same sized partitions.
constant_size: Create partitions with constant size.
all_values: Create a partition for each metadata value.
none: No partitions. This applies to the entire level, both numeric and non-numeric values; i.e. setting none in either partition_type_within_level or numeric_partition_type_within_level results in both options being set to none.
partition_size_within_level The number of items in each partition (only applies when partition_type_within_level is set to 'constant_size' or 'approximate_size'). Can be specified for each level. Separate levels by '/'. Default: 30
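To make the per_letter and constant_size partition types concrete, here is a rough Python sketch. Both function names are hypothetical, the input is assumed to be already sorted, and grouping by the upper-cased first character is a simplification of "per letter".

```python
def per_letter_partitions(sorted_values):
    """Group values by their first character (upper-cased)."""
    partitions = {}
    for value in sorted_values:
        partitions.setdefault(value[:1].upper(), []).append(value)
    return partitions

def constant_size_partitions(sorted_values, size=30):
    """Cut the sorted list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [sorted_values[i:i + size] for i in range(0, len(sorted_values), size)]

print(per_letter_partitions(["apple", "Avocado", "banana"]))
chunks = constant_size_partitions(list(range(65)), size=30)
print([len(c) for c in chunks])  # [30, 30, 5]
```

With the default size of 30, 65 values fall into three partitions, the last one partly filled.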
partition_name_length The length of the partition name; defaults to a variable length from 1 up to 3 characters, depending on how many are required to distinguish the partition start from its end. This option only applies when partition_type_within_level is set to 'constant_size' or 'approximate_size'.
partition_sort_mode_within_level How to sort the values of metadata to create the partitions. unicode: Sort using the Unicode Collation Algorithm. Requires http://www.unicode.org/Public/UCA/latest/allkeys.txt file to be downloaded into perl's lib/Unicode/Collate folder.
alphabetic: Sort using alphabetical ordering, including for digits. E.g. 10 would sort before 9.
alphanumeric: (Default) Sort using a more natural sort, where digits are treated as numbers and sorted numerically. E.g. 10 would sort after 9.
numeric_partition_type_within_level The type of partitioning done at each level, for those values that start with digits 0-9. Separate levels by '/'. per_digit: Create a partition for each digit (0-9).
per_number: Create a partition for each number. Control how many digits are used to create numbers using the -numeric_partition_name_length_within_level option.
single_partition: Create a single partition '0-9' for all values that start with digits.
approximate_size: Create a partition per number (using -numeric_partition_name_length_within_level to determine how many digits to include in the number), then group or split the partitions to get approximately the same sized partitions.
constant_size: Create partitions with constant size.
all_values: Create a partition for each metadata value.
none: No partitions. This applies to the entire level, both numeric and non-numeric values; i.e. setting none in either partition_type_within_level or numeric_partition_type_within_level results in both options being set to none.
numeric_partition_size_within_level The number of items in each numeric partition (only applies when numeric_partition_type_within_level is set to 'constant_size' or 'approximate_size'). Can be specified for each level. Separate levels by '/'. Default: 30
numeric_partition_name_length Control how many consecutive digits are grouped to make the number for the numeric partition name. -1 implies all the digits. Default: -1
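One way to picture the numeric partition name is as the leading run of digits of a value, cut to the configured length. The Python sketch below assumes that reading; numeric_partition_name is a hypothetical helper, and returning None for non-numeric values is a choice made for the example.

```python
import re

def numeric_partition_name(value, name_length=-1):
    """Return the partition name for a value that starts with digits.

    name_length = -1 keeps all leading digits; otherwise only the first
    `name_length` digits are used. Returns None if the value does not
    start with a digit.
    """
    match = re.match(r"\d+", value)
    if match is None:
        return None
    digits = match.group(0)
    return digits if name_length == -1 else digits[:name_length]

print(numeric_partition_name("1984 Annual Report"))     # 1984
print(numeric_partition_name("1984 Annual Report", 2))  # 19
```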
numeric_partition_sort_mode_within_level How to sort the values of numeric metadata to create the numeric partitions. unicode: Sort using the Unicode Collation Algorithm. Requires http://www.unicode.org/Public/UCA/latest/allkeys.txt file to be downloaded into perl's lib/Unicode/Collate folder.
alphabetic: Sort using alphabetical ordering, including for digits. E.g. 10 would sort before 9.
alphanumeric: (Default) Sort using a more natural sort, where digits are treated as numbers and sorted numerically. E.g. 10 would sort after 9.
numbers_first Sort the numbers to the start of the list (by default, metadata values starting with numbers are sorted at the end).
sort_leaf_nodes_using Metadata fields used for sorting the leaf nodes (i.e. those documents in a bookshelf). Use '|' to separate the metadata groups to stable sort by, and ';' or ',' to separate metadata fields within each group. For example, "dc.Title,Title|Date" will result in a list sorted by Titles (coming from either dc.Title or Title), with those documents having the same Title sorted by Date. Default: Title
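The grouped sort described for sort_leaf_nodes_using can be sketched in Python as a sort on a tuple of keys, one per '|' group. The helper names and the document shape (a dict of field name to a single string value) are assumptions made for the example.

```python
import re

def parse_sort_spec(spec):
    """Split on '|' into groups, then on ';' or ',' into fields."""
    return [[f.strip() for f in re.split(r"[;,]", group) if f.strip()]
            for group in spec.split("|")]

def first_value(doc, fields):
    """Value of the first listed field the document defines, else ''."""
    for field in fields:
        if doc.get(field):
            return doc[field]
    return ""

def sort_leaf_nodes(docs, spec="Title"):
    groups = parse_sort_spec(spec)
    # One key per group: ties in an earlier group fall through to the next.
    return sorted(docs, key=lambda d: tuple(first_value(d, g) for g in groups))

docs = [
    {"Title": "B", "Date": "2001"},
    {"dc.Title": "A", "Date": "2005"},
    {"Title": "A", "Date": "1999"},
]
for d in sort_leaf_nodes(docs, "dc.Title,Title|Date"):
    print(d)
```

The two documents titled "A" come first, ordered by Date (1999, then 2005), followed by "B".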
sort_leaf_nodes_sort_mode How to sort the leaf node metadata fields. unicode: Sort using the Unicode Collation Algorithm. Requires http://www.unicode.org/Public/UCA/latest/allkeys.txt file to be downloaded into perl's lib/Unicode/Collate folder.
alphabetic: Sort using alphabetical ordering, including for digits. E.g. 10 would sort before 9.
alphanumeric: (Default) Sort using a more natural sort, where digits are treated as numbers and sorted numerically. E.g. 10 would sort after 9.
reverse_sort_leaf_nodes Sort the leaf documents in reverse order.
sort_using_unicode_collation Sort using the Unicode Collation Algorithm. Requires http://www.unicode.org/Public/UCA/latest/allkeys.txt file to be downloaded into perl's lib/Unicode/Collate folder. This will override all other sort mode arguments: they will all be set to 'unicode'.
filter_metadata Metadata element to test against for a document's inclusion into the classifier. Documents will be included if they define this metadata.
filter_regex Regular expression to use in the filter_metadata test. If a regex is specified, only documents with filter_metadata that matches this regex will be included.
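The interaction of filter_metadata and filter_regex can be sketched as follows. This Python example assumes the regex is applied as an unanchored search and that a match on any one value is enough; both points, like the name passes_filter, are assumptions rather than documented behaviour.

```python
import re

def passes_filter(doc_metadata, filter_metadata=None, filter_regex=None):
    """doc_metadata: dict of field name -> list of values (assumed shape)."""
    if not filter_metadata:
        return True                       # no filter configured
    values = doc_metadata.get(filter_metadata, [])
    if not values:
        return False                      # document lacks the metadata
    if filter_regex is None:
        return True                       # defining the metadata suffices
    pattern = re.compile(filter_regex)
    return any(pattern.search(v) for v in values)

doc = {"Language": ["en", "mi"]}
print(passes_filter(doc, "Language"))         # True
print(passes_filter(doc, "Language", "^fr$"))  # False
print(passes_filter(doc, "Subject"))          # False
```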
use_formatted_metadata_for_bookshelf_display Metadata values are formatted for sorting (unless -no_metadata_formatting is specified). This might include lower-casing, tidying up whitespace, removing articles. Set this option to use these formatted values for bookshelf names. Otherwise the original value variant that occurs most frequently will be used.
removeprefix A prefix to ignore in metadata values when sorting.
removesuffix A suffix to ignore in metadata values when sorting.
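The effect of removeprefix and removesuffix on the value used for sorting might look like the Python sketch below. It treats both options as regular-expression fragments, which is an assumption; for literal strings, wrap them in re.escape first. The name sort_form is hypothetical.

```python
import re

def sort_form(value, removeprefix=None, removesuffix=None):
    """Return the value with the prefix and suffix stripped for sorting."""
    if removeprefix:
        value = re.sub(r"^(?:" + removeprefix + r")", "", value)
    if removesuffix:
        value = re.sub(r"(?:" + removesuffix + r")$", "", value)
    return value

print(sort_form("The Hobbit", removeprefix=r"The\s+"))             # Hobbit
print(sort_form("Report (draft)", removesuffix=r"\s*\(draft\)"))   # Report
```

Only the sort key changes in this sketch; the original value is left untouched.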
Options Inherited from BaseClassifier
buttonname The label for the classifier screen and button in navigation bar. The default is the metadata element specified with -metadata.
no_metadata_formatting Don't do any automatic metadata formatting for sorting.
builddir Where to put the built indexes.
outhandle The file handle to write output to. Default: STDERR
verbosity Controls the quantity of output. 0=none, 3=lots. Default: 2