Buildcol.pl and classifiers

buildcol.pl

  • process the building args (archivedir, maxdocs etc) and read collect.cfg
  • create a builder object according to the build type (mg, mgpp or lucene)
  • call the following builder methods, depending on the mode (all, compress_text, build_index or infodb):
     $builder->compress_text
               ->build_indexes
               ->make_infodatabase
               ->collect_specific
               ->make_auxiliary_files 
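
The steps above can be sketched as a small dispatcher. This is only an illustration, not the actual buildcol.pl code: SketchBuilder, create_builder and run_build are hypothetical names, the builder merely records which methods were called, and the sketch assumes that collect_specific runs only in "all" mode while make_auxiliary_files runs in every mode, which the outline does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real builder: it only records which
# phase methods were called, so the dispatch can be inspected.
package SketchBuilder;

sub new {
    my ($class, $buildtype) = @_;
    my $self = { buildtype => $buildtype, calls => [] };
    return bless $self, $class;
}

# One recording method per phase named in the outline above.
for my $phase (qw(compress_text build_indexes make_infodatabase
                  collect_specific make_auxiliary_files)) {
    no strict 'refs';
    *{"SketchBuilder::$phase"} = sub { push @{ $_[0]{calls} }, $phase; return; };
}

package main;

# Accept only the three build types listed above.
sub create_builder {
    my ($buildtype) = @_;
    my %known = map { $_ => 1 } qw(mg mgpp lucene);
    die "unknown build type: $buildtype\n" unless $known{$buildtype};
    return SketchBuilder->new($buildtype);
}

# Call the phases selected by the mode.
# ASSUMPTION: collect_specific runs only for 'all';
# make_auxiliary_files runs for every mode.
sub run_build {
    my ($builder, $mode) = @_;
    if ($mode eq 'all') {
        $builder->compress_text;
        $builder->build_indexes;
        $builder->make_infodatabase;
        $builder->collect_specific;
    }
    elsif ($mode eq 'compress_text') { $builder->compress_text }
    elsif ($mode eq 'build_index')   { $builder->build_indexes }
    elsif ($mode eq 'infodb')        { $builder->make_infodatabase }
    else                             { die "unknown mode: $mode\n" }
    $builder->make_auxiliary_files;
    return $builder;
}

my $builder = run_build(create_builder('mgpp'), 'infodb');
print join(' ', @{ $builder->{calls} }), "\n";   # make_infodatabase make_auxiliary_files
```

Recording the calls instead of doing real work keeps the sketch runnable without a Greenstone installation.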

builder

basebuilder.pm

         mgbuilder.pm
         mgppbuilder.pm
         lucenebuilder.pm
  • new
    • load plugins
    • load classifiers
  • init
    • load the document processor ($buildproc) used for building: if a buildproc class has been created for this collection, use it; otherwise use the default buildproc for the build type, such as mgbuildproc or mgppbuildproc

  • compress_text
    • call the following methods of a buildproc
                  $buildproc->set_mode ('text')
                            ->set_indexing_text(0)
                            ->set_index($textindex)
                            ->set_output_handle
             
    • call &plugin::begin, &plugin::read(…,$buildproc,…) and &plugin::end
  • build_indexes
    for each of the indexes
    call $builder->build_index
  • build_index
    • call the following methods of a buildproc
                  $buildproc->set_mode ('text')
                            ->set_indexing_text(1)
                            ->set_index($index)
                            ->set_output_handle

    • call &plugin::begin, &plugin::read(…,$buildproc,…) and &plugin::end
  • make_infodatabase
    • call the following methods of a buildproc
                  $buildproc->set_mode ('infodb')
                            ->set_indexing_text(0)
                            ->set_classifiers
                            ->set_output_handle

    • call &plugin::begin, &plugin::read(…,$buildproc,…) and &plugin::end
    • call &classify::output_classify_info
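
The three builder phases above share one pattern: configure the buildproc, then run every document through the plugins. A minimal sketch of that pattern follows. SketchBuildproc and plugin_pass are hypothetical stand-ins (plugin_pass replaces the &plugin::begin, &plugin::read and &plugin::end sequence), the index names are made-up examples, and passing STDOUT to set_output_handle is an assumption, since the outline does not show its argument.

```perl
use strict;
use warnings;

# Hypothetical buildproc stand-in: stores its settings and records,
# for each document it is handed, the settings in force at that time.
package SketchBuildproc;

sub new { my ($class) = @_; my $self = { seen => [] }; return bless $self, $class; }
sub set_mode          { my ($self, $v) = @_; $self->{mode}          = $v; return; }
sub set_indexing_text { my ($self, $v) = @_; $self->{indexing_text} = $v; return; }
sub set_index         { my ($self, $v) = @_; $self->{index}         = $v; return; }
sub set_classifiers   { my ($self, $v) = @_; $self->{classifiers}   = $v; return; }
sub set_output_handle { my ($self, $v) = @_; $self->{output_handle} = $v; return; }

sub process {
    my ($self, $doc) = @_;
    push @{ $self->{seen} },
        [ $self->{mode}, $self->{index} // '', $self->{indexing_text}, $doc ];
    return;
}

package main;

# Stand-in for the plugin begin/read/end sequence: every document is
# passed to the buildproc once per call.
sub plugin_pass {
    my ($buildproc, $docs) = @_;
    $buildproc->process($_) for @$docs;
    return;
}

sub compress_text {
    my ($buildproc, $textindex, $docs) = @_;
    $buildproc->set_mode('text');
    $buildproc->set_indexing_text(0);
    $buildproc->set_index($textindex);
    $buildproc->set_output_handle(\*STDOUT);   # assumed argument
    plugin_pass($buildproc, $docs);
    return;
}

sub build_index {
    my ($buildproc, $index, $docs) = @_;
    $buildproc->set_mode('text');
    $buildproc->set_indexing_text(1);
    $buildproc->set_index($index);
    $buildproc->set_output_handle(\*STDOUT);   # assumed argument
    plugin_pass($buildproc, $docs);
    return;
}

# build_indexes simply repeats build_index for each index.
sub build_indexes {
    my ($buildproc, $indexes, $docs) = @_;
    build_index($buildproc, $_, $docs) for @$indexes;
    return;
}

sub make_infodatabase {
    my ($buildproc, $classifiers, $docs) = @_;
    $buildproc->set_mode('infodb');
    $buildproc->set_indexing_text(0);
    $buildproc->set_classifiers($classifiers);
    $buildproc->set_output_handle(\*STDOUT);   # assumed argument
    plugin_pass($buildproc, $docs);
    return;
}

my $buildproc = SketchBuildproc->new;
my @docs = ('doc1', 'doc2');
compress_text($buildproc, 'section:text', \@docs);
build_indexes($buildproc, ['document:text', 'document:Title'], \@docs);
make_infodatabase($buildproc, [], \@docs);
printf "%d passes over %d documents\n",
    scalar(@{ $buildproc->{seen} }) / @docs, scalar @docs;   # 4 passes over 2 documents
```

The count shows the cost implied by the outline: the documents are read once for text compression, once per index, and once more for the information database.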

buildproc

basebuildproc.pm

         mgbuildproc.pm
         mgppbuildproc.pm
         lucenebuildproc.pm
  • process
    • call the text or infodb method, according to the mode ("text" or "infodb") that the builder set through $buildproc->set_mode (see above)
    • text
    • infodb
      • call &classify::classify_doc
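
The mode-based dispatch in process can be sketched as follows. SketchDispatchProc is a hypothetical class, not the real basebuildproc.pm: its text and infodb methods only log what they were asked to do, and a callback passed to new stands in for &classify::classify_doc.

```perl
use strict;
use warnings;

# Hypothetical buildproc illustrating dispatch on the mode value.
package SketchDispatchProc;

sub new {
    my ($class, $classify_doc) = @_;
    my $self = { mode => 'text', classify_doc => $classify_doc, out => [] };
    return bless $self, $class;
}

sub set_mode { my ($self, $mode) = @_; $self->{mode} = $mode; return; }

# The mode string doubles as the name of the method to call.
sub process {
    my ($self, $doc) = @_;
    my $method = $self->{mode};
    die "unsupported mode: $method\n"
        unless $method eq 'text' || $method eq 'infodb';
    return $self->$method($doc);
}

sub text {
    my ($self, $doc) = @_;
    push @{ $self->{out} }, "text $doc->{id}";
    return;
}

sub infodb {
    my ($self, $doc) = @_;
    push @{ $self->{out} }, "infodb $doc->{id}";
    $self->{classify_doc}->($doc);   # stand-in for &classify::classify_doc
    return;
}

package main;

my @classified;
my $proc = SketchDispatchProc->new(sub { push @classified, $_[0]{id} });
my $doc  = { id => 'HASH01' };

$proc->process($doc);            # mode defaults to 'text' in this sketch
$proc->set_mode('infodb');
$proc->process($doc);

print join(', ', @{ $proc->{out} }), "\n";   # text HASH01, infodb HASH01
```

Only the infodb pass reaches the classifiers, which is why classification happens during the final phase of the build.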

classify.pm and classifiers

  • output_classify_info
    for each classifier
    call the get_classify_info method
  • classify_doc
    for each classifier
    call the classify method
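
Both functions are plain loops over the loaded classifiers. The sketch below assumes the classifiers are held in an array reference and that output_classify_info returns the collected structures; the real classify.pm may instead write them to an output handle. SketchClassifier and its hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical classifier that just remembers document identifiers.
package SketchClassifier;

sub new {
    my ($class, $name) = @_;
    my $self = { name => $name, oids => [] };
    return bless $self, $class;
}

sub classify {
    my ($self, $doc) = @_;
    push @{ $self->{oids} }, $doc->{id};
    return;
}

sub get_classify_info {
    my ($self) = @_;
    my $info = { name => $self->{name}, contains => [ sort @{ $self->{oids} } ] };
    return $info;
}

package main;

# Called once per document: hand the document to every classifier.
sub classify_doc {
    my ($classifiers, $doc) = @_;
    $_->classify($doc) for @$classifiers;
    return;
}

# Called once at the end: collect the structure built by each classifier.
sub output_classify_info {
    my ($classifiers) = @_;
    return map { $_->get_classify_info() } @$classifiers;
}

my @classifiers = map { SketchClassifier->new($_) } qw(titles dates);
classify_doc(\@classifiers, { id => $_ }) for qw(D2 D1);
my @info = output_classify_info(\@classifiers);
print "$_->{name}: @{ $_->{contains} }\n" for @info;
# titles: D1 D2
# dates: D1 D2
```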

More about classifiers

  • Classifiers create browsing structures at build time (during the final phase of buildcol.pl). These hierarchical browsing structures are stored in the collection information database.
  • The leaves of the structure are usually documents, but in some classifiers they are document sections. The internal nodes of the structure are VLists, HLists, or DateLists. For example, an AZList is a two-level hierarchy with an HList for the A-Z selectors, and VList children. Most classifiers have a fixed number of levels, except for the Hierarchy and GenericList classifiers.
  • Classifiers inherit from BasClas.pm. When they are executed:
    • the new() method creates the classifier object
    • the init() method initialises the classifier object with parameters such as metadata type, button name etc.
    • the classify() method is invoked once for each document, and stores the appropriate metadata values in the classifier object
    • the get_classify_info() method performs all sorting and classifier-specific processing, then returns the built classifier structure to the build process, which writes it to the collection information database
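
As an illustration of this lifecycle, here is a toy AZList-like classifier. It is a sketch only: SketchAZList does not inherit from BasClas.pm, and the hash keys title, childtype and contains are illustrative rather than the real format written to the collection information database.

```perl
use strict;
use warnings;

# Hypothetical two-level classifier: an HList of letters, each with a
# VList of document identifiers.
package SketchAZList;

sub new {
    my ($class) = @_;
    my $self = { entries => [] };
    return bless $self, $class;
}

# Store the parameters: which metadata to use and the button name.
sub init {
    my ($self, %args) = @_;
    $self->{metadata}   = $args{metadata};
    $self->{buttonname} = $args{buttonname};
    $self->{entries}    = [];
    return;
}

# Called once per document: only remember the metadata value.
sub classify {
    my ($self, $doc) = @_;
    my $value = $doc->{ $self->{metadata} };
    return unless defined $value && length $value;
    push @{ $self->{entries} }, [ $value, $doc->{id} ];
    return;
}

# Called once at the end: sort, group by first letter, build the tree.
sub get_classify_info {
    my ($self) = @_;
    my %bucket;
    for my $entry (sort { lc($a->[0]) cmp lc($b->[0]) } @{ $self->{entries} }) {
        my $letter = uc(substr($entry->[0], 0, 1));
        push @{ $bucket{$letter} }, $entry->[1];
    }
    my @children = map {
        +{ title => $_, childtype => 'VList', contains => $bucket{$_} }
    } sort keys %bucket;
    return { title => $self->{buttonname}, childtype => 'HList', contains => \@children };
}

package main;

my $az = SketchAZList->new;
$az->init(metadata => 'Title', buttonname => 'Titles');
$az->classify($_) for (
    { id => 'D1', Title => 'banana' },
    { id => 'D2', Title => 'Apple' },
    { id => 'D3', Title => 'avocado' },
    { id => 'D4' },                      # no Title: skipped
);
my $info = $az->get_classify_info;
print "$_->{title}: @{ $_->{contains} }\n" for @{ $info->{contains} };
# A: D2 D3
# B: D1
```

Deferring all sorting and grouping to get_classify_info keeps classify cheap, since it runs once for every document in the collection.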