====== Buildcol.pl and classifiers ======

**buildcol.pl**

  * processes the building arguments (archivedir, maxdocs, etc.) and reads collect.cfg
  * creates a builder object according to the build type (mg, mgpp or lucene)
  * calls the following methods of the builder, according to the mode (//all//, //compress_text//, //build_index// or //infodb//):
    * $builder->compress_text
    * $builder->build_indexes
    * $builder->make_infodatabase
    * $builder->collect_specific
    * $builder->make_auxiliary_files

**builder**

basebuilder.pm, mgbuilder.pm, mgppbuilder.pm, lucenebuilder.pm

  * new
    * load plugins
    * load classifiers
  * init
    * load up the document processor (//$buildproc//) for building: if a //buildproc// class has been created for this collection, use it; otherwise, use the corresponding default //buildproc//, such as //mgbuildproc// or //mgppbuildproc//
  * compress_text
    * call the following methods of the //buildproc//:
      * $buildproc->set_mode('text')
      * $buildproc->set_indexing_text(0)
      * $buildproc->set_index($textindex)
      * $buildproc->set_output_handle
    * call &plugin::begin, &plugin::read(..., $buildproc, ...) and &plugin::end
  * build_indexes\\ for each of the indexes, call //$builder->build_index//
  * build_index
    * call the following methods of the //buildproc//:
      * $buildproc->set_mode('text')
      * $buildproc->set_indexing_text(1)
      * $buildproc->set_index($index)
      * $buildproc->set_output_handle
    * call &plugin::begin, &plugin::read(..., $buildproc, ...) and &plugin::end
  * make_infodatabase
    * call the following methods of the //buildproc//:
      * $buildproc->set_mode('infodb')
      * $buildproc->set_indexing_text(0)
      * $buildproc->set_classifiers
      * $buildproc->set_output_handle
    * call &plugin::begin, &plugin::read(..., $buildproc, ...) and &plugin::end
    * call &classify::output_classify_info

**buildproc**

basebuildproc.pm, mgbuildproc.pm, mgppbuildproc.pm, lucenebuildproc.pm

  * process
    * call the //text// or //infodb// method according to the mode value ("text" or "infodb"), which the builder sets through //$buildproc->set_mode// (see above)
  * text
  * infodb
    * call &classify::classify_doc

**classify.pm and classifiers**

  * output_classify_info\\ for each classifier, call the //get_classify_info// method
  * classify_doc\\ for each classifier, call the //classify// method

==== More about classifiers ====

  * Classifiers create browsing structures at build time (during the final phase of buildcol.pl). These hierarchical browsing structures are stored in the collection information database.
  * The leaves of the structure are usually documents, but in some classifiers they are document sections. The internal nodes of the structure are VLists, HLists, or DateLists. For example, an AZList is a two-level hierarchy with an HList for the A-Z selectors, and VList children. Most classifiers have a fixed number of levels; the exceptions are the Hierarchy and GenericList classifiers.
  * Classifiers inherit from BasClas.pm. When they are executed:
    * the new() method creates the classifier object
    * the init() method initialises the classifier object with parameters such as metadata type, button name, etc.
    * the classify() method is invoked once for each document, and stores the appropriate metadata values in the classifier object
    * the get_classify_info() method performs all sorting and classifier-specific processing, then returns the built classifier structure to the build process, which writes it to the collection information database
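
The classifier life cycle described above (new, init, one classify call per document, then get_classify_info) can be illustrated with a small stand-alone sketch. Everything in it is an assumption made for illustration: the package name SketchAZList, the plain-hash documents, and the shape of the returned structure are hypothetical and are not taken from BasClas.pm or any real Greenstone classifier. A real classifier would inherit from BasClas.pm and receive document objects from the plugins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a classifier (NOT a real Greenstone module).
package SketchAZList;

# new(): create the classifier object and remember its parameters.
sub new {
    my ($class, %args) = @_;
    my $self = {
        metadata   => $args{metadata},     # metadata field to classify on
        buttonname => $args{buttonname},   # label for the browse button
        list       => {},
    };
    return bless $self, $class;
}

# init(): reset per-build state before any document is seen.
sub init {
    my ($self) = @_;
    $self->{list} = {};
}

# classify(): called once per document; only stores the metadata value.
sub classify {
    my ($self, $doc) = @_;
    my $value = $doc->{metadata}{ $self->{metadata} };
    return unless defined $value;
    $self->{list}{ $doc->{id} } = $value;
}

# get_classify_info(): sort, bucket by first letter, and return a
# two-level structure (an HList whose children are VLists).
sub get_classify_info {
    my ($self) = @_;
    my $list = $self->{list};
    my %buckets;
    for my $id (sort { lc($list->{$a}) cmp lc($list->{$b}) or $a cmp $b } keys %$list) {
        my $letter = uc substr($list->{$id}, 0, 1);
        push @{ $buckets{$letter} }, $id;
    }
    return {
        Title     => $self->{buttonname},
        childtype => 'HList',
        contains  => [
            map { +{ Title => $_, childtype => 'VList', contains => $buckets{$_} } }
            sort keys %buckets
        ],
    };
}

package main;

# Assumed toy documents; real ones come from the plugins as doc objects.
my @docs = (
    { id => 'HASH01', metadata => { Title => 'Beekeeping' } },
    { id => 'HASH02', metadata => { Title => 'apples' } },
    { id => 'HASH03', metadata => { Title => 'Bread' } },
);

my @classifiers = ( SketchAZList->new(metadata => 'Title', buttonname => 'Titles') );
$_->init() for @classifiers;

# Roughly the role of classify_doc: hand each document to every classifier.
for my $doc (@docs) {
    $_->classify($doc) for @classifiers;
}

# Roughly the role of output_classify_info: collect each built structure.
my @info = map { $_->get_classify_info() } @classifiers;

for my $node (@{ $info[0]{contains} }) {
    print "$node->{Title}: @{ $node->{contains} }\n";
}
# prints:
# A: HASH02
# B: HASH01 HASH03
```

The sketch keeps classify() cheap and defers all sorting to get_classify_info(), mirroring the division of labour described in the list above.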