legacy:manuals:en:develop:the_greenstone_runtime_system
— | legacy:manuals:en:develop:the_greenstone_runtime_system [2023/03/13 01:46] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
+ | |||
+ | |||
+ | ====== The Greenstone runtime system ====== | ||
+ | |||
+ | This chapter describes the Greenstone runtime system so that you can understand, augment and extend its capabilities. The software is written in C++ and makes extensive use of virtual inheritance. If you are unfamiliar with this language you should learn about it before proceeding. Deitel and Deitel (1994) provide a comprehensive tutorial, while Stroustrup (1997) is the definitive reference. | ||
+ | |||
+ | We begin by explaining the design philosophy behind the runtime system since this has a strong bearing on implementation. Then we provide the implementation details, which form the main part of this chapter. The version of Greenstone described here is the CGI version (the Web Library for Windows users). The Windows Local Library uses the same source code but has a built-in webserver front end. Also, the Local Library is a persistent process. | ||
+ | |||
+ | ===== Process structure ===== | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Figure <imgref figure_overview_of_a_general_greenstone_system> | ||
+ | |||
+ | Two components are central to the design of the runtime system: “receptionists” and “collection servers.” From a user's point of view, a receptionist is the point of contact with the digital library. It accepts user input, typically in the form of keyboard entry and mouse clicks; analyzes it; and then dispatches a request to the appropriate collection server (or servers). This locates the requested piece of information and returns it to the receptionist for presentation to the user. Collection servers act as an abstract mechanism that handles the content of the collection, while receptionists are responsible for the user interface. | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | As Figure <imgref figure_overview_of_a_general_greenstone_system> | ||
+ | |||
+ | Usually, a “server” is a persistent process that, once started, runs indefinitely, | ||
+ | |||
+ | Surprisingly, | ||
+ | |||
+ | As an alternative to the null protocol, the Greenstone protocol has also been implemented using the well-known CORBA scheme (Slama //et al.//, 1999). This uses a unified object-oriented paradigm to enable different processes, running on different computer platforms and implemented in different programming languages, to access the same set of distributed objects over the Internet (or any other network). Then, scenarios like Figure <imgref figure_overview_of_a_general_greenstone_system> | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | This allows far more sophisticated interfaces to be set up to exactly the same digital library collections. As just one example, Figure <imgref figure_graphical_query_interface_to_greenstone> | ||
+ | |||
+ | The distributed protocol is still being refined and readied for use, and so this manual does not discuss it further (see Bainbridge //et al//., 2001, for more information). | ||
+ | |||
+ | ===== Conceptual framework ===== | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Figure <imgref figure_generating_the_about_this_collection_page> | ||
+ | |||
+ | > //For the Project Gutenberg collection (c=gberg), the action is to generate a page (a=p), and the page to generate is called “about” (p=about).// | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Figure <imgref figure_greenstone_runtime_system> | ||
+ | |||
+ | The macro language, which we met in Section [[# | ||
+ | |||
+ | The Macro Language object in Figure <imgref figure_greenstone_runtime_system> | ||
+ | |||
+ | The layout of the “about this collection” page (Figure <imgref figure_generating_the_about_this_collection_page> | ||
+ | |||
+ | One further important ingredient is the Format object. Format statements in the collection configuration file affect the presentation of particular pieces of information, | ||
+ | |||
+ | At the bottom of Figure <imgref figure_greenstone_runtime_system>, | ||
+ | |||
+ | Ignoring blank lines, the receptionist contains 15,000 lines of code. The collection server contains only 5,000 lines (75% of which are taken up by header files). The collection server is more compact because content retrieval is accomplished through two pre-compiled programs: mg, a full-text retrieval system, is used for searching, and gdbm, a database management system, is used to hold the collection information database. | ||
+ | |||
+ | To encourage extensibility and flexibility, | ||
+ | |||
+ | ===== How the conceptual framework fits together ===== | ||
+ | |||
+ | Sections [[# | ||
+ | |||
+ | ==== Performing a search ==== | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | When a user submits a query by pressing //Begin search// on the search page, a new Greenstone action is invoked, which ends by generating a new HTML page using the macro language. Figure <imgref figure_searching_gutenberg_for_darcy> | ||
+ | |||
+ | Filters are an important basic function of collection servers. Tailored for both searching and browsing activities, they provide a way of selecting a subset of information from a collection. In this case, the // | ||
+ | |||
+ | * setting the filter request type to be // | ||
+ | * storing the user's search preferences—case-folding, | ||
+ | * calling the // | ||
+ | |||
+ | Calls to the protocol are synchronous. The receptionist is effectively blocked until the filter request has been processed by the collection server and any data generated has been returned. | ||
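The blocking request/response pattern described above can be sketched in a few lines. This is only an illustration: `FilterRequest`, `FilterResponse`, `ToyCollectionServer`, the filter name `"QueryFilter"` and the option name `"Term"` are all hypothetical stand-ins chosen for this sketch, not Greenstone's actual types or option names.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical request and response types (assumptions for this sketch).
struct FilterRequest {
    std::string filterName;                      // which filter to run
    std::map<std::string, std::string> options;  // search preferences
};

struct FilterResponse {
    std::vector<int> docNums;                    // matching document numbers
};

// Stand-in for a collection server holding document number -> text.
class ToyCollectionServer {
public:
    void add(int docNum, const std::string &text) { docs_[docNum] = text; }

    // Synchronous call: the caller is blocked until the complete
    // response has been assembled and returned.
    FilterResponse filter(const FilterRequest &req) const {
        FilterResponse resp;
        auto term = req.options.find("Term");
        if (req.filterName != "QueryFilter" || term == req.options.end())
            return resp;  // unknown filter or no query term: empty result
        for (const auto &doc : docs_)
            if (doc.second.find(term->second) != std::string::npos)
                resp.docNums.push_back(doc.first);
        return resp;
    }

private:
    std::map<int, std::string> docs_;
};
```

A receptionist-like caller would fill in the request, call `filter()`, and only then continue with page generation, which mirrors the blocking behaviour the text describes.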
+ | |||
+ | When a protocol call of type // | ||
+ | |||
+ | Once the search results have been returned to the receptionist, | ||
+ | |||
+ | ==== Retrieving a document ==== | ||
+ | |||
+ | Following the above query for //Darcy//, consider what happens when a document is displayed. Figure <imgref figure_the_golf_course_mystery> | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | The source text for the Gutenberg collection comprises one long file per book. At build time, these files are split into separate pages every 200 lines or so, and relevant information for each page is stored in the indexes and collection information database. The top of Figure <imgref figure_the_golf_course_mystery> | ||
+ | |||
+ | The action for retrieving documents, // | ||
+ | |||
+ | The action follows a similar procedure to // | ||
+ | |||
+ | ==== Browsing a hierarchical classifier ==== | ||
+ | |||
+ | Figure <imgref figure_browsing_titles_in_the_gutenberg_collection> | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Records that represent classifier nodes in the database use the prefix //CL//, followed by numbers separated by periods (.) to designate where they lie within the nested structure. Ignoring the search button (leftmost in the navigation bar), classifiers are numbered sequentially in increasing order, left to right, starting at 1. Thus the top level classifier node for titles in our example is //CL1// and the page sought is generated by setting // | ||
+ | |||
+ | To process a //cl// document request, the Filter object is used to retrieve the node over the protocol. Depending on the data returned, further protocol calls are made to retrieve document metadata. In this case, the titles of the books are retrieved. However, if the node were an interior one whose children are themselves nodes, the titles of the child nodes would be retrieved. From a coding point of view this amounts to the same thing, and is handled by the same mechanism. | ||
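The classifier identifier syntax (the prefix //CL// followed by period-separated numbers) can be illustrated with a small parser. This is a sketch under the assumption that every component is a non-negative decimal number; the function name `classifierPath` is invented for the example and does not appear in the Greenstone source.

```cpp
#include <string>
#include <vector>

// Splits an identifier such as "CL1.2.3" into {1, 2, 3}.
// Returns an empty vector if the identifier is malformed.
std::vector<int> classifierPath(const std::string &oid) {
    std::vector<int> path;
    if (oid.size() < 3 || oid.compare(0, 2, "CL") != 0) return path;

    int value = 0;
    bool haveDigit = false;
    for (std::string::size_type i = 2; i < oid.size(); ++i) {
        char c = oid[i];
        if (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            haveDigit = true;
        } else if (c == '.' && haveDigit) {
            path.push_back(value);  // finished one level of nesting
            value = 0;
            haveDigit = false;
        } else {
            return std::vector<int>();  // stray character or empty component
        }
    }
    if (!haveDigit) return std::vector<int>();  // trailing period
    path.push_back(value);
    return path;
}
```

The length of the returned path gives the depth of the node within the nested structure, so //CL1// is a top-level classifier and //CL1.2// is its second child.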
+ | |||
+ | Finally, all the retrieved information is bound together, using the macro language, to produce the web page shown in Figure <imgref figure_browsing_titles_in_the_gutenberg_collection> | ||
+ | |||
+ | ==== Generating the home page ==== | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | As a final example, we look at generating the Greenstone home page. Figure <imgref figure_greenstone_home_page> | ||
+ | |||
+ | The purpose of the home page is to show what collections are available. Clicking on an icon takes the user to the “about this collection” page for that collection. The menu of collections is dynamically generated every time the page is loaded, based on the collections that are in the file system at that time. When a new one comes online, it automatically appears on the home page when that page is reloaded (provided the collection is stipulated to be “public”). | ||
+ | |||
+ | To do this the receptionist uses the protocol (of course). As part of appraising the CGI arguments, // | ||
+ | |||
+ | ===== Source code ===== | ||
+ | |||
+ | < | ||
+ | |< - 132 397 >| | ||
+ | | // | ||
+ | | //getpw/// | Password support for Unix. | | ||
+ | //txt2db/// | Convert an XML-like ASCII text format to GNU's database format. | ||
+ | //db2txt/// | Convert the GNU database format to an XML-like ASCII text format. | ||
+ | | //phind/// | Hierarchical phrase browsing tool. | | ||
+ | | // | ||
+ | | //mgpp/// | Rewritten and updated version of Managing Gigabytes package in C++. | | ||
+ | | // | ||
+ | | // | ||
+ | |||
+ | The source code for the runtime system resides in // | ||
+ | |||
+ | Another directory, // | ||
+ | |||
+ | Greenstone makes extensive use of the Standard Template Library (STL), a widely-used C++ library from Silicon Graphics (// www.sgi.com //) that is the result of many years of design and development. Like all programming libraries it takes some time to learn. Appendix A gives a brief overview of key parts that are used throughout the Greenstone code. For a fuller description, | ||
+ | |||
+ | ===== Common Greenstone types ===== | ||
+ | |||
+ | The objects defined in // | ||
+ | |||
+ | ==== The text_t object ==== | ||
+ | |||
+ | Greenstone works with multiple languages, both for the content of a collection and its user interface. To support this, Unicode is used throughout the source code. The underlying object that realises a Unicode string is //text_t//. | ||
+ | |||
+ | < | ||
+ | <code 1> | ||
+ | typedef vector< | ||
+ | |||
+ | class text_t { | ||
+ | protected: | ||
+ | usvector text; | ||
+ | unsigned short encoding; // 0 = unicode, 1 = other | ||
+ | |||
+ | public: | ||
+ | // constructors | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | void setencoding (unsigned short theencoding); | ||
+ | | ||
+ | |||
+ | // STL container support | ||
+ | | ||
+ | | ||
+ | |||
+ | void erase(iterator pos); | ||
+ | void push_back(unsigned short c); | ||
+ | void pop_back(); | ||
+ | |||
+ | void reserve (size_type n); | ||
+ | |||
+ | bool empty () const {return text.empty(); | ||
+ | | ||
+ | |||
+ | // added functionality | ||
+ | void clear (); | ||
+ | void append (const text_t &t); | ||
+ | |||
+ | // support for integers | ||
+ | void appendint (int i); | ||
+ | void setint (int i); | ||
+ | int getint () const; | ||
+ | |||
+ | // support for arrays of chars | ||
+ | void appendcarr (char *s, size_type len); | ||
+ | void setcarr (char *s, size_type len); | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Greenstone's Unicode representation uses two bytes to store each character. Figure <imgref figure_the_text_t_api> | ||
+ | |||
+ | The constructor functions (lines 10-12) explicitly support three forms of initialisation: | ||
+ | |||
+ | Following this, most of the detail (lines 17-28) is taken up maintaining an STL vector-style container: // | ||
+ | |||
+ | < | ||
+ | <code 1> | ||
+ | class text_t { | ||
+ | // ... | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | // ... | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | There are many overloaded operators that do not appear in Figure <imgref figure_the_text_t_api> | ||
+ | |||
+ | Member functions that take //const// arguments instead of non-//const// ones are also provided (but not shown here). Such repetition is routine in C++ objects, making the API fatter but no bigger conceptually. In reality, many of these functions are implemented as single in-line statements. For more detail, refer to the source file // | ||
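To make the idea of a string held as a vector of 16-bit values concrete, here is a much-reduced sketch. The class name `mini_text` is hypothetical, and the behaviour assumed for `appendint`, `setint` and `getint` (decimal digits, optional leading minus sign) is a guess at the intent rather than a copy of the Greenstone implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical, much-reduced analogue of a 16-bit-per-character string.
class mini_text {
public:
    mini_text() {}
    explicit mini_text(const char *s) {
        for (; *s; ++s) text.push_back(static_cast<unsigned char>(*s));
    }

    // STL-style container support
    void push_back(unsigned short c) { text.push_back(c); }
    std::vector<unsigned short>::size_type size() const { return text.size(); }
    bool empty() const { return text.empty(); }
    void clear() { text.clear(); }

    // Copy first so that appending a string to itself is safe.
    void append(const mini_text &t) {
        std::vector<unsigned short> copy = t.text;
        text.insert(text.end(), copy.begin(), copy.end());
    }

    // Integer support: store the decimal digits as characters.
    void appendint(int i) {
        for (char c : std::to_string(i))
            text.push_back(static_cast<unsigned short>(c));
    }
    void setint(int i) { clear(); appendint(i); }

    // Parses an optional '-' followed by leading decimal digits.
    int getint() const {
        std::vector<unsigned short>::size_type i = 0;
        bool negative = false;
        if (i < text.size() && text[i] == '-') { negative = true; ++i; }
        int value = 0;
        for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i)
            value = value * 10 + (text[i] - '0');
        return negative ? -value : value;
    }

private:
    std::vector<unsigned short> text;  // one 16-bit code unit per character
};
```

Because each element is an `unsigned short`, a character outside the ASCII range occupies a single element, just as an ASCII character does.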
+ | |||
+ | ==== The Greenstone library code ==== | ||
+ | |||
+ | The header files in // | ||
+ | |||
+ | < | ||
+ | |< - 100 450 >| | ||
+ | | **cfgread.h** | Functions to read and write configuration files. For example, // | ||
+ | | **display.h** | A sophisticated object used by the receptionist for setting, storing and expanding macros, plus supporting types. Section [[# | ||
+ | | **fileutil.h** | Function support for several file utilities in an operating system independent way. For example, // | ||
+ | | **gsdlconf.h** | System-specific functions that answer questions such as: does the operating system being used for compilation need to access // | ||
+ | | **gsdltimes.h** | Function support for date and times. For example, // | ||
+ | **gsdltools.h** | Miscellaneous support for the Greenstone runtime system: determine whether the platform is little-endian or big-endian; check whether Perl is available; execute a system command (with a few bells and whistles); and escape special macro characters in a //text_t// string. | ||
+ | | **gsdlunicode.h** | A series of inherited objects that support processing Unicode //text_t// strings through IO streams, such as Unicode to UTF-8 and //vice versa//; and the removal of zero-width spaces. Support for map files is also provided through the // | ||
+ | | **text_t.h** | Primarily the Unicode text object described above. It also provides two classes for converting streams: // | ||
+ | |||
+ | ===== Collection server ===== | ||
+ | |||
+ | Now we systematically explain all the objects in the conceptual framework of Figure <imgref figure_greenstone_runtime_system> | ||
+ | |||
+ | Most of the classes central to the conceptual framework are expressed using virtual inheritance to aid extensibility. With virtual inheritance, | ||
+ | |||
+ | For example, suppose a base class called // | ||
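As a generic illustration of the mechanism, the following sketch shows a base class whose virtual member function is overridden in two derived classes. All class names here are invented for the example and are not the classes used in Greenstone.

```cpp
#include <string>

// Hypothetical base class: callers only ever see this interface.
class base_search_sketch {
public:
    virtual ~base_search_sketch() {}

    // Virtual hook that derived classes replace.
    virtual std::string name() const { return "base"; }

    // Non-virtual driver: written once, works for every derived class.
    std::string describe() const { return "searching with " + name(); }
};

class mg_search_sketch : public base_search_sketch {
public:
    std::string name() const override { return "mg"; }
};

class mgpp_search_sketch : public base_search_sketch {
public:
    std::string name() const override { return "mgpp"; }
};
```

Code that holds only a reference to the base class still reaches the derived implementation at run time, which is what allows a new search back end to be added without changing the calling code.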
+ | |||
+ | ==== The Search object ==== | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class searchclass { | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | // the index directory must be set before any searching | ||
+ | // is done | ||
+ | | ||
+ | // the search results are returned in queryresults | ||
+ | // search returns ' | ||
+ | | ||
+ | | ||
+ | // the document text for ' | ||
+ | // docTargetDocument returns ' | ||
+ | // try to get a document | ||
+ | // collection is needed to see if an index from the | ||
+ | // collection is loaded. If no index has been loaded | ||
+ | // defaultindex is needed to load one | ||
+ | | ||
+ | const text_t & | ||
+ | const text_t & | ||
+ | const text_t & | ||
+ | int docnum, | ||
+ | | ||
+ | protected: | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Figure <imgref figure_search_base_class_api> | ||
+ | |||
+ | The class also includes two protected data fields: // | ||
+ | |||
+ | Both data fields are applicable to every inherited object that implements a searching mechanism. This is why they appear in the base class, and are declared within a protected section of the class so that inherited classes can access them directly. | ||
+ | |||
+ | ==== Search and retrieval with MG ==== | ||
+ | |||
+ | Greenstone uses MG (short for Managing Gigabytes, see Witten //et al//., 1999) to index and retrieve documents, and the source code is included in the // | ||
+ | |||
+ | < | ||
+ | < | ||
+ | enum result_kinds { | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | int mgq_ask(char *line); | ||
+ | int mgq_results(enum result_kinds kind, int skip, int howmany, | ||
+ | int (*sender)(char *, int, int, float, void *), | ||
+ | void *ptr); | ||
+ | int mgq_numdocs(void); | ||
+ | int mgq_numterms(void); | ||
+ | int mgq_equivterms(unsigned char *wordstem, | ||
+ | int (*sender)(char *, int, int, float, void *), | ||
+ | void *ptr); | ||
+ | int mgq_docsretrieved (int *total_retrieved, | ||
+ | int mgq_getmaxstemlen (); | ||
+ | void mgq_stemword (unsigned char *word); | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | MG is normally used interactively by typing commands from the command line, and one way to implement // | ||
+ | |||
+ | The way to supply parameters to mg is via // | ||
+ | |||
+ | < | ||
+ | mgq_ask(".set casefold off"); | ||
+ | </ | ||
+ | |||
+ | It is also used to invoke a query. Results are accessed through // | ||
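The prototypes above pass results back through a function pointer plus an untyped `void *`. The sketch below shows how such a callback can collect results into a C++ container. The function `fake_results` is a stand-in written for this example and is not part of MG, and the meaning assigned to the callback's arguments (text, length, document number, weight, user pointer) is an assumption.

```cpp
#include <vector>

// Same shape as the sender callback in the prototypes above.
typedef int (*sender_fn)(char *, int, int, float, void *);

struct Hit {
    int docnum;
    float weight;
};

// Callback: appends each result to the vector passed through ptr.
int collect_hit(char * /*text*/, int /*length*/, int docnum, float weight,
                void *ptr) {
    std::vector<Hit> *hits = static_cast<std::vector<Hit> *>(ptr);
    hits->push_back(Hit{docnum, weight});
    return 0;
}

// Stand-in for a results function; it is NOT the real MG call.
// It reports two canned hits and returns how many it sent.
int fake_results(sender_fn sender, void *ptr) {
    static const Hit canned[] = {{12, 0.9f}, {40, 0.4f}};
    int sent = 0;
    for (const Hit &h : canned) {
        sender(nullptr, 0, h.docnum, h.weight, ptr);
        ++sent;
    }
    return sent;
}
```

The `void *` argument is what lets a C-style interface deliver data into an object owned by the C++ caller without the library knowing its type.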
+ | |||
+ | ==== The Source object ==== | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class sourceclass { | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | // configure should be called once for each configuration line | ||
+ | | ||
+ | // init should be called after all the configuration is done but | ||
+ | // before any other methods are called | ||
+ | | ||
+ | // translate_OID translates OIDs using ".pr", ".fc" etc. | ||
+ | | ||
+ | // get_metadata fills out the metadata if possible; if it is not | ||
+ | // responsible for the given OID then it returns false. | ||
+ | | ||
+ | bool getParents, const text_tset & | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | The role of Source in Figure <imgref figure_greenstone_runtime_system> | ||
+ | |||
+ | Other member functions seen in Figure <imgref figure_source_base_class_api> | ||
+ | |||
+ | The remaining one, // | ||
+ | |||
+ | As well as hierarchical section numbers, the document identifier syntax supports a form of relative access. For the current section of a document it is possible to access the //first child// by appending //.fc//, the //last child// by appending //.lc//, the //parent// by appending //.pr//, the //next sibling// by appending //.ns//, and the //previous sibling// by appending //.ps//. | ||
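The relative-access suffixes can be recognised with simple string handling, as in the sketch below. The helper names `splitRelative` and `parentOf` are invented for this example. Only the //parent// relation can be resolved from the identifier alone; the other four need the list of children held in the collection information database, so they are recognised here but not resolved.

```cpp
#include <string>

// If oid ends in .fc, .lc, .pr, .ns or .ps, stores the identifier
// without the suffix in base, the two-letter code in relation, and
// returns true. Otherwise base is the whole oid and relation is empty.
bool splitRelative(const std::string &oid, std::string &base,
                   std::string &relation) {
    static const char *const kinds[] = {".fc", ".lc", ".pr", ".ns", ".ps"};
    for (const char *k : kinds) {
        const std::string suffix(k);
        if (oid.size() > suffix.size() &&
            oid.compare(oid.size() - suffix.size(), suffix.size(),
                        suffix) == 0) {
            base = oid.substr(0, oid.size() - suffix.size());
            relation = suffix.substr(1);
            return true;
        }
    }
    base = oid;
    relation.clear();
    return false;
}

// Parent of a section identifier: drop the last ".n" component.
// Returns an empty string for a top-level identifier (an assumption).
std::string parentOf(const std::string &oid) {
    std::string::size_type dot = oid.rfind('.');
    return dot == std::string::npos ? std::string() : oid.substr(0, dot);
}
```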
+ | |||
+ | The // | ||
+ | |||
+ | === Database retrieval with gdbm === | ||
+ | |||
+ | GDBM is the GNU database manager program (// www.gnu.org //). It implements a flat record structure of key/data pairs, and is backwards compatible with dbm and ndbm. Operations include storage, retrieval and deletion of records by key, and an unordered traversal of all keys. | ||
+ | |||
+ | < | ||
+ | < | ||
+ | [HASH01d7b30d4827b51282919e9b] | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | ———————————————————————- | ||
+ | [CL1] | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | ———————————————————————- | ||
+ | [CL1.1] | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | HASH12c88a01da6e8379df86a7; | ||
+ | | ||
+ | HASHce55006513c47235ac38ba; | ||
+ | HASH010dd1e923a123826ae30e4b; | ||
+ | | ||
+ | | ||
+ | | ||
+ | ... | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Figure <imgref figure_gdbm_database_for_the_gutenberg_collection> | ||
+ | |||
+ | The document record stores the book's title, author, and any other metadata provided (or extracted) when the collection was built. It also records values for internal use: where files associated with this document reside (//< | ||
+ | |||
+ | The //< | ||
+ | |||
+ | The second record in Figure <imgref figure_gdbm_database_for_the_gutenberg_collection> | ||
+ | |||
+ | The children in the //< | ||
+ | |||
+ | === Using MG and GDBM to implement a Source object === | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class mggdbmsourceclass : public sourceclass { | ||
+ | protected: | ||
+ | // Omitted, data fields that store: | ||
+ | // | ||
+ | // | ||
+ | // | ||
+ | // | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | void set_gdbmptr (gdbmclass *thegdbmptr); | ||
+ | void set_mgsearchptr (searchclass *themgsearchptr); | ||
+ | void configure (const text_t &key, const text_tarray & | ||
+ | bool init (ostream & | ||
+ | bool translate_OID (const text_t &OIDin, text_t & | ||
+ | | ||
+ | bool get_metadata (const text_t & | ||
+ | const text_t & | ||
+ | bool getParents, const text_tset & | ||
+ | const text_t &OID, MetadataInfo_tmap & | ||
+ | | ||
+ | bool get_document (const text_t &OID, text_t &doc, | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | The object that puts mg and gdbm together to realise an implementation of // | ||
+ | |||
+ | ==== The Filter object ==== | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class filterclass { | ||
+ | protected: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | // returns the name of this filter | ||
+ | | ||
+ | // returns the current filter options | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | The base class API for the Filter object in Figure <imgref figure_greenstone_runtime_system> | ||
+ | |||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | |||
+ | // | ||
+ | |||
+ | The member functions // | ||
+ | |||
+ | < | ||
+ | < | ||
+ | struct FilterOption_t { | ||
+ | void clear (); | ||
+ | void check_defaultValue (); | ||
+ | | ||
+ | | ||
+ | enum type_t {booleant=0, | ||
+ | | ||
+ | enum repeatable_t {onePerQuery=0, | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | struct OptionValue_t { | ||
+ | void clear (); | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Central to the filter options are the two classes shown in Figure <imgref figure_how_a_filter_option_is_stored> | ||
+ | |||
+ | The request and response objects passed as parameters to // | ||
+ | |||
+ | ==== Inherited Filter objects ==== | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Two levels of inheritance are used for filters, as illustrated in Figure <imgref figure_inheritance_hierarchy_for_filter> | ||
+ | |||
+ | ==== The collection server code ==== | ||
+ | |||
+ | Here are the header files in // | ||
+ | |||
+ | < | ||
+ | |< - 120 420 >| | ||
+ | | **browsefilter.h** | Inherited from // | ||
+ | | | | ||
+ | | **collectserver.h** | This object binds Filters and Sources for one collection together, to form the Collection object depicted in Figure <imgref figure_greenstone_runtime_system> | ||
+ | | **colservrconfig.h** | Function support for reading the collection-specific files // | ||
+ | | **filter.h** | The base class Filter object // | ||
+ | | **maptools.h** | Defines a class called // | ||
+ | | **mggdbmsource.h** | Inherited from // | ||
+ | | **mgppqueryfilter.h** | Inherited from // | ||
+ | | **mgppsearch.h** | Inherited from // | ||
+ | | **mgq.h** | Function-level interface to the mg package. Principal functions are // | ||
+ | | **mgqueryfilter.h** | Inherited from // | ||
+ | | **mgsearch.h** | Inherited from // | ||
+ | | **phrasequeryfilter.h** | Inherited from // | ||
+ | | **phrasesearch.h** | Functional support to implement phrase searching as a post-processing operation. | | ||
+ | | **querycache.h** | Used by // | ||
+ | | **queryfilter.h** | Inherited from the Filter base class // | ||
+ | | **queryinfo.h** | Support for searching: data structures and objects to hold query parameters, document results and term frequencies. | | ||
+ | | **search.h** | The base class Search object // | ||
+ | | **source.h** | The base class Source object // | ||
+ | |||
+ | ===== Protocol ===== | ||
+ | |||
+ | < | ||
+ | |< - 132 397 >| | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | //ping()// | Returns //true// if a successful connection was made to the named collection. In the null protocol the implementation is identical to // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | |||
+ | Table <tblref table_list_of_protocol_calls> | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class nullproto : public recptproto { | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | comerror_t &err, ostream & | ||
+ | | ||
+ | bool & | ||
+ | comerror_t &err, ostream & | ||
+ | | ||
+ | bool & | ||
+ | comerror_t &err, ostream & | ||
+ | | ||
+ | ColInfoResponse_t & | ||
+ | comerror_t &err, ostream & | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | const InfoFilterOptionsRequest_t & | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | const DocumentRequest_t & | ||
+ | DocumentResponse_t & | ||
+ | comerror_t &err, ostream & | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Figure <imgref figure_null_protocol_api> | ||
+ | |||
+ | This protocol inherits from the base class // | ||
+ | |||
+ | With the exception of // | ||
+ | |||
+ | Most functions take the collection name as an argument. Three of the member functions, // | ||
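The essence of a function-call-based protocol can be shown with a toy example: the protocol object holds pointers to collection server objects in the same process and answers requests with direct member-function calls. Every name below (`toy_proto`, `toy_nullproto`, `toy_collectserver`, `add_collectserver`) is hypothetical and greatly simplified compared with the API in the figure.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a collection server object.
struct toy_collectserver {
    std::string title;
};

// Abstract protocol interface seen by the receptionist.
class toy_proto {
public:
    virtual ~toy_proto() {}
    virtual bool ping(const std::string &collection) = 0;
};

// "Null" implementation: no network traffic, just direct look-ups in a
// table of collection servers living in the same process.
class toy_nullproto : public toy_proto {
public:
    void add_collectserver(const std::string &name,
                           toy_collectserver *server) {
        servers_[name] = server;
    }

    bool ping(const std::string &collection) override {
        return servers_.count(collection) != 0;
    }

private:
    std::map<std::string, toy_collectserver *> servers_;
};
```

A networked implementation would derive from the same interface and replace the look-up with a remote call, leaving the receptionist code unchanged.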
+ | |||
+ | ===== Receptionist ===== | ||
+ | |||
+ | The final layer of the conceptual model is the receptionist. Once the CGI arguments are parsed, the main activity is the execution of an Action, supported by the Format and Macro Language objects. These are described below. Although they are represented as objects in the conceptual framework, Format and Macro Language objects are not strictly objects in the C++ sense. In reality, Format is a collection of data structures with a set of functions that operate on them, and the Macro Language object is built around // | ||
+ | |||
+ | ==== Actions ==== | ||
+ | |||
+ | < | ||
+ | |< - 132 397 >| | ||
+ | | //action// | Base class for virtual inheritance. | | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | | // | ||
+ | |||
+ | Greenstone supports the eleven actions summarised in Table <tblref table_actions_in_greenstone> | ||
+ | |||
+ | < | ||
+ | <code 1> | ||
+ | cgiarginfo arg_ainfo; | ||
+ | arg_ainfo.shortname = "a"; | ||
+ | arg_ainfo.longname = "action"; | ||
+ | arg_ainfo.multiplechar = true; | ||
+ | arg_ainfo.argdefault = "p"; | ||
+ | arg_ainfo.defaultstatus = cgiarginfo:: | ||
+ | arg_ainfo.savedarginfo = cgiarginfo:: | ||
+ | argsinfo.addarginfo (NULL, arg_ainfo); | ||
+ | |||
+ | arg_ainfo.shortname = "p"; | ||
+ | arg_ainfo.longname = "page"; | ||
+ | arg_ainfo.multiplechar = true; | ||
+ | arg_ainfo.argdefault = "home"; | ||
+ | arg_ainfo.defaultstatus = cgiarginfo:: | ||
+ | arg_ainfo.savedarginfo = cgiarginfo:: | ||
+ | argsinfo.addarginfo (NULL, arg_ainfo); | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | The CGI arguments needed by an action are formally declared in its constructor function using // | ||
+ | |||
+ | For each CGI argument, the constructor must specify its short name (lines 2 and 10), which is the name of the CGI variable itself; a long name (lines 3 and 11) that is used to provide a more meaningful description of the action; whether it represents a single or multiple character value (lines 4 and 12); a possible default value (lines 5 and 13); what happens when more than one default value is supplied (lines 6 and 14) (since defaults can also be set in configuration files); and whether or not the value is preserved at the end of this action (lines 7 and 15). | ||
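The effect of declared defaults can be sketched as follows: start from the default values, then let anything supplied in the query string override them. The function `parse_query` and the `argmap` type are invented for this illustration; the sketch performs no URL decoding and ignores the multiple-default and saved-argument settings described above.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> argmap;

// Parses "name=value&name=value" on top of a set of defaults.
// Pairs without '=' or with an empty name are ignored.
argmap parse_query(const std::string &query, const argmap &defaults) {
    argmap args = defaults;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::string::size_type eq = pair.find('=');
        if (eq != std::string::npos && eq > 0)
            args[pair.substr(0, eq)] = pair.substr(eq + 1);
        start = end + 1;
    }
    return args;
}
```

With the defaults `a=p` and `p=home` from the figure, a query string of `c=gberg&p=about` yields `a=p`, `p=about` and `c=gberg`.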
+ | |||
+ | Since it is built into the code, web pages that detail this information can be generated automatically. The // | ||
+ | |||
+ | The twelve inherited actions are constructed in //main()//, the top-level function for the //library// executable, whose definition is given in // | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class action { | ||
+ | protected: | ||
+ | | ||
+ | | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | cgiargsclass &args, ostream & | ||
+ | | ||
+ | cgiargsclass &args, | ||
+ | outconvertclass & | ||
+ | const text_t & | ||
+ | ostream & | ||
+ | | ||
+ | recptprotolistclass *protos, | ||
+ | response_t & | ||
+ | text_t & | ||
+ | ostream & | ||
+ | | ||
+ | | ||
+ | cgiargsclass &args, | ||
+ | recptprotolistclass *protos, | ||
+ | ostream & | ||
+ | | ||
+ | cgiargsclass &args, | ||
+ | recptprotolistclass *protos, | ||
+ | ostream & | ||
+ | | ||
+ | recptprotolistclass *protos, | ||
+ | browsermapclass *browsers, | ||
+ | displayclass &disp, | ||
+ | outconvertclass & | ||
+ | ostream & | ||
+ | ostream & | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Figure <imgref figure_action_base_class_api> | ||
+ | |||
+ | Explanations of the member functions are as follows. | ||
+ | |||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | |||
+ | At the beginning of the class definition, // | ||
+ | |||
+ | ==== Formatting ==== | ||
+ | |||
+ | < | ||
+ | < | ||
+ | enum command_t {comIf, comOr, comMeta, comText, comLink, comEndLink, | ||
+ | | ||
+ | | ||
+ | enum pcommand_t {pNone, pImmediate, pTop, pAll}; | ||
+ | enum dcommand_t {dMeta, dText}; | ||
+ | enum mcommand_t {mNone, mCgiSafe}; | ||
+ | struct metadata_t { | ||
+ | void clear(); | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | // The decision component of an {If}{decision, | ||
+ | // formatstring. The decision can be based on metadata or on text; | ||
+ | // normally that text would be a macro like | ||
+ | // _cgiargmode_. | ||
+ | struct decision_t { | ||
+ | void clear(); | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | struct format_t { | ||
+ | void clear(); | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Although formatting is represented as a single entity in Figure <imgref figure_greenstone_runtime_system>, | ||
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | The implementation is best explained using an example. When the format statement | ||
+ | |||
+ | < | ||
+ | format CL1Vlist | ||
+ | " | ||
+ | </ | ||
+ | |||
+ | is read from a collection configuration file, it is parsed by functions in // | ||
+ | |||
+ | One complication is that when metadata is retrieved, it might include further macros and format syntax. This is handled by switching back and forth between parsing and evaluating, as needed. | ||
+ | |||
+ | ==== Macro language ==== | ||
+ | |||
+ | The Macro Language entity in Figure <imgref figure_greenstone_runtime_system>, | ||
+ | |||
+ | Again, the implementation is best explained using an example. First we give some sample macro definitions that illustrate macro precedence, then—with the aid of a diagram—we describe the core data structures built to support this activity. Finally we present and describe the public member functions to // | ||
+ | |||
+ | < | ||
+ | < | ||
+ | package query | ||
+ | _header_ [] | ||
+ | _header_ [l=en] | ||
+ | _header_ [c=demo] | ||
+ | _header_ [v=1] {_textquery_} | ||
+ | _header_ [l=fr, | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | In a typical Greenstone installation, | ||
+ | |||
+ | < | ||
+ | macroprecedence c,v,l | ||
+ | </ | ||
+ | |||
+ | in the main configuration file // | ||
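One way to picture macro precedence is as a scoring rule. The sketch below assumes that a definition is eligible only if every parameter it names matches the current page parameters, and that parameters listed earlier in the precedence line outweigh all later ones combined. Both assumptions, and all names in the code, are illustrative rather than taken from the Greenstone source.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> params_t;

struct macro_def {
    params_t params;    // e.g. {"l" -> "en"}; empty for the default
    std::string value;  // macro body
};

// Returns -1 if the definition does not apply to this page, otherwise
// a bitmask in which earlier precedence entries occupy higher bits.
int score(const macro_def &def, const params_t &page,
          const std::vector<std::string> &precedence) {
    int s = 0;
    for (const auto &p : def.params) {
        auto it = page.find(p.first);
        if (it == page.end() || it->second != p.second) return -1;
        for (std::vector<std::string>::size_type i = 0;
             i < precedence.size(); ++i)
            if (precedence[i] == p.first)
                s |= 1 << (precedence.size() - 1 - i);
    }
    return s;
}

// Picks the applicable definition with the highest score, or nullptr.
const macro_def *select_macro(const std::vector<macro_def> &defs,
                              const params_t &page,
                              const std::vector<std::string> &precedence) {
    const macro_def *best = nullptr;
    int bestScore = -1;
    for (const auto &d : defs) {
        int s = score(d, page, precedence);
        if (s > bestScore) {
            bestScore = s;
            best = &d;
        }
    }
    return best;
}
```

Under these assumptions and a precedence of `c,v,l`, a page with `c=demo`, `v=1` and `l=en` selects the `[c=demo]` definition in preference to `[v=1]`, `[l=en]` and the parameterless default.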
+ | |||
+ | < | ||
+ | {{..: | ||
+ | |||
+ | Figure <imgref figure_data_structures_representing_the_default_macros> | ||
+ | |||
+ | < | ||
+ | < | ||
+ | class displayclass | ||
+ | { | ||
+ | public: | ||
+ | | ||
+ | | ||
+ | int isdefaultmacro (text_t package, const text_t & | ||
+ | int setdefaultmacro (text_t package, const text_t & | ||
+ | | ||
+ | int loaddefaultmacros (text_t thisfilename); | ||
+ | void openpage (const text_t & | ||
+ | const text_t & | ||
+ | void setpageparams (text_t thispageparams, | ||
+ | | ||
+ | int setmacro (const text_t & | ||
+ | | ||
+ | const text_t & | ||
+ | void expandstring (const text_t & | ||
+ | void expandstring (text_t package, const text_t & | ||
+ | | ||
+ | void setconvertclass (outconvertclass *theoutc) {outc = theoutc;} | ||
+ | | ||
+ | | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | The central object that supports the macro language is // | ||
+ | |||
+ | When a page is to be produced, // | ||
+ | |||
+ | < | ||
+ | cout << text_t2ascii << display << " | ||
+ | << | ||
+ | </ | ||
+ | |||
+ | The result is that macros are expanded according to the page parameter settings. If required, these settings can be changed partway through an action by using // | ||
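Macro expansion itself can be sketched as recursive text substitution: each `_name_` occurrence is replaced by its definition, which may contain further macros. The function below is a simplified illustration with an invented name, `expand`. It ignores packages and page parameters, and the recursion limit of 10 is an arbitrary guard chosen for this example.

```cpp
#include <map>
#include <string>

// Replaces every _name_ that has a definition with its expanded body.
// Unknown names and lone underscores are copied through unchanged.
std::string expand(const std::string &in,
                   const std::map<std::string, std::string> &macros,
                   int depth = 0) {
    if (depth > 10) return in;  // guard against self-referential macros
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        if (in[i] == '_') {
            std::string::size_type close = in.find('_', i + 1);
            if (close != std::string::npos) {
                auto it = macros.find(in.substr(i + 1, close - i - 1));
                if (it != macros.end()) {
                    out += expand(it->second, macros, depth + 1);
                    i = close + 1;
                    continue;
                }
            }
        }
        out += in[i++];
    }
    return out;
}
```

Streaming a string through a display object, as in the statement shown above, has the same overall effect: text goes in containing macro names and comes out with them replaced.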
+ | |||
+ | ==== The receptionist code ==== | ||
+ | |||
+ | The principal objects in the receptionist have now been described. Below we detail the supporting classes, which reside in // | ||
+ | |||
+ | A second set of lexically scoped files includes the prefix //z3950//. These files provide remote access to online databases and catalogs that make their content publicly available using the Z39.50 protocol. | ||
+ | |||
+ | Another large group of supporting files include the term // | ||
+ | |||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | * // | ||
+ | |||
+ | Actions access // | ||
+ | |||
+ | < | ||
+ | |< - 140 390 >| | ||
+ | | **OIDtools.h** | Function support for evaluating document identifiers over the protocol. | | ||
+ | | **action.h** | Base class for the Actions entity depicted in Figure <imgref figure_greenstone_runtime_system> | ||
+ | | **authenaction.h** | Inherited action for handling authentication of a user. | | ||
+ | | **browserclass.h** | Base class for abstract browsing activities. | | ||
+ | | **browsetools.h** | Function support that accesses the // | ||
+ | | **cgiargs.h** | Defines // | ||
+ | | **cgiutils.h** | Function support for CGI arguments using the data structures defined in // | ||
+ | | **cgiwrapper.h** | Function support that does everything necessary to output a page using the CGI protocol. Access is through the function \\ '' | ||
+ | | **collectoraction.h** | Inherited action that facilitates end-user collection-building through the Collector. The page generated comes from // | ||
+ | | **comtypes.h** | Core types for the protocol. | | ||
+ | | **converter.h** | Object support for stream converters. | | ||
+ | | **datelistbrowserclass.h** | Inherited from // | ||
+ | | **documentaction.h** | Inherited action used to retrieve a document or part of a classification hierarchy. | | ||
+ | | **extlinkaction.h** | Inherited action that controls whether a user goes straight to an external link or first passes through a warning page explaining that they are about to leave the digital library system. | | ||
+ | | **formattools.h** | Function support for parsing and evaluating collection configuration //format// statements. Described in more detail in Section [[## | ||
+ | | **historydb.h** | Data structures and function support for managing a database of previous queries so a user can start a new query that includes previous query terms. | | ||
+ | | **hlistbrowserclass.h** | Inherited from // | ||
+ | | **htmlbrowserclass.h** | Inherited from // | ||
+ | | **htmlgen.h** | Function support to highlight query terms in a //text_t// string. | | ||
+ | | **htmlutils.h** | Function support that converts a //text_t// string into the equivalent html. The symbols ", //&//, //<//, and //>// are converted into //& | ||
+ | | **infodbclass.h** | Defines two classes: // | ||
+ | | **invbrowserclass.h** | Inherited from // | ||
+ | | **nullproto.h** | Inherited from // | ||
+ | | **pageaction.h** | Inherited action that, in conjunction with the macro file named in //p=page//, generates a web page. | | ||
+ | | **pagedbrowserclass.h** | Inherited from // | ||
+ | | **pingaction.h** | Inherited action that checks to see whether a particular collection is responding. | | ||
+ | | **queryaction.h** | Inherited action that takes the stipulated query, settings and preferences and performs a search, generating as a result the subset of //o=num// matching documents starting at position //r=num//. | | ||
+ | | **querytools.h** | Function support for querying. | | ||
+ | | **receptionist.h** | Top-level object for the receptionist. Maintains a record of CGI argument information, | ||
+ | | **recptconfig.h** | Function support for reading the site and main configuration files. | | ||
+ | | **recptproto.h** | Base class for the protocol. | | ||
+ | | **statusaction.h** | Inherited action that generates, in conjunction with // | ||
+ | | **tipaction.h** | Inherited action that produces, in conjunction with //tip.dm//, a web page containing a tip taken at random from a list of tips stored in //tip.dm//. | | ||
+ | | **userdb.h** | Data structure and function support for maintaining a gdbm database of users: their passwords, groups, and so on. | | ||
+ | | **usersaction.h** | An administrator action inherited from the base class that supports adding and deleting users, as well as modifying the groups they are in. | | ||
+ | | **vlistbrowserclass.h** | Inherited from // | ||
+ | | **z3950cfg.h** | Data structure support for the Z39.50 protocol. Used by // | ||
+ | | **z3950proto.h** | Inherited from // | ||
+ | | **z3950server.h** | Further support for the Z39.50 protocol. | | ||
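
As an illustration of the kind of helper listed under **htmlutils.h**, the sketch below escapes the four symbols named in that row. It is not the Greenstone function: it works on //std::string// rather than //text_t//, the name //escape_html// is hypothetical, and it assumes the replacements are the standard HTML entities.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `in` with the four HTML-special
// symbols replaced by standard HTML entities (an assumption about the
// intended output). All other characters are copied unchanged.
std::string escape_html(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        switch (ch) {
            case '"': out += "&quot;"; break;
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            default:  out += ch;       break;
        }
    }
    return out;
}
```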
+ | |||
+ | ===== Initialisation ===== | ||
+ | |||
+ | Initialisation in Greenstone is an intricate operation that processes configuration files and assigns default values to data fields. In addition to inheritance and constructor functions, core objects define //init()// and // | ||
+ | |||
+ | Greenstone uses several configuration files for different purposes, but all follow the same syntax. Unless a line starts with the hash symbol (#) or consists entirely of white space, the first word defines a keyword, and the remaining words represent a particular setting for that keyword. | ||
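
The line syntax described above can be sketched as a small parser. This is a simplified illustration rather than Greenstone's own reader: the names //ConfigLine// and //parse_config_line// are hypothetical, words are split on white space only (no quoting is handled), and a hash symbol preceded by white space is also treated as starting a comment.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed line (not a Greenstone type).
struct ConfigLine {
    std::string keyword;              // first word on the line
    std::vector<std::string> values;  // remaining words, in order
};

// Returns false for comment lines and for lines that are empty or consist
// entirely of white space; otherwise fills `out` and returns true.
bool parse_config_line(const std::string& line, ConfigLine& out) {
    std::istringstream words(line);
    std::string word;
    if (!(words >> word)) return false;  // nothing but white space
    if (word[0] == '#') return false;    // comment line
    out.keyword = word;
    out.values.clear();
    while (words >> word) out.values.push_back(word);
    return true;
}
```

For the line //macroprecedence c,v,l// this yields the keyword //macroprecedence// and a single value, //c,v,l//. Splitting that value further is left to whichever object handles the keyword.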
+ | |||
+ | The lines from configuration files are passed, one at a time, to // | ||
+ | |||
+ | After processing the keyword and before the function terminates, some versions of // | ||
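
The hand-on pattern can be sketched with a few stand-in classes. None of these are the real Greenstone classes: //Configurable//, //FilterSketch//, //CollectionSketch// and //ProtocolSketch// are hypothetical names, and the choice of which keywords each object stores or ignores is an assumption modelled on the initialisation figure in this section.

```cpp
#include <string>
#include <vector>

using Values = std::vector<std::string>;

// Hypothetical base class: anything that can receive a configuration entry.
class Configurable {
public:
    virtual ~Configurable() = default;
    virtual void configure(const std::string& key, const Values& values) = 0;
};

// Stand-in for a filter or source: stores only the keyword it cares about.
class FilterSketch : public Configurable {
public:
    std::string gsdlhome;
    void configure(const std::string& key, const Values& values) override {
        if (key == "gsdlhome" && !values.empty()) gsdlhome = values[0];
    }
};

// Stand-in for a collection: stores its own setting, then forwards the
// same arguments to the objects it contains before returning.
class CollectionSketch : public Configurable {
public:
    std::string httpimg;
    std::vector<Configurable*> children;  // filters and sources, not owned
    void configure(const std::string& key, const Values& values) override {
        if (key == "httpimg" && !values.empty()) httpimg = values[0];
        for (Configurable* child : children) child->configure(key, values);
    }
};

// Stand-in for a protocol: drops keywords it is not interested in and
// forwards everything else to its collections.
class ProtocolSketch : public Configurable {
public:
    std::vector<Configurable*> collections;  // not owned
    void configure(const std::string& key, const Values& values) override {
        if (key == "collection" || key == "collectdir") return;
        for (Configurable* c : collections) c->configure(key, values);
    }
};
```

Each object inspects the keyword, keeps what it needs, and decides whether to pass the entry further down, so a single call at the top reaches every object that might care about the setting.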
+ | |||
+ | In C++, data fields are normally initialized by the object' | ||
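
A minimal sketch of why a separate //init()// step is useful: the constructor runs before any configuration file has been read, so fields that depend on configured values can only be filled in afterwards. The class //ActionSketch// and the derived //macros// directory are hypothetical and serve only to show the ordering.

```cpp
#include <string>
#include <vector>

class ActionSketch {
public:
    // Phase 1: the constructor can only assign fixed defaults.
    ActionSketch() : gsdlhome_(), macrodir_(), ready_(false) {}

    // Phase 2: values arrive from configuration files, one entry at a time.
    void configure(const std::string& key,
                   const std::vector<std::string>& values) {
        if (key == "gsdlhome" && !values.empty()) gsdlhome_ = values[0];
    }

    // Phase 3: derive fields that depend on configured values.
    // Returns false if a required setting has not been supplied.
    bool init() {
        if (gsdlhome_.empty()) return false;
        macrodir_ = gsdlhome_ + "/macros";  // illustrative derived value
        ready_ = true;
        return true;
    }

    bool ready() const { return ready_; }
    const std::string& macrodir() const { return macrodir_; }

private:
    std::string gsdlhome_;
    std::string macrodir_;
    bool ready_;
};
```

Calling //init()// before //configure()// fails in this sketch, which mirrors the order shown in the initialisation figure: construct, configure, then initialise.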
+ | |||
+ | < | ||
+ | < | ||
+ | ============ | ||
+ | Main program | ||
+ | ============ | ||
+ | Statically construct Receptionist | ||
+ | Statically construct NullProtocol | ||
+ | Establish the value for ’gsdlhome’ by reading gsdlsite.cfg | ||
+ | Foreach directory in GSDLHOME/ | ||
+ | Add directory name (now treated as collection name) to NullProtocol: | ||
+ | Dynamically construct Collection | ||
+ | Dynamically construct Gdbm class | ||
+ | Dynamically construct the Null Filter | ||
+ | Dynamically construct the Browse Filter | ||
+ | Dynamically construct MgSearch | ||
+ | Dynamically construct the QueryFilter | ||
+ | Dynamically construct the MgGdbmSource | ||
+ | Configure Collection with ’collection’ | ||
+ | Passing ’collection’ value on to Filters and Sources: | ||
+ | Configure Receptionist with ’collectinfo’: | ||
+ | Passing ’collectinfo’ value on to Actions, Protocols, and Browsers: | ||
+ | Add NullProtocol to Receptionist | ||
+ | Add in UTF-8 converter | ||
+ | Add in GB converter | ||
+ | Add in Arabic converter | ||
+ | Foreach Action: | ||
+ | Statically construct Action | ||
+ | Add Action to Receptionist | ||
+ | Foreach Browsers: | ||
+ | Statically construct Browser | ||
+ | Add Browser to Receptionist | ||
+ | Call function cgiwrapper: | ||
+ | ================= | ||
+ | Configure objects | ||
+ | ================= | ||
+ | Configure Receptionist with ’collection’ | ||
+ | Passing ’collection’ value on to Actions, Protocols, and Browsers: | ||
+ | NullProtocol not interested in ’collection’ | ||
+ | Configure Receptionist with ’httpimg’ | ||
+ | Passing ’httpimg’ value on to Actions, Protocols, and Browsers: | ||
+ | NullProtocol passing ’httpimg’ on to Collection | ||
+ | Passing ’httpimg’ value on to Filters and Sources: | ||
+ | Configure Receptionist with ’gwcgi’ | ||
+ | Passing ’gwcgi’ value on to Actions, Protocols, and Browsers: | ||
+ | NullProtocol passing ’gwcgi’ on to Collection | ||
+ | Passing ’gwcgi’ value on to Filters and Sources: | ||
+ | Reading in site configuration file gsdlsite.cfg | ||
+ | Configure Receptionist with ’gsdlhome’ | ||
+ | Passing ’gsdlhome’ value on to Actions, Protocols, and Browsers: | ||
+ | NullProtocol passing ’gsdlhome’ on to Collection | ||
+ | Passing ’gsdlhome’ value on to Filters and Sources: | ||
+ | Configure Receptionist with ... | ||
+ | ... and so on for all entries in gsdlsite.cfg | ||
+ | Reading in main configuration file main.cfg | ||
+ | Configure Receptionist with ... | ||
+ | ... and so on for all entries in main.cfg | ||
+ | ==================== | ||
+ | Initialising objects | ||
+ | ==================== | ||
+ | Initialise the Receptionist | ||
+ | Configure Receptionist with ’collectdir’ | ||
+ | Passing ’collectdir’ value on to Actions, Protocols, and Browsers: | ||
+ | NullProtocol not interested in ’collectdir’ | ||
+ | Read in Macro files | ||
+ | Foreach Actions | ||
+ | Initialise Action | ||
+ | Foreach Protocol | ||
+ | Initialise Protocol | ||
+ | When Protocol==NullProtocol: | ||
+ | Foreach Collection | ||
+ | Reading Collection’s build.cfg | ||
+ | Reading Collection’s collect.cfg | ||
+ | Configure Collection with ’creator’ | ||
+ | Passing ’creator’ value on to Filters and Sources: | ||
+ | Configure Collection with ’maintainer’ | ||
+ | Passing ’maintainer’ value on to Filters and Sources: | ||
+ | ... and so on for all entries in collect.cfg | ||
+ | Foreach Browsers | ||
+ | Initialise Browser | ||
+ | ============= | ||
+ | Generate page | ||
+ | ============= | ||
+ | Parse CGI arguments | ||
+ | Execute designated Action to produce page | ||
+ | End. | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | Figure <imgref figure_initialising_greenstone_using_the_null_protocol> | ||
+ | |||
+ | Next //main()// adds the NullProtocol object to the Receptionist, | ||
+ | |||
+ | There are three sections to // | ||
+ | |||
+ | The second phase of // | ||
+ | |||
+ | The final phase of // | ||
+ | |||
+ | The reason for the separation of the configuration, | ||