Z3950 support

This page has been updated for Greenstone 2.82. For an older version of this page, please see this page.

What is Z39.50?
Z39.50 is an international client/server protocol for searching bibliographic data. It can use the Internet Protocol (TCP/IP), which makes the databases on a server available from almost anywhere around the globe. It is widely used, for example, in on-line library catalogues. It allows a user to search one or more databases and retrieve the results of the query.

Greenstone supports Z39.50 both as a client and as a server. GLI can download records using Z39.50 and SRW, and these records can then be included in a collection; this download support is enabled by default. The Greenstone run-time can also act as a client to multiple Z39.50 servers, and a Z39.50 server program is also available. The run-time client and the server are not enabled by default, and Greenstone must be recompiled to enable them. To do this, you need the source code. If you haven't already got the source, download the version of the source code component that matches the distribution you are using.

Z39.50 support in Greenstone is based around the YAZ toolkit, written by Index Data. We are currently using YAZ version 2.1.4, which is included in the Greenstone distribution without modification. Greenstone links against the libyaz.a library. We have also written yaz_zclient.h/cpp, which is based on the sample client.h/cpp. This is found in greenstone/runtime-src/src/z3950.

Download through Z39.50 from GLI
GLI can download records from a specified Z39.50 server. Start the Greenstone Librarian Interface. On the left-hand side of the Librarian Interface's Download panel, select Z39.50, and then specify the arguments on the right-hand side of the panel. There are five arguments:
 * host: name of the Z39.50 server, for example z3950.loc.gov
 * port: port number of the server, for example 7090
 * database: name of the database to which the query will be sent
 * find: the query. The YAZ client supports the type-1 RPN query model, so both simple and advanced queries can be used, including single-term queries, boolean searches and fielded searches. This page describes in detail the query syntax of YAZ. For example, the following query finds records that have the phrase "car history" in the title field:

@attr 1=4 "car history"
 * max_records: the maximum number of records to retrieve; by default, 500 results are retrieved
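
Queries for the find argument are written in prefix notation, so they can be composed mechanically. The following is a minimal sketch in Python, not part of Greenstone or YAZ: the helper names fielded and boolean are hypothetical, and the use attribute numbers (4 for title, 1003 for author) are taken from the Bib-1 attribute set on the assumption that the target server honours them.

```python
# Sketch: compose prefix-notation query strings for the "find" argument.
# The helper names are hypothetical; the use attribute numbers are assumed
# to follow the Bib-1 attribute set (4 = title, 1003 = author).

BIB1_USE = {"title": 4, "author": 1003}


def fielded(field: str, term: str) -> str:
    """Return a fielded query such as: @attr 1=4 "car history"."""
    if '"' in term:
        # Quoting rules for embedded quotes are not covered by this sketch.
        raise ValueError("terms containing double quotes are not handled")
    return f'@attr 1={BIB1_USE[field]} "{term}"'


def boolean(op: str, left: str, right: str) -> str:
    """Combine two sub-queries with a prefix operator: and, or, not."""
    if op not in {"and", "or", "not"}:
        raise ValueError(f"unsupported operator: {op}")
    return f"@{op} {left} {right}"


title_query = fielded("title", "car history")
combined = boolean("and", title_query, fielded("author", "smith"))

print(title_query)
print(combined)
```

The first string is the fielded title query shown above; the second combines it with an author search using a boolean operator, and either can be pasted into the find field.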

You can view the downloaded MARC files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand it. The subfolders are named after the Z39.50 server URL. Each MARC file is named by combining the database name, the query and, if it was specified, max_records. These MARC files are physically stored in a temporary cache directory.

You can build a collection using these downloaded MARC files by dragging them across to the Collection section on the right-hand side of the Gather panel. MARCPlugin must be included in the collection plugin list. Important MARCPlugin options include:
 * metadata_mapping_file: the name of the file that maps MARC values to Greenstone metadata names. marc2dc.txt, which resides in the site's etc directory, is used by default and provides a mapping to Dublin Core. An alternative, marc2qdc.txt, is also provided by Greenstone and maps to qualified Dublin Core.
 * process_exp: a Perl regular expression to match against filenames.
 * split_exp: a Perl regular expression used to split MARC files into segments; each segment becomes a record in subsequent processing.

Download through SRW from GLI
SRW (Search/Retrieve Web service) is an alternative method that uses a web service to search and retrieve records from Z39.50 repositories. It replaces the Z39.50 communications protocol with HTTP and SOAP, but still supports the Z39.50 query syntax. Search results from SRW are in XML format.

Greenstone also supports downloading from a Z39.50 server through SRW. Go to the Download panel in the Greenstone Librarian Interface and select SRW. The right-hand side of the panel shows the same five parameters as for Z39.50 download, but the host and port values are different. For example, to connect to the Library of Congress Z39.50 server through SRW, specify the following host and port:
 * host: http://z3950.loc.gov
 * port: 7090/voyager?
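
The unusual port value makes more sense once host and port are seen as two halves of one web address. The sketch below is illustrative only and rests on two assumptions that are not confirmed by this page: that the two fields are joined with a colon, and that the service accepts SRU-style parameter names (version, operation, query, maximumRecords). The query term is a made-up example.

```python
# Sketch: how the SRW host and port fields might combine into a request URL.
# Assumptions (not confirmed by Greenstone): the fields are joined with a
# colon, and the service accepts SRU-style parameter names.
from urllib.parse import urlencode

host = "http://z3950.loc.gov"
port = "7090/voyager?"

# The trailing "?" in the port value starts the query string.
base_url = f"{host}:{port}"

params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "dinosaur",        # hypothetical search term
    "maximumRecords": "5",      # plays the role of max_records
}

request_url = base_url + urlencode(params)
print(request_url)
```

Read this way, the port field carries the port number plus the path of the service on that host, which is why it ends in a question mark.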

The downloaded records are in XML format. Here is a sample record. You can view the downloaded records on the Gather panel, in the same subfolders that are used for records downloaded through Z39.50 (see above).

To build a collection using the downloaded XML files, MARCXMLPlugin must be added to the collection plugin list. MARCXMLPlugin also uses the metadata_mapping_file option in the same way as MARCPlugin.

Compiling Greenstone with z3950 support
 * Linux/MacOS: in the top level greenstone directory, run

./configure --enable-z3950 (--enable-apache-httpd)
make
make install
 * The --enable-apache-httpd option is optional: use it if you have been using, or want to use, the Apache web server built in to Greenstone.

 * Windows:
 * Needs Visual Studio 6.
 * It only works with the web library, not the local library.
 * In the runtime-src\packages\yaz directory, extract the files from yaz-2.1.4.tar.gz. You'll need to run the extraction twice, once for gzip, once for tar. You should end up with a yaz-2.1.4 directory here.
 * Edit the yaz-2.1.4/include/yaz/oid.h file and change "list" on line 256 to "greenstone_list".
 * Edit the runtime-src\src\z3950\z3950proto.cpp file and delete "extern" from lines 38 and 39 (yyin and yyout).
 * In the greenstone directory, run

nmake /f win32.mak USE_Z3950=1
 * To enable iconv or xml2, edit greenstone\runtime-src\win32.mak, and remove HAVE_ICONV=0 and/or HAVE_LIBXML2=0 from the make command for Yaz. You'll need to install these libraries.

Using the z3950 client
Once Greenstone has z3950 support compiled in, it can act as a client to multiple z3950 servers. The file greenstone/etc/packages/z3950/z3950.cfg specifies a list of servers to connect to. By default, no servers are set up, although the config file comes with one (commented out) example Z39.50 server, for the United States' Library of Congress.

Each entry consists of:


 * A unique "short name" for internal use by Greenstone. (Note, this should not be the same as any local collection short name.)
 * The internet name or address of the server, and optionally the port that the server is running on if not the default port 210.
 * The name of the database to search on that server.
 * A string that provides a meaningful name for the "collection".
 * An optional "About" string, providing some information about the database and/or server.
 * Optional icon fields, which are displayed instead of the text on the front page.

The entries need only be separated by whitespace, but for the purposes of clarity the sample entry uses newlines and tabs.
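
To make the field order concrete, here is a minimal sketch that splits one entry into its parts. It is not Greenstone's own parser: it assumes that quoted strings behave like shell-style quoting (so Python's shlex can tokenize them) and that the optional description is introduced by the keyword About, as in the sample entry in the testing section of this page. The function name parse_entry is hypothetical, and icon fields are ignored.

```python
# Sketch: split one z3950.cfg entry into its fields.
# Not Greenstone's parser; assumes shell-style quoting and an optional
# "About" keyword followed by a quoted string. Icon fields are ignored.
import shlex

DEFAULT_PORT = 210  # used when the address gives no port

SAMPLE = '''
zdemo
    kanuka.cs.waikato.ac.nz:9999
    demo
    "The demo collection via z3950"
    About "This collection contains a few records from the Humanity Development Library"
'''


def parse_entry(text: str) -> dict:
    """Return the fields of a single whitespace-separated entry."""
    tokens = shlex.split(text)  # newlines and tabs count as whitespace
    short_name, address, database, title = tokens[:4]
    host, _, port = address.partition(":")
    entry = {
        "short_name": short_name,
        "host": host,
        "port": int(port) if port else DEFAULT_PORT,
        "database": database,
        "title": title,
        "about": None,
    }
    rest = tokens[4:]
    if len(rest) >= 2 and rest[0] == "About":
        entry["about"] = rest[1]
    return entry


entry = parse_entry(SAMPLE)
print(entry)
```

Because only whitespace separates the fields, the same entry written on a single line tokenizes identically.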

There is a list on the Library of Congress website of some servers that are publicly available for testing.

Greenstone will display a new "collection" for each server listed.

Using the z3950 server

 * The z3950server program is installed into bin/linux (or bin/darwin, bin/windows, depending on the operating system). It can be run from there, or copied to somewhere else.
 * The gsdlsite.cfg file needs to be copied from the cgi-bin directory to the directory you are running z3950server from. Check that the 'gsdlhome' entry is valid. The other entries don't matter.
 * 'setup' (Windows) or 'source setup.bash' (Linux/MacOS) needs to be run in the top level greenstone directory before running the server.

 * By default, the server listens on localhost:9999. To change the port or address, run

z3950server tcp:server-name:port-num
 * For example, 'z3950server tcp:localhost:8080' or 'z3950server tcp:kanuka.cs.waikato.ac.nz:7070'


 * On Windows, the yaz.dll file (runtime-src/packages/yaz/yaz-2.1.4/bin/yaz.dll) needs to be on your Path, or placed in the same directory as z3950server.exe.
 * You can run 'z3950server -h' to see the list of options for the server.
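
Once the server has been started, a quick way to confirm that it is accepting connections is to open a TCP connection to its address. The sketch below does only that: it does not speak the Z39.50 protocol, so a True result means that something is listening on the port, not that searches will succeed. The function name is_listening is hypothetical; localhost and 9999 are the defaults mentioned above.

```python
# Sketch: check that something is accepting TCP connections on the
# address the z3950server was started with. This does not speak Z39.50.
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Default address of the server according to the notes above.
    print(is_listening("localhost", 9999))
```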

Testing the server
You can test the z3950 server by connecting to it using the Greenstone z3950 client. In the greenstone/etc/packages/z3950/z3950.cfg file, add a server entry similar to the following:

zdemo
kanuka.cs.waikato.ac.nz:9999
demo
"The demo collection via z3950"
About "This collection contains a few records from the Humanity Development Library"


 * The database name (line 3) specifies which Greenstone collection to search.
 * This works with MG, MGPP and Lucene collections.

Known Problems/Issues list

 * Because of the open nature of the standard, our client may not work with some servers.
 * Because of the large number of MARC fields, only the most frequently used fields have been given explicit names in the results. Furthermore, these are currently hard-coded to correspond to the USMARC field names.
 * The Z39.50 client will not work when fast-cgi is used. Currently, enabling the fast-cgi package (off by default) disables the Z39.50 client code in Greenstone.
 * When viewing a z3950 "collection" in Greenstone, sometimes the browser may insert a large amount of space between the collection title and the navigation bar. Reload or Shift Reload should fix this problem.