
NZDL projects and demonstrations

New Zealand Digital Library Project members have developed a range of practical software packages in the course of their research. Much of this software is available for download.

Digital libraries and indexing


Greenstone is the digital library system that generates most of the pages of the New Zealand Digital Library website. It is freely available under the GNU General Public License and has been adopted by numerous other projects. Humanitarian organisations, including Global Help Projects and United Nations organisations, use it to disseminate information. Greenstone is available for download from


MG is an enhancement of the Managing Gigabytes full-text retrieval system that provides flexible stemming methods, term weighting, term frequencies, merged indexes, machine-independent indexes, and a port to MS-DOS.


PreScript converts PostScript to plain ASCII or HTML. It detects paragraph boundaries, removes hyphenation, and interprets many ligatures.
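The clean-up steps named above can be illustrated on text that has already been extracted. The following is only a minimal Python sketch, not PreScript's own code: the function name, the ligature table, and the rule that a blank line marks a paragraph boundary are all assumptions made for the example.

```python
import re

# Unicode ligature characters mapped to their plain-letter spellings.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

def clean_extracted_text(text):
    """Return a list of paragraphs with ligatures expanded and
    line-break hyphenation removed (hypothetical helper)."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # Rejoin a word split as "exam-\nple". This naive rule also joins
    # genuinely hyphenated compounds that happen to break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Assumption: a blank line separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse remaining line breaks and runs of spaces inside a paragraph.
    return [" ".join(p.split()) for p in paragraphs if p.strip()]
```

A real converter has to infer paragraph boundaries from page layout rather than from blank lines, so this shows the shape of the task rather than a solution to it.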

Extracting data and metadata


Sequitur is a method for inferring compositional hierarchies from strings. It detects repetition and factors it out of the string by forming rules in a grammar. Sequitur is useful for recognizing lexical structure in strings, and excels at very long sequences. The Sequitur WWW interface detects structure in text sequences.
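The idea of factoring repetition into grammar rules can be sketched in a few lines of Python. Note that this is not the Sequitur algorithm itself, which works incrementally on its input; the sketch below swaps in a simpler offline technique that repeatedly replaces the most frequent pair of adjacent symbols with a new rule. The function names and the `R0`, `R1`, ... rule labels are assumptions made for the example.

```python
from collections import Counter

def infer_grammar(text):
    """Return (start_sequence, rules) by repeated pair replacement.

    Simplified offline stand-in for grammar inference, not Sequitur's
    incremental algorithm.
    """
    seq = list(text)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so nothing left to factor out
        name = f"R{len(rules)}"  # hypothetical rule label
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Expand a symbol back into the text it stands for."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])
```

Calling `infer_grammar("abcabcabc")` yields a short start sequence plus nested rules, and joining `expand(s, rules)` over that sequence reproduces the original string.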


Kea is a program for automatically extracting keywords and keyphrases from the full text of documents. Candidate keyphrases are identified using rudimentary lexical processing, features are computed for each candidate, and machine learning is used to determine which candidates should be assigned as keyphrases.
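The first two stages of that pipeline, candidate identification and feature computation, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not Kea's code: the stopword list, the n-gram candidate rule, and the two features (a TF×IDF score and the relative position of first occurrence) are chosen for the example, and the `doc_freq` table of document frequencies is a hypothetical input. The learning stage is omitted; a trained classifier would consume the feature pairs returned here.

```python
import math
import re

# Tiny illustrative stopword list (assumption, not Kea's list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "from", "are", "on"}

def candidates(text, max_len=3):
    """Yield (phrase, word_position) for word n-grams that neither
    start nor end with a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n:
                break  # ran past the end of the text
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i

def features(text, doc_freq, num_docs):
    """Map each candidate to (tf_idf, relative_first_occurrence)."""
    total_words = len(re.findall(r"[a-z]+", text.lower()))
    counts, first = {}, {}
    for phrase, pos in candidates(text):
        counts[phrase] = counts.get(phrase, 0) + 1
        first.setdefault(phrase, pos)
    feats = {}
    for phrase, count in counts.items():
        tf = count / total_words
        # Add-one smoothing so unseen phrases do not divide by zero.
        idf = math.log((num_docs + 1) / (doc_freq.get(phrase, 0) + 1))
        feats[phrase] = (tf * idf, first[phrase] / total_words)
    return feats
```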

Text Mining

See our Text Mining Webpage.

Browsing interfaces

Realistic Books

Realistic Books is a suite of programs for creating and interacting with an interactive three-dimensional simulation of a paper-based book.

3D Book Visualizer

The 3D Book Visualizer is an early version of the Realistic Book software. It supports these interactive features:

  • Spinning the book around
  • Zooming in and out
  • Turning a single page or a wodge of pages
  • Flipping through key pages
  • Switching between handling mode and reading mode.

It supports the PDF and DjVu document formats.

Phind - no longer works

Phind is an interface for browsing the phrases that occur in a collection. The phrases form an approximation of the topics covered. They are extracted from the noun phrases occurring in the text, so nonsense phrases and phrases with very little information content are excluded. Each phrase is part of a hierarchy, and at any point the user can browse more specialised topics or retrieve the documents that contain the phrase. You can see Phind in action in the UN Food and Agriculture Organisation collection.
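The two browsing operations described above, moving to more specialised phrases and retrieving the documents that contain a phrase, can be sketched with a small phrase index in Python. This is a rough illustration only: it indexes every word n-gram rather than noun phrases, and the function names and sample collection are assumptions made for the example, not part of Phind.

```python
from collections import defaultdict

def build_phrase_index(docs, max_len=3):
    """Map each word n-gram to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(doc_id)
    return index

def expansions(phrase, index):
    """Return the longer indexed phrases that contain `phrase` as a
    whole-word sequence, i.e. the more specialised topics."""
    padded = f" {phrase} "
    return sorted(p for p in index if p != phrase and padded in f" {p} ")

# Hypothetical two-document collection.
docs = {1: "forest management plan", 2: "sustainable forest management"}
index = build_phrase_index(docs)
```

Here `index["forest management"]` gives the documents containing the phrase, and `expansions("forest management", index)` lists the longer phrases a user could browse to next.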


The collage applet dynamically displays a given set of images. When an image is clicked, a new browser window opens and the associated URL is displayed.

The applet can be used in two different contexts: either within the Greenstone Digital Library Software or externally using a directory of images and associated links.

Collages have been included in the following Greenstone collections:

A collage using a directory of images can be found at Ian Witten's Collage.

Chinese Text Segmentation

Word segmentation finds word boundaries in languages such as Chinese and Japanese, which, unlike English, are written without spaces or other word delimiters apart from punctuation marks. It plays a significant role in applications that use the word as the basic unit, because machine-readable Chinese text is invariably stored in unsegmented form.
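To make the task concrete, here is a minimal Python sketch of one simple baseline, dictionary-based forward maximum matching. It is not presented as the method used by this project; the function name, the `max_word_len` limit, and the three-word lexicon are assumptions made for the example.

```python
def segment(text, lexicon, max_word_len=4):
    """Greedy forward maximum matching: at each position take the
    longest lexicon entry, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for n in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words

# Hypothetical lexicon: "New Zealand", "digital", "library".
lexicon = {"新西兰", "数字", "图书馆"}
words = segment("新西兰数字图书馆", lexicon)
```

Greedy matching fails when the longest match at one position blocks the correct word at the next, which is one reason statistical approaches to segmentation exist.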

We have implemented a WWW interface for segmenting Chinese text.

If your web browser does not support Chinese text, illustrations of the transformation are available. Currently at


Electronic Lexical Knowledge Base (ELKB) is software for accessing and exploring Roget's Thesaurus. It also provides solutions for various natural language processing tasks. All scripts were originally developed as part of Mario Jarmasz's Master's thesis at the University of Ottawa, Canada.