Introduction

One goal of the Greenstone Digital Library software is to empower organizations such as universities, United Nations agencies, non-governmental organizations, non-profit organizations and governments to create varied collections of information that can be delivered online or on CD-ROM.

Creating such a collection typically involves the following steps:

  1. Selecting the documents to be included
  2. Securing copyright permission to use these documents in the digital library
  3. Scanning hard-copy documents that are not available in digital form and applying OCR to produce a clean digital version
  4. Converting all documents to a format (integrating text and images) that can be imported into Greenstone (preferably HTML or Microsoft Word, although other formats are also handled, with varying levels of precision, by “plugins”; see the Greenstone User’s Manual)
  5. Tagging the chapters, paragraphs and images of the digital documents
  6. Organising the collection into an optimally structured digital library
  7. Building the digital library using the Greenstone software
  8. Printing and distributing the collection on CD-ROM and/or distributing it over the Internet

To create a digital collection, the publications must be available in digital format. If books, newsletters or other documents are only available on paper, they will need to be scanned and processed into machine-readable form (step 3). Usually this is done using optical character recognition (OCR), but sometimes by manual retyping. This process is covered in Chapters 2-4 of this manual.

Step 5 enables the different parts of a document to be independently selected and displayed by readers in the final library, while step 6 involves assigning attributes to the documents, such as subject categories, keywords and bibliographic data, for ordering and searching the library. These steps are covered in Chapter 5 of this manual.

This manual introduces many issues that affect the editorial process of creating a collection from paper. Before reading on, you should consider these questions:

  • What is the goal of your collection?
  • What is your target group?
  • How big is that group: local, regional, or global?
  • How many documents are you making available?
  • How many pages?
  • How much graphics content?
  • Does the material split into parts that will be consulted by a limited audience and parts that need to be disseminated widely?
  • Are the documents already available electronically?
  • If so, in which formats? (Note incidentally that PDF files are not automatically equivalent to digital full-text form, as they often contain only page images.)
  • What is the copyright status of the documents?
  • Who owns the copyright?
  • Are there other organizations with the same target audience?
  • Are you willing to collaborate with other groups?
  • What budget is available for the whole project?
  • What human resources are available (in person-months) for co-ordination, editing, scanning and programming?
  • How many computers are available for this project?
  • How many CD-ROMs do you want to distribute?
  • Will they be free, or for sale?
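
For material that already exists electronically, the questions above about the number of documents and their formats can be answered with a quick inventory. The following Python sketch is one possible way to do this; it is not part of Greenstone, and the function name count_formats and the folder name used in the usage note are hypothetical. It simply counts the files under a folder by file extension.

```python
from collections import Counter
from pathlib import Path


def count_formats(root):
    """Count the files under `root` (including subfolders) by extension.

    Hypothetical helper for planning purposes; not a Greenstone function.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are lower-cased so that .PDF and .pdf are counted
            # together; files without an extension go under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling count_formats("source_documents") returns a tally per extension, and its most_common() method lists the formats from most to least frequent. Note that the count says nothing about the content of each file: a .pdf file that contains only page images will still need OCR.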