Setting up the Webserver

In this section we describe how to set up your webserver to work with Greenstone. Note that all this is unnecessary when using the Windows Local Library, because this software works “out of the box” and does not require a webserver.

We discuss both the Apache webserver, which is freely available for both Windows and Unix (see the appendix on associated software for details), and Microsoft's Personal Web Server (PWS) and Internet Information Services (IIS) webservers. PWS is the standard Microsoft server for Windows 95/98; IIS is the standard webserver for Windows 2000 and the forthcoming Windows XP; Windows NT can use either. The Apache description applies equally to the Windows Web Library and Unix versions (though we use Windows-style terminology and pathnames); the PWS/IIS section applies only to the Windows Web Library.

Once you have installed your webserver, the next step is to install Greenstone. We will assume that during the install procedure you have taken the default action for each stage by clicking on the Next button. The result is that the directory C:\Program Files\gsdl is created and the Web Library binary is stored there, along with some supporting files.

All webservers use the special hostname “localhost” to denote the computer that the webserver is running on. Thus when you install a webserver, you can get at your HTML documents by typing the URL http://localhost into a browser. If your computer has a domain name set up, this is used instead of localhost to identify your computer from remote sites. Thus on the New Zealand Digital Library's computer, http://nzdl.org and http://localhost are equivalent. If you type http://nzdl.org on your own computer you will get the New Zealand Digital Library webserver, whereas if you type http://localhost you will get your own computer's webserver.
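
As a quick illustration of what “localhost” means, the following Python sketch checks whether a name resolves to a loopback address, that is, to the computer the check is run on. It is not part of Greenstone; the function name resolves_to_loopback is our own, and the sketch assumes only the Python standard library.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if any address that `hostname` resolves to is a loopback address."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # the name could not be resolved at all
    for info in infos:
        # info[4] is the socket address; its first item is the IP address string.
        # Drop any IPv6 zone suffix such as "%lo0" before parsing.
        address = info[4][0].split("%")[0]
        if ipaddress.ip_address(address).is_loopback:
            return True
    return False


if __name__ == "__main__":
    print("localhost is this computer:", resolves_to_loopback("localhost"))
```

On a normally configured machine the check should succeed for localhost, while a remote site's domain name should fail it.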

The Apache webserver

The Apache webserver is usually installed in C:\Program Files\Apache Group\Apache and is configured so that the cgi-bin directory is in the subdirectory \cgi-bin and the document root is the subdirectory \htdocs. It is reconfigured by editing the configuration file C:\Program Files\Apache Group\Apache\conf\httpd.conf. This is a text file: it's quite easy to read it to see how things are set up.

Depending on how your computer's networking software is set up, you may have to add this line to Apache's httpd.conf configuration file:

ServerName localhost

If this line is not included, the system attempts to find your server's name. However, there are bugs in some versions of Windows that cause this to fail. In this case, Apache will exit immediately when you start it up. It does display an error message, but it is immediately erased and you probably can't read it.

Setting up the Greenstone cgi-bin directory

Cgi-bin is a directory from which the webserver treats documents as executable programs. Apache's ScriptAlias directive is used to create a cgi-bin directory. Note that this directive can make any directory a cgi executable directory—it doesn't have to be called “cgi-bin”! Conversely, a directory called “cgi-bin” isn't special unless ScriptAlias has been applied to it.

When installed, Apache has a cgi-bin directory of C:\Program Files\Apache Group\Apache\cgi-bin. This means that if presented with the URL http://localhost/cgi-bin/hello, the webserver will attempt to execute a file called hello from within the above directory.

There is one Greenstone program, which is called “library.exe”, that needs to be executed by the webserver; it in turn reads a file called the Greenstone site configuration file, or “gsdlsite.cfg”, which needs to be located in the same directory.

The best way of arranging this is to use Apache's ScriptAlias directive to create a new cgi-bin directory. Here's the excerpt from Apache's httpd.conf configuration file that adds C:\Program Files\gsdl\cgi-bin as an additional cgi-bin directory:

ScriptAlias /gsdl/cgi-bin/ "C:/Program Files/gsdl/cgi-bin/"
<Directory "C:/Program Files/gsdl/cgi-bin">
  Options None
  AllowOverride None
</Directory>

(It's a curious fact that Apache configuration files use forward slashes in place of standard Windows backslashes.)

This means that any URL of the form http://localhost/gsdl/cgi-bin/… will be sought in the directory C:\Program Files\gsdl\cgi-bin and executed by the webserver. For example, if presented with the URL http://localhost/gsdl/cgi-bin/hello, the webserver will attempt to retrieve the file C:\Program Files\gsdl\cgi-bin\hello and execute it. However, the URL http://localhost/cgi-bin/hello still looks in Apache's regular cgi-bin directory for the file C:\Program Files\Apache Group\Apache\cgi-bin\hello and executes it, just as it did before.

The document root directory

The document root directory is the root of your webserver's directory structure. When installed, Apache has a document root of C:\Program Files\Apache Group\Apache\htdocs. This means that if presented with the URL http://localhost/hello.html, the webserver will attempt to retrieve a file called hello.html from within the above directory.

Several files within Greenstone need to be read by the webserver. The simplest way to arrange this is to use the Alias directive, which is just like ScriptAlias except that it applies to ordinary web pages, not cgi scripts. Insert these lines into your Apache configuration file, after the ScriptAlias directive, to add C:\Program Files\gsdl as an additional place to look for documents.

Alias /gsdl/ "C:/Program Files/gsdl/"
<Directory "C:/Program Files/gsdl">
 Options Indexes MultiViews FollowSymLinks
 AllowOverride None
 Order allow,deny
 Allow from all
</Directory>

This means that any URLs that match the first argument of Alias (/gsdl/) are sought as files in the place corresponding to the second argument. In other words, URLs of the form http://localhost/gsdl/… will be sought as files in the directory C:\Program Files\gsdl. For example, if presented with the URL http://localhost/gsdl/hello.html, the webserver will attempt to retrieve the file C:\Program Files\gsdl\hello.html. However, the URL http://localhost/hello.html looks in the regular htdocs directory for the file C:\Program Files\Apache Group\Apache\htdocs\hello.html, just as it did before.

Be sure to add the Alias directive after the ScriptAlias directive. Instructing Apache to alias /gsdl before /gsdl/cgi-bin would match the URL /gsdl/cgi-bin/library against the Alias directive rather than the ScriptAlias, and it would be interpreted as a request for a document rather than the result of executing a program. The outcome would be to “display” the binary program file as a page in the Web browser, instead of executing it.
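
To see why the order matters, here is a small Python sketch of first-match prefix mapping. It is a deliberately simplified model, not Apache's actual implementation: we assume only that the rules are tried in the order they appear and that the first matching prefix wins. The function name resolve and the rule tuples are our own, for illustration.

```python
from typing import List, Optional, Tuple

# Each rule is (directive, URL prefix, directory), listed in configuration order.
Rule = Tuple[str, str, str]


def resolve(url_path: str, rules: List[Rule]) -> Optional[Tuple[str, str]]:
    """Return (directive, file path) for the first rule whose prefix matches."""
    for directive, prefix, directory in rules:
        if url_path.startswith(prefix):
            # Replace the URL prefix with the directory it is aliased to.
            return directive, directory + url_path[len(prefix):]
    return None  # no alias applies; the regular document root would be used


# The order recommended in the text: ScriptAlias first, then Alias.
RECOMMENDED = [
    ("ScriptAlias", "/gsdl/cgi-bin/", "C:/Program Files/gsdl/cgi-bin/"),
    ("Alias", "/gsdl/", "C:/Program Files/gsdl/"),
]

# The wrong order: the more general /gsdl/ prefix comes first.
WRONG = list(reversed(RECOMMENDED))

if __name__ == "__main__":
    print(resolve("/gsdl/cgi-bin/library", RECOMMENDED))
    print(resolve("/gsdl/cgi-bin/library", WRONG))
```

In both orders the URL maps to the same file, but in the wrong order it is claimed by the Alias rule, so in this model it would be served as a document rather than executed.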

Security

You should be aware that if the web library version of Greenstone is set up as instructed above, anyone will be allowed to download any file in the gsdl directory structure. This includes the index files and source documents of any collections you make, the user database, usage logs, etc.

If you are concerned about this, you can easily tighten up your webserver configuration to improve security. For the Apache webserver, put these lines into the configuration file instead of those given in the previous subsection:

Alias /gsdl/ "C:/Program Files/gsdl/"
<Directory "C:/Program Files/gsdl">
   Order allow,deny
   Deny from all
   <FilesMatch "\.(gif|jpe?g|png|css|mov|mpeg|ps|pdf|doc|rtf|jar|class)$">
         Order allow,deny
         Allow from all
   </FilesMatch>
</Directory>

This means that only files whose extensions match the regular expression in the FilesMatch line may be downloaded.
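
If you want to check which files the expression lets through before relying on it, you can try it out in Python. This is only a sketch: it assumes that Python's regular expression engine treats this particular pattern the same way Apache's does, and the filenames below are hypothetical examples, not actual Greenstone files.

```python
import re

# The pattern from the FilesMatch line above, copied verbatim.
ALLOWED = re.compile(r"\.(gif|jpe?g|png|css|mov|mpeg|ps|pdf|doc|rtf|jar|class)$")


def may_download(filename: str) -> bool:
    """Return True if the filename ends in one of the permitted extensions."""
    return ALLOWED.search(filename) is not None


if __name__ == "__main__":
    # Hypothetical filenames, chosen only to exercise the pattern.
    for name in ["logo.gif", "photo.jpeg", "photo.jpg", "users.db", "usage.txt"]:
        print(name, may_download(name))
```

Note that the match is made against the end of the filename, so a name such as paper.pdf.bak is refused even though it contains a permitted extension.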

The PWS and IIS webservers

Although neither PWS nor IIS is installed by default on current Windows systems, they can easily be installed using the “Add/Remove programs” control panel. If they are not already on your Windows distribution CD-ROM you will have to download them from the Microsoft web site (www.microsoft.com).

The setup procedure for Greenstone is identical for both PWS and IIS. Invoke the Personal Web Manager and perform the following actions.

  1. Select Advanced to get the Advanced Options screen.
  2. Select Home and click Add. Fill out the fields as follows:
    • Directory field: C:\Program Files\gsdl
    • Alias field: gsdl
    • Access permissions: Read
    • Application permissions: None
    • Click OK
    • This makes Greenstone files accessible to the webserver.
  3. Back in Advanced Options, select gsdl and click Add. Fill out the fields as follows:
    • Directory field: C:\Program Files\gsdl\cgi-bin
    • Alias field: cgi-bin
    • Access permissions: None
    • Application permissions: Execute
    • Click OK
    • This allows the Greenstone program library.exe to be executed by the webserver.
  4. Go to the URL http://localhost/gsdl/cgi-bin/library.exe
    • Note: you need to specify the .exe file extension with PWS and IIS.