====== Setting up the Webserver ======

In this section we describe how to set up your webserver to work with Greenstone. Note that all this is unnecessary when using the Windows Local Library, because this software works “out of the box” and does not require a webserver.

We discuss both the Apache webserver, which is freely available for both Windows and Unix (see the Appendix [[.:appendix_associated_software|appendix_associated_software]] for details), and Microsoft's Personal Web Server (PWS) and Internet Information Services (IIS) webservers. PWS is the standard Microsoft server for Windows 95/98; IIS is the standard webserver for Windows 2000 and the forthcoming Windows XP; Windows NT can use either. The Apache description applies equally to the Windows Web Library and Unix versions (though we use Windows-style terminology and pathnames); the PWS/IIS section applies only to the Windows Web Library.

Once you have installed your webserver, the next step is to install Greenstone. We will assume that during the install procedure you have taken the default action for each stage by clicking on the //Next// button. The result is that the directory //C:\Program Files\gsdl// is created and the Web Library binary is stored there, along with some supporting files.

All webservers use the special URL “localhost” to denote the computer that the webserver is running on. Thus when you install a webserver, you can get at your HTML documents by typing the URL // http:%%//%%localhost // into a browser. If your computer has a domain name set up, this is used instead of localhost to identify your computer from remote sites. Thus on the New Zealand Digital Library's computer, // http:%%//%%nzdl.org // and // http:%%//%%localhost // are equivalent. If you type // http:%%//%%nzdl.org // on your computer you will get the New Zealand Digital Library webserver, whereas if you type // http:%%//%%localhost // you will get your own computer's webserver.

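
If you prefer to check from a script rather than a browser whether a webserver is answering on localhost, the following Python sketch may help. It is only an illustration and not part of Greenstone: the function name //server_responds// and the three-second timeout are assumptions made for this example. Any HTTP reply, even an error page, is treated as evidence that a server is running.

```python
import urllib.error
import urllib.request


def server_responds(url="http://localhost/", timeout=3.0):
    """Return True if any HTTP response (even an error page) comes back from url."""
    # Bypass any configured proxy: localhost should be contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Nothing is listening, the name did not resolve, or the request timed out.
        return False
```

Calling //server_responds()// with no arguments tests // http:%%//%%localhost/ //; pass a different URL to test another address on the same machine.
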
===== The Apache web server =====

The Apache webserver is usually installed in //C:\Program Files\Apache Group\Apache// and is configured so that the cgi-bin directory is in the subdirectory //\cgi-bin// and the document root is the subdirectory //\htdocs//. It is reconfigured by editing the configuration file //C:\Program Files\Apache Group\Apache\conf\httpd.conf//. This is a text file: it's quite easy to read it to see how things are set up.

Depending on how your computer's networking software is set up, you may have to add this line to Apache's //httpd.conf// configuration file:

<code>
ServerName localhost
</code>

If this line is not included, the system attempts to find your server's name. However, there are bugs in some versions of Windows that cause this to fail. In this case, Apache will exit immediately when you start it up. It does display an error message, but it is immediately erased and you probably can't read it.

==== Setting up the Greenstone cgi-bin directory ====

Cgi-bin is a directory from which the webserver treats documents as executable programs. Apache's //ScriptAlias// directive is used to create a cgi-bin directory. Note that this directive can make any directory a cgi executable directory; it doesn't have to be called “cgi-bin”! Conversely, a directory called “cgi-bin” isn't special unless //ScriptAlias// has been applied to it.

When installed, Apache has a cgi-bin directory of //C:\Program Files\Apache Group\Apache\cgi-bin//. This means that if presented with the URL // http:%%//%%localhost/cgi-bin/hello //, the webserver will attempt to execute a file called //hello// from within the above directory.

There is one Greenstone program, which is called “library.exe”, that needs to be executed by the webserver; it in turn reads a file called the Greenstone site configuration file, or “gsdlsite.cfg”, which needs to be located in the same directory. The best way of arranging this is to use Apache's //ScriptAlias// directive to create a new cgi-bin directory.

Here's the excerpt from Apache's //httpd.conf// configuration file that adds //C:\Program Files\gsdl\cgi-bin// as an additional cgi-bin directory:

<code>
ScriptAlias /gsdl/cgi-bin/ "C:/Program Files/gsdl/cgi-bin/"
<Directory "C:/Program Files/gsdl/cgi-bin">
    Options None
    AllowOverride None
</Directory>
</code>

(It's a curious fact that Apache configuration files use forward slashes in place of standard Windows backslashes.) This means that any URLs of the form // http:%%//%%localhost/gsdl/cgi-bin //... will be sought in the directory //C:\Program Files\gsdl\cgi-bin//, and executed by the web server. For example, if presented with the URL // http:%%//%%localhost/gsdl/cgi-bin/hello //, the web server will attempt to retrieve the file //C:\Program Files\gsdl\cgi-bin\hello// and execute it. However, the URL // http:%%//%%localhost/cgi-bin/hello // looks in Apache's regular //cgi-bin// directory for the file //C:\Program Files\Apache Group\Apache\cgi-bin\hello// and executes it, just as it did before.

==== The document root directory ====

The document root directory is the root of your webserver's directory structure. When installed, Apache has a document root of //C:\Program Files\Apache Group\Apache\htdocs//. This means that if presented with the URL // http:%%//%%localhost/hello.html //, the webserver will attempt to retrieve a file called //hello.html// from within the above directory.

Several files within Greenstone need to be read by the webserver. The simplest way to arrange this is to use the //Alias// directive, which is just like //ScriptAlias// except that it applies to ordinary web pages, not cgi scripts. Insert these lines into your Apache configuration file, after the //ScriptAlias// directive, to add //C:\Program Files\gsdl// as an additional place to look for documents:

<code>
Alias /gsdl/ "C:/Program Files/gsdl/"
<Directory "C:/Program Files/gsdl">
    Options Indexes MultiViews FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
</code>

This means that any URLs that match the first argument of //Alias// (/gsdl/) are sought as files in the place corresponding to the second argument. In other words, URLs of the form // http:%%//%%localhost/gsdl/ //... will be sought as files in the directory //C:\Program Files\gsdl//. For example, if presented with the URL // http:%%//%%localhost/gsdl/hello.html //, the webserver will attempt to retrieve the file //C:\Program Files\gsdl\hello.html//. However, the URL // http:%%//%%localhost/hello.html // looks in the regular //htdocs// directory for the file //C:\Program Files\Apache Group\Apache\htdocs\hello.html//, just as it did before.

Be sure to add the //Alias// directive after the //ScriptAlias// directive. Apache uses the first alias that matches, so aliasing ///gsdl// before ///gsdl/cgi-bin// would match the URL ///gsdl/cgi-bin/library// against the //Alias// directive rather than the //ScriptAlias//, and it would be interpreted as a request for a document rather than the result of executing a program. The outcome would be to “display” the binary program file as a page in the Web browser, instead of executing it.

==== Security ====

You should be aware that if the web library version of Greenstone is set up as instructed above, anyone will be allowed to download any file in the //gsdl// directory structure. This includes the index files and source documents of any collections you make, the user database, usage logs, etc. If you are concerned about this, you can easily tighten up your webserver configuration to improve security. For the Apache webserver, put these lines into the configuration file instead of those given in the previous subsection:

<code>
Alias /gsdl/ "C:/Program Files/gsdl/"
<Directory "C:/Program Files/gsdl">
    Order allow,deny
    Deny from all
    <FilesMatch "…">
        Order allow,deny
        Allow from all
    </FilesMatch>
</Directory>
</code>

This means that only files whose extensions match the regular expression in the //FilesMatch// line may be downloaded.

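
The two rules described in this part of the configuration (the first matching alias wins, and only files matching the //FilesMatch// expression may be downloaded) can be illustrated with a small Python model. This is a sketch for illustration only and does not reproduce how Apache is implemented: the function names //resolve// and //may_download//, the sample file names, and in particular the extension whitelist in //ALLOWED// are hypothetical and are not taken from the Greenstone configuration.

```python
import re

# Directives in the order they appear in httpd.conf: (kind, URL prefix, directory).
ALIASES = [
    ("ScriptAlias", "/gsdl/cgi-bin/", "C:/Program Files/gsdl/cgi-bin/"),
    ("Alias", "/gsdl/", "C:/Program Files/gsdl/"),
]

# HYPOTHETICAL whitelist standing in for the FilesMatch expression.
ALLOWED = re.compile(r"\.(html?|gif|jpe?g|png|css)$")


def resolve(url_path, aliases=ALIASES):
    """Map a URL path to (action, file path) using the first alias that matches."""
    for kind, prefix, directory in aliases:
        if url_path.startswith(prefix):
            action = "execute" if kind == "ScriptAlias" else "serve"
            return action, directory + url_path[len(prefix):]
    return None  # no alias matched; the regular document root would apply


def may_download(file_path):
    """Apply the whitelist to the last component of the file path."""
    name = file_path.rsplit("/", 1)[-1]
    return ALLOWED.search(name) is not None


# Correct order: the library program is executed.
print(resolve("/gsdl/cgi-bin/library"))
# Reversed order: the same URL is treated as an ordinary document.
print(resolve("/gsdl/cgi-bin/library", list(reversed(ALIASES))))
# Under the hypothetical whitelist an image is allowed, a database file is not.
print(may_download("C:/Program Files/gsdl/images/logo.gif"))
print(may_download("C:/Program Files/gsdl/etc/users.db"))
```

With the directives in the recommended order the first call reports that the file is executed; with the order reversed the same URL resolves to the same file but is merely served, which is the failure described above.
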
===== The PWS and IIS webservers =====

Although neither PWS nor IIS is installed by default on current Windows systems, they can easily be installed using the “Add/Remove programs” control panel. If they are not already on your Windows distribution CD-ROM you will have to download them from the Microsoft web site (// www.microsoft.com //).

The setup procedure for Greenstone is identical for both PWS and IIS. Invoke the Personal Web Manager and perform the following actions.

  - Select //Advanced// to get the //Advanced Options// screen.
  - Select //Home// and click //Add//. Fill out the fields as follows:
    * //Directory// field: //C:\Program Files\gsdl//
    * //Alias// field: //gsdl//
    * Access permissions: //Read//
    * Application permissions: //None//
    * Click //OK//
    * This makes Greenstone files accessible to the webserver.
  - Back in //Advanced Options//, select //gsdl// and click //Add//. Fill out the fields as follows:
    * //Directory// field: //C:\Program Files\gsdl\cgi-bin//
    * //Alias// field: //cgi-bin//
    * Access permissions: //None//
    * Application permissions: //Execute//
    * Click //OK//
    * This allows the Greenstone program //library.exe// to be executed by the webserver.
  - Go to the URL // http:%%//%%localhost/gsdl/cgi-bin/library.exe //
    * Note: you need to specify the //.exe// file extension with PWS and IIS.