Old NZDL usage logs

We have some old usage log files from the New Zealand Digital Library (NZDL) website, which ran the Greenstone Digital Library software.

The files can be found at nzdl-storage/puka-archive/nzdl.org/logs

The format of the log records is:

library_program IP_address [date] (greenstone CGI args and their values) "browser information"
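A minimal parser for one record might look like the sketch below. It follows the record format above, but two details are assumptions rather than documented facts: that whitespace inside the brackets, parentheses and quotes is optional, and that the CGI arguments are written as comma-separated `key=value` pairs. The names `parse_line`, `parse_args` and `LogRecord` are hypothetical, and the date is kept as raw text because its format is not described here.

```python
import re
from typing import Dict, NamedTuple, Optional

# ASSUMED record layout, following the format description:
#   library_program IP_address [ date ] (cgi args) "browser information"
# Optional whitespace inside the brackets/parentheses/quotes is tolerated.
LOG_RE = re.compile(
    r'^(?P<program>\S+)\s+'            # library program (first field)
    r'(?P<ip>\S+)\s+'                  # anonymized IP address
    r'\[\s*(?P<date>.*?)\s*\]\s+'      # date, kept as raw text
    r'\((?P<args>.*?)\)\s+'            # Greenstone CGI arguments
    r'"\s*(?P<browser>.*?)\s*"\s*$'    # browser information
)


class LogRecord(NamedTuple):
    program: str
    ip: str
    date: str               # not parsed: the date format is not documented
    args: Dict[str, str]
    browser: str


def parse_args(arg_text: str) -> Dict[str, str]:
    """Split 'a=p, p=home' into a dict.

    ASSUMPTION: arguments are comma-separated key=value pairs.
    Values that themselves contain commas (e.g. query text) would be split.
    """
    args: Dict[str, str] = {}
    for pair in arg_text.split(","):
        key, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments without an '='
            args[key] = value
    return args


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it doesn't fit the assumed layout."""
    m = LOG_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogRecord(m["program"], m["ip"], m["date"],
                     parse_args(m["args"]), m["browser"])
```

Returning `None` rather than raising makes it easy to count malformed lines separately, which is useful given the bogus entries described in the notes below.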

The IP addresses have been anonymized in a consistent manner, so that session tracking is available without user identification.
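Because the anonymized addresses are consistent, hits can be grouped into sessions per address. The sketch below assumes the dates have already been converted to `datetime` objects (the log's date format is not documented here). The 30-minute inactivity timeout is a common convention in web-log analysis, not something these logs define, and `sessionize` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# HYPOTHETICAL inactivity cut-off; not defined by the NZDL logs.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits: List[Tuple[str, datetime]],
               timeout: timedelta = SESSION_TIMEOUT) -> Dict[str, List[List[datetime]]]:
    """Group (anonymized_ip, timestamp) hits into per-address sessions.

    A new session starts whenever the gap since the same address's
    previous hit exceeds `timeout`.
    """
    sessions: Dict[str, List[List[datetime]]] = {}
    for ip, ts in sorted(hits, key=lambda h: h[1]):  # process in time order
        ip_sessions = sessions.setdefault(ip, [])
        if ip_sessions and ts - ip_sessions[-1][-1] <= timeout:
            ip_sessions[-1].append(ts)   # continue the current session
        else:
            ip_sessions.append([ts])     # start a new session
    return sessions
```

Note that an address shared by several people (for example behind a proxy) will merge their hits into one session, so session counts are an approximation.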

log file   size      dates covered
usage 1    15.0 MB   Mar 13, 2000 – Jun 21, 2001
usage 2     4.6 MB   Jun 21, 2001 – Aug 10, 2001
usage 3     3.8 MB   Aug 10, 2001 – Oct 10, 2001
usage 4     5.5 MB   Oct 10, 2001 – Dec 19, 2001
usage 5     3.9 MB   Dec 19, 2001 – Mar 25, 2002
usage 6     8.7 MB   Mar 25, 2002 – Oct 25, 2002
latest      6.8 MB   Apr 2, 2003 – Sep 31, 2003

Some points to note:

  • Logging started on March 13, 2000.
  • Hits from cs.waikato.ac.nz are included, so some stats may be slightly skewed by in-house use during testing.
  • Some stats will be strongly affected by users' web browsers caching pages. If a user has caching turned on (most will), they could visit the same page many times while creating only a single log entry, on the first visit. Without caching, each return visit would create its own entry.
  • The cookies we use to identify unique users are useful but may not be completely accurate. Users with cookies turned off will appear as a different user on every hit. Others may clear their cookies regularly and so appear as a new user each time they return to the site. We can probably assume, though, that the cookie data is correct for most ordinary users.
  • Users who bookmarked the site will produce odd results if the Greenstone arguments changed after the bookmark was created (as they quite often did). For example, the language value sometimes appears to be set to invalid values such as "20" or "00": the arguments have been scrambled and the wrong value assigned to the argument. This affects more than the language argument; most of the arguments in these entries will be bogus. These bogus entries should, however, make up a fairly insignificant proportion of the total.
  • There is a large gap in the logs between Oct 25, 2002 and Apr 2, 2003.
  • There is a smaller gap in the logs between Dec 21, 2000 and Jan 29, 2001.
  • There are a few entries with dates in 1999. These probably occurred at some point when the clock on the server was wrong.
  • Over time there have been many short-term server problems like the one above, where something went wrong for a few weeks before anyone noticed. One example is the disk filling up so that the log file could not be appended to (this probably accounts for the gap in January 2001). There may be other gaps.
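The gaps and mis-dated entries described in these notes can be located mechanically once the dates are parsed. The sketch below assumes a list of `datetime` values; the function names and the one-week `min_gap` threshold are hypothetical choices, while `LOGGING_START` comes from the March 13, 2000 start date noted above.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LOGGING_START = datetime(2000, 3, 13)  # logging start date from the notes


def split_suspect_dates(timestamps: List[datetime],
                        start: datetime = LOGGING_START
                        ) -> Tuple[List[datetime], List[datetime]]:
    """Separate entries dated before logging began (e.g. the 1999 clock errors)."""
    valid = [t for t in timestamps if t >= start]
    suspect = [t for t in timestamps if t < start]
    return valid, suspect


def find_gaps(timestamps: List[datetime],
              min_gap: timedelta = timedelta(days=7)  # HYPOTHETICAL threshold
              ) -> List[Tuple[datetime, datetime]]:
    """Return (last hit before, first hit after) pairs for silences longer than min_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]
```

Dropping the suspect entries before calling `find_gaps` matters: otherwise a 1999 entry would show up as a spurious gap running up to March 2000.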