Using Database Plugin

DatabasePlugin uses Perl's DBI module to import records from a database. DBI includes back-ends for mysql, postgresql, comma separated values (CSV), MS Excel, ODBC, sybase, etc. You will need to have the DBI module installed, as well as the appropriate back end module(s).

A dbi configuration file is needed, which specifies how to get records out of a database.

Access Processing

This describes my experience of getting DatabasePlugin to get records out of MS Access database.

  • My test collection is called testacc installed in c:\shaoqun\Greenstone (on Windows).
  • My access database file was called TestAccess.mdb and was put in my collection's directory (i.e. c:\shaoqun\Greenstone\collect\testacc). This location doesn't matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DatabasePlugin.
  • My test access table is called Students
  • I copied C:\shaoqun\Greenstone\etc\packages/example.dbi into the import directory of my collection.
  • This file was modified by setting the following variables:
 $db='DBI:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\shaoqun\Greenstone\collect\testacc\TestAccess.mdb';
 $sql_query = 'SELECT * FROM Students';
 %db_to_greenstone_fields=(
 "Name" => "Title",
 "Address" => "text",
 "StudentsID" => "Identifier",
 );

This is a mapping between column names in the Students table, and metadata names in the Greenstone archive files.

  • Then I built the collection.

Notes about Perl modules.

  • download following modules from CPAN
    • DBI
    • DBD::ADO
    • Win32::OLE

Note:I couldn't find the Win32::OLE module through searching Win32::OLE at http://search.cpan.org, so I downloaded libwin32-0.26.tar.gz

  • install these modules: make sure you have run 'setup.bat' in your greenstone directory first)
  • put them in %GSDLHOME%/packages/cpan.
  • unpack them
  • setup the C/C++ compiler for Windows. I used VC++.
  • run
 perl Makefile.PL INSTALLSITELIB="%GSDLHOME%/perllib/cpan/perl-5.8" <br/> PREFIX="%GSDLHOME%/perllib/cpan/XXX" SITEPREFIX="%GSDLHOME%/perllib/cpan" 
 nmake 
 nmake test
 nmake install

This installs the modules into %GSDLHOME%/perllib/cpan/perl-5.8 (XXX in the perl line should be set to the first component of the module name, e.g. DBI,DBD,Win32)

Note: I added

 unshift(@INC, "$ENV{'GSDLHOME'}/perllib/cpan/perl-5.8");

in the BEGIN block of Makefile.PL When installing DBD::ADO as it complained not be able to find the DBI module which was installed in %GSDLHOME%/perllib/cpan/perl-5.8

I also added that line in the BEGIN block of DBPlug.pm When building the collection

MySQL Processing

This describes my experience of getting DBPlug to get records for a mysql database.

  • My test mysql is installed on wesson.cs.waikato.ac.nz
  • My greenstone is installed in /research/shaoqun/testing/gsdl/collect/testsql (on Linux).
  • My test database is called gswikidb.
  • My test database table is gw_user.
  • I copied /research/shaoqun/testing/gsdl/etc/packages/example.dbi into the import directory of my collection testsql.
  • This file was modified by setting the following variables:
 $db='DBI:mysql:gswikidb:wesson.cs.waikato.ac.nz';
 
 $username='root';
 (I used //root// without a password to log in. You may need to set $password if authentication is required.)
  
 $sql_query = 'SELECT * FROM gw_user';
  
 %db_to_greenstone_fields=(
 "user_name" => "Title",
 "user_real_name" => "text",
 "user_id" => "Identifier",
 ); 
 (This is a mapping between field names in gw_user, and metadata names in the <br/> Greenstone archive files.)
  • Then I built the collection.

Notes about Perl modules.

  • download following modules from CPAN
    • DBI
    • Data::ShowTable
    • DBD::mysql
  • install these modules: make sure you have run 'setup.bat' or 'source setup.bash' in your greenstone directory first)
  • put them in $GSDLHOME/packages/cpan.
  • untarred them (tar xzvf file.tar.gz)
  • for DBI and Data:ShowTable, run
 perl Makefile.PL INSTALLSITELIB="$GSDLHOME/perllib/cpan/perl-5.8"  PREFIX="$GSDLHOME/perllib/cpan/XXX" SITEPREFIX="$GSDLHOME/perllib/cpan" 
 make 
 make test
 make install

*for DBD:mysql, run

 perl Makefile.PL --testdb=gswikidb --testhost=wesson.cs.waikato.ac.nz --testuser=root <br/> INSTALLSITELIB="$GSDLHOME/perllib/cpan/perl-5.8" <br/>  PREFIX="$GSDLHOME/perllib/cpan/XXX" SITEPREFIX="$GSDLHOME/perllib/cpan" 
 make 
 make test
 make install

This installs the modules into $GSDLHOME/perllib/cpan/perl-5.8

(XXX in the perl line should be set to the first component of the module name, e.g. DBI,DBD,Data)

CSV Processing

This describes my experience of getting DBPlug to process CSV (comma separated value) files using DBPlug.

  • My test collection is called csvtest, and my greenstone is installed in /research/kjdon/home/gsdl (on Linux).
  • My csv file was called demo.txt, and I put it in my collection's directory (i.e. /research/kjdon/home/gsdl/collect/csvtest). This location doesn't matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DBPlug.
  • I copied /research/kjdon/home/gsdl/etc/packages/example.dbi into the import directory of my collection.
  • This file was modified by setting the following variables:
 $db='DBI:CSV:f_dir=/research/kjdon/home/gsdl/collect/csvtest;csv_quote_char=\";csv_sep_char=,';

f_dir is the directory containing the csv file. If you want to use ; as a separator, then you need to escape it, e.g. csv_sep_char=\;

 $sql_query = 'SELECT * FROM demo.txt';
%db_to_greenstone_fields=(
    "name" => "Title",
    "data" => "text",
    "language" => "Language",
    "filename" => "Filename"
);

This is a mapping between field names in the CSV file, and metadata names in the Greenstone archive files.

  • The csv file should consist of one line containing the field names (no . allowed, so don't put namespaces here). Then each line after that is a new record, with the values for each field. The separator char used between fields needs to be specified in the $db line in the dbi file. Don't have blank lines as they end up as new (useless) records too.
  • My csv file looked like this:
 filename,name,language,data
 b17mie/b17mie.htm,Microlivestock - Little-Known Small Animals with a Promising Economic Future (b17mie),English,"Animal Husbandry and Animal Product Processing|Other animals (micro-livestock, little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
 b18ase/b18ase.htm,Little Known Asian Animals With a Promising Economic Future (b18ase),English,"Animal Husbandry and Animal Product Processing|Other animals (micro-livestock, little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
  • Then I built the collection.

Notes about Perl modules.

My Linux distribution had the DBI module installed, but other needed modules were missing. I discovered which ones were needed by running the import: If a module is missing, you get an error like:

 install_driver(CSV) failed: Can't locate DBI/SQL/Nano.pm in @INC (@INC contains:.....) at 
 /research/kjdon/home/gsdl/perllib/cpan/perl-5.8/DBD/File.pm line 25.
 Compilation failed in require at /research/kjdon/home/gsdl/perllib/cpan/perl-5.8/DBD/CSV.pm line 26.
 Compilation failed in require at (eval 42) line 3.
 Perhaps a module that DBD::CSV requires hasn't been fully installed
 at /research/kjdon/home/gsdl/perllib/plugins/DBPlug.pm line 210

this message tells us that DBI/SQL/Nano.pm module is needed. I had to download and install the following modules:

  • DBD::CSV
  • DBD::File
  • DBI::SQL::Nano
  • SQL::Statement
  • Text::CSV_XS

To install these, I downloaded each from CPAN, (all tar files), and put them in $GSDLHOME/packages/cpan. I untarred them (tar xzvf file.tar.gz), and ran the following for each one: (make sure you have run 'setup.bat' or 'source setup.bash' in your greenstone directory first)

 perl Makefile.PL INSTALLSITELIB="$GSDLHOME/perllib/cpan/perl-5.8" PREFIX="$GSDLHOME/perllib/cpan/XXX" SITEPREFIX="$GSDLHOME/perllib/cpan"
 make
 make test
 make install

(XXX in the perl line should be set to the first component of the module name, e.g. DBD, DBI, SQL etc)

This installs the modules into $GSDLHOME/perllib/cpan/perl-5.8

Excel Processing

This describes my experience of getting DatabasePlugin to process MS excel files.

  • My test collection is called testexce installed in /research/shaoqun/testing/gsdl(on Linux).
  • My excel file was called phonebook.xls and was put in my collection's directory (i.e. /research/shaoqun/testing/gsdl/collect/testexce). This location doesn't matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DatabasePlugin.
  • I copied /research/shaoqun/testing/gsdl/etc/packages/example.dbi into the import directory of my collection.
  • This file was modified by setting the following variables:
 $db='DBI:Excel:file=/research/shaoqun/testing/gsdl/collect/testexce/phonebook.xls';
 $sql_query = 'SELECT * FROM Sheet1';

Important: the perl excel driver module used in this testing assumes TABLE = Worksheet and the contents of first row of each worksheet as column name. (Sheet1 is the name of the first worksheet of phonebook.xls). Be aware that the worksheet name is case-sensitive.

%db_to_greenstone_fields=(
    "name" => "Title",
    "address" => "text",
    "id" => "Identifier",
);

This is a mapping between column names in the Sheet1 worksheet, and metadata names in the Greenstone archive files.

  • Then I built the collection.

Notes about Perl modules.

  • download following modules from CPAN
    • DBI
    • DBD::Excel
    • Spreadsheet::ParseExcel
    • Spreadsheet::WriteExcel
    • SQL::Statement
    • OLE::Store_Lite
  • install these modules: make sure you have run 'setup.bat' or 'source setup.bash' in your greenstone directory first)
  • put them in $GSDLHOME/packages/cpan.
  • untarred them (tar xzvf file.tar.gz)
  • run
 perl Makefile.PL INSTALLSITELIB="$GSDLHOME/perllib/cpan/perl-5.8" <br/> PREFIX="$GSDLHOME/perllib/cpan/XXX" SITEPREFIX="$GSDLHOME/perllib/cpan" 
 make 
 make test
 make install

This installs the modules into $GSDLHOME/perllib/cpan/perl-5.8

(XXX in the perl line should be set to the first component of the module name, e.g. DBI,DBD,SQL,etc)

Note:for more information about using DBD::Excel, please refer to the perl files in the sample directory included in the Module.