Differences

This shows you the differences between two versions of the page.

Link to this comparison view

en:user_advanced:using_databaseplugin [2014/04/14 11:52] (current)
Line 1: Line 1:
 +======Using Database Plugin======
 +
 +[[en:​plugin:​DatabasePlugin|DatabasePlugin]] uses Perl's DBI module to import records from a database.
 +DBI includes back-ends for mysql, postgresql, comma separated values (CSV), MS Excel, ODBC, sybase, etc. You will need to have the DBI module installed, as well as the appropriate back end module(s).
 +
 +A dbi configuration file is needed, which specifies how to get records out of a database. ​
 +
 +<TABAREA tabs="​Greenstone3,​Greenstone2">​
 +<TAB>
 +See ''<​GSDL3HOME>/​gs2build/​etc/​packages/​example.dbi''​ for an example config file.
 +
 +Assuming you have got all the necessary modules installed, then the basic way to use DBPlugin is:
 +
 +  * Add DatabasePlugin to the list of plugins for your collection.
 +  * Copy ''<​GSDL3HOME>/​gs2build/​etc/​packages/​example.dbi''​ into the import directory of your collection.
 +  * Modify this file appropriately
 +  * You may want to have more than one copy of the file, for different database connections/​queries. The name does not matter, but the file extension should be .dbi
 +  * Import and build the collection.
 +</​TAB>​
 +<TAB>
 +See ''<​GSDLHOME>/​etc/​packages/​example.dbi''​ for an example config file.
 +
 +Assuming you have got all the necessary modules installed, then the basic way to use DBPlugin is:
 +
 +  * Add DatabasePlugin to the list of plugins for your collection.
 +  * Copy ''<​GSDLHOME>/​etc/​packages/​example.dbi''​ into the import directory of your collection.
 +  * Modify this file appropriately
 +  * You may want to have more than one copy of the file, for different database connections/​queries. The name does not matter, but the file extension should be .dbi
 +  * Import and build the collection.
 +</​TAB>​
 +</​TABAREA>​
 +
 +
 +
 +===== Access Processing =====
 +
 +This describes my experience of getting DatabasePlugin to get records out of MS Access database.
 +
 +  * My test collection is called //testacc// installed in //​c:​\shaoqun\Greenstone//​ (on Windows).
 +  * My access database file was called TestAccess.mdb and was put in my collection'​s directory (i.e. c:​\shaoqun\Greenstone\collect\testacc). This location doesn'​t matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DatabasePlugin.
 +  * My test access table is called //​Students//​
 +  * I copied //​C:​\shaoqun\Greenstone\etc\packages/​example.dbi//​ into the import directory of my collection.
 +  * This file was modified by setting the following variables:
 +<​code>​
 + ​$db='​DBI:​ADO:​Provider=Microsoft.Jet.OLEDB.4.0;​Data Source=C:​\shaoqun\Greenstone\collect\testacc\TestAccess.mdb';​
 +</​code>​
 +<​code>​
 + ​$sql_query = '​SELECT * FROM Students';​
 +</​code>​
 +<​code>​
 + ​%db_to_greenstone_fields=(
 + "​Name"​ => "​Title",​
 + "​Address"​ => "​text",​
 + "​StudentsID"​ => "​Identifier",​
 + );
 +</​code>​
 +
 +This is a mapping between column names in the //​Students//​ table, and metadata names in the Greenstone archive files.
 +
 +  * Then I built the collection.
 +
 +**Notes about Perl modules.**
 +
 +  * download following modules from [[http://​search.cpan.org/​|CPAN]]
 +    * DBI
 +    * DBD::ADO
 +    * Win32::OLE
 +//Note//:I couldn'​t find the Win32::OLE module through searching Win32::OLE at http://​search.cpan.org,​ so I downloaded libwin32-0.26.tar.gz
 +  * install these modules: make sure you have run '​setup.bat'​ in your greenstone directory first)
 +  * put them in %GSDLHOME%/​packages/​cpan.
 +  * unpack them 
 +  * setup the C/C++ compiler for Windows. I used VC++. 
 +  * run
 +<​code>​
 + perl Makefile.PL INSTALLSITELIB="​%GSDLHOME%/​perllib/​cpan/​perl-5.8"​ <br/> PREFIX="​%GSDLHOME%/​perllib/​cpan/​XXX"​ SITEPREFIX="​%GSDLHOME%/​perllib/​cpan" ​
 + ​nmake ​
 + nmake test
 + nmake install
 +</​code>​
 +
 +This installs the modules into %GSDLHOME%/​perllib/​cpan/​perl-5.8 (XXX in the perl line should be set to the first component of the module name, e.g. DBI,​DBD,​Win32)
 +
 +//Note:// I added  ​
 +<​code>​
 + ​unshift(@INC,​ "​$ENV{'​GSDLHOME'​}/​perllib/​cpan/​perl-5.8"​);​
 +</​code>​
 +
 +in the //BEGIN// block of //​Makefile.PL//​ When installing DBD::ADO as it complained not be able to find the DBI module which was installed in //​%GSDLHOME%/​perllib/​cpan/​perl-5.8//​
 +
 +
 +I also added that line in the //BEGIN// block of //​DBPlug.pm//​ When building the collection
 +
 +===== MySQL Processing =====
 +This describes my experience of getting DBPlug to get records for a mysql database.
 +  * My test mysql is installed on //​wesson.cs.waikato.ac.nz//​
 +  * My greenstone is installed in ///​research/​shaoqun/​testing/​gsdl/​collect/​testsql//​ (on Linux).
 +  * My test database is called //​gswikidb//​.
 +  * My test database table is //​gw_user//​.
 +
 +  * I copied // /​research/​shaoqun/​testing/​gsdl/​etc/​packages/​example.dbi // into the import directory of my collection //​testsql//​.
 +  * This file was modified by setting the following variables:
 +<​code>​
 + ​$db='​DBI:​mysql:​gswikidb:​wesson.cs.waikato.ac.nz';​
 + 
 + ​$username='​root';​
 + (I used //root// without a password to log in. You may need to set $password if authentication is required.)
 +  ​
 + ​$sql_query = '​SELECT * FROM gw_user';​
 +  ​
 + ​%db_to_greenstone_fields=(
 + "​user_name"​ => "​Title",​
 + "​user_real_name"​ => "​text",​
 + "​user_id"​ => "​Identifier",​
 + ​); ​
 + (This is a mapping between field names in gw_user, and metadata names in the <br/> Greenstone archive files.)
 +</​code>​
 +  * Then I built the collection.
 +
 +**Notes about Perl modules.**
 +  * download following modules from [[http://​search.cpan.org/​|CPAN]]
 +    * DBI
 +    * Data::​ShowTable
 +    * DBD::mysql
 +  * install these modules: make sure you have run '​setup.bat'​ or '​source setup.bash'​ in your greenstone directory first)
 +  * put them in $GSDLHOME/​packages/​cpan.
 +  * untarred them (tar xzvf file.tar.gz)
 +  * for DBI and Data:​ShowTable,​ run
 +<​code>​
 + perl Makefile.PL INSTALLSITELIB="​$GSDLHOME/​perllib/​cpan/​perl-5.8" ​ PREFIX="​$GSDLHOME/​perllib/​cpan/​XXX"​ SITEPREFIX="​$GSDLHOME/​perllib/​cpan" ​
 + ​make ​
 + make test
 + make install
 +</​code>​
 +*for DBD:mysql, run 
 +<​code>​
 + perl Makefile.PL --testdb=gswikidb --testhost=wesson.cs.waikato.ac.nz --testuser=root <br/> INSTALLSITELIB="​$GSDLHOME/​perllib/​cpan/​perl-5.8"​ <​br/> ​ PREFIX="​$GSDLHOME/​perllib/​cpan/​XXX"​ SITEPREFIX="​$GSDLHOME/​perllib/​cpan" ​
 + ​make ​
 + make test
 + make install
 +</​code>​
 +
 +
 +This installs the modules into $GSDLHOME/​perllib/​cpan/​perl-5.8
 +
 +
 +(XXX in the perl line should be set to the first component of the module name, e.g. DBI,​DBD,​Data)
 +
 +===== CSV Processing =====
 +
 +This describes my experience of getting DBPlug to process CSV (comma separated value) files using DBPlug.
 +
 +  * My test collection is called csvtest, and my greenstone is installed in /​research/​kjdon/​home/​gsdl (on Linux).
 +  * My csv file was called demo.txt, and I put it in my collection'​s directory (i.e. /​research/​kjdon/​home/​gsdl/​collect/​csvtest). This location doesn'​t matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DBPlug.
 +  * I copied /​research/​kjdon/​home/​gsdl/​etc/​packages/​example.dbi into the import directory of my collection.
 +  * This file was modified by setting the following variables:
 +<​code>​
 + ​$db='​DBI:​CSV:​f_dir=/​research/​kjdon/​home/​gsdl/​collect/​csvtest;​csv_quote_char=\";​csv_sep_char=,';​
 +</​code>​
 +f_dir is the directory containing the csv file. If you want to use ; as a separator, then you need to escape it, e.g. csv_sep_char=\;​
 +<​code>​
 + ​$sql_query = '​SELECT * FROM demo.txt';​
 +</​code>​
 +<​code>​
 +%db_to_greenstone_fields=(
 +    "​name"​ => "​Title",​
 +    "​data"​ => "​text",​
 +    "​language"​ => "​Language",​
 +    "​filename"​ => "​Filename"​
 +);
 +</​code>​
 +
 +This is a mapping between field names in the CSV file, and metadata names in the Greenstone archive files.
 +
 +  * The csv file should consist of one line containing the field names (no . allowed, so don't put namespaces here). Then each line after that is a new record, with the values for each field. The separator char used between fields needs to be specified in the $db line in the dbi file. Don't have blank lines as they end up as new (useless) records too.
 +
 +  * My csv file looked like this:
 +<​code>​
 + ​filename,​name,​language,​data
 + ​b17mie/​b17mie.htm,​Microlivestock - Little-Known Small Animals with a Promising Economic Future (b17mie),​English,"​Animal Husbandry and Animal Product Processing|Other animals (micro-livestock,​ little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
 + ​b18ase/​b18ase.htm,​Little Known Asian Animals With a Promising Economic Future (b18ase),​English,"​Animal Husbandry and Animal Product Processing|Other animals (micro-livestock,​ little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
 +</​code>​
 +  * Then I built the collection.
 +
 +**Notes about Perl modules.**
 +
 +My Linux distribution had the DBI module installed, but other needed modules were missing. I discovered which ones were needed by running the import: If a module is missing, you get an error like:
 +<​code>​
 + ​install_driver(CSV) failed: Can't locate DBI/​SQL/​Nano.pm in @INC (@INC contains:​.....) at 
 + /​research/​kjdon/​home/​gsdl/​perllib/​cpan/​perl-5.8/​DBD/​File.pm line 25.
 + ​Compilation failed in require at /​research/​kjdon/​home/​gsdl/​perllib/​cpan/​perl-5.8/​DBD/​CSV.pm line 26.
 + ​Compilation failed in require at (eval 42) line 3.
 + ​Perhaps a module that DBD::CSV requires hasn't been fully installed
 + at /​research/​kjdon/​home/​gsdl/​perllib/​plugins/​DBPlug.pm line 210
 +</​code>​
 +this message tells us that DBI/​SQL/​Nano.pm module is needed.
 +I had to download and install the following modules:
 +  * DBD::CSV
 +  * DBD::File
 +  * DBI::​SQL::​Nano
 +  * SQL::​Statement
 +  * Text::​CSV_XS
 +To install these, I downloaded each from [[http://​search.cpan.org/​|CPAN]],​ (all tar files), and put them in $GSDLHOME/​packages/​cpan. I untarred them (tar xzvf file.tar.gz),​ and ran the following for each one:
 +(make sure you have run '​setup.bat'​ or '​source setup.bash'​ in your greenstone directory first)
 +<​code>​
 + perl Makefile.PL INSTALLSITELIB="​$GSDLHOME/​perllib/​cpan/​perl-5.8"​ PREFIX="​$GSDLHOME/​perllib/​cpan/​XXX"​ SITEPREFIX="​$GSDLHOME/​perllib/​cpan"​
 + make
 + make test
 + make install
 +</​code>​
 +(XXX in the perl line should be set to the first component of the module name, e.g. DBD, DBI, SQL etc)
 +
 +This installs the modules into $GSDLHOME/​perllib/​cpan/​perl-5.8
 +
 +
 +
 +===== Excel Processing =====
 +This describes my experience of getting DatabasePlugin to process MS excel files.
 +
 +  * My test collection is called testexce installed in /​research/​shaoqun/​testing/​gsdl(on Linux).
 +  * My excel file was called phonebook.xls and was put in my collection'​s directory (i.e. /​research/​shaoqun/​testing/​gsdl/​collect/​testexce). This location doesn'​t matter. If it goes in the import directory then you will either get warnings about how no plugin could process the file, or it will be inappropriately processed by another plugin as well as DatabasePlugin.
 +  * I copied /​research/​shaoqun/​testing/​gsdl/​etc/​packages/​example.dbi into the import directory of my collection.
 +  * This file was modified by setting the following variables:
 +<​code>​
 + ​$db='​DBI:​Excel:​file=/​research/​shaoqun/​testing/​gsdl/​collect/​testexce/​phonebook.xls';​
 +</​code>​
 +<​code>​
 + ​$sql_query = '​SELECT * FROM Sheet1';​
 +</​code>​
 +
 +**Important**:​ the perl excel driver module used in this testing assumes TABLE = Worksheet and the contents of first row of each worksheet as column name. (//Sheet1// is the name of the first worksheet of phonebook.xls). Be aware that the worksheet name is case-sensitive.
 +<​code>​
 +%db_to_greenstone_fields=(
 +    "​name"​ => "​Title",​
 +    "​address"​ => "​text",​
 +    "​id"​ => "​Identifier",​
 +);
 +</​code>​
 +
 +
 +This is a mapping between column names in the //Sheet1// worksheet, and metadata names in the Greenstone archive files.
 +  * Then I built the collection.
 +**Notes about Perl modules.**
 +  * download following modules from [[http://​search.cpan.org/​|CPAN]]
 +    * DBI
 +    * DBD::Excel
 +    * Spreadsheet::​ParseExcel
 +    * Spreadsheet::​WriteExcel
 +    * SQL::​Statement
 +    * OLE::​Store_Lite
 +  * install these modules: make sure you have run '​setup.bat'​ or '​source setup.bash'​ in your greenstone directory first)
 +  * put them in $GSDLHOME/​packages/​cpan.
 +  * untarred them (tar xzvf file.tar.gz)
 +  * run
 +<​code>​
 + perl Makefile.PL INSTALLSITELIB="​$GSDLHOME/​perllib/​cpan/​perl-5.8"​ <br/> PREFIX="​$GSDLHOME/​perllib/​cpan/​XXX"​ SITEPREFIX="​$GSDLHOME/​perllib/​cpan" ​
 + ​make ​
 + make test
 + make install
 +</​code>​
 +This installs the modules into $GSDLHOME/​perllib/​cpan/​perl-5.8
 +
 +(XXX in the perl line should be set to the first component of the module name, e.g. DBI,​DBD,​SQL,​etc)
 +
 +//​Note://​for more information about using DBD::Excel, please refer to the perl files in the //sample// directory included in the Module.