This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.

CSV Processing Using DatabasePlugin

This page describes my experience of getting DBPlug to process CSV (comma-separated values) files.

  • My test collection is called csvtest, and my Greenstone is installed in /research/kjdon/home/gsdl (on Linux).
  • My CSV file was called demo.txt, and I put it in my collection's directory (i.e. /research/kjdon/home/gsdl/collect/csvtest). The exact location doesn't matter, as long as it is not the import directory: if the file goes there, you will either get warnings that no plugin could process it, or it will be inappropriately processed by another plugin as well as DBPlug.
  • I copied /research/kjdon/home/gsdl/etc/packages/example.dbi into the import directory of my collection.
  • This file was modified by setting the following variables:
 $db='DBI:CSV:f_dir=/research/kjdon/home/gsdl/collect/csvtest;csv_quote_char=\";csv_sep_char=,';

f_dir is the directory containing the CSV file. If you want to use ; as a separator, then you need to escape it with a backslash, e.g. csv_sep_char=\;

 $sql_query = 'SELECT * FROM demo.txt';
%db_to_greenstone_fields=(
    "name" => "Title",
    "data" => "text",
    "language" => "Language",
    "filename" => "Filename"
);

This is a mapping between field names in the CSV file, and metadata names in the Greenstone archive files.
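
A small check like the following can confirm that each CSV field name used on the left-hand side of the mapping really appears in the header line of the CSV file. This is only a sketch and not part of Greenstone: the function name missing_fields is made up for illustration, and it assumes a comma separator and field names without spaces or embedded commas.

```shell
# missing_fields FILE FIELD...
# Print every FIELD that does not appear in the header (first line) of FILE.
# Assumption: comma-separated header, field names contain no spaces or commas.
missing_fields() {
    csv_file=$1
    shift
    # Normalise the header: drop quotes, carriage returns and spaces.
    header=$(head -n 1 "$csv_file" | tr -d '\r" ')
    for field in "$@"; do
        case ",$header," in
            *",$field,"*) ;;        # field is present in the header
            *) echo "$field" ;;     # field is missing from the header
        esac
    done
}
```

For the mapping above, this would be run as 'missing_fields demo.txt name data language filename'; any field name it prints is absent from the header line.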

  • The CSV file should start with one line containing the field names (no . is allowed, so don't put namespaces here). Each line after that is a new record, with the values for each field. The separator character used between fields must be specified in the $db line in the dbi file. Avoid blank lines, as they end up as new (useless) records too.
  • My csv file looked like this:
 filename,name,language,data
 b17mie/b17mie.htm,Microlivestock - Little-Known Small Animals with a Promising Economic Future (b17mie),English,"Animal Husbandry and Animal Product Processing|Other animals (micro-livestock, little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
 b18ase/b18ase.htm,Little Known Asian Animals With a Promising Economic Future (b18ase),English,"Animal Husbandry and Animal Product Processing|Other animals (micro-livestock, little known animals, silkworms, reptiles, frogs, snails, game, etc.)"
  • Then I built the collection.
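
The two format rules above (no . in the field names, no blank lines) can be checked before building with a short script along these lines. This is a rough sketch, not a Greenstone tool: check_csv is a hypothetical name, and the check is line-based, so a blank line inside a quoted multi-line value would also be reported.

```shell
# check_csv FILE
# Report a "." in the header line and any blank lines in FILE.
# Returns non-zero if a problem was found.
# Limitation: works line by line, so it does not understand quoted
# values that span several lines.
check_csv() {
    awk '
        BEGIN { bad = 0 }
        NR == 1 && /\./  { print "line 1: field name contains a dot"; bad = 1 }
        /^[ \t\r]*$/     { print "line " NR ": blank line"; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

It would be run as 'check_csv demo.txt' from the directory holding the CSV file.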

Notes about Perl modules.

My Linux distribution had the DBI module installed, but other required modules were missing. I discovered which ones were needed by running the import. If a module is missing, you get an error like:

 install_driver(CSV) failed: Can't locate DBI/SQL/Nano.pm in @INC (@INC contains:.....) at 
 /research/kjdon/home/gsdl/perllib/cpan/perl-5.8/DBD/File.pm line 25.
 Compilation failed in require at /research/kjdon/home/gsdl/perllib/cpan/perl-5.8/DBD/CSV.pm line 26.
 Compilation failed in require at (eval 42) line 3.
 Perhaps a module that DBD::CSV requires hasn't been fully installed
 at /research/kjdon/home/gsdl/perllib/plugins/DBPlug.pm line 210

This message tells us that the DBI/SQL/Nano.pm module is needed. I had to download and install the following modules:

  • DBD::CSV
  • DBD::File
  • DBI::SQL::Nano
  • SQL::Statement
  • Text::CSV_XS
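
Instead of finding missing modules one import failure at a time, each module can be test-loaded directly. The following is a sketch under some assumptions: check_modules is a made-up helper name, GSDLHOME is assumed to have been set by the Greenstone setup script, and the -I path is the perl-5.8 directory that appears in the error message above.

```shell
# check_modules MODULE...
# Print "ok" or "MISSING" for each Perl module by trying to load it.
# Assumption: GSDLHOME is set (e.g. by 'source setup.bash'); the -I path
# is the Greenstone module directory shown in the error message above.
check_modules() {
    for module in "$@"; do
        if perl -I"${GSDLHOME:-}/perllib/cpan/perl-5.8" \
                -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok $module"
        else
            echo "MISSING $module"
        fi
    done
}
```

For the modules listed above, the call would be 'check_modules DBI DBD::CSV DBD::File DBI::SQL::Nano SQL::Statement Text::CSV_XS'.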

To install these, I downloaded each one from CPAN (all are tar files) and put them in $GSDLHOME/packages/cpan. I untarred them (tar xzvf file.tar.gz) and ran the following for each one (make sure you have run 'setup.bat' or 'source setup.bash' in your Greenstone directory first):

 perl Makefile.PL INSTALLSITELIB="$GSDLHOME/perllib/cpan/perl-5.8" PREFIX="$GSDLHOME/perllib/cpan/XXX" SITEPREFIX="$GSDLHOME/perllib/cpan"
 make
 make test
 make install

(XXX in the perl line should be set to the first component of the module name, e.g. DBD, DBI, SQL, etc.)

This installs the modules into $GSDLHOME/perllib/cpan/perl-5.8.
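
When several tarballs need the same treatment, the XXX substitution can be scripted. The helper below is a sketch only: makefile_cmd is a hypothetical name, it merely prints the command rather than running it, and it assumes the tarball follows the usual CPAN naming (a made-up example would be DBD-CSV-1.0.tar.gz), so that the text before the first - is the first component of the module name.

```shell
# makefile_cmd TARBALL
# Print the 'perl Makefile.PL' command for one CPAN tarball, with XXX
# replaced by the first component of the module name.
# Assumption: tarball is named like Name-Parts-version.tar.gz.
makefile_cmd() {
    tarball=$(basename "$1")
    first=${tarball%%-*}    # text before the first "-", e.g. DBD
    echo "perl Makefile.PL INSTALLSITELIB=\"\$GSDLHOME/perllib/cpan/perl-5.8\" PREFIX=\"\$GSDLHOME/perllib/cpan/$first\" SITEPREFIX=\"\$GSDLHOME/perllib/cpan\""
}
```

The printed line can be reviewed and then run inside the untarred module directory, followed by make, make test and make install as above.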

old/csv_processing_using_dbplug.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1