Converting Greenstone 2 collections to Greenstone 3
While the indexes and databases of a Greenstone 3 collection are the same as for a Greenstone 2 version, the configuration files are very different.
A Greenstone 2 collection doesn't need to be rebuilt to work in Greenstone 3, but its two configuration files, etc/collect.cfg and index/build.cfg, need to be converted to their Greenstone 3 equivalents, etc/collectionConfig.xml and index/buildConfig.xml.
The easiest way to achieve this is to copy the Greenstone 2 collection into the Greenstone 3 collect folder, then open it in Greenstone 3's GLI. (The Greenstone 2 collection will need to have both an etc/collect.cfg file and an index/build.cfg file.) GLI will notice that it is a Greenstone 2 collection and will create the Greenstone 3 versions of the configuration files.
The tricky part of the conversion process is converting the format statements. For Greenstone versions 3.06 and later, a Format Conversion Wizard is provided to help you with this process.
The Format Conversion Wizard first attempts the conversion automatically, then presents you with the tentative Greenstone 3 format statements it generated from the Greenstone 2 ones, one format statement at a time. You can adjust each one within the Wizard itself, or accept the suggestion for now and adjust it in GLI's Format Features later.
Once you have adjusted or accepted the format statements, you can go to the Create panel and preview the collection. It does not need to be rebuilt at this stage. Depending on how the collection looks, you may need to go back to the Format panel and modify the format statements manually.
A complex example
Sometimes the generated Greenstone 3 format statements are accurate but can be simplified further. For example, if you open the Small Beatles collection from the Greenstone 2 Multimedia tutorial in Greenstone 3.06 GLI, the Format Conversion Wizard produces the following format statement for the documentNode template, and something similar for the VList classifier.
<td valign="top">
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Lyrics">
      <gsf:link type="document">_iconlyrics_</gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Discography">
      <gsf:link type="document">_icondisc_</gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Tablature">
      <gsf:link type="document">_icontab_</gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="MARC">
      <gsf:link type="document">_iconmarc_</gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Images">
      <gsf:link type="source">
        <gsf:metadata name="thumbicon"/>
      </gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Supplementary">
      <gsf:link type="source">
        <gsf:metadata name="srcicon"/>
      </gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Audio">
      <gsf:link type="source">
        <gsf:switch>
          <gsf:metadata name="FileFormat"/>
          <gsf:when test="equals" test-value="MIDI">_iconmidi_</gsf:when>
          <gsf:otherwise>_iconmp3_</gsf:otherwise>
        </gsf:switch>
      </gsf:link>
    </gsf:when>
  </gsf:switch>
</td>
Compare this format statement with the equivalent documentNode template in the manually-created collectionConfig.xml for the same collection in the Greenstone 3 Multimedia tutorial:
<td valign="top">
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Lyrics">
      <gsf:link type="document">
        <gsf:icon file="lyrics.gif" select="collection"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="Discography">
      <gsf:link type="document">
        <gsf:icon file="disc.gif" select="collection"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="Tablature">
      <gsf:link type="document">
        <gsf:icon file="tab.gif" select="collection"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="MARC">
      <gsf:link type="document">
        <gsf:icon file="marc.gif" select="collection"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="Images">
      <gsf:link type="source">
        <gsf:metadata name="thumbicon"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="Supplementary">
      <gsf:link type="source">
        <gsf:metadata name="srcicon"/>
      </gsf:link>
    </gsf:when>
    <gsf:when test="equals" test-value="Audio">
      <gsf:link type="source">
        <gsf:switch>
          <gsf:metadata name="FileFormat"/>
          <gsf:when test="equals" test-value="MIDI">
            <gsf:icon file="midi.gif" select="collection"/>
          </gsf:when>
          <gsf:otherwise>
            <gsf:metadata name="srcicon"/>
          </gsf:otherwise>
        </gsf:switch>
      </gsf:link>
    </gsf:when>
  </gsf:switch>
</td>
<td valign="top">
  <gsf:link type="document">
    <!-- Defined in the global format statement -->
    <xsl:call-template name="choose-title"/>
  </gsf:link>
</td>
There are two significant differences between the two.
1. First, the automatically generated format statement refers to macros like _icondisc_ and _icontab_, whereas the hand-written one refers to icons using the form <gsf:icon file="tab.gif" select="collection"/>. You could type these out manually in the Format Features section of the Greenstone 3 GLI.
2. Second, the automatically generated format statement contains many individual <gsf:switch/> statements, each testing the value of the dc.Format field and deciding what to do based on it, whereas the hand-written one combines them into a single <gsf:switch/>. This simplification is only possible because all the individual switch statements test the same variable, dc.Format.
Automatically generated format statement:
<gsf:switch>
  <gsf:metadata name="dc.Format"/>
  <gsf:when test="equals" test-value="SOMETHING-1"> do something 1 </gsf:when>
</gsf:switch>
<gsf:switch>
  <gsf:metadata name="dc.Format"/>
  <gsf:when test="equals" test-value="SOMETHING-2"> do something 2 </gsf:when>
</gsf:switch>
...
The hand-written version looks like:
<gsf:switch>
  <gsf:metadata name="dc.Format"/>
  <gsf:when test="equals" test-value="SOMETHING-1"> do something 1 </gsf:when>
  <gsf:when test="equals" test-value="SOMETHING-2"> do something 2 </gsf:when>
  ...
</gsf:switch>
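The two simplifications described above could also be scripted rather than typed by hand. The following is a minimal Python sketch, not part of Greenstone: the function names are hypothetical, the gsf namespace URI is an assumption to check against your own collectionConfig.xml, and guessing NAME.gif from an _iconNAME_ macro simply follows the examples above. It only merges neighbouring switches that test the same metadata with "equals" and have no <gsf:otherwise>, and it assumes the tested values are distinct, since otherwise merging could change the behaviour.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI for the gsf prefix; check it against your collectionConfig.xml.
GSF = "http://www.greenstone.org/greenstone3/schema/ConfigFormat"
ET.register_namespace("gsf", GSF)


def q(tag):
    """Return the namespace-qualified name ElementTree uses for a gsf tag."""
    return "{%s}%s" % (GSF, tag)


def replace_icon_macros(text):
    """Rewrite _iconNAME_ macros as gsf:icon elements (NAME.gif is a guess)."""
    return re.sub(r"_icon([A-Za-z0-9]+)_",
                  r'<gsf:icon file="\1.gif" select="collection"/>', text)


def mergeable_name(element):
    """Return the metadata name a switch tests, or None if merging is unsafe."""
    if element.tag != q("switch") or element.find(q("otherwise")) is not None:
        return None
    whens = element.findall(q("when"))
    if not whens or any(w.get("test") != "equals" for w in whens):
        return None
    metadata = element.find(q("metadata"))
    return metadata.get("name") if metadata is not None else None


def merge_adjacent_switches(parent):
    """Fold runs of sibling switches on the same metadata into the first one."""
    kept, kept_name = None, None
    for child in list(parent):
        name = mergeable_name(child)
        if name is not None and name == kept_name:
            kept.extend(child.findall(q("when")))
            parent.remove(child)
        else:
            kept, kept_name = child, name
    return parent


# A cut-down version of the generated statement, with the prefix declared
# on the root element so that the fragment parses on its own.
SAMPLE = """<td xmlns:gsf="%s" valign="top">
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Lyrics">
      <gsf:link type="document">_iconlyrics_</gsf:link>
    </gsf:when>
  </gsf:switch>
  <gsf:switch>
    <gsf:metadata name="dc.Format"/>
    <gsf:when test="equals" test-value="Tablature">
      <gsf:link type="document">_icontab_</gsf:link>
    </gsf:when>
  </gsf:switch>
</td>""" % GSF

cell = merge_adjacent_switches(ET.fromstring(replace_icon_macros(SAMPLE)))
print(ET.tostring(cell, encoding="unicode"))
```

The result still needs checking by eye before it is pasted into Format Features. In the Small Beatles example, for instance, the hand-written statement uses the srcicon metadata rather than an icon file for the non-MIDI audio case.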
Using a remote Greenstone server
The new Format Conversion Wizard only appears when you're working with GLI, not client-GLI. The client-GLI for GS3 will only perform the most basic initial step in the conversion process, which is to preserve the GS2 format statements in inactive XML tags in the new collection's collectionConfig.xml.
However, if you have a local Greenstone 3 installation, you can still convert a remote collection's collect.cfg file to its GS3 equivalent.
- Open the GS2 collection on the remote GS3 server with client-GLI. Doing so for the first time will perform the preliminary conversion step of the GS2 collect.cfg into collectionConfig.xml.
- Download the remote collection's etc/collectionConfig.xml file. When you open a remote collection, client-GLI is likely to download a zipped copy of the collection's etc folder and its contents into the .gli folder in your user area on the client machine.
- Open the regular GLI of a GS3 installation on the client machine. Create a new empty collection. Quit GLI.
- Go to the local GS3 installation's web/sites/localsite/collect/<new-collection-name>/etc folder and place the downloaded collectionConfig.xml in it.
- Start regular GLI again. Re-open the newly created collection, and it should now present you with the Format Conversion Wizard to lead you through inspecting the automatic conversion of the GS2 format statements to GS3.
- Once you're finished with the Wizard, go to the Format > Format features tab and copy all the format statements into a temporary text file.
- Restart client-GLI and connect to the remote GS3 server.
- Re-open the remote collection whose format statements need updating from GS2 to GS3.
- Go to the Format Features tab.
- From your temporary text file, copy the format statement of each format feature into its corresponding section in the Format Features tab, replacing the text already there.
- The collection, including its updated collectionConfig.xml file, is saved automatically.
Mapping GS2 links into GS3 versions
Scenario: you have an existing Greenstone 2 collection, with many external sites pointing to documents in it. For example, external databases link to your Greenstone documents. The links might look like this: http://myserver.com/gsdl/cgi-bin/library.cgi?e=d-01000-00---off-0400years--00-1----0-10-0---0---0direct-10---4-------0-1l--10-hy-50---20-about---00-3-1-00-0--4--0--0-0-11-10-0utfZz-8-00&a=d&c=400years&cl=CL1&d=HASH890dfd99a182f878a55a60
You want to upgrade your library to Greenstone 3. Now, links to the documents look like this: http://myserver.com/greenstone3/library/collection/400years/document/HASHcb152708d0031d02aab2f3
Is there a way of automatically mapping the gs2 links into the gs3 collection?
Part 1: Apache rewrite
Edit the Apache httpd.conf file to rewrite the requests.
(If you want to test this using the Apache web server that came with Greenstone 2, add these lines to the end of the file greenstone2/apache-httpd/linux/conf/httpd.conf.in, which is the template used to generate the actual runtime version, and then restart Apache with gs2-server.sh.)
RewriteEngine On
# a test rule to make sure rewriting is working at all - visit localhost:8282/test
RewriteRule ^/test$ http://example.com/ [R=302,L]
# redirect to Greenstone 3
# Match library.cgi requests
RewriteCond %{REQUEST_URI} ^/greenstone/cgi-bin/library\.cgi$ [NC]
# Only match a=d requests
RewriteCond %{QUERY_STRING} a=d [NC]
# Extract collection (c=...) and document (d=...) from query string
RewriteCond %{QUERY_STRING} ^.*c=([^&]+).*[&]d=([^&]+).*$ [NC]
# the captured groups will be %1 and %2
RewriteRule ^/greenstone/cgi-bin/library\.cgi$ https://localhost:8383/greenstone3/library/collection/%1/document/%2? [R=301,L]
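# Optional: if things aren't working, log some output to see whether matching is happening.
# (RewriteLog and RewriteLogLevel are Apache 2.2 directives; Apache 2.4 replaced them with, e.g., LogLevel alert rewrite:trace5.)
RewriteLog "/tmp/rewrite.log"
RewriteLogLevel 5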
Notes:
- Older versions of Greenstone 2 used gsdl instead of greenstone in URLs, so modify these lines accordingly.
- If this Apache is not on the same server as the running Greenstone 3, then change localhost:8383 to the public URL of the GS3 library.
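Before touching the web server, the intended mapping can be checked offline against a list of known old links. The following Python sketch mirrors what the rewrite rules aim to do; it is an illustration, not the Apache rules themselves. The function name gs2_to_gs3 is hypothetical, and the default Greenstone 3 base URL is simply the one used in the RewriteRule above. Unlike the rules, it requires the a parameter to be exactly d, accepts c and d in any order, and accepts both the gsdl and greenstone URL prefixes.

```python
from urllib.parse import parse_qs, quote, urlsplit


def gs2_to_gs3(url, gs3_library="https://localhost:8383/greenstone3/library"):
    """Return the Greenstone 3 document URL for a Greenstone 2 library.cgi link.

    Returns None when the link is not an a=d request carrying both c and d.
    """
    parts = urlsplit(url)
    if not parts.path.lower().endswith("/cgi-bin/library.cgi"):
        return None
    query = parse_qs(parts.query)
    if query.get("a") != ["d"] or "c" not in query or "d" not in query:
        return None
    collection = quote(query["c"][0], safe="")
    document = quote(query["d"][0], safe="")
    return "%s/collection/%s/document/%s" % (gs3_library, collection, document)


# The example link from the scenario, with the long e= argument left out.
old_link = ("http://myserver.com/gsdl/cgi-bin/library.cgi"
            "?a=d&c=400years&cl=CL1&d=HASH890dfd99a182f878a55a60")
print(gs2_to_gs3(old_link))
```

Running a batch of real inbound links through such a function shows which ones the redirect would cover, and which document ids need to exist in the Greenstone 3 collection.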
Part 2: Get matching doc ids
If you are using HASH ids, these can change between Greenstone versions. Tip: if you want external links into your library, use an OID type that won't change, e.g. hash_on_full_filename, assigned, filename, dirname, full_filename.
If the Greenstone 2 collection used hash ids, then the hash ids will change when you rebuild it in Greenstone 3, and the redirects won't work.
Some options to try:
Don't rebuild
Copy the collection over to Greenstone 3, open it in GLI to convert the config files (format statements may need tweaking), then see if it works in the library without rebuilding. The HASH ids won't have changed, but whether the collection works properly will depend on the gap between versions. A downside of this approach is that you can never rebuild the collection.
Use archives as import
If you have the archives folder available in the Greenstone 2 collection, you could use this as the import for the Greenstone 3 collection. Set up the collection so that it looks right in Greenstone 3, either by starting from scratch or by copying over the old collection as above. Copy all the HASHxx folders from the Greenstone 2 collection's archives folder into the Greenstone 3 collection's import folder. Do not copy the archiveinf-* files, rss.items, or earliestDatestamp files. Then do a full rebuild.
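The copying step can be done by hand, but for a large collection a small script avoids picking up the unwanted files. This is a minimal Python sketch under the assumption that the document folders sit directly inside archives and that their names start with HASH; the function name and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_archive_docs(gs2_archives, gs3_import):
    """Copy the HASH* document folders from archives into import.

    Everything else at the top level (archiveinf-* files, rss.items,
    earliestDatestamp) is left behind. Returns the folder names copied.
    """
    source = Path(gs2_archives)
    target = Path(gs3_import)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source.iterdir()):
        if entry.is_dir() and entry.name.startswith("HASH"):
            # dirs_exist_ok needs Python 3.8 or later.
            shutil.copytree(entry, target / entry.name, dirs_exist_ok=True)
            copied.append(entry.name)
    return copied
```

It would be called with the two folders, for example copy_archive_docs("old/collect/demo/archives", "new/collect/demo/import"), where both paths are placeholders for your own installations.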
Extract ids and add as metadata
If you no longer have the archives folder in the Greenstone 2 collection, then your task is a bit more complicated. What might work is:
1. Run db2txt.pl over the gdbm metadata database in the Greenstone 2 collection and save the output to a file, e.g.:
db2txt collect/demo/index/text/demo.gdb > db.out
(If the database is a .jdb file, use jdb2txt.pl instead.)
2. Write a script to extract filenames and identifiers from the output. A CSV file would probably be the easiest format:
Filename,prev.Identifier
sample.pdf,HASH01e86960c45a06eaa801e869
3. Put the source documents into the Greenstone 3 collection and add the CSV file. Add CSVPlugin to process the CSV file. Then use the -OIDtype assigned and -OIDmetadata prev.Identifier import options to make it use these identifiers as the new doc ids.
This approach relies on the import folder having no subfolders.
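The extraction script in step 2 could look something like the following Python sketch. It rests on assumptions that should be checked against your own db.out: that each record starts with its key in square brackets, that metadata follows as <field>value lines, that records are separated by a line of dashes, and that the original filename is stored in a field called Source. Keys containing a dot are treated as sections within a document and skipped. All function names are hypothetical.

```python
import csv
import io
import re

RECORD_END = re.compile(r"^-{10,}\s*$")
KEY_LINE = re.compile(r"^\[(.+)\]\s*$")
FIELD_LINE = re.compile(r"^<([^>]+)>(.*)$")


def parse_records(text):
    """Yield (key, fields) pairs from a db2txt-style dump (layout assumed)."""
    key, fields = None, {}
    for line in text.splitlines():
        if RECORD_END.match(line):
            if key is not None:
                yield key, fields
            key, fields = None, {}
        elif key is None:
            found = KEY_LINE.match(line)
            if found:
                key = found.group(1)
        else:
            found = FIELD_LINE.match(line)
            if found:
                # Keep the first value if a field is repeated.
                fields.setdefault(found.group(1), found.group(2))
    if key is not None:
        yield key, fields


def extract_rows(text, filename_field="Source"):
    """Return (filename, identifier) rows for top-level documents only."""
    rows = []
    for key, fields in parse_records(text):
        is_document = key.startswith("HASH") and "." not in key
        if is_document and filename_field in fields:
            rows.append((fields[filename_field], key))
    return rows


def to_csv(rows):
    """Format the rows with the header used in the example above."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Filename", "prev.Identifier"])
    writer.writerows(rows)
    return out.getvalue()


# Made-up dump in the assumed layout: a document, one of its sections, a classifier.
SAMPLE_DUMP = """[HASH01e86960c45a06eaa801e869]
<doctype>doc
<Title>Sample
<Source>sample.pdf
----------------------------------------------------------------------
[HASH01e86960c45a06eaa801e869.1]
<doctype>doc
<Title>Section 1
----------------------------------------------------------------------
[CL1]
<doctype>classify
<Title>Title
----------------------------------------------------------------------
"""

print(to_csv(extract_rows(SAMPLE_DUMP)), end="")
```

For a real collection, read db.out from disk instead of using the sample string, and write the result to a CSV file in the import folder.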
