en:user:gs2_to_gs3

Differences

This shows you the differences between two versions of the page.

en:user:gs2_to_gs3 [2026/04/27 22:39] – [Mapping gs2 links into gs3 versions] kjdon
en:user:gs2_to_gs3 [2026/04/30 02:33] (current) – [Extract ids and add as metadata] kjdon
Line 211:
  
 **Scenario:** you have an existing Greenstone 2 collection, with a lot of external sites pointing to documents in it, e.g. external databases link to your Greenstone documents.
-The links might look like this: [[http://myserver.com/gsdl/cgi-bin/library.cgi?e=d-01000-00---off-0400years--00-1----0-10-0---0---0direct-10---4-------0-1l--10-hy-50---20-about---00-3-1-00-0--4--0--0-0-11-10-0utfZz-8-00&a=d&c=400years&cl=CL1&d=HASH890dfd99a182f878a55a60]]
+The links might look like this: [[http://myserver.com/greenstone/cgi-bin/library.cgi?e=d-01000-00---off-0400years--00-1----0-10-0---0---0direct-10---4-------0-1l--10-hy-50---20-about---00-3-1-00-0--4--0--0-0-11-10-0utfZz-8-00&a=d&c=400years&cl=CL1&d=HASH890dfd99a182f878a55a60]]
  
 You want to upgrade your library to Greenstone 3. Now, links to the documents look like this:
Line 218:
 Is there a way of automatically mapping the gs2 links into the gs3 collection?
  
-** Thoughts:**
+==== Part 1: Apache rewrite ====
 + 
 +Edit the Apache httpd.conf file to rewrite the requests.
 + 
 +(If you want to test this out using the Apache web server that came with Greenstone 2, add these lines to the end of the file greenstone2/apache-httpd/linux/conf/httpd.conf.in, which is a template used to generate the actual runtime version, and then restart Apache with gs2-server.sh.)
 + 
 +<code> 
 +  RewriteEngine On 
 +  # a test rule to make sure rewriting is working at all - visit localhost:8282/test 
 +  RewriteRule ^/test$ http://example.com/ [R=302,L] 
 +   
 +  # redirect to Greenstone 3 
 +   
 +  # Match library.cgi requests 
 +  RewriteCond %{REQUEST_URI} ^/greenstone/cgi-bin/library\.cgi$ [NC] 
 +  # only a=d requests; the parameter is anchored so that "a=d" inside another parameter does not match
 +  RewriteCond %{QUERY_STRING} (?:^|&)a=d(?:&|$) [NC]
 +   
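 +  # Extract collection (c=...) and document (d=...) from the query string.
 +  # This assumes c= appears before d=, as in the example link above.
 +  RewriteCond %{QUERY_STRING} (?:^|&)c=([^&]+).*&d=([^&]+) [NC]
 +  # the captured groups are available as %1 (collection) and %2 (document)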
 +  RewriteRule ^/greenstone/cgi-bin/library\.cgi$ https://localhost:8383/greenstone3/library/collection/%1/document/%2? [R=301,L] 
 + 
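 +  # Optional: if things aren't working, log some output to see whether the rules are matching.
 +  # RewriteLog and RewriteLogLevel exist in Apache 2.2 only; on Apache 2.4 use: LogLevel alert rewrite:trace5
 +  RewriteLog "/tmp/rewrite.log"
 +  RewriteLogLevel 5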
 +</code>  
 + 
 +Notes:  
 +  Older versions of Greenstone 2 used gsdl instead of greenstone in URLs, so modify these lines accordingly.
 +  If this Apache is not on the same server as the running Greenstone 3, change localhost:8383 to the public URL of the Greenstone 3 library.
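To check the mapping offline before touching the live server, the rewrite logic above can be approximated in a small Python sketch. This is only an illustration, not part of Greenstone: the function name `gs2_to_gs3` and the constant `GS3_BASE` are made up for this example, the patterns anchor parameter names at `&` boundaries, and both the `greenstone` and `gsdl` URL prefixes are accepted.

```python
import re
from urllib.parse import urlsplit

# Assumption: base URL of the Greenstone 3 library, as used in the RewriteRule
GS3_BASE = "https://localhost:8383/greenstone3/library"

# Only a=d requests are redirected; c= must appear before d= in the query string
IS_DOC = re.compile(r"(?:^|&)a=d(?:&|$)", re.IGNORECASE)
C_AND_D = re.compile(r"(?:^|&)c=([^&]+).*&d=([^&]+)", re.IGNORECASE)
GS2_PATH = re.compile(r"/(?:greenstone|gsdl)/cgi-bin/library\.cgi", re.IGNORECASE)


def gs2_to_gs3(url):
    """Return the Greenstone 3 URL for a Greenstone 2 document link, or None."""
    parts = urlsplit(url)
    if not GS2_PATH.fullmatch(parts.path):
        return None
    if not IS_DOC.search(parts.query):
        return None
    match = C_AND_D.search(parts.query)
    if match is None:
        return None
    collection, doc_id = match.group(1), match.group(2)
    return f"{GS3_BASE}/collection/{collection}/document/{doc_id}"
```

Feeding it a handful of real external links and comparing the results with what the Greenstone 3 library actually serves is a quick way to spot query strings that the patterns do not cover.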
 + 
 +==== Part 2: Get matching doc ids ==== 
 + 
 +If you are using HASH ids, these can change between Greenstone versions. Tip: if you want external links into your library, use an OID type that won't change, e.g. hash_on_full_filename, assigned, filename, dirname, or full_filename.
 + 
 +If the Greenstone 2 collection used hash ids, they will change when you rebuild in Greenstone 3, and the redirects will no longer work.
 + 
 +Some options to try: 
 +=== Don't rebuild === 
 + 
 +Copy the collection over to Greenstone 3 and open it in GLI to convert the config files (format statements may need tweaking), then see if it works in the library without rebuilding. The HASH ids won't have changed, but whether the collection works properly will depend on the gap between versions. The downside of this approach is that you can never rebuild the collection.
 + 
 +=== Use archives as import === 
 +If you have the archives folder available in the Greenstone 2 collection, you could use it as the import for the Greenstone 3 collection.
 +Set up the collection so that it looks right in Greenstone 3, either by starting from scratch or by copying over the old collection as above.
 +Copy all the HASHxx folders from the Greenstone 2 collection's archives folder into the Greenstone 3 collection's import folder. Do not copy the archiveinf-* files, rss.items, or earliestDatestamp files.
 +Then do a full rebuild.
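The copy step above could be scripted roughly as follows. This is a sketch under assumptions: the function name `copy_archives_to_import` is hypothetical, and it assumes the document folders sit directly under archives with names starting with `HASH`. Everything else in archives (archiveinf-* files, rss.items, earliestDatestamp) is left behind because only those folders are copied.

```python
import shutil
from pathlib import Path


def copy_archives_to_import(gs2_archives, gs3_import):
    """Copy each top-level HASH* folder from archives into import.

    Returns the sorted list of folder names that were copied.
    """
    src = Path(gs2_archives)
    dst = Path(gs3_import)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        # Only document folders are wanted; plain files are skipped
        if entry.is_dir() and entry.name.startswith("HASH"):
            shutil.copytree(entry, dst / entry.name, dirs_exist_ok=True)
            copied.append(entry.name)
    return copied
```

It would be called with the two collection paths for your own installation, for example the Greenstone 2 collection's archives folder and the Greenstone 3 collection's import folder. `dirs_exist_ok` requires Python 3.8 or later.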
 + 
 +=== Extract ids and add as metadata === 
 + 
 +If you no longer have the archives folder in the Greenstone 2 collection, then your task is a bit more complicated.  
 +What might work is: 
 + 
 +1. Run db2txt.pl over the gdbm metadata database in the Greenstone 2 collection and save the output to a file, e.g. db2txt collect/demo/index/text/demo.gdb > db.out (if the database is a .jdb file, use jdb2txt.pl instead).
 + 
 +2. Write a script to extract filenames and identifiers from the output. A CSV file is probably the easiest format:
 +<code> 
 +Filename,prev.Identifier 
 +sample.pdf,HASH01e86960c45a06eaa801e869 
 +</code> 
 +3. Put the source documents into the Greenstone 3 collection and add the CSV file. Add CSVPlugin to process the CSV file. Then use the import options -OIDtype assigned and -OIDmetadata prev.Identifier so that these identifiers become the new doc ids.
 + 
 +This relies on there being no subfolders in the import folder.
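The extraction script in step 2 might look something like the Python sketch below. It rests on assumptions that should be checked against your own db.out: that each record starts with a `[key]` line, is followed by `<field>value` lines, and ends with a line of dashes; that the original filename is stored in a field called `Source`; and that whole documents have keys starting with `HASH` with no section suffix such as `.1`. The function names are made up for this example.

```python
import csv
import re

# Assumed record layout in db.out (verify against your own file):
#   [HASH01e86960c45a06eaa801e869]
#   <Source>sample.pdf
#   ...
#   ----------------------------------------------------------------------
KEY_LINE = re.compile(r"^\[(.+)\]$")
FIELD_LINE = re.compile(r"^<([^>]+)>(.*)$")


def extract_ids(lines, filename_field="Source"):
    """Yield (filename, doc_id) pairs for top-level document records."""
    key = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        key_match = KEY_LINE.match(line)
        if key_match:
            key = key_match.group(1)
            continue
        if line.startswith("-----"):
            key = None  # end of the current record
            continue
        field = FIELD_LINE.match(line)
        if field is None or key is None:
            continue
        # Skip classifier nodes (e.g. CL1) and section records (e.g. HASH....1)
        if not key.startswith("HASH") or "." in key:
            continue
        if field.group(1) == filename_field:
            yield field.group(2), key


def write_csv(pairs, out):
    """Write the pairs in the CSV layout shown in step 2."""
    writer = csv.writer(out)
    writer.writerow(["Filename", "prev.Identifier"])
    writer.writerows(pairs)
```

To use it, open db.out for reading and the target CSV file for writing (with `newline=""`, as the csv module recommends), then call `write_csv(extract_ids(infile), outfile)`. If the filename lives in a different field in your database, pass that name as `filename_field`.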
en/user/gs2_to_gs3.1777329589.txt.gz · Last modified: 2026/04/27 22:39 by kjdon