Differences

This shows you the differences between two versions of the page.

en:release:3.09_release_notes [2018/09/13 15:00]
anupama [Firefox browser doesn't remember you being logged into greenstone]
en:release:3.09_release_notes [2019/02/04 16:17] (current)
anupama [PDF plugin restructuring and the NEW PDFv2Plugin]
Line 56: Line 56:
  
 ===== Further instructions =====
 +==== Setting up your Greenstone to run over https ====
 +The more secure https protocol is increasingly required by browsers and is gradually superseding http. Provided that you meet the following requirements and configure your GS3 as described below, Greenstone 3 can now automatically obtain an https certificate for you from the free Certification Authority "Let's Encrypt".
  
 +Requirements:​ because we need to temporarily run a server on port 80 to get a certificate issued and because port 80 has some access restrictions surrounding it on most machines,
 +  * on unix (linux and mac) systems you need to have sudo permissions
 +  * on windows, you probably need admin rights
 +  * ensure nothing is running on port 80 when you're ready to set up https certification for your GS3
 +
 +Steps:
 +  - Edit build.properties as follows:
 +     * set ''​tomcat.server''​ to the //primary// hostname/​domain name that you want your Greenstone3 to run as and which is to be registered in your certificate. This would be the host name of your machine.
 +     * set a value for ''​keystore.pass''​.\\ This will be the password on your final certificate used by tomcat.
 +     * Ensure ''​server.protocols''​ contains ''​https''​.\\ The ''​server.protocols''​ property is a comma-separated list that indicates which protocols are to be supported by your Greenstone 3 server. This property can be set to one of ''​http'',​ ''​https'',​ ''​http,​ https''​ or ''​https,​ http''​. The first in the list becomes the default protocol used for previewing with the GS3 server application,​ ''​gs3-server''​.
 +     * By default ''​tomcat.port.https''​ is set to 8443. Ensure this port is not already in use, otherwise change it to a port value that's not in use.
 +  - Make sure you have read and agree with the [[https://​letsencrypt.org/​documents/​LE-SA-v1.2-November-15-2017.pdf|Let'​s Encrypt Subscriber Agreement]]
 +  - Use a terminal to go into your GS3 installation folder, run ''​gs3-setup.bat''​ on windows and ''​source ./​gs3-setup.sh''​ on linux and mac to set up the GS3 environment,​ then run the ''​ant setup-https-cert''​ target. For example on Linux,\\ <​code>​cd /​path/​to/​GS3
 +source ./​gs3-setup.sh
 +ant setup-https-cert
 +</code>\\ You'll be asked for an **email** address at which Let's Encrypt can optionally contact you, as well as **any additional domain names** you want in the //same// certificate (additional domains are **untested**), and whether you **agree** with the Let's Encrypt Subscriber Agreement.\\ On linux or mac, you may be asked to provide your sudo password to run a server on port 80. (On Mac and windows, GS3 uses [[https://zerossl.com/usage.html|ZeroSSL]] to get Let's Encrypt to issue certificates, which results in GS3's own tomcat server being run on port 80 during certificate issuance. On Linux, we use //Let's Encrypt//'s own certbot-auto script for the certification process, and have it set to run a standalone temporary server on port 80.)
 +  - Once the setup-https-cert ant target has finished, you can start your GS3 web server either by running the gs3-server application or by running ''ant start'' from the terminal.
 +  - If you ran the gs3-server application, press the Enter Library button to open your DL home page. If you ran ''ant start'' from the command line, then open a browser manually. Point your browser to ''https:%%//%%<tomcat.server>:<tomcat.port.https>/greenstone3/library'', adjusting the tomcat.server and tomcat.port.https values to match what you set for these properties in your GS3 installation folder's toplevel ''build.properties'' file.
 +  - Once your https home page has loaded, confirm that your certificate is properly installed by looking for a green padlock next to the address bar. (Depending on your browser, you can click the padlock to get more information on the certificate issuer.)
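
Before running ''ant setup-https-cert'', it can help to confirm that the properties from step 1 are actually set. The following is only a minimal sketch and is not part of Greenstone: the helper names ''get_prop'' and ''check_https_props'' are hypothetical, and the sketch assumes plain ''key=value'' lines in ''build.properties'' with no line continuations.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Greenstone): print the value of a key
# from a Java-style properties file. Assumes simple "key=value" lines; if the
# key appears more than once, the last definition is printed.
get_prop() {
    # $1 = properties file, $2 = key
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Report the https-related settings and flag any that are unset.
check_https_props() {
    # $1 = path to build.properties
    for key in tomcat.server keystore.pass server.protocols tomcat.port.https; do
        value=$(get_prop "$1" "$key")
        if [ -z "$value" ]; then
            echo "MISSING: $key"
        elif [ "$key" = "keystore.pass" ]; then
            # Do not echo the password itself.
            echo "OK: $key is set"
        else
            echo "OK: $key=$value"
        fi
    done
    # server.protocols must list https for the https setup to be of use.
    case "$(get_prop "$1" server.protocols)" in
        *https*) ;;
        *) echo "WARNING: server.protocols does not contain https" ;;
    esac
}
```

For example, ''check_https_props /path/to/GS3/build.properties'' prints one line per property, so a missing ''keystore.pass'' shows up before the ant target is run.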
 +
 +There are 2 more https-related automated ant targets you can run from the command line:
 +  * ''​ant remove-https-cert'':​ to revoke your https certificate
 +  * ''ant renew-existing-https-cert'': to renew a certificate that you obtained earlier with ''ant setup-https-cert'', as explained above. A Let's Encrypt certificate needs renewing every 90 days, at which point the certificate will need to be reinstalled. For renewal, you will once more need to meet the same conditions as for issuance (that is, as when you ran ''ant setup-https-cert''), such as nothing running on port 80. Since renewal reinstalls your certificate, you will need to stop your GS3 server before running the ''ant renew-existing-https-cert'' target, then start your GS3 server again once the target has finished. Running the ''ant renew-existing-https-cert'' target will not renew the certificate unless the approximate time for expiry has been reached (+/- 10 days on Windows/Mac).
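
Because renewal only goes ahead near expiry, it may be useful to work out when a certificate issued on a given day reaches the 90-day mark mentioned above. This is a minimal sketch, assuming GNU ''date'' as found on most Linux systems (the ''-d'' option behaves differently on Mac); the function name ''cert_expiry_date'' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the date 90 days after the given issue date.
# Assumes GNU date; the 90-day lifetime is the figure quoted in these notes.
cert_expiry_date() {
    # $1 = issue date as YYYY-MM-DD
    date -u -d "$1 +90 days" +%Y-%m-%d
}
```

For instance, ''cert_expiry_date 2019-02-04'' prints ''2019-05-05''.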
 +
 +**Important:​** Beware that if you've configured your GS3 to support http and https, by setting the ''​server.protocols''​ property to include both http and https, then switching between the two protocols when you visit your GS3 pages in your browser could result in the //http// variants of GS3 web pages not remembering you when you log in to them. For the solution and for further details, consult the section [[en:​release:​3.09_release_notes#​troubleshooting|Troubleshooting > Your browser doesn'​t remember you being logged into greenstone]]. ​
 +
 +
 +==== PDF plugin restructuring and the NEW PDFv2Plugin ====
 +
 +From GS3.09 onward, the GS3 binaries will include additional tools for converting from PDF to various text/html/image/image+text formats. (For GS2, only the nightly binaries at http://www.greenstone.org/caveat-emptor/?latest=latest will contain these changes.)
 +
 +We're deprecating the old "​PDFPlugin"​. And in its place there will be 2 plugins to handle PDFs:
 +  * "//​PDFv1Plugin//"​ which is the same as the old PDFPlugin but minus the PDFBox_conversion option. It returns to using the old ''​pdftohtml''​ tool to do the conversions,​ and is limited to older versions of PDFs.
 +  * **the recommended "//​PDFv2Plugin//"​**,​ which will contain the new functionality and should handle a greater range of PDF versions, including the newer ones that the old ''​pdftohtml''​ (now used by PDFv1Plugin) can't handle. The "​PDFBox conversion"​ facility has been moved to the new PDFv2Plugin,​ but is now invisible: it will be triggered automatically depending on the "​convert_to"​ format that you select when you Configure the PDFv2Plugin. PDFv2Plugin also uses additional conversion tools in the background to support the additional output formats.
 +
 +For the eventual 3.09 release, the old PDFPlugin that you're familiar with, the one which has the ''​pdfbox_conversion''​ flag but also makes use of the old ''​pdftohtml''​ tool behind the scenes, will hang around with a deprecated warning, to allow people to port over their collections and keep rebuilding with the old settings or to rebuild their collection with one of the 2 new PDF plugins. However, **new collections will have the //​PDFv2Plugin//​ in the Document Plugins pipeline by default for GS3, and PDFv1Plugin by default for GS2, since GS2 doesn'​t come with the PDFbox extension out of the box.** So GS2 users will have to manually add in PDFv2Plugin in place of PDFv1Plugin for new collections,​ after setting up the pdfbox extension. But then it should work as usual.
 +
 +
 +The "​convert_to"​ options/​output formats of the new PDFv2Plugin are:
 +  * ''​text'':​ a single stream of text;
 +  * ''​html'':​ a single stream of basic html from just the extracted text, no images;
 +  * ''​pretty_html'':​ each page is now an HTML page consisting of extracted text overlaid on top of a screenshot of the rest of the PDF page;
 +  * ''​paged_pretty_html''​ (also the default when convert_to is set to auto): ''​pretty_html'',​ but each page is a section;
 +  * ''​pagedimg_<​png|jpg>'':​ every PDF page as an image, sectionalised by page. Not searchable, since there'​s only images;
 +  * ''​pagedimgtxt_<​png|jpg>'':​ every PDF page as an image plus that page's extracted text, sectionalised by page.
 +
 +As always, text is only extracted from a PDF where extractable. This depends on user permissions for a PDF, whether the PDF contains actual extractable text and not just images of text, whether the PDF is undamaged, and any other such factors.
 +
 +There may be further adjustments made, including to display strings, but so far, we've decided on the above output formats and they seem to work on my regular PDF test documents. ​
 ====Changing the admin password====
 Log in to the administration page, 'edit' the admin account, and click 'change password'. Alternatively, you can log in as admin via the login button at the top right of each page. Once you are logged in, this button will change to say 'admin'. Click this button and select 'account settings'. From there, you can select 'change password'.
Line 220: Line 271:
  
 ===== Important Changes and Bug Fixes =====
- +  * **HTTPS support:** Greenstone will obtain a certificate from the Certification Authority Let's Encrypt to run your GS3 tomcat over https. However, on unix systems (macs and linux), you will need to have sudo permissions. And on Windows you will probably need admin rights. For instructions on usage, see [[#​setting_up_your_greenstone_to_run_over_https|Setting up your Greenstone to run over https]] 
 +  * **GreenstoneSQLPlugin/​-out:​** used in place of GreenstoneXMLPlugin/​-out to write metadata and/or fulltext into a MySQL database instead of Greenstone doc.xml files. You can then use SQL statements to mass-edit metadata/​fulltext and rebuild your collection with the modified metadata/​fulltext. See the wiki page on [[http://​wiki.greenstone.org/​doku.php?​id=en:​user_advanced:​greenstonesqlplugs|Using the GreenstoneSQLPlugout with GreenstoneSQLPlugin]]. 
 +  * **The UnknownConverterPlugin:** if you have a command line tool installed that can convert from a document format to text or html (or png/jpg/gif images), and you're able to run it successfully from the command line to do such a conversion, then you can configure the new UnknownConverterPlugin to launch that command line tool and run the conversion automatically. This will allow document formats unrecognised by other Greenstone plugins to have their full text extracted and made searchable in Greenstone. There is a tutorial for Greenstone 3 that covers how to use the UnknownConverterPlugin.
 +  * **User comments** are now supported in GS3 as well. Refer to [[http://​wiki.greenstone.org/​doku.php?​id=en:​user:​user_comments|Enabling user comments]] 
 +  * OAI deletion policy 
 +  * Better way to run processes from GLI will avoid some occasional and unexpected errors when GLI runs perl scripts 
 +  * Bug fixes to file locking issues on Windows when using Lucene as indexer 
 +  * Patch to SOLR extension to circumvent SIGPIPE errors on large collections 
 +  * Patches to perl code upgrading perl syntax to work with newer versions of perl
 ===== IMPORTANT information =====
  
Line 227: Line 286:
 ==== Troubleshooting ====
  
-=== Firefox browser doesn't remember you being logged into greenstone ===
 +=== Content Encoding Error when visiting the local solr servlet page ===
 +If you see a Content Encoding Error when opening your GS3's solr servlet page at ''http://127.0.0.1:8383/solr'' or ''http://localhost:8383/solr'' in your browser, then this may have to do with the version of Java you have installed on your machine. From GS3.09 onward, if your machine has its own Java installed, then assuming that its version is sufficient and its bit-architecture (32 or 64 bit) matches, Greenstone will use your Java in preference to the bundled Java Runtime (JRE) that Greenstone ships with. We found that a recent version of Java (1.8.0_161 in our case) caused the Content Encoding Error when visiting the solr servlet, whereas the bundled JRE and other versions of Java, such as the slightly earlier 1.8.0_144 and the newer 1.8.0_191, did not have this issue.
 + 
 +**Solution:** if you have a problematic version of Java installed, either
 +  * unset JAVA_HOME and remove this Java's ''bin'' folder from the PATH environment variable too, so that Greenstone 3 uses its bundled JRE instead, or
 +  * install a newer version of Java on your system. We found that the latest one at the time of writing, 1.8.0_191, worked for this purpose.
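
On linux and mac, the first option can be carried out for the current terminal session only, leaving the system Java installed. The following is a minimal sketch rather than an official Greenstone procedure: the helper name ''remove_from_path'' is hypothetical, and the sketch assumes your Java's ''bin'' folder appears in PATH exactly as ''$JAVA_HOME/bin''.

```shell
#!/bin/sh
# Hypothetical helper: print a PATH-style string with one directory removed.
remove_from_path() {
    # $1 = colon-separated path list, $2 = directory to drop (exact match)
    printf '%s' "$1" | tr ':' '\n' | grep -Fxv -- "$2" | paste -sd ':' -
}

# Hide the system Java from this shell session only, so that Greenstone 3
# can fall back to its bundled JRE. Nothing is uninstalled.
if [ -n "${JAVA_HOME:-}" ]; then
    PATH=$(remove_from_path "$PATH" "$JAVA_HOME/bin")
    export PATH
    unset JAVA_HOME
fi
```

Run this in the same terminal before ''source ./gs3-setup.sh''; a new terminal will have the original environment again.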
 + 
 +=== SIGPIPE errors when building a collection === 
 +We've added a workaround for one kind of SIGPIPE error which could occur with large collections when using ''solr'' as indexer. However, a couple of people on the mailing list encountered SIGPIPE errors on occasions when solr was not the indexer. **If your collection is using ''jdbm'' as the database type** and the error messages surrounding the SIGPIPE mention issues with "transaction commit", then Mariana Pichinini on the mailing list found that the following helped:
 +  * change the database type from ''​jdbm''​ to ''​gdbm''​ 
 +  * or leave the database type at ''​jdbm''​ and move your GS3's bundled JRE (the GS3's ''​packages/​jre''​ subfolder) outside your GS3 installation. Next install a newer Java on your system so that GS3.08 can find that. If on Linux, ensure you open a new terminal before running GLI or command line building your collection. 
 +=== Your browser doesn'​t remember you being logged into greenstone === 
 +**The issue:** 
 +The following scenario can occur if you set up GS3 with https, and your server.protocols property in build.properties contains both ''​http''​ and ''​https''​ (i.e. you have ''​server.protocols=http,​https''​ or ''​server.protocols=https,​http''​). 
 + 
 +Switching between visiting your Greenstone 3 digital library (DL) using http and https URLs can result in the http version of the pages not remembering your login details despite you logging in. This can happen if you ever started off with the https version of the URL to a Greenstone3 DL page and then moved to using the http version of your GS3 URL, or if you ever logged in to your GS3 over https and then attempt to log in later using http. 
 + 
 +**The solution:​** 
 +The solution is to either start a private window if you want to access your GS3 DL pages over http, or to first clear your browser cookies related to your GS3 DL before swapping from https to http. 
 + 
 +**The cause:** 
 +Using https causes session cookies to have the secure flag set to true. When a session cookie has the secure flag set, http URLs cannot return that cookie in their subsequent requests to the server; only https URLs can. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies section "Secure and HttpOnly cookies", which states "A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol", and https://stackoverflow.com/questions/2321224/cookie-across-http-and-https-in-php\\ It further seems that in http mode, the browser will not overwrite secure cookies created in https mode with new cookies sent by the server over http. Thus after using https and acquiring secure session cookies, the server can no longer track a user's session when they switch to http until the cookies are cleared, either explicitly or by opening a private window.
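
To see the secure flag for yourself, you can look at the ''Set-Cookie'' header that the server returns over https. This is a minimal sketch, assuming ''curl'' is installed; the helper name ''has_secure_flag'' is hypothetical, and the cookie value in the comment is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Set-Cookie header line carries the
# Secure attribute (matched case-insensitively).
has_secure_flag() {
    # $1 = one header line, for example
    #      "Set-Cookie: JSESSIONID=abc; Path=/greenstone3; Secure; HttpOnly"
    printf '%s\n' "$1" | grep -Eiq '^set-cookie:.*;[[:space:]]*secure[[:space:]]*(;|$)'
}

# Usage idea (commented out, as it needs a running GS3 server; substitute
# the values from your build.properties):
#   curl -s -D - -o /dev/null "https://<tomcat.server>:<tomcat.port.https>/greenstone3/library" \
#     | grep -i '^set-cookie:'
```

If the server sets a session cookie on that request, passing the printed header line to ''has_secure_flag'' indicates whether the browser will withhold that cookie from http requests.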
 + 
 +<!--
 If you're on firefox and you just logged in to a running Greenstone 3 digital library (DL), but visiting subsequent pages in the DL shows you that it has forgotten you're logged in, then you're probably encountering a restriction that your firefox browser has.
  
Line 245: Line 327:
  
 After relaunching Firefox in Safe Mode, test whether your login details are being remembered this time. If it works now, it could indeed be an addon/extension/plugin or the hardware acceleration feature. Follow the suggestions and instructions at https://support.mozilla.org/en-US/questions/1213229 and https://support.mozilla.org/en-US/kb/forum-response-disable-hardware-acceleration to narrow down which of these it is.
 +-->
  
 === Mac Installer fails ===