en:release:3.09_release_notes [2018/09/13 15:00]
anupama [Firefox browser doesn't remember you being logged into greenstone]
en:release:3.09_release_notes [2018/10/02 19:31] (current)
anupama [PDF plugin restructuring and the NEW PDFv2Plugin]
  
===== Further instructions =====
==== Setting up your Greenstone to run over https ====
The more secure https protocol is increasingly required by browsers and is gradually superseding http. Provided you meet the following requirements and configure your GS3 as described below, Greenstone 3 can now automatically obtain an https certificate for you from the free Certification Authority "Let's Encrypt".
  
Requirements: a server must temporarily run on port 80 for a certificate to be issued, and port 80 has access restrictions on most machines. Therefore:
  * on unix systems (Linux and Mac), you need sudo permissions
  * on Windows, you probably need admin rights
  * ensure nothing is running on port 80 when you are ready to set up https certification for your GS3

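Before running the certificate target on Linux or Mac, you may want to confirm that the ports involved are free. The following is a minimal bash sketch, not part of Greenstone: the function names ''port_in_use'' and ''check_ports'' are hypothetical, the sketch relies on bash's ''/dev/tcp'' redirection, and it only tests connections to 127.0.0.1, so treat its answer as a hint rather than a guarantee. On Windows, use your usual tools (e.g. ''netstat -ano'') instead.

<code bash>
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Greenstone) for checking that TCP
# ports are free before running "ant setup-https-cert".

# Succeeds (exit status 0) if something accepts a TCP connection on
# 127.0.0.1:$1, i.e. the port is already in use on this machine.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints "port N: free" or "port N: IN USE" for each port given
# (default: 80 for certificate issuance, and 8443, the default
# tomcat.port.https). Returns 1 if any port is in use.
check_ports() {
  local -a ports=("$@")
  local port status=0
  if [ "${#ports[@]}" -eq 0 ]; then
    ports=(80 8443)
  fi
  for port in "${ports[@]}"; do
    if port_in_use "$port"; then
      echo "port $port: IN USE"
      status=1
    else
      echo "port $port: free"
    fi
  done
  return "$status"
}
</code>

After sourcing the file, run ''check_ports'' (or e.g. ''check_ports 80 9443'' if you changed ''tomcat.port.https''). Any port reported as IN USE needs to be freed or, for the https port, changed in ''build.properties''.
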
Steps:
  - Edit ''build.properties'' as follows:
    * set ''tomcat.server'' to the //primary// hostname/domain name that you want your Greenstone3 to run as, which is also the name to be registered in your certificate. This would normally be the host name of your machine.
    * set a value for ''keystore.pass''.\\ This will be the password on your final certificate used by tomcat.
    * Ensure ''server.protocols'' contains ''https''.\\ The ''server.protocols'' property is a comma-separated list of the protocols your Greenstone 3 server is to support. It can be set to one of ''http'', ''https'', ''http, https'' or ''https, http''. The first in the list becomes the default protocol used for previewing with the GS3 server application, ''gs3-server''.
    * By default, ''tomcat.port.https'' is set to 8443. Ensure this port is not already in use; otherwise, change it to a port that is free.
  - Make sure you have read and agree with the [[https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf|Let's Encrypt Subscriber Agreement]].
  - Use a terminal to go into your GS3 installation folder and set up the GS3 environment by running ''gs3-setup.bat'' on Windows or ''source ./gs3-setup.sh'' on Linux and Mac. Then run the ''ant setup-https-cert'' target. For example, on Linux:\\ <code>cd /path/to/GS3
source ./gs3-setup.sh
ant setup-https-cert
</code>\\ You will be asked for an **email address** that Let's Encrypt can optionally use to contact you, for **any additional domain names** you want in the //same// certificate (additional domains are **untested**), and whether you **agree** with the Let's Encrypt Subscriber Agreement.\\ On Linux or Mac, you may be asked for your sudo password so that a server can run on port 80. (On Mac and Windows, GS3 uses [[https://zerossl.com/usage.html|ZeroSSL]] to get Let's Encrypt to issue certificates, which results in GS3's own tomcat server being run on port 80 during certificate issuance. On Linux, we use //Let's Encrypt//'s own certbot-auto script for the certification process, set to run a standalone temporary server on port 80.)
  - Once the ''setup-https-cert'' ant target has finished, start your GS3 web server, either by running the ''gs3-server'' application or by running ''ant start'' from the terminal.
  - If you ran the ''gs3-server'' application, press the Enter Library button to open your DL home page. If you ran ''ant start'' from the command line, open a browser manually and point it to ''https:%%//%%<tomcat.server>:<tomcat.port.https>/greenstone3/library'', replacing ''<tomcat.server>'' and ''<tomcat.port.https>'' with the values you set for these properties in the ''build.properties'' file at the top level of your GS3 installation folder.
  - Once your https home page has loaded, confirm that your certificate is properly installed by looking for a green padlock next to the address bar. (Depending on your browser, you can click the padlock to see more information about the certificate issuer.)

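If you are unsure which address to visit, the relevant properties can be read back out of ''build.properties''. This is a minimal bash sketch, not a Greenstone tool: ''prop'', ''has_https'' and ''gs3_library_url'' are hypothetical names, and the sketch assumes simple ''key=value'' lines without line continuations or ''${...}'' substitutions. It also reports whether ''server.protocols'' includes ''https''.

<code bash>
#!/usr/bin/env bash
# Hypothetical helpers (not part of GS3) for reading a GS3 build.properties.
# Assumes plain "key=value" lines; comment lines start with # or !.

# prop FILE KEY: print the value of the last KEY=value line in FILE.
prop() {
  awk -v key="$2" '
    /^[ \t]*[#!]/ { next }                # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                   # not a key=value line
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", k)  # trim whitespace (and CR)
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", v)
      if (k == key) val = v
    }
    END { print val }
  ' "$1"
}

# has_https FILE: succeed if the comma-separated server.protocols list
# in FILE contains "https".
has_https() {
  local -a list
  local p
  IFS=',' read -ra list <<< "$(prop "$1" server.protocols)"
  for p in "${list[@]}"; do
    p=${p//[[:space:]]/}                  # "http, https" -> "http","https"
    [ "$p" = "https" ] && return 0
  done
  return 1
}

# gs3_library_url [FILE]: print the https library URL for this install,
# falling back to 8443 (the default tomcat.port.https) if the port is unset.
gs3_library_url() {
  local file=${1:-build.properties} host port
  host=$(prop "$file" tomcat.server)
  port=$(prop "$file" tomcat.port.https)
  echo "https://${host}:${port:-8443}/greenstone3/library"
}
</code>

For example, ''has_https build.properties && gs3_library_url build.properties'', run from your GS3 installation folder, prints the address to open once the server is running.
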
There are two more https-related automated ant targets you can run from the command line:
  * ''ant remove-https-cert'': revokes your https certificate.
  * ''ant renew-existing-https-cert'': renews a certificate that you obtained earlier with ''ant setup-https-cert'', as explained above. A Let's Encrypt certificate needs renewing every 90 days, at which point your certificate will need to be reinstalled. Renewal requires all the same conditions as issuance (the same conditions as when you ran ''ant setup-https-cert''), such as nothing running on port 80. Since renewal reinstalls your certificate, stop your GS3 server before running the ''ant renew-existing-https-cert'' target, and start it again once the target has finished. Running the target will not renew the certificate unless the approximate expiry time has been reached (+/- 10 days on Windows/Mac).

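Because renewal only takes effect near expiry, it can help to check how long your current certificate has left before stopping the server. The sketch below is not part of GS3: the function name ''expires_within'' is hypothetical, and it relies on the standard ''openssl x509 -checkend'' option, which reports whether a certificate expires within a given number of seconds.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper (not part of GS3).
# expires_within DAYS: read a PEM certificate on stdin.
#   exit 0 -> it expires within DAYS days (time to plan a renewal)
#   exit 1 -> it is still valid for longer than DAYS days
#   exit 2 -> no readable certificate was found on stdin
expires_within() {
  local days=${1:-10} pem
  pem=$(cat)
  # Make sure openssl can parse a certificate before checking its date.
  printf '%s\n' "$pem" | openssl x509 -noout 2>/dev/null || return 2
  # -checkend N exits 0 if the certificate is still valid N seconds from now.
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 1
  fi
  return 0
}
</code>

For example, with your GS3 server running: ''openssl s_client -connect gs3.example.org:8443 -servername gs3.example.org </dev/null 2>/dev/null | expires_within 10 && echo "stop GS3, then run: ant renew-existing-https-cert"'', where ''gs3.example.org'' and ''8443'' stand for your own ''tomcat.server'' and ''tomcat.port.https'' values.
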
**Important:** if you have configured your GS3 to support both http and https, by setting the ''server.protocols'' property to include both, then switching between the two protocols when you visit your GS3 pages in your browser can result in the //http// variants of GS3 web pages not remembering you when you log in to them. For the solution and further details, consult the section [[en:release:3.09_release_notes#troubleshooting|Troubleshooting > Your browser doesn't remember you being logged into greenstone]].

==== PDF plugin restructuring and the NEW PDFv2Plugin ====

From GS3.09 onward, the GS3 binaries include additional tools for converting from PDF to various text/html/image/image+text formats. (For GS2, only the nightly binaries at http://www.greenstone.org/caveat-emptor/?latest=latest will contain these changes.)

We are deprecating the old "PDFPlugin". In its place, there will be two plugins to handle PDFs:
  * "//PDFv1Plugin//", which is the same as the old PDFPlugin minus the ''pdfbox_conversion'' option. It reverts to using the old ''pdftohtml'' tool to do the conversions, and is limited to older versions of PDFs.
  * **the recommended "//PDFv2Plugin//"**, which contains the new functionality and should handle a greater range of PDF versions, including the newer ones that the old ''pdftohtml'' (now used by PDFv1Plugin) can't handle. The PDFBox conversion facility has been moved to the new PDFv2Plugin, but is now invisible: it is triggered automatically depending on the ''convert_to'' format that you select when you configure the PDFv2Plugin. PDFv2Plugin also uses additional conversion tools in the background to support the additional output formats.

For the eventual 3.09 release, the old PDFPlugin that you are familiar with (the one with the ''pdfbox_conversion'' flag, which also uses the old ''pdftohtml'' tool behind the scenes) will remain available with a deprecation warning. This allows people to port over their collections and keep rebuilding with the old settings, or to rebuild their collections with one of the two new PDF plugins. However, **new collections will have the //PDFv2Plugin// in the Document Plugins pipeline by default for GS3, and PDFv1Plugin by default for GS2, since GS2 doesn't come with the PDFBox extension out of the box.** GS2 users will therefore have to manually put PDFv2Plugin in place of PDFv1Plugin for new collections, after setting up the PDFBox extension; it should then work as usual.

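For GS2 users who edit collection configuration files directly rather than through the Librarian Interface (GLI), the swap can be scripted. This is a minimal bash sketch under stated assumptions: ''swap_pdf_plugin'' is a hypothetical name, the file is a GS2 ''collect.cfg'' with one ''plugin'' directive per line, the PDFBox extension is already set up, and any options on the old PDFv1Plugin line are dropped because the two plugins do not necessarily share options. A backup is kept as ''collect.cfg.bak''.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenstone): replace a PDFv1Plugin
# line in a GS2 collect.cfg with a bare "plugin PDFv2Plugin" line,
# keeping its position in the plugin pipeline.
swap_pdf_plugin() {
  local cfg=$1
  if [ ! -f "$cfg" ]; then
    echo "no such file: $cfg" >&2
    return 1
  fi
  cp "$cfg" "$cfg.bak" || return 1          # keep a backup copy
  # Rewrite from the backup; all other lines are copied unchanged.
  awk '$1 == "plugin" && $2 == "PDFv1Plugin" { print "plugin PDFv2Plugin"; next }
       { print }' "$cfg.bak" > "$cfg"
}
</code>

For example, ''swap_pdf_plugin collect/mycoll/etc/collect.cfg'', where ''mycoll'' stands for your collection's folder name; then rebuild the collection.
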
The ''convert_to'' options (output formats) of the new PDFv2Plugin are:
  * ''text'': a single stream of text;
  * ''html'': a single stream of basic html from just the extracted text, with no images;
  * ''pretty_html'': each page is an HTML page consisting of extracted text overlaid on top of a screenshot of the rest of the PDF page;
  * ''paged_pretty_html'' (also the default when ''convert_to'' is set to ''auto''): ''pretty_html'', but each page is a section;
  * ''pagedimg_<png|jpg>'': every PDF page as an image, sectionalised by page. Not searchable, since there are only images;
  * ''pagedimgtxt_<png|jpg>'': every PDF page as an image plus that page's extracted text, sectionalised by page.

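If you script collection configuration, it may help to have these values in one place. The following bash sketch restates the list above as a lookup and validates a value before emitting a configuration line. ''describe_convert_to'' and ''pdfv2_plugin_line'' are hypothetical names, the descriptions come only from these notes, and the GS2 ''collect.cfg'' form ''plugin PDFv2Plugin -convert_to VALUE'' is an assumption about the option syntax; whether text is actually found still depends on the PDF itself.

<code bash>
#!/usr/bin/env bash
# Hypothetical helpers (not part of Greenstone), based only on the
# convert_to descriptions in these release notes.

# describe_convert_to VALUE: print a one-line summary; returns 1 for
# values not listed in the notes.
describe_convert_to() {
  case "$1" in
    text)
      echo "single text stream" ;;
    html)
      echo "single basic HTML stream of extracted text, no images" ;;
    pretty_html)
      echo "one HTML page per PDF page: text over a page screenshot" ;;
    paged_pretty_html|auto)
      echo "pretty_html, one section per page" ;;
    pagedimg_png|pagedimg_jpg)
      echo "one image per page, one section per page; images only, not searchable" ;;
    pagedimgtxt_png|pagedimgtxt_jpg)
      echo "one image plus extracted text per page, one section per page" ;;
    *)
      echo "unknown convert_to value: $1" >&2
      return 1 ;;
  esac
}

# pdfv2_plugin_line VALUE: print an (assumed) GS2 collect.cfg line for
# PDFv2Plugin, refusing values that are not in the list above.
pdfv2_plugin_line() {
  describe_convert_to "$1" >/dev/null || return 1
  echo "plugin PDFv2Plugin -convert_to $1"
}
</code>
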
As always, text is only extracted from a PDF where it is extractable. This depends on factors such as the user permissions set on the PDF, whether the PDF contains actual extractable text rather than just images of text, and whether the PDF is undamaged.

Further adjustments may still be made, including to display strings, but so far we have settled on the above output formats, and they seem to work on our regular PDF test documents.
====Changing the admin password====
Log in to the administration page, 'edit' the admin account, and click 'change password'. Alternatively, you can log in as admin via the login button at the top right of each page. Once you are logged in, this button will change to say 'admin'. Click this button and select 'account settings'. From there, you can select 'change password'.
  
===== Important Changes and Bug Fixes =====
  * HTTPS support: Greenstone can obtain a certificate from the Certification Authority Let's Encrypt so that your GS3 tomcat can run over https. To do so, you need sudo permissions on unix systems (Mac and Linux), and you will probably need admin rights on Windows.
===== IMPORTANT information =====
  
==== Troubleshooting ====
  
=== Your browser doesn't remember you being logged into greenstone ===
**The issue:**
This scenario can occur if you set up GS3 with https and the ''server.protocols'' property in ''build.properties'' contains both ''http'' and ''https'' (i.e. you have ''server.protocols=http,https'' or ''server.protocols=https,http'').

Switching between http and https URLs when visiting your Greenstone 3 digital library (DL) can result in the http version of the pages not remembering your login, even though you logged in. This can happen if you ever started off with the https version of a Greenstone3 DL page's URL and then moved to using the http version, or if you ever logged in to your GS3 over https and later attempt to log in using http.

**The solution:**
Either open a private window whenever you want to access your GS3 DL pages over http, or first clear your browser cookies for your GS3 DL before switching from https to http.

**The cause:**
Using https causes session cookies to have the secure flag set to true. When a session cookie has the secure flag set, the browser will not send that cookie back to the server with plain http (non-https) requests; only https requests return it. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies, section "Secure and HttpOnly cookies", which states "A secure cookie is only sent to the server with an encrypted request over the HTTPS protocol", and https://stackoverflow.com/questions/2321224/cookie-across-http-and-https-in-php\\ It further seems that in http mode, the browser will not overwrite secure cookies created in https mode with new cookies sent by the server. Thus, after using https and acquiring secure session cookies, the server can no longer track a user's session when they switch to http, until the cookies are cleared, either explicitly or by opening a private window.

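To see whether this applies to your setup, you can inspect the ''Set-Cookie'' headers your GS3 server sends over https and check for the ''Secure'' attribute. The following bash sketch is illustrative only: ''cookie_is_secure'' is a hypothetical name, and the cookie name and path in the test values depend on your tomcat configuration.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper (not part of GS3).
# cookie_is_secure LINE: succeed if a single Set-Cookie header line carries
# the Secure attribute (attribute names are matched case-insensitively).
cookie_is_secure() {
  local line=$1 attr
  local -a attrs
  # Without a ';' there are no attributes at all, only name=value.
  case "$line" in
    *";"*) ;;
    *) return 1 ;;
  esac
  # Split everything after the first ';' into attributes.
  IFS=';' read -ra attrs <<< "${line#*;}"
  for attr in "${attrs[@]}"; do
    attr=${attr//[[:space:]]/}          # drop spaces and any trailing CR
    case "$attr" in
      [Ss][Ee][Cc][Uu][Rr][Ee]) return 0 ;;
    esac
  done
  return 1
}
</code>

For example: ''curl -s -o /dev/null -D - https://gs3.example.org:8443/greenstone3/library | grep -i '^set-cookie:' | while read -r l; do cookie_is_secure "$l" && echo "secure: $l"; done'', with ''gs3.example.org:8443'' replaced by your own ''tomcat.server'' and ''tomcat.port.https''. If the page does not start a session on that request, no cookie header will appear.
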
<!--
If you're on Firefox and you just logged in to a running Greenstone 3 digital library (DL), but visiting subsequent pages in the DL shows you that it has forgotten you're logged in, then you're probably encountering a restriction that your Firefox browser has.

After relaunching Firefox in Safe Mode, test whether your login details are being remembered this time. If it works now, it could indeed be an addon/extension/plugin or the hardware acceleration feature. Follow the suggestions and instructions at https://support.mozilla.org/en-US/questions/1213229 and https://support.mozilla.org/en-US/kb/forum-response-disable-hardware-acceleration to narrow down which of these it is.
-->
  
=== Mac Installer fails ===