====== HTML files ======

====== Downloading files from the web ======

See the [[en:user:download|download]] page for general information on downloading records through Greenstone.

Greenstone can download records over HTTP or FTP, either from the GLI (in the Download panel) or from the [[en:user_advanced:command_line_download|command line]] (using the ''downloadfrom.pl'' script). Either way, you have the following options:

^Argument^Description^
|Source URL (''-url'')|(REQUIRED) Source URL. In case of HTTP redirects, this value may change|
|Download Depth (''-depth'')|How many hyperlinks deep to go when downloading (Default: 0)|
|Only files below URL (''-below'')|Only mirror files below this URL|
|Only files within site (''-within'')|Only mirror files within the same site|
|Only HTML files (''-html_only'')|Download only HTML files, and ignore associated files, e.g. images and stylesheets|

If downloading via the GLI, you can view the downloaded files on the Gather panel. On the left-hand side of the panel, double-click the Downloaded Files folder to expand its contents. Its subfolders are named after the source URL. These files are physically stored in a temporary cache directory. You can build a collection from the downloaded files by dragging them across to the Collection section on the right-hand side of the Gather panel.

An example web download on the command line would be:

  perl -S downloadfrom.pl -document_mode Web -url http://www.waikato.ac.nz/ -depth 1 -below -html_only

This downloads only HTML files below the URL ''http://www.waikato.ac.nz/'', following links one hyperlink deep.

If you are downloading HTML files, they will be handled by the [[en:plugin:htmlplugin|HTMLPlugin]].

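As a variation on the command-line example above, the following sketch assembles a ''downloadfrom.pl'' call from shell variables, using ''-within'' instead of ''-below'' and a depth of 2. The variable names (''start_url'', ''depth'', ''cmd'') and the ''RUN'' switch are illustrative assumptions, not part of Greenstone; the flags themselves are taken from the options table and the example above. By default the script only prints the command so that it can be reviewed before any download starts.

```shell
#!/bin/sh
# Sketch: build a downloadfrom.pl call from the options listed in the table.
# start_url and depth are illustrative values, not Greenstone defaults.
start_url="http://www.waikato.ac.nz/"
depth=2

# -within keeps the crawl on the same site;
# -html_only skips associated files such as images and stylesheets.
cmd="perl -S downloadfrom.pl -document_mode Web -url $start_url -depth $depth -within -html_only"

# Print the command for review. Set RUN=1 to actually execute it
# (assumes the Greenstone environment is already set up in this shell).
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
    $cmd
fi
```

Running the script with ''RUN=1'' set in the environment executes the printed command; leaving ''RUN'' unset gives a dry run.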
===== Additional Resources =====

There are several tutorials on creating collections of HTML documents in Greenstone.

Greenstone 3:

  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/small_html_collection.htm|Building a small collection of HTML files]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/large_html_collection.htm|A large collection of HTML files--Tudor]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/enhanced_html_collection.htm|Enhanced collection of HTML files--Tudor]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/downloading_from_internet.htm|Downloading files from the web]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs3-current/en/web_linking.htm|Pointing to documents on the web]]**

Greenstone 2:

  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/small_html_collection.htm|Building a small collection of HTML files]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/large_html_collection.htm|A large collection of HTML files--Tudor]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/enhanced_html_collection.htm|Enhanced collection of HTML files--Tudor]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/downloading_from_internet.htm|Downloading files from the web]]**
  * **[[http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/web_linking.htm|Pointing to documents on the web]]**