en:user_advanced:metadata

en:user_advanced:metadata [2017/02/12 20:49] – [Specifying filenames manually in metadata.xml] kjdon
en:user_advanced:metadata [2017/02/12 20:52] – [Manually editing extracted metadata] kjdon
Line 164: Line 164:
 ===== Manually editing extracted metadata =====
  
 +Can extracted metadata be edited manually? 
  
 +In general, no: in the default collection-building scenario, all extracted metadata is regenerated whenever a document is reimported, so any manual edits are lost.
 +However, if you are happy never to reimport that document, you can edit its metadata.
 +Whether this is worthwhile depends on your process for adding documents and metadata to a collection.
 +
 +Summary of the building process: \\
 +source documents ----import---> archive files (Greenstone XML format, containing extracted text and metadata) ----index---> search indexes and metadata database.
 +
 +The importing and indexing phases can be run separately, so it is possible to import your documents, edit the Greenstone XML archive files to change the metadata, and then index the modified files. If you later reimport, though, any changes will be lost. This workflow cannot be carried out in GLI.
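
As a rough illustration of the "edit the archive files" step, the sketch below changes one metadata value in a Greenstone-style archive document using Python's standard library. The sample XML, the ''set_metadata'' helper, and the assumption that metadata is stored as ''Metadata'' elements with a ''name'' attribute under ''Description'' are all illustrative; check them against the real files in your collection's archives folder before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified archive document; real archive files hold much more.
SAMPLE_ARCHIVE_XML = """<Archive>
  <Section>
    <Description>
      <Metadata name="Title">Annual report</Metadata>
      <Metadata name="Language">fr</Metadata>
    </Description>
    <Content>Text of the document.</Content>
  </Section>
</Archive>"""


def set_metadata(root, name, value):
    """Set the first top-level-section Metadata element called `name`.

    Returns True if a matching element was found and updated, else False.
    """
    description = root.find("./Section/Description")
    if description is None:
        return False
    for element in description.findall("Metadata"):
        if element.get("name") == name:
            element.text = value
            return True
    return False


root = ET.fromstring(SAMPLE_ARCHIVE_XML)
changed = set_metadata(root, "Language", "en")  # correct a wrongly extracted value
updated_xml = ET.tostring(root, encoding="unicode")
print(changed)
```

After an edit like this the modified file would still need to be indexed, and reimporting the source document would regenerate the original value.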
 +
 +Cases where this might be useful: \\
 +* You have a static collection that will not change over time. Import, edit the XML files, then index, and never rebuild the collection. \\
 +* The collection will grow, but you don't modify existing documents. Put the current documents into the import folder, then import them (optionally modifying the archive files) and index them. Later, clear out the import folder, add the new documents, and import with -keepold. This adds the new documents to the current archives without changing what is already there. Then reindex the collection.
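
The second case could be scripted along the following lines. This is only a sketch that assembles the command lines without running them: the collection name ''mycollection'' and the ''incremental_build_commands'' helper are hypothetical, ''buildcol.pl'' is assumed to be the indexing script, and your installation may need additional options.

```python
def incremental_build_commands(collection, keep_old=True):
    """Return the command lines for one import + index pass, without running them."""
    import_cmd = ["import.pl"]
    if keep_old:
        # -keepold leaves the existing archive files in place
        import_cmd.append("-keepold")
    import_cmd.append(collection)
    # Assumption: buildcol.pl performs the indexing phase
    index_cmd = ["buildcol.pl", collection]
    return [import_cmd, index_cmd]


commands = incremental_build_commands("mycollection")
for cmd in commands:
    print(" ".join(cmd))
```

Passing ''keep_old=False'' gives the plain first-time build, which is the pass that would overwrite any hand-edited archive files.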
 +
 +A third option, if you are using Greenstone 3, is to use the web-based metadata editing facility.
 +
 +You build the collection normally (in GLI or on the command line) the first time. Then you log in through the browser and, if you have edit privileges for that collection, modify the section text and/or metadata for documents from the document view page. This is the easiest solution for the user, as there is no need to run build scripts on the command line.
 +
 +Behind the scenes, the editor modifies the XML archive files and then reindexes the document.
 +If you use this approach, you cannot reimport the existing documents, or your changes will be lost. This means you cannot go back and use GLI to build the collection. You can still add new files using the process described above: put the new documents into an empty import folder and run import.pl -keepold.
 +
 +A fourth option is also available, which doesn't actually change the extracted metadata. Ask yourself what you want the metadata for. Say the Language metadata has been extracted wrongly. This doesn't affect the document unless you want to, for example, display the value or classify on it. For these situations, you can use two metadata fields, e.g. ex.Language and dc.Language. If ex.Language has been set wrongly, the user can set dc.Language to the correct value.
 +Then, in classifiers or format statements, you use dc.Language if it is there, or ex.Language if it is not.
 +While this option doesn't modify the extracted metadata, it overrides it. It is probably the most useful of the four options: because you are not modifying the extracted metadata, there are no restrictions on how you build the collection in future.
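
The "use dc.Language if it is there, or ex.Language if it is not" rule can be sketched in a few lines. This only illustrates the fallback logic, not how Greenstone classifiers or format statements are written; the ''effective_metadata'' helper and the dictionary representation of a document's metadata are assumptions made for the example.

```python
def effective_metadata(metadata, field, preferred="dc", fallback="ex"):
    """Return the manually assigned value if present, else the extracted one.

    `metadata` maps qualified names such as "dc.Language" to values.
    Empty or missing values are treated as "not set".
    """
    for namespace in (preferred, fallback):
        value = metadata.get(f"{namespace}.{field}")
        if value:
            return value
    return None


# Extracted value is wrong, so a manual dc.Language overrides it.
overridden = effective_metadata({"ex.Language": "fr", "dc.Language": "en"}, "Language")
# No manual value was set, so the extracted one is used.
extracted_only = effective_metadata({"ex.Language": "fr"}, "Language")
print(overridden, extracted_only)
```

Because the extracted field is never touched, reimporting the document leaves the override in place.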
en/user_advanced/metadata.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1