nzdl:projects

Differences

This shows you the differences between two versions of the page.

nzdl:projects [2017/11/05 22:04] – [Phind] kjdon
nzdl:projects [2017/11/06 00:27] – [Chinese Text Segmentation] kjdon
Line 49: Line 49:
 ==== Phind====
          
-    nzdl.org/phind - no longer works 
 [[http://www.nzdl.org/phind|Phind]] is an interface for browsing the phrases that occur in a collection. The phrases form an approximation of the topics covered. They are extracted from the noun-phrases occurring in the text, so nonsense phrases and phrases with very little information content are excluded. Each phrase is part of a hierarchy, and the user can browse more specialised topics, or retrieve documents that contain the phrase, at any point. You can see Phind in action in the [[http://collections.nzdl.org/gsdlmod?a=p&p=about&c=fi1998|UN Food and Agriculture Organisation collection]].
  
Line 68: Line 67:
 ===== Chinese Text Segmentation=====
  
-[[http://www.nzdl.org/cgi-bin/congb]]
+Word segmentation is designed to find word boundaries in languages like Chinese and Japanese, which are (unlike English) written without spaces or other word delimiters (except for punctuation marks). It plays a significant role in applications that use the word as the basic unit due to the fact that machine-readable Chinese text is invariably stored in unsegmented form.
  
-[[http://www.nzdl.org/chinese-text-segmenter/demo1.htm]]
+We have implemented a WWW interface for segmenting Chinese text. A demo used to be available at www.nzdl.org/cgi-bin/congb but that is no longer running. You can see an illustration of the transform at [[http://www.nzdl.org/chinese-text-segmenter/demo1.htm]]. (Currently at [[http://community.nzdl.org/www/chinese-text-segmenter/demo1.htm]])
  
-Word segmentation is designed to find word boundaries in languages like Chinese and Japanese, which are (unlike English) written without spaces or other word delimiters (except for punctuation marks). It plays a significant role in applications that use the word as the basic unit due to the fact that machine-readable Chinese text is invariably stored in unsegmented form.
+(Note, the code can be found on community, in the chinese-text-segmenter directory.)
  
-We have implemented a WWW interface for segmenting Chinese text.
+More information can be found in the paper: [[https://www.cs.waikato.ac.nz/~ihw/papers/00WT-YW-RMN-IHW-Comprsbased.pdf|A Compression-based Algorithm for Chinese Word Segmentation]]
+===== Music Query Corpus =====
  
-If your web browser does not support Chinese text, [[http://www.nzdl.org/chinese-text-segmenter/demo1.htm|illustrations of the transformation]] are available.
-Currently at [[http://commdev.nzdl.org/www/chinese-text-segmenter/demo1.htm]]
+For details about the [[http://community.nzdl.org/www/waikato-music-query-corpus/waikato-query-corpus.zip|Waikato corpus of music queries]], see our paper
+[[http://ismir2002.ismir.net/proceedings/03-SP04-2.pdf|Forming a Corpus of Voice Queries for Music Information Retrieval: A Pilot Study]].
  
 =====Others=====
nzdl/projects.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1