October 30, 2011

Mandarin Conversational Corpus Wordlist

From linguistlist.org/issues/22/22-3844.html

The Mandarin Conversational Corpus Wordlist is generated from the transcripts of 30 free conversations between strangers, 29 topic-specific conversations between friends or family members, and 26 map task dialogues between friends or family members, recorded in Taiwan. The wordlist gives each automatically segmented word with its frequency, part of speech, and size in syllables. In total, it covers 405K word tokens from approximately 42 hours of recording. You can download the wordlist at http://mmc.sinica.edu.tw/home_c.htm
