From linguistlist.org/issues/22/22-3844.html
The Mandarin Conversational Corpus Wordlist is generated from the transcripts of 30 free conversations between strangers, 29 topic-specific conversations between friends/family members, and 26 map task dialogues between friends/family members, all recorded in Taiwan. The wordlist contains automatically segmented words along with their frequency, part of speech, and length in syllables. In total, it covers 405K word tokens from approximately 42 hours of recordings. You can download the wordlist at http://mmc.sinica.edu.tw/home_c.htm
October 30, 2011