Statistical Dictionary Models for Mining Chinese Texts

  Speaker: Ke Deng (邓柯)

  Venue: Room 105, School of Mathematics and Statistics

  Time: Friday, October 31, 2014, 14:00-15:00

  Abstract: With the explosive growth of the internet and digital technologies, large quantities of digitized text data can now be collected easily. There is therefore great appeal in developing text mining tools that automatically extract information from these data and create new knowledge. Because natural languages are very noisy and the data sets are huge, it is not productive to base such tools on precise linguistic analysis. Instead, as many have observed, methods based on statistical models have great advantages, even if they miss some subtleties in the text. In this talk, I will introduce a series of novel approaches to modeling and mining Chinese text data: a "word dictionary model" to discover patterns of serial units such as words and phrases and to achieve text segmentation, a "theme dictionary model" to recognize long-range associations among text units, and a "concept network" to incorporate domain knowledge into text analysis. Used separately or jointly, these approaches can address many important practical problems.

  About the speaker:

  Dr. Ke Deng received his B.S. in mathematics from Peking University in 2003 and his Ph.D. from the Department of Probability and Statistics, Peking University, in 2008. Before joining MSC, he was a research associate in the Statistics Department at Harvard University. His research interests include Bayesian methodology, sequential Monte Carlo, bioinformatics, statistical genetics, text mining, network tomography, social sciences, and Chinese medicine. He has been a tenure-track assistant professor at MSC since September 2013.