A Parallel Platform for Web Text Mining

Release Date: 2013-10-14 Author: Ping Lu, Zhenjiang Dong, Shengmei Luo, Lixia Liu, Shanshan Guan, Shengyu Liu, and Qingcai Chen

[Abstract] With user-generated content, anyone can be a content creator. This phenomenon has greatly increased the amount of information circulating online, and it is becoming harder to obtain required information efficiently. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface (MPI). We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and to process this information using natural language processing and data-mining techniques.

[Keywords] natural language processing; text mining; massive data; parallel; web knowledge service