Parallel Web Mining System Based on Cloud Platform

Release Date: 2013-01-06
Authors: Shengmei Luo, Qing He, Lixia Liu, Xiang Ao, Ning Li, and Fuzhen Zhuang

[Abstract] Traditional machine-learning algorithms struggle to handle the exceedingly large amount of data generated by the internet. Real-world applications urgently need machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing delivers computing and storage as a service to a heterogeneous community of recipients, and it has recently attracted much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system with three subsystems: a parallel web crawler; parallel text extract, transform, and load (ETL) and modeling; and parallel text mining and application. The complete system enables a variety of real-world web-mining applications on massive data.

[Keywords] web mining; large scale; high volume; high dimension; cloud computing
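
As a rough illustration of the text ETL stage named in the abstract, the sketch below maps a set of crawled pages to per-page term counts in parallel and then merges them into document frequencies. It is not the authors' implementation: the function names (extract_terms, document_frequency), the regex-based tag stripping, and the use of a local thread pool in place of a cloud platform are all assumptions made for illustration. On a real cluster, each page (or shard of pages) would be handled by a distributed worker rather than a thread.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers for illustration only; not taken from the paper.
TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag stripper
TOKEN_RE = re.compile(r"[a-z0-9]+")  # lowercase alphanumeric tokens


def extract_terms(html):
    """ETL step for one page: strip tags, lowercase, tokenize, count terms."""
    text = TAG_RE.sub(" ", html)
    return Counter(TOKEN_RE.findall(text.lower()))


def document_frequency(pages, workers=4):
    """Map pages to term counts in parallel, then merge into document frequencies."""
    # The thread pool stands in for a pool of distributed workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_page = list(pool.map(extract_terms, pages))
    df = Counter()
    for counts in per_page:
        # Each term contributes at most 1 per page it appears in.
        df.update(counts.keys())
    return per_page, df


if __name__ == "__main__":
    pages = ["<p>Cloud mining</p>", "<div>cloud <b>platform</b></div>"]
    per_page, df = document_frequency(pages)
    print(df.most_common())
```

The map step is independent per page, which is what makes this stage straightforward to parallelize; only the final merge needs to see results from all workers.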

Related Articles

Design and Implementation of ZTE Object Storage System

Hierarchical Template Matching for Robust Visual Tracking with Severe Occlusions

WiGig and IEEE 802.11ad for Multi-Gigabyte-Per-Second WPAN and WLAN

Modeling Human Blockers in Millimeter Wave Radio Links

60 GHz SIW Steerable Antenna Array in LTCC