Analysis of Typical Big Data Computing Frameworks

Published: 2016-03-17 Authors: Zhao Sheng (赵晟), Jiang Jinlei (姜进磊)

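[Summary] This paper argues that big data computing technology has gradually developed along two directions: batch computing and stream computing. Batch computing mainly targets offline computation over static data; it offers good throughput but cannot guarantee real-time results. Stream computing mainly targets online, real-time computation over dynamic data; it offers good timeliness but has difficulty obtaining a full view of the data. The paper analyzes and summarizes the current mainstream big data computing frameworks in terms of scalability, fault tolerance, task scheduling, resource utilization, timeliness, and input/output (IO), and points out future directions and research hotspots.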

[Key words] big data classification; big data computing; batch computing; stream computing; computing framework

[Abstract] Big data computing technologies have two typical processing modes: batch computing and stream computing. Batch computing is mainly used for high-throughput processing of static data and does not produce results in real time. Stream computing is used for processing dynamic data online in real time but has difficulty providing a full view of the data. In this paper, we analyze some typical big data computing frameworks from the perspectives of scalability, fault tolerance, task scheduling, resource utilization, real-time guarantees, and input/output (IO) overhead. We then point out some future trends and hot research topics.

[Keywords] big data; big data computing; batch computing; stream computing; computing framework
