Big Data Analytics Platforms: From Scalability-First to Performance-First

Published: 2016-03-17 Authors: Zheng Weimin (郑纬民), Chen Wenguang (陈文光)

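[Abstract, from the Chinese] We argue that existing big data processing platforms, typified by MapReduce and Spark, give too much weight to fault tolerance and neglect performance when addressing the challenges of big data. An important direction for big data analytics systems is to balance performance with fault tolerance. Graph computing systems strike a better balance between the two in their data model and are therefore an important direction for future development.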

[Keywords, from the Chinese] big data; distributed and parallel processing; parallel programming; fault tolerance; scalability

[Abstract] Existing big data analytics platforms, such as MapReduce and Spark, focus on scalability and fault tolerance at the expense of performance. We discuss the connections between performance and fault tolerance and show that they are not mutually exclusive. Distributed graph processing systems are promising because their mutable data models allow a better tradeoff between performance and fault tolerance.

[Keywords] big data; distributed and parallel processing; parallel programming; fault tolerance; scalability
