BC-BSP: A Highly Scalable Parallel Iterative Graph Processing System Based on BSP

Published: 2016-03-17 Authors: Liu Enfu (刘恩孚), Leng Fangling (冷芳玲), Bao Yubin (鲍玉斌)

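[Abstract] This paper presents BC-BSP, a large-scale graph processing system based on the bulk synchronous parallel (BSP) computing model with support for staging data on disk. The system exposes application programming interfaces (APIs) that make system configuration and the related policies extensible. Optimized on-disk storage of graph data gives it high scalability in the volume of data it can process, together with a high-performance fault-tolerance scheme. It can also run data mining algorithms that require iterative computation, such as clustering and classification, on ordinary (non-graph) data sets. Experiments verify the system's scalability: on real data sets it outperforms Giraph 1.0.0, while on synthetic data sets it is slightly slower than the in-memory version of Giraph.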

[Keywords] BSP; large-scale graph processing; iterative computing; disk cache

[Abstract] We describe BC-BSP, a parallel iterative processing system for graph data that is based on the bulk synchronous parallel (BSP) model and assisted by disk caching. By providing APIs, the system makes its configuration and policies extensible. By optimizing the disk storage of graph data, it achieves high scalability in the volume of data processed and a high-performance fault-tolerance scheme. It can also execute iterative data mining algorithms, such as clustering and classification, on non-graph data sets. The experimental results confirm the scalability of the proposed system and show that it outperforms Giraph 1.0.0 on real data sets, although it is slightly slower than the in-memory version of Giraph on synthetic data sets.

[Keywords] BSP; large-scale graph processing; iterative computing; disk cache
