SRSC: Improving Restore Performance for Deduplication-Based Storage Systems

Release Date: 2019-07-19 Author: ZTE

ZUO Chunxue1, WANG Fang1, TANG Xiaolan2, ZHANG Yucheng1, and FENG Dan1

(1. Key Laboratory of Information Storage Systems, Engineering Research Center of Data Storage Systems and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; 2. 5G Application Product Line, ZTE Corporation, Shenzhen, Guangdong 518057, China)

Abstract: Modern backup systems use data deduplication to save storage space, but they suffer from the fragmentation that deduplication causes. Fragmentation degrades restore performance because the chunks to be restored are scattered across many different containers. To improve restore performance, the state-of-the-art History Aware Rewriting Algorithm (HAR) collects fragmented chunks during the last backup and rewrites them in the next backup. However, because HAR defers rewriting to the next backup, it fails to eliminate the internal fragmentation caused by self-referenced chunks (chunks that exist more than two times in a backup) within the current backup, which degrades restore performance. In this paper, we propose Selectively Rewriting Self-Referenced Chunks (SRSC), a scheme that uses a buffer to simulate a restore cache, identifies internally fragmented chunks in that cache, and selectively rewrites them. Experimental results on two real-world datasets show that SRSC improves restore performance by 45% at an acceptable cost in deduplication ratio.
Keywords: data deduplication; fragmentation; restore performance
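
The idea summarized in the abstract, a buffer that simulates the restore cache and triggers rewrites of self-referenced chunks, can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes an LRU container cache, fixed-size containers counted in chunks, and a simplified rule in which any fingerprint repeated within one backup counts as a self-reference. The names `count_container_reads` and `backup_with_selective_rewrite` are hypothetical.

```python
from collections import OrderedDict


def count_container_reads(recipe, cache_size):
    """Count container reads needed to restore `recipe` with an LRU cache.

    `recipe` lists one container ID per chunk, in restore order.
    """
    cache = OrderedDict()
    reads = 0
    for container_id in recipe:
        if container_id in cache:
            cache.move_to_end(container_id)  # hit: refresh recency
        else:
            reads += 1  # miss: the container must be read from disk
            cache[container_id] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return reads


def backup_with_selective_rewrite(chunks, container_capacity, cache_size,
                                  rewrite=True):
    """Place the chunk fingerprints of one backup into containers.

    Assumption: a repeated fingerprint is rewritten only when the container
    holding its latest copy is absent from the simulated restore cache.
    Returns (recipe, number_of_stored_chunks).
    """
    location = {}              # fingerprint -> container of latest copy
    sim_cache = OrderedDict()  # buffer simulating the restore cache (LRU)
    recipe = []
    stored = 0
    for fp in chunks:
        seen = fp in location
        if seen and (not rewrite or location[fp] in sim_cache):
            cid = location[fp]  # deduplicate: reference the existing copy
        else:
            cid = stored // container_capacity  # append to open container
            stored += 1
            location[fp] = cid
        # Update the simulated cache exactly as a restore would.
        sim_cache[cid] = True
        sim_cache.move_to_end(cid)
        if len(sim_cache) > cache_size:
            sim_cache.popitem(last=False)
        recipe.append(cid)
    return recipe, stored


# Toy backup stream: chunks a, c, and e recur after the cache has moved on.
chunks = list("abcdefaceace")
for rewrite in (False, True):
    recipe, stored = backup_with_selective_rewrite(chunks, 2, 2, rewrite)
    print(rewrite, stored, count_container_reads(recipe, 2))
```

On this toy stream, with two chunks per container and a two-container cache, plain deduplication stores 6 chunks and needs 9 container reads to restore, while the selective rewrite stores 8 chunks and needs 4 reads. This mirrors the trade-off the abstract describes: fewer container reads at some cost in deduplication ratio.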
