sponsored by Data Domain
Posted: 10 Apr 2009
Published: 10 Apr 2009
Format: PDF
Length: 14 Page(s)
Type: White Paper
Language: English
ABSTRACT:
Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection, replacing tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to hold an index of the stored segments and may therefore be forced to access an on-disk index for every input segment.
This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck:
The Summary Vector, a compact in-memory data structure for identifying new segments
Stream-Informed Segment Layout, a data layout method that improves on-disk locality for sequentially accessed segments
Locality Preserved Caching, which maintains the locality of the fingerprints of duplicate segments to achieve high cache hit ratios
Together, they can remove 99% of the disk accesses for deduplication of real-world workloads. These techniques enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput.
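To make the idea of a compact in-memory structure for identifying new segments more concrete, here is a minimal Python sketch. The abstract does not say how the Summary Vector is built, so this sketch assumes a Bloom-filter-style bit array, which is one common structure with the needed property: a negative answer is always correct, so a segment reported as absent is definitely new and needs no on-disk index lookup. The class name, method names, bit-array size, hash count, and the choice of SHA-1 and SHA-256 are all illustrative assumptions, not details taken from the paper.

```python
import hashlib


class SummaryVector:
    """Illustrative Bloom-filter-style summary of stored segment fingerprints.

    Assumption: sizes and hash functions are arbitrary choices for this sketch.
    """

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint):
        # Record a fingerprint by setting all of its bit positions.
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        # False means "definitely new"; True means "possibly stored",
        # so only True answers require a cache or on-disk index check.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


def fingerprint(segment):
    # Hypothetical helper: identify a segment by a hash of its content.
    return hashlib.sha1(segment).digest()


vector = SummaryVector()
stored = fingerprint(b"segment already written to disk")
vector.add(stored)
print(vector.might_contain(stored))
```

In a design like this, the memory cost is a few bits per stored segment rather than a full index entry, at the price of occasional false positives that fall through to the slower lookup path.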
Authors
Hugo Patterson, Chief Architect, Data Domain
Benjamin Zhu, Data Domain, Inc.
Kai Li, Data Domain, Inc. and Princeton University
All Rights Reserved, Copyright 2000 - 2009, TechTarget