The Challenge
As data stores continue to grow and the need to retain more and more organizational data for legal reasons increases, IT professionals are working to determine whether their current backup strategies can keep up. Tapes, while easy to transport to an off-site location, can be extremely costly to store, and restoring data from tape can be very time-consuming. Meanwhile, the cost of disk has decreased to the point where disk-to-disk backup is a viable option. For customers using a combination of disk and tape backup solutions, data deduplication can bring that cost down even further and save valuable time at every level.

Wikipedia defines data deduplication as "a specific form of compression where redundant data is eliminated." Take the example of a 50 MB PowerPoint presentation emailed to 10 people. If each person stores the presentation in their home directory, we now have 500 MB allocated to storing the same data! If each person then forwards the presentation to one other individual and those people also store it, we have 1 GB of storage (20 copies × 50 MB) dedicated to a single file! Incremental and differential backups aside, this one file will take up 1 GB of storage in its initial backup.

Data deduplication eliminates this redundancy by recognizing that the data in each of these individual files is the same. It therefore stores one copy of the file and creates pointers to the rest. Now, instead of using 1 GB of storage, the 20 people have used a total of only 50 MB of disk space.

However, let's assume that each person changes one slide. Now the data across the files is no longer identical. Some data deduplication products are smart enough to work at the subfile level: they locate the blocks of data that are the same, store those once, and then store the differing blocks separately. Because of the pointers the data deduplication product creates, each person can retrieve their own version of the file, even though it is stored in separate blocks.

How Does It Work?
Deduplication technology works by comparing chunks of data and searching for duplicates. It does this by assigning an identifier to each chunk, calculated by a cryptographic hash function; identical chunks always produce the same identifier. When a duplicate is found, the redundant copy is removed and replaced with a link to the copy already stored. If the file later changes, only a copy of the changed file or block is written to disk during the next backup.

Types of Deduplication Technology
There are two types of data deduplication technology currently in use:
* Post-process deduplication: As the name implies, post-process deduplication runs after the data is sent to the target device. The advantage is that, since the deduplication process can be slow, no backup time is lost waiting for deduplication to occur. The disadvantage is that it is impossible to predict how long the deduplication process will take. Also, since the data must be written to the target first, more disk space is required until the process finishes.
* In-line deduplication: With in-line deduplication, the hash calculations are performed on the target device as the data is written. If a duplicate is found, the new block of data is not stored. This method requires less storage on the target, but it can slow backups because the hash calculations and lookups must be done as the data arrives. Performance varies across vendors.

What Are the Advantages?
Data deduplication brings a wide variety of benefits to organizations:
* Save on storage space for disk-to-disk backups: According to the Enterprise Strategy Group (ESG) report by Tony Asaro and Heidi Biggar entitled "Data De-duplication and Disk-to-Disk Backup Systems" (July 2007), "Through hands-on testing, ESG has found that data deduplication technologies can provide 10 times, 20 times, 30 times and even greater reduction in capacity needed for backup." Thus, companies can see savings not only in the disk needed for the primary backup, but also in the cost of disk for a secondary site or in monthly charges for an off-site backup service.
* Save on heating and cooling: By decreasing the amount of disk needed, organizations can reduce heating and cooling costs.
* Save on space: With less disk needed, organizations also save on the floor and rack space needed to house the backup solution.
* Save on bandwidth: Less data going across the wire means lower bandwidth costs.
* Decrease time and costs for data restoration: Recovery from disk is fast, while recovery from tape can be slow and time-consuming. If the tape needed is in off-site storage, additional time and costs are incurred.

What Backup Vendors Support This Technology?
A host of vendors offer this technology, including ExaGrid, EMC Data Domain, and Barracuda Backup (formerly BitLeap, until Barracuda bought the company last year).

Where Can I Learn More?
Check Data Domain for whitepapers (like the one mentioned in this article) and a deduplication calculator. ESG's report contains some great information, including questions to ask vendors when selecting a solution.

Conclusion
If you are considering a new backup strategy for your organization, taking a look at what data deduplication can do for you is a must. We feel that development of this technology is just getting started and can only improve as more products hit the marketplace.

© Copyright 2010, Uptime NetManagement, Inc.
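The file-level case from the PowerPoint example (one stored copy plus pointers for the rest) can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `FileStore` class and its methods are hypothetical, SHA-256 is assumed as the hash function, and the 50 MB deck is scaled down to 50 KB so the example runs quickly.

```python
import hashlib


class FileStore:
    """Hypothetical single-instance store: one copy per unique file, plus pointers."""

    def __init__(self):
        self.blobs = {}     # SHA-256 hex digest -> file contents, stored once
        self.pointers = {}  # (owner, filename) -> digest of the stored copy

    def save(self, owner, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:  # first time this content has been seen
            self.blobs[digest] = data
        # Duplicates only add a pointer; no new copy of the data is stored
        self.pointers[(owner, filename)] = digest

    def load(self, owner, filename):
        return self.blobs[self.pointers[(owner, filename)]]

    def logical_bytes(self):
        # What the users believe they are storing (every copy counted)
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def physical_bytes(self):
        # What actually occupies disk after deduplication
        return sum(len(b) for b in self.blobs.values())


deck = b"x" * 50_000  # stand-in for the 50 MB presentation, scaled down
store = FileStore()
for i in range(20):  # 10 original recipients + 10 forwarded copies
    store.save(f"user{i}", "deck.pptx", deck)

print(store.logical_bytes(), store.physical_bytes())  # 1000000 50000
```

The ratio between the two numbers (20:1 here) mirrors the article's 1 GB versus 50 MB comparison.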
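The subfile (block-level) approach described under "How Does It Work?" can be approximated by splitting each file into chunks, hashing each chunk, and keeping an ordered list of chunk hashes (a "recipe") per file. This sketch assumes fixed-size 4 KB blocks and SHA-256; many real products use variable-size, content-defined chunking instead. The `BlockStore` class and the `CHUNK_SIZE` value are illustrative assumptions.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size blocks (real products often use variable-size chunks)


class BlockStore:
    """Hypothetical block-level deduplicating store."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes; each unique block stored once
        self.recipes = {}  # file name -> ordered list of chunk digests (the "pointers")

    def save(self, name, data):
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the block only if it is new
            recipe.append(digest)
        self.recipes[name] = recipe

    def load(self, name):
        # Reassemble the file by following its pointers in order
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# 12 distinct blocks standing in for the slides of the shared deck
slides = [bytes([i]) * CHUNK_SIZE for i in range(12)]
store = BlockStore()
store.save("original.pptx", b"".join(slides))
for p in range(10):
    edited = list(slides)
    edited[p] = bytes([100 + p]) * CHUNK_SIZE  # each person changes a different slide
    store.save(f"user{p}.pptx", b"".join(edited))

print(len(store.chunks), store.physical_bytes())  # 22 90112
```

Eleven versions of a 12-block file would occupy 132 blocks without deduplication; here only the 12 shared blocks plus the 10 edited ones are stored, and each person's version is still rebuilt exactly from its recipe.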
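The disk-space trade-off between the two types of deduplication listed under "Types of Deduplication Technology" can be made concrete with a toy simulation. Both functions below are hypothetical sketches, not vendor behavior: in-line deduplication hashes and looks up each chunk before writing it, while post-process deduplication lands all raw data first and deduplicates later. Each returns the final and peak bytes on disk; the post-process peak deliberately ignores any scratch space the deduplication pass itself might need.

```python
import hashlib


def inline_dedup(chunks):
    """Hash each chunk as it arrives and write it only if unseen.
    Returns (final_bytes, peak_bytes) on disk."""
    store = {}
    used = peak = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # lookup happens in the backup path, before the write
            store[digest] = chunk
            used += len(chunk)
        peak = max(peak, used)
    return used, peak


def post_process_dedup(chunks):
    """Write every chunk first, then deduplicate afterward.
    Returns (final_bytes, peak_bytes) on disk."""
    landing = list(chunks)  # backup finishes without waiting on any hashing
    # All raw data sits on disk before dedup runs, so the peak is the full raw size
    peak = sum(len(c) for c in landing)
    store = {}
    for chunk in landing:  # separate pass, run after the backup window
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    used = sum(len(c) for c in store.values())
    return used, peak


# A backup stream of 50 chunks of 1 KB in which only 5 chunks are unique
stream = [bytes([i % 5]) * 1024 for i in range(50)]
print(inline_dedup(stream))        # (5120, 5120)
print(post_process_dedup(stream))  # (5120, 51200)
```

Both approaches end with the same amount of stored data; the difference is that post-process needs room for the full 50 KB until its pass completes, while in-line pays for that saving with hashing work during the backup itself.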