The tests were done on systems running Data ONTAP 8.1.2.
I plan to segment this into three areas:
- Information on compression
- Command Sets
- Implementing Data compression
Information on Compression
The response to this feature has been quite positive, and customers have been asking about it eagerly. I will not explain how compression works, as NetApp has explained this in detail in one of their fantastic Back to Basics pages. For more details on how this works, use the following links to access the NetApp Back to Basics material for data compression.
https://communities.netapp.com/docs/DOC-14329
https://communities.netapp.com/docs/DOC-8170
A few snippets that I picked up from the NetApp community
1. Does it affect snapshots?
Snapshots are taken against data on disk. Since compression is done inline, before data is written to disk, this is not a factor.
2. What happens to data from an existing volume?
There is a “compression scanner” that allows you to compress existing data within a volume. All new writes will be compressed inline, while the compression scanner can be used to compress the existing data.
3. What is the minimum release to utilise data compression?
Data ONTAP 8.0.1
4. Why compression first before deduplication?
We actually compress inline (before blocks are written to disk) and then take fingerprints, so the fingerprints are based on the compressed blocks. There have been multiple tests to see whether running dedupe or compression first gives better savings, and the savings were found to be almost identical either way. The reason we do compression first is to get immediate space savings with compression and then cumulative savings post-process with deduplication.
5. Is this supported on 32-bit aggregates?
To enable compression, the volume must be created within a 64-bit aggregate. If you are not running compression, 32-bit aggregates are still supported.
6. How is compression applied using 32 KB compression groups?
A compression group consists of one file, or part of one file, up to a maximum of 32 KB. If you have a file that is 60 KB, the first 32 KB would be in the first compression group and the remaining 28 KB would be contained in the next compression group. As another example, if you had two 16 KB files, each file would have its own compression group. Each compression group is compressed individually.
7. What happens to files smaller than 8 KB?
The file must be larger than 8 KB or it will be skipped for compression and written to disk uncompressed.
8. Does the compression group contain data from multiple files?
A compression group contains data from one file only.
9. Are all files compressed?
The compression group is left uncompressed unless a savings of at least 25% can be achieved on a per-compression-group basis; this optimizes the savings while minimizing the resource overhead.
10. What are the expected performance impacts, and where should I not use it?
Inline compression can affect write performance and thus should not be used in performance-sensitive environments without proper testing to understand the impact.
11. Can I run compression on a volume without deduplication?
Compression requires that deduplication first be enabled on a volume; it can't be enabled without deduplication. Inline compression requires that both deduplication and postprocess compression also be enabled.
12. How many processes can run simultaneously?
Only one postprocess compression or deduplication process can run on a flexible volume at a time. Up to eight compression/deduplication processes can run concurrently on eight different volumes within the same NetApp storage system. If there is an attempt to run additional postprocess compression or deduplication processes beyond the maximum, the additional operations will be placed in a pending queue and automatically started when there are free processes.
Postprocess compression and deduplication processes periodically create checkpoints so that in the event of an interruption to the scan it can continue from the last checkpoint.
13. What is the maximum size of the volume supported?
Starting with Data ONTAP 8.1, compression and deduplication do not impose a limit on the maximum volume size supported; therefore, the maximum volume limit is determined by the type of storage system regardless of whether deduplication or compression is enabled.
14. Does it affect NDMP volumes?
Backup of the deduplicated/compressed volume using NDMP is supported, but there is no space optimization when the data is written to tape because it's a logical operation. To preserve the deduplication/compression space savings on tape, NetApp recommends NetApp SMTape.
15. Can I have only compression enabled without deduplication?
Yes, although deduplication must still be enabled to get the compression benefit. This can be achieved by enabling both deduplication and compression (both postprocess and inline) and setting the schedule for postprocess compression and deduplication to never run.
16. Can I just enable inline compression?
There is no way to enable inline compression without postprocess compression and deduplication. The postprocess schedule can be disabled afterwards.
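The compression-group rules from questions 6 to 9 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib` stands in for whatever algorithm Data ONTAP really uses (the text does not say), and the function names are hypothetical. Only the 32 KB group size, the 8 KB minimum file size, and the 25% savings threshold come from the text.

```python
import os
import zlib

GROUP_SIZE = 32 * 1024     # maximum compression group size (from the text)
MIN_FILE_SIZE = 8 * 1024   # files of 8 KB or less are skipped (from the text)
MIN_SAVINGS = 0.25         # a group stays compressed only if it saves >= 25%


def split_into_groups(data: bytes) -> list[bytes]:
    """Split one file's data into compression groups of at most 32 KB."""
    return [data[i:i + GROUP_SIZE] for i in range(0, len(data), GROUP_SIZE)]


def compress_file(data: bytes) -> list[tuple[int, int, bool]]:
    """Return (original_size, stored_size, was_compressed) for each group."""
    if len(data) <= MIN_FILE_SIZE:
        # Too small: the whole file is written uncompressed.
        return [(len(data), len(data), False)]
    results = []
    for group in split_into_groups(data):
        packed = zlib.compress(group)  # assumption: zlib is only a stand-in
        savings = 1 - len(packed) / len(group)
        if savings >= MIN_SAVINGS:
            results.append((len(group), len(packed), True))
        else:
            # Not enough savings: the group is left uncompressed.
            results.append((len(group), len(group), False))
    return results


if __name__ == "__main__":
    text_like = b"a" * (60 * 1024)         # highly compressible 60 KB file
    random_like = os.urandom(60 * 1024)    # effectively incompressible
    print(compress_file(text_like))
    print(compress_file(random_like))
```

With the 60 KB example from question 6, the file splits into one 32 KB group and one 28 KB group, and each group is evaluated against the 25% threshold on its own.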
Best Practices for Optimal Savings and Minimal Performance Overhead
- Both deduplication and compression consume system resources and can alter the data layout on disk. Due to the application's I/O pattern and the effect of deduplication/compression on the data layout, the read and write I/O performance can vary. The space savings and the performance impact depend on the application and the data contents.
- NetApp recommends that the performance impact of deduplication/compression should be carefully considered and measured in a test setup and taken into sizing consideration before deploying deduplication/compression in performance-sensitive solutions. For information about the impact of deduplication/compression on other applications, contact the specialists at NetApp for their advice and test results for your particular application.
- If there is only a small amount of new data, run deduplication infrequently, because there is no benefit in running it frequently in such a case, and it consumes system resources. The frequency for running deduplication depends on the rate of change of the data in the flexible volume.
- The more concurrent compression/deduplication processes you run, the more system resources are consumed.
- If NetApp Snapshot copies are required, run the compression/deduplication processes before creating the Snapshot copies to minimize the amount of data that gets locked in to the copies. (Make sure that the compression/deduplication processes have completed before creating the Snapshot copy.) If a Snapshot copy is created on a flexible volume before the deduplication processes have completed, the result is likely to be lower space savings. If a Snapshot copy is created on a flexible volume before the compression processes have completed, the result is likely to be that more space is used by the Snapshot copies.
- For deduplication to run properly, you need to leave some free space for the deduplication metadata.
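The concurrency limits described in question 12 (one operation per volume, up to eight across the system, extras queued until a slot frees up) can be modelled with a small sketch. The class and method names are hypothetical, and the "rejected" result for a volume that is already running or queued is an assumption; the text only says that one process can run per volume at a time.

```python
from collections import deque

MAX_CONCURRENT = 8  # system-wide limit quoted in the text


class SisScheduler:
    """Toy model of the postprocess compression/deduplication queue."""

    def __init__(self):
        self.running = set()    # volumes with an active operation
        self.pending = deque()  # volumes waiting for a free slot

    def start(self, volume: str) -> str:
        if volume in self.running or volume in self.pending:
            return "rejected"   # assumption: only one operation per volume
        if len(self.running) < MAX_CONCURRENT:
            self.running.add(volume)
            return "running"
        self.pending.append(volume)
        return "pending"

    def finish(self, volume: str) -> None:
        """Mark an operation as done and start queued ones if slots are free."""
        self.running.discard(volume)
        while self.pending and len(self.running) < MAX_CONCURRENT:
            self.running.add(self.pending.popleft())


if __name__ == "__main__":
    sched = SisScheduler()
    for i in range(9):
        print(f"vol{i}", sched.start(f"vol{i}"))
    sched.finish("vol3")
    print(sorted(sched.running), list(sched.pending))
```

The ninth volume goes to the pending queue and starts automatically once one of the first eight finishes.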
Command Sets
Command | Summary |
sis on </vol/volname> | Enables deduplication on the flexible volume specified. |
sis config | Displays which volumes have compression/deduplication enabled. |
sis config -C true </vol/volname> | Enables postprocess compression of subsequently created data. This requires that deduplication first be enabled on the volume. |
sis config -C true -I true </vol/volname> | Enables inline compression and postprocess compression of subsequently created data. This requires that deduplication first be enabled on the volume. |
sis config -I false </vol/volname> | Disables inline compression. It will not stop postprocess compression or decompress any existing compressed data. |
sis config -C false -I false </vol/volname> | Disables any newly created data from being compressed. It will not decompress any existing compressed data. Note: -I false is only necessary if inline compression is also enabled. |
sis config [-s sched] </vol/volname> | Creates an automated compression/deduplication schedule. When compression/deduplication is first enabled on a flexible volume, a default schedule is configured, running it each day of the week at midnight. If the auto option is used, compression and deduplication are triggered when 20% new data is written to the volume. The 20% threshold can be adjusted by using the auto@num option, where num is a two-digit number to specify the percentage. The manual option can be used on SnapVault destinations to prevent postprocess compression and deduplication from running. |
sis start </vol/volname> | Begins the deduplication process on the flexible volume specified. If compression is also enabled, it will start first, followed by deduplication. This will compress and deduplicate any data that has been written to disk after compression/deduplication was enabled on the volume. This will not compress or deduplicate data that existed on the volume prior to compression/deduplication being enabled. |
sis start -d </vol/volname> | Deletes the existing checkpoint information. This option is used to delete checkpoint information that is still considered valid. By default, checkpoint information is considered invalid after 24 hours. |
sis stop </vol/volname> | Suspends an active postprocess compression/deduplication operation running on the flexible volume without creating a checkpoint. |
sis stop -a </vol/volname> | Creates a checkpoint and stops the currently active postprocess compression/deduplication operations running on a volume. |
sis check </vol/volname> | Verifies and updates the fingerprint database for the flexible volume specified; includes purging stale fingerprints (requires advanced mode). |
sis check -d </vol/volname> | Deletes the existing checkpoint file and starts the check portion of the deduplication operation from the beginning (requires advanced mode). |
sis status </vol/volname> | Checks the progress of postprocess compression/deduplication operations running on a volume. Also shows if deduplication is enabled or disabled on a volume. |
sis status -l </vol/volname> | Returns the current status of the specified flexible volume. Can be used to check whether inline or postprocess compression or deduplication is enabled on a particular volume. |
sis help start / sis help stop | Lists the commands for starting and stopping compression. |
df -S <volname> / df -S </vol/volname> | Shows space savings from compression and deduplication as well as actual physical used capacity per volume. |
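The auto and auto@num schedule values described for sis config -s can be illustrated with a small parser. This is a sketch, not the ONTAP implementation: the function names are hypothetical, and treating "new data" as a single percentage that triggers a run once it reaches the threshold is an assumption about how the trigger is evaluated.

```python
import re

DEFAULT_AUTO_THRESHOLD = 20  # percent of new data, per the table above


def auto_threshold(schedule: str):
    """Return the auto threshold in percent, or None for non-auto schedules."""
    if schedule == "auto":
        return DEFAULT_AUTO_THRESHOLD
    if schedule.startswith("auto@"):
        match = re.fullmatch(r"auto@([0-9]{2})", schedule)
        if match is None:
            # The table says num is a two-digit number.
            raise ValueError(f"expected auto@NN with two digits, got {schedule!r}")
        return int(match.group(1))
    return None  # e.g. "manual" or a day/hour schedule


def should_trigger(schedule: str, new_data_percent: float) -> bool:
    """Assumption: a run starts once new data reaches the threshold."""
    threshold = auto_threshold(schedule)
    return threshold is not None and new_data_percent >= threshold


if __name__ == "__main__":
    print(auto_threshold("auto"))            # default threshold
    print(auto_threshold("auto@35"))         # adjusted threshold
    print(should_trigger("manual", 90.0))    # manual never auto-triggers
```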
Compression and Deduplication of Existing Data Commands
Command | Summary |
sis start -s </vol/volname> | Begins compression (if enabled) of existing data followed by deduplication on the flexible volume specified. It will use the latest checkpoint if one exists and is less than 24 hours old. This will bypass compression of any blocks that are already deduplicated or locked in Snapshot copies. |
sis start -sp </vol/volname> | Begins compression (if enabled) of existing data followed by deduplication on the flexible volume specified using the existing checkpoint information, regardless of the age of the checkpoint information. The -p option should only be used with the -s option. |
sis start -s -d </vol/volname> | Begins compression (if enabled) of existing data followed by deduplication on the flexible volume specified. It will disregard any checkpoints that exist. This will bypass compression of any blocks that are already deduplicated or locked in Snapshot copies. |
sis start -s -D </vol/volname> | Begins the deduplication process on the flexible volume specified and performs a scan of the flexible volume to process existing data. It will use the latest checkpoint if one exists (requires advanced mode). The -D option is used if compression is also enabled on the volume and you only want to run deduplication against the existing data, not compression. The new data will still be compressed. |
sis start -s -d -D </vol/volname> | Deletes any previous checkpoints and initiates the deduplication process of existing data to begin from scratch on the flexible volume specified (requires advanced mode). |
sis start -s -C </vol/volname> | Begins compression of the existing data on disk. It will use the latest checkpoint if one exists. It will not run deduplication against this data (requires advanced mode). |
sis start -s -C -d </vol/volname> | Deletes any previous checkpoints and initiates compression of existing data to begin from scratch on a flexible volume (requires advanced mode). |
sis start -s -C -a </vol/volname> | Initiates compression of existing data, including shared blocks created by deduplication or cloning of data. It will not run deduplication against this data. The -a option can be used together with the -b option. |
sis start -s -C -b </vol/volname> | Initiates compression of existing data, including blocks that are locked in existing Snapshot copies. It will not run deduplication against this data. |
sis start -s -C -a -b </vol/volname> | Initiates compression of all possible blocks containing existing data on disk. It will not run deduplication against this data. |
sis start -s -C -D -a -b </vol/volname> | Initiates compression of all possible blocks containing existing data on disk followed by running deduplication. |
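Since the option combinations in this table are easy to mix up, here is a small, hypothetical Python helper that assembles the sis start command line for existing data from a few yes/no choices. It only reproduces combinations listed in the table; the function name and parameters are my own, and it builds a string rather than talking to a storage system.

```python
def sis_start_existing(volume: str, compress: bool = True, dedupe: bool = True,
                       include_shared: bool = False,
                       include_snapshot: bool = False,
                       discard_checkpoint: bool = False) -> str:
    """Build a 'sis start -s ...' command string for existing data."""
    if not (compress or dedupe):
        raise ValueError("choose compression, deduplication, or both")
    if (include_shared or include_snapshot) and not compress:
        raise ValueError("-a and -b apply to compression of existing data")

    # With -a or -b the table spells out -C (and -D) explicitly.
    explicit = include_shared or include_snapshot
    args = ["sis", "start", "-s"]
    if compress and (not dedupe or explicit):
        args.append("-C")   # compress existing data
    if discard_checkpoint:
        args.append("-d")   # ignore/delete previous checkpoints
    if dedupe and (not compress or explicit):
        args.append("-D")   # deduplicate existing data
    if include_shared:
        args.append("-a")   # include blocks shared by dedupe or clones
    if include_snapshot:
        args.append("-b")   # include blocks locked in Snapshot copies
    args.append(f"/vol/{volume}")
    return " ".join(args)


if __name__ == "__main__":
    print(sis_start_existing("vol1"))
    print(sis_start_existing("vol1", dedupe=False, include_shared=True))
    print(sis_start_existing("vol1", include_shared=True, include_snapshot=True))
```

Keep in mind that several of these combinations are marked in the table as requiring advanced mode; the helper does not check for that.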
Disabling Compression and Deduplication Commands
Command | Summary |
sis config -C false -I false </vol/volname> | Disables any newly created data from being compressed. It will not decompress any existing compressed data. |
sis off </vol/volname> | Disables inline and postprocess compression as well as deduplication on the specified volume. This means that there will be no additional change logging, compression, or deduplication operations, but the flexible volume remains a compressed and deduplicated volume, and the storage savings are preserved. If this command is used and then compression or deduplication is turned back on for this flexible volume, the flexible volume should be rescanned with the sis start -s command to gain the maximum savings. |
sis undo </vol/volname> | Initiates the removal of compression and block sharing from deduplication on a volume; it first requires that compression and deduplication be turned off on the volume (requires advanced mode). |
sis undo </vol/volname> -D | Initiates the removal of block sharing from deduplication on a volume; it first requires that compression and deduplication be turned off on the volume (requires advanced mode). |
sis undo </vol/volname> -C | Initiates the uncompression of data in a volume; it first requires that compression be turned off on the volume (requires advanced mode). |
sis stop </vol/volname> | Suspends an active postprocess compression/deduplication process on the flexible volume without creating a checkpoint. |
sis stop -a </vol/volname> | Creates a checkpoint and stops the currently active postprocess compression/deduplication operations running on a volume. |
sis status </vol/volname> | Returns the current status of the specified flexible volume. Can be used to check the progress of the removal of compression/deduplication running on a particular volume. |
sis revert_to [<7.3|8.0>] </vol/volname> | Converts the deduplication metafiles to appropriate lower Data ONTAP version formats, currently 7.3 or 8.0. When no volume name is provided, revert_to runs on all volumes with deduplication enabled. |
sis revert_to [<7.3|8.0>] -delete </vol/volname> | Deletes the original metafiles that existed for Data ONTAP version [<7.3|8.0>], resulting in no metafiles being created when revert_to is run. When no volume is provided, the -delete option will be applied to all volumes that are reverted by the revert_to command. |
sis revert_to -cleanup </vol/volname> | Deletes the metafiles of the lower version that were created by previously running sis revert_to. Useful if the user decides not to revert to the previous release after all. When no volume is provided, the -cleanup option will delete all existing metafiles that are in the format of the release that was specified when running the revert_to command. |
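The ordering requirement for sis undo (turn the feature off first, then undo) can be captured in a tiny, hypothetical helper that returns the commands in the order the table implies. The function name and the "what" values are my own; the command strings are taken from the table, and entering advanced mode is deliberately left out.

```python
def undo_commands(volume: str, what: str = "both") -> list[str]:
    """Return the command sequence for removing savings from a volume.

    what: "both", "dedupe", or "compression" (hypothetical labels).
    Note: per the table, the sis undo commands require advanced mode.
    """
    path = f"/vol/{volume}"
    if what == "both":
        # Compression and deduplication must be off before a full undo.
        return [f"sis off {path}", f"sis undo {path}"]
    if what == "dedupe":
        return [f"sis off {path}", f"sis undo {path} -D"]
    if what == "compression":
        # Only compression needs to be turned off before uncompressing.
        return [f"sis config -C false -I false {path}", f"sis undo {path} -C"]
    raise ValueError(f"unknown option: {what!r}")


if __name__ == "__main__":
    for command in undo_commands("vol2", "compression"):
        print(command)
```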
Implementing Data compression
Use cases
- Testing deduplication on a 64 bit volume in a 32 bit aggregate.
- Testing deduplication and compression on a volume that contains data. – Testing Postprocess Compression
- Testing deduplication and compression on a volume that does not have any existing data. – Test inline deduplication and inline compression
- Comparisons between the volumes tested
- Stopping deduplication and compression
- Extras – Testing additional commands
- Undoing SIS
Testing deduplication on a 64 bit volume in a 32 bit aggregate
Create a 64 bit aggregate
Creating a new 64 bit aggregate with 3 disks using RAID 4.
Status of the 64 bit aggregate
Created a volume vol1 on the 64 bit aggregate to store data before enabling compression.
Created a CIFS share for the volume so as to add and remove files easily.
Created another volume vol2 to enable compression
Created a CIFS share for the volume
Able to browse to the volumes
Copied data to the share BeforeCompression (vol1)
Size on vol1 after files are copied
SIS status on the volume before enabling SIS
SIS configuration
Created another volume vol3 on a 32 bit aggregate and performed a vol copy from vol1 to vol3.
vol copy from a 64 bit volume to a 32 bit volume
This is how a 64 bit volume is displayed on a 32 bit aggregate.
64 bit volume in a 32 bit aggregate
Enabling SIS on vol3 in the 32 bit aggregate
Starting SIS on the volume
Capacity on the volume before dedupe.
Currently there are no savings as there are no duplicated files.
SIS config output for vol3.
What happens when you try compression on a volume in a 32 bit aggregate?
As expected, compression will not be enabled. The following message is displayed.
Enabling SIS and Compression on a volume in a 64 bit aggregate
Enabling Dedupe
Checking the dedupe status
Manually start dedupe process
Checking the SIS status
No savings as there are no duplicate files.
Status of SIS config before Compression is enabled
Enabling Compression for the first time (postprocess compression only)
Status of SIS config after Postprocess Compression is enabled.
In this case only postprocess compression is enabled.
Enabling Compression (Inline Compression)
Status of SIS config after Compression (both inline and post process) is enabled.
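The prerequisites exercised in the steps above (compression needs a 64-bit aggregate and deduplication enabled first, and inline compression always comes with postprocess compression) can be summarised in a short model. The class, function names, and error messages are hypothetical; the real messages shown by the system are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    aggregate_bits: int        # 32 or 64
    dedupe: bool = False
    postprocess: bool = False
    inline: bool = False


def enable_dedupe(vol: Volume) -> None:
    """Deduplication is allowed on both 32-bit and 64-bit aggregates."""
    vol.dedupe = True


def enable_compression(vol: Volume, inline: bool = False) -> None:
    """Enable postprocess compression, optionally with inline compression."""
    if vol.aggregate_bits != 64:
        raise ValueError("compression requires a 64-bit aggregate")
    if not vol.dedupe:
        raise ValueError("deduplication must be enabled first")
    vol.postprocess = True     # inline cannot be enabled without postprocess
    vol.inline = inline


if __name__ == "__main__":
    vol3 = Volume("vol3", aggregate_bits=32)
    enable_dedupe(vol3)
    try:
        enable_compression(vol3)
    except ValueError as err:
        print("vol3:", err)

    vol1 = Volume("vol1", aggregate_bits=64)
    enable_dedupe(vol1)
    enable_compression(vol1, inline=True)
    print(vol1)
```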
Monitoring the same volumes after a 16 hour period.
Duplicate files were added over a period of two hours before leaving it untouched for 16 hours.
Status of volume vol3 on the 32 bit aggregate.
Status of volume vol1 on the 64 bit aggregate
Started copying duplicate files to the share and verified the compression and dedupe.
Data being transferred from the OS side.
Note:
This was tested on two volumes
- Vol1 that contained data prior to enabling dedupe and compression. Vol1 is in a 64 bit aggregate.
- Vol3 that contained data prior to enabling dedupe. Vol3 is a volume on a 32 bit aggregate.
Testing a volume for dedupe and compression when there is no data
Procedure
- Created a new volume and a CIFS share.
- Enabled dedupe and compression to test inline compression and dedupe.
- Copied the same contents that were used in vol1 and vol3 in the previous test case.
- Monitored the inline compression and deduplication.
Inline compression and deduplication while data is being transferred to the volume.
Check SIS status before enabling dedupe
Checked the volume usage.
Checked the space saving just for reference.
Enabled dedupe on the volume.
Checked the SIS status
Checked the SIS config. Compression is not yet enabled.
Enabling both inline and post process compression
Checking the SIS status after enabling compression. The default schedule is shown in the screenshot.
CIFS shares for the volumes including vol2
Started copying files to vol2.
Data now being copied
Checking the space savings
Comparison of the space saving with the current volume being tested and the previous test volumes
Space savings while copying is still in progress
Further Progress
Space savings after completion
Comparison of all test volumes after copying is complete
The same set of data was used in all cases on all volumes.
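When comparing volumes like this, it helps to be clear about how a savings percentage is derived. The sketch below assumes savings are expressed as a share of the logical size (physical used plus space saved), which is how I read the df -S output; the function name is hypothetical and the numbers in the usage example are made up for illustration, not taken from the test volumes.

```python
def percent_saved(used_kb: int, saved_kb: int) -> float:
    """Savings as a percentage of logical size (used + saved)."""
    logical_kb = used_kb + saved_kb
    if logical_kb == 0:
        return 0.0  # empty volume: nothing to save
    return 100.0 * saved_kb / logical_kb


if __name__ == "__main__":
    # Hypothetical values: 750 KB physically used, 250 KB saved.
    print(percent_saved(750, 250))
```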
Extras
Running a few extra commands based on the command sets
sis start -s -d <vol name>
sis start -sp <vol name>
sis start -s -D <vol name>
sis start -s -d -D <vol name>
sis start -s -C <vol name>
sis start -s -C -D -a -b <vol name>
sis status after a sis manual start from the above command.
On another volume
Comparing both volumes after a manual start
Progress
Remove Compression
SIS stop on a volume
SIS status after the stop
Removing Compression
First check status of config
Remove compression
Checking config after disabling compression
Checking status of volume after disabling compression. In this case there will not be any change
SIS off
SIS undo
First SIS must be turned off for that volume
Check the status of the undo process
What happens when you try to turn sis off on a volume that is active?
What happens when you sis stop a volume that is running sis undo?
This will stop the process of sis undo.
Status after stopping an undo operation
Will need to run another undo command to continue the undo process.
SIS Configuration change – Schedules
Status before the change. The schedule shown below is the default.
Change the schedule to a customized schedule.
Status after the schedule change