SQL Server 2005 and 2008 - Compression
OK, I still have a way to go in learning about data compression in SQL Server 2008
but one thing that I do know is that nothing is free. So, the trade-off will be performance
(i.e. CPU) v. space. And, that's not really a new trade-off wrt to compression. Sometimes
that trade-off has other benefits that still minimize the overall cost (for example,
backup compression compresses in-memory and before it goes to disk... this actually
makes the overall backup process faster because the actual backup written to disk
is smaller). However, if we're talking about data and data access, then we need to
think more about how the data is going to be used as well as the impact on performance.
I can definitely think of many reasons to compress older (and read-mostly, if not
read-only) data (mostly due to volume) but depending on the queries and the impact
to uncompress it (based on the volume of data being accessed), I'm going to do a lot
of testing before I compress high performance/OLTP data. To help estimate the
savings on space, SQL Server 2008 offers a stored-proc: sp_estimate_data_compression_savings.
Compression in SQL Server 2005
SQL Server 2005 offers the ability to have read-only data compressed using
Windows NTFS file compression. File compression is only supported for secondary non-primary
data files and only when they're set to read-only. If the entire database is set to
read-only then all files (incl. the primary and log) can be on compressed drives.
While supported, and it can make sense to do this when you have large amounts of historical
data, it's still not very granular.
The other form of compression in SQL Server 2005 was introduced in SP2 as data compression
for the decimal/numeric data types, called vardecimal. First, you enable compression
at the database level and then you turn it on at the table level. The primary form
of compression used by vardecimal is when your actual values are generally much smaller
than the defined/declared decimal/numeric column. For example, if you've chosen to
define a lot of columns as precision/scale (38,4) then as a decimal column each value
(per column, per row) will take 17 bytes whether you use all of it or not. If you
only store the value 87.5 (which would normally take only 5 bytes as a decimal(3,1))
then you're wasting 12 bytes. This form of compression will still be supported
in SQL Server 2008 so if you're interested in how the vardecimal type works, check
out this whitepaper.
As for the new forms of compression... row-level compression is similar to vardecimal,
but the other forms are quite different, and very interesting (especially the page-level
dictionary compression)!
Compression in SQL Server 2008
In addition to offering support for NTFS file compression and vardecimal, SQL Server
2008 offers row-level compression or page-level compression (which includes
row-level compression) AND it offers the ability to turn these on at
the partition-level or at the table-level for all partitions. While I think the per-partition
option is excellent, you might still want to separate your OLTP and read-only
data into separate tables for other benefits (like online index operations which I
mentioned here)
but, the "table-level only" options are certainly starting to decrease! And, more
granular options always means better manageability.
So, how does compression work in SQL Server 2008:
Paul wrote about
backup compression here.
Sunil wrote
about data compression here and here.
Chad Boyd wrote
about both here.
Paul and I will post more on compression... I really want to get some numbers regarding
performance and Paul will dive into all of the internals using DBCC
PAGE (go figure! :).
Enjoy!
kt
This weblog is sponsored by
SQLskills. ©2007
Kimberly L. Tripp. This Feed is for personal non-commercial use only.