Monday, November 9, 2009

SQL PASS 09 - Semi-Structured Data in SQL Server

A couple coworkers and I just had the great opportunity to present a post-conference session at SQL PASS 2009. The session topic was "Managing Unstructured and Semi-Structured Data with SQL Server"; I own the semi-structured features in SQL Server, so I presented that part of the session.

For simplicity, think of "semi-structured data" as structured data that's not relational. To manage semi-structured data, SQL Server 2005 introduced XML, which is powerful and general (as someone once said, tongue in cheek, "XML is like violence: if it doesn't solve your problem, you're not using enough of it"). However, XML isn't ideal for certain situations, and doesn't integrate perfectly with relational databases.

Enter SQL Server 2008, which introduces some really useful features for managing semi-structured data more relationally. Briefly, if you find yourself managing the following kinds of data, take a look at these features:

  • Trees or hierarchies (e.g., product categories, org charts, conversation threads): check out hierarchyid, a new type representing nodes in a tree.
  • Properties that apply to only some items (e.g., product descriptions, custom document properties, tagging): check out the SPARSE column option, which gives you NULLs that take no space, constant-time addition of new columns, and up to 30,000 columns in a table.
  • Tables where only some rows need indexing (e.g., columns with many NULLs, "hot" rows, heterogeneous row groups): check out filtered indexes, where you specify which rows are indexed. This makes the index much smaller, yields faster look-ups, and avoid unnecessary index maintenance. Filtered indexes are cool enough that I'll make a post about just them sometime soon.

Read more...