Monday, November 9, 2009

SQL PASS 09 - Semi-Structured Data in SQL Server

A couple coworkers and I just had the great opportunity to present a post-conference session at SQL PASS 2009. The session topic was "Managing Unstructured and Semi-Structured Data with SQL Server"; I own the semi-structured features in SQL Server, so I presented that part of the session.

For simplicity, think of "semi-structured data" as structured data that's not relational. To manage semi-structured data, SQL Server 2005 introduced XML, which is powerful and general (as someone once said, tongue in cheek, "XML is like violence: if it doesn't solve your problem, you're not using enough of it"). However, XML isn't ideal for certain situations, and doesn't integrate perfectly with relational databases.

Enter SQL Server 2008, which introduces some really useful features for managing semi-structured data more relationally. Briefly, if you find yourself managing the following kinds of data, take a look at these features:

  • Trees or hierarchies (e.g., product categories, org charts, conversation threads): check out hierarchyid, a new type representing nodes in a tree.
  • Properties that apply to only some items (e.g., product descriptions, custom document properties, tagging): check out the SPARSE column option, which gives you NULLs that take no space, constant-time addition of new columns, and up to 30,000 columns in a table.
  • Tables where only some rows need indexing (e.g., columns with many NULLs, "hot" rows, heterogeneous row groups): check out filtered indexes, where you specify which rows are indexed. This makes the index much smaller, yields faster look-ups, and avoid unnecessary index maintenance. Filtered indexes are cool enough that I'll make a post about just them sometime soon.

For those interested, here's a slide deck and demo T-SQL code on these features, focusing on how you can use them to solve real-world problems. If you have any feedback or suggestions — about the deck, the demo code, or the features themselves — please post a comment. For example, here are some of the feedback and takeaways I got from attendees at SQL PASS:

  • One attendee wondered how hierarchyid compares to left-right value tables (a.k.a. nested set model, or modified preorder traversal) for representing trees. Briefly, hierarchyid has the same sub-tree properties, so is equally good at descendant queries. Furthermore, it has three advantages: more compact (one column averaging 10 bytes per value can represent ~100M nodes); constant-time node insertion anywhere in the tree; and easier, more efficient level queries (e.g., (grand)children of a node, all nodes with level 3).
  • People seem to like column sets, especially how they automatically shred XML inserts and updates into the proper sparse columns.
  • People also seem to really like filtered indexes. One "wow" example was choosing which rows were worth indexing based on your workload. To illustrate, suppose table SoldAt tracks which products are sold at which stores. If only some stores have enough traffic to warrant indexing, you can create an index for just those stores with CREATE INDEX SoldAt_Key_Stores ON SoldAt(...) WHERE StoreId IN (1, 4, ...).
  • Some people mentioned that they often need to manually update filtered statistics (such as are generated automatically for filtered indexes) to ensure good plans. This is something we can look into.

I'd love to hear more, so please comment away!


Tuesday, August 25, 2009

Five-Minute Ph.D.

As my Ph.D. defense nears, I'm thinking a lot about the most important lessons I learned. Here are my top five:

  1. Don't look for reasons to fail; find ways to succeed. If something should or must be done, find a way to do it.
  2. First figure out the right thing to do. Only then think about implementation, and see how close you can come. Even if you can't reach the ideal, at least you'll be pushing in the right direction.
  3. Understand the problem deeply. Any good problem solver can hack a good solution quickly. What's more valuable is identifying the true underlying problem, and how it relates to other problems. This tells you if something is a true solution, and helps identify opportunities.
  4. Think in a structured, disciplined way. First, separate out orthogonal issues. Then, solve them incrementally and iteratively. Don't try to attack the whole mess at once.
  5. Finally, when communicating with others, tell a story. Start with something familiar, then make sure your ideas flow.

Those are the big ones. The gems of a Ph.D. education, in five easy minutes. Interestingly, none of these are particularly technical. But deeply technical things are limited in application. I think that's the real secret: the work you do for a Ph.D. is technical, but a good Ph.D. is about becoming a better thinker and communicator.


Wednesday, July 29, 2009

Tutorial: Using Omea Reader to Listen to Customers

One of my daily tasks is listening to customers. This involves checking blogs, forums, and newsgroups (they're still around!). I also check several Intranet sites, such as SharePoint sites for my projects. Doing this every day was taking a substantial amount of time. The problem wasn't digesting the information, or acting on it; it was getting to it. So I went looking for a one-stop solution. My requirements were:

  • Sensible interface for reading feeds, such as RSS and Atom.
  • Handles newsgroups (the customers use them, thus so must I).
  • Desktop application, so that it can access the company Intranet.

I found Omea Reader, a free desktop application that reads newsgroups and feeds. Below, I give step-by-step instructions on how I set up and use it.

Setting up Omea Reader

First, download Omea Reader from here, then install it. This is a typical installation process, and the defaults work fine (though I chose not to install the Firefox plugin).

After installing, run Omea Reader. It asks about a Database Path and Log Files Path. The defaults here are also fine, so hit "OK". Next, it provides some import options. Just hit "Next >" until it requests User Information. Then, Enter your first and last name. Then, click "Add..." to add an email address. Finally, click "Finish".

At this point, Omea starts. We'll want to set some options, so go to Tools -> Options. I changed these options:

  • General
    • "Mark item read after displaying for 0 seconds"
    • "Open links to Web pages In a new browser window"
  • Mail Format
    • Check "Include signature in outgoing messages", and paste in an appropriate signature (e.g., your company may require some legalese here).
  • Plugins
    • Uncheck everything except News and RSS; we won't be using Omea Reader's other functionality.

Clicked "OK", and Omea Reader restarts. Then, click on the "All Resources" tab and set options in the "View" menu. I set the following options:

  • Uncheck "Shortcut Bar"
  • Uncheck "Workspace Bar"
  • Check "AutoPreview -> All Items"

Omea Reader now looks something like this:

Finally, after setting options, delete the feeds that came with Omea Reader. First, click on the "Feeds" tab. Then, in the bottom-left section, you'll see three feeds. Select all of these, then delete them with the Delete key.

Adding Newsgroups

To add newsgroups to Omea Reader, click the "Tools -> Manage Newsgroups" menu item. This brings up the "Manage Newsgroups" dialog. Clicking on the "Add..." button on the bottom left lets you add news servers.

Set the server name. The other fields fill in automatically, but you can change them. If your server requires a login (such as Microsoft's, click on the "Security" tab. Next, check "Authentication required", then set the user name and password. Finally, click "OK" to add the server.

Now you can subscribe to newsgroups. First, select a news server on the left. Then, check the newsgroups of interest on the right. By typing into the "Display newsgroups which contain:" field, you can filter the available newsgroups. Finally, click "OK", and Omea Reader will download posts.

Filtering Posts

If you are interested in only some posts, Omea Reader can filter incoming posts. For example, you can create a rule that deletes all posts from a newsgroup except those containing particular words. First click on the menu item "Tools -> Manage Rules -> Action Rules". Then, in the dialog, click the "New..." button on the right. This brings up the "New Action Rule" dialog. Give the rule a name, such as "Relevant". Then, under the "Conditions" field, click on "Add Condition...". Next, select "News Conditions -> Appeared in the specified newsgroup(s)", then click the "OK" button. This adds an entry in the "Conditions" field. Click on the blue "specified" link to select the newsgroups to filter.

Then, click on "Add Exception..." under the "Exceptions" field. Next, select "Text Query Conditions -> Matching query in the body", then click the "OK" button. This adds an entry in the "Exceptions" field. Click on the blue "query" link to enter keywords separated by "OR". Keywords can be single words, or phrases in double-quotes. For example, I use this query to keep only posts related to full-text searching: full-text OR fulltext OR "full text". To also search the post subject, repeat these steps, but select the exception "Matching query in the subject/header".

Finally, click on "Add Action..." under the "Actions" field. Check "Delete resource permanently", then click the "OK" button. This takes you back to the "Rules Manager" dialog. Click the "OK" button here to save your rule.

To apply your rule immediately, select the "Actions -> Apply Rules" menu item. Here, select "All resources", and check your rule. Finally, click the "OK" button.

Adding Feeds

Feeds let you keep up with forums, blogs, search engine results, SharePoint lists, etc. I use three types of feeds:
  1. Direct feeds: Many blogs, forums, and SharePoint sites provide feeds. For example, MSDN lets you subscribe to a forum, or even to searches on forums. Look for the RSS icon , or a link similar to "Subscribe to Feed".
  2. Search Engines: Some engines let you subscribe to searches as feeds. This is good for tracking a topic, or searching a site (such as forums without direct feeds). On Bing, transform searches into feeds by adding "&format=rss" to the URL. Google provides Google Alerts, which support various searches, including blog searches. In both engines, use "" to search only a particular site. Unfortunately, these feeds return items as the search engine crawls them, so they can sometimes be very stale.
  3. Scraping pages: When there's no direct feed, and freshness is important, Dapper provides an easy way to scrape pages as RSS feeds. This is also good for scraping data not typically in feeds, such as the result count for a search.

Some representative examples of feeds I check are:

  • MSDN feed for the SQL Server XML forum
  • Bing searches over SQL Monster, such as ("column set" OR "column sets")
  • Google Blog searches (through Google Alerts), such as ("sparse columns" OR semi-structured) "sql server" -job.
  • SharePoint lists on project sites, such as the Tasks or Announcements list.

However you obtain the feed, copy its URL with Ctrl-C. Then, in Omea Reader, click the menu item "Tools -> Subscribe to a Feed." The dialog that pops up will auto-paste the feed URL into the proper field, so just hit "Next >". Omea Reader will then fetch the feed's name. Finally, click the "Finish" button to add the feed. Repeat these steps with as many feeds as you like.

Daily Use

After adding newsgroups and feeds, click on the "All Resources" tab to see your posts. I recommend deleting all these initial posts and starting fresh. Then, each day, start Omea Reader. It will download the day's posts automatically.

You can answer newsgroup posts from Omea Reader. For feed posts, clicking on the "Source" link takes you to the original post, where you can reply through the site's interface. This daily routine takes only a few minutes a day, and is much more efficient than visiting each newsgroup, blog, and forum individually.


Tuesday, June 23, 2009

Productive Tension, Checks and Balances

Many software houses have caught on to using separate teams for writing and testing code. This is a great example of building productive tension: developers take pride in writing code with no bugs, and testers take pride in catching bugs the developers missed. This is one of the checks and balances that leads to better software.

Recently, I've been thinking about how one beauty of the program manager role is that it adds its own dimension to this productive tension. The key is that PMs champion the customer when designing functionality, yet have no authority over developers and testers. I've seen this build productive tension in two ways.

The first is a tension over design. Part of a PM's job is to empathize with customers. We want to listen to their asks, anticipate what they need, and design conceptually clean and complete features. In other words, we worry first about customer functionality, then about implementation. However, developers worry first about tractable features, then about "ideal" functionality. This difference in primary goals builds productive tension. But the overall overlap in goals means the design iterates until the PM feels pretty good about the customer functionality, and the developers feel pretty good about tractability. Since each role champions a different goal, both desirable, the result tends to be good compromises.

The second is a tension between influence and authority. While the PM is responsible for designing features, he cannot tell developers and testers "do it or leave". He's not their manager. At the same time, their manager does not design functionality (though he or she typically has valuable input). The result is that no functional design is by mandate: if the PM has an idea, he has to convince the developers, testers, and their managers; if the manager has an idea, he or she has to convince the PM. Having to convince several other smart people, over whom you have no authority, is a great way to select for good ideas.

These are the ways I've noticed PMs contributing to productive tension, and to the checks and balances that make for good software. Anything else I should look for?


Wednesday, June 10, 2009

Being a Mentee

A good mentor is one of those things that's often listed as a key ingredient in success. It can let you shortcut long and potentially painful learning by teaching you the lessons of someone who, since they're successful, has presumably learned the right lessons. Observing a mentor, and asking the right questions, gives you potently distilled experience.

I've learned from a lot of people, but I've only had a true mentor/mentee relationship with a few. It's not always a relationship that comes naturally, and I've had to learn to be a better mentee. I've just started with a new PM mentor, and it's gotten me thinking about things I wish I had known since my first:

Know what you want to learn: be specific. Don't look for a "life mentor", or even a "business mentor". Know that you want to learn leadership and influence, or to write and communicate clearly, or how to manage a group, or how to differentiate against competitors. But keep an open mind: one thing your mentor may teach you is what you really ought to be learning.

Pick the right mentor: look for a few key characteristics. They should be successful, meaning they've gotten where you want to go, using what you want to learn. They should also be willing to answer questions. Ideally, look for someone you can observe, rather than only meeting at set times. And, while not necessary, it certainly helps if you get along.

Drive the relationship: learn actively, not passively. Ask questions, and invite feedback. Always look for what you can do or ask that will tease out important lessons from your mentor's actions and experience.

Assume they're right: at least initially, anyway. This was a hard-earned lesson for me. If your mentor does something that seems odd, don't assume they're wrong. Instead, assume they know what they're doing, and try to figure out what you're missing. Remember that you chose this person precisely because they're successful, so don't assume you know better.

My Ph.D. advisor really drove this lesson home. He is a great researcher, and I was learning to do research. Sometimes, he would suggest ideas I just knew were wrong. Being strong-willed, I would push back. Tensions often rose. Yet time after time, in retrospect, I found his ideas worked better in our papers. I finally realized that while I measured ideas only on technical merit, he also considered their conceptual cleanliness, how well they would sell in a paper, current trends in the field, conversations with other professors, etc, etc. Instead of assuming I was right, I should have assumed he was right, and tried to figure out why.

Any other lessons I should take into this new mentorship?


Tuesday, May 26, 2009

Preparing canned explanations

As a developer, I spent a lot of time explaining things to computers (in other words, coding). Starting as a graduate student, and now as a program manager, I spend a lot of time explaining things to people. Turns out that's often harder.

It's particularly hard with canned explanations, such as presentations and specs, where you prepare the explanation in advance. If your audience doesn't understand something, you may lose them; you may not get the chance to try another tack.

Preparing canned explanations is an art, and I know just enough to realize I know almost nothing. Due to the patient mentoring of a great explainer, and through watching several others, I've managed to learn a few pointers. The most important is to tell a story. This is very broad, and includes:

  • Start with what they know. Don't throw your audience in the deep end. Start with common knowledge, then build to new things.
  • Give an intuitive overview. Tell the audience where you're headed. Suspense and surprise work sometimes, but don't keep them guessing about your point. A rule of thumb is "tell them what you'll tell them, then tell them, then tell them what you told them."
  • Flow is everything. Top down, bottom up, temporally, procedurally (first, then, next, finally)...however you do it, make points flow from one to another. If you break the flow, or have none, it's easy to lose your audience.
  • Make every word count. Be brief. Try to make one point at a time. Then, drop unnecessary points.
  • Be concrete. Define nouns, quantify adjectives, and favor processes over vague verbs. Numbers, anecdotes, and examples go a long way.
  • Use the conclusion. The audience may skip everything else, but they'll usually wake up for the conclusion. At minimum, reiterate your story outline and emphasis your main points.

Apart from telling a story, I've learned a couple tips about polishing explanations:

  • Get feedback on your drafts. Give practice presentations, ask peers to review your specs, etc. Feedback from first-timers is invaluable, so don't waste fresh eyes: have some people review your first draft, others your second, and so forth. It also helps if someone reads multiple drafts, but make sure they're either very patient, or owe you a favor.
  • Insist on criticism. Ask what you could have done better. Insist people tell you something. There's always something to improve, and people can be surprisingly reluctant to give criticism face-to-face.

I'm always looking for new tips, so if anyone has a good one, please share!


Tuesday, May 19, 2009

Nice thing about PMing at SQL Server

Fair warning: shameless plug ahead.

Groups within Microsoft vary widely, including everything from the culture, to the engineering process...even the acronyms. Makes it hard to paint the whole company with one brush.

For instance, take SQL Server. We're not the 500lb gorilla; that's Oracle. In the database arena, we're one of the scrappy underdogs, and it shows in the culture. At the same time, our division is one of Microsoft's established workhorses; we're not an experimental foray into a new business. That also shows in the culture.

It's an interesting combination. It gives us constant drive and motivation, but directed by a maturity that comes from knowing we impact the company's bottom line. As a PM, I really feel both of these factors. I'm always encouraged to think about differentiating, and solving the biggest customer pain points. At the same time, I know we're working on cool stuff that will actually make a difference.


Monday, May 18, 2009

Defining Program Management

I joined Microsoft SQL Server as a Program Manager in January. Before then, my direct PM experience was herding 20 undergraduates into developing a video game for a class, reading parts of "The Art of Project Management", and preparing for the Microsoft interview. In other words, I'm still very new at this.

About the first thing I learned is that the PM role is hard to define (which is part of what makes it fun).

I've heard developers describe their role as "we fix bugs", and testers as "we find bugs". Oversimplified, but, given a broad enough definition of "bugs", mostly accurate. It's harder for PMs. "We write specs" fits the pattern, but no definition of "spec" covers most of what we do. Scott Burken renamed his book "Making Thing Happen", but that's not really concrete. A developer friend suggested "we schedule meetings", which, while funny, isn't very specific.

In an effort to understand and communicate the role, here's a list of what I think it means to be a PM (which, by the way, is a very PM-ish thing to do). The goal is to keep it updated over time, and to track the changes.

What do you think? Anything I should add to or take off the list?

So, what do Program Managers do? I (currently) think

  • We make things easier: for customers, by designing good features; for engineers, by getting stuff out of their way; for managers, by abstracting away the project's day-to-day; for sales, by differentiating and evangelizing.
  • We (try to) understand customers: listen to them, shadow them, think like them, and champion them.
  • We drive consensus: get stakeholders to agree on a vision.
  • We champion the vision: keep the project on-target.
  • We understand the business: how do we differentiate? What's the next big thing?
  • We design the functionality: specify what the feature does, and help developers design how it does it.
  • We project manage: plan the project, track progress, and manage risks.
  • We explain things: communicate between engineering, customer, and business worlds, through presentations and writing.
  • We evangelize: get people excited about the project.