Monday, March 22, 2010

Tracking Multiple Estimates with Project

MS Project is a powerful tool. However, it has certain expectations about tasks, and deviating from them can be difficult. For instance, Project presumes tasks have a single work amount and duration.

I prefer multiple estimates. I typically ask developers for a realistic and a pessimistic estimate (i.e., "should be done by, could take as long as"). Then, I also track a weighted estimate somewhere between these, based on my confidence in the given estimates. Finally, I want a projected duration based on how long the work has taken so far (e.g., if a task is half done in a week, the projection should be two weeks).

Since Project expects only one work/duration value, tracking progress and making predictions against multiple estimates is hard. You can create extra columns for them, but these don't integrate well with many features, such as Gantt charts. And, as far as I can tell, the projected duration isn't a built-in option.

To work around these issues, I created a Project 2010 template that lets me track work items with multiple estimates. The key advantages are:

  • Easy pessimistic/optimistic/etc estimates, Gantt charts, and end dates.
  • Projected estimate based on how long tasks have taken so far.
  • Simple steps to track progress, whether reported as percent done or time left.

Here are links to the template and to instructions. I think it's fairly easy to use, but I'm very interested in any feedback. If you give it a try, let me know!

Edit: A commenter requested a Project 2007 version of the template, so I've put one here. However, not having actually used it myself in Project 2007, I can't guarantee how well it'll work.


Tuesday, January 26, 2010

Economics of Extraction

As I stated in my last post, managing unstructured data is increasingly crucial. A common estimate bandied about is that upwards of 80% of enterprise data is unstructured, including office documents, email, etc. The need to manage the information represented by these bits isn't just theoretical. It's painfully real. Real enough that people pay lots of money for third party solutions to help them do this.

To give an idea of exactly how much money, here are prices for some of the top players in this roughly $2.5 billion market:

  • Autonomy IDOL Server: $220K bundled
  • SAS Enterprise Miner: $100-$400K in 2001
  • Open Text Enterprise 2.0: $600K for 1,000 users
(Thanks to Naveen Garg, a fellow PM, for these data).

Consider these numbers in the context of my last post. Not only is there a good conceptual argument for extraction in databases, there's also a clear customer need. If there wasn't pain, vendors couldn't charge hundreds of thousands of dollars for a solution. And that's what customers are willing to pay for a solution that's not fully integrated into the database: a separate system to buy and maintain and support.

Imagine what they'd think of true unstructured data management as a first-class database feature.


Wednesday, January 6, 2010

Databases Need Extraction

Databases are traditionally awful at managing unstructured data, such as office documents, media files, or large blocks of text. Yet users still want to store documents in them. The reason is that documents often have associated structured data already in the database. For instance, an MP3 has an embedded artist, album, and title, which are typically mirrored in a database so they can be queried. Likewise, a resume may be a Word document, but the applicant's name and contact information were probably typed into a form and stored in a table.

Managing documents and their associated structured data separately is painful. Just consider the common approach of keeping the file in the file system and storing its path in the database. What if the artist embedded in the MP3 changes? Or the file is moved, or deleted? Issues like consistency control, synchronizing backups, queries over both structured and unstructured data, and even supporting multiple systems can become a nightmare. So people start putting documents in the database.

This is a call to arms: as database people, if there's a compelling reason for users to store data in a database, the database should help manage it. So what's involved in managing unstructured data?

At a high level, there are two types of management tasks over unstructured data:
  1. Managing the bits: let users efficiently insert, update, and delete documents, as well as seek within them, stream them, etc. This includes other typical data services, such as backup and restore.
  2. Managing the information represented by the bits: an MP3 has an artist, beats-per-minute, and lyrics, while a resume mentions schools, companies, and skills. To fully manage such documents, users need to query this information.

Recently, databases have gotten better at managing unstructured bits. For example, SQL Server has FILESTREAM, which improves streaming performance, and Remote Blob Storage, which stores unstructured data in a dedicated file server.

However, managing the information represented by the bits requires extraction. Specifically, it requires cracking open the document, extracting or inferring interesting content, then exposing it as queryable structured data. This means extraction is not a side task; it is an integral part of managing unstructured data.

Full-text search in databases is a step in this direction, but there's a lot more work to do. When we put MP3s in a table, we should be able to query them by artist, title, or even lyrics. When we store a resume, we should be able to find related job descriptions, or join the schools it mentions to our Employee table to find old classmates that work here.

It's estimated that about 80% of the data out there is unstructured, and thus probably not in databases. If we're going to take a serious shot at managing it, then databases need extraction.


Monday, November 9, 2009

SQL PASS 09 - Semi-Structured Data in SQL Server

A couple coworkers and I just had the great opportunity to present a post-conference session at SQL PASS 2009. The session topic was "Managing Unstructured and Semi-Structured Data with SQL Server"; I own the semi-structured features in SQL Server, so I presented that part of the session.

For simplicity, think of "semi-structured data" as structured data that's not relational. To manage semi-structured data, SQL Server 2005 introduced XML, which is powerful and general (as someone once said, tongue in cheek, "XML is like violence: if it doesn't solve your problem, you're not using enough of it"). However, XML isn't ideal for certain situations, and doesn't integrate perfectly with relational databases.

Enter SQL Server 2008, which introduces some really useful features for managing semi-structured data more relationally. Briefly, if you find yourself managing the following kinds of data, take a look at these features:

  • Trees or hierarchies (e.g., product categories, org charts, conversation threads): check out hierarchyid, a new type representing nodes in a tree.
  • Properties that apply to only some items (e.g., product descriptions, custom document properties, tagging): check out the SPARSE column option, which gives you NULLs that take no space, constant-time addition of new columns, and up to 30,000 columns in a table.
  • Tables where only some rows need indexing (e.g., columns with many NULLs, "hot" rows, heterogeneous row groups): check out filtered indexes, where you specify which rows are indexed. This makes the index much smaller, yields faster look-ups, and avoid unnecessary index maintenance. Filtered indexes are cool enough that I'll make a post about just them sometime soon.

For those interested, here's a slide deck and demo T-SQL code on these features, focusing on how you can use them to solve real-world problems. If you have any feedback or suggestions — about the deck, the demo code, or the features themselves — please post a comment. For example, here are some of the feedback and takeaways I got from attendees at SQL PASS:

  • One attendee wondered how hierarchyid compares to left-right value tables (a.k.a. nested set model, or modified preorder traversal) for representing trees. Briefly, hierarchyid has the same sub-tree properties, so is equally good at descendant queries. Furthermore, it has three advantages: more compact (one column averaging 10 bytes per value can represent ~100M nodes); constant-time node insertion anywhere in the tree; and easier, more efficient level queries (e.g., (grand)children of a node, all nodes with level 3).
  • People seem to like column sets, especially how they automatically shred XML inserts and updates into the proper sparse columns.
  • People also seem to really like filtered indexes. One "wow" example was choosing which rows were worth indexing based on your workload. To illustrate, suppose table SoldAt tracks which products are sold at which stores. If only some stores have enough traffic to warrant indexing, you can create an index for just those stores with CREATE INDEX SoldAt_Key_Stores ON SoldAt(...) WHERE StoreId IN (1, 4, ...).
  • Some people mentioned that they often need to manually update filtered statistics (such as are generated automatically for filtered indexes) to ensure good plans. This is something we can look into.

I'd love to hear more, so please comment away!


Tuesday, August 25, 2009

Five-Minute Ph.D.

As my Ph.D. defense nears, I'm thinking a lot about the most important lessons I learned. Here are my top five:

  1. Don't look for reasons to fail; find ways to succeed. If something should or must be done, find a way to do it.
  2. First figure out the right thing to do. Only then think about implementation, and see how close you can come. Even if you can't reach the ideal, at least you'll be pushing in the right direction.
  3. Understand the problem deeply. Any good problem solver can hack a good solution quickly. What's more valuable is identifying the true underlying problem, and how it relates to other problems. This tells you if something is a true solution, and helps identify opportunities.
  4. Think in a structured, disciplined way. First, separate out orthogonal issues. Then, solve them incrementally and iteratively. Don't try to attack the whole mess at once.
  5. Finally, when communicating with others, tell a story. Start with something familiar, then make sure your ideas flow.

Those are the big ones. The gems of a Ph.D. education, in five easy minutes. Interestingly, none of these are particularly technical. But deeply technical things are limited in application. I think that's the real secret: the work you do for a Ph.D. is technical, but a good Ph.D. is about becoming a better thinker and communicator.


Wednesday, July 29, 2009

Tutorial: Using Omea Reader to Listen to Customers

One of my daily tasks is listening to customers. This involves checking blogs, forums, and newsgroups (they're still around!). I also check several Intranet sites, such as SharePoint sites for my projects. Doing this every day was taking a substantial amount of time. The problem wasn't digesting the information, or acting on it; it was getting to it. So I went looking for a one-stop solution. My requirements were:

  • Sensible interface for reading feeds, such as RSS and Atom.
  • Handles newsgroups (the customers use them, thus so must I).
  • Desktop application, so that it can access the company Intranet.

I found Omea Reader, a free desktop application that reads newsgroups and feeds. Below, I give step-by-step instructions on how I set up and use it.

Setting up Omea Reader

First, download Omea Reader from here, then install it. This is a typical installation process, and the defaults work fine (though I chose not to install the Firefox plugin).

After installing, run Omea Reader. It asks about a Database Path and Log Files Path. The defaults here are also fine, so hit "OK". Next, it provides some import options. Just hit "Next >" until it requests User Information. Then, Enter your first and last name. Then, click "Add..." to add an email address. Finally, click "Finish".

At this point, Omea starts. We'll want to set some options, so go to Tools -> Options. I changed these options:

  • General
    • "Mark item read after displaying for 0 seconds"
    • "Open links to Web pages In a new browser window"
  • Mail Format
    • Check "Include signature in outgoing messages", and paste in an appropriate signature (e.g., your company may require some legalese here).
  • Plugins
    • Uncheck everything except News and RSS; we won't be using Omea Reader's other functionality.

Clicked "OK", and Omea Reader restarts. Then, click on the "All Resources" tab and set options in the "View" menu. I set the following options:

  • Uncheck "Shortcut Bar"
  • Uncheck "Workspace Bar"
  • Check "AutoPreview -> All Items"

Omea Reader now looks something like this:

Finally, after setting options, delete the feeds that came with Omea Reader. First, click on the "Feeds" tab. Then, in the bottom-left section, you'll see three feeds. Select all of these, then delete them with the Delete key.

Adding Newsgroups

To add newsgroups to Omea Reader, click the "Tools -> Manage Newsgroups" menu item. This brings up the "Manage Newsgroups" dialog. Clicking on the "Add..." button on the bottom left lets you add news servers.

Set the server name. The other fields fill in automatically, but you can change them. If your server requires a login (such as Microsoft's, click on the "Security" tab. Next, check "Authentication required", then set the user name and password. Finally, click "OK" to add the server.

Now you can subscribe to newsgroups. First, select a news server on the left. Then, check the newsgroups of interest on the right. By typing into the "Display newsgroups which contain:" field, you can filter the available newsgroups. Finally, click "OK", and Omea Reader will download posts.

Filtering Posts

If you are interested in only some posts, Omea Reader can filter incoming posts. For example, you can create a rule that deletes all posts from a newsgroup except those containing particular words. First click on the menu item "Tools -> Manage Rules -> Action Rules". Then, in the dialog, click the "New..." button on the right. This brings up the "New Action Rule" dialog. Give the rule a name, such as "Relevant". Then, under the "Conditions" field, click on "Add Condition...". Next, select "News Conditions -> Appeared in the specified newsgroup(s)", then click the "OK" button. This adds an entry in the "Conditions" field. Click on the blue "specified" link to select the newsgroups to filter.

Then, click on "Add Exception..." under the "Exceptions" field. Next, select "Text Query Conditions -> Matching query in the body", then click the "OK" button. This adds an entry in the "Exceptions" field. Click on the blue "query" link to enter keywords separated by "OR". Keywords can be single words, or phrases in double-quotes. For example, I use this query to keep only posts related to full-text searching: full-text OR fulltext OR "full text". To also search the post subject, repeat these steps, but select the exception "Matching query in the subject/header".

Finally, click on "Add Action..." under the "Actions" field. Check "Delete resource permanently", then click the "OK" button. This takes you back to the "Rules Manager" dialog. Click the "OK" button here to save your rule.

To apply your rule immediately, select the "Actions -> Apply Rules" menu item. Here, select "All resources", and check your rule. Finally, click the "OK" button.

Adding Feeds

Feeds let you keep up with forums, blogs, search engine results, SharePoint lists, etc. I use three types of feeds:
  1. Direct feeds: Many blogs, forums, and SharePoint sites provide feeds. For example, MSDN lets you subscribe to a forum, or even to searches on forums. Look for the RSS icon , or a link similar to "Subscribe to Feed".
  2. Search Engines: Some engines let you subscribe to searches as feeds. This is good for tracking a topic, or searching a site (such as forums without direct feeds). On Bing, transform searches into feeds by adding "&format=rss" to the URL. Google provides Google Alerts, which support various searches, including blog searches. In both engines, use "" to search only a particular site. Unfortunately, these feeds return items as the search engine crawls them, so they can sometimes be very stale.
  3. Scraping pages: When there's no direct feed, and freshness is important, Dapper provides an easy way to scrape pages as RSS feeds. This is also good for scraping data not typically in feeds, such as the result count for a search.

Some representative examples of feeds I check are:

  • MSDN feed for the SQL Server XML forum
  • Bing searches over SQL Monster, such as ("column set" OR "column sets")
  • Google Blog searches (through Google Alerts), such as ("sparse columns" OR semi-structured) "sql server" -job.
  • SharePoint lists on project sites, such as the Tasks or Announcements list.

However you obtain the feed, copy its URL with Ctrl-C. Then, in Omea Reader, click the menu item "Tools -> Subscribe to a Feed." The dialog that pops up will auto-paste the feed URL into the proper field, so just hit "Next >". Omea Reader will then fetch the feed's name. Finally, click the "Finish" button to add the feed. Repeat these steps with as many feeds as you like.

Daily Use

After adding newsgroups and feeds, click on the "All Resources" tab to see your posts. I recommend deleting all these initial posts and starting fresh. Then, each day, start Omea Reader. It will download the day's posts automatically.

You can answer newsgroup posts from Omea Reader. For feed posts, clicking on the "Source" link takes you to the original post, where you can reply through the site's interface. This daily routine takes only a few minutes a day, and is much more efficient than visiting each newsgroup, blog, and forum individually.


Tuesday, June 23, 2009

Productive Tension, Checks and Balances

Many software houses have caught on to using separate teams for writing and testing code. This is a great example of building productive tension: developers take pride in writing code with no bugs, and testers take pride in catching bugs the developers missed. This is one of the checks and balances that leads to better software.

Recently, I've been thinking about how one beauty of the program manager role is that it adds its own dimension to this productive tension. The key is that PMs champion the customer when designing functionality, yet have no authority over developers and testers. I've seen this build productive tension in two ways.

The first is a tension over design. Part of a PM's job is to empathize with customers. We want to listen to their asks, anticipate what they need, and design conceptually clean and complete features. In other words, we worry first about customer functionality, then about implementation. However, developers worry first about tractable features, then about "ideal" functionality. This difference in primary goals builds productive tension. But the overall overlap in goals means the design iterates until the PM feels pretty good about the customer functionality, and the developers feel pretty good about tractability. Since each role champions a different goal, both desirable, the result tends to be good compromises.

The second is a tension between influence and authority. While the PM is responsible for designing features, he cannot tell developers and testers "do it or leave". He's not their manager. At the same time, their manager does not design functionality (though he or she typically has valuable input). The result is that no functional design is by mandate: if the PM has an idea, he has to convince the developers, testers, and their managers; if the manager has an idea, he or she has to convince the PM. Having to convince several other smart people, over whom you have no authority, is a great way to select for good ideas.

These are the ways I've noticed PMs contributing to productive tension, and to the checks and balances that make for good software. Anything else I should look for?


Wednesday, June 10, 2009

Being a Mentee

A good mentor is one of those things that's often listed as a key ingredient in success. It can let you shortcut long and potentially painful learning by teaching you the lessons of someone who, since they're successful, has presumably learned the right lessons. Observing a mentor, and asking the right questions, gives you potently distilled experience.

I've learned from a lot of people, but I've only had a true mentor/mentee relationship with a few. It's not always a relationship that comes naturally, and I've had to learn to be a better mentee. I've just started with a new PM mentor, and it's gotten me thinking about things I wish I had known since my first:

Know what you want to learn: be specific. Don't look for a "life mentor", or even a "business mentor". Know that you want to learn leadership and influence, or to write and communicate clearly, or how to manage a group, or how to differentiate against competitors. But keep an open mind: one thing your mentor may teach you is what you really ought to be learning.

Pick the right mentor: look for a few key characteristics. They should be successful, meaning they've gotten where you want to go, using what you want to learn. They should also be willing to answer questions. Ideally, look for someone you can observe, rather than only meeting at set times. And, while not necessary, it certainly helps if you get along.

Drive the relationship: learn actively, not passively. Ask questions, and invite feedback. Always look for what you can do or ask that will tease out important lessons from your mentor's actions and experience.

Assume they're right: at least initially, anyway. This was a hard-earned lesson for me. If your mentor does something that seems odd, don't assume they're wrong. Instead, assume they know what they're doing, and try to figure out what you're missing. Remember that you chose this person precisely because they're successful, so don't assume you know better.

My Ph.D. advisor really drove this lesson home. He is a great researcher, and I was learning to do research. Sometimes, he would suggest ideas I just knew were wrong. Being strong-willed, I would push back. Tensions often rose. Yet time after time, in retrospect, I found his ideas worked better in our papers. I finally realized that while I measured ideas only on technical merit, he also considered their conceptual cleanliness, how well they would sell in a paper, current trends in the field, conversations with other professors, etc, etc. Instead of assuming I was right, I should have assumed he was right, and tried to figure out why.

Any other lessons I should take into this new mentorship?


Tuesday, May 26, 2009

Preparing canned explanations

As a developer, I spent a lot of time explaining things to computers (in other words, coding). Starting as a graduate student, and now as a program manager, I spend a lot of time explaining things to people. Turns out that's often harder.

It's particularly hard with canned explanations, such as presentations and specs, where you prepare the explanation in advance. If your audience doesn't understand something, you may lose them; you may not get the chance to try another tack.

Preparing canned explanations is an art, and I know just enough to realize I know almost nothing. Due to the patient mentoring of a great explainer, and through watching several others, I've managed to learn a few pointers. The most important is to tell a story. This is very broad, and includes:

  • Start with what they know. Don't throw your audience in the deep end. Start with common knowledge, then build to new things.
  • Give an intuitive overview. Tell the audience where you're headed. Suspense and surprise work sometimes, but don't keep them guessing about your point. A rule of thumb is "tell them what you'll tell them, then tell them, then tell them what you told them."
  • Flow is everything. Top down, bottom up, temporally, procedurally (first, then, next, finally)...however you do it, make points flow from one to another. If you break the flow, or have none, it's easy to lose your audience.
  • Make every word count. Be brief. Try to make one point at a time. Then, drop unnecessary points.
  • Be concrete. Define nouns, quantify adjectives, and favor processes over vague verbs. Numbers, anecdotes, and examples go a long way.
  • Use the conclusion. The audience may skip everything else, but they'll usually wake up for the conclusion. At minimum, reiterate your story outline and emphasis your main points.

Apart from telling a story, I've learned a couple tips about polishing explanations:

  • Get feedback on your drafts. Give practice presentations, ask peers to review your specs, etc. Feedback from first-timers is invaluable, so don't waste fresh eyes: have some people review your first draft, others your second, and so forth. It also helps if someone reads multiple drafts, but make sure they're either very patient, or owe you a favor.
  • Insist on criticism. Ask what you could have done better. Insist people tell you something. There's always something to improve, and people can be surprisingly reluctant to give criticism face-to-face.

I'm always looking for new tips, so if anyone has a good one, please share!


Tuesday, May 19, 2009

Nice thing about PMing at SQL Server

Fair warning: shameless plug ahead.

Groups within Microsoft vary widely, including everything from the culture, to the engineering process...even the acronyms. Makes it hard to paint the whole company with one brush.

For instance, take SQL Server. We're not the 500lb gorilla; that's Oracle. In the database arena, we're one of the scrappy underdogs, and it shows in the culture. At the same time, our division is one of Microsoft's established workhorses; we're not an experimental foray into a new business. That also shows in the culture.

It's an interesting combination. It gives us constant drive and motivation, but directed by a maturity that comes from knowing we impact the company's bottom line. As a PM, I really feel both of these factors. I'm always encouraged to think about differentiating, and solving the biggest customer pain points. At the same time, I know we're working on cool stuff that will actually make a difference.