Monday, March 22, 2010

Tracking Multiple Estimates with Project

MS Project is a powerful tool. However, it has certain expectations about tasks, and deviating from them can be difficult. For instance, Project presumes tasks have a single work amount and duration.

I prefer multiple estimates. I typically ask developers for a realistic and a pessimistic estimate (i.e., "should be done by, could take as long as"). Then, I also track a weighted estimate somewhere between these, based on my confidence in the given estimates. Finally, I want a projected duration based on how long the work has taken so far (e.g., if a task is half done in a week, the projection should be two weeks).
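As a rough illustration of the arithmetic behind the last two numbers, here is a minimal sketch. The function names and the linear blend are my own assumptions for illustration, not how the template necessarily computes them: the weighted estimate is a confidence-weighted mix of the two estimates, and the projection simply scales elapsed time by the fraction of work completed.

```python
def weighted_estimate(realistic, pessimistic, confidence):
    """Blend two estimates (hypothetical helper).

    confidence is the weight on the realistic estimate, in [0, 1];
    1.0 trusts the realistic number fully, 0.0 falls back to the
    pessimistic one. The linear blend is an assumption.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence * realistic + (1.0 - confidence) * pessimistic


def projected_duration(elapsed, fraction_done):
    """Extrapolate total duration from progress so far (hypothetical helper).

    Assumes work continues at the same pace it has gone so far.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed / fraction_done


# A task that is half done after one week projects to two weeks in total.
weeks = projected_duration(elapsed=1.0, fraction_done=0.5)
```

The projection is undefined until some progress has been reported, which is why a zero fraction is rejected rather than treated as an infinite duration.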

Since Project expects only one work/duration value, tracking progress and making predictions against multiple estimates is hard. You can create extra columns for them, but these don't integrate well with many features, such as Gantt charts. And, as far as I can tell, the projected duration isn't a built-in option.

To work around these issues, I created a Project 2010 template that lets me track work items with multiple estimates. The key advantages are:

  • Easy pessimistic/optimistic/etc. estimates, each with its own Gantt chart and end date.
  • Projected estimate based on how long tasks have taken so far.
  • Simple steps to track progress, whether reported as percent done or time left.

Here are links to the template and to instructions. I think it's fairly easy to use, but I'm very interested in any feedback. If you give it a try, let me know!


Edit: A commenter requested a Project 2007 version of the template, so I've put one here. However, not having actually used it myself in Project 2007, I can't guarantee how well it'll work.

Tuesday, January 26, 2010

Economics of Extraction

As I stated in my last post, managing unstructured data is increasingly crucial. A common estimate bandied about is that upwards of 80% of enterprise data is unstructured, including office documents, email, etc. The need to manage the information represented by these bits isn't just theoretical. It's painfully real. Real enough that people pay lots of money for third-party solutions to help them do this.

To give an idea of exactly how much money, here are prices for some of the top players in this roughly $2.5 billion market:

  • Autonomy IDOL Server: $220K bundled
  • SAS Enterprise Miner: $100K-$400K (as of 2001)
  • Open Text Enterprise 2.0: $600K for 1,000 users
(Thanks to Naveen Garg, a fellow PM, for these data.)

Consider these numbers in the context of my last post. Not only is there a good conceptual argument for extraction in databases, there's also a clear customer need. If there weren't pain, vendors couldn't charge hundreds of thousands of dollars for a solution. And that's what customers are willing to pay for a solution that's not fully integrated into the database: a separate system to buy, maintain, and support.

Imagine what they'd think of true unstructured data management as a first-class database feature.

Wednesday, January 6, 2010

Databases Need Extraction

Databases are traditionally awful at managing unstructured data, such as office documents, media files, or large blocks of text. Yet users still want to store documents in them. The reason is that documents often have associated structured data already in the database. For instance, an MP3 has an embedded artist, album, and title, which are typically mirrored in a database so they can be queried. Likewise, a resume may be a Word document, but the applicant's name and contact information were probably typed into a form and stored in a table.

Managing documents and their associated structured data separately is painful. Just consider the common approach of keeping the file in the file system and storing its path in the database. What if the artist embedded in the MP3 changes? Or the file is moved, or deleted? Issues like consistency control, synchronizing backups, queries over both structured and unstructured data, and even supporting multiple systems can become a nightmare. So people start putting documents in the database.
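To make the consistency problem concrete, here is a minimal sketch of the path-in-the-database approach going stale. The `tracks` table, its columns, and the helper names are hypothetical, chosen only for illustration: once a file is moved or deleted outside the database, the row still points at it, and the mismatch surfaces only if someone goes looking.

```python
import os
import sqlite3
import tempfile


def find_missing_files(conn):
    """Return paths recorded in the database whose files no longer exist."""
    rows = conn.execute("SELECT path FROM tracks ORDER BY path").fetchall()
    return [path for (path,) in rows if not os.path.exists(path)]


def demo():
    """Register two files, delete one behind the database's back, then audit."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the file lives on disk, only its path is stored.
    conn.execute("CREATE TABLE tracks (path TEXT PRIMARY KEY, artist TEXT)")
    with tempfile.TemporaryDirectory() as folder:
        kept = os.path.join(folder, "kept.mp3")
        moved = os.path.join(folder, "moved.mp3")
        for path in (kept, moved):
            with open(path, "wb") as handle:
                handle.write(b"placeholder")
            conn.execute("INSERT INTO tracks VALUES (?, ?)", (path, "Some Artist"))
        # The file system changes; the database is never told.
        os.remove(moved)
        missing = [os.path.basename(p) for p in find_missing_files(conn)]
    conn.close()
    return missing
```

An audit like this only detects dangling paths after the fact. It does nothing about embedded metadata drifting from the table, or about backups taken at different times.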

This is a call to arms for database people: if there's a compelling reason for users to store data in a database, the database should help manage it. So what's involved in managing unstructured data?
