Why Analysts Can't ONLY use Excel 2010

Submitted by jonpearce on Fri, 2011-03-18 14:01

In a previous posting, we extolled the virtues of Excel 2010 for healthcare data analysts.  Excel 2010 offers significant advantages over previous versions of Excel, as that article describes. 
 
However, anyone who has done in-depth work with with significant volumes of healthcare data knows that you can't do everything with Excel.  In fact, you can't do MOST things with Excel.  You can't do on-the-fly data cleanup with Excel since you don't have control of the input stream until it gets deposited in the spreadsheet.  It would be virtually impossible to do HCC grouping, or any other type of complex lookup in Excel, since those processes require multiple steps.  Without using PowerPivot, you can't even fit most claims datasets into Excel. 
 
Excel excels (unsurprisingly enough) at presentation and analysis of data that has already been loaded, cleaned up, enhanced and organized for analysis.  Combined with the right back-end tools, it can allow users to rapidly pivot through multiple dimensions of data using a variety of metrics, drilling down into to the data to unearth knowledge and guide actions.  Excel also provides detailed graphical capabilities to allow depicting those metrics and dimensions in a way that users can rapidly discern patterns and relationships.  It's the right tool for that part of the process.
 
But something needs to create those dimensions and metrics before they can be served up to Excel.  And something else needs to clean up the data before the data can be used for much of anything.  That's where the "heavy lifting" comes in. 
 
There are three major components necessary to build a claims data analytics system.  You need something that will extract, transform and load the data (called the "ETL" process), a robust database system for the cleanup and intermediate processing, and an analysis engine that can structure the data into a multdimensional format so that it can be served up to Excel or whatever analsis or reporting tool you're using.  Multi-dimensional data strucures are key to working with large amounts of healthcare data, since a multidimensional structure is much faster than a relational database, and can allow "stream of conciousness" analysis where the user can utilize the results of one analysis immediately to structure the next one. 
 
Singletrack Analytics uses the Microsoft data stack for these tasks (which is no surprise considering we're a Microsoft Partner and our team has multiple Microsoft certifications in Business Intelligence).  We use SQL Server for the database tasks since it's easily available; relatively inexpensive; has a wealth of documentation, training and online expertise available; and is scalable from a small installation to the multi-terabyte databases that one of our clients uses for a significant population that it manages.  SQL Server Integration Services (SSIS) provides the ETL engine, allowing high-speed data loading with very granular control over the data loaded, and allowing quality management steps to be implemented before the data even gets into the initial databases.  And SQL Server Analysis Services (SSAS) provides the multidimensional structure that drives the back-end analytics, provding a quick, customized user experience for the analyst. These tools aren't generally found on analysts' desktops, but they're essential to a healthcare organization that believes that data analytics are key to their success. 
 
Without the right tools, large buildings can't be built.  And neither can healthcare organizations.