Responsible BI for Excel (Even for Older Versions)
On Wednesday, I will have the honor of co-presenting, for both The Data Warehouse Institute (TDWI) and the New York Technology Council, on the subject of Excel and business intelligence (BI). My co-presenter will be none other than Bill Baker, who was a Microsoft Distinguished Engineer and, essentially, the father of BI at that company. Details on the events are here
We'll be talking about PowerPivot, of course, but that's not all. Probably even more important than any one product, will be our discussion of whether the usual characterization of Excel as the nemesis of IT, the guilty pleasure of business users and the antithesis of formal BI is really valid and/or hopelessly intractable. Without giving away our punch line, I'll tell you that we are much more optimistic than that.
There are huge upsides to Excel. And while there are real dangers to using it in the BI space, there are standards and practices you can employ to ensure Excel is used responsibly. And when those practices are followed, Excel becomes quite powerful indeed. One of the keys to this is using Excel as a data consumer rather than data storage mechanism. Caching data in Excel is OK, but only if that data is (a) not modified and (b) configured for automated periodic refresh.
PowerPivot meets both criteria -- it stores a read-only copy of your data in the form of a model, and once a workbook containing a PowerPivot model is published to SharePoint, it can be configured for scheduled data refresh, on the server, requiring no user intervention whatsoever. Data refresh is a bit like hard drive backup: it will only happen reliably if it's automated and super-easy to configure. PowerPivot hits a real home run here (as does Windows Home Server for PC backup, but I digress).
The thing about PowerPivot is that it's an add-in for Excel 2010. What if you're not planning to go to that new version for quite a while? What if you've just deployed Office 2007 in your organization? What if you're still on Office 2003, or an even earlier version? What can you do immediately to share data responsibly and easily?
As it turns out, there's a feature in Excel that's been around for quite a while, that can help: Web Queries. The Web Query feature was introduced, ostensibly, to allow Excel to pull data in from Internet Web pages... for example, data in a stock quote history table will come in nicely, as will any data in a Web page that is displayed in an HTML table. To use the feature In Excel 2007 or 2010, click the Data Tab or the ribbon and click the "From Web" button towards the left; in older versions use the corresponding option in the menu or toolbars. Next, paste a URL into the resulting dialog box and tap Enter or click the Go button. A preview of the Web page will come up, and the dialog will allow you to select the specific table within the page whose data you'd like to import. Here's an example:
[Click on image for larger view.]
Now just click the table, click the Import button, and the Import Data dialog appears. You can simply click OK to bring in your data or you can first click the Properties... button and configure the data import to be refreshed at an interval in minutes that you select. Now your data's in the spreadsheet and ready to worked with:
Your data may be vulnerable to modification, but if you've set up the data refresh, any accidental or malicious changes will be corrected in time anyway.
The thing about this feature is that it's most useful not for public Web pages, but for pages behind the firewall. In effect, the Web Query feature provides an incredibly easy way to consume data in Excel that's "published" from an application. Users just need a URL. They don't need to know server and database names and since the data is read-only, providing credentials may be unnecessary, or can be handled using integrated security. If that's not good enough, the Web Query can be saved to a special .iqy file, which can be edited to provide POST parameter data.
The only requirement is that the data must be provided in an HTML table, with the first row providing the column names. From an ASP.NET project, it couldn't be easier: a simple bound GridView control is totally compatible. Use a data source control with it, and you don't even have to write any code. Users can link to pages that are part of an application's UI, or developers can create pages that are specially designed for the purpose of providing an interface to the Web Query import feature. And none of this is Microsoft- or .NET-specific. You can create pages in any language you want (PHP comes to mind) that output the result set of a query in HTML table format, and then consume that data in a Web Query. Then build PivotTables and charts on the data, and in Excel 2007 or 2010 you can use conditional formatting to create scorecards and dashboards.
This strategy allows you to create pages that function quite similarly to the OData XML feeds rendered when .NET developers create an "Astoria" WCF Data Service. And while it's cool that PowerPivot and Excel 2010 can import such OData feeds, it's good to know that older versions of Excel can function in a similar fashion, and can consume data produced by virtually any Web development platform.
As a final matter, instead of just telling you that "older versions" of Excel support this feature, I'll be more specific. To discover what the first version of Excel was to support Web queries, read Microsoft's documentation, titled XL97: How to Create Web Query (.iqy) Files.
Posted by Andrew J. Brust on 05/11/2010 at 1:15 PM