The Danger of Received Wisdom (or, Stupid Things I've Said)
Peter reviews some of the more spectacularly incorrect things he's told other people and then draws some conclusions.
There's a limit to how many words I'm allowed to put into this column. Patently, I've said so many foolish things in my life that I can't list them all here without exceeding that limit. Instead, I've picked out four highlights that will let me draw a conclusion at the end of this column (and if you want to skip me embarrassing myself and just go to the end to read the conclusion, I'm OK with that).
Learning from One Who Knows
I used to present at conferences (one year, I did 18 presentations over about a dozen conferences). One part of one of my standard presentations was about optimizing database access performance by using an ADO feature (note: not ADO.NET) -- and, quite frankly, this happened so long ago that I've forgotten what the feature was.
At one conference, the organizers invited the presenters (that would be me) to a special room where we could review our PowerPoint presentations prior to the conference start to make sure that the conference had the right version of our slides. I was busy flipping through my slides when Bill Vaughn (who knows more about SQL Server than I ever will) sat down beside me to review his slides. He happened to glance over to my presentation just as I got to the slide about this cool ADO feature. Bill said, "Actually, that's not true. We never got around to implementing that."
This slide crept into my presentation (which I had given at many, many prior conferences) because, of course, I had never actually done a performance comparison using the feature. Instead, I had believed the documentation which had been written (as is often required) before the ADO team finalized the shipping version of the package. I don't know if the ADO team ever got around to implementing the features -- I was so embarrassed that I never looked at the feature again.
Learning from My Own Experience
In the same way, I once read that I could improve performance by avoiding using * in SQL Select clauses. The rationale for this advice was that I should only retrieve the columns I actually needed and, by using * to retrieve all columns, I was slowing down my application. This advice seems obvious: More columns = slower performance. I parroted that advice for years.
But, over time, I recognized that I wasted a lot of time going back to old programs and adding more columns to existing Select statements (it may not be the most common change to an existing application, but I bet it's in the top three or four). Doing that, I also recognized that these changes had no effect whatsoever on my application's performance whatsoever.
Eventually, I came to realize that adding columns only impacts performance in edge cases, not in the typical case (for example, where a table has hundreds of columns, includes BLOB fields or where there's a covering index of which you can take advantage). On the other hand, using * in my Select clauses saved me endless maintenance work.
Learning from My Client's Experience
I should admit that I like using the ASP.NET Session object to maintain state for individual users. However, in its default settings, the Session object doesn't scale well over server farms.
Therefore, for years, I recommended to my clients that they use a session state computer that would hold Session information in its memory for all the servers in a farm. I also strongly recommended that they stay away from another Session-related option: Using SQL Server to hold all the session information for the server farm. There was a cost to staying away from SQL Server because using SQL Server would improve the reliability of using a session state computer. My rationale was that, because SQL Server would hold state information on disk, it would obviously be slower than the option I recommended, which would save state information in memory.
I repeated that advice until one of my clients, craving the improved reliability that SQL Server would provide, switched to the SQL Server option. Impact on performance compared to using the in-memory session state computer: zero.
My guess is that the tables holding the Session information never got written to disk and, as a result, ran as fast as the in-memory server. But, quite frankly, who cares why I was wrong?
One last SQL and LINQ example. It was conventional wisdom among SQL gurus that subqueries in SQL statements gave you worse performance than a complex Join clause that did the same thing. On that basis, for years I encouraged developers to write ever more complex (and unreadable) Join clauses.
I was, therefore, horrified when I saw the SQL generated by the initial version of LINQ-to-Entities: That generated code used subqueries liberally. I could see why, from a code-generation point of view, it was easier for the Entity Framework team to create SQL statements by tacking on subqueries than to write complex Join statements. However, when I complained about this to a member of the SQL Server team, she pointed out that the SQL Server team also knew about subqueries. So SQL Server, as part of the SQL optimizing process, automatically treated subqueries as part of the Join clause.
So what have I learned from all this? A bunch of things: Don't trust the documentation. Nothing is intuitively obvious. The rules you have only apply in some cases. Don't assert how smart you are unless you've actually tested it yourself.
And, even then: Stuff changes. You know what? It's probably safer to assume that everything you know is wrong until you run the code.
About the Author
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.