A Near-Apocalyptic SQL Disaster Scenario
There once was an invoicing system that, when it worked, it worked very well, indeed. But when it was broken, it was horrid.
Mike B. works for a large shipping company where, if you ask any member of management, they'll tell you about how surprisingly few bugs they have in their systems. Ask the developers and they'll agree: Yes, most everything goes pretty smooth once you get past being under-staffed, but they've embraced this limitation. The dev team feels that they're organized and streamlined. So when something does show up, it can usually be attributed to a "bad day," and, unfortunately, it's typically a whopper.
How big, exactly?
Consider this: The application Mike works on, the in-house-developed invoice printing and shipment management system, controls the flow of every shipment that leaves the warehouse. Mike originally wrote parts of the original application and, with the help of a couple of teammates, eventually did a rewrite to add some new functionality that managed to increase the speed of quite a few parts of the application, much to the delight of the higher ups who funded the big project.
After months of coding and testing, all while keeping the now "legacy" system afloat, the project-close party was all high-fives and celebratory beers, and everybody was happy.
One day, weeks after the go-live, one of the major pieces of the application, the Ship-To Vendor Screen, mysteriously started timing out. First, shipping office clerks reported that it would start out in the morning by sitting there for 30 seconds. Then, by the peak of business day, it was timing out altogether. If they couldn't get to the Ship-To Vendor Screen, the clerks couldn't print bills of lading to accompany outgoing shipments. This caused a number of trucks to start queuing up outside of the loading dock, around the parking lot, and out into the street around the block.
This lead not only to some seriously ticked off drivers who would be late in picking up their shipments, but also to even angrier clerks who were forced into printing screenshots and combing through filed-away yellow copies of invoices to look up vendor details and hoping they didn't jot down the wrong vendor ID or that the copiers ran out of supplies.
When IT staff talked about planning for a worst-case scenario, they often brought up the question: "What if the warehouse turned into a smoking hole?" Well, between the delays, the exhaust, and the toner fumes, they were getting dangerously close to just such a situation.
Charged with saving the day, Mike started looking for the culprit. Previously, the page never took more than a second or so to load according to the automated tests they had run during development, so this was a surprise.
So, Mike went digging in the code behind the Vendor page and narrowed the slowness down to this particular function:
Public function AssignedVendors(IntGroup)
dim ObjConn, ObjRs, StrTemp, StrBg, StrSelTemp, StrCurrentGroup, StrGroup
set ObjConn = CreateConnection("SQLServer")
set ObjRs = server.createobject("ADODB.RecordSet")
ObjRs.Open "exec spGetGroupVendors @Group=" & IntGroup & "", ObjConn
StrBg = "lvl12"
StrTemp = "<table>"
StrTemp = StrTemp & "<tr><th>Vendor</th><th>Group</th><th>Update</th><th>Delete</th></tr>"
do while not ObjRs.eof
if StrBg = "lvl12" then
StrBg = "lvl11"
StrBg = "lvl12"
StrTemp = StrTemp & "<form action='VendUpdate.asp' method='post'>
<tr class='" & StrBg & "'>" & vbcrlf
StrTemp = StrTemp & "<td>" & vbcrlf
StrTemp = StrTemp & ObjRs.fields("vendor") & "<input name='vendor_ID' value='" &
ObjRs.fields("vendor") & "' type='HIDDEN'></td>" & vbcrlf
if ObjRs.fields("groupid") <> StrCurrentGroup then
StrTemp = StrTemp & "<td>" & StrSelTemp & "</td>" & vbcrlf
StrTemp = StrTemp & "<td><INPUT Type='submit' value='Update Group'></td>" & vbcrlf
StrTemp = StrTemp & "<td><a href='venddel.asp?vend_id=" & ObjRs.fields("vendor") &
"'>Delete entry</a></td>" & vbcrlf
StrTemp = StrTemp & "</tr></form>" & vbcrlf
set ObjConn = nothing
set ObjRs = nothing
At first glance, Mike couldn't see anything obviously wrong with the code. The function accepted an identifier and returned a table of matching vendors. It wasn't doing a heck of a lot, but it was still choking.
Could it be in the database? Mike jumped into SQL Server Management Studio to see what was going on with the spGetGroupVendors stored procedure and, after running it through Query Analyzer and seeing how it returned 500 records in a flash, the data was getting there quickly enough and there weren't any odd indexing issues.
But, it was still timing out.
With managers pacing outside his cube, it was time to perform some brute force analysis so, after ensuring it wouldn't do anything odd to the display of the page, Mike changed all the "StrTemp =" and "StrTemp = StrTemp &" to "Response.Write" so he could see exactly how far the page was getting.
After the change the page displayed the same 500 problem records in less than 1 second as one humungous string.
The real root of the problem appeared to be that there was a breaking point where, after a certain amount of data was passed from a function in ASP under IIS, instead of returning an error, the system simply timed out.
After throwing together some quick code to stop the bleeding (and get those trucks moving), Mike excused himself to the restroom to wash off the stink of the office air and get his head back together. The rewrite was such a huge success, but how did things end up this way? Was it that the function was written to pass a single large string back? That this real-life scenario wasn't tested during development? Or that the purchasing manager thought it was a good idea to be able to assign 500 vendors to a single individual?
In any case, hopefully at the next rewrite they'd have a few more hands on deck.
Mark Bowytz is a contributor to the popular Web site The Daily WTF. He has more than a decade of IT experience and is currently a systems analyst for PPG Industries.