DevDisasters
Build-Process Insanity
Many stories share the saga of how code builds were successfully integrated with one another. This is not one of those stories.
Rich worked for a tiny, privately held company with about a dozen developers. That number included the owners and a handful of consultants they regularly worked with who were technically not employees, but might as well have been.
There wasn't much that made this company stand out, save for one detail: It had a special relationship with a major client, who I'll refer to as MegaCorp. MegaCorp was a multinational, publicly traded corporation that was worth billions and employed thousands of people, though few of them were IT folks. As such, MegaCorp always outsourced quite a bit of work, and that made things pretty sweet for Rich and his coworkers -- being showered with buckets of cash is a pretty good arrangement.
One of Rich's company's projects was one of the biggest initiatives in MegaCorp's history, all arising from the need to integrate some software built by a company that MegaCorp had acquired a year earlier.
Originally, the plan was to retrofit all of the existing software and come back and revamp everything in "2.0." However, in a rare sensible decision by MegaCorp, the company quickly realized that a total rewrite of some components would be faster, which meant some legacy software would be put on the path to retirement at the same time. This is how Rich and his team became involved, because they maintained (but were not the original developers of) one of the components in question, which I'll refer to as MegaLink.
Now, the MegaLink source code was positively ancient. It was the definition of "legacy," as were the tools required to work with it.
It could only be compiled on a 32-bit version of Windows, and only with Visual Studio 2005 installed without any service packs. Also, because it was written in C, rather than in a .NET language as one might expect, most developers didn't have any interest in touching it, let alone risk breaking it. Rich's boss, who was the primary maintainer of the code, once admitted that even he didn't fully understand how parts of MegaLink worked. "There's this black box in the center and I just sort of build everything else around it," he explained.
So, while MegaLink's maintenance was great for generating positive cash flow, Rich's boss jumped at the chance to start fresh.
But that meant the new code and old code found in other components -- both maintained in-house and by other development teams at other small companies -- would have to be changed to mesh with the new system. Responsibility for this task fell to Rich's coworker, Ken.
A few months into the project, Rich wandered into Ken's office to ask him about something. As they conversed, Rich noticed a command prompt that was running a script on Ken's monitor. The title of the window revealed it to be "runJenkins.bat." Rich cocked his eyebrow at this, thinking, "They have a Jenkins server, and it doesn't live on his workstation."
"RunJenkins.bat? What's that for?"
"Oh, it's part of MegaCorp's build process."
"Wait, they run a CI server as part of their build process?"
"Yup. The script executes a build job, which executes other build jobs, which execute other build jobs and so on. In total, there are a couple dozen tools and projects that have to be built."
Abusing a CI server as a build system was a disaster in the making, but amazingly, that wasn't the worst part. Rich was still staring at the command prompt, wide-eyed, when the script ended with "BUILD FAILED" and a runtime of around 11 minutes.
"Hey, the build just failed," Rich said. Without missing a beat, Ken Alt+Tab'd over to the prompt, hit Up and Enter to run the script again, then Alt+Tab'd back to the window he'd just been using.
Again, Rich blinked. "Aren't you going to investigate the failure? I mean, it's just going to fail again."
Ken shrugged. "Their developers told me that you have to run it three or four times before it succeeds."
"Three or four times?"
"Yeah, apparently the web of dependencies between the various tools and projects is so complicated that the same things get repeatedly rebuilt. The dependencies also aren't fully defined, which means that the first few builds fail because some jobs expect other things to have been built, but they haven't yet. A new build doesn't wipe away the output from the previous build, so you just keep rerunning it until all the dependencies are satisfied and the build succeeds."
One can only imagine the horrified, slack-jawed look on Rich's face while his brain struggled to process what he'd just heard. Questionable build processes on other MegaCorp projects were somewhat expected, but nothing remotely as bizarre as this.
Finally, after a few moments, his brain became capable of articulating speech again.
"You do realize that this is the definition of insanity, right? Repeating the same thing over and over and expecting a different result?"
"Yes, but eventually it does have a different result."
"How long does a successful build ultimately take?"
"About 17 minutes on this machine. On their machines, it's about an hour. But you only have to run it when you first pull the code, and after major changes."
"I assume they have plans to fix this, right?" Even before the words left his mouth, Rich was pretty sure that he already knew the answer.
"Yeah, whenever they have copious amounts of free time."
"So in other words, never."
"Pretty much."
Rich shook his head and returned to his office to resume working on his portion of the project, which thankfully had a build process that was actually sane.
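For the curious, the behavior Ken described (undeclared dependencies, output that is never wiped, and reruns until everything lines up) can be imitated with a small Python sketch. This is only a toy model under stated assumptions: the job names, the dependency table, and the script order are all invented for illustration, and nothing here reflects MegaCorp's actual Jenkins setup. It also assumes a failed job does not stop the jobs after it, and it uses the standard library's graphlib module (Python 3.9 or later) to show what a declared dependency order would look like.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs, each mapped to the artifacts it needs already built.
# None of these names come from the story.
NEEDS = {
    "installer": {"app", "docs"},
    "app": {"corelib"},
    "docs": {"app"},
    "corelib": set(),
}

# The fixed (and unfortunate) order an assumed script runs the jobs in.
SCRIPT_ORDER = ["installer", "app", "docs", "corelib"]


def run_once(order, built):
    """Run every job in `order`; a job fails if a prerequisite is missing.

    `built` persists between runs, mimicking output that is never wiped.
    Returns True only when every job succeeded in this pass.
    """
    ok = True
    for job in order:
        if NEEDS[job] <= built:
            built.add(job)
        else:
            ok = False  # "BUILD FAILED", but earlier output stays around
    return ok


def rerun_until_green(order, max_runs=10):
    """Repeat the whole build until one pass succeeds; return the pass count."""
    built = set()
    for attempt in range(1, max_runs + 1):
        if run_once(order, built):
            return attempt
    raise RuntimeError("build never converged")


def declared_order():
    """With dependencies declared, a topological sort yields a working order."""
    return list(TopologicalSorter(NEEDS).static_order())


if __name__ == "__main__":
    print("passes needed with the script order:", rerun_until_green(SCRIPT_ORDER))
    print("order from declared dependencies:", declared_order())
```

In this toy setup the script order needs three passes before everything is built, while the order derived from the declared dependencies succeeds on the first pass from a clean state. Each failed pass leaves a little more output behind, which is why the same command eventually produces a different result.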
About the Author
Mark Bowytz is a contributor to The Daily WTF. He has more than a decade of IT experience and is currently an analyst for PPG Industries.