Friday, August 10, 2007

Buffer Overflows Are So Twentieth Century

At a company I once worked for, the engineering team was able to agree on a set of standards that met the criteria outlined in my previous post. These standards were short documents, usually one or two pages each. As this was at least Year 3 BW (Before Web), the documents were stored in Lotus Notes.

One of the standards that we quickly agreed on was titled "No unsafe string functions." That is, we banned (with very occasional exceptions) the use of C-language functions such as strcpy, strcat and sprintf, which can cause buffer overflows. We wrote a series of small, safe replacement functions that made sure buffer overflows could not happen. These functions all took a maximum buffer length as well as the target buffer pointer, so they never wrote past the end of the buffer. We didn't want to convert to std::string because there had been a number of cases where engineers who didn't understand its usage had caused very serious performance problems.
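To make the idea concrete, here is a minimal sketch of what such wrappers might look like. The names safe_strcpy and safe_strcat, the argument order, and the return convention are my own assumptions for illustration; they are not the functions we actually wrote, which I no longer have.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for strcpy. dst_size is the total size of
   the dst buffer in bytes, including room for the terminating NUL.
   The result is always NUL-terminated when dst_size > 0.
   Returns 0 if src fit completely, -1 if it was truncated or the
   arguments were invalid. */
int safe_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t src_len;

    if (dst == NULL || dst_size == 0)
        return -1;
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }

    src_len = strlen(src);
    if (src_len >= dst_size) {
        /* Not enough room: copy what fits and terminate. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }

    memcpy(dst, src, src_len + 1);  /* +1 copies the NUL as well */
    return 0;
}

/* Hypothetical replacement for strcat, with the same conventions.
   dst_size is the size of the whole buffer, not the space remaining. */
int safe_strcat(char *dst, size_t dst_size, const char *src)
{
    size_t used = 0;

    if (dst == NULL || dst_size == 0)
        return -1;

    /* Find the end of the existing string without reading past
       the end of the buffer. */
    while (used < dst_size && dst[used] != '\0')
        used++;
    if (used == dst_size)
        return -1;  /* dst was not NUL-terminated; leave it alone */

    return safe_strcpy(dst + used, dst_size - used, src);
}
```

A caller passes sizeof on the destination array (or the allocated size) as the second argument and checks the return value. Truncating and reporting an error is only one possible policy; a team could just as reasonably choose to refuse the copy entirely.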

I wrote a Bourne shell script to search the appropriate source files for all occurrences of the unsafe functions, and ran it once a week or so. Once in a while I would find an incorrect usage that had crept in, but for the most part the engineers understood the need for such a standard and adhered to it.

A few years later the company was acquired and a new group of engineers was brought on board. We started over again with the standards process, but something strange happened with this particular standard. This new group could not accept it. I never fully understood why. I rewrote the standard a number of times and had a long email correspondence with the leader of the group (for whom I have great respect). However, those of us on the original team were unable to convince the newer members of the need for this standard.

A year or so later, one of the application engineers who used our product came to me and reported a crash. The product that I worked on was a server product, and crashes were taken seriously. I analyzed the crash and found it was caused by the use of one of those unsafe functions. I fixed the problem. Luckily the crash did not happen at a customer site, so we were able to fix it before that particular application went live.

I then rewrote the standard (yet again) and sent a pleading email to the leader of the group, asking if they could now accept it.

They could, and did.

And now they understand.