Wednesday, October 24, 2007

Glosses on Linus Torvalds' Coding Style

A gloss is a marginal note (see the picture in the Wikipedia entry for an example). While one can't very well annotate someone else's web page, I will add my voice to the chorus of those who have commented on Linus Torvalds' coding style document.

I strongly agree with almost everything said in this document, and have been glad for the "backup" when I've had to argue a point with some of my fellow workers. There are, however, a couple of items where I diverge.

The first point that I would gently argue is that 8-space tabs are too deep. I haven't "been looking at [my] screen for 20 straight hours" for a couple of years, but at the very outset I worked with 4-space tabs, and I find them very readable. I haven't worked with anyone who has strongly defended 8-space tabs. I will admit that I have had to try to persuade people to switch from 2-space to 4-space tabs, but that was back in the Apple ][ Basic days when people were trying to keep their programs as small as possible (remember cassettes?). I don't know of any readability studies on this topic, and would be interested to see them.

My second argument (or shall I say "disagreement" to be less disagreeable...) is with his distaste for mixed-case names. I find mixed-case easier to read (maybe it's just me), but, more importantly, I find it a lot easier to type than underscores. I know I'm not the best touch-typist in the world, but I always find myself looking down to make sure that my ring-finger is indeed hitting the dash/underscore key and not one of its neighbors. Using camelCase keeps my fingers on the home row much more, makes my typing faster and reduces the number of errors.

These are just friendly disagreements, however, and don't diminish the fact that I'm glad this document exists, and is available to us all.

Saturday, October 20, 2007

Pythonics Anonymous

I think I need a twelve-step program to wean myself from Python.

When I first looked at Python a few years ago I was put off by the fact that white space was significant. I also thought that having to write "self." to indicate object fields (vs. global variables or parameters) was just noise. I got tired, however, of always looking up Perl syntax, and I really didn't care for Perl's TIMTOWTDI philosophy ("There's more than one way to do it"); I prefer one good way to do things, not many. I then came across Eric Raymond's article and decided to learn and use Python.

Now, after a few years of working with the language I appreciate the fact that white space is significant (no more bracket-placement wars!). I also appreciate the cross-platform-ability and the fact that (even though it is open-source) it isn't GPL, which has allowed us to use it at a company I used to work for. I like having the various supporting packages (although the minidom has a problem when pretty-printing XML that I had to work around).

However, I now find myself thinking that I'd better go back to Java. I was put off Java because the reality took a really long time to catch up with the hype. Remember "Write once, run anywhere"? (Which was really "Write once, debug everywhere.") It also seemed too heavyweight to me. I liked being able to run the Python interpreter in command line mode and experiment without having to fire up an editor, create files, save them and run them. It was also satisfying to be able to write a one-line "Hello World".

It seems now, however, that the world (at least the world of enterprise software) has decided that It's A Java World. I guess I'd better get back on board. I know I'll fall off the wagon occasionally, but it's time to put Python back in the corner and get back to business.

Tuesday, October 9, 2007

Data Always Goes In A Database, Right?

Not necessarily.

I have worked on a number of software products where data (usually metadata) is stored in text or binary files. The code opens the file, reads from the file and parses the text (if necessary) into the structures used internally by the code. Over the years, the question has been asked more than once: why don't you put this data into a database?

There are a number of reasons not to use a database. First, all the data needs to be in memory for performance, and there is no advantage to keeping parts of it out on the disk. Second, the data is usually not appropriate for being "table-ized." That is, the data is complex and tree-structured rather than row-structured. I know that one can always flatten data into multiple tables with the appropriate keys and joins, but in my experience parsers built with tools such as Yacc and Lex will generally be faster than doing multiple database joins and reads.
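To make the "table-izing" point concrete, here is a minimal sketch in Python of what flattening a small tree into rows looks like, and what has to happen to get the tree back. The `tree` data, the `(id, parent_id, name)` row layout and the function names are all made up for illustration; they are not taken from any of the products mentioned above, and the sketch says nothing about relative speed.

```python
# Hypothetical tree-structured metadata (illustrative only).
tree = {
    "name": "report",
    "children": [
        {"name": "header", "children": []},
        {"name": "body", "children": [
            {"name": "table", "children": []},
        ]},
    ],
}


def flatten(node, rows=None, parent_id=None):
    """Turn a nested node into (id, parent_id, name) rows, depth first."""
    if rows is None:
        rows = []
    node_id = len(rows) + 1  # next sequential key, standing in for a primary key
    rows.append((node_id, parent_id, node["name"]))
    for child in node.get("children", []):
        flatten(child, rows, node_id)
    return rows


def rebuild(rows):
    """Reassemble the tree from rows, following each parent_id link."""
    nodes = {node_id: {"name": name, "children": []} for node_id, _, name in rows}
    root = None
    for node_id, parent_id, _ in rows:
        if parent_id is None:
            root = nodes[node_id]
        else:
            nodes[parent_id]["children"].append(nodes[node_id])
    return root


rows = flatten(tree)
restored = rebuild(rows)
```

Every level of nesting becomes another parent/child link to follow on the way back in, which is the bookkeeping a file parser does implicitly while it reads.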

A third reason is that upgrading the data in the database requires scripts to add or modify tables or columns (deletions are usually avoided, to reduce the potential for referential integrity problems). Parsers can recognize a version number and do an upgrade-on-read, which doesn't require another program to run to perform the upgrade.
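Upgrade-on-read might look something like the following Python sketch. It assumes, purely for illustration, that the file is JSON with a top-level "version" field and that version 2 split a single "name" field into two; the field names, `load_metadata` and `upgrade_v1_to_v2` are hypothetical, and a real product would more likely have its own grammar and parser.

```python
import json

CURRENT_VERSION = 2


def upgrade_v1_to_v2(data):
    """Hypothetical change: version 2 splits "name" into first and last name."""
    upgraded = dict(data)
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["version"] = 2
    return upgraded


# One upgrade step per old version, keyed by the version it upgrades from.
UPGRADERS = {1: upgrade_v1_to_v2}


def load_metadata(text):
    """Parse the file contents and bring old versions up to date while reading."""
    data = json.loads(text)
    version = data.get("version", 1)  # assume files without a version are version 1
    if version > CURRENT_VERSION:
        raise ValueError("file was written by a newer version: %r" % version)
    while version < CURRENT_VERSION:
        data = UPGRADERS[version](data)
        version = data["version"]
    return data


old_file = '{"version": 1, "name": "Ada Lovelace"}'
record = load_metadata(old_file)
```

The rest of the code only ever sees the current structure, and the old file on disk is left alone until the program next saves it.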

Fourth, files are easier for end-users to handle. For example, if a user needs to send some of his data to tech support, it's much easier for him to package up one or more files, rather than having to run some database dump utility and send the dump file. There is also no issue with (once having that database dump) the possibility that tech support does not have the particular version of the database software that the user is running.

A fifth reason (for text files at least) is that, in a pinch, the file can be opened and edited in a text editor if there are problems with it. With a database you have to run the appropriate database client software, then tease out the table relationships to understand where changes need to be made.

When all you know how to use is a hammer, everything looks like a nail.