It is a mistake to consider the prime characteristic of high-level languages to be that they allow us to express programs merely in their shortest possible form and in terms of letters, words, and mathematical symbols instead of coded numbers. Instead, the language is to provide a framework of abstractions and structures that are appropriately adapted to our mental habits, capabilities, and limitations. – Niklaus Wirth [Ref]
Okay, now that I’ve got the framework for my argument in place, I am ready to start in earnest.
We have reviewed what case sensitivity is, and what the options are. How should we decide which one to use?
I think that case-preserving case-insensitivity is, almost always, the optimal behaviour. This may be somewhat controversial, so let me explain my position.
Addressing the Arguments Against
I’ll start by addressing the reasons against case-insensitivity – that is, for case-sensitivity.
Simplicity of Development
One argument for case-sensitivity, in internationalised software, is that it neatly side-steps all the clumsiness and complexity of dealing with case issues in multiple character sets and locales. It makes the programmer’s job a lot easier, because the software never has to find the corresponding upper- (or lower-) case character for the user’s input.
It is also slightly more efficient, with less effort (both computational and development) spent on processing the characters.
I can understand this argument, and I have a certain amount of sympathy for the programmer who is trying to support internationalisation of their software – I’ve been there myself.
However, this argument, to me, sounds too similar to the “elegance and efficiency” argument about Reverse Polish Notation that I described before. Making things slightly easier for the programmer, at a cost of ignoring the capabilities of the user, is a short-term solution.
Perhaps during the initial development of Unix in the 1970s, the extra effort involved in finding a canonical form of a filename – even in ASCII, rather than Unicode – was too slow, but this is no longer an adequate argument.
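For a sense of how little is involved today, here is a minimal sketch in Python of one way to reduce a name to a canonical form for comparison. The function name canonical_name is hypothetical, and the sketch assumes that Unicode normalisation plus locale-independent case folding is good enough; it makes no attempt at locale-specific rules such as the Turkish dotless i.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Return a comparison key for a name (hypothetical helper, not a standard API)."""
    # Decompose accented characters so that composed and decomposed
    # spellings of the same letter compare equal.
    decomposed = unicodedata.normalize("NFD", name)
    # casefold() is more aggressive than lower(): it maps "ß" to "ss", for example.
    folded = decomposed.casefold()
    # Folding can produce text that is no longer normalised, so normalise again.
    return unicodedata.normalize("NFD", folded)


print(canonical_name("README.TXT") == canonical_name("ReadMe.txt"))  # True
print(canonical_name("STRASSE") == canonical_name("straße"))         # True
```

The original spelling is never altered here; the folded form is only a key used for comparison.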
Conciseness of Identifiers
Another argument for case-sensitivity is that it makes it possible to use the same word as two different identifiers. A common example is to let an instance be named after a class, but with different case.
class Foo foo("bar");
While I will admit to having often adopted such a coding style in C++, I find it very hard to defend this practice.
We have seen before that similar-sounding variable names in the same scope are deplorable, and the opportunity for confusion here is enormous. The gap between the concepts of an instance and the class is greater than a simple case-shift of the initial letter might indicate. The reason for the existence of the instance should be included in the identifier.
Yes, that may mean typing in a few more characters per identifier. Some programmers seem to find that abhorrently inefficient. I don’t share such views. The few extra characters aid the readability of the code, and that is worth the time taken to type them. Wirth’s quote describes my position well.
Addressing the Arguments For
Wirth asks for software to be adapted to our “mental habits, capabilities, and limitations”. When we examine these properties, as they relate to case, we can see that English-speaking humans treat upper- and lower-case as both very similar and yet slightly different. The similarity suggests that software cannot afford to be case-sensitive. The difference suggests that failing to preserve case will hinder understanding.
Upper- and Lower-Case are the same…
The capabilities and limitations of English-speaking humans include getting easily confused between two strings that are very similar to our language-processing brains – even where they are clearly different to a character-encoding-processing machine.
Suppose I declare that “KEANU REEVES interferes with elephants.” Can I claim that I was not libelling the wooden Hollywood actor, purely because I spelt his name in all-caps? Can I claim to a judge that KEANU REEVES was an undeclared identifier and therefore the entire statement was semantically meaningless? Of course not. The English language is flexible enough to recognise that “KEANU” and “Keanu” are the same name. Even mail addressed to “KeAnU rEeVeS” will be delivered to the correct person.
If a computer can resolve this just as accurately, it should do so. If the software fails to adapt to the similarity of upper- and lower-case, it leads to frustration.
For me, an example of this frustration appears in both Python and PHP. Each of them has the same killer combination: they are case-sensitive with identifiers, but they are scripting languages that do not resolve identifiers at parse-time. I consistently fall for the same traps. A distressingly large percentage of my debugging time is spent correcting mistyped identifiers – often not detected until several minutes into a test run. The most common mistyping I make is incorrect capitalisation. Of those, the two most common capitalisation errors I make are: HOlding DOwn THe SHift KEy TOo LOng, and being inconsistent in CamelCasing the term “fileName” (I never did resolve satisfactorily whether it was one word or two!).
I do not feel that the punishment I receive for this type of error fits the crime. The language should be case-insensitive and forgive me these transgressions.
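To make the trap concrete, here is a small Python sketch of the kind of mistake I mean. The function save_report and its contents are invented purely for illustration. The mis-capitalised name sits in a branch that is rarely taken, so nothing complains until that line actually executes.

```python
def save_report(fileName: str) -> str:
    """Return the name to save under (hypothetical example function)."""
    if not fileName:
        # Bug: "filename" should be "fileName". Python only looks the name
        # up when this line runs, so the error stays hidden until then.
        return "untitled-" + filename
    return fileName


# The common path works, so the mistake goes unnoticed.
print(save_report("report.txt"))

# Only the rare path reveals it, and only at run time.
try:
    save_report("")
except NameError as err:
    print("caught at run time:", err)
```

A case-insensitive language would have treated both spellings as the same identifier, and a language that resolved names up front would at least have reported the problem before the test run began.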
Upper- and Lower-Case are different…
While upper- and lower-case versions of text may be considered similar, capitalisation in English is not completely irrelevant.
Capitalisation serves a useful purpose in clarifying proper nouns, acronyms and initialisms. Where identifiers are precluded from containing spaces, capitals also provide hints to word-breaks.
Smashing case, whether by technology (e.g. FAT16) or by convention (e.g. Unix programmers generally eschewing capital letters – and most vowels!), unnecessarily hinders the ability to produce easyToRead identifiers.
For the large number of computer users who speak English or any of the other Germanic, Italic or other minuscule-supporting language families, case is an integral part of the way they think. The difference between ‘A’ and ‘a’ is minuscule (pun intended!) compared to the difference between ‘A’ and ‘B’.
Ideally, our software should be adapted to this mind-set. Case-preserving, case-insensitive software is the best way to do this.
Even though case-transformation is not trivial, it is still easy enough. Computers, long ago, became powerful enough to perform these operations practically for free – in terms of run-time cost and development cost.
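As a rough illustration of how little code this takes, here is a minimal Python sketch of a case-preserving, case-insensitive lookup table. The class name CaseInsensitiveDict is hypothetical, and keeping the spelling from the first assignment is just one possible design choice. It assumes that str.casefold() is an acceptable notion of "same letters, different case".

```python
class CaseInsensitiveDict:
    """Hypothetical mapping: lookups ignore case, stored keys keep their spelling."""

    def __init__(self):
        # Maps folded key -> (key as first written, value).
        self._items = {}

    def __setitem__(self, key, value):
        folded = key.casefold()
        # Preserve the spelling used the first time; only the value is updated.
        original = self._items.get(folded, (key, None))[0]
        self._items[folded] = (original, value)

    def __getitem__(self, key):
        return self._items[key.casefold()][1]

    def __contains__(self, key):
        return key.casefold() in self._items

    def keys(self):
        # Report keys exactly as the user first typed them.
        return [original for original, _ in self._items.values()]


names = CaseInsensitiveDict()
names["fileName"] = "report.txt"
names["FILENAME"] = "summary.txt"  # same entry, different capitalisation
print(names["filename"])  # summary.txt
print(names.keys())       # ['fileName']
```

Insensitivity comes from comparing folded keys; preservation comes from storing the user's own spelling alongside the value.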
There is no longer any excuse for making humans learn and handle the quirks of the way computers store upper- and lower-case characters. Instead, software should handle the quirks of human language.
It is time for integration of the cases! Case-Preserving Case-Insensitivity: equal and yet different!