OddThinking

A blog for odd things and odd thoughts.

The Case for Case-Preserving, Case-Insensitivity


It is a mistake to consider the prime characteristic of high-level languages to be that they allow us to express programs merely in their shortest possible form and in terms of letters, words, and mathematical symbols instead of coded numbers. Instead, the language is to provide a framework of abstractions and structures that are appropriately adapted to our mental habits, capabilities, and limitations. – Niklaus Wirth [Ref]

Introduction

Okay, now that I’ve got the framework for my argument in place, I am ready to start in earnest.

We have reviewed what case sensitivity is, and what the options are. How should we decide which one to use?

I think that case-preserving, case-insensitivity is, almost always, the optimal behaviour. This may be somewhat controversial, so let me explain my position.

Addressing the Arguments Against

I’ll start by addressing the reasons against case-insensitivity – that is, for case-sensitivity.

Simplicity of Development

One argument for case-sensitivity, in internationalised software, is that it neatly side-steps all the clumsiness and complexity of dealing with case issues in multiple character sets and locales. It makes the programmer’s job a lot easier by not trying to find the corresponding upper- (or lower-) case character from the user’s input.

It is also slightly more efficient, with less effort (both computational and development) spent on processing the characters.

I can understand this argument, and I have a certain amount of sympathy for the programmer who is trying to support internationalisation of their software – I’ve been there myself.

However, this argument, to me, sounds too similar to the “elegance and efficiency” argument about Reverse Polish Notation that I described before. Making things slightly easier for the programmer, at a cost of ignoring the capabilities of the user, is a short-term solution.

Perhaps during the initial development of Unix in the 1970s, the extra effort involved in finding a canonical form of a filename – even in ASCII, rather than Unicode – was too slow, but this is no longer an adequate argument.
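
As a rough illustration of how little work such a canonical form takes today, here is a minimal Python sketch. The helper names canonical_name and same_name are made up for this example, and real filesystems each apply their own folding rules, so treat it as a sketch rather than a description of any particular system.

```python
import unicodedata


def canonical_name(name):
    """Return a key that ignores case and Unicode composition differences.

    Hypothetical helper: normalise, case-fold, then normalise again, which is
    the pattern suggested in the Python Unicode HOWTO for caseless matching.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def same_name(a, b):
    """Case-insensitive equality built on the canonical form."""
    return canonical_name(a) == canonical_name(b)


print(same_name("README.TXT", "ReadMe.txt"))
print(same_name("Straße", "STRASSE"))
```

The original spelling is never modified; only the comparison key is folded, which is what leaves room for case preservation.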

Conciseness of Identifiers

Another argument for case-sensitivity is that it makes it possible to use the same word as two different identifiers. A common example is to let an instance be named after a class, but with different case.

e.g. class Foo foo("bar");

While I will admit to having often adopted such a coding style in C++, I find it very hard to defend this practice.

We have seen before that similar sounding variable names in the same scope are deplorable, and the opportunity for confusion here is enormous. The gap between the concepts of an instance and the class is greater than a simple case-shift of the initial letter might indicate. The reason for the existence of the instance should be included in the identifier.

Yes, that may mean typing in a few more characters per identifier. Some programmers seem to find that abhorrently inefficient. I don’t share such views. The time taken to add a few more characters aids the readability of the code. Wirth’s quote describes my position well.

Addressing the Arguments For

Wirth asks for software to be adapted to our “mental habits, capabilities, and limitations”. When we examine these properties, as they relate to case, we can see that English-speaking humans treat upper- and lower-case as both very similar, and yet slightly different. This similarity suggests that the software cannot afford to be case-sensitive. This difference suggests that non-case preservation will hinder understanding.

Upper- and Lower-Case are the same…

The capabilities and limitations of English speaking humans include getting easily confused between two strings that are very similar to our language-processing brains – even where they are clearly different to a character-encoding-processing machine.

Suppose I declare that “KEANU REEVES interferes with elephants.” Can I claim that I was not libelling the wooden Hollywood actor, purely because I spelt his name in all-caps? Can I claim to a judge that KEANU REEVES was an undeclared identifier and therefore the entire statement was semantically meaningless? Of course not. The English language is flexible enough to recognise that “KEANU” and “Keanu” are the same name. Even mail addressed to “KeAnU rEeVeS” will be delivered to the correct person.

If a computer can also disambiguate this accurately, it should do so too. If the software fails to adapt to the similarity of upper- and lower-case, it leads to frustration.

For me, an example of this frustration appears in both Python and PHP. Each of them has the same killer combination: they are case-sensitive with identifiers, but they are scripting languages that do not resolve identifiers at parse-time. I consistently fall for the same traps. A distressingly large percentage of my debugging time is spent correcting mistyped identifiers – often not detected until several minutes into a test run. The most common mistyping I make is incorrect capitalisation. Of those, the two most common capitalisation errors I make are: HOlding DOwn THe SHift KEy TOo LOng, and being inconsistent in CamelCasing the term “fileName” (I never did resolve satisfactorily whether it was one word or two!)

I do not feel that the punishment I receive for this type of error fits the crime. The language should be case-insensitive and forgive me these transgressions.
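
The trap can be reproduced in a few lines of Python. This is a contrived sketch (the function report and its parameter are invented for illustration), but it shows the mechanism: the mis-capitalised name is accepted when the function is defined and only fails when the line finally runs.

```python
def report(file_name):
    # Deliberate slip: 'fileName' differs from the parameter only in case
    # and in the underscore, yet Python treats it as an unrelated name.
    return "processing " + fileName


# Defining report() raised nothing. The name is looked up only when the
# offending line executes, which may be minutes into a test run.
try:
    report("data.txt")
except NameError as error:
    message = str(error)
    print("Detected only at run time:", message)
```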

Upper- and Lower-Case are different…

While upper- and lower-case versions of text may be considered similar, capitalisation in English is not completely irrelevant.

Capitalisation serves a useful purpose in clarifying proper nouns, acronyms and initialisms. Where identifiers are precluded from containing spaces, capitals also provide hints to word-breaks.

Smashing case, whether by technology (e.g. FAT16) or by convention (e.g. Unix programmers generally eschewing capital letters – and most vowels!) unnecessarily hinders the ability to produce easyToRead identifiers.

Summing Up

For the large number of computer users who speak English or any of the other Germanic, Italic or other minuscule-supporting language families, case is an integral part of the way they think. The difference between ‘A’ and ‘a’ is minuscule (pun intended!) compared to the difference between ‘A’ and ‘B’.

Ideally, our software should be adapted to this mind-set. Case-preserving, case-insensitive software is the best way to do this.
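
To make the term concrete, here is a small Python sketch of a case-preserving, case-insensitive table. The class name CasePreservingDict is invented for this example, and keeping the spelling of the first insertion is just one possible policy.

```python
class CasePreservingDict:
    """Looks keys up case-insensitively but remembers the original spelling."""

    def __init__(self):
        # Maps folded key -> (spelling as first stored, value).
        self._items = {}

    def __setitem__(self, key, value):
        folded = key.casefold()
        # Keep the spelling from the first insertion; only the value changes.
        original = self._items.get(folded, (key, None))[0]
        self._items[folded] = (original, value)

    def __getitem__(self, key):
        return self._items[key.casefold()][1]

    def __contains__(self, key):
        return key.casefold() in self._items

    def keys(self):
        return [original for original, _ in self._items.values()]


files = CasePreservingDict()
files["ReadMe.txt"] = "first draft"
files["README.TXT"] = "second draft"   # same entry, different typing
print(files["readme.txt"], files.keys())
```

Any capitalisation finds the entry, yet listing the keys still shows the name the way it was originally typed.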

Even though case-transformation is not trivial, it is still easy enough. Computers, long ago, became powerful enough to perform these operations practically for free – in terms of run-time cost and development cost.

There is no longer any excuse for making humans learn and handle the quirks of the way computers store upper- and lower-case characters. Instead, software should handle the quirks of human language.

It is time for integration of the cases! Case-Preserving Case-Insensitivity: equal and yet different!


Comments

  1. Good show! The core of your argument, for me anyway, lies in these quotes:

    If a computer can also disambiguate this accurately, it should do so too. …

    Instead, software should handle the quirks of human language. …

    There is no longer any excuse for making humans learn and handle the quirks of … computers.

    If the role of the computer is to do our bidding, and to make our lives easier, this is indeed the direction in which they must go. Remind me to rant later about why we are slaves to the machine, and why computational linguistics is fundamental to this changing.

    In the specific case of software development, we are pandering to the compiler’s syntax and semantics, despite high-level languages being explicitly aimed at pandering to our syntax and semantics.

    The ideal analysis is a “linguistic” one (where the “language” is still code-like, not free text) that performs tasks such as part-of-speech tagging (noun=variables/classes, verb=method, etc), entity classification (differentiate between objects, classes, literals), and coreference resolution (when two potentially different tokens, eg. two different capitalisations, actually refer to the same thing). These also give you rapid prototyping and code-writing assistance: by identifying as-yet-unwritten classes, etc.

    There has been some work related to capitalisation in computational linguistics: on restoring case to uncapitalised or noisy-cased data; on identifying different orthographic versions (case differences, abbreviations, etc) of named entities (people, places, etc)… most of it can be done accurately, too.

  2. Pfft. You’d expect someone called “Casey” to support such an argument. Come clean – you’re just a sock puppet Julian’s been developing at the same time as his argument framework aren’t you ;)

    I’m curious about the etymology (if that’s the right word to use) and history of case sensitivity. Did it originate with the first computers to use ASCII because they couldn’t afford the resources to fold case? Was the shift key hard enough to use for novice typists (like language designers) that it needed to have a great deal of significance?

    I find it interesting that people talk about “the temp variable” and not “the variable temp.” We don’t refer to you as “The Julian blogger,” for instance. I also think that if language designers were primarily concerned with providing an English-human-friendly language (adapted to our mental habits) then they would have enforced capitalization of identifiers of all identifying nouns (methods, functions, variables) so that you would have to refer to “the variable Temp.” I’m surprised that I can’t find an example of this in computer languages.

    Possibly it’s because language designers didn’t have your case handling manifesto to draw upon.

  3. You make a compelling ..err… case.

    I think I would say that case insensitive/preserving is more important in a filesystem than it is in, say, a programming language.

    As far as programming languages go, besides case insensitivity I think there are other, more productive ways in which languages can be designed to make them better support their human overlords.

    The use of English-only keywords must form something of a speedbump in the path to comprehension by non-native English speakers. English speakers have a leg-up on understanding “while x do y”, but I’m sure others don’t.

    Solving this problem sounds tricky though, and in the meantime case insensitivity would be a good step forward.


  4. Even mail addressed to “KeAnU rEeVeS” will be delivered to the correct person.

    Unfortunately.

    A distressingly large percentage of my debugging time is spent correcting mistyped identifiers – often not detected until several minutes into a test run. The most common mistyping I make is incorrect capitalisation.

    Perl has use strict;, which will complain about the use of any undeclared variables at compile time. Arguably, other types of identifiers are still only runtime-checked, but I can’t remember the last time I had to fix a misspelt function call.

    Smashing case […] unnecessarily hinders the ability to produce easyToRead identifiers.

    Yuck. YouDon’tWriteLikeThis. You write like this. This_is_easy_to_read, thisIsAnythingBut.

    And if I chose to write CONSTANTS_UPPERCASE, I meant to signify something by the all-uppercase which would be lost if the programming language would just as soon accept constants_uppercase to mean the same thing.

  5. Aristotle,

    Perl has use strict;, which will complain about the use of any undeclared variables.

    The use of static versus dynamic typing is another controversial topic, which I also have some opinions about, but I won’t get distracted here.

    I can’t remember the last time I had to fix a misspelt
    function call. [...] This_is_easy_to_read

    I have followed both coding standards – camelCase and underscore. The conventions seem to follow language-boundaries, so I try to adapt as I cross over. It is not always easy.

    In fact, it is funny you should mention that, because the last time I corrected a function name, it was because I had just moved from Python (local convention is to shove lower-case words together into function names) to PHP (local convention is to separate lower-case words with underscores) and I slipped up.

    I’d still have the same dilemma between filename and file_name!

    [...] I meant to signify something by the all-uppercase which would be lost [...]

    You haven’t lost the ability to signify this: you can still type in all upper-case (case-preserving). Just as I still would spell your name Aristotle, even though you would probably answer to ARIsTOTLE.


  6. You haven’t lost the ability to signify this: you can still type in all upper-case (case-preserving).

    Sure, and it’s just a whim at that point.

    Maybe it helps to know that I originally came from such a case-preserving, case-insensitive background: Pascal in MS-DOS, and later Windows. I spent 6 years living and breathing Pascal, earned my OOP chops in Turbo Pascal, wrote my first GUI apps in it, did low-level graphics demos in Pascal. Pascal, Pascal, Pascal. Filenames weren’t case-preserving, but they sure were case-insensitive. So was COMMAND.COM. In the world I grew up in, case didn’t matter.

    When I first came into the world of C and Perl, the idea of case sensitivity was bizarre; I would have argued at the time that it was stupid for many of the same reasons you line up. But thinking back to my habits of then, I can’t see why I thought it was better, other than for permitting some sloppiness. Inconsistency ruled the day, and I spent many an hour (in aggregate) fiddling with the casing of things, writing things this way today, then another tomorrow, only to decide yet differently the day after.

    All pointless. I didn’t gain anything from any of that.

    Then I trained myself to the Unix convention – because in this annoying, inconvenient new world of case-sensitivity, I had to make up my mind –, and I never wasted another second on it since.

    I am unconvinced about your proposition because I actually have experience with living it day-in day-out for years.

    I’d still have the same dilemma between filename and file_name!

    I suppose this would solve itself if you could write filename or fileName or FileName or FILENAME and have it mean the same thing regardless, but is that really a foundation on which to build an argument in favour of case-insensitivity? You’ll only end up deciding to write it a new way every day, if you can; trust me.

    Personally, I’ve standardised on “filename,” which is the easiest one to remember, and comes with the pleasant bonus of requiring the fewest keystrokes of any alternative, thereby also appealing to my laziness.

    In any case, it’s simple to show that a) ample spacing is important for readability b) casing is not. That would make underscore-spacing objectively better than CamelCasing; and if casing matters little and you’re underscore-spacing things to make them better readable anyway, you might as well write them all-lowercase so you don’t waste time with silly things like what the “correct casing” is.

    F.ex., whenever I find myself writing in CamelCase because I’m working in a system which has that convention already established, I find myself agonising about the casing of initialisms. get_url looks just fine, whereas GetUrl looks weird – but GetURL is no better, because now there’s a special casing convention. What about acronyms? Is it MakeAjaxPostLink or MakeAJAXPostLink? Or maybe, because that’s really a HTTP verb, it should be MakeAJAXPOSTLink? Ugh, ugly, maybe MakeAjaxPOSTLink? Compare to make_ajax_post_link, which looks just fine and didn’t require any thought on my part.

    So from where I sit, the Unix-ish identifier naming style appears to be better beyond a mere question of style: it’s more readable and makes consistency easy.

    That’s why I use it.

  7. PS.:

    The use of static versus dynamic typing is another controversial topic, which I also have some opinions about, but I won’t get distracted here.

    There’s nothing statically typed about use strict;. All it does is check that you’ve declared your intent to use the identifier name in question within the scope where it was found. This keeps you from spawning a new global variable every time you make a typo, thereby turning typos into bugs (AKA the BASIC Syndrome).

  8. Aristotle,

    Last things first: Sorry about the misunderstanding over strict;. I am clearly not very familiar with Perl.

    Now, I will go off on an apparent tangent before coming back to address some of the quite interesting comments you made…

  9. I wrote that “case-preserving, case-insensitivity is, almost always, the optimal behaviour”. Let me give some examples of exceptions to the rule.

    An obvious exception is string handling libraries. They should offer the choice to the programmer (e.g. offering both case-sensitive equality and a case-insensitive equality operators).

    A similar exception is when you are referring to outside interfaces to case-sensitive systems, like C libraries.

    For language identifiers, I am a big fan of the Dictionary Definition approach; let the declaration of your variable determine the correct case for the variable. All other references can be corrected by the IDE. The argument here is that, while it aids readability to be able to set the case correctly, it doesn’t aid readability to change the case each time you use it. It has all the benefits of being generous to the typist, with all the advantages of meaningful case selection.
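
    A minimal Python sketch of that correction step, under some loud assumptions: the function name apply_declared_case is invented, the declared spellings are passed in by hand rather than parsed from declarations, and a plain regular expression stands in for a real tokenizer (so it would also rewrite matches inside strings and comments).

```python
import re


def apply_declared_case(source, declared):
    """Rewrite every identifier to the spelling used at its declaration.

    Hypothetical helper: 'declared' lists each name exactly as declared.
    """
    canonical = {name.casefold(): name for name in declared}

    def fix(match):
        word = match.group(0)
        # Unknown words (keywords, undeclared names) are left untouched.
        return canonical.get(word.casefold(), word)

    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", fix, source)


typed = "print(FILENAME, filename, other)"
print(apply_declared_case(typed, ["fileName"]))
```

    The typist can be sloppy, but the text on screen always shows the one spelling chosen at the declaration.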

  10. My comment got rather long so I’ve posted it on my website as a blog entry. You can reply below, however.

  11. My ideal solution to your dilemma, Aristotle, about wasting time on prevaricating about variable names, is to use an IDE that supports the Dictionary Definition approach. That way, it is relatively simple to change it in one place, without lots of lost effort.

    Of course, the downside is that I would use the time saved to spend more time prevaricating over word-selection in the variable names! I look forward to seeing some of the features of “refactoring” editors becoming more mainstream in the future.

    You’ll only end up deciding to write it a new way every day, if you can; trust me.

    Ah, but that is happening anyway!

    Personally, I’ve standardised on “filename,” which is the easiest one to remember [...]

    Ah, that’s what I did… until I worked on a class that contained a filename, a pathName, a userName etc. The lower-case “n” stuck out like a sore thumb, and left me prevaricating ever since. (Feel free to consider this with underscores here, it doesn’t help!) I am sure I will resolve this soon!

    That would make underscore-spacing objectively better than CamelCasing

    I agree with you. If I was to think through which I thought was truly better, I would go with underscores.

    However, I think the “be consistent with the conventions” rule outweighs the “write clearer” in this case. If I am working in a framework where everyone has chosen a stupid coding standard, I am better off following it than trying to fight it.

    Next chance I have to set the conventions for a whole new language, I’ll use the underscores, promise!

    I’ll probably modify it slightly, though. I would prefer get_URL or make_AJAX_POST_link, confident that if I type it in lower-case as a short-cut, my IDE will correct it.

    So from where I sit, the Unix-ish identifier naming style appears to be better beyond a mere question of style: it’s more readable and makes consistency easy.

    I think we are on the same side of the fence here.

    You are saying make_ajax_post_url is more readable than makeAjaxPostUrl. I think you are right. I think your solution is a better one in a case-sensitive environment.

    In a case-insensitive environment, I think make_AJAX_POST_link is even more readable.

    Furthermore, without case-sensitivity, consistency is even easier because it doesn’t matter – especially with an IDE that corrects your case automatically.

    The issue of consistency over the use of filename versus file_name still stands. At least languages like Pascal, Perl and Ada leave the opportunity for the IDE to warn you.

  12. Aristotle:

    YouDon’tWriteLikeThis. You write like this. This_is_easy_to_read, thisIsAnythingBut.

    Just for the purpose of comparison, it should be pointed out that AppleScript allows applications (but not scripters AFAIK) to define identifiers that have spaces in them. This enables code to be written that is quite readable:

    set tmp to the path to the temporary folder
    

    Here “path to” is a function, and “temporary folder” is a constant. The “the”s are sprinkled in for readability and have no semantic significance.

    Unfortunately AppleScript suffers from a baffling array of subtle and difficult-to-debug problems, some of which can be attributed to strange parser problems as a result of allowing spaces in identifiers. So I am at this point reluctant to pursue readability (meaning similarity to spoken language) as a goal above all others. Or to put it another way, productivity of a computer language is not measured by the similarity to spoken language.

  13. alastair:

    Unfortunately AppleScript suffers from a baffling array of subtle and difficult-to-debug problems, some of which can be attributed to strange parser problems as a result of allowing spaces in identifiers.

    Yes. Using literal spaces makes it nearly impossible for a human reader to tell where the actual tokens start and end, which is why saner programming languages do not allow them in identifiers.

    I don’t know how this affects the argument that spacing is important though. Which you can achieve with underscores, as has been practiced in nearly every programming language since the dawn of time.

    Julian:

    I would use the time saved to spend more time prevaricating over word-selection in the variable names!

    That I see as a good thing. I spend quite a bit of effort on my identifiers; most are easy, but I always take the time to name them properly, even if a difficult one takes me 5 minutes.

    (One rule I’ve stumbled into by doing this is that if you have a temporary variable which you have a really hard time giving a descriptive name to, it is likely that this variable is either not necessary, or that other stages of the computation should be held in temporary variables instead (i.e. you need more temporaries). In contrast to the case-fiddling of times past, my code has clearly benefitted from this.)

  14. OK, so mostly I’m for Julian’s proposal. It would make it just a little harder to write code that is hard to read, and that’s an excellent outcome. It even resolves an issue that comes up when you’re using classes from several libraries: class use is now clearer. Libraries can and do use classes of the same name, but different case (often they use the exact same name. I’ve seen one program which defined its own System class, which was quite tricky to use in Java at the same time as the built-in java.lang.System [for those who don't know, java.lang is imported by default in all Java programs]). With Julian’s proposal, these clashing classes would have to be fully qualified in order to be used, which greatly improves the readability of the code at the same time.

    My issues with the proposal are minor, but I’m sure you’ll agree would need addressing:
    1) The language the code is written in now matters. Previously, you could just search and replace (even using notepad) any word you didn’t understand in someone else’s source, and it might start to make sense. Now, you need to know what the human language is, so that you know which case-transform rules apply. This can make getting even a rudimentary understanding of someone else’s program that much harder. Hell, even among the native English population, getting capitalization correct can be tough (the filename example rears its head once more). This must be compounded for people who learn English as a second language.

    Consider also what it means for me to write my code in one language, but making calls to code that was written in another. What case transform rules apply?

    A solution to this would require some kind of explicit markup on each sourcefile (or method!) saying the language that was used, and developers to be vigilant for these. <?xml?> tags and their ilk could help here.

    2) Even with these metatags, most development would now require an IDE that is case-magic aware. The compiler can (and should!) generate warnings about mismatched variables, but only an IDE can prompt the user for the appropriate fix, and apply it (otherwise, we’d be arguing for case-smashing). Editors like emacs (which have a great many things, but no on the fly semantic checks) could become redundant in this new era.

    There’s not a lot that can be done about this. The developer could manually correct each and every warning that arises, but lazier/poorer typists will come to need the IDE’s autocorrecting crutch. In fact, there’s some evidence (membership or subscription required, sorry) that all developers will.

    3) This does nothing to eliminate the use of bad or similar names. No aspect of the “no, num and number” example would be addressed by moving to case-insensitivity.

    Using underscores instead of camel case alleviates, but does not resolve, the similar names issue, but it still suffers from transposition errors (e.g. cars_pace vs. car_space. Finding a genuine example in a public API is left as an exercise for the reader).

    Another possible step towards making bad code not compile is banning names that are one letter different (for any transpose, insert, delete, or change). This would make it very difficult to accidentally send all the filenames to the log, when just oneFilename would do (the kind of situation where static type checking doesn’t help), but has the side effect of banning single letter variables!

    4) Generated code should not get this lenient treatment. Language-enforced formatting rules like Haskell’s offside rule can be turned off for a generated source file (unlike Python, as far as I can tell). In contrast, case sensitivity should be turned on when processing generated source. This is because errors like these may point out other, more serious and less carefully thought through, issues in the generator.

  15. Richard,

    Re: 1) Having to know the (natural) language the identifiers use.

    Interesting point. It sounds like another obligation when dealing with case-insensitivity and Unicode.

    It only applies in a small number of cases (editing code in both Turkish and English, for example), and a sensible default would cover the vast majority of languages.

    My gut is telling me that I should pass this off as a natural consequence of internationalising the file. However, I can’t find an example in a case-sensitive language, like Java, where having such a property is necessary. (I wonder, though, how case-insensitive search-and-replace works in the Java IDEs.)

    It seems alien to require this, because we don’t do it in most ASCII-based languages, but HTML has this feature. Python also requires encoding declarations, which are somewhat similar. (I am unfamiliar with any such requirement in Java.)
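
    The Turkish case can be seen directly in Python, whose built-in string methods apply the default Unicode mappings regardless of locale. In the sketch below, turkish_lower is a made-up helper that hard-codes the two Turkish exceptions; it is only an illustration, not a complete locale-aware implementation.

```python
# U+0131 is the Turkish dotless small i; U+0130 is capital I with dot above.
dotless_upper = "\u0131".upper()   # default rules map it to plain ASCII "I"
dotted_lower = "\u0130".lower()    # becomes "i" plus a combining dot (U+0307)


def turkish_lower(text):
    """Hypothetical helper: lower-case using Turkish rules for I and İ."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# The same identifier folds differently under the two sets of rules.
print("ISPARTA".lower())
print(turkish_lower("ISPARTA"))
```

    Two names that match under the default rules can therefore differ under Turkish rules, which is why the natural language of the source would need to be known.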

    Re: 2) The need for IDEs.

    The compiler can (and should!) generate warnings about mismatched variables

    That’s a bit stronger than I would say. I would desire that the compiler offer such an option, but I wouldn’t make it mandatory. Ada worked without it.

    Editors like emacs (which have a great many things, but no on the fly semantic checks) could become redundant in this new era.

    What I am about to say is based on the latest version of Emacs as of 10 years ago, so take it with a grain of salt. Emacs is a great text editor, but its language modes are (were) pretty woeful and are holding us back from what IDEs could really do for us.

    I haven’t used Eclipse myself, but I am hearing that it, plus a language built with parsing in mind, can achieve far higher levels of abstraction than Emacs alone. I plan to nostalge later about the IDE I used many years ago, to give an idea of what I mean here.

    lazier/poorer typists will come to need the IDE’s autocorrecting crutch

    I bet that’s what professional typists said when people started moving from typewriters to word-processors. The difference between case-insensitivity and spell-checking (or Sunny’s complaint about intuitive software) is that I am proposing a scheme that a computer can actually get right easily and quickly.

    Re: 3 – This does nothing to eliminate the use of bad or similar names.

    You are absolutely right. This doesn’t solve the “no, num and number” problem. I didn’t mean to suggest that.

    I used that example to demonstrate that easily-confused variable name combinations should be avoided, and then tried to show that “foo” and “Foo” are an easily-confused variable name combination.

    banning names that are one letter different

    A very interesting idea, but I don’t think I can support it! For example, I often pass around (xoffset, yoffset) tuples. While I think that tiny little difference between the two names is small, and easy to confuse, I can’t imagine coming up with another pair of names that is less confusing.
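
    For concreteness, a rough Python sketch of such a check follows. The function names one_edit_apart and clashing_names are invented here, and the rule is taken literally from the proposal: two names clash if one transpose, insert, delete or change turns one into the other.

```python
from itertools import combinations


def one_edit_apart(a, b):
    """True if a single change, adjacent swap, insert or delete turns a into b."""
    if a == b:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # exactly one letter changed
        # Two adjacent positions holding each other's letters: a transposition.
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Deleting one character from the longer name must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def clashing_names(names):
    """Return every pair of names that the proposed rule would ban."""
    return [(a, b) for a, b in combinations(names, 2) if one_edit_apart(a, b)]


print(clashing_names(["xoffset", "yoffset", "cars_pace", "car_space", "i", "j"]))
```

    Run on these names, the check flags xoffset/yoffset and i/j along with the cars_pace/car_space transposition, which is exactly the objection to the rule.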

    Re: 4 – Generated code

    It depends whether the code is completely generated or is, for example, instrumented code where the original human-written code is found surrounded by generated code.

    As an aside: My ears pricked up when you mentioned that Haskell’s offside rule can be turned off.

    I once had to generate some disgusting Miranda code, and Miranda is very similar to Haskell.

    Initially, I avoided the whole offside rule by making the code one long line and sprinkling in semicolons – they were a little-used statement separator. Eventually, I moved to correctly indented code, which was easier to read and hence debug.

    So, I assumed Python would have a similar little-used statement separator, but I checked and you were right. Python requires you to use indenting, and indenting alone, to determine scope.


  16. I haven’t used Eclipse myself, but I am hearing that it, plus a language built with parsing in mind, can achieve far higher levels of abstraction than Emacs alone. I plan to nostalge later about the IDE I used many years ago, to give an idea of what I mean here.

    Why would Eclipse be able to parse a language designed for parsability, but not Emacs?

  17. You’re right to pick me up on this, Aristotle. There’s no theoretical reason. Again, maybe in the last 10 years there have been huge advances in Emacs, so I am very nervous about making bold statements about it.

    However, the Emacs modes I saw back then were limited to some coarse features of the syntax. They didn’t have any knowledge about the semantics of the language, and could not offer much more than syntax-colouring and pretty-printing (as opposed to refactoring, incremental compilation, identifier completion, debugger integration, etc.)

    If Emacs (now) provides the infrastructure for modes to be more than that, I have unnecessarily dismissed it, and I apologise.

  18. This is getting a little off-topic, but to those who are dissin’ emacs, I refer you to semantic, which is “an infrastructure for parser based text analysis in Emacs. It is a lexer, parser-generator, and parser.”

    Now admittedly not every language mode uses this, but some do, most notably JDEE, the Java mode. Result is that you do get code completion in emacs (which I will assume does count as an “on-the-fly semantic check”).

    I will quite happily admit that emacs is showing its age, and hence requires a lot of care and feeding, but nevertheless it is a quite capable development environment.

    In no way do I think that it is “holding us back”. Frankly I don’t understand how this claim can be made of any one tool. What is stopping us (meaning the software development community at large) from abandoning tools that are found to be too constrictive?

    Which brings me to Richard’s point about requiring an IDE. I don’t get that one either. Why does a CICP language require more than (generic non-semantically-aware editor) can provide? In fact, this claim is pretty easily refuted empirically. Ada is a CICP language and can be edited with notepad.

    And while I agree with the general principle that CICP is a good thing, if the cost of this is to throw away all tools that are not semantically aware (and I’ll include all the unix generic text-file manipulation tools here) then I would happily say ‘no thanks’. At least not without other significant productivity benefits.

  19. Alastair,

    I don’t mean to diss EMACS. Last time I was doing serious development on a Unix box (1995), my chosen development environment was two EMACS windows that spawned from my login script, and no other windows on my desktop at all. (The second EMACS window was purely to read Usenet while I waited for my application to finish running!) So I don’t hate EMACS, honest. Some of my best friends are modeless editors!

    The Semantic module looks exactly like what was missing before. Thanks for bringing this up; I hope it turns out to be popular.

    I also agree that the software development community at large can abandon tools that are too constrictive. You are right, it is not Emacs itself that is holding us back. It is (my inadequately researched claim) that people perceive the (“semantic”-less) Emacs modes as pretty cool that holds us back.

    This is a call-to-arms! We can do better! Let’s either bring our tools and programming languages up to the level I think we should come to expect, or let’s abandon them and choose ones that do support the way we think. Case-insensitivity is one small part of that.

    (I fully understand that the choice of tools is not always driven by usability alone, but the sooner we recognise we are wasting time due to poor usability, the sooner the tool makers will start making versions that meet our needs. I was disappointed that Java was case-sensitive – I thought the lesson had already been learned by language-designers. I want to help ensure that the next big thing isn’t case-sensitive.)

    I think Richard’s comment regarding requiring an IDE is simply a response to my call for Dictionary-based case-smashing. I see his point. Either you accept that you will probably end up with inconsistent case (which will upset Sunny) or you accept that you require an IDE that will smash-case back into a canonical form.

    I think both of those scenarios are better than case-sensitivity.

    I think you will also get other significant productivity benefits of semantically-aware editing tools, but that’s another argument, for a later date.

  20. I AM NOT SO CONVINCED THAT CASE INSENSITIVITY IS GOOD. I CODE IN OBJECTPASCAL QUITE A LOT WHERE YOU’RE FREE TO PRESS OR NOT PRESS SHIFT, BUT I STILL FIND MYSELF BEING VERY CONSISTENT AND FOLLOW THE CONVENTIONS OF THE ENVIRONMENT. IT JUST FEELS SLOPPY TO SAY TClass ONE TIME AND TCLass ANOTHER. WHEN IT COMES TO THE FAT FILE SYSTEM, IT PISSES ME OFF THAT SOMETIMES IT MESSES UP THE CASE OF MY FILES. CASING IS GOOD; WHEN COMPUTERS START TO GUESS WHAT I MEAN THEY WILL ONLY BECOME LESS PREDICTABLE.

  21. Joost,

    I trust you were having a lark when you decided to post in all-caps! :-)

    I understand your comment about sloppiness; I applaud your attempt to remain consistent. Your example is a perfect illustration that, with a CICP language, civility in identifier-names doesn’t suddenly disappear.

    The Pascal-family should be perfect for the next step – your IDE looking at the declaration of an identifier, and making all references to that identifier match the case of the declaration. That way, you can continue to be consistent, without worrying about the accuracy of the shift-key all the time – an issue that you clearly don’t want to be wasting your time with!
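
    As a rough illustration of that kind of declaration-driven case correction, here is a minimal Python sketch. The function name canonicalise_case and the simple regular expression for identifiers are assumptions made for the example; a real IDE would work from the parser’s symbol table and would skip string literals and comments, which this sketch does not.

```python
import re

# Simplified notion of an identifier (an assumption for this sketch).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def canonicalise_case(source, declarations):
    """Rewrite every identifier so its case matches its declaration."""
    # Map the case-folded spelling to the declared (canonical) spelling.
    canonical = {name.casefold(): name for name in declarations}

    def fix(match):
        word = match.group(0)
        # Identifiers that were never declared are left untouched.
        return canonical.get(word.casefold(), word)

    return IDENTIFIER.sub(fix, source)


fixed = canonicalise_case("tclass.Create; TCLASS.Free; other", ["TClass"])
```

    Here fixed holds the source text with both sloppy spellings rewritten to TClass, while the undeclared names are left as typed.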

    What version of FAT is pissing you off? As I explained before, FAT32 doesn’t smash-case like FAT12 and FAT16 do. (Actually, that might not be entirely true – I think it may smash-case in a few cases where the filename happens to follow the 8.3 convention, for historical reasons.)

    I understand your concern that computers are sometimes poor at guessing what you meant, but I point to “case” as a clear example of where computers should have no problem.

  22. Let me make one more point here.

    In an effort to be consistent in a case-sensitive world, many Unix programmers appear to take the approach “leave everything in lower-case”. I have been gently ridiculed several times for using mixed-case in filenames on a Unix system.

    Is this really the outcome we want? Case-sensitivity leading to the abandonment of upper-case?

  23. The last time FAT was bugging me was a FAT32 but admittedly, it was on a Windows 98 box, using a memory stick that travels between lots of versions of Windows and some Linuxen too. It may be a display setting or something (it’s probably an issue with Total Commander, which has an option “show old 8.3 filenames in lowercase [like explorer]”).

    With respect to IDEs remembering the casing of identifiers, I don’t feel that shift is a waste of time, just as proper spelling is not a waste of time when I’m typing into Word – it will flag spelling errors most of the time (and then some!), but if I were to come to rely on it, I would write bad language in this here edit box, for example.

  24. I’ve just noticed that it is pretty common within the Boost C++ library to use Capital C Char as a template argument:

    template <typename Char>
    void aFunction(Char t) {
        // operate on t without worrying if it is a char or wchar
    }
    

    This is obviously an extension of the type/instance naming convention. For some reason though it seems more acceptable than using Capital Letters for a type and lower case for an instance. I can’t explain why. Perhaps because within the template you tend to view the Char as a generalisation of char. (Which is obviously not the case for types/instances)

    So is this just irrational inconsistency on my part? Or is this an acceptable exception to the proposed CICP rule?

  25. How about no single character difference may be permitted except on the first and last character?

    Previous poster used xoffset and yoffset, which cannot be confused, whereas offyset and offxset is more likely for me.
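
    One possible reading of that rule, sketched in Python: two names of the same length clash if they differ in exactly one character and that character is neither the first nor the last. The function name confusable and this exact interpretation are assumptions for illustration only.

```python
def confusable(a, b):
    """True if a and b differ in exactly one interior character."""
    if len(a) != len(b):
        return False
    # Positions at which the two names disagree.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Exactly one difference, and not at the first or last position.
    return len(diffs) == 1 and 0 < diffs[0] < len(a) - 1


allowed = confusable("xoffset", "yoffset")    # differs only in the first character
rejected = confusable("offxset", "offyset")   # differs only in the middle
```

    Under this reading the xoffset/yoffset pair is permitted and the offxset/offyset pair is flagged.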

  26. Alastair,

    I hope you will consider the three-month delay in answering your comment above as evidence that I wanted to give it some careful thought and thorough research, rather than pure laziness.

    The first thing I notice is that “char” and “wchar” violate the common convention to capitalise types. That’s the problem with such conventions: third-party libraries, even the in-built ones, ignore them! With CICP you’d be free to follow the convention confident that your compiler would figure it out (at least until underscores get involved).

    I don’t think I share your acceptance for this convention. I can see that the concepts base-class-of-a-subclass and type-of-an-instance both fit into the broad concept of “generalisation”, but I interpret the capital letter convention to imply type-of-an-instance, not generalisation.

    I would have chosen a longer class name: CharType, AbstractChar, AnyChar, AnyWidthChar, BaseChar, SafeChar. I dunno, there must be some term that conveys the right meaning, but I still haven’t nailed it after three months. :-)

    (I note I won’t get any objections from the foolish “language efficiency is inversely proportional to (the square of?) the length of the source-code” believers, because they won’t have to type in the longer name to use my class.)

  27. This post was quoted at Coding Horror – one of my favouritest blogs.

    I just caught up with the dozens of comments Jeff Atwood received on the issue – no doubt helped by his gentle fanning of the flames by asking people to justify their pro-case-sensitivity claims with evidence.

    I think many of the comments were already addressed here, and Jeff demolished many of the others.

    I think there were two counter-arguments in the comments that deserved highlighting:

    Tim Bray was quoted as claiming a factor of three performance improvement when an early version of XML was changed from case-insensitive to case-sensitive. That’s an astonishing anecdote – I would have expected performance improvements closer to 1%. Unfortunately, it is difficult to confirm that the “monocasing” overheads were really required.

    Shawn Oster chastised us for wasting our passion on a worthless debate. Ouch!

    My suspicion is that there are some personality types that have no difficulty at all in managing to maintain case, and they see the rest of us as “sloppy”. They don’t realise how much harm case-sensitivity causes us.

    Perhaps we should encourage them to consider case-blindness as a minor disability, like colour-blindness. Case-insensitivity then becomes an accessibility issue.

    No-one would argue that there is no need for wheelchair ramps because they personally find the stairs faster. No-one would argue that there is no need to discuss wheelchair ramps because they personally find there to be no significant difference in speed.

    Perhaps one day the same compassion will be applied to the case-blind…

  28. My suspicion is that there are some personality types that have no difficulty at all in managing to maintain case, and they see the rest of us as “sloppy”. They don’t realise how much harm case-sensitivity causes us.

    Then maybe there’s a case to be made for treating this the other way around, retaining case sensitivity in the infrastructure but leaving tools to offer aids in dealing with it appropriately. You are already leaning heavily towards the tool-supported camp, if I remember the discussion correctly (admittedly, it’s been a while and I’ve not bothered to revisit it), no?

    After all, no one is abolishing stairs because some people have to use wheelchairs…

  29. Note that I’m opposed in no small part for a quasi-philosophical reason:

    Most human scripts actually have no notion of case – even Latin did not. The concept originated in Greek, was imported into the Romanic languages, and proceeded to spread to all of the Western world. But it does not exist anywhere else.

    This doesn’t mean no other normalisations exist, of course; other scripts have their own forms – cf. Unicode NFC vs NFKC.

    So do you bake a westernly assumption into a programming language? If not, do you avoid it by being inclusive and mandating other normalisations? If so, which? (After all, while they are normalisations, their semantics differ from monocasing, which you will have to understand and consider.)

    I think the sensible non-parochial approach is to follow Do The Simplest Thing That Could Possibly Work and just punt on any normalisation entirely. Baking assumptions into tools also makes more sense on this level.
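
    To make the difference between these normalisations concrete, here is a small Python sketch using the standard unicodedata module. The helper name same_identifier is hypothetical, and normalising then case-folding is only an approximation of the full Unicode caseless-matching rules.

```python
import unicodedata


def same_identifier(a, b, form="NFKC"):
    """Compare two names after Unicode normalisation and case folding."""
    def fold(s):
        return unicodedata.normalize(form, s).casefold()
    return fold(a) == fold(b)


# Precomposed e-acute versus 'E' followed by a combining acute accent.
accents_match = same_identifier("caf\u00e9", "CAFE\u0301")

# Fullwidth 'A' (U+FF21) is only a compatibility variant of 'A':
# NFKC maps it to plain 'A', NFC leaves it alone.
wide_match_nfkc = same_identifier("\uff21bc", "abc", form="NFKC")
wide_match_nfc = same_identifier("\uff21bc", "abc", form="NFC")

# Case folding is not a one-to-one letter mapping either.
sharp_s_match = same_identifier("Stra\u00dfe", "STRASSE")
```

    The choice of normalisation form changes which names count as “the same”, which is the design decision the comment above suggests leaving to tools.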

  30. The simplest thing that could possibly work for the programmer developing software would be case insensitivity since to get his program running he does not have to debug case sensitive errors.

    The simplest thing that could possibly work for the compiler developer would be to use case sensitivity so that he does not have to implement UPPERCASE() functions around each variable and declaration name.

    Unfortunately there is no such thing as the simplest solution, and Einstein was basically over-generalizing in his quote. There is no such thing as “simple”, otherwise we wouldn’t have things like swimming sperms that bop their heads on eggs and turn into 6 feet monsters at the age of 19. Try to explain that simply. Einstein was not simple either. The simplest solution would be to kill all life so that there is no complexity of life. The simplicity statements and generalizations are essentially useless.

    One can always tell a sloppy programmer from a more organized programmer:

    1. If one works in a case insensitive language and he types consistent code with consistent case, then he is an organized programmer.

    2. If one works in a case sensitive language and he takes advantage of the case sensitivity for having types like Char while having another char type, and having this_var along with this_Var, then he is a sloppy programmer. Possibly a clever programmer, but a sloppy and dangerous programmer that shouldn’t be doing what he is doing.

    The language should be case insensitive, while strongly enforcing case to be preserved. For example, the compiler should give hints that you have declared ThisThong along with ThisThOng. In some fonts, a smaller o looks so close to a bigger O that it is hard to tell. oOoOoO. In some fonts, it is easier to tell. Should software be based around subtle differences in fonts? I think not. Should rockets be buggy because of subtle differences in fonts? I think not. Do developers use different fonts? I think so. Just because the o on your screen is different than O, it doesn’t mean so on his screen.

    Case sensitivity shouldn’t affect the program when it is compiled, so it should be case insensitive, but one should be given a warning for being sloppy. Or the editor should mark it as a spelling mistake. This way, it prevents idiots and clever programmers from doing tricky things like thIs_vAr and tHis_vAr and readme and Readme. Preserving case but keeping the language case insensitive also makes the language safer, in case someone was working on programming a rocket ship and accidentally didn’t get a warning or didn’t get a notification that they have a Thisvar and a ThisVar in their program.
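
    The kind of warning described here could be sketched roughly as follows in Python. The function name case_clashes is an assumption, and a real compiler or editor would take the identifiers from its symbol table rather than from a plain list.

```python
from collections import defaultdict


def case_clashes(identifiers):
    """Group spellings that differ only by case; return the clashing groups."""
    groups = defaultdict(set)
    for name in identifiers:
        # All spellings of one name share the same case-folded key.
        groups[name.casefold()].add(name)
    # A group with more than one spelling deserves a warning.
    return [sorted(spellings) for spellings in groups.values() if len(spellings) > 1]


warnings = case_clashes(["ThisThong", "ThisThOng", "count", "ThisThong"])
```

    Names that are spelled consistently produce no warnings; only the mixed spellings are reported.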

    I am against those tools that “pretty up” a bunch of source code at once (i.e. the tools or IDEs that pretty up code on the fly are okay, but I don’t use them personally as they get in my way).

    Why are the “pretty up” tools that format a bunch of source at once bad? While the programmer is creating his source code, he creates a picture of his code in his mind (for those that have photographic memories). He remembers how his code looks, and where those lines of code are that need some repairing or updating later. With this picture in his mind of his code, he knows where to find his code. The source code pretty tools screw up the picture … because it changes the code. Say you remember that there is this one line of code which you used ALL CAPS COMMENTS in order to remind yourself that it needs refactoring. So then the source code pretty tool goes over your comments and lowercases them for you in order to pretty up your source code. Uh oh.. there goes your big warning that you had to clean up that code… and you don’t even know that the big warning has been lowercased, because the “pretty up” tool did it behind your back!

    You may never know that the pretty up tool went in there and removed your big emergency warning in CAPS because the summary it gives you when it is done might not be descriptive enough, nor will you actually go in and check to see every little change it made for you. Sometimes, the code isn’t changed drastically enough to change the picture enough or change your formatting enough.. but it is dangerous to have a tool reformat your code – it is similar to having someone else forge your signature because you were busy at the time and didn’t have time to write your signature neatly.

    It is better for the programmer to use proper indentation himself while coding, and constantly correct his bad indentations than it is to write sloppy code and allow the source code “pretty up” tool to fix his errors.

    Another thing: I have seen several people abuse the case sensitivity on UNIX, such as making a Readme file along with a README file in the same directory. This is ludicrous.

    In addition to case sensitivity issues, files without extensions, such as README files without a txt extension are also ludicrous. This can cause viruses to spread since a person unfamiliar with computers may click on that README file not knowing it has execute permissions (txt files should not be executed, even if their permissions say they can be executed.. it is a basic security measure, and linux is supposed to be secure.. and if you are going to argue about this one, then I ask why files prefixed with dots are hidden – why not make those visible if linux is so powerful, why hide those files from smart users – of course there should be a way to make them visible to advanced users, but that is beside the point).

    Understand that I am not a Windows advocate nor am I biased, as I mainly use BSD and Linux as I am a systems administrator and web hacker for a living (although I use Windows on the desktop because most companies use it and I have to open Word files, etc.). Many programs that run in Linux are becoming more like Windows programs where one does not immediately see the file permissions, so even a readme.txt file could be executed (operating systems should prevent

    I do enjoy the fact that case sensitivity makes sloppy programmers smarten up.. for example I hate the people that write end and END and End in modern pascal (I’m a modern pascal programmer). However, it should be a compiler warning or a spelling error in the editor. I can easily tell a sloppy programmer from an organized programmer by simply looking at his source code in case insensitive languages.. I can even tell if I have written a sloppy program – which is good, because I like to know if the program I wrote was an older one when I was sloppier, so I can fix it and review it closer. In case sensitive languages, it makes the code appear to be neat – but it doesn’t point out a sloppy programmer, because the sloppy programmer had to neaten up his code anyway in order to get it running (while he could have been sloppy in other areas which are harder to tell, such as overusing pointers when he doesn’t have to, or not allocating memory correctly, etc). At least if I see a SLoppy PRogram with horribly FormatteD slOppy Inconsistent source, I know to check for other sloppiness, such as bad memory allocations.

    p.s. unfortunately the freepascal compiler writers are some of the sloppiest programmers.. they use Begin BEGIN and begin and End and end and END and their indentation is all over the place.. one space, two space, three space, no space.. with sometimes begin indented, other times begin not indented, etc. Some of the worst and sloppiest case code I’ve seen is the freepascal sources. However, the freepascal developers are a very rare case – they are sloppy with regards to case but they are good programmers. Usually, this is not the case though.

    Looking back at some of my really old source files, I can easily find which code is bad code and which code is good – because when I began programming I was sloppy with case, and sloppy with indentation. By the way, if we should be enforcing case, shouldn’t we be enforcing indentation like Python does? I’m against that too because sometimes in rare cases you want to indent things without them being tied to a bondage tab.. i.e. large programs with long parameter lists.. long parameter lists need to be split up into separate lines and sometimes the enforced indentation can ruin special situations in large programs where custom indentation needs to be used.

  31. P.S. one of the reasons underscores are considered dangerous is not because this_is_easy_to_read, but rather because __this is similar to _this and some C programmers abuse __this and _this in the same program. It is very hard to tell that __this is different than _this and it is very _sad that some programmers __use double underscores and single underscores. Very sad. It’s like they are being clever for the sake of being clever, instead of being intelligent. The other problem is that with underscores one could accidentally type this__like_this instead of this_like_this although that is rare (but maybe important when designing rockets, so maybe ADA should ban underscores). It is easier to see ThissMistake than this__mistake because the two esses are easier to spot than the two underscores.

  32. With regards to the underscores, it does depend on the font used too.. but since different developers use different fonts, code should not be designed in a way that subtle font differences could cause a program to be buggy. And unfortunately, a lot of C code I’ve seen does rely on double underscores. In a weakly typed and dynamically typed language like PHP, it is even worse than in C because in PHP one can define new variables on the fly, so double underscores would go unnoticed by the compiler, since there is no compiler.

  33. On the SVN mailing lists, I found the perfect example of sloppy incompetent programmers who use clever underscore tricks where clever underscore tricks should not be used:

    > QUOTE
    > Signify internal variables by two underscores after the prefix. That
    > is, when a symbol must (for technical reasons) reside in the global
    > namespace despite not being part of a published interface, then use
    > two underscores following the module prefix. For example:
    >
    > svn_fs_get_rev_prop () /* Part of published API. */
    > svn_fs__parse_props () /* For internal use only. */
    > END QUOTE

    With some fonts, one would not be able to tell that svn_fs__parse_props is an internal function only, since some fonts don’t easily distinguish two underscores from one underscore. Instead of abusing underscores for these situations, the language should have a private implementation section or the word private or priv should be embedded into the function.

    Most likely, the programmers above are abusing the underscore because their language does not allow private functions, only global or public ones – which leads to spaghetti code and one large global dangerous namespace that uses kludges to become more privatized and modular.. instead of the modularization and privatization features being built into a proper language with units or modules instead.

  34. Lars has written some long comments about case-sensitivity, and I think some of his argument deserves a consideration. On the other hand, the comments do ramble a bit, so let me summarise and respond.

    Lars starts by pointing out that it is hard to make both the compiler writers’ lives and the developers’ lives easy. While I am not touching the Einstein/sperm analogies, I agree, and I am arguing that we should swing towards making the developers’ lives easier over those of the compiler writers.

    Lars then describes a form of Postel’s Law when it comes to using case, which is fine. However, he associates with it a pejorative term: “sloppy programmer”.

    I think life for a programmer is hard enough already, and we should allow flexibility and “sloppiness” where we can get a machine to take care of it. I don’t care if I remember to turn off my headlights when I leave my car, because the cars I drive do it for me. I try to be diligent about checking my blind-spot when turning. Not checking my blindspot would make me a sloppy driver; not switching off my headlights is irrelevant. Developers should have the same luxury of focusing on the important things, and ignoring the trivial computer-manageable issues.

    We no longer consider developers who can’t browse and tweak assembly code to be sloppy. The day is coming (or perhaps it is here?) where we won’t look askance at a professional developer who is unfamiliar with hand-coded memory management. Similarly, we need to cast off the old idea that holding the shift key down in a consistent manner makes you a better programmer.

    Lars attributes bugginess in rockets to font-choice. That seems a bit far-fetched to me.

    He takes an interesting approach to arguing against pretty-printing software: that the original developer has a mental picture of its structure, and that changing its formatting will confuse them. I am not convinced; code is written once and read many times. Preserving weird formatting to maintain short-term consistency seems unappealing to me.

    Lars brings up a strawman against pretty-printing programs that alter the case of comments. I have never seen one do this without an explicit request, and it certainly isn’t the norm. Another false analogy to forging signatures doesn’t help his case. Lars then takes a side-track into security risks of filenames without extensions. This is irrelevant to the case discussion.

    He promotes case-sensitivity as a method of “smartening up” sloppy programmers. I reject the notion, just as I would reject the notion that combing your hair makes you a better programmer. Lars helps me make this point by pointing out that Free Pascal programmers are both good programmers and sloppy with their case and indenting consistency. Lars claims this is rare; I disagree – I just think that the Free Pascal developers would benefit from a pretty-printing tool to hide the unnecessary inconsistency between developers.

    Lars warns about the use of underscore to separate words, because double underscores are hard to distinguish from single ones. I could equally argue that it is difficult to distinguish xyziiiabc from xyziiiiabc, so I am arguing against the letter ‘i’ in variable names. Or perhaps, we could just avoid variables that only differ by indistinguishable changes. Wait, wasn’t that my argument from the beginning?

    To be fair, Lars points out a coding style that uses double underscore to indicate a variable is private (presumably in a language that doesn’t support the concept directly). I agree that is a poor style.

    Lars has unwittingly helped my argument with two trivial (or should I say “sloppy”?) errors in the naming of programming languages: He calls Free Pascal ‘freepascal’ and Ada ‘ADA’. For someone who rails against sloppy programming and the need for case-sensitivity, these should seem like important differences. Fortunately, it was clear what he was referring to, and we can forgive him these trivial linguistic blurrings. I am just asking my compiler to offer the same grace.

  35. The day is coming (or perhaps it is here?) where we won’t look askance at a professional developer who is unfamiliar with hand-coded memory management.

    That’s actually wrong. Garbage collectors cannot and do not relieve you from managing memory. All they do is remove the necessity of assigning responsibility for freeing heap objects; this removes some segfaults, but the rest are merely turned into memory leaks instead. This makes flawed code more robust, but not less flawed.

    Even with a garbage collector, you still have to think about the lifecycles of allocated objects.

  36. Aristotle,

    This smells very much like we are agreeing except that we are using a few ill-defined terms slightly differently. Let me make some statements and you can tell me if we really disagree.

    1) Some resources have a lifetime.

    Files are an obvious example of such a resource. You need to close your files to release the locks so other parts of your application can access them. This is true even with garbage collection.

    Semaphores are another such resource.

    A colleague told me about an early Swing bug where animated images were being stored in an object’s internal collection. They were not being tidied up when a window disappeared, and continued to consume CPU (and memory). So animations are another example of a resource with a lifetime.

    You need to manage these resources, even with garbage collection. I don’t see this as a memory issue. (While I care that a file is closed, I don’t care if the I/O buffer associated with that file is deallocated or not.) I see this as a resource-management problem for the type of resources that have an explicit lifetime with associated semantics.

    (I see a distinction here: The programmer must take care of resource management, but not memory management. However, I am rather tired and fear I may be making it too black and white, when the line is rather arbitrary. I see the difference as being that a resource intrinsically has a lifetime that the programmer wants to care about – i.e. this semaphore should be released now, rather than at the run-time’s earliest convenience. Heap memory (like stack memory and CPU cycles before it) has now been moved into the “don’t you worry your pretty little head about it” pile for the compiler/run-time environment/OS to take care of.)

    2) You can still consume memory very inefficiently and cause your program to run slowly or to fail – even with garbage collection.

    3) I have, on the cusp of my understanding, an idea that you could theoretically have your own data structures that you added objects to and never delete even though you have no intention of navigating to them again.

    I guess this could be seen as a memory leak if you squint. (Normally, I would consider it a memory leak if the lost memory is inaccessible, rather than merely being ignored.)

    This idea seems odd to me; I can’t imagine a situation where I would accidentally end up in such a quandary, but – like I said before – I am tired and I guess it is theoretically possible. If you know of any real-life examples of this, I would like to hear.

    4) Some garbage collectors can’t detect cycles, so that’s another form of legitimate memory leak.

  37. Mostly #2 and #3.

    Depending on environment, #4 is also an issue (eg. in Perl 5, which uses refcounting, not a real garbage collector), but it’s distinct from the other two in that it’s a leaky abstraction rather than a fundamental issue. Both #2 and #3 are fundamental issues.
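
    The refcounting limitation in #4 can be seen in a small Python sketch. It assumes CPython, which pairs reference counting with a separate cycle collector; the Node class is hypothetical, and the cycle collector is switched off only to imitate a pure refcounting environment.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None


gc.disable()                  # imitate pure reference counting
a, b = Node(), Node()
a.other, b.other = b, a       # build a two-object reference cycle
probe = weakref.ref(a)        # observes a without keeping it alive

del a, b                      # no names refer to the cycle any more
leaked_under_refcounting = probe() is not None

gc.collect()                  # an explicit cycle-collection pass
reclaimed_by_cycle_collector = probe() is None
gc.enable()
```

    With refcounting alone the two objects keep each other alive after the last outside reference is gone; only the cycle-detecting pass reclaims them.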

    It’s not hard to imagine causes for #3: just scope your variables a little too loosely, and you’ll have to pay explicit attention to the values you keep around. Sometimes scopes cannot be tight, either; object pools and various forms of caches come to mind.

    Garbage collection makes life easier by removing the need for one (and only one) of the “stakeholders” of an allocated object to be responsible for freeing it. But each of the stakeholders still needs to declare their non-interest in the allocated object in order for it to be picked up properly. Usually that is very implicit: if you scope variables tightly enough then the right thing will almost always happen by itself. But only almost.

    Most of the time, as you said, the resulting memory leaks are only transient, not persistent. But even the transient ones can still trip up your code if they are bad enough; and persistent ones remain possible, though much rarer.

    Saying “garbage collection means you don’t have to think about memory management” is kinda like saying “Java/Perl/Python/Ruby doesn’t have pointers”.

  38. Saying “garbage collection means you don’t have to think about memory management” is kinda like saying “Java/Perl/Python/Ruby doesn’t have pointers”.

    Well put. Thank you. I will moderate my language on this topic in the future.

  39. I’ve never had a case where I wanted to have two variables of the same name within the same scope that only differed by case. So I really see no point in making things case sensitive.

  40. I stumbled across some people trying to use a variant on my KEANU REEVES argument to evade the court’s authority.


Web Mentions

  1. The USS Quad Damage

  2. girtby.net » Blog Archive » Punctuation Insensitivity

  3. Coding Horror

  4. Nick's Delphi Blog

  5. OddThinking » python.CodingStyle().ishated = TRUE