OddThinking

A blog for odd things and odd thoughts.

The World Of Case Sensitivity

Introduction

Let’s spend a moment exploring the world of case-sensitivity. I am, of course, not talking about the skills of a baggage-handler. I am talking about the way software deals with the differences between upper-case and lower-case characters.

Particularly relevant examples include:

  • identifiers in programming languages.
  • filenames in file systems.
  • URLs in web-servers.
  • search terms in search engines.

There is a logical hierarchy of the ways software can treat case. I will talk through each of them below.

  • Case Sensitivity
  • Case Insensitivity
    • Case Preserving
    • Non-Case Preserving
      • Fixed Style
      • Dictionary Definition
      • Last Usage
  • Asymmetric Case Sensitivity

(Why am I going on about the different types of case-sensitivity with a neutral point-of-view? Bear with me. I’m building up a framework for an argument. I’ll come back to this later on.)

Case Sensitivity

Many software systems are case-sensitive. The character string “foo” and the character string “Foo” are not considered to be equivalent.

Examples include:

  • Programming languages: C/C++, Java, Python, PHP, Modula-3, Perl
  • File Systems: UFS
  • Web Servers: Apache
  • Search engines: grep

Case Insensitivity

Conversely, many software systems are case-insensitive. The character string “foo” and the character string “Foo” are considered to be equivalent.

Examples include:

  • Programming languages: LISP, BASIC, Ada, Eiffel, Pascal
  • File Systems: NTFS, FATxx
  • Web Servers: IIS
  • Search engines: Most search engines, including Google, AltaVista, Yahoo, etc.
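
To make the difference between the two kinds of comparison concrete, here is a minimal sketch in Python (my choice of language for illustration only; the function names are made up for this example). It leans on `str.casefold()`, Python's built-in for caseless matching.

```python
def equal_case_sensitive(a: str, b: str) -> bool:
    # Case-sensitive: the strings must match character for character.
    return a == b


def equal_case_insensitive(a: str, b: str) -> bool:
    # Case-insensitive: fold both strings to a caseless form, then compare.
    return a.casefold() == b.casefold()


print(equal_case_sensitive("foo", "Foo"))    # False
print(equal_case_insensitive("foo", "Foo"))  # True
```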

In some circumstances – particularly with search engines or the execution of scripts – this suffices to characterise the processing. However, in many circumstances, the software processing the text string has the opportunity to change the case of the text string into a canonical form. The following sections outline the options.

Case Preserving

If the case of the original text is not modified, the software is said to be case preserving.
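
As a rough sketch of what case-insensitive but case-preserving behaviour might look like, consider a small name table in Python. The class and its methods are hypothetical, invented for this illustration; they are not taken from any real file system or compiler. Lookups ignore case, but the spelling that was originally stored is what comes back.

```python
class CasePreservingNames:
    """Hypothetical name table: case-insensitive lookup, original spelling kept."""

    def __init__(self):
        self._names = {}  # folded name -> spelling as first stored

    def add(self, name: str) -> None:
        # Store the name exactly as given; only the lookup key is folded.
        self._names.setdefault(name.casefold(), name)

    def lookup(self, name: str):
        # Any capitalisation finds the entry; the stored spelling is returned.
        return self._names.get(name.casefold())


names = CasePreservingNames()
names.add("ReadMe.txt")
print(names.lookup("README.TXT"))  # ReadMe.txt
```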

Examples include:

Non-Case Preserving

If the software modifies the case of the original text to put it into a canonical form, it is non-case preserving.

Such software can be further classified into the following sub-categories.

Fixed Style

If the software always converts the case to a fixed style (whether it be upper-case, lower-case, title-case, sentence-case, etc.) then I refer to it here as using a fixed style canonical form.
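
A fixed style canonical form is simple enough to sketch in a few lines of Python. The table of styles and the `canonicalise` function are names I have made up for the example, and the "sentence" style here is only an approximation that capitalises the first character.

```python
# Each fixed style is just a function from text to its canonical form.
FIXED_STYLES = {
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
    "sentence": str.capitalize,  # approximation: first character only
}


def canonicalise(text: str, style: str) -> str:
    # Whatever capitalisation was typed, the chosen style always wins.
    return FIXED_STYLES[style](text)


for style in FIXED_STYLES:
    print(style, "->", canonicalise("hello WORLD", style))
# upper -> HELLO WORLD
# lower -> hello world
# title -> Hello World
# sentence -> Hello world
```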

Examples include:

Dictionary Definition

In some situations, each identifier or word may have an official declaration – e.g. identifiers in Ada, proper nouns in spelling dictionaries. Each reference to the object can be transformed to match the declaration.
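
One way this could work, sketched in Python under the assumption that the declared spellings are known up front: the list of declarations and the function name below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical "official" spellings, as they appear in their declarations.
DECLARATIONS = ["TotalCount", "Put_Line", "OddThinking"]

# Map the caseless form of each declared name to its declared spelling.
CANONICAL = {name.casefold(): name for name in DECLARATIONS}


def to_declared_case(reference: str) -> str:
    # Rewrite a reference to match its declaration; leave unknown words alone.
    return CANONICAL.get(reference.casefold(), reference)


print(to_declared_case("totalcount"))  # TotalCount
print(to_declared_case("PUT_LINE"))    # Put_Line
print(to_declared_case("unknown"))     # unknown
```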

Last Usage

This is a special category to describe an early IDE for Microsoft’s BASIC family. (I believe it was QuickBASIC, but it may have been Visual Basic 1.0 for DOS, or even QBASIC.) When an identifier was typed in with an unexpected capitalisation, all previous references to the identifier would be modified to match the most recent capitalisation.
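
I have no idea how that IDE implemented this, but the visible behaviour can be mimicked with a short Python sketch. The token list and the `type_identifier` function are invented for this example, and the program is modelled, very crudely, as a flat list of tokens.

```python
def type_identifier(tokens, new_token):
    # Rewrite every earlier reference to match the newest capitalisation,
    # then append the newly typed reference itself.
    key = new_token.casefold()
    updated = [new_token if t.casefold() == key else t for t in tokens]
    updated.append(new_token)
    return updated


program = ["Count", "=", "Count", "+", "1"]
program = type_identifier(program, "COUNT")
print(program)  # ['COUNT', '=', 'COUNT', '+', '1', 'COUNT']
```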

Asymmetric Case Sensitivity

Another way of dealing with case, especially in searches, is to use what I have dubbed “asymmetric case-sensitivity”.

Under this system, if a user searches for “foo”, it matches “foo”, “Foo” or “FOO”: the search is case-insensitive.

However, if the user searches for “Foo”, having gone to the effort of specifying that some letters are upper-case, it matches only “Foo” and not “foo” or “FOO”.

This system suffers from theoretical limits (you can’t do a case-sensitive search for “foo”) and user-friendliness issues (it is difficult to work out that this will be the behaviour), but in practice, for an experienced user, it can prove to be quite natural.

Examples include:

  • Emacs
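
The rule described above is easy to sketch in Python. This is only an illustration of the behaviour, not a claim about how Emacs implements it, and the function name is my own invention.

```python
def asymmetric_match(query: str, text: str) -> bool:
    if any(ch.isupper() for ch in query):
        # The user bothered to type upper-case: match case-sensitively.
        return query in text
    # An all-lower-case query matches any capitalisation.
    return query in text.lower()


for candidate in ["foo", "Foo", "FOO"]:
    print(candidate, asymmetric_match("foo", candidate), asymmetric_match("Foo", candidate))
# foo True False
# Foo True True
# FOO True False
```

Note that the sketch shares the theoretical limit mentioned above: there is no way to ask it for a case-sensitive match on “foo”.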

Miscellaneous Pedantic Notes:

  • Many of the software examples here have modes, options, variants or quirks that, under certain circumstances, alter their handling of case. The classifications here describe their typical or default behaviour.
  • I use the term case-smashing as synonymous with non-case preserving. I use the term case-folding as synonymous with case insensitivity. However, the definitions in the normally definitive Jargon File (viz fold case and smash case) are somewhat conflated, so I have avoided them here.
  • Strictly, it is the Windows operating system, not the file system, that gives NTFS its case-insensitivity.

Comments

  1. skills of a baggage-handlers indeed!.

  2. It quite suffers from theoretically limits.

    [Editor’s note: Sunny is referring to a typo, which I have now corrected. Thanks, Sunny.]

  3. In “a” text editor? How about the text editor? Emacs uses the “Asymmetric” case sensitivity in its search commands.

    [Editor’s note: Alastair is referring to a section of the document which has since been updated.]

    To your list of case-insensitive file systems, add HFS Plus (used on MacOS 8+), which is case-preserving. Interestingly, they recently created a variant of it called HFSX which can be case-sensitive (depending on configuration).

  4. Alastair,

    Emacs! Of course! I was afraid it might have actually been Exco! I have updated the article to fill in this info.

    Re: HFS Plus

    Ahh! That explains why, during my research, I was reading conflicting rumours about Mac OS and case-sensitivity. Article updated. Thanks.

  5. The vim name for the asymmetric case sensitivity when finding is smartcase. Being more user-friendly than emacs, vim does not enable this option by default.

  6. No, I was wrong. Vim’s smartcase is something else. You think I would have learnt my lesson about posting before my first cup of coffee.

  7. But Alan, your link to the vim document perfectly described smartcase in the same way as asymmetric case-sensitivity. Perhaps you shouldn’t post after your fourth cup of coffee! 🙂

    By the way, I was willing to take on the case-sensitive versus case-insensitive controversy, but even I would dare not tread into the “vi versus emacs” debate!

