{"id":112,"date":"2005-10-26T16:31:12","date_gmt":"2005-10-26T06:31:12","guid":{"rendered":"http:\/\/www.somethinkodd.com\/oddthinking\/?p=112"},"modified":"2007-12-29T12:46:42","modified_gmt":"2007-12-29T02:46:42","slug":"the-world-of-case-sensitivity","status":"publish","type":"post","link":"https:\/\/www.somethinkodd.com\/oddthinking\/2005\/10\/26\/the-world-of-case-sensitivity\/","title":{"rendered":"The World Of Case Sensitivity"},"content":{"rendered":"<p><!-- UnMarkedDown_2_01132526463--><\/p>\n<h2>Introduction<\/h2>\n<p>Let&#8217;s spend a moment exploring the world of case-sensitivity. I am, of course, not talking about the skills of a baggage-handler. I am talking about the way software deals with the differences between upper-case and lower-case characters. <\/p>\n<p>Particularly relevant examples include:<\/p>\n<ul>\n<li>identifiers in programming languages.<\/li>\n<li>filenames in file systems.<\/li>\n<li>URLs in web-servers.<\/li>\n<li>search terms in search engines.<\/li>\n<\/ul>\n<p>There is a logical hierarchy of the way software can treat case. I will talk through each one of these below.<\/p>\n<ul>\n<li>Case Sensitivity<\/li>\n<li>Case Insensitivity\n<ul>\n<li>Case Preserving<\/li>\n<li>Non-Case Preserving\n<ul>\n<li>Fixed Style <\/li>\n<li>Dictionary Definition<\/li>\n<li>Last Usage<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Asymmetric Case Sensitivity<\/li>\n<\/ul>\n<p><em>(Why am I going on about the different types of case-sensitivity with a neutral point-of-view? Bear with me. I&#8217;m building up a framework for an argument. I&#8217;ll come back to this later on.)<\/em><\/p>\n<h2>Case Sensitivity<\/h2>\n<p>Many software systems are case-sensitive. 
The character string &#8220;<code>foo<\/code>&#8221; and the character string &#8220;<code>Foo<\/code>&#8221; are not considered to be equivalent.<\/p>\n<p>Examples include:<\/p>\n<ul>\n<li>Programming languages: C\/C++, Java, Python, PHP, Modula-3, Perl<\/li>\n<li>File Systems: <a href=\"http:\/\/en.wikipedia.org\/wiki\/Unix_File_System\" title=\"Wikipedia definition of Unix_File_System\" class=\"wikipedia\">UFS<\/a><\/li>\n<li>Web Servers: <a href=\"http:\/\/www.apache.org\">Apache<\/a><\/li>\n<li>Search engines: <a href=\"http:\/\/www.gnu.org\/software\/grep\/\">grep<\/a><\/li>\n<\/ul>\n<h2>Case Insensitivity<\/h2>\n<p>Conversely, many software systems are case-insensitive. The character string &#8220;<code>foo<\/code>&#8221; and the character string &#8220;<code>Foo<\/code>&#8221; are considered to be equivalent.<\/p>\n<p>Examples include: <\/p>\n<ul>\n<li>Programming languages: LISP, BASIC, Ada, Eiffel, Pascal<\/li>\n<li>File Systems: <a href=\"http:\/\/en.wikipedia.org\/wiki\/NTFS\" title=\"Wikipedia definition of NTFS\" class=\"wikipedia\">NTFS<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/File_Allocation_Table\" title=\"Wikipedia definition of File_Allocation_Table\" class=\"wikipedia\">FATxx<\/a><\/li>\n<li>Web Servers: <a href=\"http:\/\/www.microsoft.com\/WindowsServer2003\/iis\/default.mspx\">IIS<\/a><\/li>\n<li>Search engines: Most search engines including Google, Altavista, Yahoo, etc.<\/li>\n<\/ul>\n<p>In some circumstances &#8211; particularly with search engines or the execution of scripts &#8211; this suffices to characterise the processing. However, in many circumstances, the software processing the text string has the opportunity to <em>change<\/em> the case of the text string into a canonical form. 
The following sections outline the options.<\/p>\n<h3>Case Preserving<\/h3>\n<p>If the case of the original text is not modified, the software is said to be <em>case preserving<\/em>.<\/p>\n<p>Examples include:<\/p>\n<ul>\n<li>File Systems: <a href=\"http:\/\/en.wikipedia.org\/wiki\/NTFS\" title=\"Wikipedia definition of NTFS\" class=\"wikipedia\">NTFS<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/File_Allocation_Table\" title=\"Wikipedia definition of File_Allocation_Table\" class=\"wikipedia\">FAT32<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/HFS_Plus\" title=\"Wikipedia definition of HFS_Plus\" class=\"wikipedia\">HFS Plus<\/a><\/li>\n<\/ul>\n<h3>Non-Case Preserving<\/h3>\n<p>If the software modifies the case of the original text to put it into a <em>canonical form<\/em>, it is <em>non-case preserving<\/em>.<\/p>\n<p>Such software can be further classified into the following sub-categories.<\/p>\n<h4>Fixed Style<\/h4>\n<p>If the software always converts the case to a fixed style (whether it be upper-case, lower-case, title-case, sentence-case, etc.) then I refer to it here as using a <em>fixed style canonical form<\/em>.<\/p>\n<p>Examples include:<\/p>\n<ul>\n<li>File Systems: <a href=\"http:\/\/en.wikipedia.org\/wiki\/File_Allocation_Table\" title=\"Wikipedia definition of File_Allocation_Table\" class=\"wikipedia\">FAT12 and FAT16<\/a> (Upper-case)<\/li>\n<\/ul>\n<h4>Dictionary Definition<\/h4>\n<p>In some situations, each identifier or word may have an official declaration &#8211; e.g. identifiers in Ada, proper nouns in spelling dictionaries. Each reference to the object can be transformed to match the declaration.<\/p>\n<h4>Last Usage<\/h4>\n<p>This is a special category to describe an early IDE for Microsoft&#8217;s BASIC family. (I believe it was QuickBASIC, but it may have been Visual Basic 1.0 for DOS, or even QBASIC.)  
When an identifier was typed in with an unexpected capitalisation, all previous references to the identifier would be modified to match the most recent capitalisation.<\/p>\n<h2>Asymmetric Case Sensitivity<\/h2>\n<p>Another way of dealing with case, especially in searches, is to use what I have dubbed &#8220;asymmetric case-sensitivity&#8221;. <\/p>\n<p>Under this system, if a user searched for &#8220;<code>foo<\/code>&#8221; it would match &#8220;<code>foo<\/code>&#8221;, &#8220;<code>Foo<\/code>&#8221; or &#8220;<code>FOO<\/code>&#8221; &#8211; it is case-insensitive.<\/p>\n<p>However, if the user searched for &#8220;<code>Foo<\/code>&#8221;, having gone to the effort of specifying that some letters are upper-case, it would match only &#8220;<code>Foo<\/code>&#8221; and not &#8220;<code>foo<\/code>&#8221; or &#8220;<code>FOO<\/code>&#8221;.<\/p>\n<p>This system suffers from theoretical limits (you can&#8217;t do a case-sensitive search for &#8220;<code>foo<\/code>&#8221;) and user-friendliness issues (it is difficult to work out that this will be the behaviour), but in practice, for an experienced user, it can prove to be quite natural.<\/p>\n<p>Examples include:<\/p>\n<ul>\n<li>Emacs<\/li>\n<\/ul>\n<h2>Miscellaneous Pedantic Notes:<\/h2>\n<ul>\n<li>Many of the software examples here have modes, options or variants, or various quirks under certain circumstances, that alter their handling of case. The classifications here describe their typical or default behaviour.<\/li>\n<li>I use the term <em>case-smashing<\/em> as synonymous with <em>non-case preserving<\/em>. I use the term <em>case-folding<\/em> as synonymous with <em>case insensitivity<\/em>. 
However, the definitions in the normally definitive <a href=\"http:\/\/www.catb.org\/~esr\/jargon\/\">Jargon File<\/a> (viz <a href=\"http:\/\/www.catb.org\/~esr\/jargon\/html\/F\/fold-case.html\">fold case<\/a> and <a href=\"http:\/\/www.catb.org\/~esr\/jargon\/html\/S\/smash-case.html\">smash case<\/a>) are somewhat conflated, so I have avoided them here.<\/li>\n<li>Strictly, it is the Windows operating system, not the file system, that provides NTFS its case-insensitivity feature.<\/li>\n<li>Further Reading:\n<ul>\n<li>Wikipedia articles on <a href=\"http:\/\/en.wikipedia.org\/wiki\/Case_sensitivity\" title=\"Wikipedia definition of Case_sensitivity\" class=\"wikipedia\">Case sensitivity<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Case_Sensitivity\" title=\"Wikipedia definition of Case_Sensitivity\" class=\"wikipedia\">Case Sensitivity<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Case_preservation\" title=\"Wikipedia definition of Case_preservation\" class=\"wikipedia\">Case preservation<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Comparison_of_file_systems\" title=\"Wikipedia definition of Comparison_of_file_systems\" class=\"wikipedia\">Comparison of File Systems<\/a><\/li>\n<li>Merd&#8217;s <a href=\"http:\/\/merd.sourceforge.net\/pixel\/language-study\/syntax-across-languages\/Vrs.html#VrsTkns\">Language Comparison<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A review of the different ways that different software treats 
case.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","footnotes":""},"categories":[34],"tags":[200,199,202,374],"class_list":["post-112","post","type-post","status-publish","format-standard","hentry","category-software-development","tag-case-preserving","tag-case-sensitivity","tag-comparison","tag-software-development"],"_links":{"self":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/comments?post=112"}],"version-history":[{"count":0,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/112\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/media?parent=112"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/categories?post=112"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/tags?post=112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}