Tuesday, February 14, 2012

Case sensitive URL distinction? Don't rely on it!

A Uniform Resource Locator (URL) should not depend on letter case. Strictly, only the domain-name part of the URL string is interpreted without regard to case [1]. Of course, you can type a URL into the address field of your browser any way you want. The same applies to the href attribute in an anchor tag of your HTML page. But the server hosting the targeted website may interpret file paths differently, depending on the occurrence of upper- and lower-case letters in an otherwise identical character sequence [2]. Unless you know the exact set-up and configuration of the server you are trying to access (Apache on Linux or other hosting software), you should not rely on either a case-sensitive or a case-insensitive interpretation of your query.
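To make the distinction concrete, here is a minimal Python sketch that lower-cases only the parts of a URL that are safe to normalize (scheme and host) and leaves the path exactly as typed. The function name normalize_host_case and the host www.example.org are made up for illustration; they are not taken from any of the cited pages.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lower-case scheme and host; keep path, query and fragment untouched.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Separate optional "user:password@" from "host:port", because the
    # userinfo part may be case sensitive and must not be altered.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


# Two spellings that differ only in the host part normalize to the same string.
a = normalize_host_case("HTTP://WWW.Example.ORG/data/Co.htm")
b = normalize_host_case("http://www.example.org/data/Co.htm")

# Two spellings that differ in the path stay different: whether they reach
# the same file is decided by the server, not by the URL syntax.
c = normalize_host_case("http://www.example.org/data/CO.htm")
```

After normalization, a and b are identical, while c still differs from both. That remaining difference is exactly the part one cannot predict without knowing the server.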

Obviously, the common concern is to reach a website without caring about upper- or lower-case typing and without ending up on a “404 Error File Not Found” page [3,4]. Here, I would like to emphasize the “mirror problem”: let us assume that the server holds multiple files whose names differ only by selective capitalization. This problem is not restricted to locating websites, but is a general issue of targeted search and annotation. For example, in fields such as chemistry, case-sensitive notation can be critical to distinguish between different materials: the symbols/formulae Co and CO represent the chemical element cobalt and the compound carbon monoxide; CsI and CSi represent cesium iodide and silicon carbide. Within each pair, the notations differ by case only. Two files named Co.htm and CO.htm may not be addressed or resolved correctly as separate files when located in the same directory. Such ambiguities are avoided, at the cost of some overhead, by employing a more distinctive naming scheme. In our ThermoML file repository for molecular-composition-based open access to thermodynamic data and chemical publication hyperlinks, we have chosen a host-independent system of file names. For cobalt and carbon monoxide, the files happen to be Co_aaa.htm and CO_aax.htm, respectively. The x makes the difference.
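A quick way to spot such clashes before uploading files is to group the names by their case-folded form. The following Python sketch shows one possible check; it is not the procedure used for the ThermoML repository. The function name case_collisions is hypothetical, and CsI.htm, CSi.htm and H2O.htm are made-up file names added for the example.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that become identical when case is ignored.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for name in names:
        # casefold() is an aggressive lower-casing meant for caseless matching
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Formula-only names clash on a host that ignores case in file paths.
plain = ["Co.htm", "CO.htm", "CsI.htm", "CSi.htm", "H2O.htm"]
clashes = case_collisions(plain)

# The suffixed names from the post stay distinct even when case is ignored.
suffixed = ["Co_aaa.htm", "CO_aax.htm"]
no_clashes = case_collisions(suffixed)
```

Here clashes contains the two pairs Co.htm/CO.htm and CsI.htm/CSi.htm, while no_clashes is empty: once case is ignored, co_aaa.htm and co_aax.htm still differ in their last letter.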

Keywords: name disambiguation, formula disambiguation, file names, identifiers, web hosting, Windows, Linux, UNIX, case standardization 

References and more on URL case sensitivity
[1] Bin-Blog: www.bin-co.com/blog/2007/10/case-sensitivity-in-urls/.
[2] wiseGEEK: www.wisegeek.com/are-urls-case-sensitive.htm.
[3] Ted Kuik: Case Sensitive URLs. Does capitalization matter? www.coolnotions.com/Articles/Article_02.htm.
[4] Case-Sensitive URL's: www.infocellar.com/networks/internet/URL-case-sensitive.htm.
