Casefolding For Bcachefs File-System Posted


  • oiaohm
    replied
    Originally posted by Quackdoc View Post
    I don't see the issue here, I may need you to elaborate more on it, the chart supplied doesn't show where case makes an important distinction.
    I did point to one: Koo then kau, on the same line. Without having to explain that whole chart in detail: it has the English alternative and the correct chars in a mapping table.

    There is the Ko earlier, but the lower-case ko that comes out of the bottom table is a completely different char. Tamil encoded in Latin/Roman chars simply has too many chars not to have these overlaps. Tamil is not the only language with this problem. Abusing the Latin/Roman chars this way is fine as long as there is no case folding, and it was fine for a very long time before computers.

    Notice you just looked at the Tamil char set and did not see the problem. Please note that the Tamil site there gave you one of the neatest-to-read alphabets of all the problem-child languages. A horrible number of letters with a horrible number of mappings into Latin/Roman chars makes it really simple to miss that the language cannot be case folded, particularly as they end up doing things to try to compact the charts.

    Originally posted by Quackdoc View Post
    no it doesn't, just disable case folding if it's needed. pretty simple to me.
    Remember that a person whose mix of languages is not case-fold compatible as a complete set, but includes a case-fold-compatible language, can be working with a person who depends on case folding working. Think of the git issues that happen because git is case sensitive and Windows users have case folding.

    Locale does not work as a means to enable/disable case folding in lots of cases. You end up needing per-folder controls when the user is in a locale with mixed languages. For a sane workflow with others, the person with mixed languages needs to be able to enable case folding in particular locations and not others, just so they don't cause problems for the people they are working with. It's really simple to forget that those doing translations and the like operate a mixed-language OS.

    Per-folder enabling and disabling of case folding is the correct method for those who have a mix of languages in use, some case-fold compatible and some not.

    Also remember that with Linux someone can install the OS and then decide to change the locale. This brings its own set of problems, such as how to allow the user to change the case-fold setting on their home directory. Core Linux being case sensitive makes sense, since before changing their locale a person might download software that has translations and documentation in a case-fold-incompatible language.



  • Quackdoc
    replied
    Originally posted by oiaohm View Post
    Tamil has its own alphabet, but Tamil also has an encoding that uses a to z, A to Z and a few extra ASCII chars. Remember that there is no such thing as upper case or lower case in the original Tamil alphabet.
    Discover the beauty of the Tamil language by mastering its alphabets. With persistence and regular practice, you'll soon be proficient in reading, writing, and speaking Tamil!

    The Tamil chart has entries like Koo and then kau.

    This is the problem: it is valid to write Tamil using just a-z/A-Z. The a-z/A-Z plus a few extra ASCII chars encoding came about because the early printing presses and typewriters did not have Tamil chars, so an encoding was developed using the chars that the typewriters and printing presses did have.

    Tamil is a fun language: there are particular chars tied to the organization of the alphabet that should never appear in words or be spoken.
    I don't see the issue here, I may need you to elaborate more on it, the chart supplied doesn't show where case makes an important distinction.


    Now it gets trickier: countries that use Tamil also use English. This is where the locale setting starts coming apart. It is totally valid for a Tamil-using person to have a mix of Tamil documents and English documents on their computer. Case folding on English document names is useful; case folding on Tamil document names written using ASCII chars will end up matching completely wrong documents, and is also simply wrong because Tamil is a caseless language.

    ---

    This is taken from a valid locale of an Australian English speaker. Notice that three languages are set. Yes, it is valid to set a mixture of languages in LANGUAGE. Say a person has English and Tamil written in English chars in LANGUAGE: how should you case fold now?

    This is not a simple problem. Locale does not mandate that you pick just one language, or that you pick languages whose case folding is compatible with each other.
    no it doesn't, just disable case folding if it's needed. pretty simple to me.



  • oiaohm
    replied
    Originally posted by Quackdoc View Post
    the question is, what languages use case as a form, and is there an alternative, or the better question, why not just set case sensitivity by locale?
    Tamil has its own alphabet, but Tamil also has an encoding that uses a to z, A to Z and a few extra ASCII chars. Remember that there is no such thing as upper case or lower case in the original Tamil alphabet.
    Discover the beauty of the Tamil language by mastering its alphabets. With persistence and regular practice, you'll soon be proficient in reading, writing, and speaking Tamil!

    The Tamil chart has entries like Koo and then kau.

    This is the problem: it is valid to write Tamil using just a-z/A-Z. The a-z/A-Z plus a few extra ASCII chars encoding came about because the early printing presses and typewriters did not have Tamil chars, so an encoding was developed using the chars that the typewriters and printing presses did have.

    Tamil is a fun language: there are particular chars tied to the organization of the alphabet that should never appear in words or be spoken.

    There are other languages with encodings like this that don't have their own original written alphabet. The mess comes from having more chars than Latin/European languages have, no upper and lower case in the language, and only Latin/European charsets to work with. Some, like Hindi, use runs of lower-case letters to stand for their chars. Others, like Tamil, whose language has no upper case, use upper case as part of their char encoding to avoid messing things up. This works as long as Tamil written in Latin/European charsets does not get case folded. Of course typewriters and printing presses back in the day could not automatically case fold, so it was not a problem back then.

    Now it gets trickier: countries that use Tamil also use English. This is where the locale setting starts coming apart. It is totally valid for a Tamil-using person to have a mix of Tamil documents and English documents on their computer. Case folding on English document names is useful; case folding on Tamil document names written using ASCII chars will end up matching completely wrong documents, and is also simply wrong because Tamil is a caseless language.

    Folding from Tamil typed in English/Latin chars to Unicode Tamil chars could be valid, but again you don't want to fold English document names into Tamil Unicode chars, because that would be wrong.

    Please note Tamil is only one example; there are many more. Remember that for most of these problem languages you will have people using English or French as well, and the reason the encoding is based on Latin/European languages is that some European country colonized them at some point, bringing European printing presses and typewriters. So these users are at least bilingual, which brings up an issue with locale design that is simple to overlook as native English speakers.

    LANG=en_AU.UTF-8
    LANGUAGE=en_AU:en_GB:en
    This is taken from a valid locale of an Australian English speaker. Notice that three languages are set. Yes, it is valid to set a mixture of languages in LANGUAGE. Say a person has English and Tamil written in English chars in LANGUAGE: how should you case fold now?

    This is not a simple problem. Locale does not mandate that you pick just one language, or that you pick languages whose case folding is compatible with each other.
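
    To make the point about LANGUAGE concrete, here is a minimal Python sketch of how a program might read that setting as a priority list. The function name language_priority and the second example environment (English plus "ta" for Tamil) are made up for illustration; real programs normally leave this to gettext.

```python
def language_priority(env):
    """Return the language preference list implied by LANGUAGE, falling back to LANG."""
    language = env.get("LANGUAGE", "")
    if language:
        # LANGUAGE is a colon-separated list, most preferred first.
        return [entry for entry in language.split(":") if entry]
    lang = env.get("LANG", "")
    # Strip the codeset suffix, e.g. "en_AU.UTF-8" -> "en_AU".
    return [lang.split(".")[0]] if lang else []


# The environment quoted in the post.
posted_env = {"LANG": "en_AU.UTF-8", "LANGUAGE": "en_AU:en_GB:en"}
print(language_priority(posted_env))

# Hypothetical bilingual user: nothing here says which folding rule applies.
mixed_env = {"LANG": "en_AU.UTF-8", "LANGUAGE": "en_AU:ta"}
print(language_priority(mixed_env))
```

    The list only says which languages the user can read. It carries no single answer to "which case rule should file names follow", which is the gap being described.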

    Yes, as native English speakers we don't notice that our systems have multiple versions of English set in the locale so that applications can find compatible text for us. People misunderstand locale and think locale equals setting one language, when in fact locale lets you set multiple languages with no requirement for them to be compatible. The idea of locale is that you set the languages you can read and the application uses whichever one it thinks works.

    Basically locale does not help, because a legal locale setting can include case-fold-incompatible languages. This is why I said it needs to be settable by directory or something like that.

    This is the problem: case folding looks like a simple problem as long as you don't see the different encodings that exist for different languages, created because of historic colonization events so that native people's languages could take advantage of the hardware the colonizers brought. When you are encoding a language using a-z and A-Z and your native language has no concept of case, it is really simple to see lower-case a and upper-case A as two completely different chars to use, leading to messed-up encodings.

    Also, why would a person today type on an English keyboard instead of one for their own language? Simple: English keyboards are more mass-produced, so they are simpler and cheaper to get hold of. This also leads to some people still learning to type their language into computers using these old encodings.

    Quackdoc, basically this is not a simple problem with a straightforward answer. History made a mess with the different colonization events in more ways than one, and we have to live with the fallout. This issue with case folding is part of that fallout.



  • Quackdoc
    replied
    Originally posted by oiaohm View Post

    "Standard alphabet": that is a problem. Not every language that uses the English/Latin alphabet uses it in the standard way. Not all languages have capitalization; the majority of languages don't.
    Answer (1 of 9): What languages don’t use capital letters when writing? Only languages that use the following alphabets have upper and lower case letters: * Armenian * Cyrillic * Greek * Latin So languages without case distinction include: * Amharic * Arabic * Assamese * Azerbaijani *...


    This leads to a problem. Some of these languages are encoded in the English/Latin alphabet, and because they have extra chars (as in more tones than a-z can represent) they use upper case for what are in effect standard lower-case alphabet chars in their language. This is what led to a and A being two totally different sounds when spoken in these languages. To a normal English speaker the written form of these languages looks like a complete mangled mess.

    The standard a-z alphabet has been abused. Case folding in fact only makes sense in a small set of languages, because only a small set of languages have upper and lower case; the majority of languages don't.

    There are issues where current case folding leads to horrible problems by letting things work without any error or notice to the user that what they have done is wrong. #include <gl/Gl.h> comes to mind. You find projects where code is being altered because the case has been changed, and the code still built because case folding allowed it; these changes don't alter functionality, so they just make the change log worse. The correct C include is in fact #include <GL/gl.h>, and that is per the OpenGL standard documentation. With some of these things you start seeing that you need a spell check in case folding to inform the user: excuse me, the capitalization you just used is wrong, because some standard/legal requirement says it should be written X way and you did something else.

    It's not that I don't see that case folding can be useful. There are catches.

    People dealing with languages where case folding makes no sense, even if they use a-z chars, need a means to turn case folding off and be case sensitive.

    Current case folding solutions are designed to be transparent to the user. For items like the C includes, and other things where users messing up the case ends up causing other issues, a case fold that informs the user that case folding is going on would be good.

    Case folding is not a solved problem. There are problems with case insensitive (case fold) and with case sensitive. It's really simple to say it will make things better for the user and then miss that at times it will be worse. There is an old saying, "Just because you can do something does not mean you should", and it does apply here.

    Case folding is one of those harder problems: it fixes some problems but then causes others, like the GL includes, where people on Windows type it wrong and don't notice. There is a need for something in the middle between case insensitive (case folding) and case sensitive: a mode where the user is informed when they have a mismatched-case problem, so they know it should be fixed, but can decide to let it slide and allow the case fold.

    The current problem with case sensitive and case insensitive is that it is a black-and-white choice, when we humans, like it or not, operate in shades of grey somewhere in the middle.
    the question is, what languages use case as a form, and is there an alternative, or the better question, why not just set case sensitivity by locale?



  • oiaohm
    replied
    Originally posted by Quackdoc View Post
    Indeed. From a general usability standpoint I find case insensitive much better. In the vast majority of cases there is no reason, from a user-facing perspective, to have files/folders that differ only by capitalization, at least for languages native to the "standard" alphabet anyway. It can cause a good chunk of confusion, which I have witnessed first hand.
    "Standard alphabet": that is a problem. Not every language that uses the English/Latin alphabet uses it in the standard way. Not all languages have capitalization; the majority of languages don't.
    Answer (1 of 9): What languages don’t use capital letters when writing? Only languages that use the following alphabets have upper and lower case letters: * Armenian * Cyrillic * Greek * Latin So languages without case distinction include: * Amharic * Arabic * Assamese * Azerbaijani *...


    This leads to a problem. Some of these languages are encoded in the English/Latin alphabet, and because they have extra chars (as in more tones than a-z can represent) they use upper case for what are in effect standard lower-case alphabet chars in their language. This is what led to a and A being two totally different sounds when spoken in these languages. To a normal English speaker the written form of these languages looks like a complete mangled mess.

    The standard a-z alphabet has been abused. Case folding in fact only makes sense in a small set of languages, because only a small set of languages have upper and lower case; the majority of languages don't.

    There are issues where current case folding leads to horrible problems by letting things work without any error or notice to the user that what they have done is wrong. #include <gl/Gl.h> comes to mind. You find projects where code is being altered because the case has been changed, and the code still built because case folding allowed it; these changes don't alter functionality, so they just make the change log worse. The correct C include is in fact #include <GL/gl.h>, and that is per the OpenGL standard documentation. With some of these things you start seeing that you need a spell check in case folding to inform the user: excuse me, the capitalization you just used is wrong, because some standard/legal requirement says it should be written X way and you did something else.

    It's not that I don't see that case folding can be useful. There are catches.

    People dealing with languages where case folding makes no sense, even if they use a-z chars, need a means to turn case folding off and be case sensitive.

    Current case folding solutions are designed to be transparent to the user. For items like the C includes, and other things where users messing up the case ends up causing other issues, a case fold that informs the user that case folding is going on would be good.

    Case folding is not a solved problem. There are problems with case insensitive (case fold) and with case sensitive. It's really simple to say it will make things better for the user and then miss that at times it will be worse. There is an old saying, "Just because you can do something does not mean you should", and it does apply here.

    Case folding is one of those harder problems: it fixes some problems but then causes others, like the GL includes, where people on Windows type it wrong and don't notice. There is a need for something in the middle between case insensitive (case folding) and case sensitive: a mode where the user is informed when they have a mismatched-case problem, so they know it should be fixed, but can decide to let it slide and allow the case fold.
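
    A rough idea of what that middle ground could look like, as a userspace Python sketch rather than anything a kernel or filesystem actually does today. The helper name lookup and its return shape are invented for this example: it resolves a name the way a case-folding directory would, but reports when the spelling on disk differs from what was asked for.

```python
import os
import tempfile


def lookup(directory, name):
    """Find `name` in `directory`; return (actual_name, warning_or_None)."""
    entries = os.listdir(directory)
    if name in entries:
        return name, None  # exact, case-sensitive hit: nothing to report
    folded = name.casefold()
    matches = [entry for entry in entries if entry.casefold() == folded]
    if len(matches) == 1:
        # The fold succeeds, but the caller is told that it happened.
        return matches[0], f"case mismatch: asked for {name!r}, found {matches[0]!r}"
    if matches:
        return None, f"ambiguous: {name!r} matches {sorted(matches)}"
    return None, f"not found: {name!r}"


with tempfile.TemporaryDirectory() as root:
    # Create the correctly spelled header, then look it up both ways.
    open(os.path.join(root, "gl.h"), "w").close()
    print(lookup(root, "gl.h"))
    print(lookup(root, "Gl.h"))
```

    The caller can still decide to accept the folded match, but the wrong capitalization no longer passes silently, which is the property the GL include example is missing.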

    The current problem with case sensitive and case insensitive is that it is a black-and-white choice, when we humans, like it or not, operate in shades of grey somewhere in the middle.



  • Quackdoc
    replied
    Originally posted by Knghtbrd View Post
    Casefolding is one of those things … you may or may not want it, right up until you need it. The same could be said for case sensitivity, though.
    Indeed. From a general usability standpoint I find case insensitive much better. In the vast majority of cases there is no reason, from a user-facing perspective, to have files/folders that differ only by capitalization, at least for languages native to the "standard" alphabet anyway. It can cause a good chunk of confusion, which I have witnessed first hand.



  • Knghtbrd
    replied
    Originally posted by Quackdoc View Post
    Casefolding is one of the more important features. I myself wish Linux itself would default to and run as case folded, but alas, I know that won't happen. I am really excited for bcachefs; even though I only want to use it on single-drive systems, having CoW that isn't ZFS or Btrfs will be really nice. XFS is a good bridge for features as a temporary stopgap.
    There's a number of reasons that's just not going to happen. A couple of things depend on the current behavior of case sensitivity, and the kernel-side mantra is to not break userspace for a good reason. That's enough to kill the idea right there.

    Beyond that, though, a lot of people are passionate about this: Linux is case sensitive the way GOD INTENDED or some such. You won't convince them, and it ends up being a big mess that largely isn't worth the holy war.

    I would like to see the feature enabled by default when a filesystem is created, however. At this point I think it's safe to default to it so that you can chattr +F any directory you need to. It's been available since Linux 5.3 or 5.4, and that's old enough at this point that I don't think anyone is going to expect those kernels to access a filesystem created today without compatibility modes or a patch/rebuild on the old kernel.

    Casefolding is one of those things … you may or may not want it, right up until you need it. The same could be said for case sensitivity, though.



  • oiaohm
    replied
    Originally posted by billyswong View Post
    1. Right in your Google Groups link there is a finding that Win98 stores a fully uppercase 8.3 short name in the legacy file name entry and only stores the mixed-case name in the long name fragments, a later invention in Windows. While technically one can manually put lowercase codepoints into the 8.3 field in the file system, in practice this isn't happening. So this fits what you said earlier: "you press lower case A or upper case A and all you get is upper case A". You are moving the goal post when you use CP/M and "attempting to use lower case is straight up system error" as the new example.
    You have a mistake here, billyswong.

    I get where your problem comes from. Wikipedia has it wrong on 8.3 filenames: they reference the correct document and then write made-up garbage to avoid admitting the bug.

    This archive shows the Microsoft instructions for those implementing LFN.
    Direct quote from the Microsoft implementation guide.
    The short file name "Alongf~1.txt" is generated from the long file name "A long filename.txt" because the long file name contains more than eight characters.
    Notice that the instructions from Microsoft for implementing LFN say the 8.3 name should contain upper- and lower-case chars. The FAT documentation prior to LFN also says 8.3 names should contain upper- and lower-case chars.
    Win98 doing all-uppercase 8.3 is technically a bug. Yes, as usual Microsoft manages to make implementation errors. Of course operating systems like Linux and BSD implemented their FAT support as per the Microsoft documentation, so they support mixed case in the LFN 8.3 filenames.
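
    For readers who have not seen short-name generation before, here is a simplified Python sketch that reproduces only the one example in that quote. The function name short_name_sketch is made up, and this is not the full Microsoft algorithm: it ignores illegal-character replacement, collision handling beyond a fixed ~1, and whatever case conversion a given implementation applies.

```python
def short_name_sketch(long_name, index=1):
    """Build an 8.3-style name from a long name, preserving case as in the quoted example."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:
        # No extension at all: the whole name is the stem.
        stem, ext = long_name, ""
    stem = stem.replace(" ", "")      # spaces are dropped from the short form
    ext = ext.replace(" ", "")[:3]    # at most three extension chars
    if len(stem) > 8:
        # Keep the first six chars and append a numeric tail.
        stem = f"{stem[:6]}~{index}"
    return f"{stem}.{ext}" if ext else stem


print(short_name_sketch("A long filename.txt"))
print(short_name_sketch("readme.txt"))
```

    With case left alone, the first call gives the "Alongf~1.txt" form from the quote. An implementation that uppercases at this step would store ALONGF~1.TXT instead, which is the behaviour being argued about.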

    I will give you that I made a mistake; my brain thought CP/M. https://en.wikipedia.org/wiki/Baudot_code There is a set of different encodings that were used before ASCII and that don't have a lower case: 5-bit and 6-bit encodings instead of the 7-bit ASCII. CP/M is the first with ASCII; CP/M is not the first to use 8.3 filenames.

    The reason I was thinking of CP/M is that there were tools for CP/M to access some of these older, horrible case-less filesystems; those tools were never ported to DOS.

    Originally posted by billyswong View Post
    Whether a system halts with an error when it sees a lower-case letter entry is an implementation detail.
    Case-less systems error out because the encoding only has one case. What can such a system do with a data stream that has more than one, other than totally refuse it?

    Originally posted by billyswong View Post
    Case-insensitive: the case information of the entry is lost and irretrievable, conversion is either done automatically right in the saving procedure or manually for some oldest systems by disallowing non-conforming input.
    https://www.collabora.com/news-and-b...ature-in-ext4/ No, that does not match the general meaning of the term case-insensitive.

    With the oldest case-less systems, such as those with Baudot and Morse keyboards, there is no way to type upper and lower case: you only have a single A char on the keyboard with no shift. Please remember that items like Morse code are still case-less today. Case-less is the oldest and is its own thing: the OS uses a case-less encoding and the keyboard for the OS has no shift key that works on A to Z.

    Case-insensitive is a group of technologies. But case insensitive does not mean the case information is always lost, and never has.

    With case-less you lose the case information by design, not through bugs, because there is no way to store the cased chars. Try doing case in Morse or Baudot some time; these are case-less encodings.
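
    A small Python sketch of what "case-less" means in practice, using a three-letter excerpt of International Morse. The encode and decode helpers are just for illustration.

```python
# Tiny excerpt of International Morse: one code per letter, no case distinction.
MORSE = {"A": ".-", "B": "-...", "C": "-.-."}
FROM_MORSE = {code: letter for letter, code in MORSE.items()}


def encode(text):
    # The only way to send "a" is to send the single A symbol.
    return " ".join(MORSE[ch.upper()] for ch in text)


def decode(signal):
    return "".join(FROM_MORSE[code] for code in signal.split(" "))


print(encode("abc") == encode("ABC"))
print(decode(encode("aBc")))
```

    Whatever case went in, the round trip can only ever hand back the one letter form the code has. The information is not discarded by a rule; there is simply nowhere to put it.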

    Originally posted by billyswong View Post
    ​Case-folding: the case information of the entry is kept, but string comparison and indexing is done in case-insensitive manner, following some arbitrary case rule.
    This is not true either, that all the case information is kept when you are using case folding. With case folding, what happens when I save abc but the file ABC already exists? Case folding does have an element of irretrievable conversion/data loss: the fact that the person attempted to save abc in lower case is lost, because the case fold results in the system believing the user wanted to overwrite ABC.

    Case folding does have irretrievable conversion. How much happens depends on how bad the case-fold implementation is. Case folding done wrong is what gave the Windows 98 bug that resulted in 8.3 file names being stored upper case when the documentation clearly said otherwise.
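
    The abc/ABC situation can be sketched with a toy in-memory directory in Python. The class name FoldedDirectory is invented, and keeping the existing spelling on overwrite is an assumption about how a typical case-preserving implementation behaves, not a statement about any specific filesystem.

```python
class FoldedDirectory:
    """Toy case-preserving, case-folding directory."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as first stored, data)

    def save(self, name, data):
        key = name.casefold()
        if key in self._entries:
            # Same folded key: treated as an overwrite, the old spelling stays.
            stored_name, _ = self._entries[key]
            self._entries[key] = (stored_name, data)
        else:
            self._entries[key] = (name, data)

    def read(self, name):
        return self._entries[name.casefold()][1]

    def listing(self):
        return [stored for stored, _ in self._entries.values()]


directory = FoldedDirectory()
directory.save("ABC", "old contents")
directory.save("abc", "new contents")
print(directory.listing())
print(directory.read("ABC"))
```

    After the second save there is one entry, it is still spelled ABC, and it holds the new data. Nothing records that the user typed abc.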

    You were very correct that with case insensitive you have a percentage of case data loss while some remains. With case-less you have 100 percent case data loss. Of course with implementation bugs like the Windows 98 one, the case data loss in a case-insensitive system can increase a lot.

    Originally posted by billyswong View Post
    Case-sensitive: no built-in case rule is required as everything is stored, retrieved, and compared/indexed as-is.
    Yes, case sensitive is the simple case. But then you have the problem of Unicode chars that are the same char but have different code point numbers.
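
    One common instance of that problem is precomposed versus decomposed accents. A short Python sketch using the standard unicodedata module:

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

# A byte-for-byte, case-sensitive comparison sees two different names.
print(composed == decomposed)
print(len(composed), len(decomposed))

# Normalizing both sides to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

    So even a strictly case-sensitive filesystem has to decide whether it compares raw code points or normalized strings.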

    China ended up using a legal sledgehammer to start fixing their problem, and they don't care if they nuke the other cultures in their country out of existence. India is trying to avoid the sledgehammer because they want to keep their different cultures alive.

    Romaji lacks tones that can be written in hiragana or katakana. This is why trouble comes when you try to make the system properly future-proof with the correct way of saying a person's name.



  • billyswong
    replied
    oiaohm I think you misunderstood something.

    1. Right in your Google Groups link there is a finding that Win98 stores a fully uppercase 8.3 short name in the legacy file name entry and only stores the mixed-case name in the long name fragments, a later invention in Windows. While technically one can manually put lowercase codepoints into the 8.3 field in the file system, in practice this isn't happening. So this fits what you said earlier: "you press lower case A or upper case A and all you get is upper case A". You are moving the goal post when you use CP/M and "attempting to use lower case is straight up system error" as the new example.

    Whether a system halts with an error when it sees a lower-case letter entry is an implementation detail. I will stick to my own terminology for now.
    Case-insensitive: the case information of the entry is lost and irretrievable, conversion is either done automatically right in the saving procedure or manually for some oldest systems by disallowing non-conforming input.
    Case-folding: the case information of the entry is kept, but string comparison and indexing is done in case-insensitive manner, following some arbitrary case rule.
    Case-sensitive: no built-in case rule is required as everything is stored, retrieved, and compared/indexed as-is.
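
    The three definitions above can be sketched as one small Python function. The name store and the policy strings are only labels for this thread's terminology, and str.upper / str.casefold stand in for whatever case rule a real system would use.

```python
def store(name, policy):
    """Return (name as stored, key used for comparison) under each definition."""
    if policy == "case-insensitive":
        # Case is discarded when saving; the original spelling is gone.
        return name.upper(), name.upper()
    if policy == "case-folding":
        # The spelling is kept; only comparison and indexing use the folded key.
        return name, name.casefold()
    if policy == "case-sensitive":
        # Stored and compared exactly as given.
        return name, name
    raise ValueError(f"unknown policy: {policy}")


for policy in ("case-insensitive", "case-folding", "case-sensitive"):
    print(policy, store("Report.txt", policy))
```

    The difference between the first two is only in the stored name, and the difference between the last two is only in the comparison key.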

    Modern FAT filesystem is of course case-folding, not case-insensitive. That's why I used credit card and passport as legacy case-insensitive examples. Case-folding can't work for them as there is no internationally perfect case rule.

    2. I was born and grew up in East Asia. I can't speak for India, but you seem to have misunderstood the Japanese name part. While there are 3 written forms, "hiragana, katakana and kanji", a proper noun can't use the 3 written forms interchangeably. There is only 1 proper way to write a proper noun. Technically kanji in Japanese can have multiple pronunciations, and people sometimes denote which pronunciation is meant through hiragana or katakana, but that job can also be fulfilled by romaji, the "passport-use A-to-Z only" format.

    For China... they are squeezing people's names into "Simplified Chinese", and I once read a news story where they ruined the family name of some indigenous people / ethnic minority in the new ID cards. The old ID cards let them hand-write the characters, but the new ID cards just changed their surname brutally. Unicode does provide 𪀋 U+2A00B, the "Traditional Chinese" codepoint, in "CJK Unified Ideographs Extension B". But for some stupid reason China neither added the corresponding "Simplified Chinese" codepoint for this character nor let them use the Traditional Chinese codepoint. They just forced the clan into using 鸭 U+9E2D instead. No, the two characters are pronounced differently, so it would have been intolerable in any democratic country.

    So politically, you may as well say there are no "case-folding" problems in China. The government just doesn't care and lets the officials be lazy and do whatever is convenient to them. Outside documents that explicitly discuss Simplified vs Traditional Chinese, no plain text needs to let "Traditional Chinese" and "Simplified Chinese" coexist in the same paragraph or the same article. They are not cases like in Latin alphabets. So pragmatically "case-folding" problems don't exist for Chinese person name records. One can record both the "Traditional Chinese" name and the "Simplified Chinese" name in separate fields. In practice they are indeed recorded in separate fields if both are to be recorded, not chained together in one field.

    Similarly your "14 legal names" case for India isn't really refuting my framework. It only multiplies my "name in native script" and "script language metadata" by at most 14. Whether it is 2 or 14, people still have a preference for which one is their "native" choice, and a system can choose to pick the "native" one only and skip the others.

    3. My "name in native script" field is case-sensitive. The "case rule" in my "script language metadata" is not for case-unfolding the "name in native script". It is for case-folding the "name in native script" when asked to, since case-insensitive text searching is still necessary for practical purposes.



  • oiaohm
    replied
    Originally posted by billyswong View Post
    While you disagree with the terminology and want to call it "case less", it is exactly what "case-insensitive" historical filesystems mean. In DOS days, FAT file names are 8.3 all uppercase. So we are talking about the same thing.
    No. FAT is case insensitive, because in the 8.3 file name both cases are in fact stored. MS-DOS always allowed file names with upper and lower case, then converted them to upper case when comparing; this is early case folding.


    This is why you see stuff like this.

    I like the VxWorks developer there claiming that appearing in upper case is how it should always appear, completely missing that this is in fact wrong, so you are not alone in that mistake.

    It was with the introduction of code pages for regional support in DOS 3.3 that toupper ceased to be a constant conversion. This led to people under MS-DOS using different code pages ending up with two or more of the "same" file on the file system. DOS 3.3 FAT, being early case fold/case insensitive, is also the first time we have a lot of case-fold failures (yes, 1987, before Linux).
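
    To illustrate why an upper-case conversion stops being constant once code pages exist, here is a Python sketch. It is only an approximation: it uses Python's cp437 and cp850 codecs and Python's own Unicode upper-casing as a stand-in for the per-country tables DOS actually used, and the helper name upper_in_code_page is made up.

```python
def upper_in_code_page(raw, code_page):
    """Interpret bytes under a code page, upper-case them, and re-encode."""
    return raw.decode(code_page).upper().encode(code_page)


raw_name = b"\x9b"  # one stored byte, chosen for illustration

for code_page in ("cp437", "cp850"):
    char = raw_name.decode(code_page)
    folded = upper_in_code_page(raw_name, code_page)
    # Same byte on disk, different character, different upper-case result.
    print(code_page, repr(char), "->", folded.hex())
```

    The stored byte never changes, but the comparison key derived from it depends on which code page is active, so two users can disagree about whether two names are the same file.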

    Case-less is like your early CP/M, where everything is stored as upper case and attempting to use lower case is a straight-up system error because it cannot be done. It cannot be done because the char encoding for the file system only had, say, A and no lower-case a value; this is the way case-less OSes are.

    DOS is "case-insensitive", but CP/M, the OS that DOS took lots of inspiration from, is "case-less". The big thing is that on a case-less file system you cannot encode two letter As, one upper and one lower, because there is only one A. One of the selling points of the first version of DOS that was sold to IBM was the means to store lower-case chars in file names, and that DOS would be case-insensitive instead of case-less.

    Originally posted by billyswong View Post
    Ideally no person names would use multiple languages for one name,
    Japan/India/China... yes, you might have one name in one language, but you might have five different ways of writing that one language; then add in places like India with regional dialects and country-wide dialects. We have a mess.

    Basically there are 3.
    1) case-less: this is where there is no method to store upper and lower chars; you only have the option to store one letter A and so on.
    2) case-insensitive: this is case folding, where you have the means to store upper and lower, but in processing upper and lower are meant to be equal to each other; this has always failed once multiple languages get in the mix.
    3) case-sensitive: this is the means to store upper and lower with no case-folding processing.

    Case-less OSes mostly disappear with the start of MS/PC DOS in 1981, with case-insensitive OSes becoming common in business thanks to IBM/Microsoft's success with the PC.

    Originally posted by billyswong View Post
    ​1) passport-use A-to-Z only name, 2) name in native script, 3) script language metadata that restrict the character range and also describe the case rules, in other words, some modern form of code page.
    Once you get to Japan this does not work. This is why those doing credit cards/passports started giving up. Japanese has three different written forms, "hiragana, katakana and kanji", and the person could have an English name as well. So their name can be written four times on their birth certificate in Japan. Please note this is still a simple example. Areas of India get really nightmarish, where a person can have up to 14 legal names on their birth certificate.

    In China you also have the problem of multiple names on the birth certificate, because there is more than one way of writing the language. Scarily enough, 70% of the human population does not fit the model you just described. Of course China is not as bad as India or Japan, where a person can legally have different names written in A to Z.

    There are some very detailed studies, costing millions each, where parties looking at making universal passport and credit card systems end up totally giving up.

    When everything starts falling apart and you have only done three countries, Japan, India and China, and two of those countries cover most of the human population, it becomes clear we have made one hell of a mess.

    Originally posted by billyswong View Post
    But for file and folder names? Adding such one-language-only restriction will be totally unreasonable. A simple Report_coauthored_by_person_A_and_person_B.pdf will break it in no time.
    The reality here is that no matter how you do case folding, it is going to fail. Think for a minute about dealing with a person with four different legal names.

    The more you get into this problem, the more the possibility of specialist folding comes up. In some areas that store reports you may want specialist folding to de-duplicate reports, or to make lookups work when you search by a person's other given name rather than the name on the report.

    The upper/lower case issue that case folding targets is really only the tip of a very big iceberg: the mess we humans have created with the different encodings of the written forms of a single spoken language.

    Case sensitive is the easy way for OS design, because you throw the problem onto the operator to solve themselves.
