
Casefolding For Bcachefs File-System Posted


  • #81
    Originally posted by oiaohm
    Tamil has its own alphabet, but Tamil also has an encoding that uses a-z and A-Z plus a few extra ASCII chars. Remember that in the original Tamil alphabet there is no such thing as upper case or lower case.

    The Tamil chart has entries like Koo then kau.

    This is the problem: it is valid to write Tamil using just a-z/A-Z. The a-z/A-Z plus a few extra ASCII chars scheme came about because the early printing presses and typewriters did not have Tamil chars, so an encoding was developed using the chars that the typewriters and printing presses had.

    Tamil is a fun language where there are particular chars, tied to how the alphabet is organized, that should never appear in words or ever be spoken.
    I don't see the issue here. I may need you to elaborate more on it; the chart supplied doesn't show where case makes an important distinction.

    Now it gets trickier: countries that use Tamil also use English. This is where the locale setting starts coming apart. For a Tamil-using person it is totally valid to have a mix of Tamil documents and English documents on their computer. Case folding on English document names is useful. Case folding on Tamil document names written using ASCII chars will end up matching completely wrong documents, and is also totally wrong because Tamil is a caseless language.
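
    A minimal Python sketch of the mismatch described above. The file names are made up for illustration, and it assumes an ASCII romanization in which capital `L` and lowercase `l` stand for different Tamil letters; `find_casefold_collisions` is a hypothetical helper, not part of any file system or library.

```python
from collections import defaultdict


def find_casefold_collisions(names):
    """Group names that become identical after Unicode case folding."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the folded keys that more than one original name maps to.
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical names: two English ones where folding is wanted, and two
# romanized ones where the capital letters carry meaning (an assumption).
names = ["Report.txt", "report.txt", "paLLi.txt", "palli.txt", "notes.txt"]

collisions = find_casefold_collisions(names)
for key, group in sorted(collisions.items()):
    print(key, "<-", group)
```

    For the English pair the collision is the desired behaviour. For the romanized pair it would merge two names that the writer meant to be different.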


    This is taken from a valid locale of an Australian English speaker. Notice there are 3 languages set. Yes, it is valid to set a mixture of languages in LANGUAGE. If a person has in LANGUAGE, let's say, English and Tamil written using English chars, how should you now case fold?

    This is not a simple problem. A locale does not mandate that you pick just one language, or that you pick languages whose case folding is compatible with each other.
    no it doesn't, just disable case folding if it's needed. pretty simple to me.


    • #82
      Originally posted by Quackdoc
      I don't see the issue here. I may need you to elaborate more on it; the chart supplied doesn't show where case makes an important distinction.
      I did point to one: Koo then kau in the one line. Without having to start explaining that whole chart: it has the English alternative and the correct chars in a mapping table.

      There is the Ko from before, but there is also the lower-case ko that comes out of the bottom table, which is a completely different char. Tamil encoded into Latin/Roman chars just has too many chars not to have these overlaps. Tamil is not the only language with this problem. Abusing the Latin/Roman chars this way is fine while there is no case folding, and it was fine for a very long time before computers.

      Notice you just looked at the Tamil char set and did not see the problem. Please note that the Tamil site there gave you one of the neatest-to-read alphabets of all the problem-child languages. A horrible number of letters with a horrible number of mappings into Latin/Roman chars makes it really simple to miss that the language cannot be case folded, particularly as they end up doing things to try to compact the charts.

      Originally posted by Quackdoc
      no it doesn't, just disable case folding if it's needed. pretty simple to me.
      Remember that a person whose language mix is not case-fold compatible as a complete set, yet includes a case-fold compatible language, can be working with a person who depends on case folding working. Think of the git issues that happen because git is case sensitive and Windows users have case folding.

      Locale does not work as a means to enable/disable case folding in lots of cases. You end up needing per-folder controls when the user is in a locale with mixed languages. Yes, for a sane workflow with others, the person with mixed languages needs to be able to enable case folding in particular locations and not others, just so they don't cause problems for the people they are working with. It's really simple to forget that those doing translations and the like operate a mixed-language OS.

      The per-folder enabling and disabling of case folding is the correct method for those who have a mix of languages in use, some of which are case-fold compatible and some of which are not.
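
      To make the per-folder idea concrete, here is a toy Python model, not any real file-system API. The `Directory` class and its `casefold` flag are hypothetical names for illustration; the only assumption is that each folder decides for itself whether names are compared after case folding.

```python
class Directory:
    """Toy directory whose name matching depends on a per-folder flag."""

    def __init__(self, casefold=False):
        self.casefold = casefold
        self._entries = {}  # lookup key -> name as originally created

    def _key(self, name):
        # Fold the name only when this particular folder asks for it.
        return name.casefold() if self.casefold else name

    def create(self, name):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name):
        # Returns the stored name, or None when nothing matches.
        return self._entries.get(self._key(name))


# Folder shared with people who rely on case folding (hypothetical names).
shared = Directory(casefold=True)
shared.create("Report.txt")

# Folder for names where letter case carries meaning.
private = Directory(casefold=False)
private.create("paLLi.txt")
private.create("palli.txt")

print(shared.lookup("REPORT.TXT"))
print(private.lookup("PALLI.TXT"))
```

      In this model the same user keeps both folders side by side: the shared one matches names regardless of case, while the private one keeps two names apart that differ only in case.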

      Also remember that with Linux someone can install the OS and then decide to change the locale. This brings its own set of problems: how do you allow the user to change the case-fold setting on their home directory? Core Linux being case sensitive makes sense, since a person might, before changing their locale, download software that has translations and documentation in a case-fold incompatible language.