EXT4 Case Insensitive Support Sent In For The Linux 5.2 Kernel


  • oiaohm
    replied
    Originally posted by chithanh
    So all three lowercase versions must be able to coexist in the same directory, while in a theoretical homoglyph-folding filesystem the uppercase versions cannot. But what happens if you now open a file using one of the uppercase characters? By which criteria are you going to decide which one to open?
    Full casefolding also folds lowercase characters to a common lowercase form. So no, not all three lowercase versions have to be able to exist in a directory at the same time: basically, whichever one exists first wins. And yes, this is why there is an option to turn casefolding on only for particular directories.

    This is why I mostly refer to this as casefolding rather than as case insensitivity.

    Basically, go look at page 3 of the PDF I put up. It casefolds lowercase characters only; there are no uppercase characters there.
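To make the "first one wins" behavior concrete, here is a toy Python model of a directory whose casefold switch can be turned on or off per directory. `Directory`, `create` and `try_create_all` are hypothetical names, and the folding uses Python's built-in `str.casefold()` (full Unicode casefolding) as a stand-in for ext4's in-kernel tables. On a real ext4 filesystem the per-directory switch is the casefold inode flag (`chattr +F`), available once the filesystem has the casefold feature enabled.

```python
class Directory:
    """Toy directory model (hypothetical, not ext4's real data structures).

    When casefold is True, names are indexed by their casefolded form, so two
    names that fold to the same key cannot coexist: the first one created
    wins and later ones fail with FileExistsError.
    """

    def __init__(self, casefold: bool) -> None:
        self.casefold = casefold
        self._entries: dict[str, str] = {}  # lookup key -> name as created

    def _key(self, name: str) -> str:
        # Only casefolding directories fold the lookup key.
        return name.casefold() if self.casefold else name

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(f"{name!r} collides with {self._entries[key]!r}")
        self._entries[key] = name

    def names(self) -> list[str]:
        return list(self._entries.values())


def try_create_all(directory: Directory, names: list[str]) -> list[str]:
    """Create each name in order, silently skipping names that collide."""
    for name in names:
        try:
            directory.create(name)
        except FileExistsError:
            pass
    return directory.names()


# GREEK SMALL LETTER RHO, LATIN SMALL LETTER P, GREEK RHO SYMBOL
candidates = ["\u03c1", "p", "\u03f1"]

print(try_create_all(Directory(casefold=False), candidates))  # all three coexist
print(try_create_all(Directory(casefold=True), candidates))   # U+03F1 folds to U+03C1 and is rejected
```

With casefolding off, ρ, p and ϱ all coexist; with it on, ϱ (U+03F1) folds to ρ (U+03C1) and is rejected because ρ was created first, while p stays distinct.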



  • chithanh
    replied
    Originally posted by oiaohm
    This is interesting. It is a different application of casefolding.
    I think it is very similar. The case folding there is used for string matching (e.g. a user enters a search string and expects it to match regardless of case). And the proposed mechanisms are also similar to what the Linux Plumbers Conference PDF proposes for matching in encrypted directories (normalize, then match).
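The "normalize, then match" idea can be sketched with Python's standard `unicodedata` module. This follows the Unicode Standard's definition of canonical caseless matching, NFD(casefold(NFD(s))); it is a general illustration, not the exact algorithm from the Linux Plumbers PDF or from ext4, and the function names are hypothetical.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    """NFD(casefold(NFD(s))): canonical caseless form as defined by the Unicode Standard."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def names_match(a: str, b: str) -> bool:
    """Two names match if their canonical caseless forms are identical."""
    return canonical_caseless_key(a) == canonical_caseless_key(b)


precomposed = "Caf\u00e9"  # 'é' as a single code point (U+00E9)
decomposed = "CAFE\u0301"  # 'E' followed by COMBINING ACUTE ACCENT (U+0301)

print(precomposed == decomposed)             # False: different code point sequences
print(names_match(precomposed, decomposed))  # True: equal after normalizing and folding
```

Normalizing first matters because casefolding alone would still leave the precomposed and decomposed accents as different code point sequences.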

    Originally posted by oiaohm
    You have to remember that the filesystem uses a name-preserving form of Unicode casefolding. So yes, it does extend to homoglyphs.
    No, homoglyphs cannot be folded in filesystems in principle.
    Take for example Ρ (U+03A1 "GREEK CAPITAL LETTER RHO") which is a homoglyph to P (U+0050 "LATIN CAPITAL LETTER P") in uppercase, but when you apply lowercase mappings to them, they become ρ (U+03C1) and p (U+0070) which are no longer homoglyphs. As an added difficulty, there is ϱ (U+03F1) which has an uppercase mapping of U+03A1.

    So all three lowercase versions must be able to coexist in the same directory, while in a theoretical homoglyph-folding filesystem the uppercase versions cannot. But what happens if you now open a file using one of the uppercase characters? By which criteria are you going to decide which one to open?
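The mappings quoted above can be checked with Python's built-in `str.lower()`, `str.upper()` and `str.casefold()`, which apply the full Unicode case mappings of whatever Unicode version the interpreter ships. The helper `describe` is a hypothetical name used only for this illustration.

```python
import unicodedata

# The two uppercase homoglyphs, then the three lowercase letters from the example.
SAMPLES = ["\u03a1", "P", "\u03c1", "p", "\u03f1"]


def describe(ch: str) -> str:
    """Show a character's lowercase, uppercase and casefold mappings as code points."""
    def cps(s: str) -> str:
        return " ".join(f"U+{ord(c):04X}" for c in s)
    return (f"{cps(ch)} {unicodedata.name(ch)}: lower={cps(ch.lower())} "
            f"upper={cps(ch.upper())} casefold={cps(ch.casefold())}")


for ch in SAMPLES:
    print(describe(ch))
```

The output shows Ρ (U+03A1) and P (U+0050) folding to different characters, so casefolding never makes those homoglyphs match, while ϱ (U+03F1) uppercases to U+03A1 and casefolds to ρ (U+03C1).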

    Originally posted by oiaohm
    The "well known 'ß' example" is not in fact a problem for name-preserving, filesystem-level casefolding.
    ß (U+00DF) is actually another problem, because it has no simple uppercase mapping, while ẞ (U+1E9E) has a lowercase mapping of U+00DF.
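A short Python check of the ß behavior. `str.casefold()` implements full casefolding, which maps both ß and ẞ to "ss"; a simple (one-to-one) fold per CaseFolding.txt would map ẞ to ß but leave ß unchanged, so "STRASSE" would not match. `SPELLINGS` and `distinct_keys` are hypothetical names for this illustration.

```python
from collections.abc import Callable

SPELLINGS = ["stra\u00dfe", "STRA\u1e9eE", "STRASSE"]  # straße, STRAẞE, STRASSE


def distinct_keys(names: list[str], key: Callable[[str], str]) -> int:
    """Count how many distinct lookup keys the names collapse to under a mapping."""
    return len({key(n) for n in names})


print("\u00df".upper())                        # SS: the full uppercase mapping changes length
print("\u1e9e".lower() == "\u00df")            # True: ẞ lowercases to ß
print(distinct_keys(SPELLINGS, str.lower))     # 2: "straße" and "strasse" stay different
print(distinct_keys(SPELLINGS, str.casefold))  # 1: all three fold to "strasse"
```

Lowercasing alone is not enough here; only the full fold makes all three spellings equivalent.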



  • oiaohm
    replied
    Originally posted by chithanh
    Are you talking about Unicode homoglyphs? Because casefolding does not address those.
    Also I am pretty sure that there is no full casefolding going on here, because that would cause more issues like the well known 'ß' example (cf. https://www.w3.org/TR/charmod-norm/#example-5).
    This is interesting. It is a different application of casefolding.


    Do look at page 3.

    You have to remember that the filesystem uses a name-preserving form of Unicode casefolding. So yes, it does extend to homoglyphs.

    "Because some applications cannot allocate additional storage when performing a case fold operation"
    This problem does not happen here. Name-preserving means that whatever string the application used to request the file stays exactly the same, so there is no need to alter buffers on the application side. In the ext4 case, casefolding affects how the filesystem decides whether file name string X matches file Y on the filesystem. There is no requirement for a filesystem to have a one-to-one mapping between file names and files: think of classic hard links, where multiple file names share the same file contents. Ext4 casefolding is really like automatic hard-linking. To an application there is no effective difference between the 4 files on page 3 of the PDF all being hard links to the same file and ext4 casefolding making them 1 file.

    So yes, since ext4 applies casefolding in the file lookup process rather than to the application's strings, it can be a full Unicode casefold without causing any direct issues. There are also plans to allow language-specific settings to be stored in the directory information.

    The "well known 'ß' example" is not in fact a problem for name-preserving, filesystem-level casefolding.
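A toy Python model of the name-preserving lookup described above: the directory stores each name exactly as it was created and only folds strings when comparing them. `CasefoldDir` and its methods are hypothetical, and `str.casefold()` stands in for ext4's in-kernel Unicode tables.

```python
class CasefoldDir:
    """Hypothetical name-preserving, case-insensitive directory (toy model, not ext4 code).

    The name is stored exactly as the creator spelled it; only the lookup key is folded.
    """

    def __init__(self) -> None:
        self._by_key: dict[str, tuple[str, int]] = {}  # folded name -> (stored name, inode)
        self._next_ino = 1

    def create(self, name: str) -> int:
        key = name.casefold()
        if key in self._by_key:
            raise FileExistsError(name)
        ino = self._next_ino
        self._next_ino += 1
        self._by_key[key] = (name, ino)
        return ino

    def lookup(self, name: str) -> tuple[str, int]:
        """Return (stored name, inode) for any spelling that folds to the same key."""
        try:
            return self._by_key[name.casefold()]
        except KeyError:
            raise FileNotFoundError(name) from None


d = CasefoldDir()
d.create("Stra\u00dfe.txt")
for spelling in ["Stra\u00dfe.txt", "STRASSE.TXT", "strasse.txt"]:
    # Each spelling resolves to the same inode, like several hard links to one
    # file, and the caller's own string is never modified.
    print(spelling, "->", d.lookup(spelling))
```

Because the stored name and the caller's string are both left untouched, the full ß to "ss" fold only ever happens on temporary lookup keys.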




  • chithanh
    replied
    Originally posted by oiaohm
    The feature is casefolding, so it's not just plain case insensitivity. Casefolding also includes making characters from different keyboard layouts that look basically the same match successfully, instead of them being distinct Unicode values.
    Are you talking about Unicode homoglyphs? Because casefolding does not address those.
    Also I am pretty sure that there is no full casefolding going on here, because that would cause more issues like the well known 'ß' example (cf. https://www.w3.org/TR/charmod-norm/#example-5).




  • oiaohm
    replied
    Originally posted by Orphis
    That's not going to happen with any Windows SDK as that isn't allowed on case insensitive filesystems where they were created.
    Do not underestimate how badly SDK update patches can go wrong. I remember this from doing this stuff under Wine on case-sensitive filesystems: it has happened before that bad SDK update patches, designed for case-insensitive filesystems, did not turn out right when applied on case-sensitive ones.



  • Orphis
    replied
    That's not going to happen with any Windows SDK as that isn't allowed on case insensitive filesystems where they were created.
    If you had two files with conflicting names in different folders, then the same rules that apply on case-sensitive filesystems apply. At best, it just won't build, just as would happen on a proper case-insensitive filesystem. So there is absolutely no conflict there to worry about.

    It's about importing data from a case-insensitive filesystem onto a potentially case-sensitive one and keeping compatibility after changes, not the other way around (that's already a solved problem).



  • oiaohm
    replied
    Originally posted by Orphis
    Another interesting use case is cross-compiling Windows programs using Clang.
    Most SDKs targeting Windows development have issues with casing. They can partly be worked around using a VFS in Clang itself, but you will end up having issues when linking (linker doesn't support the VFS).
    The VFS inside Clang suffers from the same problem as Wine and Samba: what to do when files Text.h and tEXT.h both actually exist and the code says #include "text.h". So this is another half-fix that does not fix everything.

    A casefolding filesystem fixes this for both the linker and the compiler, and prevents you from ending up in that unsolvable conflict.
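A small Python sketch of the Text.h / tEXT.h conflict: emulating case-insensitive lookup on top of a case-sensitive directory listing, roughly the situation a compatibility layer is in. `include_candidates` and `resolve_include` are hypothetical helpers, not Clang, Wine or Samba APIs.

```python
def include_candidates(requested: str, listing: list[str]) -> list[str]:
    """All entries in a case-sensitive listing that match the request ignoring case."""
    return sorted(e for e in listing if e.casefold() == requested.casefold())


def resolve_include(requested: str, listing: list[str]) -> str:
    """Return the unique case-insensitive match, or raise if there is none or several."""
    matches = include_candidates(requested, listing)
    if not matches:
        raise FileNotFoundError(requested)
    if len(matches) > 1:
        # Both files legitimately exist on a case-sensitive filesystem, so there
        # is no principled way to choose one of them.
        raise LookupError(f"{requested!r} is ambiguous: {matches}")
    return matches[0]


print(resolve_include("text.h", ["Text.h", "util.h"]))  # Text.h
try:
    resolve_include("text.h", ["Text.h", "tEXT.h", "util.h"])
except LookupError as err:
    print(err)
```

In a casefolding directory the second listing could not exist in the first place, because creating tEXT.h next to Text.h would already have failed.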



  • rene
    replied
    Originally posted by starshipeleven
    don't forget to smash that like button and subscribe, click also the bell icon to be notified of any new content.
    thanks!



  • Orphis
    replied
    Another interesting use case is cross-compiling Windows programs using Clang.
    Most SDKs targeting Windows development have issues with casing. They can partly be worked around using a VFS in Clang itself, but you will end up having issues when linking (linker doesn't support the VFS).



  • starshipeleven
    replied
    Originally posted by rene
    Oh, wait, did you say micro kernel? ;-) https://www.youtube.com/watch?v=g85yri1kfJo
    don't forget to smash that like button and subscribe, click also the bell icon to be notified of any new content.

