Proposed Reflink Support Would Provide Big Space Savings For Wine


  • #81
    Originally posted by oiaohm View Post

    https://blogs.oracle.com/solaris/pos...w-can-i-use-it

    Reflink works on Oracle Solaris ZFS at pool version 38 and earlier. What I linked to shows that the first release of Oracle Solaris 11.4 has reflink on ZFS.

    https://blogs.oracle.com/oracle-systems/post/dedupe-20 Pool version 40, which you said it most likely is, is not this; that is a new deduplication data table. You don't need version 40, or even 38, of Oracle ZFS to use reflink.



    I did not say the patch to use it would be simple. The reality is that if you go through all the changes to Oracle ZFS, nothing was ever added specifically for reflink.



    You can do this in Oracle Solaris 11.3 by putting each VM in its own ZFS file system or ZVOL, and using ZFS snapshot and ZFS clone to create new ones. In Oracle Solaris 11.4 Beta you can achieve the same thing using reflink(3c).

    The Solaris 11.4 beta that supports reflink in fact supports pools from before version 38.

    How Oracle did it comes down to understanding the bookkeeping. The trick is that the disk format supports copy on write, which is the key piece you need for reflink to work.

    Think about it: you can do a hardlink on a FAT filesystem even though FAT does not officially support hard links; there it is just a matter of writing a directory entry. A reflink is really just writing a directory entry with the correct metadata saying the data has already been deduplicated.

    So instead of looking for how to do a reflink, you should be looking at how to write to ZFS that you have just created a file whose blocks are duplicates of another file's blocks; that is the basis of a reflink.

    There is a difference between what the documentation of a disk format says can be done and what you can actually do when you understand the filesystem's metadata. Yes, the OpenSolaris ZFS disk format does not document that it supports reflink, and neither do current Oracle ZFS versions.

    ryao, when you get down to brass tacks, reflink is not a disk-format feature. Reflink is something you can pull off whenever a filesystem can copy on write; it is a side effect of being able to do data block sharing with copy on write. The reality is that if you can perform deduplication at all, you can make reflink.

    The reality is that when you deduplicate two files with exactly the same contents on disk, that process makes a reflink even if you don't have a reflink syscall. When you make a snapshot you copy everything, then deduplicate to get back to reflinks.

    ryao, the reality is that ZFS has reflinks being generated all over the place by snapshots, clones and deduplication. What is missing is a syscall to simply make a reflink of a file without having to run deduplication. What you want is for the result of a reflink syscall to look the same as two files that have been deduplicated against each other.

    There is no need to be able to tell that a file was reflinked in the past. That is a feature of the Oracle ZFS reflink implementation: you cannot tell what was merged by deduplication and what was created by reflink. Because there is no difference, the Oracle Solaris reflink syscall is very backward compatible with older versions of ZFS.

    if you think that, then I look forward to your patch. The disk format does not support it and it needed to be extended to do that. The Solaris extension could have been ninja added to an earlier pool version than pool version 40 without being documented. However, it will not work on pool version 28 from OpenSolaris. The only way the userland tools from Solaris 11.4 would enable it to “work” there would be by doing a full data copy as a fallback.

    If you told me that you had a realistic dream and had mistakenly insisted that your dream was reality, I would believe that. That is the most charitable interpretation that I have of your insistence that the OpenSolaris ZFS code has reflink support without any hard evidence for it when if it really did, then hard evidence would be easy to get from the OpenSolaris source code. Your insistence that it has reflink support combined with your inability to produce the specific OpenSolaris source code that enables reflinks to be possible when the code itself has no such functionality is making me recall my college introduction to psychology class. If you really believe this without actual evidence from the OpenSolaris source code, seek a psychiatric evaluation.
    Last edited by ryao; 18 September 2022, 09:35 PM.



    • #82
      Originally posted by ryao View Post
      if you think that, then I look forward to your patch. The disk format does not support it and it needed to be extended to do that. The Solaris extension could have been ninja added to an earlier pool version than pool version 40 without being documented. However, it will not work on pool version 28 from OpenSolaris. The only way the userland tools from Solaris 11.4 would enable it to “work” there would be by doing a full data copy as a fallback.
      I will grant that I made an error: if you use the early Solaris 11.4 beta on earlier pools it will work, but it uses what was in the post I linked, exploiting snapshots and clones.

      The 11.4 reflink functionality works on pools from 11.3 and before without updating their pool versions.

      The reality is that ZFS in operation is in fact making reflinks, and has been making them for a long time.

      Since November 1st, 2009, when ZFS Deduplication was integrated into OpenSolaris (no link, genunix.org no longer exists), a lot has happened: We learned how it worked, people got to play with it, used it in production and it became part of the Oracle Sun Storage 7000 Unified Storage System …


      Yes, this is from back before pool version 28. What metadata are you generating when you deduplicate a file? Say you put two files with exactly the same contents under two different names and the filesystem deduplicates them: what metadata have you just made? That's right, a basic reflink.

      Basically, what you need the reflink syscall to do is say: this new file is a 100 percent copy of this other file, so you don't need to process the blocks to work out that it is a duplicate; just produce the on-disk metadata as if it were a duplicate file that had been deduplicated.

      ryao, the reality is that you cannot have a filesystem with deduplication support and copy-on-write support that does not have reflink. It might just not be called reflink.

      ryao, sorry, but every time Oracle has added a feature to the ZFS on-disk format they have documented what it is for and what uses it. They never added anything to ZFS for reflink; it does not need it. Yes, the OpenSolaris code has the reflink code: it is used in clone/snapshot/deduplication, it is just not called reflink and not exposed as a syscall. Solaris 11.4 takes the existing functionality for internally deduplicating files and gives it a reflink syscall; they did not add functionality to the filesystem. Why add functionality if the functionality is already there?

      Ryao, you really need to answer: what is the difference between a reflink and a deduplicated file on ZFS? A big part of the question is whether there needs to be a difference. Oracle ZFS evidently believed not: their reflink just writes the same metadata on disk as if the file had been deduplicated, and that is functionally equal to how a reflink should behave.

      Yes, if you send the same file to an older version of ZFS with deduplication enabled, the result is that it deduplicates and produces the same result on disk as if you had used reflink, just with a lot more CPU overhead.

      Originally posted by ryao View Post
      The only way the userland tools from Solaris 11.4 would enable it to “work” there would be by doing a full data copy as a fallback.
      When the Solaris 11.4 reflink puts the same metadata on disk as an earlier version would have after deduplicating a full data copy, then apart from the CPU overhead of detecting the duplication, you get the same result on disk.

      ryao, a full data copy followed by in-filesystem deduplication on ZFS is just the very long way to produce the Oracle Solaris 11.4 ZFS reflink metadata on disk on older versions of Solaris.

      Note that you said "the only way". There was only one way it could happen, and that one way points you to the deduplication system. Ryao, you are not considering that the deduplication system is making reflinks; you think reflink is some special thing the ZFS filesystem does not have, so you are not looking to see whether it already has it. The deduplication system of ZFS is making something that is a reflink. The trick is working out how to go from one file on disk to two files on disk without having to perform the data compares to get the metadata generated.

      Basically, take pool version 28, put two identical files on it, and deduplicate: that gives you the on-disk metadata you are after from reflink. The trick is working out how to put one file on disk, call the reflink syscall, and generate the same metadata. That way you don't need disk-format changes.

      ryao, the reality is that with ZFS it has been possible to make reflinks for ages; you just have to use a long, very CPU-expensive way to get one generated. 11.4 shortcuts that.
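      The equivalence being argued for here can at least be sketched in the abstract. Below is a toy model in Python (every name is invented; this is not ZFS code) showing that writing the same data twice on a deduplicating pool and writing it once plus a metadata-only reflink can land in the same block/refcount end state:

```python
# Toy model of copy-on-write block sharing. Invented names; not ZFS internals.
import hashlib

class ToyPool:
    def __init__(self):
        self.blocks = {}    # checksum -> block data (stored once)
        self.refcount = {}  # checksum -> number of references
        self.files = {}     # file name -> list of block checksums

    def _store(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:   # dedup: identical blocks stored once
            self.blocks[key] = data
            self.refcount[key] = 0
        self.refcount[key] += 1
        return key

    def write_file(self, name, data, blocksize=4):
        chunks = [data[i:i + blocksize] for i in range(0, len(data), blocksize)]
        self.files[name] = [self._store(c) for c in chunks]

    def reflink(self, src, dst):
        # Metadata-only copy: bump refcounts, never touch the data itself.
        self.files[dst] = list(self.files[src])
        for key in self.files[dst]:
            self.refcount[key] += 1

dedup_pool = ToyPool()                      # dedup path: write the data twice
dedup_pool.write_file("one", b"samedata")
dedup_pool.write_file("two", b"samedata")

reflink_pool = ToyPool()                    # reflink path: write once, share
reflink_pool.write_file("one", b"samedata")
reflink_pool.reflink("one", "two")

assert dedup_pool.refcount == reflink_pool.refcount  # same end state
```

      In this toy, the dedup path hashes every incoming block while the reflink path only touches metadata, which mirrors the CPU-cost difference being described.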



      • #83
        Originally posted by oiaohm View Post

        I will grant that I made an error: if you use the early Solaris 11.4 beta on earlier pools it will work, but it uses what was in the post I linked, exploiting snapshots and clones.

        The 11.4 reflink functionality works on pools from 11.3 and before without updating their pool versions.

        The reality is that ZFS in operation is in fact making reflinks, and has been making them for a long time.

        https://constantin.glez.de/2010/03/1...you-need-know/

        Yes, this is from back before pool version 28. What metadata are you generating when you deduplicate a file? Say you put two files with exactly the same contents under two different names and the filesystem deduplicates them: what metadata have you just made? That's right, a basic reflink.
        ZFS data deduplication is not a reflink. When you write an aligned record in file A, and the same aligned record in file B, on a dataset with deduplication turned on, it will be deduplicated. Reflinks operate at file creation, not after file creation. Furthermore, a reflink does not receive a complete copy of the file data from userland and deduplicate it. Instead, it performs operations on metadata to reference the existing data. There are subtle bugs in edge cases that would be introduced if you tried to abuse this to implement reflinks. It is just not possible to do in a sane way.

        Originally posted by oiaohm View Post
        Basically, what you need the reflink syscall to do is say: this new file is a 100 percent copy of this other file, so you don't need to process the blocks to work out that it is a duplicate; just produce the on-disk metadata as if it were a duplicate file that had been deduplicated.
        The only way for this to be safe would be if a dataset is created with `dedup=on` and `dedup=off` is *NEVER* set on it. Then it would be trivial to implement reflinks via the DDT. However, we support setting `dedup=off`. The moment you do that, any writes will write blocks that have pointers that do not have the dedup bit set. This means that suddenly, the mechanism to implement reflinks does not work anymore (unless you do data copies, which is not what reflinks advertise), since if you reference the existing data in the DDT when that bit is not set (and it must be set in *EVERY* involved block pointer), undefined behavior can happen later, which means data corruption. This is on top of deduplication being slow on decent-sized datasets.
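        The failure mode described above can be made concrete with a toy model (invented names, deliberately simplified; not ZFS internals): a block written with dedup=off has no DDT entry, so a metadata-only clone of it leaves the second reference invisible to the free path:

```python
# Toy sketch of why DDT-based reflinks need the dedup bit. Invented names.
class ToyDataset:
    def __init__(self):
        self.disk = {}     # block id -> data
        self.ddt = {}      # block id -> refcount (dedup-bit blocks only)
        self.files = {}    # name -> (block id, dedup bit)
        self.next_id = 0

    def write(self, name, data, dedup):
        bid, self.next_id = self.next_id, self.next_id + 1
        self.disk[bid] = data
        if dedup:
            self.ddt[bid] = 1          # only dedup writes get a DDT entry
        self.files[name] = (bid, dedup)

    def naive_reflink(self, src, dst):
        bid, dedup = self.files[src]
        if dedup:
            self.ddt[bid] += 1         # safe: the free path consults the DDT
        self.files[dst] = (bid, dedup) # unsafe otherwise: no refcount at all

    def delete(self, name):
        bid, dedup = self.files.pop(name)
        if dedup:
            self.ddt[bid] -= 1
            if self.ddt[bid] == 0:
                del self.disk[bid]
        else:
            del self.disk[bid]         # frees the block unconditionally

ds = ToyDataset()
ds.write("a", b"data", dedup=False)     # written while dedup=off
ds.naive_reflink("a", "b")
ds.delete("a")                          # free path sees no DDT entry...
assert ds.files["b"][0] not in ds.disk  # ..."b" now points at freed space

ds.write("c", b"data", dedup=True)      # with the dedup bit, it is safe
ds.naive_reflink("c", "d")
ds.delete("c")
assert ds.disk[ds.files["d"][0]] == b"data"
```

        The dangling reference in the first half is the toy analogue of the data corruption described above.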

        Originally posted by oiaohm View Post
        ryao, the reality is that you cannot have a filesystem with deduplication support and copy-on-write support that does not have reflink. It might just not be called reflink.
        You likely will not find any professional filesystem developer who agrees with that statement.

        Originally posted by oiaohm View Post
        ryao, sorry, but every time Oracle has added a feature to the ZFS on-disk format they have documented what it is for and what uses it. They never added anything to ZFS for reflink; it does not need it. Yes, the OpenSolaris code has the reflink code: it is used in clone/snapshot/deduplication, it is just not called reflink and not exposed as a syscall. Solaris 11.4 takes the existing functionality for internally deduplicating files and gives it a reflink syscall; they did not add functionality to the filesystem. Why add functionality if the functionality is already there?
        ZFS data deduplication cannot be used to write deduplicated data unless the data being deduplicated was already written with it on. If you modify the ZFS source code to ignore that bit and start implementing reflinks via that, you create a problem when the pool is moved to an older system. In specific, data corruption could be triggered from entries in the DDT not being decremented on frees.

        Originally posted by oiaohm View Post
        Ryao, you really need to answer: what is the difference between a reflink and a deduplicated file on ZFS? A big part of the question is whether there needs to be a difference. Oracle ZFS evidently believed not: their reflink just writes the same metadata on disk as if the file had been deduplicated, and that is functionally equal to how a reflink should behave.
        They needed to modify the disk format to do it safely. Otherwise, they would have introduced potential data corruption bugs. The only other option would be to use block pointer rewrite to make it work, but that is a unicorn that would perform terribly.

        Originally posted by oiaohm View Post
        When the Solaris 11.4 reflink puts the same metadata on disk as an earlier version would have after deduplicating a full data copy, then apart from the CPU overhead of detecting the duplication, you get the same result on disk.
        Then Solaris 11.4 either has achieved BPR or it has introduced horrible data corruption bugs into its proprietary version of ZFS. It likely did a disk format change instead.

        Originally posted by oiaohm View Post
        Basically, take pool version 28, put two identical files on it, and deduplicate: that gives you the on-disk metadata you are after from reflink.
        That is not a reflink operation, but a full data copy post deduplication.

        Originally posted by oiaohm View Post
        The trick is working out how to put one file on disk, call the reflink syscall, and generate the same metadata. That way you don't need disk-format changes.

        The disk format changes are necessary. Either you need BPR to rewrite history so that the referenced file has the dedup bit set on every block pointer, which is slow, or you introduce horrible data corruption bugs into ZFS, to do reflinks the way you think that they can be done.

        Originally posted by oiaohm View Post
        ryao, the reality is that with ZFS it has been possible to make reflinks for ages; you just have to use a long, very CPU-expensive way to get one generated. 11.4 shortcuts that.
        Those are not reflinks. Other filesystem developers would not consider them to be reflinks either.

        That said, if you think those are reflinks, then it would mean that you could get "reflinks" in any filesystem that aligns its data at 4KB offsets by putting them on a zvol with volblocksize=4K and dedup=on. Suddenly, ext2, ext3 and ext4 all support reflinks. Performance would be terrible and the space savings would not be given to the filesystem on top until you expand it manually, but at least the filesystem supports "reflinks".
        Last edited by ryao; 19 September 2022, 12:56 AM.



        • #84
          Originally posted by ryao View Post
          ZFS data deduplication is not a reflink. When you write an aligned record in file A, and the same aligned record in file B, on a dataset with deduplication turned on, it will be deduplicated. Reflinks operate at file creation, not after file creation. Furthermore, a reflink does not receive a complete copy of the file data from userland and deduplicate it. Instead, it performs operations on metadata to reference the existing data. There are subtle bugs in edge cases that would be introduced if you tried to abuse this to implement reflinks. It is just not possible to do in a sane way.
          This is a stack of mistakes. With the reflink syscall you don't have a copy of the data at all.
          reflink , reflinkat - fast copy source file to destination file The reflink() function creates the file named by path2 with the contents of the file named by path1...

          Code:
          int reflink(const char *path1, const char *path2, int preserve);
          int reflinkat(int fd1, const char *path1, int fd2, const char *path2, int preserve, int flags);
          That's true: reflink creation simply presumes a perfect match all the way along the file.
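          For comparison, Linux exposes the same operation through the FICLONE ioctl (which is what GNU `cp --reflink` uses), and it too is allowed to fail when the filesystem cannot share blocks. A minimal sketch with a plain-copy fallback; the FICLONE value comes from <linux/fs.h> and the helper name is my own:

```python
# Reflink on Linux via the FICLONE ioctl, falling back to a real copy.
import fcntl
import shutil

FICLONE = 0x40049409  # _IOW(0x94, 9, int), from <linux/fs.h>

def reflink_or_copy(src_path, dst_path):
    """Try a metadata-only clone; copy the data if the filesystem refuses."""
    try:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        return "reflinked"
    except OSError:
        # e.g. EOPNOTSUPP on ext4 without reflink, or EXDEV across mounts
        shutil.copyfile(src_path, dst_path)
        return "copied"
```

          On Btrfs or XFS this shares extents; on ext4 or tmpfs it quietly degrades to a full copy, which is exactly the "allowed to fail" contract being discussed.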

          Originally posted by ryao View Post
          The only way for this to be safe would be if a dataset is created with `dedup=on` and `dedup=off` is *NEVER* set on it.
          That is false. Dedup can be turned off on a dataset, so anything created with dedup=on must still function, copy on write included, after dedup=off.

          The deduplication process also has to write metadata referencing existing data to replace the data found to be duplicate, right?

          A reflink is a metadata write, and that is a large part of the deduplication process. The reality is that if reflinks just create the same metadata that dedup=on would create, that metadata already has to be processable safely when dedup=off is applied in the future anyhow.

          Both files must be in the same ZFS pool.
          Do note the Oracle Solaris restriction. Yes, the reflink syscall under both Linux and Solaris is allowed to fail.


          Originally posted by ryao View Post
          ​However, we support setting `dedup=off`. The moment you do that, any writes will write blocks that have pointers that do not have the dedup bit set.
          This follows reflink behavior in other filesystems, where you have to update a flag to say that the data has been reflinked.

          I wrote that deduplication ends up generating the same metadata, or close enough.

          Originally posted by ryao View Post
          ​​Either you need BPR to rewrite history so that the referenced file has the dedup bit set on every block pointer, which is slow, or you introduce horrible data corruption bugs into ZFS, to do reflinks the way you think that they can be done.
          Is that bit really a problem? If you do a snapshot and clone on ZFS, you don't need the dedup bit set to end up with close to the same doubling-up of blocks.

          Originally posted by ryao View Post
          ​Basically take pool version 28 put two identical files on it de-duplicate it that gives you the on disc metadata you are after from reflink.
          Do this, then clear the dedup bit on all the blocks. You still have a reflink, and it still has copy-on-write behavior like a reflink should.

          Think about it: when you have stacks of snapshots and the like, there is a counter on every block tracking how many users of that block there are.


          Each block in the pool has a reference counter which keeps track of how many snapshots, clones, datasets, or volumes make use of that block.
          Notice that this is in effect a reflink. Every block in the pool already has a reference counter, and the dedup=on process increases the reference counter on blocks for every duplicate file; this is performing a reflink. Do you need the dedup bit set? The answer is no. The important things for a reflink are 1) that the block reference counters increase to correctly match the number of users and 2) that copy on write will happen, so this is not hardlink behavior.
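          Those two requirements are exactly what separate a reflink from a hardlink, which can be shown with a toy filesystem model (invented names; not ZFS code): a hardlink shares the inode itself, while a reflink gets its own inode and shares reference-counted data blocks that are copied on write:

```python
# Toy contrast between hardlink and reflink semantics. Invented names.
class ToyFS:
    def __init__(self):
        self.blocks = {}   # block id -> data
        self.refs = {}     # block id -> reference count
        self.inodes = {}   # inode id -> block id
        self.names = {}    # path -> inode id
        self.next = 0

    def _new_id(self):
        self.next += 1
        return self.next

    def create(self, name, data):
        b, i = self._new_id(), self._new_id()
        self.blocks[b], self.refs[b] = data, 1
        self.inodes[i] = b
        self.names[name] = i

    def hardlink(self, src, dst):
        self.names[dst] = self.names[src]     # share the inode itself

    def reflink(self, src, dst):
        i = self._new_id()                    # new inode for the clone...
        b = self.inodes[self.names[src]]
        self.inodes[i] = b                    # ...sharing the data block
        self.refs[b] += 1
        self.names[dst] = i

    def write(self, name, data):
        i = self.names[name]
        b = self.inodes[i]
        if self.refs[b] > 1:                  # shared block: copy on write
            self.refs[b] -= 1
            b = self._new_id()
            self.refs[b] = 1
            self.inodes[i] = b
        self.blocks[b] = data

    def read(self, name):
        return self.blocks[self.inodes[self.names[name]]]

fs = ToyFS()
fs.create("a", b"old")
fs.hardlink("a", "h")
fs.reflink("a", "r")
fs.write("a", b"new")
assert fs.read("h") == b"new"  # hardlink shares the inode, sees the write
assert fs.read("r") == b"old"  # reflink copied on write, kept its own data
```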

          ryao, the one solid tell with Oracle Solaris 11.4 is that if you give it a disk formatted with ZFS pool version 28 and tell it to perform a reflink, it does not increase the pool version, and the pool remains compatible with OpenSolaris. Oracle pulled off reflink without adding any features to ZFS.

          Yes, doing what I described on pool version 28 generates something very close to what Oracle Solaris 11.4 does. What the Oracle developers have done is smarter than you give them credit for.

          Whatever section the reference counter is stored in has to be updated whenever a new item uses the block.

          ryao, you keep presuming Oracle changed the disk format. From everything I can see, they did not; Oracle made reflink work with what already existed. All I can think is that we are missing something. It is possible some extension in the open-source implementation of ZFS is in fact getting in your way; it might be the other way around, and Oracle did not need to extend the format because they never did some particular extension.

          Oracle Solaris does support turning dedup on and off, but that made me think.

          Ryao, how do you handle the case where dedup was off when file 1 was written, and then you turn dedup on and write file 2, identical to file 1: does this result in deduplication or not? On Oracle Solaris 11.2 and newer, it deduplicates. This could be a fundamental driver difference, not a format difference; if this does not work in the open-source ZFS driver, it could explain why you are reaching for a format change.



          • #85
            Originally posted by oiaohm View Post
            That is false. Dedup can be turned off on a dataset, so anything created with dedup=on must still function, copy on write included, after dedup=off.
            That would be the entire reason why you cannot implement reflinks via the DDT.

            That said, I think you are just going to ignore whatever you are told and proceed to write verbose amounts of text that are a waste of my time to read, so I am not going to engage you on this anymore.
            Last edited by ryao; 19 September 2022, 03:40 AM.



            • #86
              Originally posted by ryao View Post
              That would be the entire reason why you cannot implement reflinks via the DDT.

              That said, I think you are just going to ignore whatever you are told and proceed to write verbose amounts of text that are a waste of my time to read, so I am not going to engage you on this anymore.
              There is something fun about the reflink syscall: failure is allowed. Setting a rule that you can only reflink files that were created with dedup=on is allowed.

              The DDT could be used to create reflinks. Reflinks don't all have to be created in the filesystem by the same method; the important thing is having the behavior of a reflink, copy on write included.

              Originally posted by oiaohm View Post
              Ryao, how do you handle the case where dedup was off when file 1 was written, and then you turn dedup on and write file 2, identical to file 1: does this result in deduplication or not? On Oracle Solaris 11.2 and newer, it deduplicates. This could be a fundamental driver difference, not a format difference; if this does not work in the open-source ZFS driver, it could explain why you are reaching for a format change.
              I did ask a question here that you did not answer. If there is a change in ZFS, I think it came well before version 40; I think the core change in the driver predates Oracle Solaris 11.4. The dedup behavior change is in 11.2, and the sequential-resilvering pool version 35 lines up with where I see a difference in behavior.

              Last edited by oiaohm; 19 September 2022, 06:13 AM.

