
NVMe ZNS Support Coming To Linux 5.9


  • oiaohm
    replied
    Originally posted by Azrael5 View Post
    Is ZNS a feature to insert into any kind of file system?
    That's the catch: not every file system design is suited to making a zoned device effective.



  • Azrael5
    replied
    Is ZNS a feature to insert into any kind of file system?



  • oiaohm
    replied
    Originally posted by make_adobe_on_Linux! View Post
    oiaohm Good info... I wonder if there are any other alternatives for ZFS to catch up. I don't know much about why ZNS is better - but I guess it is needed on top of TRIM. You read the ZFS dev mailing list or something? I wish more stuff like this was discussed on forums, but it seems most devs like the mailing list style discussion - but I always find it tedious.
    There is no easy way to catch up. ZNS is zone-based storage, like SMR in hard drives. It requires low-level changes to file system operations, and those take a lot of work and a lot of time to validate. You would expect to see entries appearing in a file system's changelogs saying that changes were made to prepare for zoned storage devices or SMR drives, and at this stage ZFS has not started that groundwork.

    The biggest reasons for the ZNS route are improving the wear pattern on SSDs, to extend their operational life, and improving performance by reducing stall events. It also brings cost savings by reducing the complexity required of the SSD controller. Who doesn't want a cheaper SSD that performs well? Of course, ZNS comes with a trade-off: your OS's file system and block layer need to be smarter.

    Originally posted by markg85 View Post
    Oh, now it hits me. The ZNS stuff is a delicate dance between hardware changes and filesystems. Thus far i thought the gist of it was the driver for those SSD's would "just" expose some more functionality that filesystems "could" use. But from what you've told it looks to be a combination of must-haves. In other words, a ZNS enabled device won't work on a filesystem that doesn't support it.
    We will not know until vendors start releasing ZNS hardware.

    SMR drives on the market come in 3 forms.
    1) Device managed: pretends to be an old-style drive and hides the difference in the controller.
    2) Host aware: can pretend to be an old-style drive, but also takes instructions to run SMR properly.
    3) Host managed: will not work unless the file system/block layer in the OS supports it.
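The host-managed constraint in 3) can be sketched in a few lines: writes can only land at a zone's write pointer, and space only comes back by resetting the whole zone. This toy `Zone` class is purely illustrative, not any real driver API.

```python
# Toy model of the host-managed constraint shared by SMR HDDs and ZNS
# SSDs: data can only be appended at the zone's write pointer, and the
# only way to reclaim space is to reset the whole zone at once.

class Zone:
    def __init__(self, size):
        self.size = size          # zone capacity in blocks
        self.write_pointer = 0    # next writable offset

    def append(self, nblocks):
        # Sequential writes only: anything past the end is an error.
        if self.write_pointer + nblocks > self.size:
            raise IOError("zone full: reset it before reusing the space")
        self.write_pointer += nblocks

    def reset(self):
        # Whole-zone reclaim: every block in the zone is discarded.
        self.write_pointer = 0

zone = Zone(size=256)
zone.append(100)
zone.append(100)
assert zone.write_pointer == 200
zone.reset()                      # also the only way to free the first write
assert zone.write_pointer == 0
```

A device-managed drive hides all of this in its controller; a host-managed one pushes exactly these rules up to the OS.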

    Look at ZNS the same way. All existing SSDs operate device managed, so device managed is already on the market.
    We have DRAM-less SSDs on the market ( https://journals.plos.org/plosone/ar...l.pone.0229645 ), and we will be getting the HMB form that uses host memory instead of DRAM chips on the SSD itself.

    So there are 3 existing forms of device-managed SSD:
    SSDs with DRAM on the device.
    SSDs without DRAM that attempt to make do, and perform really badly.
    SSDs without DRAM using HMB, which in a lot of cases still does not catch up to DRAM-equipped SSDs in performance.

    This means there could be 3 matching forms of host-aware SSD, if vendors decide to make them.

    Now, a ZNS device will most likely be DRAM-less, since removing the DRAM saves a lot of cost. Host aware is an option for those designing a ZNS controller, but it means more silicon in the controller, so more cost. So host-managed ZNS will also be on the market as the cheapest SSD option.

    Using dm-zoned, Linux can technically put any file system on a zoned storage device, be it SMR or ZNS, but technically being able to do it does not mean it will perform well. If you care about performance on ZNS or SMR drives, you really need the file system to support zoned storage properly, so you get all the performance on the table.
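dm-zoned's basic trick, very roughly, is to absorb a file system's random writes in a buffer and move them into zones strictly by appending, remembering where each logical block ended up. The `ToyZonedRemapper` below is a made-up sketch of that idea under heavy simplification; the real device-mapper target also handles metadata, reclaim, and conventional zones.

```python
# Made-up sketch of the idea behind dm-zoned: random writes from an
# unmodified file system are absorbed in a small buffer, then flushed
# into zones using nothing but appends; a mapping table remembers where
# each logical block ended up. The real target is far more involved.

class ToyZonedRemapper:
    def __init__(self, zone_size, nzones):
        self.zone_size = zone_size
        self.buffer = {}                          # logical block -> data
        self.zones = [[] for _ in range(nzones)]  # append-only zones
        self.mapping = {}                         # logical block -> (zone, offset)

    def write(self, block, data):
        self.buffer[block] = data                 # absorb the random write

    def flush(self):
        # Drain the buffer into zones sequentially, append-only.
        for block in sorted(self.buffer):
            for i, zone in enumerate(self.zones):
                if len(zone) < self.zone_size:
                    self.mapping[block] = (i, len(zone))
                    zone.append(self.buffer[block])
                    break
        self.buffer.clear()

    def read(self, block):
        if block in self.buffer:                  # not yet flushed
            return self.buffer[block]
        z, off = self.mapping[block]
        return self.zones[z][off]

dm = ToyZonedRemapper(zone_size=4, nzones=2)
dm.write(7, "a"); dm.write(2, "b")                # any order is fine
dm.flush()                                        # lands sequentially
assert dm.read(7) == "a" and dm.read(2) == "b"
```

The extra bookkeeping and copying is exactly the CPU and performance overhead mentioned above that native zoned support in the file system avoids.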



  • markg85
    replied
    Oh, now it hits me. The ZNS stuff is a delicate dance between hardware changes and filesystems. Thus far i thought the gist of it was the driver for those SSD's would "just" expose some more functionality that filesystems "could" use. But from what you've told it looks to be a combination of must-haves. In other words, a ZNS enabled device won't work on a filesystem that doesn't support it.



  • make_adobe_on_Linux!
    replied
    oiaohm Good info... I wonder if there are any other alternatives for ZFS to catch up. I don't know much about why ZNS is better - but I guess it is needed on top of TRIM. You read the ZFS dev mailing list or something? I wish more stuff like this was discussed on forums, but it seems most devs like the mailing list style discussion - but I always find it tedious.



  • oiaohm
    replied
    Originally posted by make_adobe_on_Linux! View Post
    Any idea what it'll take for ZFS to support it?
    The reality is a hard one. Current ZFS developers are not going to get early access to ZNS hardware. They are going to have to wait for ZNS hardware to reach the open market, or change their licensing to something hardware vendors agree with. The CDDL is not a good license choice for that.

    Also, remember that ZFS developers were given a heads-up about SMR drives:
    https://openzfs.org/w/images/2/2a/Ho...im_Feldman.pdf
    Yes, that was 2014, and ZFS still does not have a roadmap even to support SMR. The same changes needed to support SMR drives are needed to support ZNS drives.

    Remember, we are over 6 years into this zoned storage problem, and ZFS has not started the work. File systems like XFS started working on zoned storage fixes 5 years ago and are still not ready. So if ZFS developers start now, it could be over a decade before they have something ready.

    Originally posted by make_adobe_on_Linux! View Post
    Will EXT4 et al need to re-create their volumes in order to use ZNS - or will it work with existing volumes so long as the kernel supports it?
    And

    Originally posted by markg85 View Post
    Hi @oiaohm, thank you for that elaborate explanation!
    I think it makes a bit more sense to me now.

    Note for the "developer" point of view, i meant "just" a developer. Specifically not a filesystem dev
    But it looks like we, developers, don't have to care about this at all.

    Lastly, would this zone stuff be auto-enabled on - say - ext4 once all pieces are in place? Or is this going to be something that's off by default where users (or distributions) can opt-in to enable it?
    Both of you have the same question about ext4. Ext4 is fairly much screwed on SMR and ZNS going forward.
    https://zonedstorage.io/linux/fs/
    However, support for host managed zoned block devices is not provided as some fundamental ext4 design aspects cannot be easily changed to match host managed device constraints.
    Yes, ext4 will be an option on ZNS and SMR drives, but it will be sitting on top of dm-zoned, so you keep a lot of the problems of device-managed SMR or existing SSDs, plus extra CPU overhead on top. There is also ongoing work altering ext4's operations to play better with dm-zoned underneath it.

    It's F2FS, Btrfs and XFS that are working on full zoned block device support. Mind you, in-place conversion from ext2-4 to Btrfs is possible, which explains a lot of the interest in Btrfs performance.

    Please note XFS is taking a while to get zoned block device support done properly, as it also requires on-disk format changes.

    It's also not exactly auto-enabled. If you have a purely host-managed SMR HDD or ZNS SSD, you will not be able to use the device at all unless you have support for the zoned storage tech. With host-aware versions of SMR HDDs and ZNS SSDs it might be possible to enable support after the fact, but we are not sure host-aware ZNS SSDs will even exist. ZNS could be a pure choice between device-managed SSDs as we are used to and host-managed ZNS SSDs, with nothing in the middle. Of course, a ZNS SSD could be cheap and still perform well, if you have software that supports it, because it does not require the DRAM a device-managed SSD does.
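For what it's worth, the zone model the kernel detects for a drive is already visible today through sysfs. The probe below uses the real `queue/zoned` attribute (values `none`, `host-aware`, `host-managed`); the `zone_model` helper name is just for illustration.

```python
# Read-only probe of the zone model Linux reports per block device via
# the queue/zoned sysfs attribute: "none", "host-aware" or "host-managed".
# zone_model() is an illustrative helper name, not a kernel API.

from pathlib import Path

def zone_model(disk: str) -> str:
    """Return the kernel-reported zone model for e.g. disk='sda'."""
    path = Path("/sys/block") / disk / "queue" / "zoned"
    try:
        return path.read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        return "unknown (no such device, or a pre-4.10 kernel)"

sysblock = Path("/sys/block")
if sysblock.exists():
    for disk in sorted(p.name for p in sysblock.iterdir()):
        print(disk, "->", zone_model(disk))
```

A drive reporting `host-managed` is exactly the case that stays unusable until the file system or dm-zoned takes charge of its zones.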

    Distributions that decide not to support ZNS will not be supporting SMR HDDs properly either. Yes, that is another party I did not mention: distributions will have to update their install processes to support zoned block devices. Those of us installing and setting up drives will have to get used to a few slightly different processes.



  • markg85
    replied
    Originally posted by oiaohm View Post

    If you are a file system developer, yes, you have to change the way you do things. End users formatting ZNS drives, like SMR drives, may have to do it with particular flags so everything works right. Users above that level really should not notice any major operational difference. [...]
    Hi @oiaohm, thank you for that elaborate explanation!
    I think it makes a bit more sense to me now.

    Note: by the "developer" point of view, i meant "just" a developer, specifically not a filesystem dev.
    But it looks like we developers don't have to care about this at all.

    Lastly, would this zone stuff be auto-enabled on - say - ext4 once all the pieces are in place? Or is it going to be something that's off by default, where users (or distributions) can opt in to enable it?



  • make_adobe_on_Linux!
    replied
    Originally posted by oiaohm View Post

    Really, that video is targeted at the server market. But the largest users of SSDs without DRAM are desktop users. Removing the DRAM from an SSD could be something like a 10 USD saving off the build cost of a machine; over a million units that really adds up, and once it no longer costs performance it is a direct saving device makers will take. Currently, removing the DRAM from an SSD without ZNS can result in a short SSD lifespan and unpredictable performance while the controller attempts to make up for the missing DRAM, and that is not great for consumer complaints and warranty claims.

    The big thing about ZNS is reducing the DRAM an SSD needs to perform well. This means a ZNS version of a DRAM-less SSD will have fewer issues and be a lot closer to ZNS drives that do have DRAM. We will not know how close until we see DRAM-less ZNS drives in production (it is possible they will be so close there is no functional difference in lifespan). The problem is that operating systems have to catch up, and I don't see Microsoft being that quick. Android, Chromebooks and Linux workstations could be taking advantage of ZNS drives fairly quickly.

    Do note that video listed working support for F2FS, ext4 and Btrfs, with XFS down the track. This is another case of ZFS being left out in the cold, as it was with SMR.

    Zone-based storage devices, be they SSDs or hard drives, SMR or ZNS, are the future tech, and they will come to the desktop in volume at some point.
    Any idea what it'll take for ZFS to support it? Will EXT4 et al need to re-create their volumes in order to use ZNS - or will it work with existing volumes so long as the kernel supports it?



  • oiaohm
    replied
    Originally posted by markg85 View Post
    @KesZerda and @oiaohm does that mean that i - as end user - have to do exactly 0 to get ZNS and the benefits it brings in time with Linux?
    Also, from a developer point of view, do i have to do anything?
    If you are a file system developer, yes, you have to change the way you do things. End users formatting ZNS drives, like SMR drives, may have to do it with particular flags so everything works right. Users above that level really should not notice any major operational difference.

    Originally posted by markg85 View Post
    I find it "magical" if it would just started working "out of the blue" at some day, which i can't imagine to be the case. Also, from what i get zones are reserved areas. Lets assume, for a 1TB NVMe, that you have 1000 zones (probably more but this makes the example easier). Each zone would be 1GB. Now what if you used up all data in 1 zone where it can't grow as the other zones are there taking up all other "space" (but being mostly empty). I guess what i'm asking is if i would get "out of space" errors when one zone is full but the NVMe in it's entirety still has many gigabytes free in zones that aren't fully utilized yet.
    Zones are more like sectors. So in your example, a multi-gigabyte file could be spread over multiple zones by the file system. What zones expose from an SSD is what are called banks. The reality of an SSD is that you append data to a bank, but to delete anything in the bank you have to delete the complete lot; this is very much the same as SMR. In theory a bank in an SSD could be 1 GB in size; in reality 4 to 256 MB is more likely. But I will stick with your 1 GB banks/zones: you don't see that size in production now, but it would be possible in future.
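A quick sketch of why a file bigger than one zone is not a problem: the file system just splits it into per-zone extents, as in this illustrative greedy allocator. Sizes follow the 1000 x 1 GB example from the question; nothing here reflects a real allocator.

```python
# Illustrative greedy extent allocator: a file larger than a zone is
# simply split across zones, so one full zone never means "out of space"
# while other zones still have room. Sizes follow the 1000 x 1 GB example.

ZONES = 1000
zones_free = [1.0] * ZONES        # free GB per zone

def allocate(file_gb):
    """Return the file's extents as (zone index, size in GB) pairs."""
    extents, need = [], file_gb
    for i, free in enumerate(zones_free):
        if need <= 0:
            break
        take = min(free, need)
        if take > 0:
            extents.append((i, take))
            zones_free[i] -= take
            need -= take
    if need > 0:
        raise IOError("device really is out of space")
    return extents

# A 3.5 GB file simply spans four zones:
print(allocate(3.5))              # [(0, 1.0), (1, 1.0), (2, 1.0), (3, 0.5)]
```

An "out of space" error only happens when the device as a whole is out of writable zone space, not when one particular zone fills up.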

    Currently, the fact that an SSD is made up of banks is hidden by the controller, and this is why an SSD holding a lot of fragmented data can stall out. It is a different problem to the one you are thinking of. You were thinking one zone completely full means things are stuffed; that is not the problem. The real problem with current SSDs is when all 1000 of those 1 GB zones are part full of stuff, at which point the drive has to stall while it reorders things.


    If the file system knows about the zones/banks in the device, it can be in charge of what information is stacked into which zones/banks. This removes guesswork. Current SSDs really have to guess what data should sit with what other data.

    https://getprostorage.com/blog/ssd-g...llection-trim/

    Yes, you will find instructions to use the TRIM command on SSDs, and this stacks blocks as tightly as possible with each other in a dumb way. Let's say you are downloading a file when you perform a trim. The current SSD controller could take a small temporary file and shove it into the same bank as the large file you are downloading and going to store for quite some time. Of course, that temporary file eventually gets deleted, and now the next compacting trim has to copy that large file's contents from where they are to another bank to unify the free space again, due to the strict rule that you can only delete complete banks. A file system that knows about zones/banks could have avoided a goof like this. The file system has a few advantages here, the big one being that it knows which files are actually open and being appended to, and it can take a highly educated guess at what is a temporary file and group those into their own zones/banks, whereas the SSD controller only knows which areas have been written and which have not.

    If you have not worked it out yet: smart allocation of zones/banks in an SSD can reduce the number of writes, because data does not have to be moved around as often. Remember, SSD flash has limited write cycles.
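The write-saving argument above can be put in numbers with a toy model: if dead (deleted temporary) blocks share a bank with live long-lived blocks, reclaiming the bank forces the live blocks to be rewritten first; if the temporaries had their own zone, the reclaim is free. All block counts here are invented for illustration.

```python
# Toy accounting of the write-amplification point above. A bank can only
# be erased whole, so any live data sharing a bank with dead (deleted
# temporary) data must be copied out first. Block counts are invented.

def copied_blocks(layout):
    """Live blocks that must be rewritten to reclaim the dead space.

    layout: list of banks, each a list of 'live'/'dead' block tags.
    """
    moved = 0
    for bank in layout:
        if "dead" in bank and "live" in bank:
            moved += bank.count("live")   # live data must move before erase
    return moved

# Mixed: 96 live blocks of a kept download share a bank with 32 dead
# blocks left by a deleted temporary file.
mixed = [["live"] * 96 + ["dead"] * 32]
# Separated: same data, but the temporaries got their own zone.
separated = [["live"] * 96, ["dead"] * 32]

print(copied_blocks(mixed))       # 96 -> extra writes, extra flash wear
print(copied_blocks(separated))   # 0  -> the temp zone is just reset
```

Every avoided copy is an erase/program cycle the flash never pays for, which is the lifespan argument for letting the file system place data into zones itself.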

    Please note different hard drive makers are putting out SMR drives for the desktop space, but as device-managed models that pull the same trick SSDs do: hiding the fact that they have to delete large zones, at the cost of more RAM and processing in the controller. Remember, in both the existing SSD case and the device-managed SMR hard drive case, write operations happen without the OS's knowledge, which makes it possible for data integrity issues to be created behind the file system's back, with late detection that they happened.

    Please note support for zone-based drives has been under way in the Linux kernel for over half a decade now, so it is not magically starting to work overnight; it has taken decades of combined work by individuals to get to this point. Lots and lots of work has been done so it can "just work", and even then, the finer details needed to make zoned storage work perfectly still have to be completed.



  • markg85
    replied
    @KesZerda and @oiaohm does that mean that i - as end user - have to do exactly 0 to get ZNS and the benefits it brings in time with Linux?
    Also, from a developer point of view, do i have to do anything?

    I'd find it "magical" if it just started working "out of the blue" one day, which i can't imagine to be the case. Also, from what i get, zones are reserved areas. Let's assume, for a 1TB NVMe, that you have 1000 zones (probably more, but this makes the example easier). Each zone would be 1GB. Now what if you used up all the space in 1 zone where it can't grow, as the other zones are taking up all the other "space" (while being mostly empty)? I guess what i'm asking is whether i would get "out of space" errors when one zone is full but the NVMe in its entirety still has many gigabytes free in zones that aren't fully utilized yet.

