
Intel Optimization Around Batched TLB Flushing For Folios Looks Great



    Phoronix: Intel Optimization Around Batched TLB Flushing For Folios Looks Great

    A patch worked on by an Intel engineer for batched TLB flushing for page migration with folios is showing some promising results and is currently working its way to the mainline kernel...


  • #2
    Throughput is one thing. Batching is a classic for increasing throughput.
    But usually, you sacrifice latency or have latency spikes.
    Is this something that RT folks will hate?
    Also (clueless warning), folios seem to always be enabled?
    I.e., it's not something you can opt out of?



    • #3
      Originally posted by milkylainen View Post
      Throughput is one thing. Batching is a classic for increasing throughput.
      But usually, you sacrifice latency or have latency spikes.
      Is this something that RT folks will hate?
      Sounds like a sensible concern.

      Originally posted by milkylainen View Post
      Also (clueless warning), folios seem to always be enabled?
      I.e., it's not something you can opt out of?
      AFAIK there's no downside to folios. Batching OTOH is a compromise, but one does not imply the other.
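      For intuition, here's a rough userspace analogue of the batching trade-off (a sketch of the general idea only, not what the migration patch itself does): in a multi-threaded process, each mprotect() call can trigger its own TLB shootdown, while one call over the whole range costs a single flush.

      Code:
      #include <stdio.h>
      #include <sys/mman.h>

      #define PAGE_SZ 4096UL
      #define NPAGES  512UL

      int main(void)
      {
          char *buf = mmap(NULL, NPAGES * PAGE_SZ, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (buf == MAP_FAILED) { perror("mmap"); return 1; }

          /* Unbatched: one syscall per page, so potentially NPAGES TLB flushes. */
          for (unsigned long i = 0; i < NPAGES; i++)
              mprotect(buf + i * PAGE_SZ, PAGE_SZ, PROT_READ);

          /* Batched: one syscall over the whole range, a single flush.
           * Better throughput, but the flush covers more work at once. */
          mprotect(buf, NPAGES * PAGE_SZ, PROT_READ | PROT_WRITE);

          munmap(buf, NPAGES * PAGE_SZ);
          return 0;
      }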



      • #4
        Can someone summarize what a folio is in this situation? Google only returns this article and the patch referred to in it. Neither tells me what it is. Thank you.



        • #5
          Originally posted by willmore View Post
          Can someone summarize what a folio is in this situation? Google only returns this article and the patch referred to in it. Neither tells me what it is. Thank you.
          First result on ddg for "linux folio": https://lwn.net/Articles/849538/



          • #6
            Originally posted by xnor View Post

            First result on ddg for "linux folio": https://lwn.net/Articles/849538/
            It says it saves RAM, but how many bits/bytes do you save going from a 4k page to a 2M page, and how much actually gets saved if those 4k pages get stacked into one folio? I guess it depends on how many pages are in one contiguous chunk.

            And at the CPU cycle level, how much is saved/lost converting a native 4k page to a set of folio pages? I always thought that most things get stored in 4k chunks because that's the native page size, at least for x86 systems.



            • #7
              Originally posted by erniv2 View Post
              I always thought that most things get stored in 4k chunks because that's the native page size, at least for x86 systems.
              The rest would be long to explain and I'm likely to get it wrong, but this part is simple: look up huge pages; x86 systems support them at the hardware level.
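              For example, here's a minimal sketch of explicitly requesting one 2M huge page on Linux. It assumes the admin has reserved huge pages beforehand (e.g. via /proc/sys/vm/nr_hugepages); otherwise the mmap fails.

              Code:
              #define _GNU_SOURCE
              #include <stdio.h>
              #include <string.h>
              #include <sys/mman.h>

              int main(void)
              {
                  size_t len = 2UL << 20;   /* one 2 MiB huge page */

                  /* MAP_HUGETLB requests a hardware huge page; it fails unless
                   * some were reserved, e.g. echo 16 > /proc/sys/vm/nr_hugepages */
                  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                  if (p == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }

                  memset(p, 0, len);        /* touch it: one TLB entry covers all 2 MiB */
                  munmap(p, len);
                  return 0;
              }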



              • #8
                Originally posted by sinepgib View Post
                Sounds like a sensible concern.
                AFAIK there's no downside to folios. Batching OTOH is a compromise, but one does not imply the other.
                Wouldn't folios push migrations as allocation gets tighter or more fragmented?
                AFAIU, we're talking physically contiguous pages?
                Maybe people already did higher-order allocations, so this was just a unification in handling?



                • #9
                  Originally posted by sinepgib View Post

                  The rest would be long to explain and I'm likely to get it wrong, but this part is simple: look up huge pages; x86 systems support them at the hardware level.
                  Yes, I know that x86-64 CPUs can do 4k and 2M pages. Now, if I assume every page occupies a bit in the TLB: 2048/4 = 512, so you save 512 bits = 64 bytes, or is this a brain fart? If you go 64-bit, every address needs 8 bytes, so one pointer for a 4k page is 8 bytes, and that * 512 is 4096 bytes. I really don't understand the bitmap stuff, and it still doesn't explain how you fold, let's say, 20 4k pages into a folio and save RAM there. That would mean you need 160 bytes, but you actually point them at one 8-byte header? And where is the info to split those 20 pages apart afterwards? It can't be a 19x compression; the info has to live somewhere. That's why I asked how.

                  edit: 20*8 is actually 160 bytes, so that would mean 20 pointers vs. 1, which is a significant reduction, but how do you split the 20 pages apart afterwards?

                  edit2: if I reread my own post, I see that going from a 4k page to a 2M page it's probably better to use 8 bytes for 2M than to use 8 bytes per 4k page; you save 4088 bytes, almost a whole page.
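                  Worked out as a tiny program (back-of-the-envelope only; it counts page table entries, not TLB bits, and is not kernel code):

                  Code:
                  #include <stdio.h>

                  int main(void)
                  {
                      const unsigned long huge_sz = 2UL << 20;   /* 2 MiB huge page        */
                      const unsigned long base_sz = 4UL << 10;   /* 4 KiB base page        */
                      const unsigned long pte_sz  = 8;           /* bytes per x86-64 entry */

                      unsigned long n = huge_sz / base_sz;       /* 512 base pages         */
                      printf("4k pages: %lu entries * %lu B = %lu B\n", n, pte_sz, n * pte_sz);
                      printf("2M page:  1 entry * %lu B = %lu B, saving %lu B\n",
                             pte_sz, pte_sz, n * pte_sz - pte_sz);
                      return 0;
                  }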
                  Last edited by erniv2; 20 January 2023, 10:15 PM.



                  • #10
                    Originally posted by milkylainen View Post
                    Wouldn't folios push migrations as allocation gets tighter or more fragmented?
                    AFAIU, we're talking physically contiguous pages?
                    Maybe people already did higher-order allocations, so this was just a unification in handling?
                    I guess I'll have to re-read both folios and the batched flush in detail to be sure. But I would assume folios are just a representation of behavior, rather than a change in behavior itself, i.e. they represent contiguous pages in places that already asked for contiguous pages, rather than causing places to expect them.
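                    As a loose conceptual sketch (simplified types for illustration, not the real kernel definitions), a folio is a handle that can only refer to the head page of one physically contiguous group of base pages:

                    Code:
                    #include <stdio.h>

                    /* Simplified illustration only; the real definitions live in the
                     * kernel's <linux/mm_types.h>. */
                    struct page { unsigned long flags; };

                    /* A folio wraps the head page of 2^order physically contiguous base
                     * pages; an API that takes a folio can never be handed a tail page. */
                    struct folio {
                        struct page  head;
                        unsigned int order;   /* 0 = one 4k page, 9 = 2 MiB on x86-64 */
                    };

                    static unsigned long folio_bytes(const struct folio *f)
                    {
                        return 4096UL << f->order;   /* assumes 4 KiB base pages */
                    }

                    int main(void)
                    {
                        struct folio f = { .order = 9 };
                        printf("order-9 folio spans %lu bytes\n", folio_bytes(&f));
                        return 0;
                    }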
