
Vega 10 Huge Page Support, Lower CS Overhead For AMDGPU In Linux 4.14


  • Vega 10 Huge Page Support, Lower CS Overhead For AMDGPU In Linux 4.14

    Phoronix: Vega 10 Huge Page Support, Lower CS Overhead For AMDGPU In Linux 4.14

    With this weekend marking the end of David Airlie accepting new feature material for DRM-Next, which will in turn land in the Linux 4.14 cycle in a few weeks, there's a rush by Direct Rendering Manager driver maintainers to submit the last of the new feature work they want in this next kernel release...

  • #2
    Does huge page support increase memory usage?

    • #3
      The code drop for DC is an enormous undertaking, and not in any way equivalent to backporting support for the RX 480/580 to the 4.4 kernel so people could run those GPUs in openSUSE 42.3.

      But.... do you (bridgman?) think there is enough preparatory work happening in 4.14 that a full open-source Vega stack could feasibly be backported to 4.14, even if it initially arrives only with 4.15?

      It seems like a valuable question to ask, as both Ubuntu and SUSE have new LTS releases arriving in April 2018 that will be using kernel 4.14 for the [long] term, i.e. probably four years!
      Last edited by Jedibeeftrix; 19 August 2017, 03:08 AM.

      • #4
        Originally posted by tildearrow View Post
        Does huge page support increase memory usage?
        It should decrease it. Using 2MB pages should remove the need for 4K TLB structures under that 2MB page.

        Although, if you only needed 16KB then a 2MB page would waste 2032KB.
        Last edited by Zan Lynx; 18 August 2017, 02:35 PM.
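
        As an aside, here is a minimal userspace sketch of the two usual ways to ask Linux for 2MB pages: an explicit hugetlbfs mapping, or a normal mapping with a transparent-huge-page hint. This is generic kernel behaviour rather than the AMDGPU GPUVM path the article covers, and it assumes the x86-64 2MB huge page size.

        Code:
        #define _GNU_SOURCE
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>

        int main(void)
        {
            size_t len = 2 * 1024 * 1024;   /* one 2MB huge page */

            /* Explicit huge page from the hugetlbfs pool; needs reserved pages
               (e.g. vm.nr_hugepages > 0) or mmap() fails with ENOMEM. */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED) {
                /* Fall back to a normal mapping and only hint that the kernel
                   may back it with transparent huge pages. */
                p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
                }
                madvise(p, len, MADV_HUGEPAGE);
            }

            memset(p, 0, len);              /* touch it so it is actually faulted in */
            printf("mapped %zu bytes at %p\n", len, p);
            munmap(p, len);
            return 0;
        }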

        • #5
          This is mainly to take advantage of large pages by saving a level in the GPUVM page walker when the page is large. Additionally, the GPUVM hardware in Vega 10 supports 4 levels instead of 2 (as on previous ASICs), so the page tables should take less memory overall.
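
          For readers unfamiliar with multi-level translation, here is a rough sketch of why a large page saves a level in the walk. It uses an x86-64-style 4-level, 9-bits-per-level layout purely as an analogy; the actual GPUVM page table format on Vega differs in its details.

          Code:
          #include <stdint.h>
          #include <stdio.h>

          /* Print the table indices a 4-level, 9-bits-per-level walk would use.
             A 2MB large page terminates the walk one level early (21-bit offset
             instead of 12-bit), so one whole level of 4K entries disappears. */
          static void walk(uint64_t va, int large_page)
          {
              int last_level = large_page ? 1 : 0;       /* stop before the 4K level */
              for (int level = 3; level >= last_level; level--) {
                  unsigned idx = (unsigned)((va >> (12 + 9 * level)) & 0x1ff);
                  printf("  level %d index: %u\n", level, idx);
              }
              unsigned off_bits = large_page ? 21 : 12;
              printf("  page offset: %llu\n",
                     (unsigned long long)(va & ((1ULL << off_bits) - 1)));
          }

          int main(void)
          {
              uint64_t va = 0x00007f3a1c2d3e4fULL;       /* arbitrary example address */
              puts("4K-backed walk:");
              walk(va, 0);
              puts("2MB-backed walk:");
              walk(va, 1);
              return 0;
          }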

          • #6
            Originally posted by Jedibeeftrix View Post
            The code drop for DC is an enormous undertaking, and not in any way equivalent to backporting support for the RX 480/580 to the 4.4 kernel so people could run those GPUs in openSUSE 42.3.

            But.... do you (bridgeman?) think there is enough preparatory work happening in 4.14 that a full open-source Vega stack could feasibly be backported to 4.14, even if it initially arrives only with 4.15?

            It seems like a valuable question to ask, as both Ubuntu and SUSE have new LTS releases arriving in April 2018 that will be using kernel 4.14 for the [long] term, i.e. probably four years!
            If you want to run DC on current drm-next, you can use this branch:

            For packaged releases, we support enterprise distros via DKMS, so those kernels will be covered.

            • #7
              Originally posted by Zan Lynx View Post

              It should decrease it. Using 2MB pages should remove the need for 4K TLB structures under that 2MB page.

              Although, if you only needed 16KB then a 2MB page would waste 2032KB.
              Paging in memory is different from block sectors on hard drives. A sector size is the minimum amount of data that can be stored and read back, so in the worst case, if you want to store 1 bit in a 4K sector, you either leave the rest of the sector unused or merge several other small fragments of data into that sector and index them somewhere else so the data can be reassembled later. Either way there is generally some waste.

              Memory paging, on the other hand, should only produce negligible waste, because a page is not a minimum allocation unit but an indexing structure. For example, if you have 4GB of RAM and write a byte somewhere in it, reading it back by scanning all 4GB until you get a match would be horrible for latency. You could reserve some RAM as a map and ask the map instead of scanning the whole RAM, but then you cannot be sure exactly where the data sits and you must be extremely careful with later writes, so latency gets much better on average but can randomly suffer. So what if you use that map but also divide memory into a grid of size X? Then you have a much more predictable and easier system for writing and reading data safely with predictable latency, but one issue remains: if the grid cells are too small you end up in management hell and indexing speed suffers, so you have to find an optimal size where you can still fill data fast enough but the entries are not so numerous that they slow down indexed lookups. That value is what is referred to as the page size.

              So you are not wasting memory by writing 16K into a 2MB page, because the page is just a management division. It does not imply exclusive use (as in: you can only write 2MB, or those meager 16K, and the page is closed afterwards) but shared use (the indexing system can keep writing into that page until it is full and then move on to the next). When you need those 16K again, the index knows where in that page they are located and hands them to you regardless of how much other data is in it. The RAM itself has no physical equivalent of a sector size or any other minimum size beyond a single bit that software can manipulate, so you still have 4GB. Additionally, bigger pages mean equal or more available RAM overall, because the indexing tables can be smaller (depending on how efficient the management algorithm is).

              Disclaimer: I may be absolutely wrong here, or have some confusion, or be absolutely right. All I know is that this is how I understand it at a high level; if someone knows better, please share.
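
              To put rough numbers on the tradeoff described above: the bigger the page, the fewer translation entries are needed to cover a mapping, but the more can be lost when the last page is only partly used. A quick sketch (the 256MB mapping and the page sizes are just example values):

              Code:
              #include <stdio.h>

              int main(void)
              {
                  const unsigned long long mapping = 256ULL * 1024 * 1024;       /* 256MB example mapping */
                  const unsigned long long page_sizes[] = { 4096ULL, 65536ULL, 2ULL * 1024 * 1024 };

                  for (int i = 0; i < 3; i++) {
                      unsigned long long ps = page_sizes[i];
                      unsigned long long entries = (mapping + ps - 1) / ps;      /* pages needed to cover it */
                      unsigned long long worst_waste = ps - 1;                   /* last page almost entirely unused */
                      printf("page size %7lluK: %8llu entries, worst-case waste %llu bytes\n",
                             ps / 1024, entries, worst_waste);
                  }
                  return 0;
              }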

              • #8
                Originally posted by jrch2k8 View Post

                Paging in memory is different from block sectors on hard drives. A sector size is the minimum amount of data that can be stored and read back, so in the worst case, if you want to store 1 bit in a 4K sector, you either leave the rest of the sector unused or merge several other small fragments of data into that sector and index them somewhere else so the data can be reassembled later. Either way there is generally some waste.

                Memory paging, on the other hand, should only produce negligible waste, because a page is not a minimum allocation unit but an indexing structure. For example, if you have 4GB of RAM and write a byte somewhere in it, reading it back by scanning all 4GB until you get a match would be horrible for latency. You could reserve some RAM as a map and ask the map instead of scanning the whole RAM, but then you cannot be sure exactly where the data sits and you must be extremely careful with later writes, so latency gets much better on average but can randomly suffer. So what if you use that map but also divide memory into a grid of size X? Then you have a much more predictable and easier system for writing and reading data safely with predictable latency, but one issue remains: if the grid cells are too small you end up in management hell and indexing speed suffers, so you have to find an optimal size where you can still fill data fast enough but the entries are not so numerous that they slow down indexed lookups. That value is what is referred to as the page size.

                So you are not wasting memory by writing 16K into a 2MB page, because the page is just a management division. It does not imply exclusive use (as in: you can only write 2MB, or those meager 16K, and the page is closed afterwards) but shared use (the indexing system can keep writing into that page until it is full and then move on to the next). When you need those 16K again, the index knows where in that page they are located and hands them to you regardless of how much other data is in it. The RAM itself has no physical equivalent of a sector size or any other minimum size beyond a single bit that software can manipulate, so you still have 4GB. Additionally, bigger pages mean equal or more available RAM overall, because the indexing tables can be smaller (depending on how efficient the management algorithm is).

                Disclaimer: I may be absolutely wrong here, or have some confusion, or be absolutely right. All I know is that this is how I understand it at a high level; if someone knows better, please share.
                Basically, pages are what the kernel allocates for each process that requests memory. In reality the page is handed to libc, which in turn gives the process its memory, so if your process only allocates 4K then yes, the rest of the 2M page is wasted; but when/if your process later allocates, say, 4K more, that comes out of the same page (this is why the page is given to libc and not to your process directly).

                The kernel must keep a record of which page is owned by which process, so with 4K pages the kernel has to do a lot of lookups whenever your process allocates a fair amount of memory. If your process allocates 1G of RAM, for example, the kernel has to hand out 262144 4K pages for it, compared with just 512 2M pages. Also, since each page consumes 8 bytes (on a 64-bit system) in the kernel's minimal lookup table, you also have to "waste" 2M on that lookup table for your 1G process with 4K pages, versus just 4K with 2M pages.

                So if you have lots and lots of small processes that each allocate much less than 2M, then yes, memory usage would increase due to the overhead, but few people run such a system. For the average user, memory usage would decrease with larger pages.
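
                A quick sanity check of those numbers, under the same assumptions (a 1GB allocation and 8 bytes of bookkeeping per page entry):

                Code:
                #include <stdio.h>

                int main(void)
                {
                    const unsigned long long alloc = 1ULL << 30;                   /* 1GB allocation */
                    const unsigned long long small = 4096ULL, huge = 2ULL << 20;   /* 4K and 2M pages */

                    unsigned long long n_small = alloc / small;                    /* 262144 entries */
                    unsigned long long n_huge  = alloc / huge;                     /* 512 entries */

                    printf("4K pages: %llu entries, %llu KB of 8-byte bookkeeping\n",
                           n_small, n_small * 8 / 1024);                           /* 2048 KB, i.e. ~2MB */
                    printf("2M pages: %llu entries, %llu KB of 8-byte bookkeeping\n",
                           n_huge, n_huge * 8 / 1024);                             /* 4 KB */
                    return 0;
                }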

                • #9
                  Originally posted by Jedibeeftrix View Post
                  But.... do you (bridgeman?)...
                  It's bridgman.

                  • #10
                    Originally posted by F.Ultra View Post
                    The kernel must keep a record of which page is owned by which process, so with 4K pages the kernel has to do a lot of lookups whenever your process allocates a fair amount of memory. If your process allocates 1G of RAM, for example, the kernel has to hand out 262144 4K pages for it, compared with just 512 2M pages. Also, since each page consumes 8 bytes (on a 64-bit system) in the kernel's minimal lookup table, you also have to "waste" 2M on that lookup table for your 1G process with 4K pages, versus just 4K with 2M pages.

                    So if you have lots and lots of small processes that each allocate much less than 2M, then yes, memory usage would increase due to the overhead, but few people run such a system. For the average user, memory usage would decrease with larger pages.
                    But 2M for 1G of allocated memory is just 0.2%, so by reducing this to 0.0004% you only save 0.1996% of memory. The real reason huge pages are useful is reduced TLB pressure (a smaller TLB can be made faster!).
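
                    For reference, the percentages work out roughly as stated (again assuming a 1GB allocation):

                    Code:
                    #include <stdio.h>

                    int main(void)
                    {
                        const double gib = 1024.0 * 1024.0 * 1024.0;               /* 1GB allocation */
                        printf("4K-page bookkeeping: 2M / 1G = %.4f%%\n",
                               (2.0 * 1024 * 1024) / gib * 100.0);                 /* ~0.195% */
                        printf("2M-page bookkeeping: 4K / 1G = %.6f%%\n",
                               4096.0 / gib * 100.0);                              /* ~0.00038% */
                        return 0;
                    }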
