How Much Video RAM Is Needed For Catalyst R3 Graphics?

  • #11
    Originally posted by mendieta View Post
    On that vein: I am using a Haswell i5 these days. Of course I overclocked/fine-tuned it as much as I could. But adding or removing RAM for graphics in the BIOS made no difference whatsoever in my tests (even with the intensive Unigine tests). I actually suggested to another user in these forums that they try that, and they got exactly the same behavior.
    I have two settings for video RAM - "Internal Graphics Memory Size", which I think is the framebuffer; and DVMT ("Dynamic Video Memory Technology", talk about buzzwords). Do you have those as well, and which one were you setting? I have the framebuffer set to the lowest (32MB) and DVMT to "max" (the other options being 128M and 256M). I would expect limiting DVMT to have an impact, not sure about the other setting.

    • #12
      Originally posted by gens View Post
      @brosis
      if i remember right, xonotic uses ~300MB of gpu memory
      i think most if not all of the level/objects are in there
      GALLIUM_HUD says Xonotic 0.7 requests ~600MB of VRAM for the high/ultra preset, at least that is the case when running the demo benchmark.
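
      If anyone wants to check on their own machine, a tiny wrapper like this is enough to get the overlay (the counter names below are just my guess for the radeon drivers; setting GALLIUM_HUD=help should list what your driver actually exposes):

      /* hud_wrap.c - minimal sketch; the counter names are assumptions,
         check GALLIUM_HUD=help for the ones your driver really has */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          if (argc < 2) {
              fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
              return 1;
          }
          /* overlay requested VRAM and actual VRAM usage on top of the app */
          setenv("GALLIUM_HUD", "requested-VRAM,VRAM-usage", 1);
          execvp(argv[1], &argv[1]);
          perror("execvp");
          return 1;
      }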

      • #13
        Originally posted by bridgman View Post
        It's often difficult/impossible to allocate big chunks of physically contiguous memory via the OS memory manager after the system has been running for a while... but having a reserved area of memory keeps the OS memory manager's grubby little fingers off it
        can i ask why?
        i mean, is it for historical/architectural reasons, or complexity/performance, or is it just not implemented, or..

        am curious as it seems like something an OS could do
        (even memory defragmenting; suspend a process - copy page - change lookup table - resume)

        like i have a client-server that transfers data via shm (the server having an alsa DMA buffer)
        it would be fun to just call the kernel to change.... scratch this, i didn't consider that the device uses a different MMU
        got me thinking now..
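
        roughly the kind of setup i mean, as a minimal sketch (the name and size here are just placeholders i made up):

        /* rough sketch of the server side; build with -lrt on older glibc */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define SHM_NAME "/audio_xfer"     /* made-up name */
        #define SHM_SIZE (64 * 1024)       /* 16 pages */

        int main(void)
        {
            /* create the shared memory object the client will open by name */
            int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
            if (fd < 0) { perror("shm_open"); return 1; }
            if (ftruncate(fd, SHM_SIZE) < 0) { perror("ftruncate"); return 1; }

            /* both sides mmap the same object; the virtual addresses differ,
               the physical pages behind them are shared and not necessarily contiguous */
            char *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
            if (buf == MAP_FAILED) { perror("mmap"); return 1; }

            strcpy(buf, "hello from the server side");

            munmap(buf, SHM_SIZE);
            close(fd);
            shm_unlink(SHM_NAME);
            return 0;
        }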

        edit: still, run-time defragmenting sounds, to uneducated me, like it should be possible
        Last edited by gens; 19 April 2014, 03:04 PM.

        • #14
          Originally posted by bridgman View Post
          It's not a silly question at all. The distinction between "video RAM" and "system RAM" on an IGP is made for a few different reasons:
          [snip]
          Ah, thank you so much, and many thanks to @curaga and @gens. My hunch was that for the older, not-on-die IGPs, it made sense indeed to have a separate memory chunk, because the RAM access was different and perhaps slower, etc. It was with the more recent APUs that I was confused. Thank you all for the info.

          I guess it wouldn't be out of the question to have, at some point, the kernel give up memory to the graphics driver on demand, so this becomes a runtime allocation as opposed to a BIOS setting. So, a user willing to allocate more RAM for graphics uses a GUI inside the OS, sets "graphics RAM" to 1GB, and on the next bootup the kernel reads the config file and gives up one gig to the GPU driver to do with as it pleases. That way she doesn't need to fiddle with the BIOS, which is a terrifying experience for most people.

          • #15
            Originally posted by Gusar View Post
            I have two settings for video RAM - "Internal Graphics Memory Size", which I think is the framebuffer; and DVMT ("Dynamic Video Memory Technology", talk about buzzwords). Do you have those as well, and which one were you setting? I have the framebuffer set to the lowest (32MB) and DVMT to "max" (the other options being 128M and 256M). I would expect limiting DVMT to have an impact, not sure about the other setting.
            Ah, let me check. Thanks!

            • #16
              Originally posted by Gusar View Post
              I have two settings for video RAM - "Internal Graphics Memory Size", which I think is the framebuffer; and DVMT ("Dynamic Video Memory Technology", talk about buzzwords). Do you have those as well, and which one were you setting? I have the framebuffer set to the lowest (32MB) and DVMT to "max" (the other options being 128M and 256M). I would expect limiting DVMT to have an impact, not sure about the other setting.
              So, I checked, and you are right. I had been playing with the first one, and I agree, it must be the framebuffer. For DVMT, I also had it set to MAX. I just tried setting it to the minimum (128MB) and it had no impact on Unigine Sanctuary, but who knows, maybe it does for full games. I am a bit too lazy to run the game benchmarks. I'll revert DVMT to the MAX setting for now. Many thanks!

              • #17
                Originally posted by gens View Post
                can i ask why?
                i mean, is it for historical/architectural reasons, or complexity/performance, or is it just not implemented, or..

                am curious as it seems like something an OS could do
                (even memory defragmenting; suspend a process - copy page - change lookup table - resume)

                like i have a client-server that transfers data via shm (the server having an alsa DMA buffer)
                it would be fun to just call the kernel to change.... scratch this, i didn't consider that the device uses a different MMU
                got me thinking now..

                edit: still, run-time defragmenting sounds, to uneducated me, like it should be possible
                While I'm not an OS developer, the main place I see you running into issues is dynamic memory, where what you would have to do is roughly as follows:
                1. Scan memory to see whether the fragmentation level is above a certain threshold (say 15%)
                2. Mark memory that is in fragmented locations and find all open positions
                3. Figure out how you want to slot the memory to defragment it
                4. For each item to be moved:
                  1. find all pointers that point to this location
                  2. find a suitable place in memory to put it
                  3. Stop the World
                  4. move the memory into the new location
                  5. update the pointers
                  6. mark the previous location as open
                  7. optional: zero out the previous location for security (thus preventing Heartbleed-type attacks at the memory-manager level)
                  8. Restart the World
                5. Goto 1

                in short... what amounts to a garbage collection algorithm, with all of the optimization potential, such as using multiple generations and so on, but also all of the problems.
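
                To make the "stop the world, move, update pointers" part concrete, here is a toy sketch (entirely my own illustration, nothing like real kernel code) where callers hold handles instead of raw pointers, so updating the pointers collapses into rewriting one table:

                /* toy_compact.c - purely illustrative handle-table compaction */
                #include <stdio.h>
                #include <string.h>

                #define HEAP_SIZE 1024
                #define MAX_OBJS  16

                static unsigned char heap[HEAP_SIZE];
                struct obj { size_t off, len; };      /* len == 0 means free slot */
                static struct obj table[MAX_OBJS];    /* the "lookup table" */
                static size_t heap_used;

                /* bump allocator; returns a handle, or -1 if no slot/space is left */
                static int alloc(size_t len)
                {
                    for (int h = 0; h < MAX_OBJS; h++)
                        if (table[h].len == 0 && heap_used + len <= HEAP_SIZE) {
                            table[h].off = heap_used;
                            table[h].len = len;
                            heap_used += len;
                            return h;
                        }
                    return -1;
                }

                static void release(int h) { table[h].len = 0; }

                /* "stop the world": slide every live object toward offset 0 and
                   fix up the table, leaving one contiguous free region at the top */
                static void compact(void)
                {
                    size_t dst = 0;
                    for (int h = 0; h < MAX_OBJS; h++) {
                        if (table[h].len == 0) continue;
                        memmove(heap + dst, heap + table[h].off, table[h].len);
                        table[h].off = dst;
                        dst += table[h].len;
                    }
                    heap_used = dst;
                }

                int main(void)
                {
                    int a = alloc(400), b = alloc(400);
                    release(a);    /* leaves a 400-byte hole at the start */
                    printf("before: used=%zu, b at %zu\n", heap_used, table[b].off);
                    compact();
                    printf("after:  used=%zu, b at %zu\n", heap_used, table[b].off);
                    return 0;
                }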
                Last edited by Luke_Wolf; 19 April 2014, 06:18 PM.

                • #18
                  Originally posted by Luke_Wolf View Post
                  While I'm not an OS developer, the main place I see you running into issues is dynamic memory, where what you would have to do is roughly as follows:
                  1. Scan memory to see whether the fragmentation level is above a certain threshold (say 15%)
                  2. Mark memory that is in fragmented locations and find all open positions
                  3. Figure out how you want to slot the memory to defragment it
                  4. For each item to be moved:
                    1. find all pointers that point to this location
                    2. find a suitable place in memory to put it
                    3. Stop the World
                    4. move the memory into the new location
                    5. update the pointers
                    6. mark the previous location as open
                    7. optional: zero out the previous location for security (thus preventing Heartbleed-type attacks at the memory-manager level)
                    8. Restart the World
                  5. Goto 1

                  in short... what amounts to a garbage collection algorithm, with all of the optimization potential, such as using multiple generations and so on, but also all of the problems.
                  I like the idea... You'd just probably need to track memory usage patterns a bit, otherwise a realloc() could get a bit more expensive if you pack everything together.
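
                  Roughly the cost I have in mind, as a toy sketch (purely illustrative):

                  /* after tight packing, the next realloc() is more likely to
                     have to move the block and copy its contents */
                  #include <stdio.h>
                  #include <stdlib.h>
                  #include <string.h>

                  int main(void)
                  {
                      char *buf = malloc(4096);
                      if (!buf) return 1;
                      memset(buf, 'x', 4096);

                      /* if there is no free room right after the block, realloc()
                         has to find a new spot and copy the old 4096 bytes over */
                      char *bigger = realloc(buf, 1 << 20);
                      if (!bigger) { free(buf); return 1; }

                      printf("block moved: %s\n", bigger == buf ? "no" : "yes");
                      free(bigger);
                      return 0;
                  }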

                  • #19
                    Kernels aren't really designed to handle graphics at all, are they? That's what Framebuffer, Mesa, OpenGL and whatnot are for: advanced graphical thingies are done through these optimised, task-oriented suites that can take advantage of hardware built for the same thingies. It seems to me (pseudo-logically, nothing technical here) that the kernel would do a very slow and inefficient job with something as advanced as even the most basic GFX core.
                    Hi

                    • #20
                      Originally posted by Luke_Wolf View Post
                      in short... what amounts to a garbage collection algorithm, with all of the optimization potential, such as using multiple generations and so on, but also all of the problems.
                      bit of overkill

                      thing is that memory does not need to be contiguous (except for this case, DMA to devices that is)

                      RAM on x86 is managed in 4kB pages
                      like when you do a malloc() of, let's say, a byte, your program asks the kernel for 4k bytes and then manages it (there is the brk() call, but that's different and still comes down to 4k pages)
                      then the kernel finds a free page and assigns it to your process
                      it does that by keeping a page table, which is an in-cpu thing (for the better part)
                      the result is a different memory map for every process (virtual memory)
                      so when you write to that memory the cpu just translates your pointer to a real one

                      so fragmentation is no problem at all, unless you really need the memory behind the mapping to be physically contiguous
                      it's not even a problem for most devices since they rarely have buffers over 4k, unlike in the case of a gpu (framebuffer for my screen now is 1280*1024*24 bits, that is ~3.8MB)

                      i was thinking of a call like malloc that would, if failing to find enough contiguous memory, screw over a couple processes for a couple nanoseconds

                      now (afaik at least) the kernel gets these buffers by reserving a part of memory at boot just for drivers
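
                      a tiny userspace sketch of the page thing (just an illustration, the kernel side is obviously a lot more involved):

                      /* you always get whole pages, and the address you see is virtual */
                      #include <stdio.h>
                      #include <sys/mman.h>
                      #include <unistd.h>

                      int main(void)
                      {
                          long page = sysconf(_SC_PAGESIZE);   /* 4096 on plain x86 */
                          printf("page size: %ld bytes\n", page);

                          /* ask for a single byte; the kernel still wires up a full
                             page in this process's page table */
                          unsigned char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE,
                                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                          if (p == MAP_FAILED) { perror("mmap"); return 1; }

                          p[0] = 42;   /* first touch is what faults the page in */
                          printf("virtual address: %p\n", (void *)p);
                          /* which physical page backs it, we can't tell from here */

                          munmap(p, 1);
                          return 0;
                      }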
