Enlightenment 0.23 Released With Massive Wayland Improvements

  • Enlightenment 0.23 Released With Massive Wayland Improvements

    Phoronix: Enlightenment 0.23 Released With Massive Wayland Improvements

    It has been almost two years since the release of Enlightenment 0.22, while surprisingly E23 has now surfaced...

    http://www.phoronix.com/scan.php?pag...-0.23-Released
    Last edited by tildearrow; 26 August 2019, 05:11 PM.

  • #2
    "Massive"... ...think you can elaborate a little?



    • #3
      Originally posted by tildearrow View Post
      "Massive"... ...think you can elaborate a little?
      Was going by what Rasterman wrote with "Massive improvements to Wayland support" in the release announcement but also not elaborating.
      Michael Larabel
      https://www.michaellarabel.com/



      • #4
        Gonna have to give this a compile sometime this week. I've always been very partial to Enlightenment and I'm currently breaking my rule about only having one GUI environment setup and working (Plasma). Where other people would go towards XFCE, Openbox, LX something or other, etc for a lightweight setup, I'd always go to Enlightenment.

        I will say that I do not like a lot of their default settings. Once those are tweaked a bit it's a really nice setup. Different and takes some getting used to, but nice.



        • #5
          Originally posted by Michael View Post

          Was going by what Rasterman wrote with "Massive improvements to Wayland support" in the release announcement but also not elaborating.
          Probably referring to the numerous EFL fixes: https://www.enlightenment.org/news/efl-1.22.3



          • #6
            My God... We should all ditch GTK/Gnome and start developing on Enlightenment...

            That thing FLIES on my potato netbook, whereas Gnome struggles even on my i7 😓

            Performance should be the driving force!!



            • #7
              Originally posted by Cape View Post
              My God... We should all ditch GTK/Gnome and start developing on Enlightenment...

              That thing FLIES on my potato netbook, whereas Gnome struggles even on my i7 😓

              Performance should be the driving force!!
              I've used Enlightenment from the very early ( 0.15 ) days, and it's always been unique and performant. Gnome is my fallback option - for when I break my ( git ) build of Enlightenment, or when something else is playing up. While I much prefer E over Gnome, performance has nothing to do with it, because Gnome has always performed well for me, on a range of desktops and laptops, with a range of underpowered Intel GPUs or more impressive AMD GPUs.



              • #8
                Michael
                It has been almost two years since the release of Enlightenment 0.23
                I think it should be:
                It has been almost two years since the release of Enlightenment 0.22



                • #9
                  Originally posted by Cape View Post
                  My God... We should all ditch GTK/Gnome and start developing on Enlightenment...

                  That thing FLIES on my potato netbook, whereas Gnome struggles even on my i7 😓

                  Performance should be the driving force!!
                  E/EFL could be better. There are areas where we can do markedly better in terms of smoothness. I have plans/ideas. It's just that there is a huge amount of already optimized infra there, and some of it was built with synchronous assumptions long long long ago (e.g. you create an image object, then set the file to point it to... you get the geometry of the image to decide how to size it, then size the object, in sequence; pseudo-code):

                  Code:
                  obj = image_add();
                  file_set(obj, "/path/to/icon.png");
                  size = size_get(obj);
                  resize(obj, size.width, size.height);
                  You get the idea. That is synchronous and depends on loading the PNG to get the size, so we may block (please keep reading for details on what that load involves). We have caches (speculative ones that keep data around after it's no longer used and was already freed/deleted) to speed up re-loading the same thing, so these file_sets become NOPs when we get cache hits, as we just dig the data out from memory we already have lurking around. We also de-duplicate on the fly (load the same image file into 20 objects and we just point to the same image data/struct in the background and share it across all the image instances). The software renderer will also count the uses of scaling that icon to different sizes, and if a certain destination scale size is used often enough it'll stop on-the-fly scaling and keep a scaled copy around to avoid the rescale-on-the-fly costs (GL will always scale on-the-fly). There are all sorts of other fun things going on too that I could spend all day describing.

                  You get the idea that there are multiple layers and ways we cache data and reduce overheads already, but that initial first-time-if-not-in-cache load means going to "disk" and waiting. We split loading into "load header" vs. "load body", so we only open the file, find the header, get metadata like size, whether it has alpha, etc., then close it and avoid a full decode - but it's still a stall. We do have a "now preload the body data in the background async and tell me when it's done via an event callback" mode with a thread pool that goes and decodes the image data (the most expensive part of loading an image file), but that is explicit in higher-level code (we do this for things like wallpapers, icons etc. but not everything), and you may notice sometimes icons "appear" or "zoom/fade in" later on. That's once the "we loaded the data now - you can show that image from now on as the data is ready" event comes in. If something shows the image before this load has completed, the render pass stalls waiting on it to complete to ensure it has the data it needs. The object HAS to remain "logically hidden" to avoid that stall. Sometimes some code somewhere just decides to go show it anyway and you didn't realize it was happening, thus causing a stall. Still, that initial header load can hurt if your disk is slow and the kernel's disk caches don't have the data readily available. Also, we don't always async load the data in a thread, because by default we avoid ever decoding data if the image is never actually rendered (imagine you have 500 icons in a list and most are off-screen and not visible - why load all of them now when you can load them on demand as you scroll around and they are really needed for rendering? This is already implicit and the default). Forcing an async threaded load for everything all the time would mean in these cases we pay a decode price and memory for something that may never be needed, because you never scroll that far etc.

                  My point in describing this is to show we have places we can improve on. We can decode that header async too; it's also explicit in higher-level code to do this, in addition to requesting the async body data load. (I have noticed that there seems to be a bug involved here that I have yet to find that makes the canvas think such images have no alpha channel... sometimes... need to find that some time). We could be a whole lot better and async decode EVERYTHING in threads, carefully picking policies on what is and is not decoded and when. There is a lot more besides that we could spawn off into threads.

                  Object construction actually can be quite costly, as it not only allocates memory but does a bunch of setup (like the above file loads) and also... produces a lot of events, which then cause event handler callbacks to be called that then react to those by modifying the object or something else, etc. Object destruction can also be costly for the same reasons. We can defer a lot of object deletion to idle time (we already defer by 2 render cycles for state comparison reasons, for minimum update region calculation). We could spool off a queue of objects to delete whilst idle to avoid it impacting interactivity and keep the framerate snappier. We could add more higher-level object caches that cache high-level UI objects to cut the cost of creation down significantly, making things a lot snappier too. Our software renderer does all the hard work in a thread, but our GL renderer issues all the GL work in the main loop/thread, and this can block - especially on getting buffer age and doing a swap - so moving this to threads would help. We're far from perfect. There is much to do. Sliding it into an already complex system is hard work - especially if you don't want to break anything. We've done the "inline assembly for routines that matter and that this can be applied to well" work. Done it for x86 and ARM. It's the other things that still need work. We could move some data structs from fragmented linked lists to something more compact/array-like for better CPU/memory cacheline niceness. We could drop our call overhead by doing fewer dispatches or doing some profiling/optimization on our call resolver/dispatcher. Already done some caching there too, but more can be done.

                  You may notice that a lot of the optimizing really just comes down to: 1. caches (de-duplicating as well as speculative), 2. deferring work until later, 3. deferring work until idle, 4. punting work off into threads to move it out of the main loop. We're really good with #1 and #2. #3 and #4 are still a bit spotty. Add in some data struct work and we have a #5 to do too.



                  • #10
                    Originally posted by raster View Post

                    E/EFL could be better. There are areas where we can do markedly better in terms of smoothness. [...] We're really good with #1 and #2. #3 and #4 are still a bit spotty. Add in some data struct work and we have a #5 to do too.
                    Rasterman! Your jargon is music to my ears!
                    For example, you talk about objects, but isn't E pure C? Do you have some sort of integrated GC in EFL?

                    Sidenote:
                    Has anybody ever tried E on the Librem 5 devkit or similar?

