Rewriting Old Solaris C Code In Python Yielded A 17x Performance Improvement
-
Originally posted by F.Ultra View Post
Well, it also depends on the size of the object stored. Even if we don't consider the cost of copying memory around, we also face the problem of fitting the set itself in the cache so that we are not hit by cache misses when traversing the set. Now of course we can hopefully trust that the CPU will prefetch in a sane manner here (which is no certainty, especially if the lookup is not linear). Sometimes I miss the simpler days, when I could remember the exact cycle count for each instruction and just look at two pieces of code and immediately see which one was faster...
If the branch index has some dense nodes and the index value order is unpredictable by the branch predictor (even in profile-guided optimisation mode), then jump tables become preferable even though they waste a lot of cache. Why? Because the $IC waste for a lot of branches exceeds the $DC waste, and it's easier to mitigate $DC misses than $IC misses. $IC misses always (eventually) leave the PC in a hazard, which means the pipeline has starved to a halt. Luckily, with SMT, even in that worst-case scenario the core remains busy with the other thread. So long as that condition switches intermittently between threads, it's unlikely that both pipelines will starve, and that's the real beauty of SMT for code with random memory access... it hides a ton of memory latency. It effectively doubles the effectiveness of your BP+ROB+pipelining, cutting your effective memory latency in half. So don't worry about 2 or 3 back-to-back branch prediction misses if you have IVB+. Worry about them on short pipelines with a small ROB and no SMT, like the A53.
This takes me back to GCC 1.0 architecture discussions on USENET lol. Years before SMT. Crap, I am getting old.
Last edited by linuxgeex; 18 October 2019, 08:44 PM.
- Likes 2
Comment
-
Originally posted by linuxgeex View Post
Java's RAD days are long gone.
The JRE is so massive that compiling takes far longer than C, even when JITting with jar files.
At compile time, Java does compile ahead of time what it can (whatever it can infer at compile time).
Then the JIT compiles at run time, continuing to optimise the application (or the parts of it that couldn't be optimised at compile time). This process is slower, of course, but in a lot of cases it pays off in speed, at the cost of a longer compilation process and more memory..
Originally posted by linuxgeex View Post
That doesn't stop you from throwing 99% of the JRE in the garbage and using the true core language in a much lighter manner (like Google did by creating Dalvik), but then you discard the entire community around the language.
On mobile devices, I think Android is not a good model to follow: too much memory, too much CPU, too slow..
Originally posted by linuxgeex View Post
And in the case of Google, they got sued for re-implementing the core APIs, and some idiot judges actually sided with Oracle because they don't understand the concept of an open specification. And they don't understand the costs of implementation. Using the same API isn't derivative in any way. The work to precisely match an existing API actually costs significantly more than creating your own from scratch. It was an extra cost to Google, in the spirit of cooperation, and the morons are looking only at the small body of the interface definition files and claiming that since those were "copied wholesale" the work is therefore derivative. Each of those judges should be sentenced to 1 year of systems programming with zero chance for parole, and then revise their findings.
Microsoft did the same, with one slight difference: Sun had licensed Java on Windows for free.
Sun won in court, but it was determined that no fees or damages were owed (due to the big, big mistake Sun made when it licensed it for free... and the court failed to acknowledge that the JVM is something separate, and that it is proprietary... still, C# continues out there without any fines..).
Google tried the same, but Sun did not license the JVM technology on any other OS for free.
So here Google continues infringing the Java technology owner's licence, and profiting big with it in Android..
You can run Scala and others on the Sun JVM; they compile for the JVM but are different languages.
Something a lot of people fail to acknowledge, a lot of the time, is that the JVM is something completely different from the Java language itself.
Sun invested a lot in optimisations for a lot of architectures, and of course that technology is proprietary; it is not licensed for free.
So Google should have been sued big for stealing Sun technology, and Microsoft also for using its VM for C#..
Last edited by tuxd3v; 18 October 2019, 08:56 PM.
- Likes 1
Comment
-
Originally posted by jacob View Post
In C he would have used some kind of tree library and he would have been skinned alive because that's - oh my god - a dependency!
It's hard for me to imagine where the original performance deficit came from, for it to be improvable at all regardless of the choice of language. Maybe the original author was trying to use as little RAM as possible (considering 1988 equipment) and had to read through the files multiple times, resulting in excess disk usage. I don't think the performance was found in the code but in the I/O usage.
- Likes 2
Comment
-
Originally posted by vladpetric View Post
One shouldn't make an overly general statement that if C is slower, then something must be wrong with the code. Performance happens to be a really complex topic.
Comment
-
Originally posted by F.Ultra View Post
Well, it also depends on the size of the object stored. Even if we don't consider the cost of copying memory around, we also face the problem of fitting the set itself in the cache so that we are not hit by cache misses when traversing the set. Now of course we can hopefully trust that the CPU will prefetch in a sane manner here (which is no certainty, especially if the lookup is not linear). Sometimes I miss the simpler days, when I could remember the exact cycle count for each instruction and just look at two pieces of code and immediately see which one was faster...
With respect to prefetching: I can assure you that no modern processor has a prefetch system that can handle that (and btw, feel free to check my research work on the topic as well). Essentially, hardware prefetching is done linearly (usually in the L2 cache), so it works for contiguous data, not linked structures; the alternative is explicit prefetch instructions, and prefetch instructions won't work here either. For effective prefetching you need to know the address a good number of cycles before you actually need the data. There's no point in having a prefetch instruction followed by the load in the same basic block.
Feel free to prove me wrong: show me a processor with a prefetching system capable of handling this. BTW, there are research proposals to deal with these cases. The problem is that they're not very good (in terms of expected speedups), really expensive in terms of chip complexity, and/or require profiling. So nobody has implemented them.
Out-of-order superscalar execution is an absolute necessity for good speed these days. The good old days, well, they weren't that good. I'd seriously (and respectfully) suggest you get up to date on processor microarchitecture. I honestly think you'll have fun with that. I for one find a lot of engineering beauty in modern processor design.
- Likes 1
Comment
-
Originally posted by L_A_G View Post
Probably not, but it's rather unlikely to be anything except a developer mistake (either in the code or the compiler settings). It's about as safe a bet as a pool of oil showing up under your car while you're at work being an oil leak. Sure, it could be a prankster, but the first thing to do is still going to be having a look, or getting a mechanic to do that if you don't service your own car.
But C code oftentimes ends up inefficient because of a lack of good libraries. There's no good hash map implementation in the standard library because, well, you can't really write a good, generic one in C. C++? Yeah, totally.
- Likes 1
Comment
-
Originally posted by linuxgeex View Post
Java's RAD days are long gone. The JRE is so massive that compiling takes far longer than C, even when JITting with jar files.
As many people have already said, all this shows is that algorithms + data structures matter more than programming languages. The minute I saw that the code was implementing sets using linked lists, I knew it would be slow. A red-black tree would be fine; a hash table would be optimal.
Originally posted by pal666 View Post
original oracle article is full of bullshit
didn't rewrite c code in python. he replaced c linked lists with c sets (i'm pretty sure python "native" sets are implemented in c)
Last edited by cynical; 18 October 2019, 10:28 PM.
- Likes 1
Comment