It looks to me like you want a C++-template-style language as the final output of the "ultimate" over-optimized binary, but to me this plan is a bad idea.
So let me point out a few mistakes I noticed in the comments:
- (not a comment by eigenlambda) Reflection can be done with Ahead-Of-Time compilation. In fact, AOT can help reflection: it can build the type hierarchies ahead of time and it can elide verification. One platform that has done this for many, many years, falling back to a JIT only in the cases where it is needed, is Excelsior JET; there was also RoboVM, now BugVM (I am not affiliated with them). Some platforms (.NET on Windows Phone 10) can generate serialization/deserialization code ahead of time, based on all the types that can be reflected over. So this is not a case where AOT cannot work; on the contrary, AOT helps in specific cases. The only thing reflection really needs is that, if a little runtime code generation happens (which may not even be needed), a JIT or interpreter is available to run the code that was not supported or generated ahead of time.
- Reflection is reasonably fast. Depending on whom you ask, it is around 15 times slower than a direct call (if you write naive, unoptimized reflection code and basically don't know how to write it properly) or only about 50% slower than a virtual call (source: http://stackoverflow.com/questions/4...on-performance ). So if you really care about ultimate performance, maybe avoid virtual calls instead, because those are what kill program performance; or maybe, just maybe, you have a bad design.
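To make the "proper reflection code" point concrete, here is a minimal Java sketch (the class and method names are mine, purely illustrative): the expensive part of reflection is usually the repeated `Method` lookup, so caching the `Method` once and calling `setAccessible(true)` to skip per-call access checks is what brings the cost down toward the "50% slower than a virtual call" ballpark rather than "15x slower".

```java
import java.lang.reflect.Method;

public class ReflectionCost {
    public int square(int x) { return x * x; }

    public static void main(String[] args) throws Exception {
        ReflectionCost target = new ReflectionCost();

        // Naive pattern: a fresh lookup on every call (this is the slow part).
        int slow = (int) ReflectionCost.class
                .getMethod("square", int.class)
                .invoke(target, 7);

        // Proper pattern: look the Method up once and reuse it.
        Method cached = ReflectionCost.class.getMethod("square", int.class);
        cached.setAccessible(true); // skip access checks on each invoke
        int fast = (int) cached.invoke(target, 7);

        System.out.println(slow + " " + fast); // both print 49
    }
}
```

The numbers in the thread are micro-benchmark-dependent, of course; the point of the sketch is only that the two patterns above are what people are actually comparing when they quote wildly different slowdown factors.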
- A language that converts to C++ with stack allocation would be silly. There are a few problems with this line of thinking: stacks on most OSes are really small, and we care about more than single-threaded code (naive stack-heavy multithreaded code would be a performance nightmare, with very hard-to-diagnose problems like false sharing). Last but not least, Java already converts heap allocations into stack allocations (the optimization is named "escape analysis"), while C++ does not.
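To illustrate the escape-analysis point, here is a minimal Java sketch (class and method names are mine, not from the thread): the `Point` below never leaves `lengthSquared`, so once HotSpot JIT-compiles the method it can scalar-replace the object, i.e. keep `x` and `y` in registers/on the stack with no heap allocation at all, even though the source says `new`.

```java
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // The Point is a candidate for scalar replacement: it is created,
    // read, and discarded entirely within this method, so it does not
    // "escape" and the JIT can elide the heap allocation.
    static double lengthSquared(double x, double y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += lengthSquared(i, i); // hot loop: allocation-free after JIT
        }
        System.out.println(sum); // prints 570.0
    }
}
```

Whether the allocation is actually elided depends on the JVM and on the method getting hot enough to be compiled (you can observe it with flags like `-XX:-DoEscapeAnalysis` as a before/after comparison); the semantics of the program are unchanged either way.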
- (not a comment by eigenlambda) The Falcon JIT matters a lot in the domains where it is actually used. It is very likely the Azul team benchmarked it around the financial domain, where firms compete for every millisecond. I was writing a specialized bytecode compiler that targeted things like LLVM and I could achieve around a 10-15% speedup, and it is possible that in some cases the advantage is bigger. Given a license price of just a few thousand dollars (I am not affiliated with Azul, in case this reads like a promotion), the price can be well justified, especially as they sell it as part of a low-latency offering with very rare "breaks" (pauses) in the system. This is an extra edge they have in their product.