
Thread: Gallium3D OpenCL GSoC Near-Final Status Update

  1. #61
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,514

    Default

    Quote Originally Posted by Qaridarium View Post
    If you can answer this question with yes, then it is easier: "Would it be more complex and harder to learn if AMD did the same new architecture with VLIW?"

    Then it's easy compared to a VLIW version.
    I'm not having much luck parsing the question. What does "it" refer to ? Driver development (harder) or application development (easier) ?
    Last edited by bridgman; 08-20-2011 at 05:59 PM.

  2. #62
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    Default

    Quote Originally Posted by bridgman View Post
    I'm not having much luck parsing the question. What does "it" refer to ? Driver development (harder) or application development (easier) ?
    The question is not about the existing architecture. It's not about the old or the new architecture; it's about a fictional architecture.

    If you build a fictional architecture with the same feature set, using VLIW instead of a RISC part, would it be easier to build a driver for it?

    You claim that it's harder to write a driver for a RISC architecture than for a VLIW one.

    But everything I've read in the past tells me that, with the same feature set, a VLIW CPU is always harder to optimize.

  3. #63
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,514

    Default

    Quote Originally Posted by Qaridarium View Post
    The question is not about the existing architecture. It's not about the old or the new architecture; it's about a fictional architecture. If you build a fictional architecture with the same feature set, using VLIW instead of a RISC part, would it be easier to build a driver for it? You claim that it's harder to write a driver for a RISC architecture than for a VLIW one.
    If we are comparing fictional parts the question is even harder to answer, since programming difficulty usually doesn't come from the basic architecture but rather from all the things you have to add (eg fixed function hardware for textures and for export) to make the architecture run real fast on specific workloads such as graphics. VLIW parts map naturally onto graphics since you can deal with all the components in a single instruction group, and things like texture and export processing are easier because all the components are lined up nicely in the hardware. Assuming the same team designed both parts on the same process, same die size, at the same time, my guess is that the VLIW part would be easier to program for graphics and the non-VLIW part would be easier to program for compute.
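
    To illustrate that point, here is a minimal OpenCL C sketch (the kernel name and blend operation are purely illustrative, not code from this project): a graphics-style float4 blend whose four component multiply-adds are independent, so a VLIW4/VLIW5 shader compiler can pack them into a single instruction group.

    Code:
    /* Illustrative only: a graphics-flavoured float4 blend. The x/y/z/w
     * multiply-adds are independent, so a VLIW4/VLIW5 compiler can pack
     * them into one instruction group and keep the ALU slots busy. */
    __kernel void blend4(__global const float4 *src,
                         __global const float4 *dst,
                         __global float4 *out,
                         const float alpha)
    {
        size_t i = get_global_id(0);
        out[i] = src[i] * alpha + dst[i] * (1.0f - alpha);
    }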

    Quote Originally Posted by Qaridarium View Post
    But everything I've read in the past tells me that, with the same feature set, a VLIW CPU is always harder to optimize.
    It's harder to optimize for full utilization of VLIW hardware, but since the peak performance of VLIW hardware is usually higher than non-VLIW hardware on the same die area (because relatively more of the space can be used for ALUs) the question is whether it is harder to optimize VLIW hardware to the same performance level you would get from an equivalently sized non-VLIW part.
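
    (To put rough numbers on that trade-off, purely for illustration and not AMD data: if a VLIW design fits 1600 ALU lanes in the same area where a scalar design fits 1000, the VLIW part only stays ahead while average slot utilization exceeds 1000/1600 ≈ 62%; below that, the lower-peak scalar part delivers more real throughput from the same silicon.)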

    As always, the final answers depend on the workload -- if you are talking about simple-to-medium complexity graphics which naturally contain a lot of float3 and float4 operations then the same amount of optimization work would probably get you higher performance on the VLIW part. For compute, it's the other way round - float4 operations aren't all that common unless they are coded into the application, so non-VLIW is easier to optimize for and you can probably get higher performance from the same silicon area with non-VLIW.
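
    As a hedged sketch of that compute case (again with purely illustrative kernel names, not code from this project): a scalar kernel gives each work-item only one scalar operation per step, so most VLIW slots can sit idle unless the compiler finds independent work to pack, and the usual hand-tweak is to process four elements per work-item as a float4.

    Code:
    /* Illustrative only: scalar compute kernel. Each work-item issues one
     * scalar MAD, so on a VLIW4/VLIW5 part the remaining ALU slots may
     * idle unless the compiler packs other independent operations. */
    __kernel void saxpy_scalar(__global const float *x,
                               __global const float *y,
                               __global float *out,
                               const float a)
    {
        size_t i = get_global_id(0);
        out[i] = a * x[i] + y[i];
    }

    /* The common hand-tweak: process four elements per work-item as a
     * float4, re-vectorizing the code so the VLIW slots fill up again. */
    __kernel void saxpy_vec4(__global const float4 *x,
                             __global const float4 *y,
                             __global float4 *out,
                             const float a)
    {
        size_t i = get_global_id(0);
        out[i] = a * x[i] + y[i];
    }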

    It's probably worth mentioning that everything you read about VLIW CPUs refers to compute, not graphics -- I haven't seen similar analyses of VLIW vs non-VLIW GPUs running typical graphics workloads.

  4. #64
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    Default

    Quote Originally Posted by bridgman View Post
    my guess is that the VLIW part would be easier to program for graphics and the non-VLIW part would be easier to program for compute.
    Does that mean AMD stops being a graphics card company?

    AMD drops their lead in graphics only to compete with NVIDIA on compute stuff?

    And I really don't get the point about the "compute" part. In Bitcoin, for example, AMD has 6x more speed than any NVIDIA card, which means VLIW isn't the problem for speed.

    Or what kind of compute workload do you mean?


    Quote Originally Posted by bridgman View Post
    It's harder to optimize for full utilization of VLIW hardware, but since the peak performance of VLIW hardware is usually higher than non-VLIW hardware on the same die area (because relatively more of the space can be used for ALUs) the question is whether it is harder to optimize VLIW hardware to the same performance level you would get from an equivalently sized non-VLIW part.
    And again, I don't get the point.

    So AMD wants to waste more die area and transistors, and ends up with less speed from the same die at the end of the optimization process?

    Just tell me: has AMD gone completely stupid?

    AMD wants to drop VLIW because of compute, but on compute (Bitcoin) they have a 6x LEAD, and because of this lead they drop VLIW?

    And then they LEAD in graphics speed per watt and speed per die size, and because of this LEAD they drop VLIW?

    Maybe AMD thinks like this: "Damn, NVIDIA wins on compute with the worst chip design ever. Maybe that's the clue; maybe if we go worse we will push NVIDIA hard!"... LOL


    Quote Originally Posted by bridgman View Post
    As always, the final answers depend on the workload -- if you are talking about simple-to-medium complexity graphics which naturally contain a lot of float3 and float4 operations then the same amount of optimization work would probably get you higher performance on the VLIW part. For compute, it's the other way round - float4 operations aren't all that common unless they are coded into the application, so non-VLIW is easier to optimize for and you can probably get higher performance from the same silicon area with non-VLIW.
    Your explanation contradicts reality, because Bitcoin is compute and AMD is 6 times faster, and 10 times faster per watt of power usage, than NVIDIA!

    What kind of compute workload do you mean?


    Quote Originally Posted by bridgman View Post
    It's probably worth mentioning that everything you read about VLIW CPUs refers to compute, not graphics -- I haven't seen similar analyses of VLIW vs non-VLIW GPUs running typical graphics workloads.
    Sure... I can do my own analysis.
    AMD leads against NVIDIA in all graphics benchmarks in speed per watt of power usage.
    In short: AMD is right, NVIDIA is wrong.
    NVIDIA does RISC and AMD does VLIW... an easy analysis.

    OK, let's analyze compute... AMD is 6 times faster in Bitcoin... a very easy analysis...

  5. #65
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,514

    Default

    Did you happen to see a movie called "The Last Boy Scout", with Bruce Willis ?

    If so, we're at that scene in the strip club where, after some increasingly unfriendly discussion... (other guy) "you're starting to pi** me off"... (Willis's character) "it's about f-ing time"... puts hand up to shake and introduces himself. Now we can talk.

    VLIW has all kinds of benefits -- there's a reason we're still using it across the board today -- it's a no-brainer for graphics and even offers the best performance in hand-tweaked compute apps. The questions are (a) whether we can rely on hand-tweaked compute apps as GPU compute becomes more common, (b) whether compiler technology can eliminate the need to hand-tweak application code in the future, and (c) whether it's possible to create a non-VLIW implementation which can maintain most of the advantages of VLIW.

  6. #66
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    Default

    Quote Originally Posted by bridgman View Post
    Did you happen to see a movie called "The Last Boy Scout", with Bruce Willis ?
    No, I haven't... but I should watch this movie.

    Quote Originally Posted by bridgman View Post
    If so, we're at that scene in the strip club where, after some increasingly unfriendly discussion... (other guy) "you're starting to pi** me off"... (Willis's character) "it's about f-ing time"... puts hand up to shake and introduces himself. Now we can talk.

    I think I need to watch the movie first to understand you...



    Quote Originally Posted by bridgman View Post
    VLIW has all kinds of benefits -- there's a reason we're still using it across the board today -- it's a no-brainer for graphics and even offers the best performance in hand-tweaked compute apps. The questions are (a) whether we can rely on hand-tweaked compute apps as GPU compute becomes more common, (b) whether compiler technology can eliminate the need to hand-tweak application code in the future, and (c) whether it's possible to create a non-VLIW implementation which can maintain most of the advantages of VLIW.
    A long discussion for a simple point: AMD just wants an easier-to-handle architecture, and "less" hand optimization in code means "easier".

    And you still think it's harder to write a driver for it?

    "whether it's possible to create a non-VLIW implementation which can maintain most of the advantages of VLIW."

    How can that be?

  7. #67
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,514

    Default

    Quote Originally Posted by Qaridarium View Post
    I think I need to watch the movie first to understand you...
    What I meant was "now you've argued both for and against VLIW, we're ready to talk about nuances, compromises and tradeoffs". When you are dealing with both compute and graphics, one architecture isn't obviously better than the other, and what you want is somewhere in between.

    Quote Originally Posted by Qaridarium View Post
    A long discussion for a simple point: AMD just wants an easier-to-handle architecture, and "less" hand optimization in code means "easier". And you still think it's harder to write a driver for it?
    Again, you keep combining things and looking for absolutes where they don't exist. Writing a graphics driver stack will probably be harder, writing a compute driver stack will probably be easier. That's not the point though -- nobody changes architectures to make writing drivers easier. The point is that writing performant compute *applications* will be easier on a non-VLIW architecture.
    Last edited by bridgman; 08-21-2011 at 12:00 PM.

  8. #68
    Join Date
    Sep 2010
    Posts
    474

    Default

    Quote Originally Posted by steckdenis View Post
    Hello,

    I've uploaded a new version of the documentation online at http://people.freedesktop.org/~steck...ver/index.html . The front page now says that OpenCL 1.1 is implemented, many new classes are documented, and I directly uploaded the documentation from my computer instead of producing it on the Freedesktop.org server. It is now beautiful !
    Thank you.
    It looks great, good work. (I really mean that.)

  9. #69
    Join Date
    Mar 2007
    Location
    West Australia
    Posts
    371

    Default

    Quote Originally Posted by bridgman View Post
    One minor terminology point -- the AMD GPUs are all SIMD today (SIMD "this way" and VLIW "that way", eg a 16x5 or 16x4 array). In the future, they will still be SIMD, just not SIMD *and* VLIW.

    There are certainly lots of IRs to choose from, each with their own benefits and drawbacks. One more wild card is that our Fusion System Architecture (FSA) initiative will be built in part around a virtual ISA (FSAIL) designed to bridge over CPUs and GPUs, so I think we have at least 5 to choose from -- 4 if you lump LunarGLASS and LLVM IR together since LunarGLASS builds on and extends LLVM IR. I'm assuming nobody is arguing that we should go back to Mesa IR yet

    It's going to be an interesting few months, and an interesting XDC. I might even learn to like compilers, although that is doubtful.
    You don't like compilers? You must be using the Intel compiler. ~troll face~

    FSAIL (why does that sound like "fail"?)... Maybe SIMD will become "very long instruction word, multiple data".
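
    For anyone puzzled by the "SIMD this way and VLIW that way" wording in the quote above, here is a toy C sketch of that 16x5 layout (purely conceptual, not actual hardware behaviour): 16 lanes execute in lockstep, and each lane's instruction is a bundle of up to 5 independent ALU operations.

    Code:
    /* Toy model only: a 16x5 block. The outer loop is the SIMD direction
     * (16 lanes run the same instruction in lockstep); the inner loop is
     * the VLIW direction (up to 5 independent ops packed per bundle). */
    #define SIMD_LANES 16
    #define VLIW_SLOTS 5

    static void issue_one_bundle(float regs[SIMD_LANES][VLIW_SLOTS])
    {
        for (int lane = 0; lane < SIMD_LANES; ++lane)      /* SIMD "this way" */
            for (int slot = 0; slot < VLIW_SLOTS; ++slot)  /* VLIW "that way" */
                regs[lane][slot] += 1.0f;  /* stand-in for one packed ALU op */
    }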

  10. #70
    Join Date
    Mar 2007
    Location
    West Australia
    Posts
    371

    Default

    Quote Originally Posted by Qaridarium View Post
    For me it's the same.
    Lack of manpower or lack of money to pay manpower.
    It's the same!
    Call it lack of POWER.

    The Humble Indie Bundle shows Linux does not lack money.

    Linux only lacks organization. There's too much chaos.

    We just need open-source fanboy hardware to fund super ultra MAREK...

    I sent 13.37 euros to Marek in the past!
    Everyone should do the same!
    Just find a hacker you personally like and send him 13.37 euros.
    I LoL'd at this post. x)
    Chaos theory == Open Source.
