Cooling The Raspberry Pi 4 With The Fan SHIM & FLIRC For Better Performance

  • Raphexon
    replied
    "S922X
    Has 4x A73@… + 2x A53@…"

    The A73 cores of the Odroid N2 are clocked at 1.8 GHz (its A53 cores at 1.9 GHz).
    While the S922X can be clocked higher, Hardkernel has decided on 1.8 GHz, not 2.21 GHz, for the A73 cores.

    And the A53 on the RK3399/S922X can reach 2 GHz, though I don't think any of the boards will sustain that frequency without a good heatsink.
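Whether a board actually sustains its rated clock under load can be observed rather than guessed. A minimal sketch, assuming a Linux board that exposes the standard sysfs thermal and cpufreq files (temperature in millidegrees Celsius, current frequency in kHz); which thermal zone and which CPU number belong to the A53 or A73 cluster varies by board, so the paths below are assumptions to adjust.

```python
import time
from pathlib import Path

# Assumed sysfs locations; the zone and CPU indices differ between boards.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")
CPU_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def millidegrees_to_celsius(raw: str) -> float:
    """sysfs reports temperature as an integer in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def khz_to_ghz(raw: str) -> float:
    """cpufreq reports the current frequency as an integer in kHz."""
    return int(raw.strip()) / 1_000_000.0


def sample(seconds: int = 10, interval: float = 1.0) -> list[tuple[float, float]]:
    """Collect (temperature in C, frequency in GHz) pairs while a load runs elsewhere."""
    readings = []
    for _ in range(int(seconds / interval)):
        temp = millidegrees_to_celsius(THERMAL_ZONE.read_text())
        freq = khz_to_ghz(CPU_FREQ.read_text())
        readings.append((temp, freq))
        time.sleep(interval)
    return readings


def throttled(readings: list[tuple[float, float]], target_ghz: float, tolerance: float = 0.05) -> bool:
    """True if any sample fell more than `tolerance` GHz below the expected clock."""
    return any(freq < target_ghz - tolerance for _, freq in readings)
```

For example, start a CPU-heavy job in another terminal, call `sample(60)`, and pass the readings to `throttled(readings, target_ghz=1.8)` with the clock you expect that cluster to hold.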


  • coder
    replied
    Originally posted by tuxd3v
    I saw that big article you linked,
    and it basically says the same thing I guessed in my previous comment..
    Not really. They go into quite a bit of depth in areas where the architecture was improved to reduce things like pipeline bubbles and so improve the typical case, even if the best case isn't as good as the A72.

    Originally posted by tuxd3v
    Because they state that the improvements are so great that you can decode in one cycle (but only some instructions can be decoded in one cycle... they say the majority...).
    What is the majority???
    If my application only uses the minority of them... is my application faster? Of course not..
    If you read the whole article, they say it's the vector instructions that take two cycles to decode. The reason is probably that vectorized code tends to be in loops that are larger and more easily branch-predicted, so an extra cycle in the pipeline isn't a big deal for them.

    Originally posted by tuxd3v
    The biggest improvement they state is a theoretical 15%; there is also a 10% improvement in something else that I don't recall now..
    But IMO, you should not add improvement percentages arithmetically, because it doesn't work that way..
    You can try to guess how the performance improvements stack up, but you forget that they're each estimates for some particular workload, and they might react differently to different workloads. Again, what you need is real-world benchmarks, like the ones on ODROID's N2 page, rather than a lot of guessing.
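The point that each percentage is an estimate for some particular workload can be made concrete with Amdahl's law: a speedup that applies to only part of the run time is diluted by the rest. A minimal sketch; the 15% figure is taken from the quoted post, while the shares of run time it applies to are hypothetical, not measurements of the A73.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when only `fraction_improved` of the run time gets faster.

    Amdahl's law: 1 / ((1 - f) + f / s).
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical: a 15% gain that only helps the share of run time it actually limits.
for share in (1.0, 0.5, 0.2):
    overall = amdahl_speedup(share, 1.15)
    print(f"gain applies to {share:.0%} of run time -> {overall - 1:.1%} faster overall")
```

A 15% local gain that touches only half of the run time comes out at roughly 7% overall, which is why benchmarks of the actual workload matter more than headline figures.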

    Originally posted by tuxd3v
    Well, I think the values for the A73 were taken from the A72 metrics as a base..
    But they only put 6.35 DMIPS/MHz there and dropped the maximum value the A72 could achieve (7.4 DMIPS/MHz), and after that article you showed, it's now clear why..
    I think it's a waste of time to argue about Dhrystone MIPS, but start by finding a reliable source for these numbers and then we'll talk. IMO, it's less relevant than any of ODROID's benchmarks.

    Originally posted by tuxd3v
    Also, you can see for yourself:
    RK3399
    Has 2x A72@2GHz + 4x A53@…
    S922X
    Has 4x A73@… + 2x A53@…

    So you doubled the big cores, you even boosted their frequency by a considerable margin, you boosted the frequency of the A53s too, again by a considerable margin, and you get around 30% improvement over the RK3399?

    I would have expected something between 40-60% more at least..
    Okay, so we're now comparing SoCs. So, where did you get 30%?


  • tuxd3v
    replied
    Originally posted by coder
    Yeah, I saw those specs, too.
    But, as I said about DRAM clocks, you can't just latch on to a couple specs and use them to characterize the performance of the entire system. There's a lot more to the A73 than that.

    Here's a good overview of the A73. Particularly, the second page goes into some of the various performance improvements it contains:
    I saw that big article you linked,
    and it basically says the same thing I guessed in my previous comment..

    With a few more things,
    like optimisations in the decoder, and issuing uops from there onward instead of doing that in the dispatcher..
    They were made by ARM's France team.
    Even though there is a great degree of truth in that..., it's not a universal truth..

    Why?
    Because they state that the improvements are so great that you can decode in one cycle (but only some instructions can be decoded in one cycle... they say the majority...).
    What is the majority???
    If my application only uses the minority of them... is my application faster? Of course not..
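The "what is the majority?" question amounts to a weighted average over the instruction mix. A toy sketch, assuming (as the thread describes it) that ordinary instructions decode in one cycle and vector ones in two; the class names and mix percentages are hypothetical, not figures from ARM or the article.

```python
def mean_decode_cycles(mix: dict[str, float], cycles: dict[str, int]) -> float:
    """Average decode cycles per instruction for a given instruction mix.

    `mix` maps an instruction class to its share of executed instructions
    (shares must sum to 1); `cycles` maps the class to its decode cycles.
    """
    total_share = sum(mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"mix shares must sum to 1, got {total_share}")
    return sum(share * cycles[kind] for kind, share in mix.items())


# Hypothetical toy model: most instructions take 1 decode cycle, vector ones take 2.
DECODE_CYCLES = {"scalar": 1, "vector": 2}

integer_heavy = {"scalar": 0.95, "vector": 0.05}
vector_heavy = {"scalar": 0.40, "vector": 0.60}

print(mean_decode_cycles(integer_heavy, DECODE_CYCLES))
print(mean_decode_cycles(vector_heavy, DECODE_CYCLES))
```

The model only says how the mix shifts the average; whether an extra decode cycle costs anything in practice depends on whether decode is the bottleneck, which is the point made about predictable vector loops.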

    Is it an improvement for the decoder?
    Maybe: if you rely on those specific instructions then yes, it is, otherwise no. But it could be an improvement for the dispatcher: since the uops are smaller, each one does less work, and so the pipelines are shortened, which could lead to less latency, as I 'guessed' in my previous reply..

    They have made improvements in the dispatcher, but I don't consider them as important as the improvements in the decoder..
    In my opinion, these decoder/dispatcher changes are the most significant ones, and the reason they use the word 'optimisations'..

    But the article also states that there are situations where you get degraded performance.. since they share lots of resources between execution units..
    The A72 has these resources for each execution unit.
    Of course that could mean more power consumption, but 'there's no free lunch'.. and the same article also states that sooner or later they will have to widen the decoder again, like in the A72..

    The biggest improvement they state is a theoretical 15%; there is also a 10% improvement in something else that I don't recall now..
    But IMO, you should not add improvement percentages arithmetically, because it doesn't work that way..
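Whether two percentages add, multiply, or overlap depends on what part of the run time each one touches. A small sketch using the 15% and 10% figures from the post, under the hypothetical assumption that both apply to the entire workload and are independent: then they compound multiplicatively. If both attack the same bottleneck, the combined gain is typically smaller than either the sum or the product.

```python
from math import prod


def combined_gain(*gains: float) -> float:
    """Combine fractional speedups that each apply to the whole workload.

    Independent speedups multiply: (1 + g1) * (1 + g2) * ... - 1.
    Assumes the gains do not overlap; overlapping gains combine to less.
    """
    return prod(1.0 + g for g in gains) - 1.0


additive = 0.15 + 0.10
multiplicative = combined_gain(0.15, 0.10)
print(f"added: {additive:.1%}, compounded: {multiplicative:.1%}")
```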

    Originally posted by coder
    That article just plotted values from a different wikipedia page, which seems to have changed since the article was written.
    Well, I think the values for the A73 were taken from the A72 metrics as a base..
    But they only put 6.35 DMIPS/MHz there and dropped the maximum value the A72 could achieve (7.4 DMIPS/MHz), and after that article you showed, it's now clear why..

    Since the execution units on the A73 share resources between them,
    when one is using a resource the other one cannot use it, so they stamped 6.35 DMIPS/MHz on the A73 and forgot the maximum value the A72 can achieve, which is 7.4 DMIPS/MHz...

    Also, you can see for yourself:
    RK3399
    Has 2x A72@2GHz + 4x A53@…
    S922X
    Has 4x A73@… + 2x A53@…

    So you doubled the big cores, you even boosted their frequency by a considerable margin, you boosted the frequency of the A53s too, again by a considerable margin, and you get around 30% improvement over the RK3399?

    I would have expected something between 40-60% more at least..
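The "40-60% at least" expectation comes from scaling by core count and clock. A naive sketch makes that arithmetic explicit, using only figures quoted in this thread: 2x A72 @ 2.0 GHz for the RK3399, 4x A73 @ 1.8 GHz for the Odroid N2's S922X (the clock Hardkernel ships, per Raphexon), 6.35 DMIPS/MHz for the A73, and 4.72, 6.3 or 7.4 for the A72. It covers the big cores only and assumes perfect multi-core scaling, no memory or thermal limits, and Dhrystone as a proxy for everything; these are simplifying assumptions, so treat the output as an idealized figure, not a prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    cores: int
    ghz: float
    dmips_per_mhz: float

    def peak_dmips(self) -> float:
        # Idealized: every core runs flat out at the listed clock, no shared bottlenecks.
        return self.cores * self.ghz * 1000.0 * self.dmips_per_mhz


def big_cluster_ratio(new: Cluster, old: Cluster) -> float:
    """How many times more idealized big-core throughput `new` has than `old`."""
    return new.peak_dmips() / old.peak_dmips()


n2_big = Cluster(cores=4, ghz=1.8, dmips_per_mhz=6.35)  # S922X A73s as shipped on the N2

for a72_score in (4.72, 6.3, 7.4):  # the A72 figures argued about in this thread
    rk3399_big = Cluster(cores=2, ghz=2.0, dmips_per_mhz=a72_score)
    ratio = big_cluster_ratio(n2_big, rk3399_big)
    print(f"A72 at {a72_score} DMIPS/MHz -> N2 big cores {ratio:.2f}x RK3399 big cores")
```

The idealized ratio ranges from about 1.5x to about 2.4x depending only on which A72 figure is plugged in, which shows how sensitive this kind of estimate is to its inputs; measured benchmarks remain the better guide.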

    Originally posted by coder
    I don't really understand your preoccupation with node size. The SoCs we're talking about only come in one node size, each. So, they should be judged and compared as is. Because that's how they're actually used, not based on some theoretical performance estimate of a different node size.
    Agreed.
    I spoke about 16nm because the A72 and A73 share that node size (both can be made on that node..).
    The idea was to compare them side by side at the same frequencies/node size..

    Of course, comparing by SoC, the S922X has half the node size, and that's important..


  • tuxd3v
    replied
    Originally posted by Tomin
    And sure, 1-3 dead pixels is fairly normal for a used product after some time and is probably not covered by warranty, but for a new product I think it really should be none. Maybe I'm just spoiled.
    Even Apple won't replace your LCD for a few dead pixels on it..
    Other vendors do the same..

    I don't recall right now
    what the allowed number of dead pixels is, but I think it is measured per inch (I am not sure of the exact ratio right now..).

    It's used by all players in the market..


  • coder
    replied
    Originally posted by danmcgrew
    Thanks, guys.

    You both are making my decision--and indeed, ANYone's decision--to buy the RK3399-based Pinebook Pro look like a stroke of genius.

    https://www.pine64.org/pinebook-pro/
    I do hope you enjoy it.


  • coder
    replied
    Originally posted by tuxd3v
    A72:
    Instruction decode width: 3-wide
    Dispatch: out-of-order, 5-wide, with a pipeline depth of 15 stages (executes more instructions but maybe with more latency??)

    A73:
    Instruction decode width: 2-wide (can only decode 2 instructions at a time..)
    Dispatch: out-of-order, 4-wide, with a pipeline depth of 11-12 stages (executes fewer instructions but maybe with less latency??)

    So,
    the A73 has a 33.33% narrower decoder, a 20% narrower dispatcher, and also a considerably shorter pipeline, only 73.3% to 80% of the A72's depth..

    The only thing the A73 has (and it's important..) is a bigger L1 cache (64 KB I + 32/64 KB D) vs the A72's L1 (48 KB I + 32 KB D).
    Also,
    since its pipeline is shorter, it could also have less latency..
    Yeah, I saw those specs, too.

    But, as I said about DRAM clocks, you can't just latch on to a couple specs and use them to characterize the performance of the entire system. There's a lot more to the A73 than that.

    Here's a good overview of the A73. Particularly, the second page goes into some of the various performance improvements it contains:




    Originally posted by tuxd3v
    I don't know where those values come from (4.72 vs ~6.35 DMIPS/MHz), and it's a purely integer benchmark ONLY..
    They can be challenged by other benchmarks people have been running (which support my case of the A72 having high per-core performance..); you can see values between ~6.3 and ~7.4 DMIPS/MHz for the A72 there (depending on the implementation..)..
    That article just plotted values from a different wikipedia page, which seems to have changed since the article was written.

    Originally posted by tuxd3v
    I am just guessing..
    And I don't find it helpful. Benchmarks would be helpful.

    Originally posted by tuxd3v
    To be correct, a real comparison would only make sense at the same node size.. for example 16 nm..
    That way, you could sort out any doubts..
    I don't really understand your preoccupation with node size. The SoCs we're talking about only come in one node size, each. So, they should be judged and compared as is. Because that's how they're actually used, not based on some theoretical performance estimate of a different node size.


  • Tomin
    replied
    Originally posted by danmcgrew
    These (from Store page):
    Warranty: 30 days
    Small numbers (1-3) of stuck or dead pixels are a characteristic of LCD screens. These are normal and should not be considered a defect.
    don't strike me as very professional. I mean, I understand that this is small-volume production from a small manufacturer with probably quite slim margins, but I still prefer to make more expensive purchases from places where I can get proper consumer protection. Of course, for a toy that's probably good enough.

    And sure, 1-3 dead pixels is fairly normal for a used product after some time and is probably not covered by warranty, but for a new product I think it really should be none. Maybe I'm just spoiled.


  • tuxd3v
    replied
    Originally posted by danmcgrew
    And I'm STILL going to buy a Pinebook Pro 14.1", 1080p IPS display Linux laptop--with full privacy controls AND a magnesium-alloy case; NOT PLASTIC--for $199.95.
    The Pinebook Pro
    is a very, very attractive option, hard to refuse!
    • 2 powerful big Cortex-A72 cores for a performance scheme (when the power cord is plugged in), where they can take priority.
    • 4 efficient A53 cores for a low-power scheme (when on battery only), where they can take priority, using the A72s for demanding situations only..
    It's a very, very good solution indeed!!
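On Linux, which logical CPUs are A72s and which are A53s can be read from `/proc/cpuinfo`, whose `CPU part` field identifies the Arm core (0xd03 for Cortex-A53, 0xd08 for Cortex-A72, 0xd09 for Cortex-A73). A minimal user-space sketch of the "little cores on battery" idea, pinning a process with `os.sched_setaffinity`; the kernel's scheduler and cpufreq governors normally handle big.LITTLE placement, so the battery-based policy here is a hypothetical illustration, not a description of what the Pinebook Pro's software does.

```python
import os
from collections import defaultdict

# Arm "CPU part" numbers as reported in /proc/cpuinfo on arm64.
PART_NAMES = {"0xd03": "Cortex-A53", "0xd08": "Cortex-A72", "0xd09": "Cortex-A73"}


def cores_by_type(cpuinfo_text: str) -> dict[str, set[int]]:
    """Group logical CPU numbers by core type, using the 'processor' and 'CPU part' lines."""
    groups: dict[str, set[int]] = defaultdict(set)
    current_cpu = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "processor":
            current_cpu = int(value)
        elif key == "CPU part" and current_cpu is not None:
            groups[PART_NAMES.get(value, value)].add(current_cpu)
    return dict(groups)


def preferred_cpus(groups: dict[str, set[int]], on_battery: bool) -> set[int]:
    """Hypothetical policy: little cores on battery, every core on mains power."""
    if on_battery and "Cortex-A53" in groups:
        return groups["Cortex-A53"]
    return set().union(*groups.values())


def pin_current_process(cpus: set[int]) -> None:
    """Restrict this process to `cpus` (Linux only)."""
    os.sched_setaffinity(0, cpus)
```

For example, `pin_current_process(preferred_cpus(cores_by_type(open("/proc/cpuinfo").read()), on_battery=True))`; detecting the battery state is left out and would come from the board's power-supply interface.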

    Originally posted by danmcgrew
    Thanks, guys.
    You both are making my decision--and indeed, ANYone's decision--to buy the RK3399-based Pinebook Pro look like a stroke of genius.
    Well,
    I will also have mine!!
    Last edited by tuxd3v; 18 August 2019, 10:38 AM. Reason: comment with less space used..


  • danmcgrew
    replied
    Originally posted by tuxd3v

    heheh, it's ok
    Yes, the RK3399 has dual A72 cores @ 2 GHz (it doesn't need 4 of them there.. see below..)..
    The A73 is maybe an optimized A72, maybe for power..,
    with little to no difference in performance, or even slower than the A72 (I think the A73 is slower.. but it could depend on the workload..)
    A72:
    Instruction decode width: 3-wide
    Dispatch: out-of-order, 5-wide, with a pipeline depth of 15 stages (executes more instructions but maybe with more latency??)
    A73:
    Instruction decode width: 2-wide (can only decode 2 instructions at a time..)
    Dispatch: out-of-order, 4-wide, with a pipeline depth of 11-12 stages (executes fewer instructions but maybe with less latency??)
    So,
    the A73 has a 33.33% narrower decoder, a 20% narrower dispatcher, and also a considerably shorter pipeline, only 73.3% to 80% of the A72's depth..
    The only thing the A73 has (and it's important..) is a bigger L1 cache (64 KB I + 32/64 KB D) vs the A72's L1 (48 KB I + 32 KB D).
    Also,
    since its pipeline is shorter, it could also have less latency..
    Maybe these 2 factors give it a small edge..,
    but the A72 seems to me to be a more powerful core than the A73.
    At the same time, the A73 seems to me to use less power (due to a narrower decoder/dispatcher, and also a shorter pipeline..)..
    I don't know where those values come from (4.72 vs ~6.35 DMIPS/MHz), and it's a purely integer benchmark ONLY..
    They can be challenged by other benchmarks people have been running (which support my case of the A72 having high per-core performance..); you can see values between ~6.3 and ~7.4 DMIPS/MHz for the A72 there (depending on the implementation..)..
    So that 4.72 DMIPS/MHz value seems to be a marketing move..
    To be honest,
    it seems to me that ARM at the time was worried about the media buzz over the A72's power consumption (people think that performance "comes for free.."),
    and created a less capable CPU, but slightly more efficient (maybe??).. and since they are deployed in clusters of 4, they needed to consume less power...??
    I am just guessing..
    I hadn't seen that information before, but yes, I have seen people comparing them in ARM's own forums and such, here and there..
    And it seems that the difference between them is maybe efficiency..
    But here,
    to be correct, a real comparison would only make sense at the same node size.. for example 16 nm..
    That way, you could sort out any doubts..
    I am also not trying to change your mind, or anybody else's.. I just wanted to share this info for anyone who's interested.
    Thanks, guys.

    You both are making my decision--and indeed, ANYone's decision--to buy the RK3399-based Pinebook Pro look like a stroke of genius.

    Last edited by danmcgrew; 18 August 2019, 10:00 AM.


  • tuxd3v
    replied
    Originally posted by coder
    Not to stir things back up again, after this finally wound down, but do you realize that RK3399 is only dual-A72 + quad-A53, while the N2's S922X is quad-A73 + dual-A53?
    heheh, it's ok
    Yes, the RK3399 has dual A72 cores @ 2 GHz (it doesn't need 4 of them there.. see below..)..

    Originally posted by coder
    And according to this, the A73 is about 35% faster, per clock (i.e. DMIPS/MHz), at least for integer workloads.

    https://en.wikipedia.org/wiki/Compar..._ARMv8-A_cores
    The A73 is maybe an optimized A72, maybe for power..,
    with little to no difference in performance, or even slower than the A72 (I think the A73 is slower.. but it could depend on the workload..)

    A72:
    Instruction decode width: 3-wide
    Dispatch: out-of-order, 5-wide, with a pipeline depth of 15 stages (executes more instructions but maybe with more latency??)

    A73:
    Instruction decode width: 2-wide (can only decode 2 instructions at a time..)
    Dispatch: out-of-order, 4-wide, with a pipeline depth of 11-12 stages (executes fewer instructions but maybe with less latency??)

    So,
    the A73 has a 33.33% narrower decoder, a 20% narrower dispatcher, and also a considerably shorter pipeline, only 73.3% to 80% of the A72's depth..
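Those percentages can be checked directly from the widths and depths quoted above, with one subtlety: a relative difference depends on which core is the baseline. 4-wide dispatch is 20% narrower than 5-wide, while 5-wide is 25% wider than 4-wide. A small sketch using exact fractions; the inputs are only the figures quoted in this thread.

```python
from fractions import Fraction


def relative_change(a73: int, a72: int) -> tuple[Fraction, Fraction]:
    """Return (how much smaller the A73 figure is vs the A72,
    how much larger the A72 figure is vs the A73)."""
    a73_f, a72_f = Fraction(a73), Fraction(a72)
    return (a72_f - a73_f) / a72_f, (a72_f - a73_f) / a73_f


# Widths and pipeline depths as quoted in the thread: (A73 value, A72 value).
specs = {
    "decode width": (2, 3),
    "dispatch width": (4, 5),
    "pipeline depth (short end)": (11, 15),
    "pipeline depth (long end)": (12, 15),
}

for name, (a73, a72) in specs.items():
    smaller, larger = relative_change(a73, a72)
    print(f"{name}: A73 is {float(smaller):.1%} smaller; A72 is {float(larger):.1%} larger")
```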

    The only thing the A73 has (and it's important..) is a bigger L1 cache (64 KB I + 32/64 KB D) vs the A72's L1 (48 KB I + 32 KB D).
    Also,
    since its pipeline is shorter, it could also have less latency..

    Maybe these 2 factors give it a small edge..,
    but the A72 seems to me to be a more powerful core than the A73.
    At the same time, the A73 seems to me to use less power (due to a narrower decoder/dispatcher, and also a shorter pipeline..)..
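The "shorter pipeline, less latency" intuition is usually quantified through the branch misprediction penalty, which grows with the number of stages that must be refilled. A textbook stall model, as a sketch: every number below is hypothetical, the penalty is crudely equated to the quoted pipeline depth, and giving both cores the same base CPI ignores the A72's wider front end, which is exactly the trade-off being debated.

```python
def effective_cpi(base_cpi: float, branch_share: float, mispredict_rate: float, penalty_cycles: float) -> float:
    """Textbook stall model: base CPI plus cycles lost to branch mispredictions.

    CPI = base + (branches per instruction) * (mispredict rate) * (penalty in cycles)
    """
    return base_cpi + branch_share * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted.
# Penalty approximated by pipeline depth (15 vs 11 stages), which is a simplification.
a72_cpi = effective_cpi(base_cpi=0.8, branch_share=0.20, mispredict_rate=0.05, penalty_cycles=15)
a73_cpi = effective_cpi(base_cpi=0.8, branch_share=0.20, mispredict_rate=0.05, penalty_cycles=11)
print(f"A72-like: {a72_cpi:.2f} CPI, A73-like: {a73_cpi:.2f} CPI")
```

A lower base CPI for the wider core could easily outweigh its larger penalty, so the model frames the question rather than settling it.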

    I don't know where those values come from (4.72 vs ~6.35 DMIPS/MHz), and it's a purely integer benchmark ONLY..
    They can be challenged by other benchmarks people have been running (which support my case of the A72 having high per-core performance..); you can see values between ~6.3 and ~7.4 DMIPS/MHz for the A72 there (depending on the implementation..)..
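For reference, a DMIPS figure is conventionally the Dhrystone iterations per second divided by 1757 (the VAX 11/780's score), and DMIPS/MHz divides that by the clock in MHz. A small sketch showing how the "about 35% faster per clock" conclusion swings with which A72 figure is used; 4.72, 6.3, 6.35 and 7.4 are the values quoted in this thread, and the raw Dhrystone score in the last line is a made-up input that only demonstrates the conversion.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference machine that defines 1 DMIPS


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Convert a raw Dhrystone score into DMIPS/MHz."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC / clock_mhz


A73_DMIPS_PER_MHZ = 6.35

for a72 in (4.72, 6.3, 7.4):
    change = A73_DMIPS_PER_MHZ / a72 - 1.0
    print(f"A73 vs A72 at {a72} DMIPS/MHz: {change:+.1%} per clock")

# Hypothetical raw score, only to show the conversion.
print(f"{dmips_per_mhz(11_164_000, 1000):.2f} DMIPS/MHz")
```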

    So that 4.72 DMIPS/MHz value seems to be a marketing move..
    To be honest,
    it seems to me that ARM at the time was worried about the media buzz over the A72's power consumption (people think that performance "comes for free.."),
    and created a less capable CPU, but slightly more efficient (maybe??).. and since they are deployed in clusters of 4, they needed to consume less power...??

    I am just guessing..

    Originally posted by coder
    If you knew this, fine. If not, now you do. I'm not trying to change your mind - just wanted to share this info for anyone who's interested.
    I hadn't seen that information before, but yes, I have seen people comparing them in ARM's own forums and such, here and there..
    And it seems that the difference between them is maybe efficiency..

    But here,
    to be correct, a real comparison would only make sense at the same node size.. for example 16 nm..
    That way, you could sort out any doubts..

    I am also not trying to change your mind, or anybody else's.. I just wanted to share this info for anyone who's interested.
    Last edited by tuxd3v; 18 August 2019, 12:32 AM.
