Announcement

**Michael** · 12 January 2017, 11:38 AM

Originally posted by nevion

Michael another bump - as you requested previously.

Hi nevion,
I've been playing with it this morning. I built stock ArrayFire and installed it in /usr/local (the build script mentioned in the test profile contents didn't work).

Is there a reason ArrayFire isn't being built with the test profile itself? Seems like it could be automated easily sans getting all the dependencies, then building based upon what OpenCL/CUDA/etc is available.

It's been working fine for me. Here is an example I used of selecting multiple options:

local/arrayfire:
Test Installation 1 of 1
Installation Size: 1.0 MB
Installing Test @ 10:35:22

ArrayFire 1.0:
local/arrayfire
Processor Test Configuration
1: OpenCL
2: CUDA
3: CPU
4: Test All Options
Platform: 3

1: Accumulate_1D_f32
2: Accumulate_1D_f64
3: Accumulate_2D_f32
4: Accumulate_2D_f64
5: Bandwidth_f32
6: Bandwidth_f64
7: BilateralFilter_f32
8: BilateralFilter_f64
9: Convolve_f32_11x11
10: Convolve_f32_5x5
11: Convolve_f32_9x9
12: Convolve_f64_11x11
13: Convolve_f64_5x5
14: Convolve_f64_9x9
15: Data_f32_CONSTANT
16: Data_f32_IDENTITY
17: Data_f32_RANDN
18: Data_f32_RANDU
19: Data_f32_RANGE
20: Data_f64_CONSTANT
21: Data_f64_IDENTITY
22: Data_f64_RANDN
23: Data_f64_RANDU
24: Data_f64_RANGE
25: ELWISE_f32_ADD
26: ELWISE_f32_ADD_CONSTANT
27: ELWISE_f32_ARC_COS
28: ELWISE_f32_ARC_SIN
29: ELWISE_f32_ARC_TAN
30: ELWISE_f32_ATAN2
31: ELWISE_f32_CBRT
32: ELWISE_f32_COS
33: ELWISE_f32_DIVIDE
34: ELWISE_f32_DIVIDE_CONSTANT
35: ELWISE_f32_ERF
36: ELWISE_f32_ERFC
37: ELWISE_f32_EXP
38: ELWISE_f32_EXP_M1
39: ELWISE_f32_HYPOT
40: ELWISE_f32_HYP_ARC_COS
41: ELWISE_f32_HYP_ARC_SIN
42: ELWISE_f32_HYP_ARC_TAN
43: ELWISE_f32_HYP_COS
44: ELWISE_f32_HYP_SIN
45: ELWISE_f32_HYP_TAN
46: ELWISE_f32_IS_INF
47: ELWISE_f32_IS_NAN
48: ELWISE_f32_IS_ZERO
49: ELWISE_f32_LGAMMA
50: ELWISE_f32_LOG10
51: ELWISE_f32_LOG_1P
52: ELWISE_f32_LOG_E
53: ELWISE_f32_MAX
54: ELWISE_f32_MIN
55: ELWISE_f32_MODULO
56: ELWISE_f32_MULTIPLY
57: ELWISE_f32_MULTIPY_CONSTANT
58: ELWISE_f32_POW
59: ELWISE_f32_REMAINDER
60: ELWISE_f32_SIN
61: ELWISE_f32_SQRT
62: ELWISE_f32_SUBTRACT
63: ELWISE_f32_SUBTRACT_CONSTANT
64: ELWISE_f32_TAN
65: ELWISE_f32_TGAMMA
66: ELWISE_f64_ADD
67: ELWISE_f64_ADD_CONSTANT
68: ELWISE_f64_ARC_COS
69: ELWISE_f64_ARC_SIN
70: ELWISE_f64_ARC_TAN
71: ELWISE_f64_ATAN2
72: ELWISE_f64_CBRT
73: ELWISE_f64_COS
74: ELWISE_f64_DIVIDE
75: ELWISE_f64_DIVIDE_CONSTANT
76: ELWISE_f64_ERF
77: ELWISE_f64_ERFC
78: ELWISE_f64_EXP
79: ELWISE_f64_EXP_M1
80: ELWISE_f64_HYPOT
81: ELWISE_f64_HYP_ARC_COS
82: ELWISE_f64_HYP_ARC_SIN
83: ELWISE_f64_HYP_ARC_TAN
84: ELWISE_f64_HYP_COS
85: ELWISE_f64_HYP_SIN
86: ELWISE_f64_HYP_TAN
87: ELWISE_f64_IS_INF
88: ELWISE_f64_IS_NAN
89: ELWISE_f64_IS_ZERO
90: ELWISE_f64_LGAMMA
91: ELWISE_f64_LOG10
92: ELWISE_f64_LOG_1P
93: ELWISE_f64_LOG_E
94: ELWISE_f64_MAX
95: ELWISE_f64_MIN
96: ELWISE_f64_MODULO
97: ELWISE_f64_MULTIPLY
98: ELWISE_f64_MULTIPY_CONSTANT
99: ELWISE_f64_POW
100: ELWISE_f64_REMAINDER
101: ELWISE_f64_SIN
102: ELWISE_f64_SQRT
103: ELWISE_f64_SUBTRACT
104: ELWISE_f64_SUBTRACT_CONSTANT
105: ELWISE_f64_TAN
106: ELWISE_f64_TGAMMA
107: Erode_f32_5x5
108: Erode_f64_5x5
109: FFT_1D_f32
110: FFT_1D_f64
111: FFT_2D_f32
112: FFT_2D_f64
113: GFOR_FOR_LOOP_SUM
114: GFOR_NO_LOOP_SUM
115: GFOR_SUM
116: Histogram_f32
117: Histogram_f64
118: Image_Bilateral_11x11
119: Image_Bilateral_5x5
120: Image_Bilateral_9x9
121: Image_Convolve_11x11
122: Image_Convolve_5x5
123: Image_Convolve_9x9
124: Image_Erode_11x11
125: Image_Erode_5x5
126: Image_Erode_9x9
127: Image_FAST
128: Image_Histogram
129: Image_ORB
130: Image_Resize_Expand_2x
131: Image_Resize_Shrink_2x
132: Cholesky_f32
133: Cholesky_f64
134: LU_f32
135: LU_f64
136: MatrixMultiply_f32
137: MatrixMultiply_f64
138: MedianFilter_f32_4x4_PAD_SYM
139: MedianFilter_f32_4x4_PAD_ZERO
140: MedianFilter_f64_4x4_PAD_SYM
141: MedianFilter_f64_4x4_PAD_ZERO
142: PinnedMemory_f32_Bandwidth
143: PinnedMemory_f64_Bandwidth
144: Expand_2D_f32_AF_INTERP_BILINEAR
145: Expand_2D_f32_AF_INTERP_NEAREST
146: Expand_2D_f64_AF_INTERP_BILINEAR
147: Expand_2D_f64_AF_INTERP_NEAREST
148: Shrink_2D_f32_AF_INTERP_BILINEAR
149: Shrink_2D_f32_AF_INTERP_NEAREST
150: Shrink_2D_f64_AF_INTERP_BILINEAR
151: Shrink_2D_f64_AF_INTERP_NEAREST
152: Rotate_f32_INTERP_NEAREST
153: Rotate_f64_INTERP_NEAREST
154: Sort_f32_ASCENDING
155: Sort_f32_DESCENDING
156: Sort_f64_ASCENDING
157: Sort_f64_DESCENDING
158: Sum_1D_f32
159: Sum_1D_f64
160: Sum_2D_f32
161: Sum_2D_f64
162: Transpose_f32
163: Transpose_f64
164: Test All Options
Benchmark: 1-10

That works fine, and you could also enter something like 1-10,20-40,57,164 to select several ranges and individual tests at once.
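For readers wondering how a selection such as 1-10,20-40,57,164 expands, here is a minimal Python sketch. It is not the Phoronix Test Suite's own parser; the function name `parse_selection` and the `max_option` parameter are made up for illustration, and the menu size of 164 is taken from the listing above.

```python
def parse_selection(spec: str, max_option: int) -> list[int]:
    """Expand a selection such as "1-10,20-40,57" into sorted option numbers.

    Hypothetical helper: the real menu handling may differ.
    """
    chosen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end or start < 1 or end > max_option:
            raise ValueError(f"selection {part!r} is outside 1..{max_option}")
        chosen.update(range(start, end + 1))
    return sorted(chosen)


if __name__ == "__main__":
    # 164 is the last entry ("Test All Options") in the menu shown above.
    print(parse_selection("1-10,20-40,57,164", max_option=164))
```

Overlapping ranges are merged because the numbers are collected in a set before sorting.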

**nevion** · 12 January 2017, 11:51 AM

Michael It takes quite some time to compile arrayfire... I didn't have it built automatically just for that reason. That, and using the prepackaged arrayfire makes for a more stable basis. So I download and install that out of band, although the updateLibraries script should do that too. I think it is possible to put the download for this software in your download list to make it a bit more self-contained, but it's a heavy package, near a gig, and they're hosting it on Amazon (= bandwidth costs per download)... I figured it is more like CUDA in terms of installation cost, in that it should just be installed out of band prior to the benchmark run. What do you think?

Do you have interest in incorporating this testing into your repertoire - are there some tests you'd like to see added or removed? Any other problems? I'd like to extend or contract it where you think that could be useful; and potentially add more. For instance, I am going to add int{8,16,32,64} benchmarks for the majority of these.

**Michael** · 12 January 2017, 11:55 AM

Originally posted by nevion

Michael It takes quite some time to compile arrayfire... I didn't have it built automatically just for that reason. That, and using the prepackaged arrayfire makes for a more stable basis. So I download and install that out of band, although the updateLibraries script should do that too. I think it is possible to put the download for this software in your download list to make it a bit more self-contained, but it's a heavy package, near a gig, and they're hosting it on Amazon (= bandwidth costs per download)... I figured it is more like CUDA in terms of installation cost, in that it should just be installed out of band prior to the benchmark run. What do you think?

Hmm, okay. I didn't think it took too incredibly long to build ArrayFire. Maybe only build it in the test profile if it's not already found on the system? Just trying to think how to make it easier to set up and deploy.

Originally posted by nevion

Do you have interest in incorporating this testing into your repertoire - are there some tests you'd like to see added or removed? Any other problems? I'd like to extend or contract it where you think that could be useful; and potentially add more. For instance, I am going to add int{8,16,32,64} benchmarks for the majority of these.

Sure, I would be interested in promoting it to the official test repository. I'm currently running all 168 tests on the CPU, with many of them failing. Is it known that a number of them fail, at least when running on the CPU?

After doing that, I will likely try out a CUDA build, etc. Were there any other improvements you planned to make to the test profile? Unfortunately I'm not familiar enough with ArrayFire to know if there are any big pieces missing, etc.

**nevion** · 12 January 2017, 12:30 PM

The problem I had in building arrayfire was making sure it pointed to the right opencl and that, against ROCm, things would execute at all - ROCm's last release presented some mixed runtime vs driver versioning right now that trips it up (and was a bad call IMO) - one more argument for prebuilt binaries, for now. We can do a local build or fetch-install if it's not on the system though - keeping in consideration the OpenCL caveat just mentioned.

With the max problem size I am running at, those CPU jobs will not finish in a timely manner (many would take a very, very long time). I mentioned previously that I'm only allowing so many seconds for the jobs to compute, then they are killed - this is regardless of platform (CPU, CUDA, or OpenCL). When a test times out, it will fail. It's a tough problem with no right answer - if you change the problem sizes to fit the CPUs, it opens up other cans of worms or mixes and matches results. We can increase the threshold, but the behavior is always going to be there. It also deals with hung or near-hung GPUs (where a bug or performance issue is present, which I am experiencing now also and reported to ROCm upstream).
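As an illustration of the kill-on-timeout behavior described here, a minimal Python sketch follows. The actual harness in arrayfire-benchmark is not shown in this thread, so the function name `run_with_timeout` and the "pass"/"fail"/"timeout" labels are assumptions made up for the example.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], timeout_seconds: float) -> str:
    """Run one benchmark command and classify the outcome.

    Hypothetical helper: returns "pass", "fail", or "timeout".
    """
    try:
        completed = subprocess.run(
            command, timeout=timeout_seconds, capture_output=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising, so a
        # hung or very slow job cannot block the remaining tests.
        return "timeout"
    return "pass" if completed.returncode == 0 else "fail"


if __name__ == "__main__":
    # Stand-in for a slow benchmark: sleeps longer than the allowed time.
    slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_job, timeout_seconds=0.5))
```

Raising the threshold only moves the cutoff; any job slower than the limit is still reported as a failure, which matches the CPU results discussed above.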

I don't know of an in-arrayfire profile change I'd make atm beyond datatype extension, I'll have to look it over - but it's already a good haul.

**Michael** · 12 January 2017, 12:35 PM

Originally posted by nevion

The problem I had in building arrayfire was making sure it pointed to the right opencl and that, against ROCm, things would execute at all - ROCm's last release presented some mixed runtime vs driver versioning right now that trips it up (and was a bad call IMO) - one more argument for prebuilt binaries, for now. We can do a local build or fetch-install if it's not on the system though - keeping in consideration the OpenCL caveat just mentioned.

With the max problem size I am running at, those CPU jobs will not finish in a timely manner (many would take a very, very long time). I mentioned previously that I'm only allowing so many seconds for the jobs to compute, then they are killed - this is regardless of platform (CPU, CUDA, or OpenCL). When a test times out, it will fail. It's a tough problem with no right answer - if you change the problem sizes to fit the CPUs, it opens up other cans of worms or mixes and matches results. We can increase the threshold, but the behavior is always going to be there. It also deals with hung or near-hung GPUs (where a bug or performance issue is present, which I am experiencing now also and reported to ROCm upstream).

I don't know of an in-arrayfire profile change I'd make atm beyond datatype extension, I'll have to look it over - but it's already a good haul.

I may add that, then: if arrayfire isn't detected on the system, go ahead and do a stock build, trying to guess sane defaults.
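A rough Python sketch of that "use the system copy if present, otherwise build" decision is below. The library file names and search directories are assumptions for illustration, not something confirmed in this thread, and the test profile's real install script may do this differently.

```python
from pathlib import Path

# Assumed locations and file names; adjust for the actual install layout.
DEFAULT_LIB_DIRS = ("/usr/local/lib", "/usr/local/lib64", "/usr/lib", "/usr/lib64")
DEFAULT_LIB_NAMES = ("libaf.so", "libafcpu.so", "libafopencl.so", "libafcuda.so")


def find_arrayfire(lib_dirs=DEFAULT_LIB_DIRS, lib_names=DEFAULT_LIB_NAMES):
    """Return the first matching library path, or None if nothing is found."""
    for directory in lib_dirs:
        for name in lib_names:
            candidate = Path(directory) / name
            if candidate.exists():
                return candidate
    return None


def choose_install_action(found):
    """Hypothetical policy: reuse a system copy, else fall back to a build."""
    return "use-system" if found is not None else "build-from-source"


if __name__ == "__main__":
    found = find_arrayfire()
    print(found, choose_install_action(found))
```

A file check like this says nothing about which OpenCL runtime the library was linked against, so the caveat raised earlier in the thread still applies.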

Okay, I'll let this CPU run continue for an hour or two (since I'm busy with other work at the moment anyhow), and if I don't hear from you with other changes, I will then go through and ensure everything is tidy and working fine with the test profile before pushing it to OpenBenchmarking.org!

One more thing: right now arrayfire-benchmark is just cloned from Git. Is it sane enough that I can package up a Git snapshot and host it on Phoronixtestsuite.com for it to download? That way we don't have to worry about any Git changes there breaking the test profile, etc.

**nevion** · 12 January 2017, 02:03 PM

Michael I'll update the scripts tonight to clone tagged versions, as well as do a local install of prebuilt arrayfire when it's not on the system already. At that point you can put it upstream, though I want to make sure that I can get the datatype-extended variant tests in on short notice after that - otherwise I'd prefer to wait another week or two.

**Michael** · 13 January 2017, 07:36 AM

Originally posted by nevion

Michael I'll update the scripts tonight to clone tagged versions, as well as do a local install of prebuilt arrayfire when it's not on the system already. At that point you can put it upstream, though I want to make sure that I can get the datatype-extended variant tests in on short notice after that - otherwise I'd prefer to wait another week or two.

Just checking if those updates have been pushed yet?

**nevion** · 13 January 2017, 11:33 AM

Michael - nope, I'm trying to get those datatype extensions in first. I'll ping you again when I'm ready, but hopefully it'll be tonight.

**nevion** · 16 January 2017, 05:52 AM

Michael - ok, I had some issues with ROCm that slowed me down, but I pushed through and mainly got those datatype extensions in. I also switched to local, from-git, no-GL installations of arrayfire and created pts branches for arrayfire and arrayfire-benchmark, which will serve as mainline for bugfixes and, for the moment, some fixes for arrayfire on ROCm to work _now_. Still trying to fix CUDA builds, but I'm getting bogged down fixing it on an openSUSE box, and it's a bit harder to satisfy the build deps of arrayfire (subtle cmake bugs). It probably works on Ubuntu with CUDA, not that I can test.

I'll probably fix any remaining issues I'm having on SUSE but it's working pretty well now. I submitted a pull request for a couple of package deps on Ubuntu/SUSE too, for arrayfire local builds. See if things work fine for you on a clean install, if you can - perhaps it's good enough to be merged now - fixing SUSE+CUDA is taking like 20 minutes every test change...

**Michael** · 16 January 2017, 07:01 AM

Originally posted by nevion

Michael - ok, I had some issues with ROCm that slowed me down, but I pushed through and mainly got those datatype extensions in. I also switched to local, from-git, no-GL installations of arrayfire and created pts branches for arrayfire and arrayfire-benchmark, which will serve as mainline for bugfixes and, for the moment, some fixes for arrayfire on ROCm to work _now_. Still trying to fix CUDA builds, but I'm getting bogged down fixing it on an openSUSE box, and it's a bit harder to satisfy the build deps of arrayfire (subtle cmake bugs). It probably works on Ubuntu with CUDA, not that I can test.

I'll probably fix any remaining issues I'm having on SUSE but it's working pretty well now. I submitted a pull request for a couple of package deps on Ubuntu/SUSE too, for arrayfire local builds. See if things work fine for you on a clean install, if you can - perhaps it's good enough to be merged now - fixing SUSE+CUDA is taking like 20 minutes every test change...

Great, thanks. Trying it out today.

where do I find git sources for a test?
