| commit | 1cd03ba3888297bc945f2c84574e105e3ced3e34 | |
|---|---|---|
| author | Jeremy Kemp <[email protected]> | Wed Jun 26 15:15:43 2024 +0100 |
| committer | Jeremy Kemp <[email protected]> | Wed Jun 26 15:15:43 2024 +0100 |
| tree | 2a3ce03a466b1eae1410de88886772f3e5e8a81b | |
| parent | 98f5b2645e18fc90ab9b79869c4abd0bfd125d83 | |

Explicitly link against the OpenCL ICD

Bug: 349574403
Test: m clpeak
Change-Id: Ib8b92327369e48b6eedff1779318aa1f27355cd1

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.
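
As a rough illustration of what a peak compute figure expresses, a throughput number can be derived from an operation count and an elapsed time. The sketch below is not taken from the tool's source: the function name `gflops`, its parameters, and the sample numbers are all hypothetical.

```python
def gflops(work_items: int, flops_per_item: int, elapsed_seconds: float) -> float:
    """Return throughput in GFLOPS for one timed kernel run (hypothetical helper)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Total floating-point operations performed across all work-items.
    total_flops = work_items * flops_per_item
    # Operations per second, scaled to giga (1e9).
    return total_flops / elapsed_seconds / 1e9


# Hypothetical run: 1,048,576 work-items, 4096 FLOPs each, finishing in 2 ms.
print(f"{gflops(1_048_576, 4096, 2e-3):.2f} GFLOPS")
```

A real benchmark would typically repeat the kernel several times and use device-side timing, so that launch overhead does not distort the result.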

    git submodule update --init --recursive --remote
    mkdir build
    cd build
    cmake ..
    cmake --build .

    Platform: NVIDIA CUDA
      Device: Tesla V100-SXM2-16GB
        Driver version  : 390.77 (Linux x64)
        Compute units   : 80
        Clock frequency : 1530 MHz

        Global memory bandwidth (GBPS)
          float   : 767.48
          float2  : 810.81
          float4  : 843.06
          float8  : 726.12
          float16 : 735.98

        Single-precision compute (GFLOPS)
          float   : 15680.96
          float2  : 15674.50
          float4  : 15645.58
          float8  : 15583.27
          float16 : 15466.50

        No half precision support! Skipped

        Double-precision compute (GFLOPS)
          double   : 7859.49
          double2  : 7849.96
          double4  : 7832.96
          double8  : 7799.82
          double16 : 7740.88

        Integer compute (GIOPS)
          int   : 15653.47
          int2  : 15654.40
          int4  : 15655.21
          int8  : 15659.04
          int16 : 15608.65

        Transfer bandwidth (GBPS)
          enqueueWriteBuffer         : 10.64
          enqueueReadBuffer          : 11.92
          enqueueMapBuffer(for read) : 9.97
            memcpy from mapped ptr   : 8.62
          enqueueUnmap(after write)  : 11.04
            memcpy to mapped ptr     : 9.16

        Kernel launch latency : 7.22 us
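
For comparing runs across devices, the text report can be turned into a dictionary. The following is a minimal sketch that assumes the layout shown in the sample: a section title ending in a unit in parentheses, followed by `name : value` lines. The names `parse_clpeak`, `HEADER`, `ENTRY`, and `SAMPLE` are hypothetical and not part of the tool, and `SAMPLE` holds only a short excerpt of the values above.

```python
import re

# Short excerpt of the sample report, embedded so the sketch is self-contained.
SAMPLE = """\
Single-precision compute (GFLOPS)
  float   : 15680.96
  float2  : 15674.50
No half precision support! Skipped
Double-precision compute (GFLOPS)
  double  : 7859.49
  double2 : 7849.96
"""

# A section title ends with a unit in parentheses, e.g. "(GFLOPS)".
HEADER = re.compile(r"^(.*\S)\s*\((\w+)\)$")
# A result line is "name : number" with nothing after the number.
ENTRY = re.compile(r"^(.+?)\s*:\s*(\d+(?:\.\d+)?)$")


def parse_clpeak(text):
    """Map each section title to a dict of {variant: value}."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        entry = ENTRY.match(line)
        if entry and current is not None:
            results[current][entry.group(1)] = float(entry.group(2))
        elif ":" not in line and HEADER.match(line):
            current = line
            results[current] = {}
        # Anything else (device info, "Skipped" notices) is ignored.
    return results


for section, values in parse_clpeak(SAMPLE).items():
    best = max(values, key=values.get)
    print(f"{section}: best variant {best} at {values[best]}")
```

Lines with trailing units or free text, such as the driver version or the kernel launch latency, are deliberately skipped here; handling them would need extra patterns.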