| commit | 966912e639652a3f227ccd8174b824379f8892b2 | |
|---|---|---|
| author | Sadaf Ebrahimi <[email protected]> | Fri Jan 17 08:19:57 2025 -0800 |
| committer | Automerger Merge Worker <[email protected]> | Fri Jan 17 08:19:57 2025 -0800 |
| tree | 2169851995877c2ca10427b5008b7aab2cdb43a9 | |
| parent | 8a5fd62fb0500d1408859f7b596ebb7e4e11ac99 | |
| parent | 6b1f0df7564941fa67ceba937b7f5fcb36996599 | |

Upgrade clpeak to 1.1.4 am: 6b1f0df756

Original change: https://android-review.googlesource.com/c/platform/external/clpeak/+/3454911

Change-Id: I23e7082610d0502054e949e2d9ff10d0cb4b415e
Signed-off-by: Automerger Merge Worker <[email protected]>

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.

    git submodule update --init --recursive --remote
    mkdir build
    cd build
    cmake ..
    cmake --build .


    Platform: NVIDIA CUDA
      Device: Tesla V100-SXM2-16GB
        Driver version  : 390.77 (Linux x64)
        Compute units   : 80
        Clock frequency : 1530 MHz

        Global memory bandwidth (GBPS)
          float   : 767.48
          float2  : 810.81
          float4  : 843.06
          float8  : 726.12
          float16 : 735.98

        Single-precision compute (GFLOPS)
          float   : 15680.96
          float2  : 15674.50
          float4  : 15645.58
          float8  : 15583.27
          float16 : 15466.50

        No half precision support! Skipped

        Double-precision compute (GFLOPS)
          double   : 7859.49
          double2  : 7849.96
          double4  : 7832.96
          double8  : 7799.82
          double16 : 7740.88

        Integer compute (GIOPS)
          int   : 15653.47
          int2  : 15654.40
          int4  : 15655.21
          int8  : 15659.04
          int16 : 15608.65

        Transfer bandwidth (GBPS)
          enqueueWriteBuffer         : 10.64
          enqueueReadBuffer          : 11.92
          enqueueMapBuffer(for read) : 9.97
            memcpy from mapped ptr   : 8.62
          enqueueUnmap(after write)  : 11.04
            memcpy to mapped ptr     : 9.16

        Kernel launch latency : 7.22 us
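
To compare runs across devices, the text report can be turned into a dictionary. The sketch below is one possible way to do it, not part of clpeak itself: the `parse_clpeak` function, the `SAMPLE` excerpt, and the two regular expressions are hypothetical names introduced here. It assumes the report has one metric per line, that section headers end in a parenthesised unit such as `(GBPS)`, and that a value followed by its own unit (for example `7.22 us`) is a standalone metric rather than a member of the preceding section.

```python
import re

# Excerpt of the sample output, one metric per line (assumed layout).
SAMPLE = """\
Global memory bandwidth (GBPS)
  float   : 767.48
  float4  : 843.06
Single-precision compute (GFLOPS)
  float   : 15680.96
No half precision support! Skipped
Transfer bandwidth (GBPS)
  enqueueWriteBuffer         : 10.64
  enqueueMapBuffer(for read) : 9.97
Kernel launch latency : 7.22 us
"""

# A section header has no colon and ends with a unit in parentheses.
SECTION = re.compile(r"^[^:]+\(\w+\)$")
# A metric is "name : number", optionally followed by a single unit token.
METRIC = re.compile(
    r"^(?P<key>.+?)\s*:\s*(?P<num>\d+(?:\.\d+)?)(?:\s+(?P<unit>\S+))?$"
)


def parse_clpeak(text):
    """Parse clpeak-style text into {section: {metric: value}} plus standalone metrics."""
    results = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if SECTION.match(line):
            section = line
            results[section] = {}
            continue
        m = METRIC.match(line)
        if m is None:
            continue  # notices such as "No half precision support!" are skipped
        value = float(m.group("num"))
        if m.group("unit") is None and section is not None:
            # Bare number: belongs to the current section.
            results[section][m.group("key")] = value
        else:
            # Value carries its own unit: treat it as a standalone metric.
            results[m.group("key")] = value
    return results


results = parse_clpeak(SAMPLE)
```

After this runs, `results["Global memory bandwidth (GBPS)"]` holds the per-vector-width bandwidth figures and `results["Kernel launch latency"]` holds the latency in microseconds. The unit-based rule for standalone metrics is a heuristic chosen for this sample and may need adjusting for other report layouts.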