commit	1cd03ba3888297bc945f2c84574e105e3ced3e34
author	Jeremy Kemp <[email protected]>	Wed Jun 26 15:15:43 2024 +0100
committer	Jeremy Kemp <[email protected]>	Wed Jun 26 15:15:43 2024 +0100
tree	2a3ce03a466b1eae1410de88886772f3e5e8a81b
parent	98f5b2645e18fc90ab9b79869c4abd0bfd125d83

README.md

clpeak

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.

Building

git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .

Sample

Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
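
One way to sanity-check the compute numbers above is to compare them with a rough theoretical peak derived from the reported compute units and clock frequency. The sketch below is illustrative only and is not part of clpeak. It assumes, beyond what the sample output states, that each compute unit on this device has 64 single-precision and 32 double-precision lanes, and that each lane retires one fused multiply-add (2 FLOPs) per clock. The function name `theoretical_gflops` is hypothetical.

```python
# Rough theoretical peak for the sample device, for comparison with the
# measured GFLOPS in the sample output.
# ASSUMPTIONS (not stated in the README): 64 FP32 lanes and 32 FP64 lanes
# per compute unit, and 2 FLOPs (one fused multiply-add) per lane per clock.


def theoretical_gflops(compute_units, lanes_per_unit, clock_mhz,
                       flops_per_lane_per_clock=2):
    """Return peak GFLOPS = units * lanes * FLOPs per clock * clock in GHz."""
    return (compute_units * lanes_per_unit * flops_per_lane_per_clock
            * clock_mhz / 1000.0)


COMPUTE_UNITS = 80   # "Compute units" in the sample output
CLOCK_MHZ = 1530     # "Clock frequency" in the sample output

peak_sp = theoretical_gflops(COMPUTE_UNITS, 64, CLOCK_MHZ)  # assumed 64 lanes
peak_dp = theoretical_gflops(COMPUTE_UNITS, 32, CLOCK_MHZ)  # assumed 32 lanes

measured_sp = 15680.96  # "float" row, single-precision compute
measured_dp = 7859.49   # "double" row, double-precision compute

print(f"SP: theoretical {peak_sp:.1f} GFLOPS, "
      f"measured/theoretical = {measured_sp / peak_sp:.3f}")
print(f"DP: theoretical {peak_dp:.1f} GFLOPS, "
      f"measured/theoretical = {measured_dp / peak_dp:.3f}")
```

Under these assumptions the measured values land within about one percent of the estimate, which fits the tool's stated goal of measuring peak rather than real-world throughput. A ratio slightly above 1 would most likely mean the reported clock differs from the clock sustained during the run, or reflect timing noise.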