commit	966912e639652a3f227ccd8174b824379f8892b2	[log] [tgz]
author	Sadaf Ebrahimi <[email protected]>	Fri Jan 17 08:19:57 2025 -0800
committer	Automerger Merge Worker <[email protected]>	Fri Jan 17 08:19:57 2025 -0800
tree	2169851995877c2ca10427b5008b7aab2cdb43a9
parent	8a5fd62fb0500d1408859f7b596ebb7e4e11ac99 [diff]
parent	6b1f0df7564941fa67ceba937b7f5fcb36996599 [diff]

tree: 2169851995877c2ca10427b5008b7aab2cdb43a9

README.md

clpeak

A synthetic benchmarking tool to measure peak capabilities of opencl devices. It only measures the peak metrics that can be achieved using vector operations and does not represent a real-world use case

Building

git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .

Sample

Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us