Measuring traffic on the PCI Express Bus (PCIe)

During my talk at the parallel 2015 conference, I was asked how to measure traffic on the PCI Express bus. For multi-GPU computing it is very important to control the amount of data exchanged over the PCIe bus.

You need the Intel Performance Counter Monitor. Compile it and copy pcm-pcie.exe into a new directory.

Then read this helpful article on how to obtain the missing WinRing0 DLLs and drivers. Copy them into the same directory, start cmd.exe as an administrator, and you are ready to go.
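If the tool refuses to start, it may help to check that everything really is in one directory. The following is only a small sketch in Python: the helper `missing_files` is hypothetical, and the file names are taken from the text and from the tool's log output, so the WinRing0 files in your download may be named differently.

```python
import os

# File names taken from the text and from the tool's log output.
# Assumption: your WinRing0 download may use different names.
REQUIRED_FILES = ("pcm-pcie.exe", "winring0.dll", "winring0.sys")


def missing_files(directory, required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files in `directory`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files found.")
```

Run it from the directory that contains pcm-pcie.exe; an empty result means all three files are in place.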

Now you can analyse the traffic on the PCIe bus.

C:\IntelPerformanceCounterMonitorV2.8\bin> pcm-pcie.exe
DEBUG: Setting Ctrl+C done.

Intel(r) Performance Counter Monitor: PCIe Bandwidth Monitoring Utility

Copyright (c) 2013-2014 Intel Corporation
This utility measures PCIe bandwidth in real-time

PCIe event definitions (each event counts as a transfer):
PCIe read events (PCI devices reading from memory - application writes to disk/network/PCIe device):
PCIePRd   - PCIe UC read transfer (partial cache line)
PCIeRdCur* - PCIe read current transfer (full cache line)
On Haswell Server PCIeRdCur counts both full/partial cache lines
RFO*      - Demand Data RFO
CRd*      - Demand Code Read
DRd       - Demand Data Read
PCIeNSWr  - PCIe Non-snoop write transfer (partial cache line)
PRd       - MMIO Read [Haswell Server only: PL verify this on IVT] (Partial Cache Line)
PCIe write events (PCI devices writing to memory - application reads from disk/network/PCIe device):
PCIeWiLF  - PCIe Write transfer (non-allocating) (full cache line)
PCIeItoM  - PCIe Write transfer (allocating) (full cache line)
PCIeNSWr  - PCIe Non-snoop write transfer (partial cache line)
PCIeNSWrF - PCIe Non-snoop write transfer (full cache line)
ItoM      - PCIe write full cache line
RFO       - PCIe parial Write
WiL       - MMIO Write (Full/Partial)

* - NOTE: Depending on the configuration of your BIOS, this tool may report '0' if the message
has not been selected.

Starting MSR service failed with error 2 The system cannot find the file specified.
Trying to load winring0.dll/winring0.sys driver...
Using winring0.dll/winring0.sys driver.

Number of physical cores: 6
Number of logical cores: 12
Number of online logical cores: 12
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 6
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3500000000 Hz
Package thermal spec power: 140 Watt; Package minimum power: 47 Watt; Package maximum power: 0 Watt;
2 memory controllers detected with total number of 5 channels.

Detected Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz "Intel(r) microarchitecture codename Haswell-EP/EN/EX"
Update every 1 seconds
delay_ms: 84
Skt | PCIeRdCur |  RFO  |  CRd  |  DRd  |  ItoM  |  PRd  |  WiL
0    4236         576    5980 K  1536 K     48    3456    7116
*    4236         576    5980 K  1536 K     48    3456    7116
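The table lists event counts per update interval, not bytes. Since the tool states that each event counts as one transfer, a rough byte estimate can be derived by multiplying each count by the cache line size. The Python sketch below does this for the row shown above. It rests on a few assumptions: a cache line is 64 bytes, the "K" suffix means a factor of 1000, and partial cache line events are counted as full lines, so the result is an upper bound. The function names `parse_row` and `estimate_bytes` are made up for this example.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines
# Assumption: the tool prints decimal unit suffixes (K = 1000, M = 1000000).
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_row(header, row):
    """Parse one table row into (socket, {event name: event count})."""
    names = [n.strip() for n in header.split("|")][1:]  # drop the "Skt" column
    socket, *tokens = row.split()
    counts = []
    for tok in tokens:
        if tok in SUFFIXES:
            counts[-1] *= SUFFIXES[tok]  # suffix scales the preceding number
        else:
            counts.append(int(tok))
    if len(counts) != len(names):
        raise ValueError("row does not match header")
    return socket, dict(zip(names, counts))


def estimate_bytes(counts):
    """Upper bound in bytes: treat every event as a full cache line transfer."""
    return {name: n * CACHE_LINE_BYTES for name, n in counts.items()}


if __name__ == "__main__":
    header = "Skt | PCIeRdCur |  RFO  |  CRd  |  DRd  |  ItoM  |  PRd  |  WiL"
    row = "0    4236         576    5980 K  1536 K     48    3456    7116"
    socket, counts = parse_row(header, row)
    for name, nbytes in estimate_bytes(counts).items():
        print(f"socket {socket} {name:10s} <= {nbytes / 1e6:10.3f} MB per interval")
```

With the one second update interval shown in the log, the per-interval values are roughly per-second rates. Under these assumptions the PCIeRdCur count of 4236 corresponds to at most about 0.27 MB.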

Remark: This post was adapted to the new blog format in November 2016.