nvprof --query-events

This article covers nvprof --query-events and the list of events it reports for a device.


Available Events:
Name Description
Device 0 (GeForce GTX 970M):


Domain domain_a:

elapsed_cycles_sm: Elapsed clocks



Domain domain_b:
fb_subp0_read_sectors: Number of DRAM read requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_read_sectors: Number of DRAM read requests to sub partition 1, increments by 1 for 32 byte access.
fb_subp0_write_sectors: Number of DRAM write requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_write_sectors: Number of DRAM write requests to sub partition 1, increments by 1 for 32 byte access.
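
Since each of the four sector events above increments once per 32-byte access, summing the two sub-partitions and multiplying by 32 gives an estimate of DRAM traffic in bytes. The following is a minimal sketch of that arithmetic; the counter values in `sample` are made up for illustration, and the helper name `dram_bytes` is hypothetical rather than part of nvprof.

```python
SECTOR_BYTES = 32  # per the event descriptions: one increment per 32-byte access


def dram_bytes(counters):
    """Return (read_bytes, write_bytes) summed over both DRAM sub-partitions."""
    reads = counters["fb_subp0_read_sectors"] + counters["fb_subp1_read_sectors"]
    writes = counters["fb_subp0_write_sectors"] + counters["fb_subp1_write_sectors"]
    return reads * SECTOR_BYTES, writes * SECTOR_BYTES


# Illustrative values only, not measured on any real device.
sample = {
    "fb_subp0_read_sectors": 1000,
    "fb_subp1_read_sectors": 1200,
    "fb_subp0_write_sectors": 300,
    "fb_subp1_write_sectors": 340,
}

read_bytes, write_bytes = dram_bytes(sample)
print(f"DRAM read: {read_bytes} bytes, DRAM write: {write_bytes} bytes")
```

Dividing these byte counts by the kernel's elapsed time would give an approximate bandwidth figure, provided the counters and the timing cover the same kernel launch.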



Domain domain_c:
gld_inst_8bit: Total number of 8-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_16bit: Total number of 16-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_32bit: Total number of 32-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_64bit: Total number of 64-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_128bit: Total number of 128-bit global load instructions that are executed by all the threads across all thread blocks.
gst_inst_8bit: Total number of 8-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_16bit: Total number of 16-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_32bit: Total number of 32-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_64bit: Total number of 64-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_128bit: Total number of 128-bit global store instructions that are executed by all the threads across all thread blocks.
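
The gld_inst_* and gst_inst_* events above are counted per thread and per access width, so a rough figure for the bytes requested by global loads or stores can be formed by weighting each count by its width. The sketch below assumes every counted instruction moves exactly its stated width; the sample values and the helper name `requested_bytes` are hypothetical. The result is the amount requested by threads, which may differ from actual DRAM traffic because of caching and coalescing.

```python
WIDTHS_BITS = (8, 16, 32, 64, 128)  # access widths listed in domain_c


def requested_bytes(counters, prefix):
    """Sum count * width-in-bytes over all widths for 'gld' or 'gst' events.

    Events missing from `counters` are treated as zero.
    """
    return sum(
        counters.get(f"{prefix}_inst_{width}bit", 0) * (width // 8)
        for width in WIDTHS_BITS
    )


# Illustrative values only, not measured on any real device.
sample = {
    "gld_inst_32bit": 4096,
    "gld_inst_64bit": 1024,
    "gst_inst_32bit": 2048,
}

load_bytes = requested_bytes(sample, "gld")
store_bytes = requested_bytes(sample, "gst")
print(f"requested by loads: {load_bytes} bytes, by stores: {store_bytes} bytes")
```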



Domain domain_d:
warps_launched: Number of warps launched.
inst_issued0: Number of cycles that did not issue any instruction, increments per warp.
inst_issued1: Number of cycles that issued single instruction, increments per warp.
inst_issued2: Number of cycles that issued dual instructions, increments per warp.
inst_executed: Number of instructions executed per warp.
local_store: Number of executed store instructions where state space is specified as local, increments per warp on a multiprocessor.
local_load: Number of executed load instructions where state space is specified as local, increments per warp on a multiprocessor.
shared_load: Number of executed load instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_store: Number of executed store instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_atom_cas: Number of ATOMS.CAS instructions executed per warp.
shared_atom: Number of ATOMS instructions executed per warp.
global_atom_cas: Number of ATOM.CAS instructions executed per warp.
atom_count: Number of ATOM instructions executed per warp.
global_load: Number of executed load instructions where state space is specified as global, increments per warp on a multiprocessor.
global_store: Number of executed store instructions where state space is specified as global, increments per warp on a multiprocessor.
gred_count: Number of reduction operations performed per warp.
branch: Number of branch instructions executed per warp on a multiprocessor.
active_cycles: Number of cycles a multiprocessor has at least one active warp.
sm_cta_launched: Number of blocks launched
shared_ld_bank_conflict: Number of shared load bank conflict generated when the addresses for two or more shared memory load requests fall in the same memory bank.
shared_st_bank_conflict: Number of shared store bank conflict generated when the addresses for two or more shared memory store requests fall in the same memory bank.
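
Event names from the listing can be passed to nvprof through its --events option as a comma-separated list. The sketch below only assembles such a command line and prints it; the application path `./my_app` and the helper name `build_nvprof_command` are placeholders for illustration, and nothing is executed.

```python
import shlex


def build_nvprof_command(events, app_argv):
    """Return an argv list asking nvprof to collect the given events."""
    if not events:
        raise ValueError("at least one event name is required")
    return ["nvprof", "--events", ",".join(events), *app_argv]


# './my_app' is a hypothetical CUDA executable.
cmd = build_nvprof_command(
    ["warps_launched", "inst_executed", "active_cycles"],
    ["./my_app"],
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where nvprof and a supported GPU are available.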



Domain domain_e:




