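Knowledge is the intent of action, and action is the practice of knowledge. This post records the output of nvcc --help as a quick reference, in the hope that it is useful to you.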
C:\Users\panda> nvcc --help
Usage: nvcc [options] <inputfile>
Options for specifying the compilation phase
============================================
More exactly, this option specifies up to which stage the input files must be compiled,
according to the following compilation trajectories for different input file types:
.c/.cc/.cpp/.cxx : preprocess, compile, link
.o: link
.i/.ii: compile, link
.cu: preprocess, cuda frontend, PTX assemble,
merge with host C code, compile, link
.gpu: cicc compile into cubin
.ptx: PTX assemble into cubin.
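The trajectory list above can be read as a lookup table from file suffix to compilation phases. The following is a minimal sketch of that table; the names TRAJECTORIES and phases_for are made up for illustration and are not part of nvcc.

```python
import os

# Phases per input suffix, copied from the trajectory list in the help text.
# TRAJECTORIES and phases_for are hypothetical helper names, not nvcc APIs.
HOST_PHASES = ["preprocess", "compile", "link"]
TRAJECTORIES = {
    ".c": HOST_PHASES,
    ".cc": HOST_PHASES,
    ".cpp": HOST_PHASES,
    ".cxx": HOST_PHASES,
    ".o": ["link"],
    ".i": ["compile", "link"],
    ".ii": ["compile", "link"],
    ".cu": ["preprocess", "cuda frontend", "PTX assemble",
            "merge with host C code", "compile", "link"],
    ".gpu": ["cicc compile into cubin"],
    ".ptx": ["PTX assemble into cubin"],
}


def phases_for(path):
    """Return the listed phases for an input file, or [] for an unlisted suffix."""
    suffix = os.path.splitext(path)[1]
    return list(TRAJECTORIES.get(suffix, []))


if __name__ == "__main__":
    for name in ("main.cpp", "kernel.cu", "kernel.ptx"):
        print(name, "->", ", ".join(phases_for(name)))
```

The phase options that follow (--cuda, --ptx, --compile and so on) stop a file somewhere along its trajectory instead of running it to the end.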
--cuda (-cuda)
Compile all .cu input files to .cu.cpp.ii output.
--cubin (-cubin)
Compile all .cu/.gpu/.ptx input files to device-only .cubin files. This
step discards the host code for each .cu input file.
--fatbin (-fatbin)
Compile all .cu/.gpu/.ptx/.cubin input files to device-only .fatbin files.
This step discards the host code for each .cu input file.
--ptx (-ptx)
Compile all .cu/.gpu input files to device-only .ptx files. This step discards
the host code for each of these input files.
--preprocess (-E)
Preprocess all .c/.cc/.cpp/.cxx/.cu input files.
--generate-dependencies (-M)
Generate a dependency file that can be included in a make file for the .c/.cc/.cpp/.cxx/.cu
input file (more than one is not allowed in this mode).
--compile (-c)
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file.
--device-c (-dc)
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains
relocatable device code. It is equivalent to --relocatable-device-code=true
--compile.
--device-w (-dw)
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains
executable device code. It is equivalent to --relocatable-device-code=false
--compile.
--device-link (-dlink)
Link object files with relocatable device code and .ptx/.cubin/.fatbin files
into an object file with executable device code, which can be passed to the
host linker.
--link (-link)
This option specifies the default behavior: compile and link all inputs.
--lib (-lib)
Compile all inputs into object files (if necessary) and add the results to
the specified output library file.
--run (-run)
This option compiles and links all inputs into an executable, and executes
it. Or, when the input is a single executable, it is executed without any
compilation or linking. This step is intended for developers who do not want
to be bothered with setting the necessary environment variables; these are
set temporarily by nvcc.
File and path specifications.
=============================
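--output-file <file> (-o)
Specify name and location of the output file. Only a single input file is
allowed when this option is present in nvcc non-linking/archiving mode.
--pre-include <file>,... (-include)
Specify header files that must be preincluded during preprocessing.
--library <library>,... (-l)
Specify libraries to be used in the linking stage without the library file
extension. The libraries are searched for on the library search paths that
have been specified using option --library-path.
--define-macro <def>,... (-D)
Specify macro definitions to define for use during preprocessing or compilation.
--undefine-macro <def>,... (-U)
Undefine macro definitions during preprocessing or compilation.
--include-path <path>,... (-I)
Specify include search paths.
--system-include <path>,... (-isystem)
Specify system include search paths.
--library-path <path>,... (-L)
Specify library search paths.
--output-directory <directory> (-odir)
Specify the directory of the output file. This option is intended for letting
the dependency generation step (see --generate-dependencies) generate a
rule that defines the target object file in the proper directory.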
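--compiler-bindir <path> (-ccbin)
Specify the directory in which the host compiler executable resides. The
host compiler executable name can be also specified to ensure that the correct
host compiler is selected. In addition, driver prefix options (--input-drive-prefix,
--dependency-drive-prefix, or --drive-prefix) may need to be specified,
if nvcc is executed in a Cygwin shell or a MinGW shell on Windows.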
--cudart none|shared|static (-cudart)
Specify the type of CUDA runtime library to be used: no CUDA runtime library,
shared/dynamic CUDA runtime library, or static CUDA runtime library.
Allowed values for this option: none, shared, static.
Default value: static.
--libdevice-directory <directory> (-ldir)
Specify the directory that contains the libdevice library files when option
--dont-use-profile is used. Libdevice library files are located in the
nvvm/libdevice directory in the CUDA toolkit.
--cl-version <cl-version-number>
Specify the version of the Microsoft Visual Studio installation. Note: this
option is to be used in conjunction with --use-local-env, and is ignored
when --use-local-env is not specified. [...] been deprecated.
Allowed values for this option: 2010, 2012, 2013, 2015, 2017.
--use-local-env
Specify whether the environment is already set up for the host compiler.
Options for specifying behavior of compiler/linker.
===================================================
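--profile (-pg)
Instrument generated code/executable for use by gprof (Linux only).
--debug (-g)
Generate debug information for host code.
--device-debug (-G)
Generate debug information for device code. Turns off all optimizations.
Don't use for profiling; use -lineinfo instead.
--generate-line-info (-lineinfo)
Generate line-number information for device code.
--optimize <level> (-O)
Specify optimization level for host code.
--ftemplate-backtrace-limit <limit> (-ftemplate-backtrace-limit)
Set the maximum number of template instantiation notes for a single warning
or error to <limit>. A value of 0 is allowed, and indicates that no limit
should be enforced. This value is also passed to the host compiler if it
provides an equivalent flag.
--ftemplate-depth <limit> (-ftemplate-depth)
Set the maximum instantiation depth for template classes to <limit>. This
value is also passed to the host compiler if it provides an equivalent flag.
--shared (-shared)
Generate a shared library during linking. Use option --linker-options
when other linker options are required for more control.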
--x c|c++|cu (-x)
Explicitly specify the language for the input files, rather than letting
the compiler choose a default based on the file name suffix.
Allowed values for this option: c, c++, cu.
--std c++03|c++11|c++14 (-std)
Select a particular C++ dialect. Note that this flag also turns on the corresponding
dialect flag for the host compiler.
Allowed values for this option: c++03, c++11, c++14.
--no-host-device-initializer-list (-nohdinitlist)
Do not implicitly consider member functions of std::initializer_list as __host__
__device__ functions.
--no-host-device-move-forward (-nohdmoveforward)
Do not implicitly consider std::move and std::forward as __host__ __device__
function templates.
--expt-relaxed-constexpr (-expt-relaxed-constexpr)
Experimental flag: Allow host code to invoke __device__ constexpr functions,
and device code to invoke __host__ constexpr functions.
--expt-extended-lambda (-expt-extended-lambda)
Experimental flag: Allow __host__, __device__ annotations in lambda declaration.
--machine 32|64 (-m)
Specify 32 vs 64 bit architecture.
Allowed values for this option: 32, 64.
Default value: 64.
Options for passing specific phase options
==========================================
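These allow for passing options directly to the intended compilation phase. Using
these, users have the ability to pass options to the lower level compilation tools,
without the need for nvcc to know about each and every such option.
--compiler-options <options>,... (-Xcompiler)
Specify options directly to the compiler/preprocessor.
--linker-options <options>,... (-Xlinker)
Specify options directly to the host linker.
--archive-options <options>,... (-Xarchive)
Specify options directly to library manager.
--ptxas-options <options>,... (-Xptxas)
Specify options directly to ptxas, the PTX optimizing assembler.
--nvlink-options <options>,... (-Xnvlink)
Specify options directly to nvlink.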
Miscellaneous options for guiding the compiler driver.
======================================================
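--dont-use-profile (-noprof)
Nvcc uses the nvcc.profiles file for compilation. When specifying this option,
the profile file is not used.
--dryrun (-dryrun)
Do not execute the compilation commands generated by nvcc. Instead, list
them.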
--verbose (-v)
List the compilation commands generated by this compiler driver, but do not
suppress their execution.
--keep (-keep)
Keep all intermediate files that are generated during internal compilation
steps.
--keep-dir <directory> (-keep-dir)
Keep all intermediate files that are generated during internal compilation
steps in this directory.
--save-temps (-save-temps)
This option is an alias of --keep.
--clean-targets (-clean)
This option reverses the behavior of nvcc. When specified, none of the compilation
phases will be executed. Instead, all of the non-temporary files that nvcc
would otherwise create will be deleted.
--run-args <arguments>,... (-run-args)
Used in combination with option --run to specify command line arguments for
the executable.
--input-drive-prefix <prefix> (-idp)
On Windows, all command line arguments that refer to file names must be converted
to the Windows native format before they are passed to pure Windows executables.
This option specifies how the current development environment represents
absolute paths. Use /cygwin/ as <prefix> for Cygwin build environments,
and / as <prefix> for MinGW.
--dependency-drive-prefix <prefix> (-ddp)
On Windows, when generating dependency files (see --generate-dependencies),
all file names must be converted appropriately for the instance of make
that is used. Some instances of make have trouble with the colon in absolute
paths in the native Windows format, which depends on the environment in which
the make instance has been compiled. Use /cygwin/ as <prefix> for a Cygwin
make, and / as <prefix> for MinGW. Or leave these file names in
the native Windows format by specifying nothing.
--drive-prefix <prefix> (-dp)
Specifies <prefix> as both --input-drive-prefix and --dependency-drive-prefix.
--dependency-target-name <target> (-MT)
Specify the target name of the generated rule when generating a dependency
file (see --generate-dependencies).
--no-align-double
Specifies that -malign-double should not be passed as a compiler argument
on 32-bit platforms. WARNING: this makes the ABI incompatible with the CUDA
kernel ABI for certain 64-bit types.
--no-device-link (-nodlink)
Skip the device link step when linking object files.
Options for steering GPU code generation.
=========================================
--gpu-architecture <arch> (-arch)
Specify the name of the class of NVIDIA virtual GPU architecture for which
the CUDA input files must be compiled.
With the exception as described for the shorthand below, the architecture
specified with this option must be a virtual architecture (such as compute_50).
Normally, this option alone does not trigger assembly of the generated PTX
for a real architecture (that is the role of nvcc option --gpu-code,
see below); rather, its purpose is to control preprocessing and compilation
of the input to PTX.
For convenience, in case of simple nvcc compilations, the following shorthand
is supported. If no value for option --gpu-code is specified, then the value
of this option defaults to the value of --gpu-architecture. In this
situation, as only exception to the description above, the value specified
for --gpu-architecture may be a real architecture (such as sm_50),
in which case nvcc uses the specified real architecture and its closest
virtual architecture as effective architecture values. For example, nvcc
--gpu-architecture=sm_50 is equivalent to nvcc --gpu-architecture=compute_50
--gpu-code=sm_50,compute_50.
Allowed values for this option: compute_30,compute_32,compute_35,
compute_37,compute_50,compute_52,compute_53,compute_60,compute_61,
compute_62,compute_70,compute_72,sm_30,sm_32,sm_35,sm_37,sm_50,
sm_52,sm_53,sm_60,sm_61,sm_62,sm_70,sm_72.
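The shorthand described for --gpu-architecture can be sketched as a small expansion function. This is only an illustration: expand_arch_shorthand is a made-up name, and the sketch assumes that the closest virtual architecture of sm_XY is compute_XY, which holds for the sm_50 example in the help text and for every value in the allowed list above.

```python
# Hypothetical helper: shows what a lone --gpu-architecture=sm_XY stands for.
# Assumption: the closest virtual architecture of sm_XY is compute_XY.
def expand_arch_shorthand(arch):
    """Expand a real architecture given to --gpu-architecture into explicit flags."""
    if arch.startswith("sm_"):
        virtual = "compute_" + arch[len("sm_"):]
        return [
            "--gpu-architecture=" + virtual,
            "--gpu-code=" + arch + "," + virtual,
        ]
    # A virtual architecture is not a shorthand; pass it through unchanged.
    return ["--gpu-architecture=" + arch]


if __name__ == "__main__":
    print(" ".join(["nvcc"] + expand_arch_shorthand("sm_50")))
```

The function only builds the argument list; it does not run nvcc.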
--gpu-code <code>,... (-code)
Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
nvcc embeds a compiled code image in the resulting executable for each specified
<code> architecture, which is a true binary load image for each real architecture
(such as sm_50), and PTX code for the virtual architecture (such as compute_50).
During runtime, such embedded PTX code is dynamically compiled by the CUDA
runtime system if no binary load image is found for the current GPU.
Architectures specified for options --gpu-architecture and --gpu-code
may be virtual as well as real, but the <code> architectures must be
compatible with the <arch> architecture. When the --gpu-code option is
used, the value for the --gpu-architecture option must be a virtual PTX
architecture.
For instance, --gpu-architecture=compute_35 is not compatible with --gpu-code=sm_30,
because the earlier compilation stages will assume the availability of compute_35
features that are not present on sm_30.
Allowed values for this option: compute_30,compute_32,compute_35,
compute_37,compute_50,compute_52,compute_53,compute_60,compute_61,
compute_62,compute_70,compute_72,sm_30,sm_32,sm_35,sm_37,sm_50,
sm_52,sm_53,sm_60,sm_61,sm_62,sm_70,sm_72.
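The compatibility rule between --gpu-architecture and --gpu-code can be approximated with a numeric comparison. The sketch below is a deliberate simplification and not nvcc's actual check: it assumes a code target is acceptable when its version number is at least that of the virtual architecture, which reproduces the compute_35 versus sm_30 example from the help text. The names arch_number and is_compatible are hypothetical.

```python
# Simplified, hypothetical compatibility check; nvcc's real rules may be stricter.
def arch_number(name):
    """Extract the numeric part of names such as sm_50 or compute_35."""
    prefix, _, digits = name.partition("_")
    if prefix not in ("sm", "compute") or not digits.isdigit():
        raise ValueError("not an architecture name: " + name)
    return int(digits)


def is_compatible(virtual_arch, code_arch):
    """Return True if code_arch provides at least the features of virtual_arch."""
    if not virtual_arch.startswith("compute_"):
        # When --gpu-code is used, --gpu-architecture must be virtual.
        raise ValueError("--gpu-architecture must be a virtual architecture")
    return arch_number(code_arch) >= arch_number(virtual_arch)


if __name__ == "__main__":
    print(is_compatible("compute_35", "sm_30"))
    print(is_compatible("compute_50", "sm_52"))
```

A check like this is useful before generating build flags, so that an invalid pair is caught in the build script rather than by nvcc.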
--generate-code <specification>,... (-gencode)
This option provides a generalization of the --gpu-architecture=<arch> --gpu-code=<code>,
... option combination for specifying nvcc behavior with respect to code
generation. Where use of the previous options generates code for different
real architectures with the PTX for the same virtual architecture, option
--generate-code allows multiple PTX generations for different virtual
architectures. In fact, --gpu-architecture=<arch> --gpu-code=<code>,
... is equivalent to --generate-code arch=<arch>,code=<code>,....
--generate-code options may be repeated for different virtual architectures.
Allowed keywords for this option: arch, code.
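Since --generate-code may be repeated, a build script often assembles one flag pair per target. The following is a minimal sketch with a made-up helper name, gencode_flags; it emits one --generate-code option per (arch, code) pair so that each option carries a single code value.

```python
# Hypothetical helper that builds repeated --generate-code options.
def gencode_flags(targets):
    """targets: iterable of (virtual_arch, code_arch) pairs, e.g. ("compute_50", "sm_50")."""
    flags = []
    for arch, code in targets:
        flags.append("--generate-code")
        flags.append("arch=" + arch + ",code=" + code)
    return flags


if __name__ == "__main__":
    targets = [
        ("compute_50", "sm_50"),       # binary image for sm_50
        ("compute_60", "sm_60"),       # binary image for sm_60
        ("compute_60", "compute_60"),  # PTX kept for JIT on newer GPUs
    ]
    # The file names are placeholders for illustration.
    print(" ".join(["nvcc"] + gencode_flags(targets) + ["-o", "app", "app.cu"]))
```

Keeping a PTX entry for the newest virtual architecture follows the help text's note that embedded PTX is compiled at runtime when no binary image matches the current GPU.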
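--relocatable-device-code true|false (-rdc)
Enable (disable) the generation of relocatable device code. If disabled,
executable device code is generated. Relocatable device code must be linked
before it can be executed.
Default value: false.
--entries entry,... (-e)
Specify the global entry functions for which code must be generated. By
default, code will be generated for all entry functions.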
--maxrregcount <amount> (-maxrregcount)
Specify the maximum amount of registers that GPU functions can use.
Until a function-specific limit, a higher value will generally increase the
performance of individual GPU threads that execute this function. However,
because thread registers are allocated from a global register pool on each
GPU, a higher value of this option will also reduce the maximum thread block
size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount
value is the result of a trade-off.
If this option is not specified, then no maximum is assumed.
A value less than the minimum registers required by the ABI will be bumped up by
the compiler to the ABI minimum limit.
A user program may not be able to make use of all registers, as some registers
are reserved by the compiler.
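--use_fast_math (-use_fast_math)
Make use of the fast math library. --use_fast_math implies --ftz=true --prec-div=false
--prec-sqrt=false --fmad=true.
--ftz true|false (-ftz)
This option controls single-precision denormals support. --ftz=true flushes
denormal values to zero and --ftz=false preserves denormal values. --use_fast_math
implies --ftz=true.
Default value: false.
--prec-div true|false (-prec-div)
This option controls single-precision floating-point division and reciprocals.
--prec-div=true enables the IEEE round-to-nearest mode and --prec-div=false
enables the fast approximation mode. --use_fast_math implies --prec-div=false.
Default value: true.
--prec-sqrt true|false (-prec-sqrt)
This option controls single-precision floating-point square root. --prec-sqrt=true
enables the IEEE round-to-nearest mode and --prec-sqrt=false enables the
fast approximation mode. --use_fast_math implies --prec-sqrt=false.
Default value: true.
--fmad true|false (-fmad)
This option enables (disables) the contraction of floating-point multiplies
and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
or DFMA). --use_fast_math implies --fmad=true.
Default value: true.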
Options for steering cuda compilation.
======================================
--default-stream legacy|null|per-thread (-default-stream)
Specify the stream that CUDA commands from the compiled program will be sent
to by default.
legacy
The CUDA legacy stream (per context, implicitly synchronizes with
other streams).
per-thread
A normal CUDA stream (per thread, does not implicitly
synchronize with other streams).
null is a deprecated alias for legacy.
Allowed values for this option: legacy, null, per-thread.
Default value: legacy.
Generic tool options.
=====================
--disable-warnings (-w)
Inhibit all warning messages.
--keep-device-functions (-keep-device-functions)
In whole program compilation mode, preserve user defined external linkage
__device__ function definitions up to PTX.
--source-in-ptx (-src-in-ptx)
Interleave source in PTX. May only be used in conjunction with --device-debug
or --generate-line-info.
--restrict (-restrict)
Programmer assertion that all kernel pointer parameters are restrict pointers.
--Wreorder (-Wreorder)
Generate warnings when member initializers are reordered.
--Wno-deprecated-declarations (-Wno-deprecated-declarations)
Suppress warning on use of deprecated entity.
--Wno-deprecated-gpu-targets (-Wno-deprecated-gpu-targets)
Suppress warnings about deprecated GPU target architectures.
--Werror <kind>,... (-Werror)
Make warnings of the specified kinds into errors. The following is the list
of warning kinds accepted by this option:
cross-execution-space-call
Be more strict about unsupported cross execution space calls.
The compiler will generate an error instead of a warning for a
call from a __host__ __device__ to a __host__ function.
reorder
Generate errors when member initializers are reordered.
deprecated-declarations
Generate error on use of a deprecated entity.
Allowed values for this option: cross-execution-space-call, deprecated-declarations,
reorder.
--resource-usage (-res-usage)
Show resource usage such as registers and memory of the GPU code.
This option implies --nvlink-options --verbose when --relocatable-device-code=true
is set. Otherwise, it implies --ptxas-options --verbose.
--help (-h)
Print this help information on this tool.
--version (-V)
Print version information on this tool.
--options-file <file>,... (-optf)
Include command line options from specified file.