C++|CUDA Advanced Programming: Using the Thrust Library - Algorithms and Iterators

Contents

device_ptr
Simple algorithms such as for_each, transform, and copy

Custom functions
Working with the STL

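device_ptr
Thrust provides a pointer type called device_ptr.
Note, however, that it is neither like auto_ptr nor like shared_ptr: it is only a thin wrapper that adds type safety. The device_ptr source code contains no destructor logic that frees memory, so it does not own or release what it points to.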

size_t N = 10;
// device_malloc<int> returns a typed device_ptr<int> to storage for N ints
thrust::device_ptr<int> dev_ptr = thrust::device_malloc<int>(N);

As the code above shows, the memory is allocated explicitly, so it must eventually be released with thrust::device_free.
The raw pointer can also be extracted for direct use:

int * raw_ptr = thrust::raw_pointer_cast(dev_ptr);

Simple algorithms such as for_each, transform, and copy

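#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/functional.h>
#include <thrust/replace.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

int main(void)
{
    // three device vectors of 10000 elements each
    thrust::device_vector<int> X(10000);
    thrust::device_vector<int> Y(10000);
    thrust::device_vector<int> Z(10000);

    // X = 0, 1, 2, 3, ...
    thrust::sequence(X.begin(), X.end());

    // Y = -X
    thrust::transform(X.begin(), X.end(), Y.begin(), thrust::negate<int>());

    // Z = 2, 2, 2, ...
    thrust::fill(Z.begin(), Z.end(), 2);

    // Y = X mod 2
    thrust::transform(X.begin(), X.end(), Z.begin(), Y.begin(), thrust::modulus<int>());

    // replace every 1 in Y with 10
    thrust::replace(Y.begin(), Y.end(), 1, 10);

    return 0;
}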

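The algorithms above all look familiar because the STL has counterparts for each of them. The difference is that here they run in parallel. The Nsight report shows that each call launches kernels with multiple blocks; the launch details for sequence below, for example, show that many threads execute this code.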

[Figure: Nsight launch details for thrust::sequence]

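Custom functions
Thrust currently offers no std::function equivalent that works with its algorithms, so custom operations are typically written as functors (function objects):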

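#include <thrust/device_vector.h>
#include <thrust/transform.h>

// functor computing a * x + y for a fixed scalar a
struct saxpy_functor
{
    const float a;

    saxpy_functor(float _a) : a(_a) {}

    // callable on both host and device
    __host__ __device__
    float operator()(const float& x, const float& y) const
    {
        return a * x + y;
    }
};

// Y <- A * X + Y, computed in a single transform
void saxpy_fast(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
    thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(A));
}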

Working with the STL

thrust::device_vector<int> X(100000);
thrust::sequence(X.begin(), X.end());

std::vector<int> s(100000);

// element-by-element copy through the device iterators
std::copy(X.begin(), X.end(), s.begin());

// bulk copy from device to host
thrust::copy(X.begin(), X.end(), s.begin());

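The code above copies the data once with std::copy and once with thrust::copy. On my machine the std version takes about 5 s, while the thrust version needs only about 200 us, so the difference is very noticeable. A likely explanation is that std::copy dereferences the device iterators one element at a time, and each such access is a separate device-to-host transfer, whereas thrust::copy moves the whole range in bulk.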