GPU computing with Boost.Compute (C++)


怪力左手 · published 2022-04-22 17:39:45

boost.compute

https://github.com/boostorg/compute

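Compile errors

cl.h not found

Download the OpenCL headers, the ICD loader (source), and a demo:
https://github.com/KhronosGroup/OpenCL-Headers.git  headers only
https://github.com/KhronosGroup/OpenCL-ICD-Loader.git  ICD loader source, which builds the OpenCL library to link against
https://github.com/KhronosGroup/OpenCL-CLHPP.git  C++ bindings, optional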

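min / max functions not found

Put #include <boost/compute.hpp> at the very top of the include list, so that other Boost modules included earlier do not cause these functions to go missing.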

demo

#include <boost/compute.hpp>
#include <algorithm>
#include <cstdlib>
#include <vector>

namespace compute = boost::compute;

int main()
{
    // get the default compute device
    compute::device gpu = compute::system::default_device();
    // create a compute context and command queue
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);
    // generate random numbers on the host
    std::vector<float> host_vector(1000000);
    std::generate(host_vector.begin(), host_vector.end(), rand);
    // create vector on the device
    compute::vector<float> device_vector(1000000, ctx);
    // copy data to the device
    compute::copy(host_vector.begin(), host_vector.end(), device_vector.begin(), queue);
    // sort data on the device
    compute::sort(device_vector.begin(), device_vector.end(), queue);
    // copy data back to the host
    compute::copy(device_vector.begin(), device_vector.end(), host_vector.begin(), queue);
    return 0;
}

Custom functions in Boost.Compute

// Method 1: build the function from an OpenCL C source string
boost::compute::function<int (int)> add_four =
    boost::compute::make_function_from_source<int (int)>(
        "add_four",
        "int add_four(int x) { return x + 4; }"
    );
// Method 2: use the macro instead (an alternative to method 1, not in addition to it)
BOOST_COMPUTE_FUNCTION(int, add_four, (int x),
{
    return x + 4;
});
// Either way, pass add_four to an algorithm.
// input and output are device containers, e.g. boost::compute::vector<int>
boost::compute::transform(input.begin(), input.end(), output.begin(), add_four, queue);
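
When trying a custom function like add_four for the first time, it helps to have a host-side reference to compare the device output against. The sketch below is plain standard C++ and does not touch Boost.Compute; the names add_four_host and transform_add_four_host are made up for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Host-side counterpart of the add_four device function (hypothetical name).
inline int add_four_host(int x) { return x + 4; }

// Apply add_four_host to every element on the CPU. This mirrors what
// boost::compute::transform(..., add_four, queue) computes on the device.
std::vector<int> transform_add_four_host(const std::vector<int>& input)
{
    std::vector<int> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(), add_four_host);
    return output;
}
```

After copying the device output back to a std::vector<int>, comparing it with transform_add_four_host(host_input) is a quick sanity check that the device function did what was intended.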

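Qualifiers for custom kernel functions (CUDA vs. OpenCL)

All kernel functions return void.

The qualifiers below are CUDA's:
__host__: CPU function; a function without any qualifier is of this type by default.
__device__: runs on the device and is called from device code.
__global__: runs on the device and is launched from the host (CPU); the launch is asynchronous. This is the kernel entry point.
In OpenCL C the kernel entry point is marked __kernel instead, helper functions called from a kernel need no qualifier, and __global is an address-space qualifier for pointers rather than a function qualifier.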

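#include <cstdio>

// fun1 is not defined in the original post; a placeholder device helper is assumed here
__device__ void fun1(void)
{
    printf("hello from fun1\n");
}

// kernel: runs on the device, launched from the host
__global__ void fun(void)
{
    int a = 3;
    printf("%d\n", a);
    fun1();
    printf("hello world from GPU\n");
}

int main()
{
    fun<<<1, 2>>>();          // CUDA launch <<<grid, block>>>: 1 block in the grid, 2 threads per block
    cudaDeviceSynchronize();  // block until the kernel has finished
    // OpenCL counterpart of the two calls above, given an existing cl_command_queue and cl_kernel:
    //   size_t global = 2, local = 2;
    //   clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
    //   clFinish(queue);     // block until the queue has drained
    return 0;
}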
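
One detail that is easy to get wrong when moving between the two APIs: the CUDA launch fun<<<grid, block>>> counts blocks, while clEnqueueNDRangeKernel takes the total number of work-items as its global size. The sketch below shows the arithmetic for the one-dimensional case in plain C++. NDRange1D, ndrange_from_cuda and global_id are hypothetical helper names, not part of either API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the two sizes a 1-D clEnqueueNDRangeKernel call needs.
struct NDRange1D {
    std::size_t global_size; // total number of work-items
    std::size_t local_size;  // work-items per work-group
};

// Convert a CUDA launch configuration (blocks per grid, threads per block)
// into OpenCL sizes: the global size is the total thread count.
NDRange1D ndrange_from_cuda(std::size_t grid, std::size_t block)
{
    assert(block > 0);
    return NDRange1D{grid * block, block};
}

// Global index of one work-item. In OpenCL C this is what get_global_id(0)
// returns (with a zero global offset); in CUDA it is
// blockIdx.x * blockDim.x + threadIdx.x.
std::size_t global_id(std::size_t group_id, std::size_t local_size, std::size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For the <<<1, 2>>> launch above, ndrange_from_cuda(1, 2) gives a global size of 2 and a local size of 2, so the kernel body runs twice.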

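Accelerating OpenCV

Building OpenCV with the CMake option WITH_OPENCL turned on enables its OpenCL support, so OpenCV can use OpenCL to accelerate its own computations.

For custom functions that iterate over pixels, you can speed up the algorithm yourself with OpenMP (multithreading on the CPU) or OpenCL (asynchronous execution on the GPU).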
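
As a minimal sketch of the OpenMP route, here is a per-pixel loop parallelized over rows. To keep it free of dependencies it works on a flat buffer of 8-bit grayscale pixels, which stands in for the data of a cv::Mat; that substitution, the function name invert_pixels, and the choice of inversion as the per-pixel operation are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Invert an 8-bit grayscale image stored row by row in a flat buffer.
void invert_pixels(std::vector<std::uint8_t>& pixels, int rows, int cols)
{
    // Rows are independent of each other, so they can be split across CPU threads.
    // If OpenMP is not enabled at compile time (e.g. -fopenmp with GCC/Clang),
    // the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const std::size_t i = static_cast<std::size_t>(r) * cols + c;
            pixels[i] = static_cast<std::uint8_t>(255 - pixels[i]);
        }
    }
}
```

Parallelizing the outer loop keeps each thread on contiguous rows. The same per-pixel body is what would become the kernel if the loop were moved to OpenCL instead.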
