Apple Metal element-wise matrix multiplication (Hadamard product)
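Is it possible to compute a Hadamard product using Apple's Metal Performance Shaders? I can see that ordinary matrix multiplication is supported, but I am specifically looking for an element-wise multiplication, or a clever way to construct one. (For example, is it possible to convert an MPSMatrix into an MPSVector and then perform the product on the vectors?)
Update: Thanks for the suggestions to use a shader! I am working on an implementation and it looks promising. I will post the solution once I have it working.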
Answer: OK, following the commenters' suggestion, I am answering my own question here: I tried writing my own shader!
Here is the shader code:

    #include <metal_stdlib>
    using namespace metal;

    /*
     hadamardProduct:
     Perform an element-wise multiplication (Hadamard product) of the two
     input matrices A and B, store the result in C
     */
    kernel void hadamardProductKernel(
        texture_buffer<float, access::read>  A [[texture(0)]],
        texture_buffer<float, access::read>  B [[texture(1)]],
        texture_buffer<float, access::write> C [[texture(2)]],
        uint gid [[thread_position_in_grid]]) {
        // C[i,j] = A[i,j] * B[i,j]
        C.write(A.read(gid) * B.read(gid), gid);
    }

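For reference, the operation the kernel computes can be written out as below. The symbols $m$, $n$ and $k$ are introduced here purely for illustration and do not appear in the original code. Assuming two $m \times n$ matrices stored in row-major order, the element at row $i$, column $j$ sits at flat index $k = i\,n + j$, which is why a single one-dimensional thread index is enough:

```latex
\[
C = A \circ B, \qquad C_{ij} = A_{ij}\,B_{ij}, \qquad 0 \le i < m,\; 0 \le j < n
\]
\[
c_k = a_k\,b_k, \qquad k = i\,n + j, \qquad 0 \le k < mn
\]
```

For the 4x4 example, $mn = 16$, which matches the 16 threads dispatched by the host code.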
And the Swift code that runs the shader on two 4x4 matrices:

    import Foundation
    import Metal
    import MetalKit

    // Set up the device, command queue, command buffer, and kernel function.
    guard
        let gpu = MTLCreateSystemDefaultDevice(),
        let commandQueue = gpu.makeCommandQueue(),
        let commandBuffer = commandQueue.makeCommandBuffer(),
        let defaultLibrary = gpu.makeDefaultLibrary(),
        let kernelFunction = defaultLibrary.makeFunction(name: "hadamardProductKernel")
    else { exit(1) }

    // Create the matrices to multiply (as row-major matrices)
    let A: [Float] = [2, 0, 0, 0,
                      0, 2, 0, 0,
                      0, 0, 2, 0,
                      0, 0, 0, 2]

    let B: [Float] = [1, 0, 0, 0,
                      0, 2, 0, 0,
                      0, 0, 3, 0,
                      0, 0, 0, 4]

    // One texture buffer of 16 floats per matrix: A and B are read, C is written.
    let A_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                            width: 16,
                                                                                            resourceOptions: .storageModeManaged,
                                                                                            usage: .shaderRead))
    let B_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                            width: 16,
                                                                                            resourceOptions: .storageModeManaged,
                                                                                            usage: .shaderRead))
    let C_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                            width: 16,
                                                                                            resourceOptions: .storageModeManaged,
                                                                                            usage: .shaderWrite))

    // Copy the input data into the textures. The arrays are passed directly so
    // the pointer is only used for the duration of each call.
    A_buffer?.replace(region: MTLRegionMake1D(0, 16),
                      mipmapLevel: 0,
                      withBytes: A,
                      bytesPerRow: 16 * MemoryLayout<Float>.stride)
    B_buffer?.replace(region: MTLRegionMake1D(0, 16),
                      mipmapLevel: 0,
                      withBytes: B,
                      bytesPerRow: 16 * MemoryLayout<Float>.stride)

    // Encode the compute pass: one thread per matrix element.
    let computePipelineState = try gpu.makeComputePipelineState(function: kernelFunction)
    let computeEncoder = commandBuffer.makeComputeCommandEncoder()
    computeEncoder?.setComputePipelineState(computePipelineState)
    computeEncoder?.setTexture(A_buffer, index: 0)
    computeEncoder?.setTexture(B_buffer, index: 1)
    computeEncoder?.setTexture(C_buffer, index: 2)
    let threadGroupSize = MTLSize(width: 16, height: 1, depth: 1)
    let threadGroupCount = MTLSize(width: 1, height: 1, depth: 1)
    computeEncoder?.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadGroupSize)
    computeEncoder?.endEncoding()

    // Submit the work and block until the GPU has finished.
    // The result now lives in C_buffer; this example does not read it back.
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    print("done")

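Since the host code above only prints "done" and never inspects the output, a CPU-side reference can be handy for checking whatever the GPU produces once it is read back. The following is a minimal sketch, not part of the original answer: the function name `hadamardProduct` and the variables `lhs`, `rhs` and `expected` are hypothetical names chosen for illustration, and the matrices are assumed to be flat, row-major `[Float]` arrays of equal length.

```swift
// CPU reference for the element-wise (Hadamard) product of two equally sized
// matrices stored as flat, row-major arrays. `hadamardProduct` is a
// hypothetical helper, not an API from Metal or MPS.
func hadamardProduct(_ a: [Float], _ b: [Float]) -> [Float] {
    precondition(a.count == b.count, "matrices must have the same number of elements")
    // Multiply matching elements pairwise: c[k] = a[k] * b[k].
    return zip(a, b).map { $0 * $1 }
}

// The same two 4x4 matrices used in the Metal example.
let lhs: [Float] = [2, 0, 0, 0,
                    0, 2, 0, 0,
                    0, 0, 2, 0,
                    0, 0, 0, 2]

let rhs: [Float] = [1, 0, 0, 0,
                    0, 2, 0, 0,
                    0, 0, 3, 0,
                    0, 0, 0, 4]

// The diagonal becomes 2, 4, 6, 8; every other entry stays 0.
let expected = hadamardProduct(lhs, rhs)
print(expected)
```

Comparing the GPU output against `expected` element by element (with a small tolerance for floating-point work in general) is one simple way to validate the kernel.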
Any comments linking to resources for learning more about this kind of thing would be appreciated.