DFT Parallelization in OpenMP

The most widely used libraries for computing a Fourier Transform (FT) are built
on the Fast Fourier Transform (FFT) algorithm. The goal of this exercise is to
develop a fast DFT (Discrete Fourier Transform, a direct algorithm for computing
the FT that does not use the FFT) by parallelizing the serial C code provided in
DFT_seq.c.
The code computes a direct and an inverse DFT and reports the total time taken
by the two. The computational core of the program is the DFT subroutine, whose
idft argument is 1 for the direct DFT and -1 for the inverse DFT.
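For reference, the loop body of the DFT subroutine evaluates the sums below. This is a restatement derived from the code, assuming PI2 = 2π, with x[n] = xr[n] + i·xi[n] as the input (x_r = xr, x_i = xi) and X[k] = Xr_o[k] + i·Xi_o[k] as the output:

$$
X[k] = \frac{1}{c}\sum_{n=0}^{N-1} x[n]\,e^{-i\,\mathrm{idft}\,\theta_{nk}},
\qquad \theta_{nk} = \frac{2\pi nk}{N},
\qquad c = \begin{cases} 1, & \mathrm{idft} = 1 \\ N, & \mathrm{idft} = -1 \end{cases}
$$

$$
\operatorname{Re} X[k] = \frac{1}{c}\sum_{n=0}^{N-1}\bigl(x_r[n]\cos\theta_{nk} + \mathrm{idft}\,x_i[n]\sin\theta_{nk}\bigr),
\qquad
\operatorname{Im} X[k] = \frac{1}{c}\sum_{n=0}^{N-1}\bigl(-\mathrm{idft}\,x_r[n]\sin\theta_{nk} + x_i[n]\cos\theta_{nk}\bigr)
$$

Each transform is therefore a double loop over k and n, i.e. N² = 10^8 inner iterations (each with a sin and a cos) for N = 10000, whereas the inverse normalization costs only N divisions.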
The code also prints the value of the first element of the DFT result, which
should equal the input size N. You can use this number as a quick check of the
correctness of your implementation.
You will focus on parallelizing the DFT function:
// PI2 (2*pi) is defined elsewhere in DFT_seq.c.
// Xr_o and Xi_o accumulate with +=, so they must be zeroed before the call.
int DFT(int idft, double *xr, double *xi, double *Xr_o, double *Xi_o, int N)
{
    // Region 1: main DFT function
    for (int k = 0; k < N; k++)
    {
        for (int n = 0; n < N; n++)
        {
            // Real part of X[k]
            Xr_o[k] += xr[n] * cos(n * k * PI2 / N) + idft * xi[n] * sin(n * k * PI2 / N);
            // Imaginary part of X[k]
            Xi_o[k] += -idft * xr[n] * sin(n * k * PI2 / N) + xi[n] * cos(n * k * PI2 / N);
        }
    }
    // Region 2: Normalizing IDFT
    if (idft == -1)
    {
        for (int n = 0; n < N; n++)
        {
            Xr_o[n] /= N;
            Xi_o[n] /= N;
        }
    }
    return 1;
}
In this exercise, N = 10000. Please write a comprehensive report that answers
the following questions, and include plots of your test results.
Questions:
1. (1 point) Use OpenMP to parallelize the given sequential code in DFT_seq.c.
   The goal is to achieve the best possible DFT performance with a correct
   result and no false sharing. Note that the scheduling policy and where you
   parallelize will affect your final code's performance. Your plot should show
   the execution time for runs on 1, 2, 4, 8 and 16 threads, with each
   measurement averaged over 5 repeated runs. You need to parallelize every code
   region that is necessary for performance improvement.
2. (1 point) Zoom into Region 1, the nested loops of the main DFT function.
   Write the two OpenMP parallelization versions (outer loop and inner loop) in
   the report, and discuss which design choice is better for performance and
   why. Then, using the better-performing design as the baseline, discuss how
   the scheduling policy choice (default, static or dynamic) may affect its
   final execution time. Plot the results in a figure for runs on 1, 2, 4, 8
   and 16 threads. Design your scheduling policies so that they avoid false
   sharing. The policies may not show significant performance differences;
   either way, summarize your findings.
