Real to Complex FFT with CUFFT, using OpenCV as Data source

I'm having trouble performing a two-dimensional transform on an array of floats using cuFFT. I've looked at the documentation, but some of the information is contradictory or unclear, so I have a few questions:

My data has 480 rows and 640 columns, stored as a flat one-dimensional array (i.e. float data[480*640]).

1. Say my input dimensions (of real data) are N1 = 480 and N2 = 640. Are the dimensions after a real-to-complex transform N1 = 480, N2 = 321?

2. Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?

If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?

3. What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?

I think that's all for now...

EDIT: So, I've implemented the forward and inverse transforms as suggested by JackOLantern. However, my results are not what I expected (a result after the forward and inverse FFT that is identical to the input). I have an image gallery here showing two sets of examples. The first is from my room, the second from my University Project.

In the cuFFT Documentation, there is ambiguity in the use of cufftPlan2d (hence why I asked). In the documentation, for a two dimensional array, the data should be input as above (float data == float data[NY][NX]) So NY represents the rows. However in the function listing for cufftPlan2d, it states that nx (the parameter) is for the rows...
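To make the layout concrete, here is a minimal host-side sketch of the row-major convention described above. It assumes a dense buffer with no padding between rows (a `cv::Mat` that is not continuous would need its row step taken into account first). The helper name `flat_index` is made up for illustration. As far as I can tell, the first size argument of cufftPlan2d is the slowest-varying dimension, which for this layout would be the row count.

```cpp
#include <cstddef>

// Hypothetical helper: flat offset of element (row, col) in a dense
// row-major buffer that stores `cols` elements per row.
// Rows are the slowest-varying dimension, columns the fastest.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Total number of elements in a rows x cols buffer.
std::size_t element_count(std::size_t rows, std::size_t cols) {
    return rows * cols;
}
```

With 480 rows and 640 columns, `flat_index(1, 0, 640)` is 640, i.e. the second row starts right after the 640 elements of the first.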

Swapping the values of NX and NY in the function call gives the result shown in the project image (correct orientation, but split into three partially overlapping images at 1/4 the normal size). However, using the parameters as JackOLantern states in his answer gives a slanted/skewed result.

Am I doing something wrong here? Or does the cuFFT library have issues with this type of thing?

ALSO: I have undone a couple of the edits made by JackOLantern to this question as my issues MAY stem from the fact my data is coming from OpenCV.

EDIT: I've recently found out that I was the one who made a mistake in the way I used the function.

Originally I thought the function definition referred to the size of the data being passed into it.

However, it appears that the parameters actually refer directly to the size of the REAL part.

This means that the parameters refer to:

• The size of the input data when using R2C (Real to Complex)
• The size of the output data when using C2R (Complex to Real)

So it appears that the cuFFT documentation and the library itself do not correspond.

When performing an R2C followed by a C2R (real to complex, complex to real respectively), the documentation states that for a Real input of NX x NY dimensions, the Complex output is NX x (floor(NY/2) +1); and vice versa.

However the actual output is of dimensions NX x NY and the actual input is of dimensions NX x NY. This is (half) mentioned on the very first page as

C2R - Symmetric complex input to real output

Implying that the complex data must be Symmetric, i.e. must also have the redundant data in addition to the non-redundant data.
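For context, the symmetry usually meant for the transform of real data is Hermitian symmetry, X[N-k] = conj(X[k]). Below is a small host-side sketch of that property using a naive DFT rather than cuFFT. The function names are made up for illustration, and the sketch only demonstrates the property itself; it makes no claim about how cuFFT lays out its buffers.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) forward DFT of a real sequence (illustration only, not cuFFT).
std::vector<std::complex<double>> naive_dft(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = -2.0 * pi * double(j * k) / double(n);
            sum += x[j] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}

// Largest deviation from X[n-k] == conj(X[k]) over k = 1 .. n-1.
double max_hermitian_error(const std::vector<std::complex<double>>& X) {
    const std::size_t n = X.size();
    double worst = 0.0;
    for (std::size_t k = 1; k < n; ++k) {
        worst = std::max(worst, std::abs(X[n - k] - std::conj(X[k])));
    }
    return worst;
}
```

For any real input, `max_hermitian_error(naive_dft(x))` should come out at rounding-error level, which is why bins 0 .. N/2 are enough to describe the whole spectrum.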

There are a number of other contradictions within the documentation as well which I won't go into.

Needless to say, the problem has been solved.

I have included a MWE below. Near the top are a couple of lines with #define NUM_C2 and appropriate comments. Changing this changes whether the documentation format is followed, or my "fix".

The output is

1. The Input Real data
2. The Intermediate Complex data
3. The output Real data
4. The ratio of the output data to the input data (there are minor FFT errors, ~1 indicates correct)

Feel free to change the parameters (NUM_R and NUM_C) and feel free to comment if you think I have made a mistake somewhere.
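On item 4 of the list above: cuFFT transforms are unnormalized, so a forward transform followed by an inverse returns the input scaled by the number of elements, which is why the MWE divides by NUM_R*NUM_C before comparing. A one-dimensional host-side sketch of the same effect, using naive DFT sums instead of cuFFT (the function name is made up for illustration):

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Forward DFT followed by an inverse DFT with no 1/N factor, both written
// as naive O(N^2) sums. Illustration only; this is not how cuFFT computes it.
std::vector<double> unnormalized_round_trip(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();

    // Forward: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            X[k] += x[j] * std::polar(1.0, -2.0 * pi * double(j * k) / double(n));
        }
    }

    // Inverse without scaling: y[j] = sum_k X[k] * exp(+2*pi*i*j*k/n)
    std::vector<double> y(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t k = 0; k < n; ++k) {
            sum += X[k] * std::polar(1.0, 2.0 * pi * double(j * k) / double(n));
        }
        y[j] = sum.real();
    }
    return y;
}
```

Dividing each returned value by the input length should recover the original samples up to rounding error.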

```
#include <iostream>
#include <cstdlib>   // malloc, free
#include <math.h>
#include <cufft.h>

// e.g. float data[NUM_R][NUM_C]
#define NUM_R 12
#define NUM_C 16

// Documentation Version
//#define NUM_C2 (1+NUM_C/2)
// "Correct" Version
#define NUM_C2 NUM_C

using namespace std;

int main(int argc, char** argv)
{
    cufftReal *in_h, *out_h, *in_d, *out_d;
    cufftComplex *mid_d, *mid_h;
    cufftHandle pF, pI;
    int r, c;

    // Host buffers: real input, real output, complex intermediate
    in_h  = (cufftReal*)    malloc(NUM_R * NUM_C * sizeof(cufftReal));
    out_h = (cufftReal*)    malloc(NUM_R * NUM_C * sizeof(cufftReal));
    mid_h = (cufftComplex*) malloc(NUM_C2 * NUM_R * sizeof(cufftComplex));

    // Matching device buffers
    cudaMalloc((void**)&in_d,  NUM_R * NUM_C * sizeof(cufftReal));
    cudaMalloc((void**)&out_d, NUM_R * NUM_C * sizeof(cufftReal));
    cudaMalloc((void**)&mid_d, NUM_C2 * NUM_R * sizeof(cufftComplex));

    // Forward (R2C) and inverse (C2R) plans
    cufftPlan2d(&pF, NUM_R, NUM_C,  CUFFT_R2C);
    cufftPlan2d(&pI, NUM_R, NUM_C2, CUFFT_C2R);

    // 1. Fill and print the real input
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            in_h[c + NUM_C * r]  = cos(2.0 * M_PI * (c * 7.0 / NUM_C + r * 3.0 / NUM_R));
            out_h[c + NUM_C * r] = 0.f;
            cout << in_h[c + NUM_C * r];
            if (c < (NUM_C - 1)) cout << ", ";
            else cout << endl;
        }
    }

    cudaMemcpy((cufftReal*)in_d, (cufftReal*)in_h, NUM_R * NUM_C * sizeof(cufftReal), cudaMemcpyHostToDevice);

    cufftExecR2C(pF, (cufftReal*)in_d, (cufftComplex*)mid_d);

    cudaMemcpy((cufftComplex*)mid_h, (cufftComplex*)mid_d, NUM_C2 * NUM_R * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    // 2. Print the intermediate complex data as real|imaginary
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C2; c++)
        {
            cout << mid_h[c + (NUM_C2) * r].x << "|" << mid_h[c + (NUM_C2) * r].y;
            if (c < (NUM_C2 - 1)) cout << ", ";
            else cout << endl;
        }
    }

    cufftExecC2R(pI, (cufftComplex*)mid_d, (cufftReal*)out_d);

    cudaMemcpy((cufftReal*)out_h, (cufftReal*)out_d, NUM_R * NUM_C * sizeof(cufftReal), cudaMemcpyDeviceToHost);

    // 3. Print the real output, scaled by 1/(NUM_R*NUM_C)
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            cout << out_h[c + NUM_C * r] / (NUM_R * NUM_C);
            if (c < (NUM_C - 1)) cout << ", ";
            else cout << endl;
        }
    }

    // 4. Print the ratio of scaled output to input (~1 indicates correct)
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            cout << (out_h[c + NUM_C * r] / (NUM_R * NUM_C)) / in_h[c + NUM_C * r];
            if (c < (NUM_C - 1)) cout << ", ";
            else cout << endl;
        }
    }

    // Release plans, host memory and device memory
    cufftDestroy(pF);
    cufftDestroy(pI);
    free(in_h);
    free(out_h);
    free(mid_h);
    cudaFree(in_d);
    cudaFree(out_d);   // was cudaFree(out_h), which passed a host pointer
    cudaFree(mid_d);

    return 0;
}
```

1) If we say my input dimensions (of real data) are N1 = 480 and N2 = 640. Are the dimensions (after a real to complex transform) N1=480, N2=321?

The output of cufftExecR2C is a NX*(NY/2+1) cufftComplex matrix. So in your case, you will have a 480x321 float2 matrix as output.
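The arithmetic behind those numbers can be sketched on the host as follows. The struct and function names are made up for illustration, and the sketch assumes the row-major layout from the question, where only the last (fastest-varying) dimension is halved.

```cpp
#include <cstddef>

// Hypothetical container for a 2D size.
struct Dims {
    std::size_t rows;
    std::size_t cols;
};

// Size of the complex output of a 2D real-to-complex transform:
// the row count is unchanged, the column count becomes cols/2 + 1
// (integer division, so odd column counts are rounded down first).
Dims r2c_output_dims(std::size_t rows, std::size_t cols) {
    return Dims{rows, cols / 2 + 1};
}

// Bytes needed for that output when each complex value is two floats,
// as with cufftComplex / float2.
std::size_t r2c_output_bytes(std::size_t rows, std::size_t cols) {
    const Dims d = r2c_output_dims(rows, cols);
    return d.rows * d.cols * 2 * sizeof(float);
}
```

For the 480x640 case this gives 480 rows of 321 complex values.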

2) Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?

If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?

Yes, you can copy the N1xN2 data directly into a cufftReal array of the same size.

3) What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?

```
cufftPlan2d(&plan, N1, N2, CUFFT_R2C);
cufftExecR2C(plan, (cufftReal*)idata, (cufftComplex*) odata);
```