Metric 3D reconstruction

I'm trying to reconstruct 3D points from 2D image correspondences. My camera is calibrated. The test images are of a checkered cube, and the correspondences are hand-picked. Radial distortion has been removed. After triangulation, however, the reconstruction seems to be wrong. The X and Y values look correct, but the Z values are all about the same and do not vary across the cube. The 3D points look as if they had been flattened along the Z-axis.

What is going wrong in the Z values? Do the points need to be normalized or changed from image coordinates at any point, say before the fundamental matrix is computed? (If this is too vague I can explain my general process or elaborate on parts)

Update

Given: x1 = P1 * X and x2 = P2 * X

where x1 and x2 are the first and second image points and X is the 3D point.

However, I have found that the reprojected x1 is not close to the hand-picked value, whereas the reprojected x2 is.
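For concreteness, this is roughly the check I mean. It is only a sketch with made-up stand-in values: the K, R, t, X and x_picked below are hypothetical and are not my data.

```matlab
% Hypothetical camera, 3D point and hand-picked pixel (illustration only).
K = [700 0 400; 0 700 300; 0 0 1];
R = eye(3);
t = [0; 0; 0];
P = K * [R, t];                  % 3x4 projection matrix

X        = [0.2; -0.1; 4; 1];    % homogeneous 3D point
x_picked = [435.5; 282.0];       % pretend hand-picked pixel

x_h  = P * X;                    % homogeneous image point
x_px = x_h(1:2) / x_h(3);        % divide by the third coordinate
err  = norm(x_px - x_picked);    % reprojection error in pixels
```

An error of a pixel or so would count as "close" for hand-picked points; the x1 error I am seeing is far larger than that.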

How I compute projection matrices:

```
P1 = [eye(3), zeros(3,1)];
P2 = K * [R, t];
```

Update II

Calibration results after optimization (with uncertainties)

```
% Focal Length:          fc = [ 699.13458   701.11196 ] ± [ 1.05092   1.08272 ]
% Principal point:       cc = [ 393.51797   304.05914 ] ± [ 1.61832   1.27604 ]
% Skew:             alpha_c = [ 0.00180 ] ± [ 0.00042  ]   => angle of pixel axes = 89.89661 ± 0.02379 degrees
% Distortion:            kc = [ 0.05867   -0.28214   0.00131   0.00244  0.35651 ] ± [ 0.01228   0.09805   0.00060   0.00083  0.22340 ]
% Pixel error:          err = [ 0.19975   0.23023 ]
%
% Note: The numerical errors are approximately three times the standard
% deviations (for reference).
```


```
K =

699.1346    1.2584  393.5180
0  701.1120  304.0591
0         0    1.0000

E =

0.3692   -0.8351   -4.0017
0.3881   -1.6743   -6.5774
4.5508    6.3663    0.2764

R =

-0.9852    0.0712   -0.1561
-0.0967   -0.9820    0.1624
0.1417   -0.1751   -0.9743

t =

0.7942
-0.5761
0.1935

P1 =

1     0     0     0
0     1     0     0
0     0     1     0

P2 =

-633.1409  -20.3941 -492.3047  630.6410
-24.6964 -741.7198 -182.3506 -345.0670
0.1417   -0.1751   -0.9743    0.1935

C1 =

0
0
0
1

C2 =

0.6993
-0.5883
0.4060
1.0000

% new points using cpselect

%x1
input_points =

422.7500  260.2500
384.2500  238.7500
339.7500  211.7500
298.7500  186.7500
452.7500  236.2500
412.2500  214.2500
368.7500  191.2500
329.7500  165.2500
482.7500  210.2500
443.2500  189.2500
402.2500  166.2500
362.7500  143.2500
510.7500  186.7500
466.7500  165.7500
425.7500  144.2500
392.2500  125.7500
403.2500  369.7500
367.7500  345.2500
330.2500  319.7500
296.2500  297.7500
406.7500  341.2500
365.7500  316.2500
331.2500  293.2500
295.2500  270.2500
414.2500  306.7500
370.2500  281.2500
333.2500  257.7500
296.7500  232.7500
434.7500  341.2500
441.7500  312.7500
446.2500  282.2500
462.7500  311.2500
466.7500  286.2500
475.2500  252.2500
481.7500  292.7500
490.2500  262.7500
498.2500  232.7500

%x2
base_points =

393.2500  311.7500
358.7500  282.7500
319.7500  249.2500
284.2500  216.2500
431.7500  285.2500
395.7500  256.2500
356.7500  223.7500
320.2500  194.2500
474.7500  254.7500
437.7500  226.2500
398.7500  197.2500
362.7500  168.7500
511.2500  227.7500
471.2500  196.7500
432.7500  169.7500
400.2500  145.7500
388.2500  404.2500
357.2500  373.2500
326.7500  343.2500
297.2500  318.7500
387.7500  381.7500
356.2500  351.7500
323.2500  321.7500
291.7500  292.7500
390.7500  352.7500
357.2500  323.2500
320.2500  291.2500
287.2500  258.7500
427.7500  376.7500
429.7500  351.7500
431.7500  324.2500
462.7500  345.7500
463.7500  325.2500
470.7500  295.2500
491.7500  325.2500
497.7500  298.2500
504.7500  270.2500
```

Update III

See the answer for corrections. The values computed above used the wrong variables.

Note: all references are to Multiple View Geometry in Computer Vision by Hartley and Zisserman.

OK, so there were a couple of bugs:

1. When computing the essential matrix (pp. 257-259), the authors mention that the correct R,t pair from the set of four (Result 9.19) is the one for which the 3D points lie in front of both cameras (Fig. 9.12, a), but they don't say how to compute this. By chance I was re-reading chapter 6 and discovered that Section 6.2.3 (p. 162) discusses the depth of points, and Result 6.1 is the equation to apply to pick the correct R and t.

2. In my implementation of the optimal triangulation method (Algorithm 12.1, p. 318), in step 2 I had T2^-1' * F * T1^-1 where I needed (T2^-1)' * F * T1^-1. The former transposes the -1; what I wanted, as in the latter, was to transpose the inverted T2 matrix (foiled again by MATLAB!).

3. Finally, I wasn't computing P1 correctly: it should have been P1 = K * [eye(3), zeros(3,1)]; (I forgot to multiply by the calibration matrix K).
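The three fixes can be sketched together roughly as follows. This is a minimal illustration on a synthetic scene, not my actual code: the helper names (skew, depth), the ground-truth pose and the single test point are all assumptions, and the triangulation inside the loop is plain linear (DLT) triangulation rather than Algorithm 12.1. Only K is taken from the question.

```matlab
% Illustrative sketch; helper names and the synthetic scene are assumptions.
skew  = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
% Result 6.1: depth of X = (X,Y,Z,T)' with respect to camera P = [M | p4].
depth = @(P, X) sign(det(P(:,1:3))) * (P(3,:) * X) / (X(4) * norm(P(3,1:3)));

% Synthetic ground truth: second camera rotated about Y and shifted.
a = 0.2;
R_true = [cos(a) 0 sin(a); 0 1 0; -sin(a) 0 cos(a)];
t_true = [-1; 0; 0.2];
X_true = [0.3; -0.2; 5; 1];
K = [699.1346 1.2584 393.5180; 0 701.1120 304.0591; 0 0 1];  % K from the question

% Normalized image points (K removed) in both views.
Pn1 = [eye(3), zeros(3,1)];
x1 = Pn1 * X_true;              x1 = x1 / x1(3);
x2 = [R_true, t_true] * X_true; x2 = x2 / x2(3);

% Item 1: four candidate poses from the SVD of E (Result 9.19).
E = skew(t_true) * R_true;
[U, ~, V] = svd(E);
if det(U) < 0, U = -U; end      % keep the candidates proper rotations
if det(V) < 0, V = -V; end
W  = [0 -1 0; 1 0 0; 0 0 1];
Rs = {U*W*V', U*W*V', U*W'*V', U*W'*V'};
ts = {U(:,3), -U(:,3), U(:,3), -U(:,3)};

R_best = []; t_best = [];
for k = 1:4
    Pn2 = [Rs{k}, ts{k}];
    % Linear (DLT) triangulation of the single test correspondence.
    A = [x1(1)*Pn1(3,:) - Pn1(1,:);
         x1(2)*Pn1(3,:) - Pn1(2,:);
         x2(1)*Pn2(3,:) - Pn2(1,:);
         x2(2)*Pn2(3,:) - Pn2(2,:)];
    [~, ~, Va] = svd(A);
    X = Va(:, 4);
    % Keep the pose that puts the point in front of both cameras.
    if depth(Pn1, X) > 0 && depth(Pn2, X) > 0
        R_best = Rs{k}; t_best = ts{k};
    end
end

% Item 3: both projection matrices include K.
P1 = K * [eye(3), zeros(3,1)];
P2 = K * [R_best, t_best];

% Item 2: transpose the inverse explicitly (Algorithm 12.1, step 2).
u1 = K * x1;  u2 = K * x2;                 % pixel coordinates (third entry is 1)
F  = inv(K)' * skew(t_best) * R_best * inv(K);
T1 = [1 0 -u1(1); 0 1 -u1(2); 0 0 1];      % moves u1 to the origin
T2 = [1 0 -u2(1); 0 1 -u2(2); 0 0 1];      % moves u2 to the origin
F_shifted = inv(T2)' * F * inv(T1);        % (T2^-1)' * F * T1^-1
```

Writing inv(T2)' leaves no doubt about what gets transposed. Since u1 and u2 correspond exactly here, F_shifted(3,3) comes out at essentially zero, which is a handy sanity check. Note that t_best has unit length, because an essential matrix only fixes the translation up to scale.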

Hope this helps future passersby!

It may be that your points are in a degenerate configuration. Try to add a couple of points from the scene that don't belong to the cube and see how it goes.

• What is t? The baseline might be too small for parallax.
• What is the disparity between x1 and x2?
• Are you confident about the accuracy of the calibration (I'm assuming you used the Stereo part of the Bouguet Toolbox)?
• When you say the correspondences are hand-picked, do you mean you selected the corresponding points on the images yourself, or did you use an interest point detector on the two images and then set the correspondences?
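The first two questions can be checked quickly with something like the sketch below. It uses the t and the first four correspondences posted in the question; the variable names are mine, and by "disparity" I simply mean the pixel displacement between matched points, since the images are not rectified.

```matlab
% First four correspondences copied from the question
% (x1 = input_points, x2 = base_points), plus the posted t.
x1 = [422.75 260.25; 384.25 238.75; 339.75 211.75; 298.75 186.75];
x2 = [393.25 311.75; 358.75 282.75; 319.75 249.25; 284.25 216.25];
t  = [0.7942; -0.5761; 0.1935];

d        = sqrt(sum((x2 - x1).^2, 2));   % per-point displacement in pixels
baseline = norm(t);                      % length of the translation vector
```

If t was recovered from an essential matrix it is only defined up to scale, so a norm close to 1 by itself says nothing about the metric baseline. The pixel displacements are the more informative number here.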

I'm sure we can resolve this problem :)