# Where is the Pencil?

This problem was assigned in my computer vision course.  Below are two pictures of Grad Student Dennis sitting in the graduate student office, holding a pencil.  Grad Student Dennis did not move the pencil between the time that the two pictures were taken.

[Image: Picture A] [Image: Picture B]

Now let's assume the following information:
• We have put a coordinate system on the grad student office.
• The origin (0, 0, 0) is at the lower corner near the door (you can't actually see the origin in either picture).
• The positive z-axis goes up the line where the two walls meet.
• The positive x-axis travels left where the back wall and the floor meet.
• The positive y-axis travels toward the camera where the wall with the door and the floor meet.
• All units are measured in feet.
• The pictures were taken in perspective view, meaning that the center of the camera, the point on the picture, and the point in the real world are all collinear (this is essentially true for most cameras).
• Each person in the class (including me) found one point in the room, measured its real world coordinate, and then found its pixel coordinate in both pictures.  The results are:

| World Point (feet) | Picture A (pixels) | Picture B (pixels) |
|---|---|---|
| (2.500, 11.375, 2.427) | (698, 236) | (908, 322) |
| (2.500, 7.833, 2.448) | (650, 340) | (894, 400) |
| (3.396, 0.000, 5.254) | (609, 642) | (837, 642) |
| (8.500, 9.927, 2.458) | (25, 523) | (345, 398) |
| (8.117, 1.542, 4.275) | (294, 686) | (576, 590) |
| (1.333, 0.000, 4.250) | (702, 526) | (948, 572) |
| (7.792, 2.771, 2.458) | (250, 551) | (567, 470) |
| (1.000, 1.552, 7.135) | (812, 686) | (992, 740) |
| (0.400, 6.350, 7.125) | (983, 669) | (1122, 787) |

Finally, the tip of Grad Student Dennis's pencil appears at (315, 215) in Picture A, and (504, 205) in Picture B.

With all this information, we ask:
1. What are the real world coordinates of the tip of Grad Student Dennis's pencil?
2. What are the real world coordinates of the camera in both pictures?

Since the relation between the real world coordinates and the pixel coordinates is linear once both are written in homogeneous coordinates, this boils down to a messy linear algebra problem.  We first create a projection matrix for each camera: a matrix that converts real world coordinates into pixel coordinates.  Creating this matrix is known as camera calibration, and it is a very common procedure in computer vision.  Once the projection matrices are worked out, any point that appears in both pictures can be located in real world coordinates, because each picture constrains the point to a line through that camera's center and the two lines (nearly) intersect at the point.  The location of each camera can also be determined from its projection matrix.
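
To make the linear relation concrete, here is the usual pinhole-camera notation. The symbols are an assumption, since the problem statement does not fix any: $P$ is the $3 \times 4$ projection matrix with entries $p_{ij}$, $(X, Y, Z)$ is a world point, $(u, v)$ is its pixel coordinate, and $s$ is an unknown per-point scale factor.

$$
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= P \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
u = \frac{p_{11}X + p_{12}Y + p_{13}Z + p_{14}}{p_{31}X + p_{32}Y + p_{33}Z + p_{34}},
\qquad
v = \frac{p_{21}X + p_{22}Y + p_{23}Z + p_{24}}{p_{31}X + p_{32}Y + p_{33}Z + p_{34}}
$$

Multiplying each fraction through by its denominator gives two equations per measured point that are linear in the entries of $P$. Because $P$ is only defined up to an overall scale, it has 11 free parameters, so the nine measured points supply 18 equations for 11 unknowns. The camera center $C$ is the one world point with no well-defined image, which in this notation means

$$
P \begin{pmatrix} C \\ 1 \end{pmatrix} = 0 .
$$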

One tiny problem: the system of linear equations we end up with is overdetermined, with more equations than unknowns.  And even if it weren't, we inevitably have errors in our data.  Even if we ignored the human error, we still had to round all of the pixel coordinates to the nearest whole number.  So solving the system exactly is out of the question.  Instead, we must settle for the closest solution to our system.  This calls for numerical methods, and in particular a least squares solution: the one that minimizes the sum of the squared differences between the two sides of the equations.
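
The whole procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not necessarily the method used in the course: all function names (`solve_least_squares`, `project`, `calibrate`, `triangulate`, `camera_center`) are hypothetical, the projection matrix is normalized by fixing its bottom-right entry to 1 (which assumes that entry is nonzero), and the least squares problems are solved through the normal equations, which is simple but not the most numerically robust choice. A QR- or SVD-based solver such as `numpy.linalg.lstsq` would be the more careful option.

```python
def solve_least_squares(A, b):
    """Least squares solution of A x ~ b via the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T b].
    N = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * bi for r, bi in zip(A, b))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(N[r][col]))
        N[col], N[pivot] = N[pivot], N[col]
        for r in range(col + 1, n):
            f = N[r][col] / N[col][col]
            for c in range(col, n + 1):
                N[r][c] -= f * N[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (N[i][n] - sum(N[i][j] * x[j] for j in range(i + 1, n))) / N[i][i]
    return x


def project(P, point):
    """Apply a 3x4 projection matrix to a world point, returning pixel (u, v)."""
    h = [sum(row[j] * point[j] for j in range(3)) + row[3] for row in P]
    return h[0] / h[2], h[1] / h[2]


def calibrate(world, pixels):
    """Estimate a 3x4 projection matrix, assuming its last entry can be set to 1."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world, pixels):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(v)
    p = solve_least_squares(A, b) + [1.0]
    return [p[0:4], p[4:8], p[8:12]]


def triangulate(cameras, pixels):
    """World point whose projections best match one pixel per camera."""
    A, b = [], []
    for P, (u, v) in zip(cameras, pixels):
        for row, coord in ((P[0], u), (P[1], v)):
            A.append([row[j] - coord * P[2][j] for j in range(3)])
            b.append(coord * P[2][3] - row[3])
    return solve_least_squares(A, b)


def camera_center(P):
    """Solve M C = -p4, where P = [M | p4], so that P (C, 1) = 0."""
    return solve_least_squares([row[:3] for row in P], [-row[3] for row in P])


# Measurements from the table above: world points in feet, pixels per picture.
WORLD = [(2.500, 11.375, 2.427), (2.500, 7.833, 2.448), (3.396, 0.000, 5.254),
         (8.500, 9.927, 2.458), (8.117, 1.542, 4.275), (1.333, 0.000, 4.250),
         (7.792, 2.771, 2.458), (1.000, 1.552, 7.135), (0.400, 6.350, 7.125)]
PIXELS_A = [(698, 236), (650, 340), (609, 642), (25, 523), (294, 686),
            (702, 526), (250, 551), (812, 686), (983, 669)]
PIXELS_B = [(908, 322), (894, 400), (837, 642), (345, 398), (576, 590),
            (948, 572), (567, 470), (992, 740), (1122, 787)]

P_a = calibrate(WORLD, PIXELS_A)
P_b = calibrate(WORLD, PIXELS_B)
pencil = triangulate([P_a, P_b], [(315, 215), (504, 205)])

print("pencil tip:", pencil)
print("camera A:  ", camera_center(P_a))
print("camera B:  ", camera_center(P_b))
```

Comparing `project(P_a, point)` against the measured pixel for each calibration point gives a quick sense of how much error the measurements carry.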