Here is a very simple trick to recover the 3D pose of a square object from a single picture of it. The algorithm is fast, very easy to implement, and even easier to understand. It is algebraic, so if you're not happy with it, you can always use its output as an initial guess for a more complicated algorithm that minimizes the reprojection error.
Consider the square of side length $L$ with corners at 3D coordinates $X_1$, $X_2$, $X_3$, $X_4$ as shown. The camera projects these points to pixel coordinates $x_1$, $x_2$, $x_3$, $x_4$.

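To recover the pose of the square, we would ideally like to solve

$$\min_{X_1,\dots,X_4} \sum_{i=1}^{4} \left\| x_i - \frac{X_i}{Z_i} \right\|^2 \quad \text{subject to } X_1,\dots,X_4 \text{ forming a square of side } L,$$

where $Z_i$ is the third component of $X_i$, and each $x_i$ is taken as a homogeneous vector $(u_i, v_i, 1)^\top$ in calibrated image coordinates. This optimization problem assumes Gaussian noise in the observed pixel coordinates, but it is difficult to solve.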
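To make the objective concrete, here is a minimal Python sketch of the reprojection error for a candidate set of 3D corners. The function name `reprojection_error` is hypothetical, and the sketch assumes the observed points are given as calibrated $(u, v)$ pairs, so that projecting a corner is just dividing by its depth.

```python
def reprojection_error(pixels, corners_3d):
    """Sum of squared distances between observed and projected corners.

    pixels: four (u, v) pairs in calibrated image coordinates (assumption).
    corners_3d: four (X, Y, Z) candidate corners with Z > 0.
    """
    total = 0.0
    for (u, v), (X, Y, Z) in zip(pixels, corners_3d):
        # Project the 3D corner by dividing by its depth, then compare.
        total += (u - X / Z) ** 2 + (v - Y / Z) ** 2
    return total
```

A function like this is also a convenient way to score the output of a faster method before handing it to an iterative refinement.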
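Instead, we'll use some geometric intuition. Because the shape is a parallelogram, we know

$$X_1 - X_2 = X_4 - X_3,$$

taking the corners in order around the perimeter. Here's the intuition: each corner lies on the ray through its pixel, so with $x_i$ written as the homogeneous vector $(u_i, v_i, 1)^\top$ in calibrated image coordinates and $Z_i$ the depth (third component) of $X_i$,

$$X_i = Z_i x_i,$$

and we have

$$Z_1 x_1 - Z_2 x_2 + Z_3 x_3 = Z_4 x_4,$$

which can be rewritten as

$$\begin{bmatrix} x_1 & -x_2 & x_3 \end{bmatrix} \begin{bmatrix} Z_1/Z_4 \\ Z_2/Z_4 \\ Z_3/Z_4 \end{bmatrix} = x_4.$$

So we can recover the depth of each corner, up to a scale (which is just the depth of the fourth corner), by a simple $3 \times 3$ matrix inversion. Once we recover $Z_i/Z_4$, to recover the scale, we use the size of the square as a clue:

$$\|X_1 - X_2\| = Z_4 \left\| \frac{Z_1}{Z_4} x_1 - \frac{Z_2}{Z_4} x_2 \right\| = L,$$

so

$$Z_4 = \frac{L}{\left\| \frac{Z_1}{Z_4} x_1 - \frac{Z_2}{Z_4} x_2 \right\|}.$$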
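The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the MATLAB demo: the names `det3`, `solve3`, and `square_corners_3d` are hypothetical, the four image points are assumed to be in calibrated coordinates and listed in order around the perimeter, and the $3 \times 3$ system is solved with Cramer's rule instead of an explicit inverse.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m z = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate corner configuration")
    sol = []
    for col in range(3):
        # Replace one column of m by the right-hand side.
        mc = [[rhs[r] if c == col else m[r][c] for c in range(3)]
              for r in range(3)]
        sol.append(det3(mc) / d)
    return sol


def square_corners_3d(pixels, side):
    """Recover the 3D corners of a square from four image points.

    pixels: four (u, v) pairs in calibrated coordinates, ordered around
            the perimeter (both are assumptions of this sketch).
    side:   side length of the square.
    """
    x = [(u, v, 1.0) for u, v in pixels]
    # Parallelogram constraint: [x1 -x2 x3] (Z1/Z4, Z2/Z4, Z3/Z4)^T = x4.
    m = [[x[0][r], -x[1][r], x[2][r]] for r in range(3)]
    ratios = solve3(m, x[3]) + [1.0]
    # Fix the scale Z4 so that the edge between corners 1 and 2 has length `side`.
    edge = [ratios[0] * x[0][r] - ratios[1] * x[1][r] for r in range(3)]
    z4 = side / math.sqrt(sum(e * e for e in edge))
    return [tuple(z4 * ratios[i] * x[i][r] for r in range(3))
            for i in range(4)]
```

With noise-free input the recovered corners match the true ones up to floating-point error. With noisy pixels the result is only an algebraic estimate, which is where a reprojection-error refinement would come in.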
Here's a MATLAB demo of this algorithm.
Here's a video of recovering the 3D pose of a square pattern on a piece of paper. In this demo, the corners of the square are marked with black dots, and the colors surrounding each dot help to uniquely identify each corner. Once we recover the 3D pose of the square, we can superimpose a 3D shape on top of the paper.
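To superimpose a shape, the recovered corners need to be turned into a coordinate frame attached to the square. The following is a rough Python sketch of one way to do that, not necessarily what the demo does: `square_frame` and `project_model_point` are hypothetical names, the corners are assumed to be ordered around the perimeter, and the image coordinates are assumed calibrated.

```python
import math


def square_frame(corners_3d):
    """Build an origin and three axes from recovered square corners.

    corners_3d: four (X, Y, Z) tuples in order around the perimeter.
    Returns (origin, e1, e2, e3), with e3 normal to the square.
    """
    p1, p2, _, p4 = corners_3d

    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(c / n for c in v)

    e1 = unit(tuple(q - p for p, q in zip(p1, p2)))  # along edge 1 -> 2
    e2 = unit(tuple(q - p for p, q in zip(p1, p4)))  # along edge 1 -> 4
    # Cross product e1 x e2: its sign depends on the corner winding, and it
    # has unit length only when e1 and e2 are exactly orthogonal.
    e3 = (e1[1] * e2[2] - e1[2] * e2[1],
          e1[2] * e2[0] - e1[0] * e2[2],
          e1[0] * e2[1] - e1[1] * e2[0])
    return p1, e1, e2, e3


def project_model_point(frame, a, b, c):
    """Map a point (a, b, c) in the square's frame to image coordinates."""
    origin, e1, e2, e3 = frame
    X, Y, Z = (origin[i] + a * e1[i] + b * e2[i] + c * e3[i]
               for i in range(3))
    return X / Z, Y / Z
```

Projecting the vertices of a model, such as a cube sitting on the square, through `project_model_point` gives the image points at which to draw the overlay.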