16. Orthogonal Projections and Their Applications#
16.1. Overview#
Orthogonal projection is a cornerstone of vector space methods, with many diverse applications.
These include, but are not limited to,
- Least squares projection, also known as linear regression
- Conditional expectations for multivariate normal (Gaussian) distributions
- Gram–Schmidt orthogonalization
- QR decomposition
- Orthogonal polynomials
- etc.
In this lecture we focus on
- key ideas
- least squares regression
16.1.1. Further Reading#
For background and foundational concepts, see our lecture on linear algebra.
For more proofs and greater theoretical detail, see A Primer in Econometric Theory.
For a complete set of proofs in a general setting, see, for example, [Rom05].
For an advanced treatment of projection in the context of least squares prediction, see this book chapter.
16.2. Key Definitions#
Assume $x, z \in \mathbb{R}^n$.

Define $\langle x, z \rangle := \sum_i x_i z_i$.

Recall $\| x \|^2 = \langle x, x \rangle$.

The law of cosines states that $\langle x, z \rangle = \| x \| \| z \| \cos(\theta)$, where $\theta$ is the angle between the vectors $x$ and $z$.

When $\langle x, z \rangle = 0$, then $\cos(\theta) = 0$ and $x$ and $z$ are said to be orthogonal, and we write $x \perp z$.

For a linear subspace $S \subset \mathbb{R}^n$, we call $x \in \mathbb{R}^n$ orthogonal to $S$ if $x \perp z$ for all $z \in S$, and write $x \perp S$.

The orthogonal complement of linear subspace $S \subset \mathbb{R}^n$ is the set $S^{\perp} := \{ x \in \mathbb{R}^n : x \perp S \}$.

$S^{\perp}$ is itself a linear subspace of $\mathbb{R}^n$. To see this, fix $x, y \in S^{\perp}$ and $\alpha, \beta \in \mathbb{R}$. Observe that if $z \in S$, then

$$\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle = \alpha \times 0 + \beta \times 0 = 0$$

Hence $\alpha x + \beta y \in S^{\perp}$, as was to be shown.

A set of vectors $\{ x_1, \ldots, x_k \} \subset \mathbb{R}^n$ is called an orthogonal set if $x_i \perp x_j$ whenever $i \neq j$.

If $\{ x_1, \ldots, x_k \}$ is an orthogonal set, then the Pythagorean Law states that

$$\| x_1 + \cdots + x_k \|^2 = \| x_1 \|^2 + \cdots + \| x_k \|^2$$

For example, when $k = 2$ and $x_1 \perp x_2$,

$$\| x_1 + x_2 \|^2 = \langle x_1 + x_2, x_1 + x_2 \rangle = \langle x_1, x_1 \rangle + 2 \langle x_2, x_1 \rangle + \langle x_2, x_2 \rangle = \| x_1 \|^2 + \| x_2 \|^2$$
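Here is a minimal numerical sketch of these definitions in Julia, using two hypothetical vectors chosen to be orthogonal (not part of the lecture's own code):

using LinearAlgebra

# two hypothetical vectors in R^3, chosen so that x1 ⊥ x2
x1 = [1.0, 2.0, 0.0]
x2 = [-2.0, 1.0, 3.0]

dot(x1, x2)                                # inner product is 0, so x1 ⊥ x2
norm(x1 + x2)^2 ≈ norm(x1)^2 + norm(x2)^2  # the Pythagorean Law holds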
16.2.1. Linear Independence vs Orthogonality#
If $X \subset \mathbb{R}^n$ is an orthogonal set and $0 \notin X$, then $X$ is linearly independent.
Proving this is a nice exercise.
While the converse is not true, a kind of partial converse holds, as we’ll see below.
16.3. The Orthogonal Projection Theorem#
What vector within a linear subspace of $\mathbb{R}^n$ best approximates a given vector in $\mathbb{R}^n$?

The next theorem answers this question.

Theorem (OPT) Given $y \in \mathbb{R}^n$ and a linear subspace $S \subset \mathbb{R}^n$, there exists a unique solution to the minimization problem

$$\hat y := \operatorname*{argmin}_{z \in S} \| y - z \|$$

The minimizer $\hat y$ is the unique vector in $\mathbb{R}^n$ that satisfies

- $\hat y \in S$
- $y - \hat y \perp S$

The vector $\hat y$ is called the orthogonal projection of $y$ onto $S$.

The next figure provides some intuition.
16.3.1. Proof of sufficiency#
We’ll omit the full proof.
But we will prove sufficiency of the asserted conditions.
To this end, let $y \in \mathbb{R}^n$ and let $S$ be a linear subspace of $\mathbb{R}^n$.

Let $\hat y$ be a vector in $\mathbb{R}^n$ such that $\hat y \in S$ and $y - \hat y \perp S$.

Let $z$ be any other point in $S$ and use the fact that $S$ is a linear subspace (so that $\hat y - z \in S$ and hence $y - \hat y \perp \hat y - z$) to deduce, via the Pythagorean Law,

$$\| y - z \|^2 = \| (y - \hat y) + (\hat y - z) \|^2 = \| y - \hat y \|^2 + \| \hat y - z \|^2$$

Hence $\| y - z \| \geq \| y - \hat y \|$, which is what we needed to show.
16.3.2. Orthogonal Projection as a Mapping#
For a linear space $Y$ and a fixed linear subspace $S$, we have a functional relationship

$$y \in Y \;\mapsto\; \text{its orthogonal projection } \hat y \in S$$

By the OPT, this is a well-defined mapping or operator from $\mathbb{R}^n$ to $\mathbb{R}^n$.

In what follows we denote this operator by a matrix $P$

- $P y$ represents the projection $\hat y$.
- This is sometimes expressed as $\hat E_S y = P y$, where $\hat E$ denotes a wide-sense expectations operator and the subscript $S$ indicates that we are projecting $y$ onto the linear subspace $S$.

The operator $P$ is called the orthogonal projection mapping onto $S$.

It is immediate from the OPT that for any $y \in \mathbb{R}^n$

1. $P y \in S$ and
2. $y - P y \perp S$

From this we can deduce additional useful properties, such as

1. $\| y \|^2 = \| P y \|^2 + \| y - P y \|^2$ and
2. $\| P y \| \leq \| y \|$

For example, to prove 1, observe that $y = P y + (y - P y)$ and apply the Pythagorean law.
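For a concrete illustration, here is a minimal sketch that projects onto the coordinate subspace $S = \operatorname{span}\{e_1, e_2\} \subset \mathbb{R}^3$, whose projection matrix is easy to write down; the vector y below is a hypothetical choice:

using LinearAlgebra

# orthogonal projection onto span{e1, e2} in R^3
P = [1 0 0;
     0 1 0;
     0 0 0]
y = [1.0, -2.0, 4.0]

norm(y)^2 ≈ norm(P * y)^2 + norm(y - P * y)^2  # property 1
norm(P * y) <= norm(y)                         # property 2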
16.3.2.1. Orthogonal Complement#
Let $S \subset \mathbb{R}^n$.

The orthogonal complement of $S$ is the linear subspace $S^{\perp}$ that satisfies $x_1 \perp x_2$ for every $x_1 \in S$ and $x_2 \in S^{\perp}$.

Let $Y$ be a linear space with linear subspace $S$ and its orthogonal complement $S^{\perp}$.

We write

$$Y = S \oplus S^{\perp}$$

to indicate that for every $y \in Y$ there is a unique $x_1 \in S$ and a unique $x_2 \in S^{\perp}$ such that $y = x_1 + x_2$.

Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

This amounts to another version of the OPT:

Theorem. If $S$ is a linear subspace of $\mathbb{R}^n$ with $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

$$P y \perp M y \quad \text{and} \quad y = P y + M y \quad \text{for all } y \in \mathbb{R}^n$$

The next figure illustrates this decomposition.
16.4. Orthonormal Basis#
An orthogonal set of vectors $O \subset \mathbb{R}^n$ is called an orthonormal set if $\| u \| = 1$ for all $u \in O$.

Let $S$ be a linear subspace of $\mathbb{R}^n$ and let $O \subset S$.

If $O$ is orthonormal and $\operatorname{span} O = S$, then $O$ is called an orthonormal basis of $S$. ($O$ is necessarily a basis of $S$, since an orthogonal set that excludes the zero vector is linearly independent.)

One example of an orthonormal set is the canonical basis $\{ e_1, \ldots, e_n \}$, which forms an orthonormal basis of $\mathbb{R}^n$, where $e_i$ is the $i$-th unit vector.

If $\{ u_1, \ldots, u_k \}$ is an orthonormal basis of linear subspace $S$, then every $x \in S$ satisfies $x = \sum_{i=1}^k \langle x, u_i \rangle u_i$.

To see this, observe that since $x \in \operatorname{span} \{ u_1, \ldots, u_k \}$, we can find scalars $\alpha_1, \ldots, \alpha_k$ that verify

$$x = \sum_{j=1}^k \alpha_j u_j \tag{16.1}$$

Taking the inner product with respect to $u_i$ gives

$$\langle x, u_i \rangle = \sum_{j=1}^k \alpha_j \langle u_j, u_i \rangle = \alpha_i$$
Combining this result with (16.1) verifies the claim.
16.4.1. Projection onto an Orthonormal Basis#
When a subspace comes equipped with an orthonormal basis, computing the projection onto it simplifies:

Theorem If $\{ u_1, \ldots, u_k \}$ is an orthonormal basis for $S$, then

$$P y = \sum_{i=1}^k \langle y, u_i \rangle u_i, \quad \forall \; y \in \mathbb{R}^n$$

Proof: Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in the statement of the theorem.

Clearly, $P y \in S$.

We claim that $y - P y \perp S$ also holds.

It suffices to show that $y - P y$ is orthogonal to each basis vector $u_j$.

This is true because

$$\left\langle y - \sum_{i=1}^k \langle y, u_i \rangle u_i, \; u_j \right\rangle = \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle \langle u_i, u_j \rangle = 0$$
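The following minimal sketch illustrates the theorem numerically; the orthonormal vectors and the point y are hypothetical choices in $\mathbb{R}^3$:

using LinearAlgebra

# a hypothetical orthonormal basis for a two-dimensional subspace of R^3
u1 = [1.0, 0.0, 0.0]
u2 = [0.0, 1.0, 1.0] / sqrt(2)
y = [1.0, 3.0, -1.0]

# projection via the orthonormal-basis formula
Py = dot(y, u1) * u1 + dot(y, u2) * u2

# the residual is orthogonal to each basis vector (both inner products ≈ 0)
dot(y - Py, u1), dot(y - Py, u2)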
16.5. Projection Using Matrix Algebra#
Let $S$ be a linear subspace of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n$.

We want to compute the matrix $P$ that verifies $\hat E_S y = P y$.

Evidently $P y$ is a linear function from $y \in \mathbb{R}^n$ to $P y \in \mathbb{R}^n$.

This reference is useful: https://en.wikipedia.org/wiki/Linear_map#Matrices.

Theorem. Let the columns of the $n \times k$ matrix $X$ form a basis of $S$. Then

$$P = X (X' X)^{-1} X'$$

Proof: Given arbitrary $y \in \mathbb{R}^n$ and $P = X (X' X)^{-1} X'$, our claim is that

1. $P y \in S$, and
2. $y - P y \perp S$

Claim 1 is true because

$$P y = X (X' X)^{-1} X' y = X a \quad \text{when} \quad a := (X' X)^{-1} X' y$$

An expression of the form $X a$ is precisely a linear combination of the columns of $X$, and hence an element of $S$.

Claim 2 is equivalent to the statement

$$y - X (X' X)^{-1} X' y \; \perp \; X b \quad \text{for all} \quad b \in \mathbb{R}^k$$

This is true: if $b \in \mathbb{R}^k$, then

$$(X b)' \left[ y - X (X' X)^{-1} X' y \right] = b' \left[ X' y - X' X (X' X)^{-1} X' y \right] = b' \left[ X' y - X' y \right] = 0$$
The proof is now complete.
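As a numerical check of the theorem, here is a minimal sketch using the same arrays that appear in Exercise 3 below:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]  # columns form a basis of S
y = [1.0, 3.0, -3.0]

P = X * inv(X'X) * X'
Py = P * y

X' * (y - Py)   # ≈ zero vector, so y - Py is orthogonal to every column of X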
16.5.1. Starting with X#
It is common in applications to start with an $n \times k$ matrix $X$ with linearly independent columns and let

$$S := \operatorname{span} X := \operatorname{span} \{ \operatorname{col}_1 X, \ldots, \operatorname{col}_k X \}$$

Then the columns of $X$ form a basis of $S$.

From the preceding theorem, $P = X (X' X)^{-1} X'$ projects $y$ onto $S$.

In this context, $P$ is often called the projection matrix.

The matrix $M := I - P$ satisfies $M y = \hat E_{S^{\perp}} y$ and is sometimes called the annihilator matrix.
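Continuing in the same spirit, here is a small sketch (again with the Exercise 3 arrays) showing that $M = I - P$ behaves as described:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]

P = X * inv(X'X) * X'
M = I - P                 # the annihilator matrix

maximum(abs, M * X)       # ≈ 0: M sends every column of X to the zero vector
P * y + M * y ≈ y         # y splits into its projection onto S plus the part in S⊥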
16.5.2. The Orthonormal Case#
Suppose that $U$ is $n \times k$ with orthonormal columns.

Let $u_i := \operatorname{col}_i U$ for each $i$, let $S := \operatorname{span} U$ and let $y \in \mathbb{R}^n$.

We know that the projection of $y$ onto $S$ is

$$P y = U (U' U)^{-1} U' y$$

Since $U$ has orthonormal columns, we have $U' U = I$.

Hence

$$P y = U U' y = \sum_{i=1}^k \langle u_i, y \rangle u_i$$
We have recovered our earlier result about projecting onto the span of an orthonormal basis.
16.5.3. Application: Overdetermined Systems of Equations#
Let $y \in \mathbb{R}^n$ and let $X$ be an $n \times k$ matrix with linearly independent columns.

Given $X$ and $y$, we seek $b \in \mathbb{R}^k$ satisfying the system of linear equations $X b = y$.

If $n > k$ (more equations than unknowns), then $b$ is said to be overdetermined.

Intuitively, we may not be able to find a $b$ that satisfies all $n$ equations.

The best approach here is to

- Accept that an exact solution may not exist
- Look instead for an approximate solution

By approximate solution, we mean a $b \in \mathbb{R}^k$ such that $X b$ is as close to $y$ as possible.
The next theorem shows that the solution is well defined and unique.
The proof uses the OPT.
Theorem The unique minimizer of $\| y - X b \|$ over $b \in \mathbb{R}^k$ is

$$\hat \beta := (X' X)^{-1} X' y$$

Proof: Note that

$$X \hat \beta = X (X' X)^{-1} X' y = P y$$

Since $P y$ is the orthogonal projection of $y$ onto $\operatorname{span}(X)$, we have

$$\| y - P y \| \leq \| y - z \| \quad \text{for any } z \in \operatorname{span}(X)$$

Because $X b \in \operatorname{span}(X)$ for any $b \in \mathbb{R}^k$, it follows that

$$\| y - X \hat \beta \| \leq \| y - X b \| \quad \text{for any } b \in \mathbb{R}^k$$
This is what we aimed to show.
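A minimal sketch of this result, using the small overdetermined system from Exercise 3 below (3 equations, 2 unknowns); note that Julia's backslash operator also returns the least squares solution for tall full-rank matrices:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]   # n = 3 equations, k = 2 unknowns
y = [1.0, 3.0, -3.0]

β_hat = inv(X'X) * X' * y   # the minimizer from the theorem
X \ y ≈ β_hat               # Julia's least squares solve agrees (up to rounding)
norm(y - X * β_hat)         # the minimized distance ‖y - X β̂‖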
16.6. Least Squares Regression#
Let’s apply the theory of orthogonal projection to least squares regression.
This approach provides insights about many geometric properties of linear regression.
We treat only some examples.
16.6.1. Squared risk measures#
Given pairs $(x, y) \in \mathbb{R}^K \times \mathbb{R}$, consider choosing $f : \mathbb{R}^K \to \mathbb{R}$ to minimize the risk

$$R(f) := \mathbb{E} \left[ (y - f(x))^2 \right]$$

If probabilities and hence $\mathbb{E}$ are unknown, then this problem cannot be solved directly.

However, if a sample is available, we can estimate the risk with the empirical risk:

$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^N (y_n - f(x_n))^2$$

Minimizing this expression is called empirical risk minimization.

The set $\mathcal{F}$ is sometimes called the hypothesis space.

The theory of statistical learning tells us that to prevent overfitting we should take the set $\mathcal{F}$ to be relatively simple.

If we let $\mathcal{F}$ be the class of linear functions $f(x) = b' x$, the problem becomes

$$\min_{b \in \mathbb{R}^K} \; \sum_{n=1}^N (y_n - b' x_n)^2$$
This is the sample linear least squares problem.
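To make the empirical risk concrete, here is a minimal sketch with a small hypothetical sample and a candidate coefficient vector b; all values below are made up purely for illustration:

using LinearAlgebra

# hypothetical sample: N = 3 observations of (x_n, y_n) with K = 2 regressors
x_obs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y_obs = [2.0, 0.0, 1.5]

# empirical risk of the linear predictor f(x) = b'x
function empirical_risk(b)
    N = length(y_obs)
    return sum((y_obs[n] - b' * x_obs[n])^2 for n in 1:N) / N
end

empirical_risk([1.0, 0.5])   # risk of one candidate b; least squares minimizes this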
16.6.2. Solution#
Define the matrices

$$y := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad x_n := \begin{pmatrix} x_{n1} \\ x_{n2} \\ \vdots \\ x_{nK} \end{pmatrix} = \text{$n$-th observation on all regressors}$$

and

$$X := \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_N' \end{pmatrix}, \qquad \text{an } N \times K \text{ matrix}$$

We assume throughout that $N > K$ and $X$ is full column rank.

If you work through the algebra, you will be able to verify that $\| y - X b \|^2 = \sum_{n=1}^N (y_n - b' x_n)^2$.

Since monotone transforms don't affect minimizers, we have

$$\operatorname*{argmin}_{b \in \mathbb{R}^K} \sum_{n=1}^N (y_n - b' x_n)^2 = \operatorname*{argmin}_{b \in \mathbb{R}^K} \| y - X b \|$$

By our results about overdetermined linear systems of equations, the solution is

$$\hat \beta := (X' X)^{-1} X' y$$

Let $P$ and $M$ be the projection and annihilator associated with $X$:

$$P := X (X' X)^{-1} X' \quad \text{and} \quad M := I - P$$

The vector of fitted values is $\hat y := X \hat \beta = P y$.

The vector of residuals is $\hat u := y - \hat y = y - P y = M y$.
Here are some more standard definitions:
- The total sum of squares is $\text{TSS} := \| y \|^2$.
- The sum of squared residuals is $\text{SSR} := \| \hat u \|^2$.
- The explained sum of squares is $\text{ESS} := \| \hat y \|^2$.
TSS = ESS + SSR.
We can prove this easily using the OPT.
From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
Applying the Pythagorean law completes the proof.
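Here is a minimal numerical sketch of the decomposition, again reusing the Exercise 3 arrays as stand-in data:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]

P = X * inv(X'X) * X'
y_hat = P * y          # fitted values
u_hat = y - y_hat      # residuals

TSS = norm(y)^2
ESS = norm(y_hat)^2
SSR = norm(u_hat)^2

TSS ≈ ESS + SSR        # true, by the OPT plus the Pythagorean law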
16.7. Orthogonalization and Decomposition#
Let’s return to the connection between linear independence and orthogonality touched on above.
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent sets.
The next section gives details.
16.7.1. Gram-Schmidt Orthogonalization#
Theorem For each linearly independent set $\{ x_1, \ldots, x_k \} \subset \mathbb{R}^n$, there exists an orthonormal set $\{ u_1, \ldots, u_k \}$ with

$$\operatorname{span} \{ x_1, \ldots, x_i \} = \operatorname{span} \{ u_1, \ldots, u_i \} \quad \text{for } i = 1, \ldots, k$$

The Gram-Schmidt orthogonalization procedure constructs such an orthonormal set.

One description of this procedure is as follows:

- For $i = 1, \ldots, k$, form $S_i := \operatorname{span} \{ x_1, \ldots, x_i \}$ and its orthogonal complement $S_i^{\perp}$.
- Set $v_1 := x_1$ and, for $i \geq 2$, set $v_i := \hat E_{S_{i-1}^{\perp}} x_i$ (the projection of $x_i$ onto $S_{i-1}^{\perp}$).
- For each $i$, set $u_i := v_i / \| v_i \|$.

The sequence $u_1, \ldots, u_k$ has the stated properties.
A Gram-Schmidt orthogonalization construction is a key idea behind the Kalman filter described in A First Look at the Kalman filter.
In some exercises below you are asked to implement this algorithm and test it using projection.
16.7.2. QR Decomposition#
The following result uses the preceding algorithm to produce a useful decomposition.
Theorem If $X$ is an $n \times k$ matrix with linearly independent columns, then there exists a factorization $X = Q R$ where

- $R$ is $k \times k$, upper triangular, and nonsingular
- $Q$ is $n \times k$ with orthonormal columns

Proof sketch: Let

- $x_j := \operatorname{col}_j (X)$
- $\{ u_1, \ldots, u_k \}$ be orthonormal with the same span as $\{ x_1, \ldots, x_k \}$ (to be constructed using Gram–Schmidt)
- $Q$ be formed from the columns $u_j$

Since $x_j \in \operatorname{span} \{ u_1, \ldots, u_j \}$, we have

$$x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i \quad \text{for } j = 1, \ldots, k$$

Some rearranging gives $X = Q R$.
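In Julia, the factorization is available through qr from LinearAlgebra; the following minimal sketch (using the Exercise 3 matrix) checks the two stated properties:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]   # linearly independent columns

Q, R = qr(X)
Q = Matrix(Q)    # keep the n × k "thin" factor with orthonormal columns

Q * R ≈ X        # the factorization reproduces X
istriu(R)        # R is upper triangular
Q' * Q ≈ I(2)    # the columns of Q are orthonormal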
16.7.3. Linear Regression via QR Decomposition#
For matrices $X$ and $y$ that overdetermine $\beta$ in the linear equation system $y = X \beta$, we found the least squares approximator $\hat \beta = (X' X)^{-1} X' y$.

Using the QR decomposition $X = Q R$ gives

$$\hat \beta = (R' Q' Q R)^{-1} R' Q' y = (R' R)^{-1} R' Q' y = R^{-1} (R')^{-1} R' Q' y = R^{-1} Q' y$$

Numerical routines would in this case use the alternative form $R \hat \beta = Q' y$ and back substitution.
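A minimal sketch comparing the two routes to $\hat \beta$, with the Exercise 3 arrays standing in for data:

using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]

β_normal = inv(X'X) * X' * y   # normal-equations formula

Q, R = qr(X)
Q = Matrix(Q)
β_qr = R \ (Q' * y)            # solve R β = Q'y (back substitution)

β_normal ≈ β_qr                # the two approaches agree up to rounding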
16.8. Exercises#
16.8.1. Exercise 1#
Show that, for any linear subspace $S \subset \mathbb{R}^n$, $S \cap S^{\perp} = \{ 0 \}$.
16.8.2. Exercise 2#
Let $P = X (X' X)^{-1} X'$ and let $M = I - P$. Show that $P$ and $M$ are both idempotent and symmetric. Can you give any intuition as to why they should be idempotent?
16.8.3. Exercise 3#
Using Gram-Schmidt orthogonalization, produce a linear projection of $y$ onto the column space of $X$, and verify this using the projection matrix $P := X (X' X)^{-1} X'$ and also using QR decomposition, where

$$y := \begin{pmatrix} 1 \\ 3 \\ -3 \end{pmatrix}$$

and

$$X := \begin{pmatrix} 1 & 0 \\ 0 & -6 \\ 2 & 2 \end{pmatrix}$$
16.9. Solutions#
16.9.1. Exercise 1#
Clearly, the zero vector belongs to both $S$ and $S^{\perp}$, so $0 \in S \cap S^{\perp}$. Conversely, if $x \in S \cap S^{\perp}$, then $x \perp x$, so $\langle x, x \rangle = \| x \|^2 = 0$ and hence $x = 0$. Therefore $S \cap S^{\perp} = \{ 0 \}$.
16.9.2. Exercise 2#
Symmetry and idempotence of $M$ and $P$ can be established using standard rules for matrix algebra. The intuition behind idempotence of $M$ and $P$ is that both are orthogonal projections. After a point is projected into a given subspace, applying the projection again makes no difference: a point inside the subspace is not shifted by orthogonal projection onto that space, because it is already the closest point in the subspace to itself.
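For readers who want a numerical sanity check alongside the algebra, here is a minimal sketch using the Exercise 3 matrix:

using LinearAlgebra

X = [1 0; 0 -6; 2 2]
P = X * inv(X'X) * X'
M = I - P

P ≈ P' && M ≈ M'         # both symmetric
P * P ≈ P && M * M ≈ M   # both idempotent: projecting twice changes nothing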
16.9.3. Exercise 3#
Here’s a function that computes the orthonormal vectors using the GS algorithm given in the lecture.
using LinearAlgebra, Statistics
function gram_schmidt(X)
    U = similar(X, Float64)  # for robustness

    function normalized_orthogonal_projection(b, Z)
        # project onto the orthogonal complement of the col span of Z
        orthogonal = I - Z * inv(Z'Z) * Z'
        projection = orthogonal * b
        # normalize
        return projection / norm(projection)
    end

    for col in 1:size(U, 2)
        # set up
        b = X[:, col]          # vector we're going to project
        Z = X[:, 1:(col - 1)]  # first col - 1 columns of X
        U[:, col] = normalized_orthogonal_projection(b, Z)
    end
    return U
end
gram_schmidt (generic function with 1 method)
Here are the arrays we’ll work with
y = [1, 3, -3]
X = [1 0; 0 -6; 2 2];
First let’s do ordinary projection of
Py1 = X * inv(X'X) * X' * y
3-element Vector{Float64}:
-0.5652173913043479
3.260869565217391
-2.217391304347826
Now let’s orthogonalize first, using Gram–Schmidt:
U = gram_schmidt(X)
3×2 Matrix{Float64}:
0.447214 -0.131876
0.0 -0.989071
0.894427 0.065938
Now we can project using the orthonormal basis and see if we get the same thing:
Py2 = U * U' * y
3-element Vector{Float64}:
-0.5652173913043477
3.2608695652173916
-2.217391304347826
The result is the same. To complete the exercise, we get an orthonormal basis by QR decomposition and project once more.
Q, R = qr(X)
Q = Matrix(Q)
3×2 Matrix{Float64}:
-0.447214 -0.131876
0.0 -0.989071
-0.894427 0.065938
Py3 = Q * Q' * y
3-element Vector{Float64}:
-0.5652173913043476
3.2608695652173907
-2.2173913043478257
Again, the result is the same.