
Computing the Jacobian of a Matrix Product

Published at 03:31 AM · 6 min read


In this post, we’ll walk through how to compute the Jacobian of a matrix product. We’ll start with a small example and then generalize to larger matrices. Because these sorts of Jacobians can be large and unwieldy, it’s helpful to visualize the derivation to understand the structure.

Simple Example: 2x2 Matrix Product

1. Problem Setup

We’ll start with a very simple 2×2 matrix product. Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad B = \begin{pmatrix} w & x \\ y & z \end{pmatrix}.$$

We want the product

$$C = AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} \;=\; \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.$$

Denote the entries of $C$ by

$$C = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.$$

We will treat $C$ as a function of the entries of $A$, holding $B$ fixed; in other words, we are computing the Jacobian with respect to $A$. We want to understand the Jacobian of

$$\mathbf{c} = \bigl( c_1, c_2, c_3, c_4 \bigr)$$

with respect to

$$\mathbf{a} = \bigl( a, b, c, d \bigr).$$
2. Write Each $c_i$ in Terms of $(a, b, c, d)$

From the product $C = AB$, we identify:

$$c_1 = aw + by, \qquad c_2 = ax + bz, \qquad c_3 = cw + dy, \qquad c_4 = cx + dz.$$

3. Compute Partial Derivatives

We compute all partial derivatives $\frac{\partial c_i}{\partial \alpha}$ for $\alpha \in \{a, b, c, d\}$ and $i = 1, 2, 3, 4$:

$$\frac{\partial c_1}{\partial a} = w, \quad \frac{\partial c_1}{\partial b} = y, \quad \frac{\partial c_1}{\partial c} = 0, \quad \frac{\partial c_1}{\partial d} = 0.$$

$$\frac{\partial c_2}{\partial a} = x, \quad \frac{\partial c_2}{\partial b} = z, \quad \frac{\partial c_2}{\partial c} = 0, \quad \frac{\partial c_2}{\partial d} = 0.$$

$$\frac{\partial c_3}{\partial a} = 0, \quad \frac{\partial c_3}{\partial b} = 0, \quad \frac{\partial c_3}{\partial c} = w, \quad \frac{\partial c_3}{\partial d} = y.$$

$$\frac{\partial c_4}{\partial a} = 0, \quad \frac{\partial c_4}{\partial b} = 0, \quad \frac{\partial c_4}{\partial c} = x, \quad \frac{\partial c_4}{\partial d} = z.$$
4. Arrange These Partials into a Jacobian Matrix

When we speak of “the Jacobian,” we typically stack the outputs $\mathbf{c}$ as rows (or as one long column) and do the same with the parameters $\mathbf{a}$. In one common convention (outputs as rows, parameters as columns), we get a $4 \times 4$ matrix:

$$J = \begin{pmatrix} \frac{\partial c_1}{\partial a} & \frac{\partial c_1}{\partial b} & \frac{\partial c_1}{\partial c} & \frac{\partial c_1}{\partial d} \\[6pt] \frac{\partial c_2}{\partial a} & \frac{\partial c_2}{\partial b} & \frac{\partial c_2}{\partial c} & \frac{\partial c_2}{\partial d} \\[6pt] \frac{\partial c_3}{\partial a} & \frac{\partial c_3}{\partial b} & \frac{\partial c_3}{\partial c} & \frac{\partial c_3}{\partial d} \\[6pt] \frac{\partial c_4}{\partial a} & \frac{\partial c_4}{\partial b} & \frac{\partial c_4}{\partial c} & \frac{\partial c_4}{\partial d} \end{pmatrix}.$$

Plugging in the partial derivatives we computed:

$$J = \begin{pmatrix} w & y & 0 & 0 \\ x & z & 0 & 0 \\ 0 & 0 & w & y \\ 0 & 0 & x & z \end{pmatrix}.$$

This is the Jacobian of $(c_1, c_2, c_3, c_4)$ with respect to $(a, b, c, d)$.
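As a sanity check, we can rebuild this Jacobian numerically. Since $C = AB$ is linear in the entries of $A$, perturbing one entry of $A$ by 1 changes $C$ by exactly the corresponding column of the Jacobian. A minimal NumPy sketch, with hypothetical numeric values standing in for $a, \dots, z$:

```python
import numpy as np

# Hypothetical numeric values for the symbolic entries:
# A = [[a, b], [c, d]], B = [[w, x], [y, z]].
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# C = A @ B is linear in A, so a unit perturbation of one entry of A
# yields the corresponding Jacobian column exactly.
J = np.zeros((4, 4))
for col, (k, l) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    E = np.zeros((2, 2))
    E[k, l] = 1.0
    J[:, col] = (E @ B).ravel()  # row-major flatten: (c1, c2, c3, c4)

# Block-diagonal structure with B^T blocks, matching the matrix above.
assert np.allclose(J, np.kron(np.eye(2), B.T))
```

With this row-major flattening of both $C$ and $A$, the Jacobian is exactly $I \otimes B^\mathsf{T}$.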

Index-wise, you can think of $c_1, c_2, c_3, c_4$ as $c_{1,1}, c_{1,2}, c_{2,1}, c_{2,2}$ (i.e. the first row of $C$ is $(c_1, c_2)$, the second row is $(c_3, c_4)$). Similarly, $(a, b, c, d)$ can be mapped to $(a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2})$.

Hence, each row index of $J$ can be viewed as a pair $(i, j) \in \{1,2\} \times \{1,2\}$ for $c_{i,j}$, and each column index of $J$ can be viewed as a pair $(k, \ell) \in \{1,2\} \times \{1,2\}$ for $a_{k,\ell}$. So $J$ can be written as:

$$J_{(i,j),\,(k,\ell)} = \frac{\partial\,c_{i,j}}{\partial\,a_{k,\ell}}.$$

In this sense, the single matrix $J$ can be reshaped into a 4D array of shape $(2, 2, 2, 2)$.
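To make the reshaping concrete, here is a small NumPy sketch (using the same row-major flattening as above, and a hypothetical numeric $B$) that checks $J_{(i,j),(k,\ell)} = \delta_{i,k}\, B_{\ell,j}$ entry by entry:

```python
import numpy as np

B = np.array([[5.0, 6.0], [7.0, 8.0]])  # hypothetical values for w, x, y, z

# Jacobian with row-major flattening of C and A (block-diagonal B^T blocks).
J = np.kron(np.eye(2), B.T)

# Reshape the 4x4 matrix into a (2, 2, 2, 2) array indexed as [i, j, k, l],
# so that J4[i, j, k, l] = d c_{i,j} / d a_{k,l}.
J4 = J.reshape(2, 2, 2, 2)

for i in range(2):
    for j in range(2):
        for k in range(2):
            for l in range(2):
                expected = B[l, j] if i == k else 0.0
                assert J4[i, j, k, l] == expected
```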

5. Visualizing the Pattern

Even for larger matrices, you can see the pattern: the Jacobian is block-diagonal, with one copy of $B^\mathsf{T}$ for each row of $C$.

Another way to see this is via vec-operator identities (if you flatten each matrix into a long column vector). For instance, the well-known identity

$$\mathrm{vec}(AB) = (B^\mathsf{T} \otimes I)\,\mathrm{vec}(A)$$

tells you exactly how to build the Jacobian by Kronecker products. In the $2 \times 2$ case, $B^\mathsf{T}$ is $\begin{pmatrix} w & y \\ x & z \end{pmatrix}$, and $I$ is the $2 \times 2$ identity matrix. Then

$$B^\mathsf{T} \otimes I = \begin{pmatrix} wI & yI \\ xI & zI \end{pmatrix} = \begin{pmatrix} w & 0 & y & 0 \\ 0 & w & 0 & y \\ x & 0 & z & 0 \\ 0 & x & 0 & z \end{pmatrix},$$

which matches the same structure we saw by direct partial derivatives (just with a slightly different row/column ordering convention depending on how exactly you flatten).
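The vec identity is easy to verify numerically. The flattening convention matters here: the identity assumes column-stacking vec, which in NumPy is Fortran-order flattening. A minimal sketch with hypothetical values:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

def vec(M):
    # Column-stacking vec operator (Fortran / column-major order).
    return M.flatten(order="F")

# vec(AB) = (B^T kron I) vec(A)
lhs = vec(A @ B)
rhs = np.kron(B.T, np.eye(2)) @ vec(A)
assert np.allclose(lhs, rhs)
```

Had we flattened row-major instead, the matching Kronecker factorization would be $I \otimes B^\mathsf{T}$, which is exactly the ordering difference noted above.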

Generalization

Suppose $Y = AB$, where $A$ is $m \times p$ and $B$ is $p \times n$, so $Y$ is $m \times n$.

A very direct way to express the partial derivatives componentwise is to notice:

$$Y_{ik} = \sum_{j=1}^{p} A_{ij}\, B_{jk}.$$

Hence,

$$\frac{\partial Y_{ik}}{\partial A_{\alpha\beta}} \;=\; \sum_{j=1}^{p} \frac{\partial}{\partial A_{\alpha\beta}} \Bigl(A_{ij}\, B_{jk}\Bigr) \;=\; \sum_{j=1}^{p} \delta_{i,\alpha}\,\delta_{j,\beta}\, B_{jk} \;=\; \delta_{i,\alpha}\,B_{\beta,k}.$$

In words:

$\frac{\partial Y_{ik}}{\partial A_{\alpha\beta}} = 0$ unless $i = \alpha$ (the sum over $j$ picks out $j = \beta$). When $i = \alpha$, it equals $B_{\beta,k}$.

Often, this is written using an indicator (or Iverson bracket) as:

$$\frac{\partial Y_{ik}}{\partial A_{\alpha\beta}} = \mathbf{1}\{i = \alpha\}\, B_{\beta,k}.$$

Another common notation uses the Kronecker delta δ\delta. We can compactly write:

$$\frac{\partial (AB)_{ik}}{\partial A_{\alpha\beta}} = \delta_{i,\alpha}\; B_{\beta,k}.$$
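This componentwise formula can be checked for arbitrary shapes by building the full 4D Jacobian tensor and comparing a slice against a one-hot perturbation of $A$ (exact here, since $Y$ is linear in $A$). A sketch with random matrices and arbitrary example dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 3, 4, 2  # arbitrary example dimensions
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

# 4D Jacobian tensor: J[i, k, alpha, beta] = dY_{ik} / dA_{alpha,beta}
#                                          = delta(i, alpha) * B[beta, k]
J = np.einsum("ia,bk->ikab", np.eye(m), B)

# Compare one slice against a one-hot perturbation of A.
alpha, beta = 1, 2
E = np.zeros((m, p))
E[alpha, beta] = 1.0
assert np.allclose(J[:, :, alpha, beta], E @ B)
```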

In vectorized form (using Kronecker products)

When working with gradients or Jacobians involving many matrix entries at once, it is typical to vectorize the matrices. Recall that

$$\mathrm{vec}(Y) \;=\; \mathrm{vec}(A\,B) \;=\; (B^\mathsf{T} \otimes I_{m})\,\mathrm{vec}(A),$$

where $\mathrm{vec}(\cdot)$ stacks the columns of its argument into one long column vector and $I_m$ is the $m \times m$ identity matrix.

From that identity, you can read off that the total Jacobian matrix (of size $mn \times mp$) that maps $\mathrm{vec}(A)$ to $\mathrm{vec}(Y)$ is precisely:

$$\frac{\partial\,\mathrm{vec}(Y)}{\partial\,\mathrm{vec}(A)} = B^\mathsf{T} \otimes I_{m}.$$

That is a “one-shot” formula: once you memorize

$$\mathrm{vec}(A\,B) = (B^\mathsf{T}\otimes I)\,\mathrm{vec}(A),$$

you effectively know the Jacobian as well.
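Putting it together, the full Jacobian $B^\mathsf{T} \otimes I_m$ can be checked numerically for generic shapes (again using column-stacking vec, with arbitrary example dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 3, 4, 2  # arbitrary example dimensions
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

def vec(M):
    # Column-stacking vec operator (Fortran order).
    return M.flatten(order="F")

# The (mn) x (mp) Jacobian d vec(Y) / d vec(A).
J = np.kron(B.T, np.eye(m))
assert J.shape == (m * n, m * p)

# Since Y = AB is linear in A, J applied to vec(A) must reproduce vec(AB).
assert np.allclose(J @ vec(A), vec(A @ B))
```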

