In this post, we’ll walk through how to compute the Jacobian of a matrix product. We’ll start with a small example and then generalize to larger matrices. Because these sorts of Jacobians can be large and unwieldy, it’s helpful to visualize the derivation to understand the structure.
Simple Example: 2x2 Matrix Product
Problem Setup
We’ll start with a very simple 2x2 matrix product:
Let
$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\qquad
B = \begin{pmatrix} w & x \\ y & z \end{pmatrix}.
$$
We want the product
$$
C = AB
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} w & x \\ y & z \end{pmatrix}
= \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.
$$
Denote the entries of C by
$$
C = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix}
= \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.
$$
We will treat C as a function of the entries of A, holding B fixed; in other words, we are computing the Jacobian with respect to A. We want to understand the Jacobian of
$$c = (c_1, c_2, c_3, c_4)$$
with respect to
$$a = (a, b, c, d).$$
Write Each $c_i$ in Terms of $(a, b, c, d)$
From the product C=AB, we identify:
$$
\begin{aligned}
c_1 &= aw + by \\
c_2 &= ax + bz \\
c_3 &= cw + dy \\
c_4 &= cx + dz
\end{aligned}
$$
Compute Partial Derivatives
We compute all partial derivatives $\partial c_i / \partial \alpha$ for $\alpha \in \{a, b, c, d\}$ and $i = 1, 2, 3, 4$.
For $c_1 = aw + by$:

$$\frac{\partial c_1}{\partial a} = w, \quad \frac{\partial c_1}{\partial b} = y, \quad \frac{\partial c_1}{\partial c} = 0, \quad \frac{\partial c_1}{\partial d} = 0.$$

For $c_2 = ax + bz$:

$$\frac{\partial c_2}{\partial a} = x, \quad \frac{\partial c_2}{\partial b} = z, \quad \frac{\partial c_2}{\partial c} = 0, \quad \frac{\partial c_2}{\partial d} = 0.$$

For $c_3 = cw + dy$:

$$\frac{\partial c_3}{\partial a} = 0, \quad \frac{\partial c_3}{\partial b} = 0, \quad \frac{\partial c_3}{\partial c} = w, \quad \frac{\partial c_3}{\partial d} = y.$$

For $c_4 = cx + dz$:

$$\frac{\partial c_4}{\partial a} = 0, \quad \frac{\partial c_4}{\partial b} = 0, \quad \frac{\partial c_4}{\partial c} = x, \quad \frac{\partial c_4}{\partial d} = z.$$
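The post itself contains no code, but the closed-form partials above are easy to sanity-check numerically. Here is a minimal sketch (assuming NumPy, with arbitrary numeric values for the entries of B) that compares central finite differences against the pattern of $w, x, y, z$ and zeros:

```python
import numpy as np

# Arbitrary values for B's entries (an assumption for illustration).
w_, x_, y_, z_ = 2.0, 3.0, 5.0, 7.0
B = np.array([[w_, x_], [y_, z_]])

def c_of_a(avec):
    # avec = (a, b, c, d) -> (c1, c2, c3, c4), i.e. AB flattened row by row
    A = avec.reshape(2, 2)
    return (A @ B).reshape(-1)

a0 = np.array([1.0, 2.0, 3.0, 4.0])
eps = 1e-6
# Each comprehension row is dC/da_r; transposing puts outputs on rows,
# parameters on columns, matching the convention in the text.
J = np.array([(c_of_a(a0 + eps * e) - c_of_a(a0 - eps * e)) / (2 * eps)
              for e in np.eye(4)]).T

expected = np.array([[w_, y_, 0, 0],
                     [x_, z_, 0, 0],
                     [0, 0, w_, y_],
                     [0, 0, x_, z_]])
print(np.allclose(J, expected))  # -> True
```

Central differences are exact here (up to rounding) because each $c_i$ is linear in $(a, b, c, d)$.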
Arrange These Partials into a Jacobian Matrix
When we speak of “the Jacobian,” we typically stack the outputs c as rows (or as one long column) and do the same with the parameters a. In one common convention (outputs as rows, parameters as columns), we get the $4 \times 4$ matrix

$$
J = \begin{pmatrix}
w & y & 0 & 0 \\
x & z & 0 & 0 \\
0 & 0 & w & y \\
0 & 0 & x & z
\end{pmatrix}.
$$

This is the Jacobian of $(c_1, c_2, c_3, c_4)$ with respect to $(a, b, c, d)$.
Index-wise, you can think of $c_1, c_2, c_3, c_4$ as $c_{1,1}, c_{1,2}, c_{2,1}, c_{2,2}$ (i.e., the first row of $C$ is $(c_1, c_2)$ and the second row is $(c_3, c_4)$). Similarly, $(a, b, c, d)$ can be mapped to $(a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2})$.
Hence, each row index of $J$ can be viewed as $(i, j) \in \{1, 2\} \times \{1, 2\}$ for $c_{i,j}$, and each column index of $J$ can be viewed as $(k, \ell) \in \{1, 2\} \times \{1, 2\}$ for $a_{k,\ell}$. So $J$ can be written as:

$$J_{(i,j),(k,\ell)} = \frac{\partial c_{i,j}}{\partial a_{k,\ell}}.$$
In this sense, the single matrix $J$ can be reshaped into a 4D array of shape $(2, 2, 2, 2)$, indexed by $(i, j, k, \ell)$.
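As a small illustration of this reshape (a sketch, with arbitrary numeric values for $w, x, y, z$; not from the original post), NumPy recovers the 4-index view directly:

```python
import numpy as np

# Arbitrary values for B's entries, matching the closed-form Jacobian pattern.
w, x, y, z = 2.0, 3.0, 5.0, 7.0
J = np.array([[w, y, 0, 0],
              [x, z, 0, 0],
              [0, 0, w, y],
              [0, 0, x, z]])

# Row index of J enumerates (i, j) of c_{i,j}; column index enumerates (k, l)
# of a_{k,l}; reshaping splits each flat index back into its pair.
J4 = J.reshape(2, 2, 2, 2)

# e.g. d c_{1,1} / d a_{1,2} = y  (zero-based indices [0, 0, 0, 1])
print(J4[0, 0, 0, 1])  # -> 5.0
```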
Visualizing the Pattern
Even for larger matrices, you can see the pattern:
The top-left block depends on the first row-block of A (i.e., a,b) and the relevant columns of B.
The bottom-right block depends on the second row-block of A (i.e., c,d) and the relevant columns of B.
Zeros appear outside the blocks corresponding to the relevant multiplication.
Another way to see this is via vec-operator identities (if you flatten each matrix into a long column vector). For instance, the well-known identity
$$\operatorname{vec}(AB) = (B^T \otimes I)\,\operatorname{vec}(A)$$
tells you exactly how to build the Jacobian by Kronecker products. In the $2 \times 2$ case, $B^T$ is

$$B^T = \begin{pmatrix} w & y \\ x & z \end{pmatrix},$$

and $I$ is the $2 \times 2$ identity matrix. Then

$$
B^T \otimes I = \begin{pmatrix} wI & yI \\ xI & zI \end{pmatrix}
= \begin{pmatrix}
w & 0 & y & 0 \\
0 & w & 0 & y \\
x & 0 & z & 0 \\
0 & x & 0 & z
\end{pmatrix},
$$
which matches the same structure we saw by direct partial derivatives (just with a slightly different row/column ordering convention depending on how exactly you flatten).
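The identity itself is easy to verify numerically. A minimal sketch with NumPy (random matrices; note that $\operatorname{vec}$ stacks columns, which in NumPy's row-major world is `flatten(order="F")`):

```python
import numpy as np

# Check vec(AB) = (B^T kron I) vec(A) on arbitrary 2x2 matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

vec = lambda M: M.flatten(order="F")   # vec stacks columns
lhs = vec(A @ B)
rhs = np.kron(B.T, np.eye(2)) @ vec(A)
print(np.allclose(lhs, rhs))  # -> True
```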
Generalization
Suppose Y=AB where
$A$ is an $m \times p$ matrix (entries $A_{ij}$),
$B$ is a $p \times n$ matrix,
$Y$ is then an $m \times n$ matrix with entries $Y_{ik}$.
A very direct way to express the partial derivatives componentwise is to write $Y_{ik} = \sum_{j} A_{ij} B_{jk}$ and notice:

$\partial Y_{ik} / \partial A_{\alpha\beta} = 0$ unless $i = \alpha$. When $i = \alpha$, the only term in the sum that involves $A_{\alpha\beta}$ is the one with $j = \beta$, so the derivative equals $B_{\beta,k}$.
Often, this is written using an indicator (or Iverson bracket) as:
$$\frac{\partial Y_{ik}}{\partial A_{\alpha\beta}} = \mathbf{1}\{i = \alpha\}\, B_{\beta,k}.$$
Another common notation uses the Kronecker delta δ. We can compactly write:
$$\frac{\partial (AB)_{ik}}{\partial A_{\alpha\beta}} = \delta_{i,\alpha}\, B_{\beta,k}.$$
In vectorized form (using Kronecker products)
When working with gradients or Jacobians involving many matrix entries at once, it is typical to vectorize the matrices. Recall that
$$\operatorname{vec}(Y) = \operatorname{vec}(AB) = (B^T \otimes I_m)\,\operatorname{vec}(A),$$
where
$\operatorname{vec}(\cdot)$ stacks the columns of a matrix into a single column vector,
$\otimes$ denotes the Kronecker product,
$I_m$ is the $m \times m$ identity matrix,
$B^T$ is the transpose of $B$.
From that identity, you can read off that the total Jacobian matrix (of size $mn \times mp$) that maps $\operatorname{vec}(A)$ to $\operatorname{vec}(Y)$ is precisely:

$$\frac{\partial \operatorname{vec}(Y)}{\partial \operatorname{vec}(A)} = B^T \otimes I_m.$$
That is a “one-shot” formula: once you memorize
$$\operatorname{vec}(AB) = (B^T \otimes I)\,\operatorname{vec}(A),$$
you effectively know the Jacobian as well.
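As a quick check of the one-shot formula for non-square shapes, here is a sketch (arbitrary shapes and random entries, assuming NumPy):

```python
import numpy as np

# The (mn x mp) Jacobian mapping vec(A) to vec(Y) should equal B^T kron I_m.
# Shapes m, p, n are arbitrary choices for illustration.
m, p, n = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

vec = lambda M: M.flatten(order="F")   # vec stacks columns
J = np.kron(B.T, np.eye(m))            # shape (m*n, m*p)

print(J.shape)                              # -> (6, 12)
print(np.allclose(J @ vec(A), vec(A @ B)))  # -> True
```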
For an $m \times p$ matrix $A$ and a $p \times n$ matrix $B$, the product $C = AB$ is $m \times n$.
Flattening $A$ into an $mp \times 1$ vector and $C$ into an $mn \times 1$ vector, the Jacobian $\partial \operatorname{vec}(C) / \partial \operatorname{vec}(A)$ is a block matrix of size $mn \times mp$, which can be written explicitly as:

$$B^T \otimes I_m \quad \text{(using the Kronecker product)}.$$

Each partial derivative $\partial C_{ij} / \partial A_{k\ell}$ is nonzero only when $i = k$, in which case it equals $B_{\ell j}$. That is the “delta” structure you see in the small example.
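Finally, the componentwise delta structure can be checked by brute force. A sketch (arbitrary shapes, assuming NumPy; the central differences are exact up to rounding since $C$ is linear in $A$):

```python
import numpy as np

# Check dC_{ij}/dA_{kl} = delta_{ik} B_{lj} entry by entry.
m, p, n = 3, 4, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))
eps = 1e-6

ok = True
for i in range(m):
    for j in range(n):
        for k in range(m):
            for l in range(p):
                E = np.zeros((m, p)); E[k, l] = 1.0   # perturb only A_{kl}
                num = (((A + eps * E) @ B)[i, j]
                       - ((A - eps * E) @ B)[i, j]) / (2 * eps)
                closed = B[l, j] if i == k else 0.0
                ok = ok and abs(num - closed) < 1e-6
print(ok)  # -> True
```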