Goyalayus

You See Neural Nets Wrong

A lot of beginner explanations make neural networks look like mystical webs of circles. That picture is memorable, but it is also the wrong thing to internalize. A neural network is not primarily a diagram. It is a composition of functions, and most of those functions are matrix multiplications followed by simple nonlinearities.

The clean mental model is:

z_{1} = X W_{1} + b_{1}

h_{1} = σ (z_{1})

z_{2} = h_{1} W_{2} + b_{2}

\hat{y} = f (z_{2})

That is the forward pass. Data comes in as a matrix, weights transform it, biases shift it, nonlinearities bend it, and the next layer repeats the same pattern.
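The four equations above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the sigmoid for σ, the identity for f, the layer sizes, and the names `sigmoid` and `forward` are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function, used here as the nonlinearity (an assumption)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Two-layer forward pass following the equations in the text."""
    z1 = X @ W1 + b1      # (batch, hidden): weights transform, bias shifts
    h1 = sigmoid(z1)      # (batch, hidden): nonlinearity bends
    z2 = h1 @ W2 + b2     # (batch, out): the same pattern again
    y_hat = z2            # f is taken to be the identity (an assumption)
    return y_hat

# Hypothetical sizes: 4 examples, 3 features, 5 hidden units, 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 5))
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 2))
b2 = np.zeros(2)

y_hat = forward(X, W1, b1, W2, b2)
print(y_hat.shape)  # (4, 2)
```

The shape comments are the useful part: each line maps a matrix of one shape to a matrix of another.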

Why the circle diagram misleads

The circle diagram makes you think neuron by neuron. That is useful for the first five minutes, but real models are not implemented neuron by neuron. They are implemented as dense tensor operations.

For one neuron, you can say:

z = w_{1} x_{1} + w_{2} x_{2} + ... + w_{n} x_{n} + b

For a whole layer, this becomes:

Z = X W + b

That single equation is the layer. What looked like many little arrows is really one matrix multiplication applied to the whole batch: each row of X is an example, each column of W is a neuron, and b is added to every row.
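The equivalence between the per-neuron sum and the layer equation can be checked numerically. The sketch below uses made-up sizes (4 examples, 3 inputs, 2 neurons) and computes the same layer twice, once neuron by neuron and once as Z = X W + b.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 examples, 3 input features
W = rng.normal(size=(3, 2))   # column j holds the weights of neuron j
b = rng.normal(size=2)        # one bias per neuron

# Neuron by neuron: z = w_1 x_1 + ... + w_n x_n + b for every example and neuron.
Z_loop = np.zeros((4, 2))
for i in range(4):            # examples
    for j in range(2):        # neurons
        Z_loop[i, j] = sum(W[k, j] * X[i, k] for k in range(3)) + b[j]

# Whole layer at once: b is broadcast across the rows of X @ W.
Z_matrix = X @ W + b

print(np.allclose(Z_loop, Z_matrix))  # True
```

The loop version is the circle diagram; the last assignment is what a framework actually runs.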

What learning means

Learning means changing the weights so the function produces better outputs. The loss measures how wrong the prediction is:

L (\hat{y}, y)

Backpropagation computes the gradient of the loss with respect to each parameter, that is, how much the loss would change if that parameter changed slightly:

\frac{\partial L}{\partial W}

Then the optimizer nudges the weights in the direction opposite the gradient, scaled by a learning rate η:

W \leftarrow W - η \frac{\partial L}{\partial W}

This is not magic. It is bookkeeping through a chain of matrix operations.
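To make the bookkeeping concrete, here is a small sketch for the simplest possible case: a single linear layer with no bias and a halved mean squared error loss. Those choices, along with the names `loss_fn` and `grad_W` and the value of η, are assumptions made for illustration. The analytic gradient is compared against a finite-difference estimate, and then one update step is applied.

```python
import numpy as np

def loss_fn(W, X, y):
    """Halved mean squared error of the linear model X @ W."""
    y_hat = X @ W
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))

def grad_W(W, X, y):
    """Analytic dL/dW for loss_fn, obtained with the chain rule."""
    y_hat = X @ W
    return X.T @ (y_hat - y) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 2))
W = rng.normal(size=(3, 2))

# Check one entry of the gradient against a central finite difference.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss_fn(W_plus, X, y) - loss_fn(W_minus, X, y)) / (2 * eps)
analytic = grad_W(W, X, y)[0, 0]
grad_matches = bool(np.isclose(numeric, analytic, atol=1e-6))

# One gradient descent step: W <- W - eta * dL/dW.
eta = 0.1
loss_before = loss_fn(W, X, y)
W_new = W - eta * grad_W(W, X, y)
loss_after = loss_fn(W_new, X, y)

print(grad_matches, loss_after < loss_before)
```

The gradient has the same shape as W, which is what makes the update rule a plain elementwise subtraction.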

The important thing to memorize

Do not memorize neural nets as circles connected by arrows. Memorize them as a short sequence of steps, the first three repeated for every layer and the last three run once per training step:

multiply by weights
add bias
apply nonlinearity
compute loss
propagate gradients backward
update weights
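The six steps above can be put together into one small training loop. The sketch below is an illustration under assumptions, not a recipe: it uses a two-layer network with a sigmoid hidden layer, an identity output, a halved mean squared error loss, plain gradient descent, and a toy target invented for the example. The gradients are written out by hand so that every step is visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                  # (batch, features)
y = np.sin(X.sum(axis=1, keepdims=True))      # toy target, shape (32, 1)

W1 = rng.normal(scale=0.5, size=(3, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)
eta = 0.1                                     # learning rate (an assumption)
losses = []

for step in range(200):
    # Multiply by weights, add bias, apply nonlinearity (once per layer).
    z1 = X @ W1 + b1                          # (32, 8)
    h1 = sigmoid(z1)                          # (32, 8)
    y_hat = h1 @ W2 + b2                      # (32, 1), identity output

    # Compute loss.
    loss = 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))
    losses.append(loss)

    # Propagate gradients backward, layer by layer, via the chain rule.
    n = X.shape[0]
    dz2 = (y_hat - y) / n                     # dL/dz2, (32, 1)
    dW2 = h1.T @ dz2                          # (8, 1)
    db2 = dz2.sum(axis=0)                     # (1,)
    dh1 = dz2 @ W2.T                          # (32, 8)
    dz1 = dh1 * h1 * (1.0 - h1)               # sigmoid'(z1) = h1 * (1 - h1)
    dW1 = X.T @ dz1                           # (3, 8)
    db1 = dz1.sum(axis=0)                     # (8,)

    # Update weights.
    W1 -= eta * dW1
    b1 -= eta * db1
    W2 -= eta * dW2
    b2 -= eta * db2

print(losses[0], losses[-1])
```

In practice an automatic differentiation library would produce the backward section, but it performs the same bookkeeping.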

That picture scales. MLPs, CNNs, transformers, and diffusion models all become less mysterious when you ask: what tensor shape enters this block, what operation transforms it, and where do gradients flow?

If you can track shapes, operations, and gradients, neural networks stop being diagrams and start being programs.