AI · 14 min read · June 14, 2026

Neural Networks: Backpropagation & Loss

The elegant mathematical loop behind AI: guess, measure the error, and adjust.

Behind the "magic" of ChatGPT and Midjourney is a surprisingly elegant mathematical loop: guess, measure the error, and adjust. Here is exactly how neural networks learn, explained from the ground up without the academic jargon.

1. The Architecture: Neurons and Layers

A neural network is just a collection of simple mathematical functions (called Neurons) arranged in layers.

  • Input Layer: Where raw data enters. If you are analyzing an image, each pixel is an input.
  • Hidden Layers: The "brain" of the network. They are called hidden because you don't directly see their outputs. Deep Learning simply means a network has many hidden layers.
  • Output Layer: The final prediction. For a dog-vs-cat classifier, it outputs two probabilities.

In a fully connected network, every neuron in one layer is connected to every neuron in the next layer via a Weight. The weight determines how much influence one neuron has on the next. If a weight is close to zero, that connection is effectively ignored. If its magnitude is large, whether positive or negative, it heavily sways the outcome.
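To get a feel for how quickly those connections add up, here is a minimal Python sketch that counts the weights in a fully connected network. The layer sizes (784 inputs for a 28x28 pixel image, two hidden layers of 16 neurons, 10 outputs) are an illustrative assumption rather than a specific real model, and `count_weights` is a hypothetical helper name.

```python
def count_weights(layer_sizes):
    """Count the connections (weights) in a fully connected network.

    Each neuron in one layer connects to every neuron in the next,
    so two adjacent layers contribute n_in * n_out weights.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed example: 784 pixels in, two hidden layers of 16, 10 outputs.
layer_sizes = [784, 16, 16, 10]
print(count_weights(layer_sizes))  # 12960
```

Even this toy network has nearly thirteen thousand weights to tune, which is why the adjustment has to be automated.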

2. The Forward Pass: Making a Guess

When you give a neural network an input (like asking it to recognize a handwritten number "7"), data flows from left to right. This is the Forward Pass.

At each neuron, the network multiplies each incoming value by its weight, sums the results, adds a Bias (a learned offset that shifts the neuron's threshold), and passes the total through an Activation Function (like ReLU) to decide how strongly the neuron should "fire." This happens millions of times in a fraction of a second until a final answer pops out of the output layer: "I am 12% sure this is a 1, and 88% sure this is a 7."
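The computation inside a neuron can be sketched in a few lines of plain Python. This is a simplified illustration, not how production frameworks implement it: the input values, weights, and bias are made up, and the `softmax` function used to turn raw output scores into probabilities is an assumption (the article does not name one, but it is a common choice for classifiers).

```python
import math


def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus a bias, passed through ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


inputs = [0.5, -1.0, 2.0]                         # made-up input values
print(neuron(inputs, [0.4, 0.3, 0.1], bias=0.1))  # about 0.2, the neuron "fires"
print(neuron(inputs, [0.4, 0.9, 0.1], bias=0.1))  # 0.0, the neuron stays silent

# Two output scores chosen so the probabilities match the example in the text.
print(softmax([math.log(12), math.log(88)]))      # roughly [0.12, 0.88]
```

A full forward pass is just this same neuron computation repeated for every neuron in every layer, with each layer's outputs becoming the next layer's inputs.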

3. The Loss Function: Measuring the Mistake

During training, the network is usually completely wrong at first because the weights are initialized randomly. It might look at a "7" and guess "4".

The Loss Function calculates exactly how wrong the network was. It takes the network's guess (e.g., 0% confident it's a 7) and compares it to the ground truth (100% confident it's a 7). The result is a single number, the "Loss": the bigger the mistake, the bigger the number. The goal of training is to push the Loss as close to zero as possible.
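There are many loss functions. One common choice for classifiers is cross-entropy, sketched below as an assumption since the article does not name a specific one. It looks only at the probability the network gave to the correct answer and takes the negative logarithm. The `eps` clamp is a practical guard added here, because a guess of exactly 0% would otherwise produce an infinite loss.

```python
import math


def cross_entropy(predicted_probs, true_index, eps=1e-12):
    """Loss for one example: -log(probability assigned to the correct class)."""
    p = max(predicted_probs[true_index], eps)  # avoid log(0)
    return -math.log(p)


# Classes are [digit 1, digit 7]; the correct answer is index 1 (a "7").
confident_and_right = [0.12, 0.88]
confident_and_wrong = [0.99, 0.01]

print(cross_entropy(confident_and_right, true_index=1))  # about 0.13
print(cross_entropy(confident_and_wrong, true_index=1))  # about 4.6
```

The loss is small when the network puts most of its confidence on the right answer and grows sharply as that confidence drops.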

Gradient Descent: Finding the Bottom of the Valley

Imagine you are blindfolded at the top of a hilly mountain, and you need to find the lowest valley you can reach (which represents the lowest Loss). You can't see the whole mountain, but you can feel the slope of the ground directly beneath your feet.

If the ground slopes down to the left, you take a step left. Gradient Descent is the algorithm that uses the "slope" of the error (the gradient) to decide which direction to adjust the network's millions of weights to get closer to the bottom of the valley.
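Here is the valley analogy as a minimal Python sketch with a single weight. The loss curve `(w - 3)^2` is a made-up example whose lowest point sits at `w = 3`, and the function names and the learning rate of 0.1 are illustrative assumptions. A real network does the same thing across millions of weights at once.

```python
def loss(w):
    """A toy 'valley': lowest point at w = 3."""
    return (w - 3.0) ** 2


def slope(w):
    """Derivative of the loss: positive means the ground rises to the right."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=50):
    """Repeatedly step downhill, against the slope."""
    w = start
    for _ in range(steps):
        w -= learning_rate * slope(w)
    return w


w_final = gradient_descent(start=0.0)
print(w_final, loss(w_final))  # w ends up very close to 3, loss close to 0
```

The step size matters. In this toy example, a learning rate above 1.0 overshoots the valley by more on every step and the weight runs away instead of settling.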

4. Backpropagation: The Crown Jewel of AI

Knowing you are wrong is easy. Knowing exactly which of your 100 billion weights caused the mistake, and by how much, is the hard part.

Backpropagation (Backward Propagation of Errors) solves this using chain-rule calculus. Once the Loss is calculated at the end of the Forward Pass, the network goes backwards. It looks at the output and says, "To fix this error, the last hidden layer should have fired slightly differently." It then looks at the layer before that and says, "To make the last layer fire differently, this layer needs to change its weights."

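It propagates the blame backwards through the entire network, working out how much each weight contributed to the error. Every weight is then nudged a small step in the direction that reduces the Loss, with the size of that step scaled by the Learning Rate, so that the next time the network sees that image, it will be slightly more accurate.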
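To make the chain rule concrete, here is a hand-written sketch of backpropagation through the smallest possible network: one input, one hidden neuron with ReLU, and one output. Everything here is an illustrative assumption (the parameter values, the squared-error loss, the function names), and real frameworks compute these derivatives automatically. The `numerical_gradient` function is only a sanity check that nudges each parameter and measures how the loss changes.

```python
def forward(params, x):
    """Forward pass: input -> hidden neuron (ReLU) -> output."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1      # hidden neuron before activation
    h = max(0.0, z)      # ReLU
    y = w2 * h + b2      # network output
    return z, h, y


def loss_fn(params, x, target):
    _, _, y = forward(params, x)
    return (y - target) ** 2


def backward(params, x, target):
    """Chain rule, applied from the output back towards the input."""
    w1, b1, w2, b2 = params
    z, h, y = forward(params, x)
    dL_dy = 2.0 * (y - target)                # how the loss reacts to the output
    dL_dw2 = dL_dy * h                        # blame for the last weight
    dL_db2 = dL_dy
    dL_dh = dL_dy * w2                        # blame passed back to the hidden neuron
    dL_dz = dL_dh * (1.0 if z > 0 else 0.0)   # ReLU passes blame only if it fired
    dL_dw1 = dL_dz * x                        # blame for the first weight
    dL_db1 = dL_dz
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]


def numerical_gradient(params, x, target, step=1e-6):
    """Sanity check: estimate each gradient by nudging one parameter at a time."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += step
        down[i] -= step
        grads.append((loss_fn(up, x, target) - loss_fn(down, x, target)) / (2 * step))
    return grads


params = [0.5, 0.1, -0.3, 0.2]   # made-up starting values for w1, b1, w2, b2
x, target = 2.0, 1.0
grads = backward(params, x, target)

learning_rate = 0.1
new_params = [p - learning_rate * g for p, g in zip(params, grads)]

print(loss_fn(params, x, target))      # loss before the update, about 1.28
print(loss_fn(new_params, x, target))  # loss after one update, about 0.38
```

A single update does not fix the prediction, but it does lower the loss, which is all each step needs to do.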

5. Epochs and Compute

A neural network doesn't learn perfectly after seeing an image once. It has to repeat the cycle of Forward Pass (guess), Loss (measure error), and Backpropagation (adjust) millions of times, working through the dataset example by example or in small batches. Going through the entire dataset once is called an Epoch.
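The full loop can be sketched with the simplest possible "network": a single neuron with one weight, one bias, and no activation, learning the made-up rule `y = 2x + 1`. The dataset, learning rate, epoch count, and the `train` function name are all illustrative assumptions. The point is the shape of the loop, not the model.

```python
import random


def train(dataset, epochs=200, learning_rate=0.05, seed=0):
    """Repeat guess -> measure -> adjust over the whole dataset, many times."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    data = list(dataset)        # copy so the caller's list is not reordered
    w, b = 0.0, 0.0             # start knowing nothing
    history = []
    for _ in range(epochs):     # one epoch = one full pass over the data
        rng.shuffle(data)
        total_loss = 0.0
        for x, target in data:
            y = w * x + b                        # forward pass (guess)
            error = y - target
            total_loss += error ** 2             # loss (measure error)
            w -= learning_rate * 2 * error * x   # adjust using the gradient
            b -= learning_rate * 2 * error
        history.append(total_loss / len(data))
    return w, b, history


# Five made-up examples that follow y = 2x + 1 exactly.
dataset = [(x, 2.0 * x + 1.0) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
w, b, history = train(dataset)

print(w, b)                     # close to 2 and 1
print(history[0], history[-1])  # average loss in the first vs. last epoch
```

The average loss per epoch falls as training goes on, which is the signal engineers watch to decide when to stop.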

Doing this for models like GPT-4 means performing an enormous number of multiplications and additions every second for months straight, which is why training at that scale is reported to use thousands to tens of thousands of GPUs running in massive datacenters.

Written by the Stratiflux engineering team

