When you train a neural network, you’re teaching it to make better predictions by adjusting its internal parameters (weights and biases) to minimize loss (how wrong it is). But how does it know what direction to adjust these weights?
Enter Gradient Descent, the engine that drives neural networks to learn.
Imagine standing on hilly terrain in thick fog, trying to find the lowest point. You take small steps downhill in the steepest direction until you reach the bottom.
Gradient Descent works the same way:
It calculates the gradient (slope) of the loss function with respect to each weight.
It updates the weights in the opposite direction of the gradient to reduce the loss.
For each parameter (weight) w, the update rule is:
w = w − α · ∂L/∂w
Where:
α = the learning rate (how big your steps are)
L = the loss function
∂L/∂w = the gradient of the loss with respect to w
Stochastic Gradient Descent (SGD): Uses one sample at a time to compute gradients. Faster but noisier.
Mini-Batch Gradient Descent: Uses small batches (e.g., 32 samples). Balances speed and stability.
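To make the update rule concrete, here is a minimal sketch of a mini-batch gradient descent loop in NumPy. The grad_loss function is a hypothetical placeholder for whatever computes ∂L/∂w on a batch, and samples are stored one per row here just for simplicity (the MNIST code later in this post stores them in columns instead).

import numpy as np

def minibatch_gradient_descent(w, X, Y, grad_loss, learning_rate=0.01, batch_size=32, epochs=10):
    # X: one sample per row, Y: matching labels; grad_loss(w, X_batch, Y_batch) -> dL/dw
    n_samples = X.shape[0]
    for epoch in range(epochs):
        indices = np.random.permutation(n_samples)       # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            grad = grad_loss(w, X[batch], Y[batch])      # gradient on this mini-batch
            w = w - learning_rate * grad                 # step opposite the gradient
    return w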
Even if you are an absolute beginner to AI/machine learning, this blog will teach you a lot about what goes on behind the scenes. A neural network is basically a stack of fully connected layers of nodes (neurons) that take in information and pass information forward. There is an input layer, which takes in the original dataset, and an output layer, which spits out the model's predictions. In the middle are the hidden layers, the main pillar of a neural network: they do the main transformations for the model. I personally built a neural network to predict handwritten digits (the MNIST dataset).
A feedforward neural network with:
1 input layer
1 hidden layer (using ReLU)
1 output layer (using softmax)
Training using cross-entropy loss and gradient descent
import numpy as np
Assume the input X has shape (features, samples) and the one-hot encoded labels Y have shape (classes, samples).
def initialize_parameters(input_size, hidden_size, output_size):
    # Small random weights break symmetry; biases start at zero
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    B1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    B2 = np.zeros((output_size, 1))
    return W1, B1, W2, B2
ReLU and Softmax:
def ReLU(Z):
    return np.maximum(0, Z)

def ReLU_derivative(Z):
    # 1 where Z > 0, 0 elsewhere (the boolean array behaves as 0/1 in arithmetic)
    return Z > 0

def softmax(Z):
    # Subtracting the column-wise max keeps exp() numerically stable
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)
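As a quick sanity check (with made-up numbers), each column of the softmax output should be a valid probability distribution that sums to 1:

Z_demo = np.array([[2.0, 1.0],
                   [1.0, 3.0],
                   [0.1, 0.2]])    # shape (classes, samples) = (3, 2)
probs = softmax(Z_demo)
print(probs.sum(axis=0))           # -> [1. 1.]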
def forward_propagation(W1, B1, W2, B2, X):
    Z1 = np.dot(W1, X) + B1    # hidden layer pre-activation
    A1 = ReLU(Z1)              # hidden layer activation
    Z2 = np.dot(W2, A1) + B2   # output layer pre-activation
    A2 = softmax(Z2)           # class probabilities
    return Z1, A1, Z2, A2
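If you want to check the shapes before training, you can run the forward pass on random data. The sizes below (784 features for 28×28 pixels, 64 hidden units, 10 classes, 5 samples) are just illustrative:

X_demo = np.random.randn(784, 5)                      # 5 fake "images"
W1, B1, W2, B2 = initialize_parameters(784, 64, 10)
Z1, A1, Z2, A2 = forward_propagation(W1, B1, W2, B2, X_demo)
print(A2.shape)                                       # -> (10, 5): one probability column per sample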
Cross-entropy loss:
def compute_loss(Y, A2):
    m = Y.shape[1]
    # 1e-8 avoids taking log(0) when a predicted probability is exactly zero
    loss = -np.sum(Y * np.log(A2 + 1e-8)) / m
    return loss
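To build intuition, here is the loss on two made-up predictions for a single sample: a confident correct prediction gives a small loss, while a confident wrong one gives a large loss.

Y_true = np.array([[1.0], [0.0], [0.0]])     # one sample whose true class is 0
A_good = np.array([[0.9], [0.05], [0.05]])   # confident and correct
A_bad  = np.array([[0.05], [0.9], [0.05]])   # confident and wrong
print(compute_loss(Y_true, A_good))          # ~0.105
print(compute_loss(Y_true, A_bad))           # ~3.0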
def backward_propagation(Z1, A1, Z2, A2, W2, X, Y):
    m = X.shape[1]
    dZ2 = A2 - Y   # gradient of the cross-entropy loss w.r.t. the softmax input Z2
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    dB2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * ReLU_derivative(Z1)
    dW1 = (1/m) * np.dot(dZ1, X.T)
    dB1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, dB1, dW2, dB2
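A common way to make sure the backprop math is right is a numerical gradient check: nudge one weight up and down by a tiny amount and compare the finite-difference slope with the analytic gradient. The helper below is optional and purely illustrative; it reuses the functions defined above on whatever small X and Y you pass in.

def gradient_check_W2(W1, B1, W2, B2, X, Y, i=0, j=0, eps=1e-5):
    # Analytic gradient from backpropagation
    Z1, A1, Z2, A2 = forward_propagation(W1, B1, W2, B2, X)
    dW1, dB1, dW2, dB2 = backward_propagation(Z1, A1, Z2, A2, W2, X, Y)

    # Finite-difference estimate for the single entry W2[i, j]
    W2_plus, W2_minus = W2.copy(), W2.copy()
    W2_plus[i, j] += eps
    W2_minus[i, j] -= eps
    loss_plus = compute_loss(Y, forward_propagation(W1, B1, W2_plus, B2, X)[3])
    loss_minus = compute_loss(Y, forward_propagation(W1, B1, W2_minus, B2, X)[3])
    numeric = (loss_plus - loss_minus) / (2 * eps)

    print(f"analytic: {dW2[i, j]:.6f}  numeric: {numeric:.6f}")   # the two should nearly match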
Using gradient descent:
def update_parameters(W1, B1, W2, B2, dW1, dB1, dW2, dB2, learning_rate):
    W1 -= learning_rate * dW1
    B1 -= learning_rate * dB1
    W2 -= learning_rate * dW2
    B2 -= learning_rate * dB2
    return W1, B1, W2, B2
def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.mean(predictions == np.argmax(Y, axis=0))
def train(X, Y, hidden_size, learning_rate, epochs):
    input_size = X.shape[0]
    output_size = Y.shape[0]
    W1, B1, W2, B2 = initialize_parameters(input_size, hidden_size, output_size)
    for epoch in range(epochs):
        Z1, A1, Z2, A2 = forward_propagation(W1, B1, W2, B2, X)
        loss = compute_loss(Y, A2)
        dW1, dB1, dW2, dB2 = backward_propagation(Z1, A1, Z2, A2, W2, X, Y)
        W1, B1, W2, B2 = update_parameters(W1, B1, W2, B2, dW1, dB1, dW2, dB2, learning_rate)
        if epoch % 100 == 0:
            predictions = get_predictions(A2)
            acc = get_accuracy(predictions, Y)
            print(f"Epoch {epoch}, Loss: {loss:.4f}, Accuracy: {acc:.4f}")
    return W1, B1, W2, B2
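Here is an example call on synthetic random data standing in for MNIST. Real MNIST would need to be loaded, flattened to 784 features per image, and normalised first; on random labels the accuracy will hover around chance, but the shapes and the training loop are exactly the same.

np.random.seed(0)
num_features, num_classes, num_samples = 784, 10, 1000

X = np.random.rand(num_features, num_samples)            # (features, samples)
labels = np.random.randint(0, num_classes, num_samples)
Y = np.zeros((num_classes, num_samples))
Y[labels, np.arange(num_samples)] = 1                     # one-hot encode the labels

W1, B1, W2, B2 = train(X, Y, hidden_size=64, learning_rate=0.1, epochs=500)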
If you guys need more help understanding the math behind this, check out the YouTube playlist linked up top!
If you guys want to see my own code, check out my GitHub at https://github.com/KaushikRao196
Artificial Intelligence is exactly what it sounds like: technology that can simulate human intellect in tasks and bring more efficiency, whether to a business or to education. When thinking about AI, most people tend to talk about chatbots like ChatGPT and DeepSeek. These innovations fall under generative AI, which refers to models that can create new text, audio, images and even videos from the patterns they have learned in existing data.
AI is a massive field of computer science which contains a major subfield: Machine Learning. Before machine learning, computers had to be hard-coded using rigid logic and search algorithms; however, they could not handle messy, complex real-world data. ML allowed computers to learn from data instead of relying on fixed rules. ML consists of subfields like deep learning, reinforcement learning, and supervised and unsupervised learning.
When talking about generative AI we are really referring to deep learning, a type of machine learning that utilises artificial neural networks to find patterns in and analyse large amounts of complex data. These networks are inspired by the human brain and its innate ability to recognise features without having to think for long.
There are three main types of deep learning architectures: Convolutional Neural Networks (CNNs), used mainly for images; Recurrent Neural Networks (RNNs), used for text or time series; and Transformers, which power large language models like GPT. In the next blog we will discuss in more detail how neural networks work and the math behind them.