23.2 From Brain Neurons to Artificial Neurons

We just saw how your brain neurons pass messages:
Dendrites listen.
Cell body decides.
Axon passes the message forward.
Synapse connects to the next neuron.
Scientists thought:
“If the brain can learn by connecting neurons together, maybe we can build computers that ‘think’ in a similar way!”
So they created Artificial Neural Networks (ANNs).
How ANN copies the brain
In ANN, inputs are like dendrites.
Example: For a picture of a cat, inputs are the pixels (tiny dots of the image).
Each input has a weight (like importance).
In the brain, some signals are stronger than others. In ANN, weights decide which input matters more.
Then comes the summing point (like the neuron’s cell body).
It adds everything up: weighted inputs + bias.
Next, an activation function acts like the brain’s decision-making spark.
It decides: Should I pass this signal forward or not?
Finally, the signal travels to the next layer of artificial neurons, just like biological neurons passing signals through synapses.
Example
Imagine training an ANN to recognize “Fire” from an image:
Inputs = image pixels (red, orange, yellow shades).
Weights = give more importance to “red + yellow” parts.
Activation function = if enough of these colors are present, the neuron fires.
Output = “Yes, this looks like fire.”
So just like your brain neurons helped you move your hand away from a flame,
ANN neurons help a computer recognize fire in a photo or video.
Final Line
ANNs are just computer programs inspired by how your brain’s neurons work—taking inputs, deciding, and passing signals forward to solve problems like recognizing images, sounds, or even making decisions.
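The steps above (inputs, weights, summing point with bias, activation) can be sketched in a few lines of Python. This is only a toy illustration, not a real fire detector: the function name `neuron`, the three color amounts, the weights, and the bias are made-up values chosen for the example.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum plus bias, then a yes/no decision."""
    # Summing point (the "cell body"): multiply each input by its weight, add the bias.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation (the "decision spark"): fire (1) if the total reaches 0, else stay silent (0).
    return 1 if total >= 0 else 0

# Hypothetical inputs: how much red, yellow, and blue an image contains (0 to 1).
fire_like = [0.9, 0.8, 0.1]
sky_like = [0.1, 0.1, 0.9]

# Hypothetical weights: red and yellow count toward "fire", blue counts against it.
weights = [1.0, 1.0, -1.0]
bias = -1.0  # the weighted sum must reach 1.0 before the neuron fires

print(neuron(fire_like, weights, bias))  # 1 (fires: "this looks like fire")
print(neuron(sky_like, weights, bias))   # 0 (does not fire)
```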
Weight Concept
In the Brain
When one neuron talks to another, the strength of the signal depends on:
How important or intense that signal is.
Example: If you just touch warm water, the signal is weak. If you touch boiling water, the signal is much stronger.
So, stronger signals = more likely to make the next neuron “fire” (send the message onward).
In Artificial Neural Networks (ANNs)
We copy this idea with weights.
Each input has a weight, which says how important that input is.
Big weight = strong signal → has more influence on the decision.
Small weight = weak signal → less influence.
Example:
If an ANN is trying to recognize a cat in a picture:
The pixel showing whiskers might get a high weight (important clue).
The pixel showing just the background wall gets a low weight (not important).
So, “some signals are stronger than others” means that both in the brain and in ANNs, not every piece of information matters equally. Some clues are more important and carry more “power” in decision-making.
Cat features (like whiskers, ears, eyes, fur) → very important → strong signals.
Background wall (plain surface behind the cat) → not important for knowing it’s a cat → weak signal.
Linear Transformation (the “weighing & adding step”)
Analogy:
Imagine you’re making a decision about whether to go out for a movie.
You consider three things:
Money (do I have enough?)
Time (am I free?)
Mood (do I feel like it?)
But not all things are equally important.
Money might matter the most (high weight).
Time might matter a little less.
Mood might matter the least.
So, you multiply each factor by its importance (weight), and then add them all up.
That adding and weighing is what we call a linear transformation in ANN.
It’s just: combine all inputs with their importance.
In math:
$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$
The weights × inputs decide how strong each input is.
The bias (b) shifts the result up or down.
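The formula above can be tried out directly in Python. The sketch below uses the movie example; the 0-to-1 scores and the weights are hypothetical numbers picked only to show the arithmetic, and `linear_transformation` is just an illustrative name.

```python
def linear_transformation(inputs, weights, bias):
    """Compute z = w1*x1 + w2*x2 + ... + wn*xn + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical scores from 0 (no) to 1 (yes) for money, time, and mood.
inputs = [1.0, 0.5, 0.0]
# Hypothetical weights: money matters most, mood matters least.
weights = [0.5, 0.3, 0.2]
bias = 0.0

z = linear_transformation(inputs, weights, bias)
print(round(z, 2))  # about 0.65

# Raising the bias shifts the same result upward without touching the inputs.
print(round(linear_transformation(inputs, weights, 0.1), 2))  # about 0.75
```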
What is Bias in ANN?
Bias is like an extra knob the network can adjust, even when all inputs are zero.
It helps the model shift the decision boundary and make more flexible predictions.
“Bias is like a free pass or adjustment the neuron gets. Even if no input is given, the neuron can still fire because bias gives it a starting point.”
Explanation of Bias with real life examples
Why real-life patterns aren’t always “zero when inputs are zero”
Zero input doesn’t mean zero outcome in reality.
Example: A student who doesn’t study (x1=0) and doesn’t sleep (x2=0) might still pass because the test was super easy.
Example: A business with no ads (x1=0) and no promotions (x2=0) might still get customers because of word-of-mouth.
Real-World Analogy
1. Teacher’s Grace Marks
Imagine a teacher always adds +10 grace marks (bias).
Even if you wrote nothing (input = 0), you still get 10 marks.
Real life often has a baseline effect that is independent of the inputs.
Biological inspiration.
Real neurons also have a “resting potential.”
Even when no input signals arrive, a neuron doesn’t sit at a perfect zero: it keeps a small baseline voltage across its membrane.
Link to ANN Bias
Just like neurons don’t start from zero, ANNs also have a bias term.
It ensures the model can “fire” even if inputs are zero.
ANN bias mimics this biological property.
Bias exists because in the real world, outcomes don’t always vanish when inputs are zero. There’s often a baseline effect (like free marks, or resting activity in neurons). Without bias, our neural network would be too rigid and fail to learn real-world patterns.
The network doesn’t want to always output zero for zero input, because real-world problems don’t work that way. Bias gives the network flexibility to shift its predictions — like a head start or baseline — so it can fit data better.
It’s like setting a starting point or baseline before considering the inputs.
What does “passes through zero” mean?
It means: when all inputs are 0, the output must also be 0.
Imagine you’re baking, but the rule says “sweetness must always start from zero.”
That means you can’t set a baseline sugar amount: every cake starts completely unsweet unless ingredients push it up.
Result: You can’t make cakes that require some default sweetness.
That’s too restrictive! A cake could taste completely bland without a fixed amount of sugar.
Bias = freedom to set that baseline.
Bias ensures the neuron has a starting flavor: it shifts the decision boundary, making the network more flexible.
Key takeaway
If the true pattern passes through zero, then no-bias works fine.
But if the true pattern is shifted (most real-life cases), the network without bias will fail to learn it correctly, no matter how much you train.
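One way to see this takeaway is to fit a straight line to shifted data twice, once without a bias and once with it. The sketch below borrows the grace-marks idea: the rule "10 grace marks plus 2 marks per answer" and the data points are assumptions made up for illustration, and the closed-form least-squares formulas stand in for actual network training.

```python
# Shifted data (hypothetical): 10 grace marks plus 2 marks per answer.
answers = [0, 1, 2, 3, 4]
marks = [10, 12, 14, 16, 18]
n = len(answers)

# Best line WITHOUT bias (marks = w * answers), by least squares.
w_no_bias = sum(x * y for x, y in zip(answers, marks)) / sum(x * x for x in answers)

# Best line WITH bias (marks = w * answers + b), by least squares.
mean_x = sum(answers) / n
mean_y = sum(marks) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(answers, marks)) / sum(
    (x - mean_x) ** 2 for x in answers
)
b = mean_y - w * mean_x

print("no bias:   w =", round(w_no_bias, 2), "| prediction for 0 answers:", w_no_bias * 0)
print("with bias: w =", w, "b =", b, "| prediction for 0 answers:", w * 0 + b)
```

Without a bias the line is forced through zero, so it predicts 0 marks for a blank paper even though the data says 10. With a bias, the fit recovers the baseline of 10.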
Real-life situations where it can happen
Work = Force × Distance
If force = 0 and distance = 0 → Work = 0.
The formula naturally passes through zero.
Simple proportional rules
Example: salary = 200 × hours worked.
If you work 0 hours, salary = 0.
The line goes through the origin.
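In these cases, the output is directly proportional to the input: double the hours (or the distance, for a fixed force) and the output doubles, with no baseline added.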
But in most real-life problems…
Outcomes don’t start from zero.
There’s usually a baseline, threshold, or offset.
Examples:
Exam pass/fail: A student who studies 0 hours might still pass if the exam is too easy (baseline offset).
Business revenue: A shop with 0 ads and 0 promotions might still get some walk-in customers.
Biology: A neuron at rest is not at 0 activity — it has a resting potential.
These require the line to be shifted away from zero → that’s what bias does.
Case 1: Proportional Relationships (Pass through zero)
Work = Force × Distance
Salary = 200 × Hours worked
Here, if all inputs = 0, output should also = 0.
Forcing a non-zero bias would distort the truth (e.g., predicting salary > 0 even if hours worked = 0).
So in this case, the best model learns with bias ≈ 0.
Case 2: Real-world Non-Proportional Relationships (Need a baseline)
Exam pass/fail → Even with 0 study, someone may pass.
Business revenue → Even with 0 ads, some customers come.
Neurons → Even with 0 inputs, there’s resting potential.
Here, outputs don’t vanish when inputs are zero.
Bias becomes essential to capture the baseline.
Big Picture
Bias doesn’t “fail” in proportional cases — it just naturally adjusts to 0 during training.
If the true pattern needs no baseline, training will push bias → 0.
If the true pattern needs a baseline shift, bias will learn a non-zero value.
So bias isn’t harmful — it just gives the model flexibility.
Bias is like giving the network the option to shift the line. If the real-world pattern passes through zero, the bias will simply learn to be zero. But if not, bias will take a non-zero value. That’s why bias is always included — it makes the model flexible for both cases.
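As a rough check of this idea, the sketch below trains a single weight and bias with plain gradient descent on the proportional salary data. The starting bias of 50, the learning rate, and the number of steps are arbitrary assumptions for the demo, not recommended settings.

```python
# Proportional data: salary = 200 * hours, so the true baseline is 0.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
salary = [200.0 * h for h in hours]

w, b = 0.0, 50.0       # start with a deliberately wrong bias of 50
learning_rate = 0.05   # hypothetical value, small enough to be stable on this data
n = len(hours)

for _ in range(2000):
    # Prediction error for every example with the current w and b.
    errors = [(w * x + b) - y for x, y in zip(hours, salary)]
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, hours)) / n
    grad_b = 2 * sum(errors) / n
    # Move w and b a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("learned weight:", round(w, 3))
print("learned bias:  ", round(abs(b), 3))
```

After training, the weight should end up very close to 200 and the bias very close to 0, which matches the claim that an unneeded bias simply shrinks away.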
Activation Function (the “final decision step”)
Now, after adding everything up, you still need to decide:
If the total is high enough → “Yes, I’ll go to the movie.”
If the total is too low → “No, I’ll stay home.”
This decision-making step is the activation function.
It acts like a gate that checks: Should I pass the signal forward or not?
Put Together (Simple Story)
Linear transformation = collecting opinions and weighting them.
Activation function = making the final yes/no decision.
Super-Simple Example for Class
Ask students:
“If you have 80 marks in exams, good attendance, and you study daily, will you pass?”
Each of these is an input.
Some inputs matter more (exam marks > attendance).
You add them up = linear transformation.
Then you ask: “Is the total good enough to pass the cutoff line?”
That’s the activation function.
What is an activation function in ANN (actually)?
After the linear transformation (weighing inputs and adding them up), the ANN needs a mathematical function to decide:
Should this neuron be active (fire) or inactive?
How strongly should it pass the signal forward?
That deciding function is the activation function.
Common Activation Functions (explained simply)
Step Function
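$f(z) = \begin{cases} 1, & \text{if } z \ge 0 \\ 0, & \text{if } z < 0 \end{cases}$
Very simple: If input reaches the threshold (here 0) → output 1 (fire).
If below → output 0 (no fire).
Like a light switch (on/off).
Problem: Too rigid. The output jumps straight from 0 to 1 with nothing in between, so the network gets no hint about how close it was to firing. Not used much today.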
Sigmoid Function
Formula: $f(z) = \frac{1}{1 + e^{-z}}$
Smoothly squashes values between 0 and 1.
Small input → close to 0
Large input → close to 1
Looks like an S-curve.
Example: Probability of being a cat vs not a cat.
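Limitation: For very large or very small inputs the curve becomes almost flat, so the learning signal shrinks toward zero (“vanishing gradients”) and training can be slow.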
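To get a feel for the “vanishing gradients” limitation, one can compute the slope of the sigmoid at a few points. The sketch below assumes the standard sigmoid, 1 / (1 + e^(-z)), and its slope, sigmoid(z) × (1 − sigmoid(z)); the function names are just illustrative.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_gradient(z):
    """Slope of the sigmoid at z: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Print the output and the slope for small, medium, and large inputs.
for z in [-10, -2, 0, 2, 10]:
    print(z, round(sigmoid(z), 4), round(sigmoid_gradient(z), 6))
```

The slope is largest at z = 0 (where it equals 0.25) and almost zero at z = 10 or z = –10, which is why learning can stall when inputs to a sigmoid are far from zero.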
tanh (Hyperbolic Tangent)
$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
Similar to sigmoid, but outputs between –1 and +1.
Negative inputs → negative outputs.
Positive inputs → positive outputs.
Helps when signals need to be centered around 0.
ReLU (Rectified Linear Unit)
(most common today)
Formula: $f(z) = \max(0, z)$
If input is positive → pass it forward.
If input is negative → output 0.
Like a filter: only keeps useful positive signals.
Fast, simple, and works great in deep networks.
Step → Teacher says: “Pass/Fail only, no grades.”
Sigmoid → Teacher gives grades between 0–100, but squashed into 0–1.
tanh → Grades can be negative (bad performance) or positive (good performance).
ReLU → Teacher ignores bad scores (negative = 0) and only cares about positive effort.
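For reference, the four functions above can be written in Python as follows. This is a minimal sketch using only the standard `math` module; the function names and the sample inputs are chosen for illustration.

```python
import math

def step(z):
    """On/off switch: 1 if z is at or above 0, otherwise 0."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Smooth S-curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Like sigmoid, but outputs between -1 and +1."""
    return math.tanh(z)

def relu(z):
    """Pass positive values through unchanged, cut negatives to 0."""
    return max(0.0, z)

# Compare all four on a negative, a zero, and a positive input.
for z in [-2.0, 0.0, 2.0]:
    print(z, step(z), round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

Running it side by side shows the differences: step and ReLU both give 0 for the negative input, sigmoid gives a small positive number, and tanh gives a negative number.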
Threshold Decision
In step function, the threshold is fixed (e.g., 0).
In sigmoid/tanh, the “threshold” is smooth: closer to 1 means strongly positive, closer to 0 (or –1 for tanh) means negative.
In ReLU, there’s no threshold in the usual sense — negative values are cut to 0, positives just go through.
So, in real ANN:
Linear Transformation = add up weighted inputs.
Activation Function = mathematical function that decides the neuron’s output.
Example: Predict if a student will pass or fail an exam
Inputs to the network:
Hours studied = 2
Hours slept = 8
We want the ANN to say Pass (1) or Fail (0).
Step 1: Inputs & Weights
Each input is multiplied by a weight (importance).
Suppose weights are:
Study hours weight = 0.6
Sleep hours weight = 0.4
Weighted inputs:
2 × 0.6 = 1.2
8 × 0.4 = 3.2
Step 2: Linear Transformation (Summation)
Now add them up with a bias (say b = –4).
$z = (1.2 + 3.2) + (-4) = 0.4$
Step 3: Activation Function
Let’s use Sigmoid:
$f(z) = \frac{1}{1 + e^{-z}}$
For z = 0.4:
$f(0.4) = \frac{1}{1 + e^{-0.4}} \approx 0.60$
Output = 0.60 (60% chance of passing).
Step 4: Final Decision
If probability ≥ 0.5 → Pass
If probability < 0.5 → Fail
Here, 0.60 ≥ 0.5 → Pass.
In short, the ANN looks at how much you studied and slept.
It multiplies them by importance (weights).
Adds everything up (linear transformation).
Runs it through a decision formula (activation).
Finally says: Yes, this student will pass.
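The four steps above can be checked with a short Python script. It uses exactly the numbers from this example (inputs 2 and 8, weights 0.6 and 0.4, bias –4); only the variable names are invented here.

```python
import math

def sigmoid(z):
    """Squash z into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: inputs and weights from the example.
hours_studied, hours_slept = 2, 8
w_study, w_sleep = 0.6, 0.4
bias = -4

# Step 2: linear transformation (weighted sum plus bias).
z = hours_studied * w_study + hours_slept * w_sleep + bias

# Step 3: activation function turns z into a probability.
probability = sigmoid(z)

# Step 4: final decision with a 0.5 cutoff.
decision = "Pass" if probability >= 0.5 else "Fail"

print(round(z, 2), round(probability, 2), decision)  # 0.4 0.6 Pass
```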



