Interactive Neural Network Visualizer
Step through neural network training and see every computation in mathematical detail.
Quick Start
Click ▶ Play to start training • Explore the tabs at the top (Network, Computation, Data, Predictions, Parameters) • Adjust learning rate slider to see different behaviors
📖 Detailed Controls & Tips
Control Buttons
- ▶ Play - Auto-train | ⏸ Pause - Stop | ⏭ Step - Manual advance | ↻ Reset - New weights
Learning Rate
- 0.1 - Steady (default) | 0.5-1.0 - Faster | 2.0+ - Unstable (try it!)
What Each Tab Shows
- Network - Visual diagram (node size = activation, edge color = weight sign)
- Computation - Step-by-step math with notation glossary
- Data - Scatter plot & table of training points
- Predictions - How well the network solves XOR
- Parameters - Weight evolution graphs over time
Learning Tips
- Start with defaults (100 samples, no noise, LR 0.1)
- Watch loss decrease to near 0
- Try LR 0.5, 1.0, 2.0 to see different behaviors
- Add noise to test robustness
- Reset and compare different runs
Try It Now
What is XOR?
XOR (Exclusive OR) is a logical operation that outputs true (1) only when inputs differ:
| Input A | Input B | XOR Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
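In code, the rule is a single comparison; here is a minimal TypeScript rendering of the table above (names are illustrative, not part of the demo):

```typescript
// XOR as a function: output is 1 only when the two binary inputs differ.
const xor = (a: 0 | 1, b: 0 | 1): 0 | 1 => (a !== b ? 1 : 0);

// The four rows of the table above, as [inputA, inputB, xorOutput].
const truthTable: [number, number, number][] = [
  [0, 0, xor(0, 0)], // 0
  [0, 1, xor(0, 1)], // 1
  [1, 0, xor(1, 0)], // 1
  [1, 1, xor(1, 1)], // 0
];
console.table(truthTable);
```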
Why XOR is Perfect for Learning
XOR is the simplest problem that requires a hidden layer:
- Not linearly separable: you cannot draw a single straight line that separates the 1 outputs from the 0 outputs
- Requires feature combination: The network must learn that "inputs differ" is the pattern
- Classic test: If a neural network can learn XOR, it has the capacity for non-linear learning
What the Network is Learning
The demo shows a 2→4→1 network learning XOR through these steps:
- Random initialization: Weights start random, predictions are terrible
- Forward pass: Input flows through network, producing a prediction (0 to 1)
- Calculate error: Compare prediction to actual XOR answer
- Backpropagation: Calculate how to adjust each weight to reduce error
- Update weights: Apply gradients to make predictions slightly better
- Repeat: After many iterations, the network learns the XOR pattern (a code sketch of this loop follows below)
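The loop above is small enough to write out in full. The sketch below is a self-contained TypeScript version of a 2→4→1 sigmoid network learning XOR with plain gradient descent. It is an illustration only: the variable names, the binary cross-entropy loss, and the per-sample updates are assumptions, not the demo's actual implementation.

```typescript
// Minimal 2→4→1 sigmoid network learning XOR (illustrative sketch only).
const sigmoid = (z: number) => 1 / (1 + Math.exp(-z));

// 1. Random initialization: small random weights, zero biases.
const wHidden = Array.from({ length: 4 }, () => [Math.random() - 0.5, Math.random() - 0.5]);
const bHidden = [0, 0, 0, 0];
const wOut = Array.from({ length: 4 }, () => Math.random() - 0.5);
let bOut = 0;

const data: [number, number, number][] = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]];
const learningRate = 0.5;

for (let epoch = 0; epoch < 10_000; epoch++) {
  let loss = 0;
  for (const [x0, x1, target] of data) {
    // 2. Forward pass: input → hidden activations → prediction in (0, 1).
    const zH = wHidden.map(([w0, w1], j) => w0 * x0 + w1 * x1 + bHidden[j]);
    const aH = zH.map(sigmoid);
    const zOut = aH.reduce((sum, a, j) => sum + a * wOut[j], bOut);
    const pred = sigmoid(zOut);

    // 3. Calculate error (binary cross-entropy here; the demo's loss may differ).
    loss += -(target * Math.log(pred) + (1 - target) * Math.log(1 - pred));

    // 4. Backpropagation: chain rule from the output back to every weight.
    const dzOut = pred - target;                                     // ∂loss/∂zOut
    const dzH = wOut.map((w, j) => dzOut * w * aH[j] * (1 - aH[j])); // ∂loss/∂zH[j]

    // 5. Update weights: one gradient-descent step per sample.
    for (let j = 0; j < 4; j++) {
      wOut[j] -= learningRate * dzOut * aH[j];
      wHidden[j][0] -= learningRate * dzH[j] * x0;
      wHidden[j][1] -= learningRate * dzH[j] * x1;
      bHidden[j] -= learningRate * dzH[j];
    }
    bOut -= learningRate * dzOut;

    // 6. Repeat: over many epochs the loss should drift toward 0 (a bad
    //    random start can occasionally stall; reset and rerun, like ↻ Reset).
  }
  if (epoch % 1000 === 0) console.log(`epoch ${epoch}: mean loss ${(loss / 4).toFixed(4)}`);
}
```

Raising `learningRate` in this sketch mirrors the learning-rate experiments described under Learning Tips.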
Watch for:
- Early training: Loss is high (~0.7), predictions random
- Middle training: Loss decreases, patterns start emerging
- Convergence: Loss near 0, all 4 XOR cases predicted correctly
The Data Visualization tab shows the 4 training points and how the network classifies them in the input space.
What This Demo Teaches
1. How Neural Networks Learn
Watch a network solve the XOR problem from random initialization:
- Initial state: Random weights, terrible predictions
- Training: Gradual weight adjustments via backpropagation
- Convergence: Network learns the correct pattern
2. Forward Pass in Detail
See each computation step:
Input layer → Hidden layer:
z₀ = x₀ · w₀₀ + x₁ · w₁₀ + b₀
a₀ = sigmoid(z₀)
Hidden layer → Output:
z_out = a₀ · w₀₀ + a₁ · w₀₁ + ... + b_out
prediction = sigmoid(z_out)
Every value is shown with actual numbers.
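As a worked example of the first hidden-neuron computation, with stand-in numbers (the demo displays the real values it is using at each step):

```typescript
// One hidden neuron, forward pass, with made-up numbers.
const sigmoid = (z: number) => 1 / (1 + Math.exp(-z));

const x = [1, 0];      // inputs x₀, x₁
const w = [0.8, -0.4]; // weights w₀₀, w₁₀ into hidden neuron 0
const b = 0.1;         // bias b₀

const z0 = x[0] * w[0] + x[1] * w[1] + b; // 1·0.8 + 0·(−0.4) + 0.1 = 0.9
const a0 = sigmoid(z0);                   // sigmoid(0.9) ≈ 0.711
console.log(z0, a0);
```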
3. Backward Pass (Backpropagation)
See how errors propagate backward:
Output error → Hidden layer error → Weight gradients
The chain rule is made explicit at each step.
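As a sketch of one chain-rule step, here is the gradient for a single hidden→output weight. It assumes sigmoid activations and a cross-entropy loss, which makes ∂loss/∂z_out collapse to (prediction − target); the numbers are stand-ins, and the demo's notation glossary defines its own symbols.

```typescript
// Chain rule for one hidden→output weight, with made-up numbers.
const pred = 0.73; // network output on this sample
const target = 1;  // correct XOR answer
const aH = 0.62;   // activation of the hidden neuron feeding this weight

// ∂loss/∂zOut simplifies to (pred − target) for sigmoid + cross-entropy.
const dzOut = pred - target; // ≈ -0.27
// ∂zOut/∂wOut is just the hidden activation, so the chain rule gives:
const dwOut = dzOut * aH;    // ≈ -0.167  (∂loss/∂wOut)

// The same error signal flows one layer further back:
const wOut = 0.9;                              // weight on this hidden neuron
const dzHidden = dzOut * wOut * aH * (1 - aH); // ∂loss/∂zHidden ≈ -0.057
console.log(dwOut, dzHidden);
```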
4. Why Hidden Layers Matter
XOR is not linearly separable:
- Without hidden layer: Cannot solve XOR
- With hidden layer: Network creates new features (intermediate representations)
- Visualization: See how hidden neurons create separable representations
Features
Interactive Controls
- Step-by-step: Move forward/backward through training
- Play/Pause: Watch training in real-time or at your own pace
- Reset: Start over with new random weights
- Randomize: Try different initializations
- Learning rate: Adjust from 0.01 to 2.0
Training Data Options
- Data size: 50 to 500 training examples
- Noise level: 0% to 50% label corruption
- Confidence penalty: Discourage overconfident predictions (outputs pinned near exactly 0 or 1)
- Regenerate: Create new random datasets
Multiple Visualizations
Network Graph:
- Nodes show activations
- Edges show weights (color = sign, thickness = magnitude)
- Click edges to see detailed gradient info
Computation Panel:
- Forward pass step-by-step
- Loss calculation
- Backward pass gradients
- Notation glossary
Data Visualization:
- Scatter plot of training points
- Table view of all samples
- Noisy samples highlighted
Parameter Graphs:
- Weight evolution over time
- Loss curve
- Path strength analysis
Predictions Panel:
- Network output on canonical XOR points
- Accuracy and loss metrics
Understanding the Architecture
Network Structure: 2 → 4 → 1
Input layer (2 neurons):
- Takes two binary inputs (0 or 1)
Hidden layer (4 neurons):
- Creates intermediate representations
- Learns features like "AND" and "OR"
- Uses sigmoid activation
Output layer (1 neuron):
- Combines hidden features
- Produces final prediction (0 to 1)
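Counting parameters makes the size concrete. A sketch of the shapes involved (the `Layer` type and names here are illustrative, not the demo's API):

```typescript
// Shapes of a 2→4→1 network (illustrative; not the demo's actual types).
interface Layer {
  weights: number[][]; // [neurons in this layer][inputs to this layer]
  biases: number[];    // one per neuron
}

const hidden: Layer = {
  weights: Array.from({ length: 4 }, () => [0, 0]), // 4 × 2 = 8 weights
  biases: [0, 0, 0, 0],                             // 4 biases
};
const output: Layer = {
  weights: [[0, 0, 0, 0]],                          // 1 × 4 = 4 weights
  biases: [0],                                      // 1 bias
};
// Total trainable parameters: 8 + 4 + 4 + 1 = 17.
```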
Why 4 Hidden Neurons?
XOR can be solved with 2-3 hidden neurons, but 4 provides:
- More capacity for learning
- Clearer visualization of path strengths
- Redundancy for noisy data
Common Questions
Why does loss sometimes increase?
Reasons:
- Learning rate too high → overshooting optima
- Noisy training data → memorizing wrong patterns
- Stochastic training → batch-to-batch variance
Experiment: Lower the learning rate and watch convergence smooth out.
Why do some weights grow very large?
Answer: The network is becoming confident. Large weights → steep sigmoid → predictions close to 0 or 1.
Experiment: Enable "confidence penalty" to discourage extreme weights.
What are path strengths?
Path strength = product of weights along a path from input to output.
Example: Input x₀ → Hidden h₂ → Output
Path strength = w[0→2] × w[2→out]
Strong paths dominate the network's computation.
Experiment: Click "Parameter Graphs" tab to see path evolution.
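In code, that product is a one-liner; the weight values below are made up for illustration:

```typescript
// Path strength for x₀ → h₂ → output, with made-up weight values.
const wInputToHidden = 2.4;   // w[0→2]: from input x₀ to hidden neuron h₂
const wHiddenToOutput = -3.1; // w[2→out]: from h₂ to the output neuron
const pathStrength = wInputToHidden * wHiddenToOutput; // ≈ -7.44
// A large |pathStrength| means this route dominates the computation;
// the sign tells you whether the path pushes the output up or down.
console.log(pathStrength);
```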
Learning Exercises
Beginner
- Watch one full training run: Observe loss decreasing
- Inspect final weights: Which paths matter most?
- Reset and compare: Do different random starts converge to similar solutions?
Intermediate
Learning rate experiments:
- Try 0.1 (default), 0.5, 1.0, 2.0
- When does training become unstable?
Data size impact:
- Train with 50 vs 500 samples
- Does more data always help?
Noise robustness:
- Add 10%, 20%, 30% noise
- At what point does learning fail?
Advanced
Hypothesis testing:
- Form a hypothesis (e.g., "larger learning rate = faster convergence")
- Test it systematically
- Analyze results
Path analysis:
- Identify the dominant path at convergence
- What does this path compute?
- Why is it stronger than others?
Initialization sensitivity:
- Run 10 training sessions
- Do they all find similar solutions?
- What varies? What stays constant?
Technical Implementation
Pure TypeScript
No ML libraries - everything implemented from scratch:
- Forward pass computation
- Backpropagation algorithm
- Weight updates
- Loss calculation
Why? Understanding the implementation is part of the learning.
Testing
Comprehensive test suite ensures correctness:
- Forward pass produces correct shapes
- Gradients computed accurately
- Weight updates applied correctly
- Loss decreases over training
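A common way to test that gradients are computed accurately is a finite-difference check: nudge a parameter, re-measure the loss, and compare the numerical slope to the analytic gradient. A minimal sketch of the idea (not the project's actual test code):

```typescript
// Finite-difference gradient check (sketch only).
// For any loss(w) and an analytic gradient dLossDw at the same point,
// the numerical slope should agree to several decimal places.
function checkGradient(loss: (w: number) => number, w: number, dLossDw: number): boolean {
  const eps = 1e-5;
  const numerical = (loss(w + eps) - loss(w - eps)) / (2 * eps);
  return Math.abs(numerical - dLossDw) < 1e-4;
}

// Example: loss(w) = (w·x − target)², whose gradient is 2·(w·x − target)·x.
const x = 0.7, target = 1, w = 0.3;
const loss = (weight: number) => (weight * x - target) ** 2;
const analytic = 2 * (w * x - target) * x;
console.log(checkGradient(loss, w, analytic)); // true if the two slopes agree
```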
Source Code
All code is open source:
- Network logic: web/src/network/Network.ts
- Visualization: web/src/components/
- State management: web/src/hooks/useTraining.ts
Next Steps
After mastering this demo:
- Read the implementation: See how backprop works in code
- Explore the learnings: What we discovered while building this
- Try modifying it: Fork the code and experiment
- Wait for more demos: Attention mechanisms coming soon!
Feedback
Found a bug? Have a feature idea? Something confusing?
Open an issue on GitHub to help improve this learning tool!
This visualizer is part of the LLM Workings learning project exploring neural networks and language models from the inside out.