PART 0 — Foundations of Quantum Machine Learning

Why Part 0 Matters

Quantum Machine Learning (QML) is often presented as something mysterious or futuristic. In reality, it is a natural extension of classical machine learning into quantum state spaces.

Before discussing algorithms, speedups, or applications, we must understand what replaces vectors, layers, weights, and nonlinearities in a quantum setting. This part establishes the conceptual and mathematical foundations required to understand why QML works at all.


0.1 A Common Mathematical Language: Linear Algebra

Both machine learning and quantum computing are fundamentally based on linear algebra.

In classical machine learning:

Data is represented as vectors:

 x \in \mathbb{R}^n

Models are functions parameterized by matrices and vectors:

 f_\theta(x) = Wx + b

Learning means adjusting parameters to minimize a loss.

In quantum computing:

Information is represented as state vectors:

 |\psi\rangle \in \mathbb{C}^{2^n}

Computation is performed via linear transformations:

 |\psi\rangle \rightarrow U|\psi\rangle

where (U) is a unitary matrix.

📌 Key observation
Both paradigms manipulate vectors using matrices. The difference lies in:

  • the space (real vs complex),
  • the constraints (unitary vs arbitrary),
  • and how outputs are extracted (measurement vs direct readout).

This shared foundation is what makes QML possible.
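
To make the parallel concrete, here is a minimal NumPy sketch (all matrices and values are arbitrary illustrative choices) contrasting a classical affine layer with a unitary acting on a state vector:

```python
import numpy as np

# Classical layer: an arbitrary affine map f(x) = Wx + b on real vectors.
W = np.array([[0.5, -1.0], [2.0, 0.3]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 2.0])
print(W @ x + b)                               # output read off directly

# Quantum analogue: a unitary acting on a complex state vector.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
psi = np.array([1, 0], dtype=complex)          # |0>
psi_out = H @ psi
print(np.allclose(H.conj().T @ H, np.eye(2)))  # unitarity: U†U = I -> True
print(np.abs(psi_out) ** 2)                    # outputs only via measurement probabilities
```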


0.2 Classical Bits and Quantum Qubits

Classical Bit

A classical bit can take only one of two values:

 b \in \{0,1\}

At any instant, the bit is in a definite state.


Quantum Qubit

A qubit, by contrast, exists in a superposition of basis states:

 |\psi\rangle = \alpha|0\rangle + \beta|1\rangle

with the normalization condition:

 |\alpha|^2 + |\beta|^2 = 1

  • (\alpha, \beta \in \mathbb{C}) are probability amplitudes
  • Measurement yields:
    • outcome (0) with probability (|\alpha|^2)
    • outcome (1) with probability (|\beta|^2)

📌 Important distinction
A qubit is not “partly 0 and partly 1” in a classical sense.
It is a vector in a complex vector space, and probabilities emerge only upon measurement.
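
A minimal NumPy sketch of this (the amplitudes below are an arbitrary illustrative choice): the amplitudes define a probability distribution, and only sampling from it, i.e. measuring, produces classical outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative qubit state alpha|0> + beta|1>; amplitudes are complex.
alpha, beta = 0.6, 0.8j
psi = np.array([alpha, beta])
assert np.isclose(np.linalg.norm(psi), 1.0)   # |alpha|^2 + |beta|^2 = 1

# Measurement in the computational basis samples from the amplitudes.
probs = np.abs(psi) ** 2                      # [0.36, 0.64]
samples = rng.choice([0, 1], size=1000, p=probs)
print(probs, samples.mean())                  # mean of samples ≈ 0.64
```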


0.3 Geometric Interpretation: The Bloch Sphere


Any pure qubit state can be written as:

 |\psi\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle

This maps to a point on the Bloch sphere:

  • North pole → (|0\rangle)
  • South pole → (|1\rangle)
  • Equator → equal superpositions

🧠 Machine learning intuition
A single qubit encodes two continuous parameters ((\theta, \phi)), similar to real-valued features, but constrained to lie on the surface of a sphere.
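
As a small sketch, the Bloch parameterization above translates directly into code (the test angles are illustrative):

```python
import numpy as np

def bloch_state(theta: float, phi: float) -> np.ndarray:
    """Pure qubit state at Bloch angles (theta, phi)."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

print(bloch_state(0.0, 0.0))        # north pole: |0>
print(bloch_state(np.pi, 0.0))      # south pole: |1> (up to numerical noise)
print(bloch_state(np.pi / 2, 0.0))  # equator: (|0> + |1>)/sqrt(2)
```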


0.4 Multi-Qubit Systems and Tensor Products

When combining qubits, we use the tensor product.

Two-qubit state:

 |\psi\rangle = \sum_{i,j \in \{0,1\}} \alpha_{ij}|ij\rangle

This state lives in a 4-dimensional complex space.

 Number of qubits   Dimension
 1                  (2)
 2                  (4)
 (n)                (2^n)

⚠️ This exponential growth is often misunderstood as free computational power.

📌 Crucial point
The state lives in a (2^n)-dimensional complex space, but measuring (n) qubits returns only (n) classical bits per shot.
This is why QML is powerful, yet not trivially exploitable.
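
A short NumPy sketch of the growth (using the (|+\rangle) state purely for illustration): the state vector doubles in length with each added qubit, via the Kronecker product `np.kron`.

```python
import numpy as np
from functools import reduce

plus = np.array([1, 1]) / np.sqrt(2)          # (|0> + |1>)/sqrt(2)

# Tensor products multiply dimensions: n qubits live in C^(2^n).
for n in [1, 2, 3, 10]:
    state = reduce(np.kron, [plus] * n)
    print(n, state.shape)                     # (2,), (4,), (8,), (1024,)

# Yet a measurement of 10 qubits returns only 10 classical bits per shot.
```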


0.5 Tensor Products of Operators: Understanding (H \otimes I)

Quantum gates also combine via tensor products.

 H \otimes I

means:

  • Apply Hadamard (H) to the first qubit
  • Apply identity (I) to the second qubit

Acting on a basis state:

 (H \otimes I)|01\rangle = (H|0\rangle) \otimes |1\rangle

🧠 ML analogy
This is analogous to transforming one feature while leaving another unchanged.
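
A quick NumPy check of this identity (standard basis vectors, nothing assumed beyond the definitions above):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard
I = np.eye(2)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket01 = np.kron(ket0, ket1)                   # |01>

lhs = np.kron(H, I) @ ket01                   # (H ⊗ I)|01>
rhs = np.kron(H @ ket0, ket1)                 # (H|0>) ⊗ |1>
print(np.allclose(lhs, rhs))                  # True
```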


0.6 Entanglement: Beyond Classical Correlations

Separable (non-entangled) state:

 |\psi\rangle = |\phi\rangle \otimes |\chi\rangle

Entangled state (Bell state):

 |\Phi^+\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}


This state cannot be written as a tensor product of two single-qubit states.

📌 Why this matters for QML

  • Classical ML models interactions using extra parameters
  • Quantum models generate non-factorizable feature interactions naturally

Entanglement acts as a built-in inductive bias for modeling complex dependencies.
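
One standard way to test separability numerically is the Schmidt rank: reshape the four amplitudes into a 2×2 matrix and count its nonzero singular values. A minimal sketch (the product state chosen for comparison is arbitrary):

```python
import numpy as np

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)        # (|00> + |11>)/sqrt(2)
product = np.kron(np.array([1, 1]) / np.sqrt(2),  # |+> ⊗ |0>
                  np.array([1, 0]))

def schmidt_rank(state: np.ndarray) -> int:
    """Rank 1 means the state factorizes into a tensor product;
    rank > 1 means it is entangled."""
    singular_values = np.linalg.svd(state.reshape(2, 2), compute_uv=False)
    return int(np.sum(singular_values > 1e-12))

print(schmidt_rank(product))  # 1 -> separable
print(schmidt_rank(bell))     # 2 -> entangled
```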


0.7 Quantum Gates as Trainable Parameters

Quantum gates are unitary transformations:

 U^\dagger U = I

Of special importance are parameterized rotation gates:

 R_x(\theta), \quad R_y(\theta), \quad R_z(\theta)

Example:

 R_y(\theta) = \begin{bmatrix} \cos(\theta/2) & -\sin(\theta/2) \\ \sin(\theta/2) & \cos(\theta/2) \end{bmatrix}

📌 In QML:

  • These (\theta) values are learnable parameters
  • They play the same role as weights in neural networks
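
A minimal sketch of (R_y(\theta)) as a function of its trainable parameter (the test angles are arbitrary):

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    """Parameterized Y-rotation; theta plays the role of a weight."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

U = ry(0.7)
print(np.allclose(U.T @ U, np.eye(2)))   # unitarity (U is real, so U† = Uᵀ)
print(ry(np.pi) @ np.array([1.0, 0.0]))  # rotates |0> into |1>
```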

0.8 Measurement as the Source of Nonlinearity

Quantum evolution is strictly linear:

 |\psi\rangle \rightarrow U|\psi\rangle

However, measurement produces nonlinear classical outputs:

 \hat{y} = \langle \psi | O | \psi \rangle

where (O) is an observable (e.g., Pauli-Z).

🧠 Critical insight
Measurement plays the role of the activation function.
Without measurement, a quantum circuit is a purely linear map and cannot, by itself, produce the nonlinear input-output behavior that learning requires.
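
A small sketch of the readout (the state below is an arbitrary illustrative choice): the expectation value is quadratic in the amplitudes, which is exactly where the nonlinearity enters.

```python
import numpy as np

Z = np.diag([1.0, -1.0])                      # Pauli-Z observable

def expectation(psi: np.ndarray, O: np.ndarray) -> float:
    """<psi| O |psi>: the classical output, quadratic in the amplitudes."""
    return float(np.real(psi.conj() @ O @ psi))

psi = np.array([np.cos(0.35), np.sin(0.35)])  # illustrative state
print(expectation(psi, Z))                    # cos(0.7): nonlinear in the angle
```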


0.9 The Hybrid Quantum–Classical Learning Paradigm

Modern QML models are hybrid systems:

Classical data → Quantum encoding
               → Parameterized quantum circuit
               → Measurement
               → Classical loss + optimizer
               → Parameter update

  • Quantum computer: evaluates complex functions
  • Classical computer: performs optimization

This is not a limitation—it is a design principle.
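
To tie the pieces together, here is a minimal end-to-end sketch of the loop for a one-qubit model in NumPy (the target, initial angle, learning rate, and step count are arbitrary illustrative choices). The gradient uses the parameter-shift rule, which is exact for rotation gates:

```python
import numpy as np

Z = np.diag([1.0, -1.0])                       # observable to measure

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def model(theta: float) -> float:
    """Prepare |0>, apply Ry(theta), measure <Z>; equals cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)

target = -1.0                                  # toy task: drive <Z> to -1
theta, lr = 0.1, 0.4
for _ in range(200):
    # Parameter-shift rule: exact derivative of the expectation value.
    d_model = 0.5 * (model(theta + np.pi / 2) - model(theta - np.pi / 2))
    d_loss = 2.0 * (model(theta) - target) * d_model
    theta -= lr * d_loss                       # classical parameter update

print(theta, model(theta))                     # theta ≈ pi, <Z> ≈ -1
```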


Summary of Part 0

By the end of Part 0, we have established:

✔ QML is linear algebra on quantum states
✔ Qubits generalize classical features
✔ Entanglement encodes complex feature interactions
✔ Trainable gates replace weights
✔ Measurement provides nonlinearity
