Mathematics for Machine Learning - Chapter 1: Unveiling Vectors
Welcome to the foundational chapter of our journey into the mathematics that powers Machine Learning. If you're looking to understand the 'why' behind the 'how' in ML algorithms, a solid grasp of Linear Algebra is indispensable. And at its very core lies the concept of a vector. Often misunderstood or oversimplified, a vector is far more than just 'an arrow' or 'a list of numbers'. Let's explore its multifaceted identity from different viewpoints.
What is a Vector? A Multi-Perspective View
The beauty of mathematics, especially in applied fields like Machine Learning, is how abstract concepts manifest in concrete ways. A vector is a prime example, holding slightly different connotations depending on whether you're a physicist, a computer scientist, or a pure mathematician.
The Physics Perspective: Magnitude and Direction
In physics, a vector is inherently tied to spatial concepts. It's an entity possessing both a magnitude (size or length) and a direction. Think of forces, velocities, displacements, or accelerations. When you push an object, the force has a certain strength (magnitude) and acts in a particular way (direction).
Example: Displacement
If you walk 5 kilometers East, your displacement is a vector. Its magnitude is 5 km, and its direction is East. If you just say '5 km', that's a scalar (magnitude only), lacking directional information.
Physicists often visualize vectors as arrows starting from an origin, with the arrow's length representing magnitude and its orientation representing direction. This geometric intuition is powerful and forms the basis for many higher-level concepts.
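To make the arrow picture concrete, here is a minimal Python sketch (standard library only; the variable names `displacement`, `magnitude`, and `angle_deg` are illustrative, not from the text) that recovers magnitude and direction from a 2-component displacement:

```python
import math

# A 2D displacement: 3 km East and 4 km North, written as (x, y) components.
displacement = (3.0, 4.0)

# Magnitude (length of the arrow) via the Pythagorean theorem.
magnitude = math.hypot(displacement[0], displacement[1])               # 5.0

# Direction as an angle measured counter-clockwise from East.
angle_deg = math.degrees(math.atan2(displacement[1], displacement[0]))  # ~53.13

print(f"magnitude = {magnitude} km, direction = {angle_deg:.2f} degrees from East")
```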
The Computer Science Perspective: An Ordered List of Numbers
For a computer scientist, especially in the realm of Machine Learning and Data Science, a vector often manifests as an ordered list, array, or tuple of numbers. Each number in the list is a 'component' or 'feature'. This representation is crucial because computers excel at manipulating structured data.
Analogy: A Data Row
Imagine a spreadsheet row describing a house: [2000, 3, 2, 199000]. This can be a vector where 2000 is square footage, 3 is the number of bedrooms, 2 is the number of bathrooms, and 199000 is the price. Each number represents a specific attribute (feature) of the house.
In Machine Learning, a data point (like a customer, an image, or a word) is frequently encoded as a vector. Each element of the vector corresponds to a specific attribute or feature of that data point. Algorithms then process these numerical representations.
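As a minimal sketch of this view (assuming NumPy, the de facto array library in Python-based ML; the chapter itself does not prescribe any library), the house row from the analogy above becomes a 4-dimensional feature vector:

```python
import numpy as np

# The house from the spreadsheet analogy as an ordered list of numbers:
# [square footage, bedrooms, bathrooms, price]
house = np.array([2000, 3, 2, 199000])

print(house.shape)  # (4,)  -> a vector with 4 components
print(house[1])     # 3     -> the 'bedrooms' component
```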
The Mathematics Perspective: An Element of a Vector Space
From a pure mathematical standpoint, a vector is a more abstract entity. It's defined as an element of a vector space. A vector space is a collection of objects (which we call vectors) that satisfy a specific set of axioms concerning two fundamental operations: vector addition and scalar multiplication. These axioms ensure that these objects behave 'nicely' and consistently, much like numbers do under addition and multiplication.
Key Takeaway: Unification
While the perspectives differ, they are not contradictory. The physics arrow in 2D or 3D can be represented as an ordered list of 2 or 3 numbers, which, in turn, is an element of the 2-dimensional or 3-dimensional vector space $\mathbb{R}^2$ or $\mathbb{R}^3$. The mathematical definition provides the underlying framework that unifies these different manifestations.
The Pillars of Linear Algebra: Vector Addition and Scalar Multiplication
You might wonder, with all the complex operations in linear algebra, why are vector addition and scalar multiplication considered the 'most important'? The answer lies in their foundational role: they are the two operations that define a vector space itself, and nearly every other concept in linear algebra builds upon them.
Vector Addition: Combining Directions and Information
Vector addition allows us to combine vectors, geometrically representing the 'resultant' of multiple forces or displacements. Algebraically, it's performed component-wise:
$$ \mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} v_1+w_1 \\ v_2+w_2 \\ \vdots \\ v_n+w_n \end{pmatrix} $$
Why it's crucial:
- Superposition: In physics, the net effect of multiple forces.
- Relative Movement: Combining velocities.
- Feature Combination: In ML, combining data points that live in the same feature space. Adding two feature vectors component-wise is, for example, the first step in averaging two data points (a short NumPy sketch follows this list).
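A minimal NumPy sketch of component-wise addition (the vectors `v` and `w` are made-up values for illustration):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# Component-wise addition, exactly as in the formula above.
print(v + w)        # [5. 7. 9.]

# Averaging two data points: addition followed by scalar multiplication (next section).
print((v + w) / 2)  # [2.5 3.5 4.5]
```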
Scalar Multiplication: Scaling and Weighting
Scalar multiplication involves multiplying a vector by a single number (a 'scalar'). This operation scales the vector, changing its magnitude but keeping its direction (or reversing it if the scalar is negative). Algebraically, it's also performed component-wise:
$$ c \mathbf{v} = c \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{pmatrix} $$
Why it's crucial:
- Scaling: Doubling the velocity of an object.
- Feature Scaling: Normalizing data in ML (e.g., dividing all features by their max value).
- Weighting: Assigning importance to different features. In neural networks, each input feature is multiplied by a weight (a scalar); applied across a whole layer, this becomes matrix multiplication, a generalization built on these same component-wise operations (see the sketch after this list).
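A minimal NumPy sketch of scaling and of the max-value feature scaling mentioned above (the vectors are illustrative values, not from a real dataset):

```python
import numpy as np

v = np.array([2.0, -1.0, 4.0])

# Scaling: same direction, twice the magnitude.
print(2 * v)   # [ 4. -2.  8.]

# A negative scalar reverses the direction.
print(-1 * v)  # [-2.  1. -4.]

# Feature scaling: divide every component by the maximum, so all entries land in (0, 1].
features = np.array([2000.0, 3.0, 2.0, 199000.0])
print(features / features.max())
```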
Why "Linear" Algebra? What's Linear About It?
The term 'linear' is central to this branch of mathematics. It refers to operations and relationships that preserve the fundamental structure defined by vector addition and scalar multiplication. Essentially, linear relationships are those that can be drawn as straight lines, planes, or hyperplanes, and linear transformations map these 'straight' structures to other 'straight' structures.
Formal Definition of Linearity:
An operation (or function/transformation) $f$ is considered linear if it satisfies two properties:
- Additivity: $f(\mathbf{v} + \mathbf{w}) = f(\mathbf{v}) + f(\mathbf{w})$
- Homogeneity of Degree 1: $f(c \mathbf{v}) = c f(\mathbf{v})$
These two properties can often be combined into one: $f(c_1 \mathbf{v} + c_2 \mathbf{w}) = c_1 f(\mathbf{v}) + c_2 f(\mathbf{w})$. This is the hallmark of a linear transformation.
Think of simple linear equations like $y = mx$. The graph is a straight line passing through the origin. Linear algebra extends this concept to multiple dimensions, dealing with systems of such equations, and transformations that maintain this 'straightness' property. It's about proportionality and superposition.
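As a quick numerical check of these two properties (a sketch, assuming NumPy; the matrix `A` and the test vectors are arbitrary choices), a transformation defined by a fixed matrix satisfies both, while a component-wise square does not:

```python
import numpy as np

# f(v) = A @ v, a transformation defined by a fixed matrix, is linear.
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

def f(v):
    return A @ v

v = np.array([1.0, 2.0])
w = np.array([-3.0, 0.5])
c = 4.0

# Additivity: f(v + w) == f(v) + f(w)
print(np.allclose(f(v + w), f(v) + f(w)))  # True

# Homogeneity: f(c v) == c f(v)
print(np.allclose(f(c * v), c * f(v)))     # True

# A non-linear map, e.g. squaring each component, fails additivity.
g = lambda v: v ** 2
print(np.allclose(g(v + w), g(v) + g(w)))  # False
```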
Analogy: Stretching a Rubber Band
Imagine stretching a rubber band. If you stretch it twice as much, any point on the band moves twice as far from its origin (scalar multiplication). If you stretch it along two directions simultaneously, the final position is the sum of movements from each stretch (vector addition). A 'linear' stretch means the band remains a straight line; it doesn't curve or twist in a non-proportional way.
In Machine Learning, many algorithms (e.g., linear regression, principal component analysis) fundamentally rely on linear relationships or approximate non-linear ones using linear techniques. Neural networks, despite their complexity, are built from layers of linear transformations followed by non-linear activation functions.
Dimensions of a Vector (Not Vector Space)
When we talk about the 'dimension' of a single vector, it's quite straightforward:
Definition: Dimension of a Vector
The dimension of a vector refers to the number of components (or entries) it contains. If a vector has $n$ components, it is an $n$-dimensional vector. We often say it belongs to the vector space $\mathbb{R}^n$.
For example:
- $\mathbf{v} = [3, -1]$ is a 2-dimensional vector.
- $\mathbf{w} = [5, 0, 7]$ is a 3-dimensional vector.
- A vector representing the features of a dataset with 100 attributes would be a 100-dimensional vector.
It's crucial to distinguish this from the 'dimension of a vector space', which refers to the maximum number of linearly independent vectors needed to span that space. For introductory purposes, when discussing a single vector, its dimension is simply its number of elements.
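In code, the dimension of a single vector is simply its length (a NumPy sketch; `len` and `.shape` are standard array introspection, and the vectors are the examples listed above):

```python
import numpy as np

v = np.array([3, -1])    # 2-dimensional vector: an element of R^2
w = np.array([5, 0, 7])  # 3-dimensional vector: an element of R^3

# The dimension of a single vector is just its number of components.
print(len(v), v.shape)   # 2 (2,)
print(len(w), w.shape)   # 3 (3,)
```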
Conclusion: The Humble Yet Powerful Vector
From describing physical forces to encoding complex data points for machine learning models, vectors are the fundamental building blocks of linear algebra. Their seemingly simple operations of addition and scalar multiplication unlock an entire universe of mathematical tools that allow us to understand, manipulate, and learn from data. As we delve deeper, you'll find that mastering these basic concepts provides an unshakeable foundation for tackling more advanced topics in the exciting world of Machine Learning.