The Role of Vectors in Machine Learning

We hear over and over that machine learning is linear algebra, which, in turn, is the mathematics of vectors and matrices. But why? What’s so amazing about this particular mathematical discipline as it pertains to ML? Here, from a bird’s-eye view, are 7 reasons why understanding vectors & matrices and their manipulations is integral to making sense of machine learning:

  1. Most of today’s ML algorithms first turn data—whether it’s images, audio or text—into vectors. The encoding is unique to each type of data, and even for a given type of data, such as text, there might be multiple ways of encoding the information as vectors. A vector is a sequence of numbers (the ordering is important), and the number of elements of a vector gives you the dimensionality of the vector space in which the vector resides. So, 3 numbers equate to a 3D space, 10 numbers to a 10D space, and so on, with each element varying along its respective axis in that space.

  2. Encoding data as vectors makes it possible to use measures of similarity and dissimilarity to compare instances of data (a property exploited by, say, the k-Nearest Neighbor algorithm); a small code sketch of points 1 and 2 appears after this list.

  3. In vector spaces, proper encoding can result in nearby vectors having similar semantic meanings, allowing for new kinds of manipulations. For example, the well-known word2vec model—which uses a neural network to embed words in a vector space—shows that the difference between the vectors for “man” and “woman” is comparable to the difference between the vectors for “king” and “queen”. So, if you take the vector for “king”, subtract “man” and add “woman”, you end up very near the vector for “queen” (sketched in code after this list).

  4. Many ML algorithms, and deep neural networks in particular, can be thought of as algorithms that transform input vectors into output vectors via matrix-vector operations (with the results of those operations often passed elementwise through non-linear functions, as in the case of neural networks); a sketch of one such layer appears after this list.

  5. Converting discrete data (such as images and text) into vectors embeds them in a continuous space. This has numerous advantages. Functions defined over such spaces can be continuous and differentiable, making them amenable to optimization techniques such as gradient descent, which are used to train neural networks and other ML models (a short gradient-descent sketch follows the list).

  6. Also, embedding discrete data in continuous vector spaces lets the ML algorithm access vectors that weren’t in the training data, making it possible to generalize to new (but similar) data, and also to generate new (but similar) data.

  7. And when it comes to neural networks, it’s the mathematics of vector spaces that’s key to proving that neural networks with at least one hidden layer, given enough artificial neurons, are universal function approximators: they can approximate, to any desired accuracy, essentially any continuous function that converts an input vector to an output vector.
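
A minimal sketch of points 1 and 2, assuming NumPy and scikit-learn are available: three short texts are encoded as bag-of-words count vectors, and cosine similarity (the kind of measure a k-Nearest Neighbor method relies on) is used to compare them. The documents are toy placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three short "documents" to encode (hypothetical toy data).
docs = [
    "the cat sat on the mat",
    "a cat lay on a mat",
    "stock prices fell sharply today",
]

# Bag-of-words encoding: each document becomes a vector of word counts,
# so the dimensionality equals the size of the vocabulary.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray()
print("vector-space dimensionality:", X.shape[1])

# Cosine similarity compares the documents as vectors: the two cat
# sentences come out far more similar to each other than to the
# finance sentence.
print(np.round(cosine_similarity(X), 2))
```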
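
The word-vector arithmetic of point 3 can be reproduced with pre-trained embeddings. A sketch using gensim's downloader, assuming gensim is installed and the pre-trained GloVe vectors (which behave much like word2vec vectors for this purpose) can be fetched:

```python
import gensim.downloader as api

# Load small pre-trained word vectors (downloaded on first use).
wv = api.load("glove-wiki-gigaword-50")

# king - man + woman ~= queen: add and subtract word vectors, then ask
# which words' vectors lie nearest to the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```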
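
Point 4 in code: a single dense neural-network layer is just a matrix-vector product plus a bias, passed elementwise through a non-linearity. A NumPy sketch with random placeholder weights (not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One dense layer: matrix-vector product, bias, then elementwise ReLU."""
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=4)                                 # input vector in a 4-D space
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # maps 4-D -> 8-D
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)   # maps 8-D -> 3-D

# The "network" is just two such transformations composed:
y = layer(layer(x, W1, b1), W2, b2)
print(y.shape)   # (3,): an input vector transformed into an output vector
```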
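
For point 5, a few lines of NumPy sketch gradient descent minimizing a simple differentiable loss over a 3-D parameter vector; the target vector here is a made-up placeholder:

```python
import numpy as np

target = np.array([1.0, -2.0, 0.5])    # hypothetical "ideal" parameter vector

def loss(w):
    return np.sum((w - target) ** 2)   # differentiable loss over a 3-D vector space

def grad(w):
    return 2.0 * (w - target)          # gradient of the loss, derived analytically

w = np.zeros(3)                        # start from the origin
for step in range(100):
    w -= 0.1 * grad(w)                 # gradient-descent update

print(loss(w))   # close to 0: w has moved very near the target vector
```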

For more, see: WHY MACHINES LEARN
