Statistics, Machine Learning and Deep Learning…

Recently you must have heard a lot about AGI (Artificial General Intelligence) and that it has something to do with Deep Learning. But what is Deep Learning? And how does it differ from things you might be more familiar with, like Statistics or Machine Learning?

Let me give you a tour, and in the end you’ll find out that Deep Learning is not as mysterious as it sounds (in fact the backbone is linear algebra plus the calculus you studied at college).

Patterns, Patterns and more Patterns:

Overall we are talking about patterns, and Maths is fundamentally the discipline that “speaks” about patterns. In a sense this is the most fundamental definition of Mathematics (see Hofstadter’s foreword to ‘Gödel’s Proof’ by Nagel & Newman).

Let me put in front of you 3 apples, 3 pens, 3 books, 3 plates and 3 balls… now forget what ‘3’ means in my sentence (if you can), then answer this question: what is the pattern among the various objects I have presented to you? The answer should be that each seems to be a group of… three distinct objects of a given category. There you have it, a definition of the most basic pattern, and that is what we call a number. It might seem I am digressing, but I want you to get a bit abstract here, so let me indulge.

There are then hundreds of specific disciplines that study patterns, and the relationships and properties of patterns. Calculus, for example, is the branch of mathematics that studies patterns between infinitely small quantities and how those patterns generalise; geometry can be looked at as the study of patterns in shapes (or distances); and so on.

From this perspective Statistics, Machine Learning and Deep Learning are of the same family: they sit in a branch of Mathematics that studies how to reconstruct or recognise patterns. For example, I could give you the series of numbers below:

2, 4, 8, 16, 32, and so on.

And we could try to establish whether the pattern is: the number at position n+1 is twice the number at position n.
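To make that concrete, here is a tiny Python sketch that checks the proposed doubling rule against the series above (nothing more than the rule written as code):

```python
# The series from the text, and the hypothesis "each term is twice
# the previous one" checked position by position.
series = [2, 4, 8, 16, 32]

holds = all(b == 2 * a for a, b in zip(series, series[1:]))
print(holds)  # True: the doubling pattern fits this series
```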

Where Statistics, Machine Learning and Deep Learning differ is in how they represent patterns, and in what types of questions about patterns they focus on answering.

By “representation” I mean something pretty basic: in essence, a map that allows you to work with the pattern. For example, for the square function you can use several representations:

X^2, or X*X, or you could draw a square with a side of length x, or go more abstract and draw a parabola through the origin, or express it as a derivative, or an integral, or any other way you want really, so long as that representation is useful to you.

That representation will only be a useful map so long as it helps you move toward your objective: are you trying to draw a trajectory? Prove a geometrical theorem? Solve an equation?

Statistics:

Patterns are represented by the parameters of special functions we call distributions. Statistics focuses on finding those parameters and telling us how confident we should be that the pattern we identified can be mapped by a given relationship, determined by some parameters. We do not always want to “predict” in Statistics; more often we want to make sure we are onto something substantial (e.g. does the proposed medicine really drive benefits? Does the defendant’s DNA match the blood sample?).

Statistical inference, to use an example, would like to answer the question of whether the income of a customer has an impact on how much that customer is likely to spend on your products. It will give you a yes or no answer (yes for jewellery… not so much for bottled water, perhaps?).
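To make that concrete, here is a hedged sketch of the statistical framing in Python, using scipy’s linregress; the income and spend numbers are made up purely for illustration:

```python
# Does income have a statistically significant effect on spend?
# The data below is hypothetical, for illustration only.
from scipy.stats import linregress

income = [30, 45, 60, 80, 100, 120, 150, 200]  # hypothetical incomes
spend = [2, 3, 5, 6, 9, 10, 14, 19]            # hypothetical spends

result = linregress(income, spend)
print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.5f}")

# A small p-value (say, below 0.05) is the classic "yes, income
# matters" answer; Statistics quantifies our confidence in that yes.
```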

Statistics is a highly developed discipline that is able to define boundaries and provide indisputable proofs, as it is a branch of Mathematics. For example it can answer general questions such as:

Under certain assumptions on my data, what is the “best” test to verify my hypothesis? What is an admissible test? What is the dynamic policy that maximises utility? What is the best strategy to win a game?

Machine Learning (or Statistical Learning):

With Machine Learning we are interested in teaching computers to identify patterns, for the purpose of making predictions. For example: if the income of my customer is X, how much is she likely to spend on our products? Our task will be to give the machine as much data as we can, plus a learning framework, so that it can establish the pattern between income and spend (if there is one).
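As a sketch of that learning framework in Python (the model choice and the data, the same made-up numbers as above, are assumptions for illustration only):

```python
# Learn the income -> spend pattern from data, then predict for a
# new customer. Data and model choice are hypothetical.
from sklearn.linear_model import LinearRegression

X = [[30], [45], [60], [80], [100], [120], [150], [200]]  # incomes
y = [2, 3, 5, 6, 9, 10, 14, 19]                           # spends

model = LinearRegression().fit(X, y)
print(model.predict([[90]]))  # predicted spend for an income of 90
```

Note the shift in the question: Statistics asked “is the effect real?”, while Machine Learning asks “what will this customer spend?”.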

In Machine Learning it is also important how patterns are represented: we can only use representations that can be implemented efficiently in computer programs.

For example, decision trees represent patterns as a series of if-else statements: if this, then this, else that. So, in the case of decision trees, the pattern needs to be amenable to mapping onto simple if-else statements (e.g. if income is between 100 and 200 then spend will be 10; if greater than 200, then 20). Overall, every Machine Learning algorithm has its own specific way of representing patterns.
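Written out in Python, the rule from that example looks like this (the low-income branch is my own assumption, since the text does not specify it):

```python
# The decision-tree representation is literally a cascade of if-else
# statements over thresholds learned from data.
def predict_spend(income):
    if 100 <= income <= 200:
        return 10
    elif income > 200:
        return 20
    else:
        return 5  # assumed value for low incomes (not given in the text)

print(predict_spend(150))  # 10
print(predict_spend(250))  # 20
```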

To give another example, K Nearest Neighbours (KNN) is an algorithm where patterns are identified “spatially”, by mapping the input data to an abstract space where we can define distances. In our example, a KNN approach would put incomes and spends in a two-dimensional space and, when you give it only an income, try to find the spend that places the point in a region of space that “fits” the geometry drawn by the training data.
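Here is a bare-bones KNN sketch in plain Python for the income → spend example (hypothetical data again): to predict, find the k training points whose incomes are closest to the new income and average their spends.

```python
# Minimal KNN regression: distance is measured along the income axis,
# and the prediction is the average spend of the k nearest neighbours.
def knn_predict(train, new_income, k=3):
    nearest = sorted(train, key=lambda p: abs(p[0] - new_income))[:k]
    return sum(s for _, s in nearest) / k

train = [(30, 2), (45, 3), (60, 5), (80, 6), (100, 9), (120, 10)]
print(knn_predict(train, 90))  # average spend of the 3 closest incomes
```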

Unlike Statistics (and, as a consequence, Decision Theory), Machine Learning does not give closed answers to well-defined questions. There is no theorem that goes: given some assumptions on your data, this specific learning approach is guaranteed to give you the best prediction.

Machine Learning is therefore closer to engineering than to Mathematics: a lot of the effort (and fun) in Machine Learning is that you have to play with the tools to get somewhere and find new opportunities.

Obviously the algorithms are not found by trial and error alone; there is a lot of pure statistics behind Machine Learning, but it is mixed with research in computer science as well.

Deep Learning (or Deep Neural Networks):

Deep Learning is a specific branch of Machine Learning where patterns are represented by networks of fundamental computational units called “Neurons”.

Such Neurons are not that complicated, as each performs basic algebra plus a simple transformation (decided upfront), but the overall architecture of the network, plus its size, can make them incredibly powerful. In fact Neural Networks have a nice theorem to them, the universal approximation theorem, which states that, under certain assumptions, neural nets can indeed represent any well-behaved function, guaranteed! (We have no such theorem for decision trees or KNN.)
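To show how unmysterious a single Neuron is, here is a minimal sketch in Python; the weights and the choice of ReLU as the transformation are arbitrary, for illustration:

```python
# One "Neuron": a weighted sum of the inputs (the basic algebra),
# followed by a simple fixed transformation (here, ReLU).
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b   # weighted sum plus bias
    return max(0.0, z)     # ReLU: pass positive values, clip the rest

x = np.array([1.0, 2.0])   # inputs
w = np.array([0.5, 0.3])   # weights (arbitrary)
print(neuron(x, w, b=0.1)) # 0.5*1 + 0.3*2 + 0.1 = 1.2
```

A deep network is just many of these wired together in layers, which is where the power (and the parameter count) comes from.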

ChatGPT, for example, has millions of Neurons and hundreds of billions of parameters devoted solely to the task of predicting what sentence from a given language best fits the “pattern” of the conversation that a given prompt initiated (in a human-like way).

As with Machine Learning, and even more so, Deep Learning is closer to Computer Science than to Statistics, since we basically need to proceed experimentally within technical constraints. We do not have solid theoretical foundations for the results obtained in Deep Learning; we literally do not know in much detail what ChatGPT is doing. Through trial and error, a certain network architecture (called the Transformer) has been found to be effective, but this is also driven by what architectures we can implement within the constraints of existing computers and programming languages, so Deep Learning is also about the hardware.

A lot of current research focuses on architectures and how to set up various layers of the network.

To be clear, this is not “trial and error” in the everyday sense; it works more like experimental Physics, with a close interplay between theoretical advances and areas of inquiry.

I hope you found the above useful, but for anything a bit deeper on Deep Learning, I recommend the two books below:

Both are very readable introductions, and they can also give a sense of what might be needed to work on AI as opposed to consuming AI, or to develop AI in house (lots of compute power, computer scientists and mathematicians, for sure).