Data is one of the essential ingredients for building the best machine learning models. The better you understand your data, the better your machine learning model will be, because you will be able to explain the reasons behind your model's performance. Probability is one of the most important mathematical tools for understanding patterns in data. Famous machine learning algorithms like Naive Bayes are derived entirely from probability theory. Hence, knowing the basics of probability is one of the best ways to start the ML journey.
In this article, we will not only describe the theoretical aspects of probability but also give you a sense of where those aspects are used in ML. So, let's start without any further delay.
If we have to define “probability”,
Originating from games of chance, probability is the branch of mathematics concerned with how likely it is that a proposition is true.
In more layman terms, probability is simply the possibility of a random event occurring. For example, what is the possibility of rain tomorrow? The value of a probability can only lie between 0 and 1, with 0 and 1 inclusive.
If we notice carefully, every daily-life phenomenon can only be of two types: deterministic, where the outcome is known with certainty in advance, and indeterministic (probabilistic), where the outcome is uncertain and can only be described in terms of its likelihood.
There are two famous terms that describe the relationship between events from the same experiment: mutually exclusive events, which cannot occur at the same time, and exhaustive events, which together cover the entire sample space.
Mathematically, probability can be defined as:
If a random experiment has n > 0 mutually exclusive, exhaustive, and equally likely events and, out of these n events, m are favorable (n ≥ m ≥ 0), then the probability of occurrence of an event E is defined as P(E) = m / n.
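To make the m/n definition concrete, here is a minimal Python sketch (the die-roll event and the variable names are purely illustrative, not from the original article) that counts favorable outcomes of a fair die roll:

```python
from fractions import Fraction

# Sample space of one fair die roll: 6 mutually exclusive,
# exhaustive, and equally likely outcomes (n = 6).
sample_space = [1, 2, 3, 4, 5, 6]

# Event E: "the roll is an even number" -> favorable outcomes (m = 3).
favorable = [outcome for outcome in sample_space if outcome % 2 == 0]

# Classical definition: P(E) = m / n
p_even = Fraction(len(favorable), len(sample_space))
print(p_even)  # 1/2
```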
Let A and B be two events, and let A̅ be the complement of A. Then 0 ≤ P(A) ≤ 1, P(A̅) = 1 − P(A), and P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Suppose two coins are to be tossed. Then the possible outcomes and their probabilities can be classified as: two heads (HH) with probability 1/4, exactly one head (HT or TH) with probability 1/2, and two tails (TT) with probability 1/4.
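As a quick check of these numbers, the sketch below (assuming fair coins, with an illustrative helper named `prob`) enumerates the four equally likely outcomes and counts the favorable ones:

```python
from itertools import product
from fractions import Fraction

# Sample space of two fair coin tosses: HH, HT, TH, TT (all equally likely).
outcomes = list(product("HT", repeat=2))

def prob(event):
    """P(event) = number of favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

print(prob(lambda o: o.count("H") == 2))  # both heads        -> 1/4
print(prob(lambda o: o.count("H") == 1))  # exactly one head  -> 1/2
print(prob(lambda o: "H" in o))           # at least one head -> 3/4
```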
When the probability of one event's occurrence depends on the probability of another event's occurrence, that scenario comes under statistical dependence.
If we have two events, A and B, then:
1. Conditional Probability is the probability of occurrence of an event A given that event B has already occurred: P(A/B) = P(A ∩ B) / P(B).
2. Joint Probability is the measure of two or more events happening at the same time, i.e., the probability that event A and event B occur together: P(A ∩ B) = P(A/B) · P(B).
3. Marginal Probability of an event is obtained by summing up the probabilities of all the joint events in which that event is involved (a small numerical sketch of all three quantities follows this list).
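To see the three quantities side by side, here is a small sketch with made-up numbers (the rain/umbrella events and every value in the `joint` table are purely hypothetical):

```python
from fractions import Fraction

# Hypothetical joint distribution over two events:
# A = "it rains today", B = "I carry an umbrella".
joint = {
    ("rain", "umbrella"):       Fraction(3, 10),
    ("rain", "no umbrella"):    Fraction(1, 10),
    ("no rain", "umbrella"):    Fraction(1, 10),
    ("no rain", "no umbrella"): Fraction(5, 10),
}

# Joint probability: both events happen together.
p_rain_and_umbrella = joint[("rain", "umbrella")]                      # 3/10

# Marginal probability: sum the joint probabilities over the other event.
p_rain = sum(p for (a, _), p in joint.items() if a == "rain")          # 2/5
p_umbrella = sum(p for (_, b), p in joint.items() if b == "umbrella")  # 2/5

# Conditional probability: P(rain / umbrella) = P(rain ∩ umbrella) / P(umbrella).
p_rain_given_umbrella = p_rain_and_umbrella / p_umbrella               # 3/4
print(p_rain, p_umbrella, p_rain_and_umbrella, p_rain_given_umbrella)
```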
If B1, B2, …, Bn are disjoint events whose union is the entire sample space (i.e., they are collectively exhaustive), then the probability of occurrence of an event A is
P(A) = P(A ∩ B1) + P(A ∩ B2) + · · · + P(A ∩ Bn)
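As an illustration of this sum, the following sketch uses a hypothetical three-part partition (factories B1, B2, B3 with made-up defect rates) and computes P(A) by summing P(A / Bi) · P(Bi) over the partition:

```python
from fractions import Fraction

# Hypothetical partition of the sample space: an item comes from exactly
# one of three factories B1, B2, B3.
p_B = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}

# Hypothetical conditional probabilities of a defect (event A) per factory.
p_A_given_B = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(5, 100)}

# Law of total probability: P(A) = Σ P(A ∩ Bi) = Σ P(A / Bi) · P(Bi)
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)
print(p_A)  # 21/1000
```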
This is one of the most famous theorems in probability and lies at the heart of the Naive Bayes algorithm in Machine Learning.
As a simple example, consider the "Netflix and chill" scenario:
P(chill):= Probability that you are chilling out.
P(Netflix):= Probability that you are watching Netflix.
P(chill/Netflix):= Probability that you are chilling out, given that you are watching Netflix.
P(Netflix/chill):= Probability that you are watching Netflix, given that you are chilling out.
More formally, let S be a sample space such that B1, B2, B3, …, Bn form a partition of S, and let A be an arbitrary event with P(A) > 0. Then Bayes' theorem states that
P(Bk/A) = P(A/Bk) · P(Bk) / [ P(A/B1) · P(B1) + P(A/B2) · P(B2) + · · · + P(A/Bn) · P(Bn) ]
P(Bi), i = 1, 2, …, n are called the prior probabilities of occurrence of the events Bi.
P(Bk/A) is the posterior probability of Bk when A has already occurred.
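Plugging made-up numbers into the "Netflix and chill" example above, a minimal sketch of Bayes' theorem for two events might look like this (all three input probabilities are purely illustrative):

```python
from fractions import Fraction

# Hypothetical numbers for the "Netflix and chill" example.
p_netflix = Fraction(6, 10)              # P(Netflix)
p_chill = Fraction(5, 10)                # P(chill)
p_chill_given_netflix = Fraction(7, 10)  # P(chill / Netflix)

# Bayes' theorem: P(Netflix / chill) = P(chill / Netflix) * P(Netflix) / P(chill)
p_netflix_given_chill = p_chill_given_netflix * p_netflix / p_chill
print(p_netflix_given_chill)  # 21/25
```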
Unlike an algebraic variable, which represents an unknown quantity to be solved for in an equation, a random variable takes on different values depending on the outcome of a random experiment. It is just a rule that assigns a number to each possible outcome of an experiment.
Mathematically, a random variable is defined as a function (denoted X, Y, or Z) from the elements of a sample space S to a measurable space E (usually the real numbers), i.e.,
X : S → E
In more layman language, a random variable is just a numerical quantity whose value is determined by the outcome of a random experiment.
Random variables are of two types: discrete random variables, which take a countable set of values (for example, the number of heads in a series of coin tosses), and continuous random variables, which can take any value within an interval (for example, the height of a randomly chosen person). A small sketch of a discrete random variable follows.
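As a small illustration of a discrete random variable (an assumed example, not from the original article), the sketch below maps each outcome of two coin tosses to the number of heads and tabulates the probability of each value:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Random variable X: "number of heads" when two fair coins are tossed.
# X is a rule mapping each outcome in S = {HH, HT, TH, TT} to a number.
outcomes = list(product("HT", repeat=2))
X = {outcome: outcome.count("H") for outcome in outcomes}

# Probability of each value of the discrete random variable X.
counts = Counter(X.values())
pmf = {value: Fraction(count, len(outcomes)) for value, count in counts.items()}
print(pmf)  # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```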
That's it for this article. There are some important concepts related to probability distribution functions, mathematical expectation, and famous distribution functions, which we will discuss in part 2 of the probability theory blog.
In this article, we discussed the basics and the most commonly used terminologies in probability for machine learning. We covered deterministic and indeterministic phenomena, exclusive and exhaustive events, definitions of some famous terms, marginal, joint, and conditional probabilities, the famous Bayes' theorem, and, finally, random variables. We hope you have enjoyed the article.