Beta Distribution

What is Beta Distribution?

The Beta distribution is a probability distribution over probabilities: it models a family of possible probability values. It is a versatile way to describe outcomes that are proportions or percentages. Because the Beta distribution represents a probability, its domain is bounded between 0 and 1. For example, how likely is it that Vladimir Putin will win the next presidential election in Russia? You might guess the probability is 0.25, while your classmate might think it is 0.2. The Beta distribution gives you a way to describe this uncertainty.

Clearing Up the Confusion about the Beta Distribution Function

One reason the beta distribution function causes confusion is that there are three "betas" in the mathematics, and each has a different meaning:

  1. β: The name of the second shape parameter in the probability density function (pdf).

  2. Beta (α, β): The name of the probability distribution itself.

  3. B (α, β): The name of the beta function, which appears in the denominator of the pdf. It acts as a normalizing constant that makes the area under the pdf curve equal 1.

P.S.: Don't get confused by all these betas. Typically, "beta distribution of the first kind" is another name for the basic beta distribution, while the "beta distribution of the second kind" is also called the beta prime distribution.

Examples of Beta Distribution

For example, we can use the beta distribution to model probabilities such as:

  • how likely audiences are to rate a new movie release

  • the click-through rate (proportion of visitors who click) of your website

  • the conversion rate of visitors who actually buy on your website

  • the chance of survival for a person with blood cancer, and so on.

Beta Distribution Formula

A standard formula for the pdf of the beta distribution is:

$$f(x) = \frac{(x-a)^{p-1}\,(b-x)^{q-1}}{B(p,q)\,(b-a)^{p+q-1}}, \qquad a \le x \le b;\; p, q > 0$$

where p and q are the shape parameters,

a and b are the lower and upper bounds, respectively, of the distribution,

and B (p,q) is the beta function. The beta function is given by

$$B(p,q) = \int_0^1 t^{p-1}\,(1-t)^{q-1}\,dt$$

The case where a = 0 and b = 1 is known as the standard beta distribution. The equation for the standard beta distribution is:

$$f(x) = \frac{x^{p-1}\,(1-x)^{q-1}}{B(p,q)}, \qquad 0 \le x \le 1;\; p, q > 0$$

Typically, the general form of a distribution is defined in terms of location and scale parameters. The beta distribution is different in that its general form is defined in terms of the lower and upper bounds. However, the location and scale parameters can be expressed in terms of the lower and upper bounds as follows:

Location = a

Scale = b - a
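
As a rough illustration of how the shape parameters and the bounds fit together, here is a minimal Python sketch of the pdf. It assumes the usual bounded form of the beta density with shapes p and q on [a, b], and it computes B(p, q) through the gamma-function identity B(p, q) = Γ(p)Γ(q)/Γ(p+q). The function names `beta_function` and `beta_pdf` are made up for this example and do not come from any library.

```python
import math

def beta_function(p, q):
    """B(p, q) via the gamma identity, computed in log space for stability."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))

def beta_pdf(x, p, q, a=0.0, b=1.0):
    """Beta density with shape parameters p, q on the interval [a, b].

    With the defaults a=0, b=1 this is the standard beta distribution.
    """
    if not (p > 0 and q > 0 and b > a):
        raise ValueError("need p > 0, q > 0 and b > a")
    if x < a or x > b:
        return 0.0
    # Rescale x to [0, 1] using location = a and scale = b - a.
    z = (x - a) / (b - a)
    # The density is unbounded at an endpoint when the matching shape is < 1.
    if (z == 0.0 and p < 1) or (z == 1.0 and q < 1):
        return math.inf
    standard = z ** (p - 1) * (1 - z) ** (q - 1) / beta_function(p, q)
    # Dividing by the scale keeps the total area equal to 1.
    return standard / (b - a)

print(beta_pdf(0.5, 2, 2))             # standard form on [0, 1]
print(beta_pdf(5.0, 2, 2, a=0, b=10))  # same shape stretched to [0, 10]
```

Stretching the interval from [0, 1] to [0, 10] keeps the shape but divides the height by the scale b - a, so the area under the curve stays 1.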


Applications of the Beta Density Function

The beta distribution is used in a number of applications, including the Rule of Succession, Bayesian hypothesis testing (an alternative to null hypothesis significance testing), and task duration modeling. It is particularly well suited to project planning and control systems such as PERT and CPM, mainly because the distribution is confined to an interval with a minimum value (0) and a maximum value (1).

Science Behind the Beta Shapes

Did you ever ask yourself why Beta (2,2) is bell-shaped, or why Beta (0.5, 0.5) is U-shaped? If you think of α-1 as the number of successes and β-1 as the number of failures, Beta (2,2) means you have observed 1 success and 1 failure. It is therefore reasonable that the most likely value of the success probability is 0.5.

Likewise, Beta (1, 1) means you have observed zero heads and zero tails. With no data at all, your belief about the probability of success should be the same everywhere on [0, 1]. This is why the density is a horizontal straight line.
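
To see these shapes without a plot, the following Python sketch evaluates the standard beta density at a few points for the three parameter pairs discussed above. The helper name `beta_pdf` and the grid of x values are choices made only for this illustration, and the function is restricted to points strictly between 0 and 1.

```python
import math

def beta_pdf(x, alpha, beta):
    """Standard beta density, valid for 0 < x < 1 (endpoints excluded)."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_b
    return math.exp(log_density)

xs = [0.1, 0.3, 0.5, 0.7, 0.9]
for alpha, beta in [(2, 2), (0.5, 0.5), (1, 1)]:
    row = ", ".join(f"{beta_pdf(x, alpha, beta):.3f}" for x in xs)
    print(f"Beta({alpha}, {beta}): {row}")
```

The Beta(2, 2) row should peak in the middle, the Beta(0.5, 0.5) row should be lowest in the middle and rise toward both ends, and the Beta(1, 1) row should be constant.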

Solved Examples


Example 1: Probability of a Probability

Suppose the probability that someone agrees to go on a movie outing with you follows a Beta distribution with α = 2 and β = 8. What is the probability that your success rate is greater than 50%?

Applying the cumulative distribution function (CDF):

P(X>0.5) = 1- CDF (0.5) = 0.01953

Unfortunately, the probability is very low.
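
The number above can be checked with a short Python sketch that integrates the Beta(2, 8) density numerically. This is only one possible approach: it uses composite Simpson's rule from the standard library instead of a statistics package, and it assumes both shape parameters are at least 1 so that the density is finite at the endpoints. The names `beta_pdf` and `beta_cdf` are illustrative.

```python
import math

def beta_pdf(x, alpha, beta):
    """Standard beta density; assumes alpha >= 1 and beta >= 1."""
    b = math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / b

def beta_cdf(x, alpha, beta, steps=2000):
    """Approximate P(X <= x) with composite Simpson's rule (steps must be even)."""
    h = x / steps
    total = beta_pdf(0.0, alpha, beta) + beta_pdf(x, alpha, beta)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(i * h, alpha, beta)
    return total * h / 3

p_success = 1 - beta_cdf(0.5, 2, 8)
print(round(p_success, 5))  # 0.01953
```

For these integer parameters the exact answer is 10/512 = 0.01953125, which agrees with the value quoted in the example.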

Example 2:

A treasure hunt game claims that at least 1 out of every 10 candidates wins. Of the last 500 treasure coupons sold, 37 were winners. Based on this sample, calculate the probability that the sponsor's claim is true, that is, that candidates have at least a 10% probability of purchasing a winning coupon.


Using the cumulative beta distribution function with x = 0.1, α = 37 and β = 463 (in spreadsheet notation, BETA.DIST(0.1, 37, 463, TRUE)):

P(X ≤ 0.1) = 98.1%

This suggests that the sponsor's claim is very likely false, because the probability of winning is most likely below 10%. The probability that the sponsor's claim is true is only 100% – 98.1% = 1.9%.
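
A quick way to reproduce this kind of calculation without a spreadsheet is sketched below in Python. It relies on a standard identity that is not stated in the text above: for positive integer shape parameters, P(X ≤ x) for X ~ Beta(α, β) equals the probability that a Binomial(α + β - 1, x) variable is at least α. The function name `beta_cdf_int` is made up for this example, and terms are computed in log space so that very small values do not cause numerical trouble.

```python
import math

def beta_cdf_int(x, alpha, beta):
    """P(X <= x) for X ~ Beta(alpha, beta) with positive integer shapes.

    Uses the binomial tail identity with n = alpha + beta - 1 and 0 < x < 1.
    """
    n = alpha + beta - 1
    log_x, log_1mx = math.log(x), math.log(1 - x)
    total = 0.0
    for j in range(alpha, n + 1):
        # log of C(n, j) * x^j * (1 - x)^(n - j)
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_x + (n - j) * log_1mx)
        total += math.exp(log_term)
    return total

p_below = beta_cdf_int(0.1, 37, 463)   # probability the win rate is below 10%
p_claim_true = 1 - p_below             # probability the sponsor's claim holds
print(f"P(rate <= 0.1) = {p_below:.3f}, P(claim true) = {p_claim_true:.3f}")
```

The printed values should land close to the 98.1% and 1.9% figures quoted in the example.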

Did You Know

  • Dr. Bognar at the University of Iowa devised a calculator for the Beta distribution.

  • Trying different values of α and β helps you see how the shape of the beta curve changes.

  • The beta function can be written in terms of the gamma function.

FAQ (Frequently Asked Questions)

1. What Makes Beta Distribution Useful?

If all we need is a probability distribution to represent a probability, any arbitrary distribution over (0, 1) would work, and creating one is easy. Simply take a function that stays positive and does not blow up anywhere between 0 and 1, integrate it from 0 to 1, and divide the function by that result. You obtain a probability distribution that can be used directly to represent a probability. So why insist on using the beta distribution rather than some arbitrary probability distribution?
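
The recipe in the paragraph above can be sketched in a few lines of Python. The function `g` below is a hypothetical example chosen only because it is positive and bounded on [0, 1], and the integral is approximated with a simple midpoint rule. The names `normalize` and `g` are made up for this illustration.

```python
import math

def normalize(g, steps=10_000):
    """Turn a positive, bounded function g on [0, 1] into a density.

    The area under g is approximated with the midpoint rule.
    """
    h = 1.0 / steps
    area = sum(g((i + 0.5) * h) for i in range(steps)) * h
    return lambda x: g(x) / area

# Hypothetical function used only for illustration.
def g(x):
    return 1.0 + math.sin(math.pi * x)

pdf = normalize(g)

# Check that the normalized function integrates to (approximately) 1.
steps = 10_000
total_area = sum(pdf((i + 0.5) / steps) for i in range(steps)) / steps
print(total_area)
```

The result is a valid density on [0, 1], but unlike the beta distribution it comes with no convenient rule for updating it when new data arrives.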

What Makes Beta Distribution so Special?

The Beta distribution is the conjugate prior for the binomial, Bernoulli, negative binomial and geometric distributions (the distributions that involve success and failure) in Bayesian inference.

Calculating a posterior with a conjugate prior is very convenient, because you can avoid the expensive numerical computation otherwise involved in Bayesian inference.

If you are a machine learning scientist, your model is never complete. You need to keep updating it as more data flows in (and that is why we insist on the use of Bayesian inference).

The computation in Bayesian inference can be very difficult or even intractable. But if we can use the closed-form update that comes with a conjugate prior, the computation becomes easy.
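
As a minimal sketch of that closed-form update, assume a Beta(α, β) prior on a success probability and data made up of successes and failures (Bernoulli or binomial observations). The posterior is then Beta(α + successes, β + failures), so updating is just addition. The function names and the data below are hypothetical and exist only for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior shape parameters under a Beta(alpha, beta) prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Made-up data arriving in two batches.
prior = (1, 1)  # Beta(1, 1), the flat prior
after_first = update_beta(*prior, successes=7, failures=3)
after_second = update_beta(*after_first, successes=2, failures=8)

print(after_first, after_second)   # (8, 4) (10, 12)
print(beta_mean(*after_second))    # posterior mean after both batches
```

Updating batch by batch gives the same posterior as processing all 20 observations at once, which is what makes the conjugate prior convenient when data keeps flowing in.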