Regression Analysis: Explanation, Types, and Formula

The term regression was introduced by Sir Francis Galton to describe a phenomenon he observed in the relationship between the heights of children and the heights of their parents.

Today, the term is used in quite a different sense. Regression analysis investigates the dependence of one variable (the dependent variable) on one or more other variables (the independent variables) and provides an equation for estimating or predicting the average value of the dependent variable from known values of the independent variable(s).

The dependent variable is assumed to be a random variable, while the independent variables are assumed to have fixed values.

When we study the dependence of a variable on a single independent variable, it is called simple or two-variable regression.

When the dependence of a variable on two or more independent variables is studied, it is called multiple regression.

Furthermore, when the dependence is represented by a straight-line equation, the regression is said to be linear; otherwise, it is said to be non-linear.
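As a compact summary of these distinctions, the linear forms can be written out in the notation this article uses later for the two-variable case. The three-variable notation (X2, X3 and the extra coefficient β3) is an illustrative choice of ours, not something defined elsewhere in this article.

$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i \qquad \text{(simple, or two-variable, linear regression)}$$

$$E(Y \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} \qquad \text{(multiple linear regression with two independent variables)}$$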

Explanation of Sample Regression Analysis

In most practical problems we do not have data on the whole population; what we usually have is sample data, i.e., a sample of Y values corresponding to some fixed values of X.

The question then arises: can we predict the average value of the dependent variable Y from the sample data? In other words, can we estimate the population regression function (PRF) from sample data?

Not exactly: because of sampling fluctuations, the PRF cannot be recovered precisely from a single sample. What we can do is use the tools of econometrics to construct a sample regression function (SRF) that best represents the PRF, given the limited information in the sample data.
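The point about sampling fluctuation can be seen in a small simulation. The sketch below assumes a made-up PRF, E(Y | X) = 2 + 0.5X, draws two independent samples of Y at the same fixed X values, and fits a straight line to each by least squares (the technique used to pick the "best" line). The two fitted SRFs differ from each other and from the assumed PRF. All names and parameter values here are hypothetical.

```python
import random

# Hypothetical PRF used only for this illustration:
# E(Y | X) = 2 + 0.5 * X  (BETA1 and BETA2 are made-up values)
BETA1, BETA2 = 2.0, 0.5
X_VALUES = [10, 20, 30, 40, 50, 60, 70, 80]  # fixed (non-random) X values


def draw_sample(rng):
    """Draw one Y for each fixed X: Y_i = BETA1 + BETA2 * X_i + u_i, u_i ~ N(0, 3^2)."""
    return [BETA1 + BETA2 * x + rng.gauss(0.0, 3.0) for x in X_VALUES]


def fit_line(xs, ys):
    """Least-squares intercept and slope of a straight line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


rng = random.Random(42)  # fixed seed so the run is reproducible
srf_a = fit_line(X_VALUES, draw_sample(rng))  # SRF from the first sample
srf_b = fit_line(X_VALUES, draw_sample(rng))  # SRF from a second, independent sample
print("SRF from sample A:", srf_a)
print("SRF from sample B:", srf_b)
print("Assumed PRF:      ", (BETA1, BETA2))
```

Each run of a new sample gives a slightly different intercept and slope, which is exactly why an SRF is only an estimate of the PRF.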

[Figure: Sample Regression Diagram]

Sample Regression Diagram Explanation

In the sample, we have just one value of the dependent variable Y for each fixed value of the independent variable X, as shown in the diagram. We can draw a line through this scatter diagram, called the sample regression line (SRL).

The equation of this line is:

$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$$

where $\hat{\beta}_1$ is the estimator of β1 and $\hat{\beta}_2$ is the estimator of β2.

The sample regression function can be written as

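$$Y_i = \hat{Y}_i + \hat{\mu}_i$$

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{\mu}_i$$

where $\hat{Y}_i$ is the estimator of E(Y | Xi), also called the estimated (fitted) value; Yi is the individual, or actual, value; and $\hat{\mu}_i$ is the residual, the sample counterpart of the stochastic error term.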
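A minimal sketch of the decomposition of each actual value into a fitted value plus a residual, using a hypothetical four-point sample and assumed (not estimated) values for the two estimators:

```python
# Hypothetical sample and assumed estimator values (not computed from real data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b1_hat, b2_hat = 1.0, 2.0  # assumed values of the estimators of beta1 and beta2

fitted = [b1_hat + b2_hat * x for x in xs]         # Y-hat_i: estimate of E(Y | X_i)
residuals = [y - yh for y, yh in zip(ys, fitted)]  # mu-hat_i = Y_i - Y-hat_i

for y, yh, r in zip(ys, fitted, residuals):
    # Each actual value splits into a fitted part plus a residual.
    print(f"Y = {y:.1f}  =  fitted {yh:.1f}  +  residual {r:+.1f}")
```

With these made-up numbers the fitted values are 3, 5, 7 and 9, and the residuals are small positive and negative deviations around them.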

From a single sample we can draw many lines that could represent the population regression line (PRL), but only one of them fits best. To find that line, we use a statistical technique called the method of ordinary least squares (OLS).
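A minimal pure-Python sketch of ordinary least squares for the two-variable case, assuming the standard closed-form estimators β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and β̂1 = Ȳ − β̂2X̄. The function names `ols` and `rss` and the sample data are hypothetical. The residual sum of squares (RSS) is printed for the OLS line and for two other candidate lines, to show the sense in which the OLS line is "best": it has the smallest sum of squared residuals.

```python
def ols(xs, ys):
    """OLS estimates (b1_hat, b2_hat) for Y_i = b1 + b2 * X_i + u_i.

    b2_hat = sum((X - X_bar) * (Y - Y_bar)) / sum((X - X_bar) ** 2)
    b1_hat = Y_bar - b2_hat * X_bar
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2_hat = sxy / sxx
    b1_hat = y_bar - b2_hat * x_bar
    return b1_hat, b2_hat


def rss(xs, ys, b1, b2):
    """Residual sum of squares for the line Y = b1 + b2 * X."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b1_hat, b2_hat = ols(xs, ys)

# Compare the OLS line with two arbitrary alternative lines.
candidates = {"OLS": (b1_hat, b2_hat), "alt 1": (2.0, 0.7), "alt 2": (2.5, 0.5)}
for name, (b1, b2) in candidates.items():
    print(f"{name}: b1={b1:.2f}, b2={b2:.2f}, RSS={rss(xs, ys, b1, b2):.3f}")
```

For this made-up sample the estimates are approximately β̂1 = 2.2 and β̂2 = 0.6, with an RSS of about 2.4; both alternative lines give a larger RSS.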


Two-Variable Regression

A population, or statistical population, is the collection of all possible observations, finite or infinite, of some characteristic of interest.

For example, if we record the age of every student in our college, the result is population data on the age of the college's students.

The number of observations in the population is called the size of the population.

The two-variable relationship is written as Y = f(X),

where Y is the dependent variable and X is the independent variable.

Moreover, we assume that X takes fixed values while Y is a random variable. This means that for a given value of X, Y can take different values; that is, Y has a complete (conditional) distribution for every value of X.
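The idea that Y has a whole distribution at each fixed X can be simulated. This sketch assumes a hypothetical PRF, E(Y | X) = 10 + 0.8X, with normally distributed disturbances (a standard deviation of 5 is also an assumption), and draws five Y values at each of three fixed X values.

```python
import random


def draw_y(x, rng):
    """One draw of Y at a fixed X: hypothetical conditional mean 10 + 0.8 * X plus a N(0, 5^2) disturbance."""
    return 10 + 0.8 * x + rng.gauss(0.0, 5.0)


rng = random.Random(0)  # fixed seed for reproducibility
fixed_x = [80, 100, 120]  # X is held fixed at these values
ys_at_x = {x: [round(draw_y(x, rng), 1) for _ in range(5)] for x in fixed_x}

for x, ys in ys_at_x.items():
    # Same X, different Y values: Y has a distribution at each X.
    print(f"X = {x}: Y values {ys}")
```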

The diagram below illustrates the same idea:


[Figure: Two-variable regression]

The points above X1 show the different values of Y for the given value X = X1. What, then, does a population regression function represent?

The population regression curve is simply the locus of the conditional means, or expected values, of the dependent variable for the fixed values of the independent variable(s). That is why the line in the diagram above passes through the mean values of Y.
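A small worked example: the sketch below computes the conditional means E(Y | X = x) from a tiny hypothetical population of (X, Y) pairs. For this made-up data the means are 4, 6 and 8 at X = 1, 2 and 3, which lie on the straight line E(Y | X) = 2 + 2X.

```python
from collections import defaultdict

# Tiny hypothetical population of (X, Y) pairs: X is fixed, Y varies within each X.
population = [
    (1, 3), (1, 4), (1, 5),
    (2, 5), (2, 6), (2, 7),
    (3, 7), (3, 8), (3, 9),
]


def conditional_means(pairs):
    """Return {x: E(Y | X = x)}, the mean of Y within each X group."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}


means = conditional_means(population)
print(means)  # the points (x, E(Y | X = x)) trace out the population regression line
```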

If, instead, we draw the distribution curve of Y for every value of X, the picture looks like the diagram below. Here we assume a linear relationship between the mean value of Y and the value of X.

This relationship can also be written as an equation:

$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i \qquad \text{(PRL)}$$

where β1 and β2 are unknown but fixed parameters called regression coefficients: β1 is the intercept and β2 is the slope coefficient. This equation is known as the population regression line (PRL).
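The roles of the two coefficients follow directly from the PRL. Setting X = 0 isolates the intercept, and comparing two X values one unit apart isolates the slope (in practice the intercept reading is only useful when X = 0 is a sensible value for the data):

$$E(Y \mid X = 0) = \beta_1 + \beta_2 \cdot 0 = \beta_1$$

$$E(Y \mid X_i + 1) - E(Y \mid X_i) = [\beta_1 + \beta_2 (X_i + 1)] - [\beta_1 + \beta_2 X_i] = \beta_2$$

So β2 is the change in the mean value of Y for a one-unit increase in X.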

[Figure: Two-variable regression]

If we want to describe the individual value of the dependent variable Y rather than its mean value E(Y | Xi), the equation above is modified as follows:

$$Y_i = E(Y \mid X_i) + \mu_i$$

$$Y_i = \beta_1 + \beta_2 X_i + \mu_i \qquad \text{(PRF)}$$

where µi is the deviation of the individual value Yi from its conditional mean:

$$\mu_i = Y_i - E(Y \mid X_i)$$

This µi is called the stochastic disturbance or stochastic error term.
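To connect the definition to numbers, the sketch below uses a tiny hypothetical population of (X, Y) pairs, computes each conditional mean, and then the disturbance µi = Yi − E(Y | Xi) for every observation. Within each X group the disturbances sum to zero, because they are deviations from that group's mean.

```python
from collections import defaultdict

# Tiny hypothetical population of (X, Y) pairs.
population = [(1, 3), (1, 4), (1, 5), (2, 5), (2, 6), (2, 7), (3, 7), (3, 8), (3, 9)]

# Conditional mean E(Y | X = x) for each x in the population.
groups = defaultdict(list)
for x, y in population:
    groups[x].append(y)
cond_mean = {x: sum(ys) / len(ys) for x, ys in groups.items()}

# Stochastic disturbance: mu_i = Y_i - E(Y | X_i)
disturbances = [(x, y, y - cond_mean[x]) for x, y in population]
for x, y, mu in disturbances:
    print(f"X={x}  Y={y}  mu={mu:+.1f}")
```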
