 # Statistics Formula Sheet – Your answers to all stats homework questions

Whether you’re studying for a stats exam or just curious about the data around you, a statistics formula sheet can help you solve problems as well as collect, organize, analyze, and interpret the meaning behind data.

According to LinkedIn data, statistical analysis was the hottest skill that got people hired in 2014. In 2022, statisticians play a major role in how we see and consume information in our day-to-day lives. For example, if you watch a YouTube video and get recommendations for other videos, statistics was used to make those recommendations. From agriculture to climate change, we rely on statistics to make decisions based on data and patterns.

## So let’s first start by defining – Statistics

Statistics in simple terms means learning from data and the main purpose of statistics is to use a sample to make accurate statements or inferences about a population.

So as suggested above, there are two possible ways of analyzing data:

Descriptive statistics deals with accurate statements based on the data available. The mean and standard deviation are common tools in descriptive statistics.
Inferential statistics deals with conclusions drawn from data based on sampling. Common tools here are regression analysis and correlation.

Listed below are the most common and important formulas in descriptive and inferential statistics. But before we get there, let’s get familiar with five basic terms you will encounter in any statistics class or exam: population, sample, parameter, statistic, and variable. Together they form the basic vocabulary of statistics.

Population: the members of a group about which you want to draw a conclusion.
Sample: the part of the population selected for analysis.
Parameter: a numerical measure that describes a characteristic of a population.
Statistic: a numerical measure that describes a characteristic of a sample.
Variable: a characteristic of an item or an individual that will be analyzed using statistics.

## MAIN FORMULAS USED IN STATISTICS:

Use this statistics formula sheet when solving statistical problems. If you get stuck and need help, our experts are just a click away: you can upload your homework assignment and let them help you.

### Descriptive Statistics

Σ = sum
N = population size
n = sample size
f = frequency
w = weight

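Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

Population mean: $\mu = \dfrac{\sum x}{N}$

Weighted mean: $\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$

Mean for frequency table: $\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$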
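The mean formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sheet, and the frequency-table mean is treated as a weighted mean whose weights are the frequencies.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def frequency_mean(values, freqs):
    """Mean for a frequency table: sum of f*x divided by the sum of f."""
    return weighted_mean(values, freqs)


print(sample_mean([6, 18, 24]))  # 16.0
```

The population mean uses the same arithmetic as `sample_mean`; only the interpretation of the data (whole population versus sample) differs.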

Median:
Ungrouped data
Median = $\left(\dfrac{n+1}{2}\right)$th term, where $n$ is odd.
Median = $\dfrac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}$, where $n$ is even.
Grouped data
Median = $l + \dfrac{\frac{n}{2} - cf}{f} \cdot h$, where $l$ = lower limit of median class
$n$ = number of observations
$cf$ = cumulative frequency of the class preceding the median class
$f$ = frequency of median class and $h$ = class size

Mode (ungrouped data): the most frequently occurring value in the data set
Mode (grouped data) = $L + h \cdot \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)}$, where $L$ = lower limit of modal class
$h$ = class size
$f_m$ = frequency of modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
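As a rough sketch, the grouped-data median and mode formulas translate directly into code. The function names and the class values in the example are hypothetical, chosen only to show the arithmetic.

```python
def grouped_median(l, n, cf, f, h):
    """Median = l + ((n/2 - cf) / f) * h for grouped data."""
    return l + ((n / 2 - cf) / f) * h


def grouped_mode(L, h, fm, f1, f2):
    """Mode = L + h * (fm - f1) / ((fm - f1) + (fm - f2)) for grouped data."""
    return L + h * (fm - f1) / ((fm - f1) + (fm - f2))


# Hypothetical table: 50 observations, class width 10, median/modal class 20-30
# with frequency 14, cumulative frequency 18 before it, neighbours 10 and 8.
print(grouped_median(l=20, n=50, cf=18, f=14, h=10))  # 25.0
print(grouped_mode(L=20, h=10, fm=14, f1=10, f2=8))   # 24.0
```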

Range = Maximum – Minimum

Mid range = $\dfrac{\text{Maximum} + \text{Minimum}}{2}$

Class size = $\dfrac{\text{Highest value} - \text{Lowest value}}{\text{number of classes}}$

Class midpoint = $\dfrac{\text{Upper limit} + \text{Lower limit}}{2}$
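Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Limits for unusual data: below $\mu - 2\sigma$ and above $\mu + 2\sigma$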
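A minimal Python sketch of the variance and standard deviation formulas follows, cross-checked against the standard library's `statistics` module. The helper names are illustrative, and the marks are the eight test scores used in this sheet's worked examples.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def unusual_limits(mean, sd):
    """Values outside mean - 2*sd .. mean + 2*sd are flagged as unusual."""
    return mean - 2 * sd, mean + 2 * sd


marks = [45, 57, 63, 44, 33, 87, 82, 95]
s2 = sample_variance(marks)
s = math.sqrt(s2)  # the standard deviation is the square root of the variance

# The standard library implements the same definitions.
assert math.isclose(s2, statistics.variance(marks))
assert math.isclose(population_variance(marks), statistics.pvariance(marks))
print(round(s2, 2), round(s, 2))
```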

Empirical Rule:
About 68% = $\mu - \sigma$ to $\mu + \sigma$
About 95% = $\mu - 2\sigma$ to $\mu + 2\sigma$
About 99.7% = $\mu - 3\sigma$ to $\mu + 3\sigma$

Quartiles (Q):
First Quartile ($Q_1$) = $\left(\dfrac{n+1}{4}\right)$th term
Second Quartile ($Q_2$) = $\left(\dfrac{n+1}{2}\right)$th term
Third Quartile ($Q_3$) = $\left(\dfrac{3(n+1)}{4}\right)$th term
Interquartile Range: IQR = $Q_3 - Q_1$
Limits: Lower limit = $Q_1 - 1.5 \times \text{IQR}$
Upper limit = $Q_3 + 1.5 \times \text{IQR}$

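Sample coefficient of variation: $CV = \dfrac{s}{\bar{x}} \times 100\%$
Population coefficient of variation: $CV = \dfrac{\sigma}{\mu} \times 100\%$
Sample standard deviation for frequency table: $s = \sqrt{\dfrac{n \sum (f \cdot x^2) - \left(\sum (f \cdot x)\right)^2}{n(n-1)}}$
Sample z-score: $z = \dfrac{x - \bar{x}}{s}$
Population z-score: $z = \dfrac{x - \mu}{\sigma}$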

### Probability:

$P(A) = \dfrac{\text{number of times A occurred}}{\text{total number of trials}}$
$P(A \text{ or } B) = P(A) + P(B)$, if A and B are mutually exclusive events.
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, if A and B are not mutually exclusive events.
$P(A \text{ and } B) = P(A) \cdot P(B)$, if A and B are independent.
$P(A \text{ and } B) = P(A) \cdot P(B|A)$, if A and B are dependent.
Conditional probability: $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
Bayes' Theorem: $P(A|B) = \dfrac{P(B|A) \cdot P(A)}{P(B)}$
Permutation: ${}_nP_r = \dfrac{n!}{(n-r)!}$
Combination: ${}_nC_r = \dfrac{n!}{r!\,(n-r)!}$
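The counting formulas and Bayes' theorem can be sketched as below. The function names are illustrative. `bayes_posterior` assumes the hypotheses are mutually exclusive and exhaustive, so that the denominator P(B) can be computed by the law of total probability; that assumption is not stated in the formula list itself. The example figures are the supplier shares and defect rates from the notebook problem later in this sheet.

```python
import math


def n_p_r(n, r):
    """Permutations: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


def n_c_r(n, r):
    """Combinations: n! / (r! * (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


def bayes_posterior(priors, likelihoods, index):
    """P(H_i | D) given priors P(H_j) and likelihoods P(D | H_j).

    Assumes the hypotheses H_j are mutually exclusive and exhaustive.
    """
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(D)
    return priors[index] * likelihoods[index] / evidence


print(n_p_r(4, 3), n_c_r(30, 3))
print(round(bayes_posterior([0.40, 0.25, 0.35], [0.02, 0.05, 0.04], 0), 3))
```

Python 3.8 and later also ship `math.perm` and `math.comb`, which compute the same counts.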

### Discrete Probability Distributions:

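Mean, $\mu = \sum \left(x \cdot P(x)\right)$
Variance, $\sigma^2 = \sum \left(x^2 \cdot P(x)\right) - \mu^2$
Standard deviation, $\sigma = \sqrt{\sigma^2} = \sqrt{\sum \left(x^2 \cdot P(x)\right) - \mu^2}$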
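To see the discrete-distribution formulas in action, here is a small sketch that represents a distribution as a dictionary mapping each value x to P(x). The fair six-sided die is an assumed example, not taken from this sheet, and `Fraction` is used only to keep the arithmetic exact.

```python
import math
from fractions import Fraction


def discrete_mean(dist):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def discrete_variance(dist):
    """sigma^2 = sum of x^2 * P(x), minus mu^2."""
    mu = discrete_mean(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2


# Assumed example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = discrete_mean(die)           # 7/2
var = discrete_variance(die)      # 35/12
sd = math.sqrt(var)               # square root of the variance
print(mu, var, round(sd, 4))
```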

Binomial Distributions
$n$ = number of trials
$x$ = number of successes
$p$ = probability of success
$q$ = probability of failure, $q = 1 - p$

Binomial probability distribution: $P(x) = {}_nC_x \cdot p^x q^{n-x}$
Mean, $\mu = np$
Variance, $\sigma^2 = npq$
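A short sketch of the binomial probability formula, taking the probability of failure as q = 1 − p. The function name and the example values (n = 4, p = 0.5) are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 4, 0.5
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probabilities))  # should match n * p
print(binomial_pmf(2, n, p))  # 0.375
```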

Poisson Distributions:
$x$ = number of successes
$\lambda$ = mean number of successes (over a given interval)
Poisson probability distribution, $P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
Mean, $\mu = \lambda$
Variance, $\sigma^2 = \lambda$
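The Poisson probability can be sketched the same way. Here `lam` is an assumed name for the mean number of successes per interval, and the rate of 2 used in the example is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!, where lam is the mean count per interval."""
    return lam ** x * exp(-lam) / factorial(x)


def poisson_at_least(k, lam):
    """P(X >= k), computed as 1 minus the probabilities of 0 .. k-1."""
    return 1 - sum(poisson_pmf(x, lam) for x in range(k))


# Hypothetical rate: on average 2 events per interval.
print(round(poisson_pmf(0, 2), 4))
print(round(poisson_at_least(3, 2), 4))
```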

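Normal Distributions:
Score: $x = z\sigma + \mu$
Z-score: $z = \dfrac{x - \mu}{\sigma}$
Mean of $\bar{x}$ distribution: $\mu_{\bar{x}} = \mu$
Standard deviation of $\bar{x}$ distribution: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Standard score for $\bar{x}$: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Confidence Interval: C.I. = $\bar{x} \pm z \cdot \dfrac{\sigma}{\sqrt{n}}$
for proportions ($p$):
C.I. = $p \pm z_{\alpha/2} \cdot \sqrt{\dfrac{p(1-p)}{n}}$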

Chi-Square ($\chi^2$): $\chi^2 = \sum \dfrac{(O - E)^2}{E}$; $E = \dfrac{(\text{row total}) \times (\text{column total})}{\text{sample size}}$
Degrees of freedom, d.f. = $(R-1)(C-1)$ (Test of independence)
Degrees of freedom, d.f. = (no. of categories − 1) (Goodness of fit)
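A minimal sketch of the chi-square statistic for a test of independence, computing each expected count from its row and column totals. The function name and the 2×2 table of observed counts are hypothetical.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
    return chi2, df


# Hypothetical observed counts.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 4), df)
```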

### Regression and Correlation:

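Correlation = $\dfrac{\text{cov}(x,y)}{\sigma_x \sigma_y} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
Correlation, $r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$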

r = -1 to 1
-1 for negative correlation
0 for no correlation
1 for positive correlation

Coefficient of determination = $r^2$
Standard error of estimate, $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n-2}}$
or, $s_e = \sqrt{\dfrac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n-2}}$

Regression Line (Least-Squares Line or Line of Best Fit)
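$\hat{y} = b_0 + b_1 x$, where $b_0$ = y-intercept
$b_1$ = slope
$b_1 = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$ or, $b_1 = r \dfrac{s_y}{s_x}$
and,
$b_0 = \bar{y} - b_1 \bar{x}$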
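The correlation and least-squares formulas above can be sketched as follows. The function names and the four data points are hypothetical; the points lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
import math


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y-hat = b0 + b1 * x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    b0 = sy / n - b1 * sx / n                       # intercept: y-bar - b1 * x-bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
b0, b1 = least_squares(xs, ys)
r = pearson_r(xs, ys)
print(b0, b1, round(r, 6))
```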

Now let’s solve a few questions using the above discussed formulas:

Let’s try a problem using mean.

As an example, to find the mean of 6, 18, and 24, you would first add them together.

6 + 18 + 24 = 48

Then, divide by how many numbers in the list (3).

48 / 3 = 16

The mean is 16.

Statistics Question

Sample Mean:

Calculate the sample mean of 8 students’ maths marks on their weekly test.

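| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Marks scored ($x$) | 45 | 57 | 63 | 44 | 33 | 87 | 82 | 95 |

Solution: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{45 + 57 + 63 + 44 + 33 + 87 + 82 + 95}{8} = \dfrac{506}{8} = 63.25$

Thus, the sample mean of the marks scored by students is 63.25.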

Population mean:

Find the population mean of the given data

73,84,71,62,67,69,88 and 90?

Solution:

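Total population ($N$) = 8

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| $x$ | 73 | 84 | 71 | 62 | 67 | 69 | 88 | 90 |

$\mu = \dfrac{\sum x}{N} = \dfrac{604}{8} = 75.5$

Thus, the population mean is 75.5.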

Weighted mean:

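Men make up 65 percent of the members at a fitness club, while women make up 35 percent. What is the average age of all the members if the average age of the men is 30 and the average age of the women is 40?

Solution: $\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w} = \dfrac{0.65 \times 30 + 0.35 \times 40}{0.65 + 0.35} = 19.5 + 14 = 33.5$

Thus, the average age of all the members is 33.5 years.

Mean for frequency table: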

The Anand Vihar colony’s children’s ages have been recorded. What is the average age of the colony’s children?

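| Age (years) | Frequency |
|---|---|
| 12 | 4 |
| 13 | 1 |
| 14 | 5 |
| 15 | 3 |
| 17 | 8 |

Solution: $\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f} = \dfrac{12 \times 4 + 13 \times 1 + 14 \times 5 + 15 \times 3 + 17 \times 8}{4 + 1 + 5 + 3 + 8} = \dfrac{312}{21} \approx 14.86$

Thus, the average age of the colony’s children is about 14.86 years.

Find the mean, median, mode, range, maximum, and minimum of the following data.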

11,19,21,10,15,10,10,16,15,14

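Solution: Mean = 141 / 10 = 14.1

Sorted data: 10, 10, 10, 11, 14, 15, 15, 16, 19, 21, so Median = (14 + 15) / 2 = 14.5

Mode = the most frequently occurring value in the data set = 10

Maximum = 21

Minimum = 10

Range = Maximum – Minimum = 21 – 10 = 11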

### Sample variance & Sample standard deviation:

Calculate the sample variance and standard deviation of the sample of 8 students’ maths marks on their weekly test.

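| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Marks scored ($x$) | 45 | 57 | 63 | 44 | 33 | 87 | 82 | 95 |

Solution: With $\bar{x} = 63.25$, the sum of squared deviations is $\sum (x - \bar{x})^2 = 3581.5$, so

$s^2 = \dfrac{3581.5}{8 - 1} \approx 511.64$ and $s = \sqrt{511.64} \approx 22.62$

### Population variance and population standard deviation: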

Find the population variance and population standard deviation of the given data?

73,84,71,62,67,69,88 and 90?

Solution:

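| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| $x$ | 73 | 84 | 71 | 62 | 67 | 69 | 88 | 90 |

With $\mu = 75.5$, the sum of squared deviations is $\sum (x - \mu)^2 = 762$, so

$\sigma^2 = \dfrac{762}{8} = 95.25$ and $\sigma = \sqrt{95.25} \approx 9.76$

### Empirical Rule:

What is the probability that a randomly chosen spot will have a snow depth between 2.25 and 2.75 inches if the depth of the snow in my yard is normally distributed, with μ = 2.5″ and σ = 0.25″?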

Solution:

Using the empirical rule:

µ = 2.5″

σ = 0.25″

µ ± σ = 2.5″ ± 0.25″

µ − σ = 2.5″ − 0.25″ = 2.25″, and about 34% of the area under the curve lies between µ − σ and µ.

µ + σ = 2.5″ + 0.25″ = 2.75″, and about 34% of the area under the curve lies between µ and µ + σ.

So the area under the curve between 2.25″ and 2.75″ is 34% + 34% = 68%.

The probability is therefore about 0.68.
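As a quick check, the 68% figure from the empirical rule can be compared with the exact normal probability using `statistics.NormalDist` from the Python standard library. The variable names are illustrative.

```python
from statistics import NormalDist

# Snow depth modelled as normal with mean 2.5 inches and sd 0.25 inches.
snow = NormalDist(mu=2.5, sigma=0.25)

# Exact probability of a depth between mu - sigma and mu + sigma.
p_within_one_sd = snow.cdf(2.75) - snow.cdf(2.25)
print(round(p_within_one_sd, 4))
```

The exact value is slightly above the rounded 68% that the empirical rule quotes.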

What proportion of the values are less than -145, given a standard deviation of 113 and a mean of 81?

Solution:

µ = 81

σ = 113

µ − 2σ = 81 − 2 × 113 = −145

So, the area under the curve below −145 = (100 − 50 − 34 − 13.5)%

= (100 − 97.5)% = 2.5%

About 2.5% of the values are less than −145.

Quartile, Interquartile range and also its IQR limits

Find each quartile, interquartile range and also its IQR limits of the given data?

25,32,49,21,37,43,27,45,31

Solution:

Rearranging the data:

21,25,27,31,32,37,43,45,49

n= 9

First Quartile(Q1) = ((n + 1)/4)th term

= ((9+1)/4) term = 2.5 th term

i.e., mean value of 2nd and 3rd term

So, Q1 = (25+27)/2 = 52/2 = 26

Second Quartile (Q2) = ((n + 1)/2)th term = ((9 + 1)/2)th term = 5th term

Q2 (Median) = 32

Third Quartile (Q3) = (3(n + 1)/4)th term

= (3(9+1)/4) term = 7.5 th term

i.e., mean of 7th and 8th term

So, Q3 = (43+45)/2 = 44

Interquartile range, IQR = Q3 – Q1 = 44 – 26 = 18

Lower limit = Q1 – 1.5 × IQR = 26 – 1.5 × 18 = –1

Upper limit = Q3 + 1.5 × IQR = 44 + 1.5 × 18 = 71
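The quartile steps for this data set can be sketched in Python. The `quartile` helper is illustrative: it uses the (k(n + 1)/4)th-term rule and interpolates linearly when the position is fractional, which is an assumption that matches the averaging of neighbouring terms used above for position 2.5. The limits follow the conventional fences, Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.

```python
def quartile(sorted_values, k):
    """k-th quartile via the (k(n + 1)/4)-th term, interpolating between terms."""
    n = len(sorted_values)
    position = k * (n + 1) / 4   # 1-based position, possibly fractional
    lower = int(position)        # index of the term at or just below the position
    fraction = position - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


data = sorted([25, 32, 49, 21, 37, 43, 27, 45, 31])
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
print(q1, q2, q3, iqr, lower_limit, upper_limit)
```

Any value below the lower limit or above the upper limit would be flagged as a potential outlier; this data set has none.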

Coefficient of Variation:

Find the coefficient of variation, if standard deviation is 6.7 and mean is 13.8?

Solution:

Standard deviation, σ = 6.7

Mean, µ = 13.8

Therefore, CV = (σ / µ) × 100% = (6.7 / 13.8) × 100% ≈ 48.6%. So, the coefficient of variation is about 48.6%.

z-score:

At 60 airports within a region, the temperature is recorded. The average temperature is 67°F, with a 5°F standard deviation. What is the z-score for 68°F?

Solution: $z = \dfrac{x - \mu}{\sigma} = \dfrac{68 - 67}{5} = 0.2$

Thus, 0.2 is the z-score for 68°F.

What is the mean number of pages of the books in the library, given a standard deviation of 100 pages and a z-score of 0.56 for an 80-page book?

Solution: Rearranging $z = \dfrac{x - \mu}{\sigma}$ gives $\mu = x - z\sigma = 80 - 0.56 \times 100 = 24$. Thus the mean length of the books is 24 pages.

Probability:

1. Sixteen people sit around a circular table. What is the probability that two particular people sit next to each other?

Solution:

Total people = 16

Number of ways 16 people can sit around the table = 15!

Number of ways the two people sit together = 14! × 2!

Therefore, the probability is $\dfrac{14! \times 2!}{15!} = \dfrac{2}{15}$.

A, B, and C are given a problem, with probabilities 2/7, 4/7, and 4/9 of solving it, respectively. How likely is it that the problem will be solved?

Solution: Assuming the three work independently, the problem goes unsolved only if all three fail, so P(solved) = 1 − (5/7)(3/7)(5/9) = 1 − 25/147 = 122/147 ≈ 0.83.

2. Tickets with numbers ranging from 21 to 30 are mixed together, and a ticket is chosen at random. What is the chance that the ticket drawn contains a number that is a multiple of three or five?

Solution:

Total number of outcomes = 10

Outcomes that are multiples of 3 or 5 = {21, 24, 25, 27, 30}, which is 5 outcomes

Probability of drawing a multiple of 3 or 5 = 5/10 = 1/2

3. You make a purchase of a specific item. According to the instructions, the product’s lifetime T, which is defined as the number of years the device functions well before breaking down, satisfies, I bought the product and used it for two years without any issues. What are the chances that it would fail in the third year?

Solution:

Let A be the event that the purchased product fails in the third year. Let B be the event that it does not fail during the first two years.

We have to find P(A|B).

4. A, B, and C are three events, with A and C independent, B and C independent, and A and B disjoint. If P(A ∪ C) = 2/3, P(B ∪ C) = 3/4 and P(A ∪ B ∪ C) = 11/12, find P(A), P(B), and P(C).

Solution:

Let S be the sample space.

5. Firms A, B, and C supply 40%, 25%, and 35% of a school’s notebooks, respectively. In the past, these firms’ notebooks have been found to be defective in 2%, 5%, and 4% of cases, respectively. What is the probability that a notebook was supplied by A, given that it was found to be defective?

Solution: Let A, B, and C be the events that the notebook was supplied by firm A, B, or C, respectively. Let D be the event that the notebook is defective.

Then,

P(A) = 0.4, P(B) = 0.25 and P(C) = 0.35

Also, P(D|A) = 0.02, P(D|B) = 0.05, P(D|C) = 0.04

By Bayes' theorem, $P(A|D) = \dfrac{P(D|A)P(A)}{P(D|A)P(A) + P(D|B)P(B) + P(D|C)P(C)} = \dfrac{0.008}{0.008 + 0.0125 + 0.014} = \dfrac{0.008}{0.0345} \approx 0.232$

6. Without repeating digits, find the sum of all four-digit numbers that can be formed from the digits 1, 3, 5, 7, and 9.

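Solution: The digits are 1, 3, 5, 7, and 9.

The sum of all $r$-digit numbers formed from $n$ distinct non-zero digits is ${}_{n-1}P_{r-1} \times (\text{sum of all } n \text{ digits}) \times (111\ldots1,\ r \text{ times})$.

Here $n$ denotes the number of non-zero digits.

$n = 5$ and $r = 4$ in this case.

${}_4P_3 \times (1 + 3 + 5 + 7 + 9) \times 1111 = 24 \times 25 \times 1111 = 666600$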

Thus, 666600 is the sum of four-digit numbers.
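The result can be double-checked by brute force: generate every four-digit arrangement of the digits without repetition and add them up. This sketch uses `itertools.permutations` and `math.perm` from the Python standard library; the variable names are illustrative.

```python
from itertools import permutations
from math import perm

digits = [1, 3, 5, 7, 9]

# Every ordered selection of 4 distinct digits, read as a four-digit number.
numbers = [int("".join(map(str, p))) for p in permutations(digits, 4)]
brute_force_sum = sum(numbers)

# The closed-form expression: (n-1)P(r-1) * (sum of digits) * 1111.
formula_sum = perm(4, 3) * sum(digits) * 1111

print(len(numbers), brute_force_sum, formula_sum)  # 120 666600 666600
```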

1. If a plane has 30 points, how many triangles may these points create if 6 of them are collinear?

Solution:

Number of points in plane n = 30.

Number of collinear points m = 6.

Number of triangles formed by joining n points of which m are collinear

= ${}_nC_3 - {}_mC_3$

Therefore the number of triangles = ${}_{30}C_3 - {}_6C_3$ = 4060 – 20 = 4040.

Discrete Probability Distributions:

1. If E(X) = 6 and E(Y) = 8. Find E(2Y – X)?

Solution: E(2Y – X) = E(2Y) – E(X)

= 2E(Y) – E(X)

= 2*8 – 6 =16-6=10

Thus, E(2Y – X) = 10.

2. What is the Standard Deviation of a Binomial Distribution with p, q, and n as the chance of success, failure, and number of trials, respectively?

Solution: Variance of Binomial distribution = npq

So, Standard deviation = √Variance =√npq.

3. Calculate the likelihood of at least three earthquakes occurring in the next two weeks?

4. A filling machine is used by a bottling company to fill plastic bottles with orange juice. The bottles are designed to hold 200 ml of liquid. The contents, in reality, follow a normal distribution with a mean of 198 ml and a standard deviation of 2 ml. What are the chances that a single bottle contains more than 201 ml?

Solution: z = (201 − 198) / 2 = 1.5, so P(X > 201) = P(Z > 1.5) ≈ 0.0668.

5. Consider a collection of 18 random samples drawn from a standard normal distribution. We square each sample and add the squares together. How many degrees of freedom does the resulting Chi-Square distribution have?

Solution: The number of standard normal deviates (samples) that are squared and summed equals the degrees of freedom of the Chi-Square distribution.

There are a total of 18 standard normal deviates in this example.

As a result, the Chi-Square distribution has 18 degrees of freedom.

1. In a simple random sample of 56 people, the sample mean was 22.5 and the sample standard deviation was 4.4. (Round your answers to one decimal place.)

a. Construct a 90 percent confidence interval for the population mean.

b. Construct a 95 percent confidence interval for the population mean.

c. Construct a 99 percent confidence interval for the population mean.

Solution: Given,

Mean, $\bar{x}$ = 22.5, S.D., $s$ = 4.4, and $n$ = 56, so $s/\sqrt{n} = 4.4/\sqrt{56} \approx 0.588$

a. 90% confidence interval

The z-value for α/2 = 0.05 is 1.64, giving a margin of 1.64 × 0.588 ≈ 0.96. So, the 90% confidence interval is 21.5 < µ < 23.5

b. 95% confidence interval

The z-value for α/2 = 0.025 is 1.96, giving a margin of 1.96 × 0.588 ≈ 1.15. So, the 95% confidence interval is 21.3 < µ < 23.7

c. 99% confidence interval

The z-value for α/2 = 0.005 is 2.58, giving a margin of 2.58 × 0.588 ≈ 1.52. So, the 99% confidence interval is 21.0 < µ < 24.0
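The three intervals can be reproduced with a short sketch. It follows the z-based approach used above, with the sample standard deviation standing in for σ, and takes the critical value from `statistics.NormalDist().inv_cdf` rather than a table. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level):
    """z-based interval: mean +/- z * sd / sqrt(n) for the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value for alpha / 2
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(22.5, 4.4, 56, level)
    print(f"{level:.0%}: {low:.1f} < mu < {high:.1f}")
```

To one decimal place this gives 21.5 to 23.5, 21.3 to 23.7, and 21.0 to 24.0.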

Regression and Correlation:

3X+2Y=26 and 6X+3Y=30 are the two regression lines.What is the correlation coefficient?

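Solution: Let the regression equation of Y on X be

3X + 2Y = 26

2Y = -3X + 26

Y = -1.5X + 13, so the regression coefficient of Y on X is $b_{yx} = -1.5$.

Let the regression equation of X on Y be 6X + 3Y = 30

6X = -3Y + 30

X = -0.5Y + 5, so the regression coefficient of X on Y is $b_{xy} = -0.5$.

$r = -\sqrt{b_{yx} \cdot b_{xy}} = -\sqrt{1.5 \times 0.5} = -\sqrt{0.75}$ [since both regression coefficients are negative, we take r as negative]

= -0.866025

Thus, the correlation coefficient is approximately -0.866, which implies that X and Y are strongly negatively correlated.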
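The last step relies on the fact that r is the geometric mean of the two regression coefficients, carrying their common sign. A minimal sketch, with an illustrative function name:

```python
import math


def correlation_from_slopes(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), with the sign shared by both coefficients."""
    product = b_yx * b_xy
    if product < 0:
        # Valid regression coefficients always share the same sign.
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r


print(round(correlation_from_slopes(-1.5, -0.5), 6))  # -0.866025
```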