Normal Distributions

… Understanding that a distribution of measurements (‘data’) often follows what we call a ‘normal distribution’. About 68% of all the measurements lie within one standard deviation of the mean, so the standard deviation is a measure of the spread of the data (like the inter-quartile range).

… Normal distribution for a continuous random variable, X~N(mean,variance)

… The total area under a normal distribution is always 1.

… When comparing distributions, always comment on the means and the standard deviations (the spread of the data).

… The number of standard deviations from the mean is also called the “Standard Score”, “sigma” or “z-score”.

… We can take any Normal Distribution and convert any x value to a z value on the Standard Normal Distribution.

This is done by converting any value to what is called a Standard Score (or “z-score”):

- first subtract the mean,
- then divide by the Standard Deviation

Doing this is called “Standardizing”:
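The two standardising steps can be sketched in Python; the scores below are hypothetical, and `NormalDist.zscore` needs Python 3.9+:

```python
from statistics import NormalDist

# Hypothetical data: test scores with mean 70 and standard deviation 8.
mu, sigma = 70, 8
x = 82

# Standardise: first subtract the mean, then divide by the standard deviation.
z = (x - mu) / sigma
print(z)  # 1.5

# The statistics module computes the same z-score directly (Python 3.9+).
print(NormalDist(mu, sigma).zscore(x))  # 1.5
```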

… Draw and label two distributions, one above the other; one for the random variable X, and the other for the standardised score, Z.

… Calculating a Standardised Z score = (value – mean)/standard deviation

… Standardised Z scores help us to compare two different measurements, e.g. is machine A more accurate than machine B at making bottle tops?

… We can also use a Z-score to look up the area (probability) under the bell curve to the left of the z-score. To do this we use the ‘standard normal distribution table’ or your calculator (much easier!)

… Calculating a z score and then finding cumulative probabilities P(Z<z) from the normal distribution tables / calculator.

… If you are given a probability, you can use the inverse normal distribution on your calculator to find the standardised z score
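A sketch of both look-ups using Python’s `statistics.NormalDist` in place of tables (1.96 is the familiar 97.5% point of the standard normal):

```python
from statistics import NormalDist

Z = NormalDist()  # the standard normal distribution, Z ~ N(0, 1)

# Forward: the cumulative probability (area to the left) of a z-score.
p = Z.cdf(1.96)
print(round(p, 3))  # 0.975

# Inverse: given a probability, recover the standardised z-score.
z = Z.inv_cdf(0.975)
print(round(z, 2))  # 1.96
```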

… Usually answers are required to either 3 or 4 significant figures. Make sure that you use at least 4 sig figs during your calculations to avoid rounding errors.

… Note that P(X<x) is exactly the same thing as P(X≤x) because the variable X is continuous

… It’s worth noting that the “Normal PD” calculator function provides the probability DENSITY of the distribution – the height of the Normal distribution graph at a particular value of the variable. This can be useful for sketching the graph and for studying its properties. At A-level, we don’t really use this function.

Normal Distribution of the Sample Means

… If many means are calculated from a distribution using random samples of n data values for each mean, then the distribution of the sample means will be normal (exactly so if the population is normal, and approximately so otherwise, by the Central Limit Theorem).

… The Distribution of the sample means is a much tighter distribution (with a much smaller σ) than the original.

So if:

X-bar ~ N(population mean, σ²/n)

where n is the sample size (the number of data values used to calculate each mean).

Then…

The new standard deviation of the sample means distribution is given by √(σ²/n)
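For example (hypothetical numbers), the standard deviation of the sample-means distribution, often called the standard error:

```python
import math

# Hypothetical population: sigma = 12, with samples of size n = 36.
sigma, n = 12, 36

# Standard deviation of the distribution of sample means = sqrt(sigma^2 / n).
se = math.sqrt(sigma**2 / n)
print(se)  # 2.0
```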

… If you don’t know the population variance, σ², then you can use the sample variance s² as an unbiased estimate of σ², provided n is large (>30)

If n is not large, then the best unbiased estimate of σ² is given by:

σ² = n/(n-1) × s²
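A minimal sketch of this correction, with a hypothetical small sample; note that Python’s `statistics.variance` applies the n/(n−1) correction (Bessel’s correction) automatically, while `statistics.pvariance` divides by n:

```python
import statistics

# Hypothetical small sample (n is not large).
sample = [4.1, 5.0, 4.8, 5.3, 4.6]
n = len(sample)

s2 = statistics.pvariance(sample)    # 'divide by n' variance of the sample data
sigma2_hat = n / (n - 1) * s2        # best unbiased estimate of population variance

# statistics.variance applies the n/(n-1) correction directly.
print(abs(sigma2_hat - statistics.variance(sample)) < 1e-9)  # True
```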

… The Central Limit Theorem states:

If the distribution of a random variable X is NOT normal, then the distribution of sample means (Xbar) will follow an approximately normal distribution:

Xbar ~ N(μ , σ²/n) … providing n is large (>30)
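A quick simulation sketch of the Central Limit Theorem, using a hypothetical uniform population (so X itself is certainly not normal):

```python
import random
import statistics

random.seed(1)  # reproducible demo

# Hypothetical non-normal population: uniform on [0, 10),
# so mu = 5 and sigma^2 = 100/12.
n = 40          # sample size (> 30)
trials = 5000   # number of sample means to generate

means = [statistics.fmean(random.uniform(0, 10) for _ in range(n))
         for _ in range(trials)]

# The sample means cluster around mu with variance close to sigma^2 / n.
print(round(statistics.fmean(means), 1))      # close to mu = 5
print(round(statistics.pvariance(means), 2))  # close to (100/12)/40 ≈ 0.21
```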

Poisson distributions

… The conditions necessary to model a discrete random variable as Poisson distribution are:

Events must occur singly and at random in a given interval of space or time.

Events must be independent of each other.

Events occur at a constant rate, so that the number of occurrences in the interval is proportional to the length of interval.

Lambda, the mean number of occurrences in the given interval is known.

… If X~Po(λ), then

The mean of X = population mean = E(X) = λ

and

Var(X) = σ^2 = λ

P(X = r) = e^(-λ) (λ^r) / r!
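The formula translates directly into code; the call-centre numbers below are hypothetical:

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) = e^(-lam) * lam^r / r!  for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

# Hypothetical example: calls arrive at an average rate of 3 per hour.
# P(exactly 2 calls in an hour):
print(round(poisson_pmf(2, 3), 3))  # 0.224
```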

… Some ‘trigger words’ that will help you to identify a Poisson distribution question include: “rate”, “average number of occurrences” etc.

Binomial Distributions (success or fail over n trials)

… What are the 4 conditions necessary to model the probability distribution of a random variable X with a Binomial Distribution?

… The 4 conditions necessary to model the probability distribution of a random variable X with a Binomial Distribution are:

- A fixed number of trials
- Each trial should be a success or a failure
- The trials are independent
- The probability of success, p, at each trial is constant.

… X~B(n , p) means that the variable X is modelled as a binomial distribution with number of trials n, and probability of success in each trial p:

The equation for P(X = x) is given by:

P(X = x) = (number of arrangements of x successes in n trials) × (probability of success)^x × (probability of fail)^(n-x)

P(X = x) = nCx (p^x) ( q^(n-x) ) where q = 1-p

Note that the combination formula (nCr) is used here instead of the permutations formula because the successes are all similar and the failures are all similar, so we shouldn’t count S1 S2 as being different from S2 S1.

So if P(hitting bullseye) = 0.1 and we have 5 throws, then what is the probability of hitting the bull twice?

So we have

H H M M M

But this could be in any order, so what are the number of arrangements of 2 hits and 3 misses?

The number of arrangements works out to be 5C2 (or 5C3 because Pascal’s triangle is symmetric).

Why?

If the ‘objects’ were all different, we would have 5! arrangements (permutations)

However, 3 of them are the same (misses) and 2 of them are also the same (hits). So we must divide 5! by the number of ways of arranging the 3 identical misses (as we must not count these) and also by the number of ways of arranging the 2 identical hits (as we must not count these either):

No. of arrangements = 5!/(3! 2!) = 5C2 = 10 arrangements.

So P(hitting the bull twice in 5 throws)

P(X=2) = 5C2 × 0.1^2 × 0.9^3 = 0.0729 = 7.3% chance.
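The bullseye calculation can be checked in code using the binomial formula above (`math.comb` gives nCx):

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1-p)^(n-x)  for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# The bullseye example: 5 throws, P(hit) = 0.1, exactly 2 hits.
print(round(binomial_pmf(2, 5, 0.1), 4))  # 0.0729
```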

… If you were looking for P(X >= 8) using the cumulative probability tables, how would you first modify this condition?

Continuous Random Variables

The Probability density function, f(x)

… A pdf is represented by f(x) and tells us the value of the ‘probability density’ as used in histograms, but for a continuous random variable, X.

… What are two conditions that must be satisfied for a pdf to be valid?

… Notation of a pdf using curly brackets, for example:

f(x) = { kx,   1 < x < 3
       { 0,    otherwise

… Integrating a pdf between key limits and setting the result equal to 1 (the area) enables you to find values of unknown constants such as k.
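Using the example pdf above, f(x) = kx for 1 < x < 3: the area is the integral of kx from 1 to 3, which is k(3² − 1²)/2 = 4k, and setting 4k = 1 gives k = 1/4. A quick numerical check:

```python
# For the example pdf f(x) = kx on 1 < x < 3:
# area = integral of kx from 1 to 3 = k * (3**2 - 1**2) / 2 = 4k,
# and setting 4k = 1 gives k = 1/4.
k = 1 / 4

# Numerical check with a midpoint-rule integration over (1, 3).
N = 100_000
width = (3 - 1) / N
area = sum(k * (1 + (i + 0.5) * width) * width for i in range(N))
print(round(area, 6))  # 1.0
```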

The cumulative distribution function, F(x)

… F(x) = P(X <= x) = integral of f(x) between negative infinity and x

… The cdf is exactly like the cumulative probability tables used for normal and binomial distributions.

… Note that f(t) dt is sometimes used to distinguish between the limit ‘x’, and the variable being integrated.

… To find F(x), it can be easier to integrate f(x) as an indefinite integral with a constant of integration, c.

To find c, use the fact that F = 1 at the upper end of the pdf’s range (or F = 0 at the lower end). If the pdf is split into different sections, you can instead match F to a known probability at the boundary x value.

… Uniform distributions – do draw yourself a graph (rectangular box!) so you can find the height of the function. Remember that the area of the box must equal…?

… There are several useful equations in your formulae booklet – well worth becoming familiar with, for example:

Var(X) = (1/12)(b-a)^2

Normal approximations

… A binomial distribution X~B(n,p) can be approximated to a normal distribution if n is large (>20) and p is close to 0.5 (so that the binomial distribution is approximately symmetric).

Alternatively, a common rule of thumb is that the conditions np > 10 AND n(1-p) > 10 must be met.

So X can then be approximated as:

Y ~ N ( np, np(1-p) ) where np is the mean and np(1-p) is the variance.

… A Poisson distribution where X ~ Po(λ) can be approximated to a normal distribution if the mean, λ, is large (>10) and a continuity correction is performed. So we get:

Y ~ N(λ,λ) … because the variance of a Poisson distribution = λ (the mean).

… A binomial distribution can be modelled as a Poisson distribution if n is large (>50) and p is small (<0.1), so

X~B(n,p) is then approximately equal to Po(np)

The reason for this is because variance = np(1-p) for a binomial distribution, so if p is small then the variance is approximately equal to np, the mean (lambda), which is the condition needed for a discrete random variable to be modelled as a Poisson distribution.

… A continuity correction is required when approximating a discrete random variable,X (such as binomial or Poisson distribution) into a continuous random variable, Y (such as a normal distribution)

If P(X ≤ n), use P(Y < n + 0.5)

If P(X ≥ n), use P(Y > n – 0.5)

Note that P(Y ≤ y) and P(Y < y) mean exactly the same thing for a continuous distribution.
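A sketch of the continuity correction in action, with a hypothetical binomial distribution that meets the approximation conditions:

```python
import math
from statistics import NormalDist

# Hypothetical example: X ~ B(40, 0.5), so np = 20 > 10 and n(1-p) = 20 > 10.
n, p = 40, 0.5
Y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# Exact binomial probability P(X <= 24).
exact = sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(25))

# Normal approximation with the continuity correction: P(X <= 24) ≈ P(Y < 24.5).
approx = Y.cdf(24.5)

print(round(exact, 3), round(approx, 3))  # the two values agree closely
```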

Hypothesis Testing

… A hypothesis is a statement made about a population parameter (for example, a probability or a mean). We can test this parameter by collecting evidence from a sample (a test statistic).

… The null hypothesis (Ho) is the hypothesis we assume to be correct unless evidence shows otherwise. For example:

For a binomial distribution we might test Ho:p = 0.4,

or for a normal distribution, we might test Ho:μ = 15 millimetres

(The Ho null hypothesis will usually contain the ‘equals’ sign)

… Hypothesis tests can be used for any distribution, e.g. binomial, normal or Poisson.

… The alternative hypothesis (H1) tells us what to expect about the value of the population parameter if Ho is rejected. For example,

H1:p > 0.4

or

H1:μ ≠ 15 millimetres

… The critical REGION is the range of values of the random variable X that would lead to rejecting H0.

… The critical VALUES are the critical region boundary values.

… The level of significance (alpha, α) is the ‘testing level’ which defines the size of the critical region(s). Alpha is usually either 1% (very unlikely) or 5% (unlikely). For example:

Reject Ho if P(X ≥ x) ≤ 0.05 for an upper one-tailed test at α = 5%

or

Reject Ho if P(X ≤ x) is ≤ 0.005 or if P(X ≥ x) is ≤ 0.005 for a two-tailed test at α = 1%

… A one-tailed test looks to see if a parameter is lower than some stated value. A one-tailed test could also be looking to see if a parameter is higher than a stated value. So these have a SINGLE critical region and critical value.

… A two-tailed test looks to see if a parameter is either higher OR lower than some proposed value and so has TWO critical regions and TWO critical values.

… Here are some steps to answer hypothesis questions:

- Define the random variable and write down its distribution model, e.g. X ~ B(20,0.2)
- identify the population parameter, either a probability (for a binomial distribution) or a mean (for a normal distribution).
- Write down the null and alternative hypotheses Ho and H1.
- Specify the significance level, alpha.

Then EITHER:

- find P(X ≤ observed value) or P(X ≥ observed value) given that the distribution model (Ho) is true.
- is this probability less than / greater than the significance level?

(Note that if you have a two-tailed test, then check if the probability is less than HALF the significance level).

OR

- Identify the critical value(s) of the test statistic using probability tables or formulae. Note that for binomial distributions the critical value will often fall between two integer values.
- Find out if the observed value of the test statistic falls in the critical region.

It can be useful to note that, for a standardised test statistic, if |observed value| > |critical value| then we would reject Ho

And FINALLY…

- State a conclusion: is there sufficient evidence to reject H0 in favour of H1, or not? Explain this in the context of the problem.

… Note that when stating H0 or H1 that you use population parameters (probability or mean), NOT the test statistic (random variable, e.g. X). For example “H0: p = 0.25”

A hypothesis test example…

A driving instructor claims that 60% of his pupils pass their driving test at the first attempt. In a recent survey of 20 pupils, 14 passed first time. Has the driving instructor understated his success rate? Use a significance level of α = 5%.

Our random variable X = number of students who pass first time

X ~ B(20,p)

Ho:p = 0.6

It’s useful to note that we would expect 0.6 × 20 = 12 people to pass first time.

H1:p > 0.6

We are testing at the α = 5% = 0.05 significance level.

Reject Ho if P( X≥14 ) ≤ 0.05

This is a “one-tailed test”:

P( X≥14 ) = 1 – P( X≤13) = 1 – 0.7499 = 0.2501

0.2501 > 0.05 so this result is NOT significant.

The probability of observing 14 OR MORE pupils who have passed first time (out of 20) is greater than 5%, therefore there is insufficient evidence to suggest that the driving instructor is understating his success rate.

Alternatively, we can identify the critical region by finding the critical value of X for which:

P(X ≥ x) = 0.05

Using the List option in the calculator’s Binomial CD function:

P(X ≤ 15) = 0.9490, so P(X ≥ 16) = 1 – 0.9490 = 0.0510, which is not significant

P(X ≤ 16) = 0.9840, so P(X ≥ 17) = 1 – 0.9840 = 0.0160, which IS significant

And so the critical value is X = 17 and the critical region is X ≥ 17 (when Ho should be rejected).

As the observed value x = 14 < 17, it does not lie in the critical region, and so we should not reject Ho: there is insufficient evidence to suggest that the driving instructor is understating his success rate.
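The driving-instructor test can be reproduced in code by summing binomial probabilities directly:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(k + 1))

# The driving-instructor test: X ~ B(20, 0.6) under Ho, observed x = 14.
n, p0, observed, alpha = 20, 0.6, 14, 0.05

p_value = 1 - binom_cdf(observed - 1, n, p0)   # P(X >= 14)
print(round(p_value, 4))                       # ≈ 0.25, comfortably above 0.05
print("reject Ho" if p_value <= alpha else "do not reject Ho")
```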

Chi-squared – testing for independence

… Chi-squared tests are all about the differences from the expected value.

… A contingency table can show us the observed frequencies of two variables each containing sub-categories, for example:

Type of soil:

Rate of growth | acidic | neutral | alkaline | Totals
slow           |        |         |          |
average        |        |         |          |
fast           |        |         |          |
Totals         |        |         |          |

… The expected values can be found from the probability of two independent events happening (test for independence). Using the row and column totals, we can calculate the expected values for each cell using ratios, assuming that the two variables are INDEPENDENT.

For example, the expected value for the above cell for acidic/slow growth would be:

= (total acidic / grand total ) × total slow

… Expected values could also be obtained from another distribution model, e.g. a Poisson, Binomial or Normal distribution.

… Null Hypothesis, Ho: state that the variables are independent, i.e. they have no effect on each other.

… The alternative hypothesis, H1: there is some association between the variables

… Find the degrees of freedom, ν, using (rows-1) x (columns-1)

… Find the Chi-squared (X²) critical value using tables (significance level, p, and degrees of freedom, ν)

… Find the X² test statistic by summing (O -E)²/E in a table

… Note that there IS a correction for 2 by 2 tables: find the X² statistic using:

(|O – E| – 0.5)² / E

… Do make sure you clearly state Ho, H1, the rejection criterion, the working and finally the conclusion both in maths and in words, for example:

“2.465 < 7.544 (the critical value) which is not significant at the 5% significance level, therefore do not reject Ho. There is no reason to believe that there is an association between plant growth rate and soil type.”

… Each expected value in a contingency table must be > 5 for the Chi-Squared approximation to be valid (i.e. Χ² ~ χ²).
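The expected-value and test-statistic steps above can be sketched with a hypothetical 3×3 table of growth-rate/soil-type counts (the numbers below are made up purely for illustration):

```python
# Hypothetical observed frequencies (rows: slow/average/fast growth,
# columns: acidic/neutral/alkaline soil).
observed = [
    [10, 12,  8],
    [15, 20, 15],
    [ 5,  8, 12],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value for each cell = (row total * column total) / grand total,
# assuming the two variables are independent (Ho).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared test statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

# Degrees of freedom = (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(dof)               # 4
print(round(chi_sq, 3))  # 3.417
```

The statistic would then be compared against the chi-squared critical value for ν = 4 at the chosen significance level.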

Discrete random variables

… The expectation is the mean value.

… Probability distributions.

The expectation is the expected mean of a random variable X:

E(X) = Σ x P(X=x)

To find the expectation E(X) multiply each probability by the corresponding score, then add the results. A table with rows of x and P(X = x) is useful here.

Var(X) = E(X²) – E(X)². Find the expectation of the squares and subtract the square of E(X)… However, note that you may need to correct this by multiplying by n/(n-1) to get the best unbiased estimate of the population variance if your data comes from a sample.
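A minimal sketch of E(X) and Var(X) from a hypothetical probability table:

```python
# Hypothetical probability distribution table for a discrete random variable X.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

ex = sum(x * p for x, p in zip(xs, ps))       # E(X)  = sum of x * P(X=x)
ex2 = sum(x * x * p for x, p in zip(xs, ps))  # E(X^2)
var = ex2 - ex ** 2                           # Var(X) = E(X^2) - E(X)^2

print(round(ex, 2))   # 3.0
print(round(var, 2))  # 1.0
```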

Populations and Samples

Definitions

… A POPULATION is a collection of individual items.

… A SAMPLE is a selection of individual items from a population.

… A FINITE population is one in which each individual member can be given a number.

… An INFINITE population is one in which it is impossible to number each member.

… A sampling UNIT is an individual member of the population.

… A sampling FRAME is a list of sampling units used in practice to represent a population.

… A STATISTIC is a quantity calculated only from the observations in a sample.

… A statistic has a sampling distribution that is made up of all the possible values of the statistic and the probability of each one occurring.

Sampling Distributions

To find the sampling distribution of a statistic such as the population mean or median, follow these steps:

1) Write down a table of the probability distribution of the variable.

For example, let’s say we want to find the sampling distribution of the median, M, for a population given by:

x: 5 10

P(X=x) 0.25 0.75

2) Write out all the possible sample permutations for the given sample size. For example, if we are taking a sample of 3 from the population:

(5,5,5) (5,5,10) (5,10,5) (10,5,5) … median = 5

(10,10,10) (10,10,5) (10,5,10) (5,10,10) … median = 10

3) Define a and b (from the population distribution):

a = P(5) = 0.25 … and… b = P(10) = 0.75

4) Calculate the probabilities of observing each median value (spot the patterns to make the calculations easier where possible!)

P(M = 5) = aaa + aab + aba + baa = (0.25)³ + 3(0.25)²(0.75) = 5/32

P(M = 10) = bbb + bba + bab + abb = (0.75)³ + 3(0.75)²(0.25) = 27/32

5) Write down the sampling distribution for M in table form

m: 5 10

P(M=m) 5/32 27/32
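The five steps above can be checked by enumerating every ordered sample with `itertools.product`:

```python
from itertools import product
from statistics import median

# The worked example above: population with P(X=5) = 0.25, P(X=10) = 0.75,
# and samples of size 3.
probs = {5: 0.25, 10: 0.75}

dist = {}
for sample in product(probs, repeat=3):  # all 8 ordered samples
    m = median(sample)
    p = 1.0
    for value in sample:                 # probability of this ordered sample
        p *= probs[value]
    dist[m] = dist.get(m, 0) + p

print(dist[5])   # 0.15625  (= 5/32)
print(dist[10])  # 0.84375  (= 27/32)
```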