Maths GCSE – Statistics and Probability Notes

Estimating the mean from a Grouped Frequency Table

… For this you will always need to write in two more columns:
the “Mid-value” of each group, and
the “Mid-value × Frequency”.
Then add up the numbers in the last column and divide by the number of people (the total frequency, NOT the number of rows!)
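
The method above can be sketched in a few lines of Python. The class boundaries and frequencies here are made up purely for illustration, and the function name is just a label chosen for this example.

```python
# Made-up grouped data: (lower bound, upper bound, frequency) for each class.
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

def estimate_mean(classes):
    """Estimate the mean as sum(mid-value x frequency) / total frequency."""
    total_frequency = sum(freq for _, _, freq in classes)
    # The extra "mid-value x frequency" column, added up.
    total_mid_times_freq = sum((low + high) / 2 * freq for low, high, freq in classes)
    # Divide by the total frequency, NOT by the number of rows.
    return total_mid_times_freq / total_frequency

estimated_mean = estimate_mean(classes)
print(estimated_mean)
```

Here the last column adds up to 350 and the total frequency is 20, so the estimate is 350 / 20 = 17.5.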

… Finding the class interval which contains the median from a Grouped Frequency Table:

  • Class intervals are the groups usually shown on the left column of the table, for example:
    “20 < x ≤ 30”
  • A class interval is the possible range of values for a certain number (frequency) of people or things.
  • Add up the total number of items (the total frequency)
  • Divide this by 2 and round up if necessary. This is the median (middle) number that you are looking for.
  • Count up the frequency in each class (like you would do for cumulative frequency) and see in which class the median number would lie.
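
As a rough sketch, the steps above could be written in Python like this. The table is made up for illustration and the function name is hypothetical.

```python
# Made-up grouped frequency table: (class interval, frequency).
classes = [
    ("0 < x <= 10", 4),
    ("10 < x <= 20", 9),
    ("20 < x <= 30", 5),
    ("30 < x <= 40", 2),
]

def median_class(classes):
    """Return the class interval containing the median, and the median position."""
    total = sum(freq for _, freq in classes)
    median_position = (total + 1) // 2  # divide by 2 and round up if necessary
    running_total = 0
    for label, freq in classes:
        running_total += freq  # counting up, like cumulative frequency
        if running_total >= median_position:
            return label, median_position

print(median_class(classes))
```

With a total frequency of 20, the median item is the 10th, and the running totals (4, then 13) show that it lies in the second class.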

Cumulative Frequency Graphs

… Plotting a cumulative frequency graph from frequency data by adding an extra “Cumulative Frequency” column. Plot each cumulative frequency point at the UPPER end of each group.

… Note that the beginning of the curve can go to zero on the y axis but only to the lowest number in the bottom class on the x-axis. E.g. If the first class is 15 < h < 25 with a cumulative frequency of 6, then you would plot the first point at (25, 6) with the line starting at (15, 0).

… Using the cumulative frequency curve to find the median, upper quartile and lower quartile.

… Using the cumulative frequency graph to answer questions such as, “how many people scored more (or less) than 70 marks in the test?”. Take care with these! Once you have drawn the appropriate line (up from the x axis and across to the y axis) onto your graph, think carefully about whether you need to count the people above the line, or below the line!

… How the upper quartile and lower quartile can show us how spread out the data is. The distance from the upper quartile to the lower quartile is called the …?… …?…

… Plotting a “box and whisker” plot from a cumulative frequency curve. The ‘whiskers’ represent the maximum and minimum (range) of data.

… When comparing data from a cumulative frequency graph, make sure you have first drawn the box and whisker plots. Then, you can compare:

  • medians
  • interquartile ranges (upper quartile – lower quartile)
  • ranges (largest – smallest)
    Use numerical data wherever possible and then explain what this means in reality. For example, if the IQ range of group A’s tomatoes is larger than B, then “group A’s tomatoes are more variable in size ON AVERAGE, compared to group B”

Note that the Interquartile range describes the spread of data, so a very small IQ range means more consistent results.


Percentiles, Quartiles and Deciles

… A Percentile is the value below which a given percentage of the data falls.
For example, if you are the 4th tallest person in a group of 20 people…
Then 16 people would be below you in height.
(16/20) × 100% = 80%
So you are at the 80th percentile

… Deciles split the data up into 10% chunks. So in the above example you would be at the 8th decile.

… Quartiles split the data up into 25% chunks.

  • The lower quartile is the 25th percentile.
  • The median is at the 50th percentile.
  • The upper quartile is the 75th percentile.

… For small amounts of data, say n values, write the data in order of size from lowest to highest.
Then use (n+1) × percentile (written as a decimal) to find the position at which the percentile will lie. If you land in between two values, take the mid-value.
For example
0 , 2 , 2, 3 , 5 , 6 , 6 , 9
Estimate the 80th percentile

  • Here we have 8 values, so the 80th percentile will lie at (8+1) × 0.80 = 7.2th position.
  • this lies between the 7th and the 8th value (6 and 9), so estimate the mid value = (6+9)/2 = 7.5

Note that the 7.2th position is very close to the 7th data value and so our value of 7.5 is probably a bit high (it should be just above 6)… There is a more accurate method called ‘interpolation’, which is covered at A-level :).
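
Here is a minimal Python sketch of the (n+1) × percentile method with the mid-value rule, using the eight values from the example. The function name is hypothetical, and it assumes the percentile is given as a whole number of percent (80 rather than 0.80) so that the arithmetic stays in whole numbers.

```python
def percentile_mid_value(values, percent):
    """Position = (n + 1) x percentile; take the mid-value if between two items."""
    ordered = sorted(values)
    n = len(ordered)
    # (n + 1) * percent / 100, split into a whole position and a leftover part.
    whole, remainder = divmod((n + 1) * percent, 100)
    if whole < 1 or whole > n or (whole == n and remainder > 0):
        raise ValueError("the position falls outside the data")
    if remainder == 0:
        return ordered[whole - 1]  # landed exactly on a value
    # Landed between two values: take the mid-value.
    return (ordered[whole - 1] + ordered[whole]) / 2

data = [0, 2, 2, 3, 5, 6, 6, 9]
print(percentile_mid_value(data, 80))
```

For the data above the position is 7.2, which lies between the 7th and 8th values, so the sketch returns (6 + 9) / 2 = 7.5, matching the worked example.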

… You will often need to plot a cumulative frequency graph to estimate percentiles.

For example:
Shopping time/min    No. of people    Cumulative frequency
10-20                14               14
20-25                36               50
25-30                15               65
30-50                7                72
Estimate the 75th percentile

  • First plot a cumulative frequency graph by plotting the UPPER value of each class with its cumulative frequency. So 20min (x-axis) would go with 14 people (y-axis). The graph can be drawn with a smooth curve or with straight lines joining each point.
  • Find 75% of the total frequency (the total number of people) = 0.75 × 72 = 54.
  • So the 75th percentile lies at the 54th person (if they were all lined up in order of time spent shopping).
  • Rule a line across from the 54th person to your graph and then down to find the associated time. The 75th percentile (the upper quartile) will be at this value.
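
If you want to check a reading from your graph, the sketch below does the same job in Python for the shopping table. It assumes the points are joined with straight lines and that the graph starts at (10, 0), so a hand-drawn smooth curve may give a slightly different answer. The function name is hypothetical.

```python
# (upper end of class, cumulative frequency), starting at the lower end of the first class.
points = [(10, 0), (20, 14), (25, 50), (30, 65), (50, 72)]

def read_off_time(points, target):
    """Find the time at which straight lines through the points reach `target` people."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            # Go across to the line, then down to the x-axis.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the graph")

total = points[-1][1]     # total frequency = 72
target = 0.75 * total     # the 54th person
estimate = read_off_time(points, target)
print(round(estimate, 1))
```

The 54th person falls in the 25-30 minute class, so the estimate comes out a little over 26 minutes.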

Standardised Demographic Rates

… Standardised rates are a statistical measure of a population usually measured for a standard population size of 1000. 

… Crude rates are always per 1000, so for example:
Crude Birth Rate = (Number of Births / Total Population ) x 1000

… Standardised birth rate = (Crude birth rate / 1000 ) x Standard population

This means that the crude rate = standardised rate if your standard population size is 1000!
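
The two formulas above can be sketched in Python as follows. The numbers of births and the population are made up for illustration, and the function names are hypothetical.

```python
def crude_rate(events, population):
    """Crude rate per 1000 = (number of events / total population) x 1000."""
    return events * 1000 / population

def standardised_rate(crude, standard_population):
    """Standardised rate = (crude rate / 1000) x standard population."""
    return crude * standard_population / 1000

# Made-up example: 240 births in a population of 20,000.
crude = crude_rate(240, 20_000)
print(crude)
print(standardised_rate(crude, 1000))   # same as the crude rate
print(standardised_rate(crude, 5000))
```

With a standard population of 1000 the two rates agree, as noted above.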


Histograms

… The frequency is actually represented by the …?… of each histogram bar.

… The formula linking class width, frequency and frequency density is…?

… With histogram questions you sometimes have to add another column onto the grouped frequency table for the Frequency Density.

… Frequency density is plotted on the y axis of a histogram. You may need to think carefully about what the scale of this axis is using the frequency density values in the question.

… Another method of working out histograms is to find out how many items are represented by each square (preferably a medium or large square not a 1mm x 1mm square!). You can then count squares to work out frequencies.

… Watch out for varying class widths!

… To find the median from a histogram, first find the middle number of the total frequency, or use (n+1)/2

  • Then find the bar which contains this middle number.
  • How many more people/things do you need in this bar to get to the middle number?
  • Then rearrange the formula
    Frequency = Class Width  x Frequency Density  
    to find the class width which will extend into the median bar.
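
A rough Python sketch of this median method is shown below. The histogram bars are made up for illustration, the function name is hypothetical, and the sketch takes the middle number to be half the total frequency.

```python
# Made-up histogram bars: (lower bound, upper bound, frequency density).
bars = [(0, 10, 0.5), (10, 15, 2.0), (15, 25, 1.0)]

def histogram_median(bars):
    """Estimate the median by finding how far into the median bar we need to go."""
    # Frequency = class width x frequency density.
    frequencies = [(high - low) * density for low, high, density in bars]
    middle = sum(frequencies) / 2
    running_total = 0
    for (low, high, density), freq in zip(bars, frequencies):
        if running_total + freq >= middle:
            still_needed = middle - running_total   # how many more are needed in this bar
            extra_width = still_needed / density    # class width = frequency / frequency density
            return low + extra_width
        running_total += freq

print(histogram_median(bars))
```

Here the frequencies are 5, 10 and 10, so the middle number is 12.5. The first bar supplies 5, leaving 7.5 to find in the second bar, which needs a width of 7.5 / 2 = 3.75 and gives a median of 13.75.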

… Keep an eye out for the key words, “compare” and “proportions” in the question. Using percentages as proportions is fine. A comparison statement at the end is always a good plan 🙂


Stratified sampling:

… find the representative fraction for each category and multiply this by the sample size required. Remember that if you get decimals you may need to adjust the sample sizes of each category up or down to the nearest whole number. Do check by adding up the samples at the end! For example, if your sample was 80 and your stratified sample sizes for each group were:
23.8, 41.7, 14.5
then your adjusted sample sizes would be
24, 42, 14 (notice that we rounded the last sample DOWN rather than up otherwise they would have added to 81!)
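
One possible way to automate the adjusting step is sketched below in Python. It rounds every group down first and then rounds up the groups with the largest decimal parts until the total is right (sometimes called the largest remainder method). This is an assumption about how to choose which groups to round up, not a rule from the exam board. The group sizes 238, 417 and 145 are made up so that they reproduce the 23.8, 41.7 and 14.5 above.

```python
def stratified_sample_sizes(group_sizes, sample_size):
    """Whole-number sample sizes for each group that add up to sample_size."""
    total = sum(group_sizes)
    # Exact share = group x sample_size / total, split into whole part and remainder.
    parts = [divmod(group * sample_size, total) for group in group_sizes]
    sizes = [whole for whole, _ in parts]
    shortfall = sample_size - sum(sizes)
    # Round up the groups with the largest remainders until the total is correct.
    by_remainder = sorted(range(len(parts)), key=lambda i: parts[i][1], reverse=True)
    for i in by_remainder[:shortfall]:
        sizes[i] += 1
    return sizes

sizes = stratified_sample_sizes([238, 417, 145], 80)
print(sizes, sum(sizes))
```

For these groups the sketch gives 24, 42 and 14, which add up to 80 as required.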


Multi-point moving averages and seasonal variation.

… Seasonal variation (a repeating pattern) can be identified by plotting a scatter graph of the data with time on the x-axis. To ‘smooth out’ the seasonal variation, we can add a ‘moving average’ to the graph.
For example
Quarter:      1     2     3     4     5     6     7     8     9    10    11    12
Sales:       56    65    70    47    62    60    76    50    59    68    81    52

When you plot the graph of sales vs time, you will see a repeating pattern every 4 quarters. So, to help smooth out this seasonal variation, choose a 4-point moving average:

Quarter:           1     2     3     4     5     6     7     8     9    10    11    12
Sales:            56    65    70    47    62    60    76    50    59    68    81    52
4-pt mvg avg.      –     –     –  59.5    61  59.8  …etc

The first moving average calculation here was: (56+65+70+47)/4 = 59.5

The 4-point moving average shows the upwards trend of the data more clearly.
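
As a quick check on the arithmetic, here is a short Python sketch that works out every 4-point moving average for the sales figures above. The function name is hypothetical.

```python
sales = [56, 65, 70, 47, 62, 60, 76, 50, 59, 68, 81, 52]

def moving_averages(values, points=4):
    """Average each run of `points` consecutive values."""
    return [
        sum(values[i:i + points]) / points
        for i in range(len(values) - points + 1)
    ]

averages = moving_averages(sales)
print(averages)
```

The first three results are 59.5, 61.0 and 59.75, agreeing with the table (59.75 is shown there rounded to 59.8), and the last one is 65.0.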

Note that some teachers / textbooks say you should put the moving average value in the centre of the time period. Realistically, this doesn’t make sense as you will only be able to calculate the moving average at the END of each period of time! However, please check this with your teacher to see what they suggest.


Stem and Leaf diagrams

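… Finding the median from a stem and leaf diagram. Divide the number of numbers by 2 and round UP if you get a decimal. So if you had 31 numbers, the median number would be the 15.5 → 16th number. (If there is an even number of numbers, the median lies halfway between the two middle numbers.)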

… Finding the upper and lower quartiles in order to find the interquartile range.

  • First find the lower quartile by dividing the number of numbers by 4 and rounding UP (always). So if you had 31 numbers, the lower quartile would be the 7.75 → 8th number.
  • Then find the upper quartile by dividing the number of numbers by 4 (as before) and multiplying by 3, again rounding UP. So if you had 31 numbers, the upper quartile would be the 23.25 → 24th number.
  • When you have found the values of these numbers, the interquartile range = upper quartile – lower quartile.
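
The ‘divide and round up’ positions can be sketched in Python as below. The list of 11 values is made up (an odd count, like the 31 in the example), and the function name is hypothetical.

```python
import math

def quartile_positions(n):
    """Positions (counting from 1) using the 'divide and round up' rule."""
    return {
        "lower quartile": math.ceil(n / 4),
        "median": math.ceil(n / 2),
        "upper quartile": math.ceil(3 * n / 4),
    }

print(quartile_positions(31))

# Made-up ordered data, as if read off a stem and leaf diagram.
data = [12, 15, 17, 21, 23, 24, 28, 30, 31, 35, 40]
positions = quartile_positions(len(data))
lower_quartile = data[positions["lower quartile"] - 1]   # Python counts from 0
upper_quartile = data[positions["upper quartile"] - 1]
interquartile_range = upper_quartile - lower_quartile
print(lower_quartile, upper_quartile, interquartile_range)
```

For 31 numbers this gives the 8th, 16th and 24th numbers. For the made-up list the quartiles are 17 and 31, so the interquartile range is 14.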

Frequency Polygons

… Join the points using a ruler. Remember that you plot each point at the MIDDLE of the group (if it is a grouped frequency table)


Probability

… Probabilities can have values of 0 (definitely won’t happen) to 1 (definitely will happen)

… If a certain number of events can happen, then the sum of their probabilities will be …?… This can help you to form an algebra equation in some questions.

… Relative frequency = number of times an event happens / total number of trials.
We can use relative frequency as an estimate for the probability of an event occurring. The more trials there are, the more accurate this estimate is likely to be.

… When using probabilities, try to work in decimals or fractions rather than percentages.

… A probability space is usually a two-way table which shows all the possible outcomes of two events. For example, a spinner (with numbers 1, 3 or 5) could be spun, and a coin could be flipped where heads = 1 and tails = 2. The result might be found by taking the product of the two numbers:

Coin \ Spinner      1     3     5
1                   1     3     5
2                   2     6    10

You can work out the probability of a certain set of numbers occurring, for example:
P(prime result) = number of times the result could happen / total number of results possible = 3/6 = 1/2
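
The probability space above is small enough to list by hand, but the same counting can be sketched in Python. The helper `is_prime` is written just for this example.

```python
from fractions import Fraction
from itertools import product

spinner = [1, 3, 5]
coin = [1, 2]   # heads = 1, tails = 2

# Every cell of the two-way table: coin number x spinner number.
results = [c * s for c, s in product(coin, spinner)]

def is_prime(n):
    """True if n is a prime number (remember that 1 is not prime)."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

p_prime = Fraction(sum(is_prime(r) for r in results), len(results))
print(results, p_prime)
```

Three of the six results (2, 3 and 5) are prime, so the fraction simplifies to 1/2.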

… If you know that the probability of an event happening is 0.03, then the expected number of times that particular event will happen on average will be:
Total number x probability.
For example, if the probability of finding a faulty brick is 0.03, then the expected number of faulty bricks in a batch of 10,000 would be…?

… For events which are not connected (independent events), P(A AND B) = P(A) x P(B). So AND means multiply.

… Similarly, to find the probability of event A happening OR event B happening, when they cannot both happen at the same time (mutually exclusive events), ADD their probabilities: P(A OR B) = P(A) + P(B).

Probability trees.

… Drawing a probability tree for more than one event (probabilities go on the branches).

… If you move along a path, then remember to multiply the probabilities (AND)

… If you have more than one possible path, then you add the probabilities of each path (OR)

… ‘Beads in a bag’ type questions with replacement and without replacement. Remember that without replacement the total number of beads decreases for the second pick.

… Multiplying fractions, for example 2/3 × 5/7 = …?…

…  If you calculate a probability that is greater than 1, this is not possible, so check your calculations!

… You can often save yourself time by first calculating the opposite probability P'(X) and then calculate 1 – P'(X).
This is because the sum of all the probabilities must always add up to …?…

… Sometimes, probability tree questions involve algebra. For example: Hannah has a bag containing n sweets of which 8 are orange.
She picks a sweet out at random and eats it. She then picks another sweet out and eats that one too.
What is the probability that she picks out two orange sweets?
P(Orange on 1st pick) = 8/n
P(Orange on 2nd pick) = 7/(n-1)
P(Both orange) = 8/n  x  7/(n-1) = 56/(n(n-1))
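
To see the formula in action, the sketch below tries it with a made-up value of n. The question itself does not give n, so n = 10 is purely an assumption for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def p_two_orange(n, orange=8):
    """P(both sweets are orange) when picking twice without replacement."""
    first = Fraction(orange, n)           # 8/n
    second = Fraction(orange - 1, n - 1)  # 7/(n-1): one orange sweet has been eaten
    return first * second                 # AND means multiply

print(p_two_orange(10))
```

With n = 10 this is 8/10 x 7/9 = 56/90, which simplifies to 28/45.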


Probability Equation Questions

A bag contains White, Blue and Green counters.
The ratio of White to Blue counters is 3:25
The probability of picking a Green counter = 0.2
What is the least number of Green counters in the bag?

So the ratios of all three counters can be written:
W     :     B     :     G
3      :     25    :    x
Here, x is not necessarily the number of Green counters, but it might be if we get a whole number.

As P(Green) = 0.2, we can write:
x/(25+3+x) = 0.2
x = 0.2(28 +x)
x = 5.6 + 0.2x
0.8x = 5.6
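x = 7
The ratio 3 : 25 : 7 is already in whole numbers with no common factor, so the least number of Green counters is 7.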

Note that if we had found a decimal value for x, then we could scale the ratios up to get integer values by multiplying by a suitable number.
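
The same working, including the scaling step for decimal answers, can be sketched in Python using exact fractions. The function name is hypothetical, and the second call uses made-up numbers to show a case where x comes out as a decimal.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def least_counts(white, blue, p_green):
    """Smallest whole numbers of White, Blue and Green counters."""
    p = Fraction(p_green)
    # x / (white + blue + x) = p  rearranges to  x = p(white + blue) / (1 - p)
    x = p * (white + blue) / (1 - p)
    # Scale up so that x becomes a whole number, then cancel any common factor.
    scale = x.denominator
    counts = [white * scale, blue * scale, x.numerator]
    common = reduce(gcd, counts)
    return [c // common for c in counts]

print(least_counts(3, 25, Fraction(1, 5)))   # the question above
print(least_counts(1, 1, "0.2"))             # made-up case where x = 0.5
```

For the question above the counts come out as 3, 25 and 7. In the made-up case x = 0.5, so the ratio 1 : 1 : 0.5 is doubled to 2 : 2 : 1.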


The ‘statistical problem solving process/handling data cycle’

1) Explain a way of measuring the variables in which you are interested: draw up a data collection table with headings, e.g. Tally, frequency, amounts…

2) Your sample size must be greater than 7. For example, if you are counting the number of people in a shop each day vs. the shop’s daily sales, you should do this for more than 7 days.

3) Describe the graph or chart you will use to present the data. It can be useful to draw up a sketch example. You may also need to describe a calculation that needs to be done, e.g. an average.

4) How will you analyse your graph or chart? For example, a line of best fit? Here, talk about the correlation of the data – is it a positive, negative or no correlation?

5) Explain what you would expect to see if you were to accept or reject the original hypothesis. What bias might you have in your data?


Capture / Recapture

… This is a method used by scientists to find an estimate of, for example, the total number of fish, N, in a lake.

… The scientist captures M fish in the first visit to the lake and Marks them with a tag. She returns the fish to the lake.

… The scientist then returns to the lake some time later (when the marked fish have had time to distribute themselves randomly in the lake) and catches T fish.

… She notices that R fish of her second catch are marked with her tags.

… The Capture / Recapture formula is:

M/N  =  R/T

… We can use this formula to find an estimate for the total number of fish in the lake, N. 
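
Rearranging M/N = R/T gives N = (M × T) / R, which can be sketched in Python as below. The catch sizes are made up for illustration and the function name is hypothetical.

```python
def estimate_population(marked, second_catch, recaptured):
    """Estimate N from M/N = R/T, i.e. N = M x T / R."""
    if recaptured == 0:
        raise ValueError("no marked fish were recaptured, so N cannot be estimated")
    return marked * second_catch / recaptured

# Made-up example: M = 50 marked fish, T = 40 caught on the second visit, R = 8 of them marked.
print(estimate_population(50, 40, 8))
```

In this made-up example the estimate is 50 × 40 / 8 = 250 fish.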
