Data Collection & Representation

Types of Data

Quantitative: Numerical data (height, weight, temperature).

Qualitative: Non-numerical data (colour, gender, opinion).

Discrete: Can only take certain values, usually counted (number of pets, shoe size).

Continuous: Can take any value in a range, usually measured (height, time, mass).

Sampling Methods

Random sampling: Every member of the population has an equal chance of being selected.

Systematic sampling: Select every nth member from a list, starting from a randomly chosen point.

Stratified sampling: Divide population into groups (strata), then sample proportionally from each.

Stratified sample from group = (group size ÷ total population) × sample size
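The three sampling methods above can be sketched in Python. This is only an illustration: the population of 600 pupil ID numbers, the Year 7 group of 120, the sample size of 50 and the helper name stratum_sample_size are all made-up assumptions, not part of these notes.

```python
import random

# Hypothetical population: 600 pupil ID numbers (illustrative only).
population = list(range(1, 601))

# Random sampling: every member has an equal chance of being selected.
random_sample = random.sample(population, 50)

# Systematic sampling: every nth member (here n = 12) from a random start.
n = 12
start = random.randrange(n)
systematic_sample = population[start::n]

# Stratified sampling: (group size ÷ total population) × sample size.
def stratum_sample_size(group_size, total_population, sample_size):
    """Number to take from one group; round it in a real question."""
    return group_size * sample_size / total_population

# Hypothetical Year 7 group: 120 of the 600 pupils, overall sample of 50.
year7_size = stratum_sample_size(120, 600, 50)

print(len(random_sample), len(systematic_sample), year7_size)
```

With these made-up numbers, Year 7 makes up one fifth of the population, so it supplies one fifth of the sample (10 pupils).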

Exam Tip

You may be asked to criticise a data collection method. Look for bias, sample size, and whether the sample is representative.

Averages & Spread

Mean, Median, Mode & Range

Mean = sum of all values ÷ number of values

Median = middle value (when ordered); with an even number of values, the mean of the two middle values

Mode = most common value

Range = highest − lowest
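As a quick check of the four definitions above, here is a minimal sketch using Python's standard statistics module. The data set is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data set, chosen only for illustration.
values = [3, 7, 7, 2, 9]

data_mean = mean(values)                # sum of all values ÷ number of values
data_median = median(values)            # middle value once the data is ordered
data_mode = mode(values)                # most common value
data_range = max(values) - min(values)  # highest − lowest

print(data_mean, data_median, data_mode, data_range)
```

For this list the ordered values are 2, 3, 7, 7, 9, so the mean is 28 ÷ 5 = 5.6 and the median, mode and range are all 7.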

Mean from a Frequency Table

Multiply each value by its frequency, add the products, divide by total frequency.

Worked Example

Scores: 1(freq 3), 2(freq 5), 3(freq 7), 4(freq 5). Find the mean.

Sum = 1×3 + 2×5 + 3×7 + 4×5 = 3 + 10 + 21 + 20 = 54

Total frequency = 3 + 5 + 7 + 5 = 20

Mean = 54 ÷ 20 = 2.7
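The same worked example can be reproduced in a few lines of Python. This is just a sketch of the method; the variable names are illustrative.

```python
# Frequency table from the worked example: score -> frequency.
table = {1: 3, 2: 5, 3: 7, 4: 5}

# Multiply each value by its frequency and add the products.
total = sum(score * freq for score, freq in table.items())

# Add up the frequencies.
total_frequency = sum(table.values())

mean_score = total / total_frequency
print(total, total_frequency, mean_score)
```

This gives a sum of 54, a total frequency of 20 and a mean of 2.7, matching the working above.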

Estimated Mean from Grouped Data

Use the midpoint of each class. Multiply midpoint by frequency, add up, divide by total frequency.

This gives an estimate because we don't know exact values within each group.
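A minimal sketch of the midpoint method, assuming a made-up grouped table with three classes. The class boundaries, the frequencies and the function name estimated_mean are all hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound) -> frequency.
groups = {(0, 10): 4, (10, 20): 6, (20, 30): 10}

def estimated_mean(groups):
    """Estimate the mean using the midpoint of each class."""
    total = 0
    total_frequency = 0
    for (lower, upper), freq in groups.items():
        midpoint = (lower + upper) / 2
        total += midpoint * freq   # midpoint × frequency
        total_frequency += freq
    return total / total_frequency

print(estimated_mean(groups))
```

Here the midpoints are 5, 15 and 25, so the estimate is (20 + 90 + 250) ÷ 20 = 18.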

Key Facts

  • Mean uses all the data but is affected by outliers
  • Median is not distorted by outliers, so it is the better average for skewed data
  • Mode is the only average for qualitative data
  • When comparing distributions, comment on BOTH an average AND the spread

Charts & Graphs

Scatter Graphs

Show the relationship between two variables.

Positive correlation: As one increases, the other increases.

Negative correlation: As one increases, the other decreases.

No correlation: No pattern between the variables.

The line of best fit should pass through the mean point (mean of x, mean of y) and have roughly equal numbers of points above and below it.

Pie Charts

Each sector angle = (frequency ÷ total frequency) × 360°
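To illustrate the sector angle formula, here is a short sketch with a made-up survey of how 30 pupils travel to school. The categories and frequencies are hypothetical.

```python
# Hypothetical survey results: category -> frequency.
frequencies = {"Car": 10, "Bus": 15, "Walk": 5}

total_frequency = sum(frequencies.values())

# Sector angle = (frequency ÷ total frequency) × 360°
angles = {
    category: freq * 360 / total_frequency
    for category, freq in frequencies.items()
}

print(angles)
print(sum(angles.values()))  # the angles should total 360
```

A useful check in any pie chart question is that the sector angles add up to 360°.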

Histograms (Higher)

Used for continuous grouped data with unequal class widths.

Frequency density = frequency ÷ class width

Frequency = frequency density × class width (= area of bar)

Common Mistake

In a histogram, the y-axis is frequency DENSITY, not frequency! The area of each bar represents the frequency.
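The two histogram formulas are inverses of each other, which the sketch below checks for one made-up class. The class 10 to 30 with frequency 10 and the function name frequency_density are assumptions for illustration.

```python
def frequency_density(frequency, class_width):
    """Height of the histogram bar for one class."""
    return frequency / class_width

# Hypothetical class: 10 up to 30, containing 10 data values.
lower, upper, frequency = 10, 30, 10
class_width = upper - lower

density = frequency_density(frequency, class_width)

# Area of the bar = frequency density × class width, which recovers the frequency.
bar_area = density * class_width

print(class_width, density, bar_area)
```

The bar is 20 wide and 0.5 high, so its area is 10, the original frequency.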

Probability

Basic Probability

P(event) = number of favourable outcomes ÷ total number of outcomes (when all outcomes are equally likely)

0 ≤ P(event) ≤ 1

P(not A) = 1 − P(A)

Combined Events

Mutually exclusive (cannot both happen): P(A or B) = P(A) + P(B)

Independent (one doesn't affect the other): P(A and B) = P(A) × P(B)

Worked Example

A fair die is rolled twice. Find P(both rolls are even).

P(even) = 3/6 = 1/2

Independent events: P(both even) = 1/2 × 1/2 = 1/4
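The answer can also be checked by listing all 36 equally likely outcomes of the two rolls and counting the favourable ones. This is a sketch using Python's fractions and itertools modules; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Every (first roll, second roll) pair: 6 × 6 = 36 outcomes.
outcomes = list(product(faces, repeat=2))

# Favourable outcomes: both rolls are even.
favourable = [(a, b) for a, b in outcomes if a % 2 == 0 and b % 2 == 0]

# P(event) = number of favourable outcomes ÷ total number of outcomes
p_both_even = Fraction(len(favourable), len(outcomes))

# Multiplication rule for independent events.
p_even = Fraction(3, 6)
p_by_rule = p_even * p_even

print(p_both_even, p_by_rule)
```

Both approaches give 1/4: there are 9 favourable outcomes out of 36.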

Expected Frequency

Expected frequency = probability × number of trials

This is what we'd expect on average. Actual results may differ due to chance.
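A one-line calculation illustrates the formula. The scenario (a fair die rolled 300 times) and the function name expected_frequency are assumptions for illustration.

```python
from fractions import Fraction

def expected_frequency(probability, trials):
    """Expected frequency = probability × number of trials."""
    return probability * trials

# Hypothetical experiment: how many sixes in 300 rolls of a fair die?
expected_sixes = expected_frequency(Fraction(1, 6), 300)

print(expected_sixes)
```

This gives 50 sixes on average. A real experiment would probably not produce exactly 50.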

Probability Diagrams

Tree Diagrams

Show all possible outcomes for two or more events.

Rules:

  • Probabilities on branches from the same point must sum to 1
  • Multiply along branches to find P(A AND B)
  • Add between branches to find P(A OR B)

Without replacement: Probabilities change on the second pick because the first item is not put back, so the total (and the number of that type of item) goes down by one.
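The tree diagram rules can be sketched for a made-up "without replacement" question: a bag of 3 red and 7 blue counters, with two counters picked. The bag contents are hypothetical and the variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 7 blue counters, two picked without replacement.
red, blue = 3, 7
total = red + blue

# Multiply along the branches. The second pick is out of total - 1.
p_red_red = Fraction(red, total) * Fraction(red - 1, total - 1)
p_red_blue = Fraction(red, total) * Fraction(blue, total - 1)
p_blue_red = Fraction(blue, total) * Fraction(red, total - 1)
p_blue_blue = Fraction(blue, total) * Fraction(blue - 1, total - 1)

# Add between branches: "one of each colour" can happen in two ways.
p_one_of_each = p_red_blue + p_blue_red

print(p_red_red, p_one_of_each, p_blue_blue)
```

The four end-of-branch probabilities cover every possible outcome, so they add up to 1. That is a handy check on a completed tree diagram.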

Venn Diagrams

Show the relationship between sets using overlapping circles.

n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

A ∩ B = intersection (in both)
A ∪ B = union (in either or both)

Method: Start with the intersection, then work outwards.
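Python's built-in sets mirror the Venn diagram notation, so the union formula can be checked directly. The two sets below are hypothetical.

```python
# Hypothetical sets, chosen only for illustration.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

both = A & B     # A ∩ B: intersection (in both)
either = A | B   # A ∪ B: union (in either or both)

# Regions of the Venn diagram, working outwards from the intersection.
only_A = A - B
only_B = B - A

# n(A ∪ B) = n(A) + n(B) − n(A ∩ B)
union_count = len(A) + len(B) - len(both)

print(len(either), union_count)
```

Subtracting n(A ∩ B) stops the elements in the overlap (here 4 and 5) from being counted twice.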

Two-Way Tables

Organise data about two categories. Rows and columns must add up to the totals.

Exam Tip

For "without replacement" questions, always adjust the denominator on the second pick. If you started with 10 items, the second pick is out of 9.

Cumulative Frequency & Box Plots

Cumulative Frequency Diagrams

Plot the upper class boundary against the cumulative frequency (running total).

Join points with a smooth S-shaped curve.

To read off quartiles:

  • Median: at n/2 on the y-axis
  • Q1: at n/4 on the y-axis
  • Q3: at 3n/4 on the y-axis
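Taking n as the total frequency, the three reading-off positions on the y-axis are n/4, n/2 and 3n/4. The sketch below works them out for a made-up total of 80; the function name quartile_positions is illustrative.

```python
def quartile_positions(n):
    """Cumulative frequency values at which to read across to the curve."""
    return {"Q1": n / 4, "median": n / 2, "Q3": 3 * n / 4}

# Hypothetical total frequency of 80.
positions = quartile_positions(80)

print(positions)
```

For n = 80 you would read across from 20, 40 and 60 on the cumulative frequency axis, then down to the x-axis for each value.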

Box Plots (Box-and-Whisker Diagrams)

A box plot shows five key values: minimum, Q1, median, Q3, maximum.

IQR = Q3 − Q1

The box spans Q1 to Q3 (the middle 50% of data).
Whiskers extend to the minimum and maximum.
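The five values and the IQR can be computed for a small made-up data set. This sketch assumes Python 3.8 or later and uses statistics.quantiles with its default method, which for this list of seven values matches the usual by-hand positions (2nd, 4th and 6th values). Quartile conventions vary for other list sizes, so treat this as an illustration rather than a general rule.

```python
from statistics import quantiles

# Hypothetical data set of seven values, chosen only for illustration.
data = [2, 4, 5, 7, 8, 10, 13]

# Cut points that split the ordered data into four equal parts.
q1, med, q3 = quantiles(data, n=4)

five_values = {
    "minimum": min(data),
    "Q1": q1,
    "median": med,
    "Q3": q3,
    "maximum": max(data),
}

iqr = q3 - q1  # IQR = Q3 − Q1

print(five_values, iqr)
```

Here the box would run from 4 to 10 with the median line at 7, and the whiskers would reach 2 and 13, giving an IQR of 6.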

Comparing Distributions

When comparing, always mention:

  1. An average (mean or median) - which is higher/lower
  2. A measure of spread (range or IQR) - which is more/less consistent

Use the context of the question in your answer.

Key Facts

  • IQR is often a better measure of spread than the range because it is not affected by outliers
  • A smaller IQR means more consistent data
  • Box plots can be drawn horizontally or vertically
