Data Collection & Representation
Types of Data
Quantitative: Numerical data (height, weight, temperature).
Qualitative: Non-numerical data (colour, gender, opinion).
Discrete: Can only take certain values, usually counted (number of pets, shoe size).
Continuous: Can take any value in a range, usually measured (height, time, mass).
Sampling Methods
Random sampling: Every member of the population has an equal chance of being selected.
Systematic sampling: Select every nth member from a list.
Stratified sampling: Divide population into groups (strata), then sample proportionally from each.
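The three methods can be sketched in Python. This is only an illustration: the population of 20 numbered students, the two year-group strata, the sample sizes and the fixed seed are all made up for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable (assumption)

# Hypothetical population: students numbered 1 to 20
population = list(range(1, 21))

# Random sampling: every member has an equal chance of being selected
random_sample = random.sample(population, 5)

# Systematic sampling: every nth member of the list, from a random start
n = 4
start = random.randrange(n)
systematic_sample = population[start::n]

# Stratified sampling: sample from each stratum in proportion to its size
strata = {"Year 10": list(range(1, 13)), "Year 11": list(range(13, 21))}
sample_size = 5
stratified_sample = []
for name, members in strata.items():
    k = round(sample_size * len(members) / len(population))
    stratified_sample.extend(random.sample(members, k))
```

Here Year 10 makes up 12 of the 20 students, so it supplies 3 of the 5 sampled; Year 11 supplies the other 2.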
Exam Tip
You may be asked to criticise a data collection method. Look for bias, sample size, and whether the sample is representative.
Averages & Spread
Mean, Median, Mode & Range
Mean = sum of values ÷ number of values
Median = middle value (when ordered)
Mode = most common value
Range = highest − lowest
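A quick way to check these by computer is Python's standard `statistics` module. The data list below is made up for illustration.

```python
import statistics

data = [7, 2, 12, 7, 3, 7, 4]  # made-up values

mean = statistics.mean(data)        # sum of values ÷ number of values
median = statistics.median(data)    # middle value once the data is ordered
mode = statistics.mode(data)        # most common value
data_range = max(data) - min(data)  # highest − lowest

print(mean, median, mode, data_range)
```

Ordered, the data is 2, 3, 4, 7, 7, 7, 12, so the median is the 4th value.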
Mean from a Frequency Table
Multiply each value by its frequency, add the products, divide by total frequency.
Worked Example
Scores: 1(freq 3), 2(freq 5), 3(freq 7), 4(freq 5). Find the mean.
Sum = 1×3 + 2×5 + 3×7 + 4×5 = 3 + 10 + 21 + 20 = 54
Total frequency = 3 + 5 + 7 + 5 = 20
Mean = 54 ÷ 20 = 2.7
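The same calculation as a short Python sketch, using the scores and frequencies from the worked example (the variable names are just for illustration):

```python
# Score -> frequency, taken from the worked example
table = {1: 3, 2: 5, 3: 7, 4: 5}

total = sum(value * freq for value, freq in table.items())  # 54
total_frequency = sum(table.values())                        # 20
mean = total / total_frequency                               # 2.7
```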
Estimated Mean from Grouped Data
Use the midpoint of each class. Multiply midpoint by frequency, add up, divide by total frequency.
This gives an estimate because we don't know exact values within each group.
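A minimal sketch of the midpoint method. The three classes and their frequencies are invented for the example.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

total = 0
total_frequency = 0
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2  # stands in for the unknown exact values
    total += midpoint * freq
    total_frequency += freq

estimated_mean = total / total_frequency
```

With midpoints 5, 15 and 25, this gives (20 + 150 + 150) ÷ 20 = 16.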
Key Facts
- Mean uses all the data but is affected by outliers
- Median is not affected by outliers - best for skewed data
- Mode is the only average for qualitative data
- When comparing distributions, comment on BOTH an average AND the spread
Charts & Graphs
Scatter Graphs
Show the relationship between two variables.
Positive correlation: As one increases, the other increases.
Negative correlation: As one increases, the other decreases.
No correlation: No pattern between the variables.
The line of best fit should pass through the mean point (mean of x, mean of y) and have roughly equal numbers of points above and below it.
Pie Charts
Each sector angle = (frequency ÷ total frequency) × 360°
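For example, sector angles can be worked out as below. The colour survey data is made up for the sketch.

```python
# Hypothetical survey results: category -> frequency
frequencies = {"red": 10, "blue": 15, "green": 5}

total = sum(frequencies.values())

# Each category's share of the total, scaled to 360 degrees
angles = {name: freq * 360 / total for name, freq in frequencies.items()}
```

The angles should always add up to 360°, which is a useful check.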
Histograms (Higher)
Used for continuous grouped data with unequal class widths.
Frequency = frequency density × class width (= area of bar)
Common Mistake
In a histogram, the y-axis is frequency DENSITY, not frequency! The area of each bar represents the frequency.
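Rearranging the formula gives frequency density = frequency ÷ class width. A short sketch, with made-up classes of unequal width:

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]

densities = []
for lower, upper, freq in classes:
    width = upper - lower
    densities.append(freq / width)  # bar height on the y-axis

# Check: bar area (density × width) gives back the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
```

Note that the 10 to 15 class has the tallest bar even though it is the narrowest.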
Probability
Basic Probability
0 ≤ P(event) ≤ 1
P(not A) = 1 − P(A)
Combined Events
Mutually exclusive (cannot both happen): P(A or B) = P(A) + P(B)
Independent (one doesn't affect the other): P(A and B) = P(A) × P(B)
Worked Example
A fair die is rolled twice. Find P(both rolls are even).
P(even) = 3/6 = 1/2
Independent events: P(both even) = 1/2 × 1/2 = 1/4
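The answer can be checked by listing all 36 equally likely outcomes. This sketch compares counting outcomes with the multiplication rule, using exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # all 36 (first, second) pairs

# Count the outcomes where both rolls are even
both_even = [(a, b) for a, b in outcomes if a % 2 == 0 and b % 2 == 0]
p_by_counting = Fraction(len(both_even), len(outcomes))

# Multiplication rule for independent events
p_even = Fraction(3, 6)
p_by_rule = p_even * p_even
```

Both methods give 1/4.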
Expected Frequency
Expected frequency = probability × number of trials
This is what we'd expect on average. Actual results may differ due to chance.
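A small sketch of the calculation, taking expected frequency as probability × number of trials. The scenario (a fair die rolled 60 times) is made up for illustration.

```python
from fractions import Fraction

p_six = Fraction(1, 6)  # probability of a six on a fair die
trials = 60             # hypothetical number of rolls

expected_sixes = p_six * trials
```

This gives 10 sixes expected, although a real experiment could easily produce 8 or 13.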
Probability Diagrams
Tree Diagrams
Show all possible outcomes for two or more events.
Rules:
- Probabilities on branches from the same point must sum to 1
- Multiply along branches to find P(A AND B)
- Add between branches to find P(A OR B)
Without replacement: Probabilities change on the second pick because the total has decreased.
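The branches of a two-pick tree can be sketched as below. The bag of 3 red and 7 blue counters is an invented example.

```python
from fractions import Fraction

red, blue = 3, 7      # hypothetical bag of counters
total = red + blue    # 10 on the first pick, 9 on the second

# Multiply along each branch (second pick is out of total - 1)
p_rr = Fraction(red, total) * Fraction(red - 1, total - 1)
p_rb = Fraction(red, total) * Fraction(blue, total - 1)
p_br = Fraction(blue, total) * Fraction(red, total - 1)
p_bb = Fraction(blue, total) * Fraction(blue - 1, total - 1)

# Add between branches: same colour means RR or BB
p_same_colour = p_rr + p_bb
```

The four branch products add up to 1, which is a useful check on any tree diagram.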
Venn Diagrams
Show the relationship between sets using overlapping circles.
A ∩ B = intersection (in both)
A ∪ B = union (in either or both)
Method: Start with the intersection, then work outwards.
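Python sets mirror the regions of a Venn diagram. In this sketch the sets are made up: A is the multiples of 2 and B the multiples of 3, from the numbers 1 to 12.

```python
universal = set(range(1, 13))

A = {x for x in universal if x % 2 == 0}  # multiples of 2
B = {x for x in universal if x % 3 == 0}  # multiples of 3

both = A & B                  # A ∩ B: the intersection, filled in first
either = A | B                # A ∪ B: in either or both
only_a = A - B                # in A but not in B
only_b = B - A                # in B but not in A
neither = universal - either  # outside both circles
```

The four regions (both, only A, only B, neither) together contain every number exactly once.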
Two-Way Tables
Organise data about two categories. Rows and columns must add up to the totals.
Exam Tip
For "without replacement" questions, always adjust the denominator on the second pick. If you started with 10 items, the second pick is out of 9.
Cumulative Frequency & Box Plots
Cumulative Frequency Diagrams
Plot the upper class boundary against the cumulative frequency (running total).
Join points with a smooth S-shaped curve.
To read off quartiles:
- Median: at n/2 on the y-axis
- Q1: at n/4 on the y-axis
- Q3: at 3n/4 on the y-axis
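The running total and the reading positions can be sketched as follows. The class boundaries and frequencies are made up, and n is the total frequency.

```python
from itertools import accumulate

# Hypothetical grouped data
upper_boundaries = [10, 20, 30, 40]
frequencies = [4, 10, 18, 8]

cumulative = list(accumulate(frequencies))        # running total
points = list(zip(upper_boundaries, cumulative))  # points to plot

n = cumulative[-1]        # total frequency
median_position = n / 2   # read across from here for the median
q1_position = n / 4       # ... for the lower quartile
q3_position = 3 * n / 4   # ... for the upper quartile
```

The code only finds where to read across on the y-axis; the quartile values themselves still come from the curve.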
Box Plots (Box-and-Whisker Diagrams)
Shows five key values: minimum, Q1, median, Q3, maximum.
The box spans Q1 to Q3 (the middle 50% of data).
Whiskers extend to the minimum and maximum.
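The five values can be found from a list of data as sketched below. Quartile conventions vary between textbooks and software; this sketch assumes the "exclusive" method of Python's `statistics.quantiles`, which for these 11 made-up values picks the 3rd, 6th and 9th ordered values.

```python
import statistics

data = sorted([7, 1, 13, 4, 20, 8, 3, 15, 9, 6, 12])  # 11 made-up values

# Cut points that split the ordered data into four equal parts
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # width of the box
```

For this data the box runs from 4 to 13 with the median line at 8, and the whiskers reach 1 and 20.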
Comparing Distributions
When comparing, always mention:
- An average (mean or median) - which is higher/lower
- A measure of spread (range or IQR) - which is more/less consistent
Use the context of the question in your answer.
Key Facts
- IQR is often a better measure of spread than range because it is not affected by outliers
- A smaller IQR means more consistent data
- Box plots can be drawn horizontally or vertically