Lesson 5 • 45 min read
Probability & Statistics
1. Basic Probability
Definition
Probability measures how likely an event is to occur, expressed as a number between 0 and 1.
P(E) = Number of favorable outcomes / Total number of outcomes (when all outcomes are equally likely)
0 ≤ P(E) ≤ 1
- P(E) = 0: Impossible
- 0 < P(E) < 1: Possible
- P(E) = 1: Certain
Key Terms
- Experiment: An action with uncertain outcomes (rolling a die)
- Sample Space (S): Set of all possible outcomes
- Event (E): A subset of the sample space
- Complement (E'): Event NOT happening; P(E') = 1 - P(E)
Example:
Rolling a fair die, probability of getting 6:
Sample space S = {1, 2, 3, 4, 5, 6}
Event E = {6}
P(6) = 1/6 ≈ 0.167 or 16.7%
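The counting rule above can be checked with a few lines of Python. This is a minimal sketch that assumes every outcome in the sample space is equally likely; the helper name `probability` is illustrative and not part of the lesson.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = favorable outcomes / total outcomes (equally likely outcomes assumed)."""
    favorable = set(event) & set(sample_space)  # ignore anything outside S
    return Fraction(len(favorable), len(sample_space))

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
p_six = probability({6}, sample_space)
p_not_six = 1 - p_six              # complement rule: P(E') = 1 - P(E)

print(p_six, round(float(p_six), 3))  # 1/6 0.167
print(p_not_six)                      # 5/6
```

Using `Fraction` keeps the results exact, so 1/6 is not rounded until it is printed as a decimal.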
2. Compound Events
Union (A OR B)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
At least one event occurs
Intersection (A AND B)
P(A ∩ B) = P(A) × P(B|A)
Both events occur
Mutually Exclusive Events
Events that cannot happen at the same time: P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
Example: Rolling 3 OR 5 on a die: P(3 or 5) = 1/6 + 1/6 = 2/6 = 1/3
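The union rule can be verified by treating events as sets of die faces. The sketch below assumes a fair six-sided die; the helpers `p` and `p_union` and the overlapping events "even" and "greater than 4" are illustrative additions, not part of the lesson.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one roll of a fair die

def p(event):
    """Probability of an event (a set of die faces), assuming a fair die."""
    return Fraction(len(SAMPLE_SPACE & set(event)), len(SAMPLE_SPACE))

def p_union(a, b):
    """P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p(a) + p(b) - p(set(a) & set(b))

# Mutually exclusive: rolling 3 OR 5 share no outcomes, so P(A ∩ B) = 0.
print(p_union({3}, {5}))   # 1/3

# Overlapping (hypothetical events): "even" OR "greater than 4" share the face 6.
even, gt4 = {2, 4, 6}, {5, 6}
print(p_union(even, gt4))  # 2/3
print(p(even | gt4))       # 2/3, counting the union directly gives the same value
```

Subtracting P(A ∩ B) prevents the shared outcome (here, the face 6) from being counted twice.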
Independent Events
One event does not affect the other: P(B|A) = P(B)
P(A ∩ B) = P(A) × P(B)
Example: Two coin flips: P(HH) = 1/2 × 1/2 = 1/4
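One way to see the product rule at work is to list every outcome of two coin flips and count. This sketch assumes a fair coin; the helper `p` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def p(predicate):
    """Fraction of outcomes for which predicate(outcome) is true."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

p_first_heads = p(lambda o: o[0] == "H")
p_second_heads = p(lambda o: o[1] == "H")
p_both_heads = p(lambda o: o == ("H", "H"))

print(p_both_heads)                    # 1/4
print(p_first_heads * p_second_heads)  # 1/4, matches P(A) × P(B)
```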
3. Conditional Probability
Definition
The probability of event A occurring, given that event B has occurred.
P(A|B) = P(A ∩ B) / P(B), where P(B) > 0
Read as "probability of A given B"
Example: Cards
Drawing from a standard deck (52 cards):
Q: Given that a card is red, what is the probability that it is a heart?
Red cards = 26 (13 hearts + 13 diamonds)
Hearts among red = 13
P(Heart|Red) = 13/26 = 1/2
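The card example can be reproduced by building the deck and applying the definition directly. This is a sketch under the assumption of a standard 52-card deck with each card equally likely; the names `deck`, `red`, `hearts`, and `p` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs

red = {card for card in deck if card[1] in ("hearts", "diamonds")}
hearts = {card for card in deck if card[1] == "hearts"}

def p(event):
    """Probability of drawing a card in `event` from the full deck."""
    return Fraction(len(event), len(deck))

# P(Heart | Red) = P(Heart ∩ Red) / P(Red)
p_heart_given_red = p(hearts & red) / p(red)
print(p_heart_given_red)  # 1/2
```

Conditioning on "red" shrinks the sample space from 52 cards to 26, which is why the answer equals 13/26.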
Dependent vs Independent
| Dependent | Independent |
|---|---|
| P(A\|B) ≠ P(A) | P(A\|B) = P(A) |
| B affects A's probability | B has no effect on A |
| Drawing without replacement | Drawing with replacement |
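To make the contrast concrete, consider drawing two aces in a row from a standard deck. This scenario is a hypothetical illustration, not an example from the lesson; it applies P(A ∩ B) = P(A) × P(B|A) in both cases.

```python
from fractions import Fraction

ACES, DECK = 4, 52
p_first_ace = Fraction(ACES, DECK)

# Without replacement (dependent): one ace and one card are gone after the first draw.
p_second_ace_dependent = Fraction(ACES - 1, DECK - 1)
p_two_aces_dependent = p_first_ace * p_second_ace_dependent

# With replacement (independent): the deck is restored, so P(B|A) = P(B).
p_second_ace_independent = Fraction(ACES, DECK)
p_two_aces_independent = p_first_ace * p_second_ace_independent

print(p_two_aces_dependent)    # 1/221
print(p_two_aces_independent)  # 1/169
```

The first draw changes the second probability only when the card is not put back.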
4. Measures of Position
Quartiles (Q)
Divide data into 4 equal parts (25% each).
- Q₁: 25th percentile
- Q₂: 50th percentile (Median)
- Q₃: 75th percentile
- IQR (interquartile range): Q₃ - Q₁
Deciles (D)
Divide data into 10 equal parts (10% each).
D₁ = 10th percentile, D₂ = 20th percentile, ..., D₉ = 90th percentile
D₅ = Median = Q₂ = 50th percentile
Percentiles (P)
Divide data into 100 equal parts (1% each).
Position = (k/100) × (n + 1)
k = percentile, n = number of data points
P₂₅ = Q₁, P₅₀ = Q₂ = Median, P₇₅ = Q₃
Example: Find Q₁
Data: 2, 5, 7, 9, 11, 14, 18, 21 (n=8)
Q₁ position = (1/4) × (8+1) = 2.25
Between 2nd (5) and 3rd (7) values
Q₁ = 5 + 0.25(7-5) = 5.5
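The position rule and the interpolation step in the first-quartile example can be sketched as a small function. The name `percentile` and the clamping at the ends of the data are assumptions made for this illustration. Other tools may use different interpolation rules and give slightly different quartiles.

```python
import statistics

def percentile(data, k):
    """k-th percentile using Position = (k/100) × (n + 1) with linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = (k / 100) * (n + 1)  # 1-based position in the sorted data
    lower = int(position)           # whole part: index of the value just below
    fraction = position - lower     # fractional part: how far toward the next value
    if lower < 1:                   # assumption: clamp positions before the first value
        return values[0]
    if lower >= n:                  # assumption: clamp positions past the last value
        return values[-1]
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

data = [2, 5, 7, 9, 11, 14, 18, 21]
print(percentile(data, 25))  # 5.5
print(percentile(data, 50))  # 10.0
print(percentile(data, 75))  # 17.0

# Python's statistics.quantiles uses the same (n + 1) rule by default.
print(statistics.quantiles(data, n=4))  # [5.5, 10.0, 17.0]
```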
5. Box Plots
Five-Number Summary
A box plot (box-and-whisker plot) displays the five-number summary:
- Min: Lowest value
- Q₁: 25th percentile
- Q₂: Median
- Q₃: 75th percentile
- Max: Highest value
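A five-number summary can be computed with Python's standard library. This sketch assumes the (n + 1) quartile rule, which is the default "exclusive" method of `statistics.quantiles`; the function name `five_number_summary` and the sample data are illustrative.

```python
import statistics

def five_number_summary(data):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {"min": values[0], "q1": q1, "median": median, "q3": q3, "max": values[-1]}

summary = five_number_summary([2, 5, 7, 9, 11, 14, 18, 21])
print(summary)
# {'min': 2, 'q1': 5.5, 'median': 10.0, 'q3': 17.0, 'max': 21}
print(summary["q3"] - summary["q1"])  # IQR = 11.5
```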
Reading a Box Plot
Min           Q1      Q2      Q3           Max
|-------------|=======|=======|-------------|
              [      BOX      ]
<--whisker-->                  <--whisker-->
- Box spans Q₁ to Q₃ (IQR = middle 50%)
- Line inside box = Median (Q₂)
- Whiskers extend to min and max
Identifying Outliers
Outlier if: value < Q₁ - 1.5(IQR) or value > Q₃ + 1.5(IQR)
Outliers are shown as individual dots beyond the whiskers.
Interpreting Shape
- Symmetric: Median in center of box; equal whiskers
- Right Skewed: Median closer to Q₁; longer right whisker
- Left Skewed: Median closer to Q₃; longer left whisker
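The median-position rule can be turned into a rough classifier. This is only a heuristic sketch: it looks at where the median sits inside the box and ignores whisker length, and the function name `box_shape` and the sample data sets are hypothetical.

```python
import math
import statistics

def box_shape(data):
    """Classify shape from the median's position between Q1 and Q3."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    lower_half = median - q1  # width of the box below the median
    upper_half = q3 - median  # width of the box above the median
    if math.isclose(lower_half, upper_half):
        return "symmetric"
    # Median closer to Q1 means the upper half is wider: right skewed.
    return "right skewed" if upper_half > lower_half else "left skewed"

print(box_shape([1, 2, 3, 4, 5, 6, 7]))                       # symmetric
print(box_shape([1, 2, 2, 3, 3, 4, 5, 8, 12, 20]))            # right skewed
print(box_shape([-20, -12, -8, -5, -4, -3, -3, -2, -2, -1]))  # left skewed
```

Real data is rarely exactly symmetric, so in practice a tolerance wider than the `math.isclose` default would likely be needed.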