Familywise error rate (FWER) - explained
Welcome to this first basic lecture about post-hoc tests, where we'll discuss the family-wise error rate, a key concept that you need to understand before we go into the details of post-hoc tests. In later videos we'll discuss specific post-hoc tests.
To understand the meaning of the family-wise error rate, we first need to briefly discuss the significance level, p-values, and type 1 and type 2 errors. If these concepts are new to you, I recommend that you first watch my basic videos about them.
The significance level for a test is the threshold we set before we do the test. Common threshold values are 0.1, 0.05, and 0.01; however, 0.05 is by far the most widely used significance level. If our p-value from a statistical test is less than the significance level, we reject the null hypothesis, and if it is greater than the significance level, we do not reject the null hypothesis.
The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one we observed, if the null hypothesis is true.
Let's consider the following data, where one wants to compare the mean systolic blood pressure between young and middle-aged individuals. We have collected four subjects from each group. Based on our sample, the mean blood pressure for the young individuals is 124, whereas the mean blood pressure for the middle-aged individuals is 129. Let's say that we use a t-test to compare the two groups. In this case, the p-value was computed to 0.023: the probability of observing a mean blood pressure difference of 5 or more, if the null hypothesis is true, is about 2.3 percent.
The lower the p-value, the more certain we are that the difference we observe is not just due to chance. Since the p-value is less than the significance level of 0.05, we reject the null hypothesis and conclude that young individuals have a significantly lower systolic blood pressure than middle-aged individuals. However, there is still a 2.3 percent risk that the difference we have observed, or a more extreme one, is due to chance if the null hypothesis is true.
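A comparison like this can be sketched in code. The lecture only reports the two group means, so the individual values below are hypothetical, and an exact permutation test is used instead of the t-test (it needs no statistical library), so its p-value will not match the lecture's 0.023:

```python
from itertools import combinations

# Hypothetical individual values; the lecture only gives the group means.
young = [120, 122, 125, 129]        # mean = 124
middle_aged = [126, 128, 130, 132]  # mean = 129

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = abs(mean(middle_aged) - mean(young))  # 5.0

# Exact permutation test: under the null hypothesis the group labels are
# arbitrary, so count how often a relabeling of the 8 subjects produces a
# mean difference at least as extreme as the one observed.
pooled = young + middle_aged
n = len(young)
count = 0
total = 0
for idx in combinations(range(len(pooled)), n):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(mean(group_a) - mean(group_b)) >= observed_diff:
        count += 1

p_value = count / total  # two-sided permutation p-value
print(f"observed difference: {observed_diff}, p-value: {p_value:.3f}")
```

With four subjects per group there are only C(8, 4) = 70 possible relabelings, which is why such a small study has limited power.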
Suppose that there actually is no difference in blood pressure between young and middle-aged individuals. The null hypothesis, which states that the two groups have equal means, is therefore true. Due to chance, we were unlucky: we happened to select four middle-aged individuals with a relatively high blood pressure and four young individuals with a relatively low blood pressure, which resulted in a p-value that was less than the significance level, so we rejected the null hypothesis. We have therefore committed a type 1 error, because we incorrectly rejected a true null hypothesis.
If we set the significance level to 0.05, then the risk of making a type 1 error, if the null hypothesis is true, is 5 percent.
Let's say that we collect a new data set and compute the p-value to 0.09. Suppose that there now actually is a true difference in blood pressure between young and middle-aged individuals; the null hypothesis is therefore false in this case. However, due to chance and a small sample size, our statistical test results in a p-value of 0.09, which is greater than our significance level of 0.05. We have therefore committed a type 2 error, because we did not reject the null hypothesis even though it was false.
One minus the probability that we commit a type 2 error is called the statistical power. The statistical power is the probability that we correctly reject a false null hypothesis.
When we perform a test, we would like as low a probability as possible of committing a type 1 or a type 2 error. We can easily reduce the probability of a type 1 error by reducing our significance level, for example from 0.05 to 0.01. The problem with reducing the significance level is that we then increase the risk of a type 2 error and therefore reduce the statistical power. This is the reason why we should not simply reduce the significance level. We therefore have to live with the fact that there is a 5 percent risk of committing a type 1 error if the null hypothesis happens to be true. However, we can reduce the risk of a type 2 error, and thereby increase the statistical power, simply by increasing the sample size in our experiment. Note that the sample size does not affect the risk of a type 1 error.
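That last claim, that the type 1 error rate is fixed by the significance level rather than the sample size, can be checked with a small simulation. The sketch below uses a two-sided z-test with a known standard deviation so that the p-value can be computed with only the standard library; the data and function names are my own, not from the lecture:

```python
import math
import random

def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test for equal means, assuming a known standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    diff = sum(sample_a) / na - sum(sample_b) / nb
    se = sigma * math.sqrt(1 / na + 1 / nb)
    z = abs(diff / se)
    # standard normal CDF expressed via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def type1_error_rate(n, reps=20_000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # both groups come from the same distribution: the null is true
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(a, b) < alpha:
            rejections += 1
    return rejections / reps

# the rejection rate stays close to alpha regardless of sample size
print("n = 4: ", type1_error_rate(4))
print("n = 40:", type1_error_rate(40))
```

Both rates come out close to 0.05: a larger sample improves power, not the type 1 error rate.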
We'll now discuss the family-wise error rate, which is something that can be estimated when we do multiple tests. Let's say that you would like to compare more things between young and middle-aged individuals: in addition to the blood pressure, you would also like to test whether there is a difference in body weight and in body height between the two groups. We therefore compute three separate t-tests and get three p-values.
The problem when performing multiple comparisons is that we increase the risk of committing at least one type 1 error. Recall that every time we do a t-test with a significance level of five percent when the null hypothesis is true, we run a five percent risk of committing a type 1 error. The family-wise error rate is defined as the risk that we commit at least one type 1 error. The following equation can be used to estimate this risk when we make several independent tests:

FWER = 1 − (1 − α)^k

where k denotes the number of independent comparisons we make and α represents our significance level.
Let's try this equation on our previous example, where we made three comparisons and used a significance level of 0.05. If we set α to 0.05 and k to 3 and do the math, we see that the family-wise error rate is about 0.143, which means that the risk that we make at least one type 1 error, if the null hypotheses happen to be true, is about fourteen percent.
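The arithmetic is easy to verify directly; this is just the formula from the lecture written out in code:

```python
alpha = 0.05  # significance level per test
k = 3         # number of independent comparisons

# probability of at least one type 1 error across the k tests
fwer = 1 - (1 - alpha) ** k
print(fwer)  # 0.142625, i.e. about a 14% risk
```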
Let's say that we test whether there is a difference in the following 10 variables between young and middle-aged individuals. Then we make 10 comparisons, so we change k from 3 to 10. Suppose the means of all 10 variables are equal between young and middle-aged individuals, which means that all 10 null hypotheses are true. If we then make 10 independent t-tests with a significance level of 0.05, the risk that we commit at least one type 1 error is about 40 percent. For example, we might by chance get a p-value that is less than 0.05 when we compare the BMI between the two groups, although there actually is no true difference.
This table shows the risk of committing at least one type 1 error if the null hypothesis is true for all tests. For example, we see that five tests result in a 23 percent risk, whereas 20 tests result in a 64 percent risk. If you run 100 tests, you can be almost certain that you have committed at least one type 1 error if the null hypothesis is true for all tests.
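A table like the one described can be reproduced with the same formula (the counts of tests are chosen to match the examples mentioned in the lecture):

```python
def fwer(alpha, k):
    # probability of at least one type 1 error in k independent tests
    return 1 - (1 - alpha) ** k

print("tests  FWER (alpha = 0.05)")
for k in (1, 3, 5, 10, 20, 100):
    print(f"{k:5d}  {fwer(0.05, k):.2f}")
```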
Note that this formula for calculating the family-wise error rate is only valid when we perform independent tests.
If we draw a sample from the population and compare the blood pressure, then draw a new sample to compare the body weight, and a third sample to compare the body height, these three tests can be considered independent. However, if we use the same sample to measure all three variables, the tests are no longer independent. For example, if we have a person who deviates a lot from the others, such as a very tall person, we expect that this person also deviates in other related variables, such as body weight. The tests are no longer independent because one person's data is involved in several tests.
We have a similar dependency when we want to compare three or more groups with each other. For example, if we want to compare the systolic blood pressure between young and middle-aged individuals, between middle-aged and old individuals, and between young and old individuals, the same group will be involved in more than one comparison. In this example, each group is involved in two comparisons; for instance, the old individuals are compared to both the middle-aged and the young individuals. These two tests are therefore not independent, because if we happen to randomly select four old individuals who deviate a lot from other old individuals, that will affect two tests instead of just one.
Although we might deal with dependent groups, the equation still works as a rough estimate of the family-wise error rate; when comparing dependent groups, the formula can be seen as an upper bound on the family-wise error rate. For more accurate calculations of the family-wise error rate, one can use simulation studies that fit the experimental design. We'll do this in the last video about post-hoc tests.
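The lecture's simulation approach is shown in a later video; as a minimal sketch of the idea, the simulation below handles only the independent case, where under a true null hypothesis the p-value of a continuous test is uniformly distributed on [0, 1]. A real study would instead simulate data matching its own (possibly dependent) design and run the actual tests:

```python
import random

def simulated_fwer(k, alpha=0.05, reps=100_000, seed=42):
    """Monte Carlo estimate of the family-wise error rate for k
    independent tests, all with true null hypotheses."""
    rng = random.Random(seed)
    families_with_error = 0
    for _ in range(reps):
        # draw k uniform p-values; a rejection of any of them is a type 1 error
        if any(rng.random() < alpha for _ in range(k)):
            families_with_error += 1
    return families_with_error / reps

print(simulated_fwer(3))  # close to the analytic value 0.143
```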
We'll now discuss how we can control the family-wise error rate. One common approach to deal with the problem of multiple comparisons is to adjust the significance level, which is what most post-hoc tests do. We can reduce the significance level for each test to a level that keeps the family-wise error rate at 0.05. For example, if we make 5 independent comparisons and reduce the significance level of each test from 0.05 to 0.01, the family-wise error rate will be approximately equal to 0.05. This means that the risk that we commit at least one type 1 error is now five percent, which is great, because we do not want to make a type 1 error. However, the problem is that when we reduce the significance level for each test, we increase the risk of a type 2 error, which means that we reduce the statistical power.
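The example of using 0.01 per test for 5 comparisons can be checked numerically. Inverting the FWER formula to find the per-test level exactly is known as the Šidák correction; the lecture does not name it, so treat this as a sketch of one such adjustment:

```python
def fwer(alpha_per_test, k):
    # family-wise error rate for k independent tests
    return 1 - (1 - alpha_per_test) ** k

# Lecture example: 5 independent comparisons at alpha = 0.01 each
print(fwer(0.01, 5))  # about 0.049, close to the desired 0.05

def sidak_alpha(fwer_target, k):
    # per-test significance level that yields exactly the target FWER
    return 1 - (1 - fwer_target) ** (1 / k)

print(sidak_alpha(0.05, 5))  # about 0.0102
```

The simpler Bonferroni correction (alpha divided by k, here 0.05 / 5 = 0.01) gives nearly the same per-test level and is slightly more conservative.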
Post-hoc tests are a family of tests that have been developed to deal with the problem of making several comparisons. Most post-hoc tests keep the family-wise error rate at the designated level, for example at 0.05, which means that although we make several pairwise comparisons, the risk that we commit at least one type 1 error is still at most five percent. Although post-hoc tests keep the family-wise error rate at the designated level, they have been developed to minimize the loss of statistical power. Different post-hoc tests should be used for different types of study designs. In the next lecture we'll discuss Fisher's LSD test, which is basically a set of unpaired t-tests.
This lecture introduces post-hoc tests and the crucial concept of the family-wise error rate (FWER). It begins by reviewing statistical basics, including significance levels, p-values, and Type 1 and Type 2 errors, illustrating them with practical examples. The FWER is defined as the probability of making at least one Type 1 error when conducting multiple comparisons. The video demonstrates how this risk escalates with an increasing number of tests, providing a formula for independent comparisons and explaining when tests are considered dependent. Finally, it explains that post-hoc tests are designed to control the FWER at a designated level, such as 0.05, while simultaneously minimizing the loss of statistical power by adjusting the significance level for individual tests.