Familywise error rate (FWER) - explained

Transcript

0:00

welcome to this first basic lecture

0:02

about post hoc tests where we'll discuss

0:04

the family-wise error rate which is a

0:06

key concept that you need to understand

0:08

before we go into the details about the

0:10

post hoc tests

0:12

in later videos we'll discuss specific

0:15

post hoc tests

0:17

to understand the meaning of the

0:18

family-wise error rate we first need to

0:21

briefly discuss significance level p

0:23

values and type 1 and 2 errors

0:26

if these concepts are new to you i

0:28

recommend that you first watch my basic

0:30

videos about this

0:33

the significance level for a test is the

0:35

threshold we set before we do the test

0:38

common threshold values are 0.1 0.05 and

0:42

0.01 however 0.05 is by far the most

0:47

widely used significance level

0:50

if our p-value from a statistical test

0:52

is less than the significance level we

0:54

reject the null hypothesis

0:57

and if our p-value from a statistical

0:59

test is greater than the significance

1:01

level we do not reject the null

1:03

hypothesis

1:05

the p-value is the probability that we

1:07

observe a test statistic as extreme or more extreme

1:10

if the null hypothesis is true

1:14

let's consider the following data where one

1:16

wants to compare the mean systolic

1:18

blood pressure between young and middle

1:20

aged individuals

1:21

we have collected four subjects from

1:23

each group

1:24

based on our sample the mean blood

1:27

pressure for the young individuals is

1:28

124

1:30

whereas the mean blood pressure for the

1:32

middle-aged individuals is 129

1:35

let's say that we use a t test to

1:37

compare the two groups

1:39

in this case

1:40

the p-value was computed to 0.023

1:44

the probability of observing a mean

1:46

blood pressure difference of 5 or more

1:49

if the null hypothesis is true

1:52

is about 2.3 percent

1:55

the lower the p-value is

1:57

the more certain we are that the

1:59

difference we observe is not just due to

2:01

chance

2:03

since the p-value is less than the

2:04

significance level 0.05

2:07

we reject the null hypothesis and conclude

2:09

that young individuals have a

2:11

significantly lower systolic blood

2:12

pressure than middle-aged individuals

2:15

however there is still a 2.3 percent

2:18

risk that the difference we have observed

2:20

or more extreme is due to chance if the

2:22

null hypothesis is true
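
To make "the probability of a difference this large or larger if the null hypothesis is true" concrete, here is a minimal sketch. It uses an exact permutation test rather than the t test from the lecture, because that needs only the standard library. The eight readings are hypothetical values chosen so the group means are 124 and 129; they are not the lecture's data, so the result is not expected to match 0.023.

```python
from itertools import combinations

# Hypothetical systolic blood pressure readings (NOT the lecture's data),
# chosen only so that the group means are 124 and 129.
young = [118, 122, 126, 130]
middle_aged = [123, 127, 131, 135]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    If the null hypothesis is true, the group labels are arbitrary, so we
    relabel the pooled data in every possible way and count how often the
    relabelled difference is at least as large as the observed one.
    """
    observed = abs(mean(group_b) - mean(group_a))
    pooled = group_a + group_b
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(b) - mean(a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


p_value = permutation_p_value(young, middle_aged)
print(f"observed difference: {mean(middle_aged) - mean(young)}")
print(f"permutation p-value: {p_value:.3f}")
```

With only four subjects per group there are just 70 possible relabellings, so the p-value can only take a few distinct values.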

2:25

suppose that there actually is no difference

2:28

in blood pressure between young and

2:29

middle-aged individuals

2:31

the null hypothesis which states that

2:34

the two groups have equal means is

2:36

therefore true

2:38

due to chance we were unlucky because we

2:41

happened to select four middle-aged

2:42

individuals with a relatively high blood

2:45

pressure

2:46

and four young individuals with a

2:48

relatively low blood pressure

2:51

which resulted in a p-value that was

2:53

less than the significance level

2:55

we therefore reject the null hypothesis

2:59

we have therefore committed a type 1

3:01

error because we incorrectly rejected a

3:03

true null hypothesis

3:06

if we set the significance level to 0.05

3:10

then the risk that you make a type 1

3:12

error if the null hypothesis is true is

3:14

5 percent
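
The 5 percent figure can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the lecture's method: it draws both groups from the same normal population (so the null hypothesis is true), uses a two-sample z-test with a known standard deviation instead of a t test, and the population mean of 125 and standard deviation of 10 are made-up values.

```python
import math
import random
from statistics import NormalDist, mean


def two_sample_z_pvalue(a, b, sigma):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type1_error_rate(alpha=0.05, n=4, sigma=10.0, n_sim=20_000, seed=1):
    """Share of simulated experiments that reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Both groups come from the same population, so the null is true
        # and every rejection is a type 1 error.
        young = [rng.gauss(125, sigma) for _ in range(n)]
        middle_aged = [rng.gauss(125, sigma) for _ in range(n)]
        if two_sample_z_pvalue(young, middle_aged, sigma) < alpha:
            rejections += 1
    return rejections / n_sim


rate = type1_error_rate()
print(f"simulated type 1 error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen significance level, whatever population values are plugged in.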

3:18

let's say that we collect a new data set

3:20

and compute the p-value to 0.09

3:24

suppose that there now actually is a true

3:26

difference in blood pressure between

3:28

young and middle-aged individuals

3:30

the null hypothesis

3:32

is therefore false in this case

3:35

however due to chance and small sample

3:38

size our statistical test results in a

3:40

p-value of 0.09 which is greater than

3:44

our significance level of 0.05

3:47

we have therefore committed a type 2

3:48

error because we did not reject the null

3:50

hypothesis even though it was false

3:54

one minus the probability that we commit

3:56

a type 2 error is called the statistical

3:59

power

4:00

the statistical power is the probability

4:03

that we correctly reject a false null

4:05

hypothesis

4:08

when we perform a test we would like as low

4:10

probability as possible for committing a

4:13

type 1 and a type 2 error

4:16

we can easily reduce the probability for

4:18

type 1 error by reducing our

4:20

significance level

4:22

for example we can reduce the

4:24

significance level from 0.05 to 0.01

4:30

the problem with reducing the significance

4:32

level is that we then will increase the

4:34

risk for type 2 error and will therefore

4:36

reduce the statistical power

4:40

this is the reason why we should not

4:41

reduce the significance level

4:44

we therefore have to live with the fact

4:46

that there is a 5 percent risk of committing a

4:48

type 1 error if the null hypothesis

4:50

happens to be true

4:53

however we can reduce the risk for a

4:55

type 2 error and thereby increase the

4:57

statistical power by simply increasing the

4:59

sample size in our experiment

5:03

note that the sample size does not

5:05

affect the risk for type 1 error
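
The trade-off described above can be sketched numerically. This is a rough illustration that assumes a two-sided two-sample z-test with a known standard deviation (not the t test used in the lecture); the true difference of 5 and the standard deviation of 10 are hypothetical values.

```python
import math
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Probability of rejecting the null in a two-sided two-sample z-test.

    delta is the true difference in means, sigma the known common SD and
    n the number of subjects per group. With delta=0 the null is true and
    the returned value is the type 1 error rate.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * math.sqrt(2 / n))
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


for alpha in (0.05, 0.01):
    for n in (4, 16, 64):
        power = z_test_power(delta=5, sigma=10, n=n, alpha=alpha)
        print(f"alpha={alpha:<5} n={n:<3} power={power:.2f}")

# With no true difference the rejection rate stays at alpha for any n.
print(z_test_power(delta=0, sigma=10, n=4), z_test_power(delta=0, sigma=10, n=64))
```

Lowering alpha lowers the power at every sample size, while raising n raises the power without changing the rejection rate under a true null.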

5:08

we'll now discuss the family-wise error

5:10

rate which is something that can be

5:12

estimated when we do multiple tests

5:17

let's say that you would like to compare

5:18

more things between young and

5:20

middle-aged individuals

5:22

in addition to the blood pressure

5:25

you would also like to compare if there

5:27

is a difference in body weight

5:29

and body height between young and

5:31

middle-aged individuals

5:34

we therefore compute three separate t

5:36

tests and get three p values

5:39

the problem when performing multiple

5:41

comparisons

5:42

is that we then increase the risk that

5:44

we commit at least one type one error

5:48

recall that every time we do a t test

5:50

with a significance level of five

5:52

percent when the null hypothesis is true

5:54

we run a five percent risk of

5:56

committing a type one error

5:59

the family-wise error rate is defined as

6:02

the risk that we will commit at least one

6:04

type one error

6:07

the following equation can be used to

6:09

estimate the risk of making at least one

6:11

type one error when we make several

6:14

independent tests

6:16

where k denotes the number of independent

6:18

comparisons we make

6:21

and alpha represents our significance

6:23

level
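
The equation itself is shown on screen and does not appear in the transcript. The standard expression below is consistent with the numbers quoted in the lecture (alpha = 0.05 and k = 3 give about 0.143), so it is presumably the one meant. It follows from the fact that, when all null hypotheses are true, each independent test avoids a type 1 error with probability 1 - alpha:

```latex
\begin{align*}
P(\text{no type 1 error in one test}) &= 1 - \alpha \\
P(\text{no type 1 error in } k \text{ independent tests}) &= (1 - \alpha)^{k} \\
\text{FWER} = P(\text{at least one type 1 error}) &= 1 - (1 - \alpha)^{k}
\end{align*}
```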

6:26

in our previous example we made three

6:29

comparisons and used the significance

6:31

level of 0.05

6:34

let's try this equation based on our

6:36

previous example where we made three

6:38

comparisons and used an alpha value of

6:40

0.05

6:42

if you set alpha to 0.05 and k to 3 and

6:46

do the math

6:48

we see that the family-wise error rate

6:50

is equal to about 0.143

6:53

which means that the risk that we make

6:55

at least one type one error if the null

6:57

hypotheses happen to be true is about

7:00

fourteen percent

7:02

let's say that we will test if there is

7:04

a difference in the following 10

7:06

variables between young and middle-aged

7:08

individuals

7:09

then we make 10 comparisons

7:13

we therefore change k from 3 to 10.

7:16

if we make 10 independent t tests with

7:19

the significance level of 0.05 the risk

7:22

that we commit at least one type 1 error

7:24

is about 40 percent

7:26

if the means of all these 10 variables

7:29

are equal between young and middle-aged

7:31

individuals

7:32

which means that all 10 null hypotheses

7:34

are true

7:36

this means that there is a 40 percent risk that we

7:38

will commit at least one type 1 error

7:42

for example we might by chance get a

7:44

p-value that is less than 0.05 when we

7:48

compare the bmi between the two groups

7:50

although there actually is no true

7:52

difference

7:53

this table shows the risk to commit at

7:56

least one type 1 error if the null

7:58

hypothesis is true for all tests

8:02

for example we see that five tests

8:04

result in a 23 percent risk

8:07

whereas 20 tests result in a 64 percent risk if

8:12

you run 100 tests you can be almost

8:14

certain that you have committed at least

8:16

one type 1 error if the null

8:18

hypothesis is true for all tests
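
The table is shown on screen only. Its entries can be recomputed with a few lines, assuming the usual expression for independent tests, 1 - (1 - alpha)^k; the function name is just an illustrative choice.

```python
def familywise_error_rate(alpha, k):
    """Risk of at least one type 1 error in k independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 5, 10, 20, 100):
    print(f"{k:>3} tests: FWER = {familywise_error_rate(0.05, k):.3f}")
```

The values for 3, 5, 10 and 20 tests round to the 14, 23, 40 and 64 percent quoted in the lecture, and 100 tests give a risk above 99 percent.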

8:22

note that the following formula to

8:23

calculate the family-wise error rate is

8:26

only valid when we perform independent

8:28

tests

8:30

if we draw a sample from the population

8:32

and compare the blood pressure

8:34

and then draw a new sample to compare

8:36

the body weight

8:38

and a third sample to compare the body

8:40

height

8:41

these three tests will be considered as

8:44

independent

8:46

however

8:47

if you use the same sample to measure

8:49

all three variables

8:51

the tests are no longer independent

8:54

for example if we have a person who

8:56

deviates a lot from the others

8:58

such as a very tall person

9:01

we expect that the person also deviates

9:03

for other related variables such as the

9:06

body weight

9:07

the tests are no longer independent

9:09

because one person's data is involved in

9:12

several tests

9:14

we have a similar dependency also when

9:17

we like to compare three or more groups

9:19

with each other

9:21

for example if you like to compare the

9:23

systolic blood pressure between

9:25

young and middle-aged individuals

9:28

and between middle-aged and old

9:29

individuals

9:31

and between young and old individuals

9:33

the same group will be involved in more

9:35

than one comparison

9:37

in this example each group will be

9:40

involved in two comparisons

9:42

for example

9:43

the old individuals are compared to both

9:46

middle-aged and young individuals

9:49

these two tests are therefore not

9:51

independent because if we happen to

9:53

randomly select four old individuals

9:56

that deviate a lot from other old

9:58

individuals

9:59

that will affect two tests instead of

10:01

just one

10:03

although we might deal with dependent

10:05

groups

10:06

the equation would still work as a rough

10:08

estimation of the family-wise error rate

10:11

when comparing dependent groups

10:13

the formula can be seen as an upper

10:15

bound to the family-wise error rate

10:18

for more accurate calculations of the

10:20

family-wise error rate one can use

10:22

simulation studies that fit the

10:24

experimental design

10:26

we'll do this in the last video about

10:28

post hoc tests
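
As a rough idea of what such a simulation can look like, the sketch below draws k test statistics that share a common component, which is one simple way to model tests run on the same subjects. It is not the simulation from the later video. The correlation of 0.8, the use of z-statistics and the critical value of 1.96 (a two-sided level of about 0.05) are all assumptions made for illustration.

```python
import math
import random


def simulated_fwer(k=3, rho=0.0, z_crit=1.96, n_sim=20_000, seed=1):
    """Share of simulated families with at least one |z| above z_crit.

    All k null hypotheses are true. rho is the assumed common correlation
    between the k test statistics; rho=0 gives independent tests.
    """
    rng = random.Random(seed)
    families_with_error = 0
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)  # component common to all k tests
        z_values = [
            math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
            for _ in range(k)
        ]
        if any(abs(z) > z_crit for z in z_values):
            families_with_error += 1
    return families_with_error / n_sim


formula = 1 - (1 - 0.05) ** 3
independent = simulated_fwer(rho=0.0)
dependent = simulated_fwer(rho=0.8)
print(f"formula for independent tests: {formula:.3f}")
print(f"simulated, independent tests:  {independent:.3f}")
print(f"simulated, correlated tests:   {dependent:.3f}")
```

In this setup the correlated tests give a family-wise error rate below the value from the formula, which matches the lecture's description of the formula as an upper bound in this kind of situation.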

10:31

we'll now discuss how we can control the

10:32

family-wise error rate

10:35

one common approach to deal with the

10:37

problem of multiple comparisons

10:39

is to adjust the significance level

10:41

which is what most post hoc tests do

10:46

we can reduce the significance level for

10:48

each test to a level that keeps the

10:50

family-wise error rate at 0.05

10:54

for example if you make 5 independent

10:56

comparisons

10:59

and reduce the significance level of

11:01

each test from 0.05 to 0.01

11:05

the family-wise error rate will

11:07

approximately be equal to 0.05

11:10

this means that the risk that we commit at

11:12

least one type one error is now five

11:15

percent which is great because we do not

11:17

want to make a type one error
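
The lecture does not name the adjustment at this point. Dividing the target rate by the number of tests (0.05 / 5 = 0.01) is what is commonly called the Bonferroni correction. The Sidak variant below is added here only for comparison; it solves the independent-test formula exactly for the per-test level. The function names are illustrative.

```python
def familywise_error_rate(alpha, k):
    """Risk of at least one type 1 error in k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(fwer_target, k):
    """Per-test significance level: target rate divided by number of tests."""
    return fwer_target / k


def sidak_alpha(fwer_target, k):
    """Per-test level that gives exactly fwer_target for k independent tests."""
    return 1 - (1 - fwer_target) ** (1 / k)


k = 5
for name, per_test in (
    ("Bonferroni", bonferroni_alpha(0.05, k)),
    ("Sidak", sidak_alpha(0.05, k)),
):
    achieved = familywise_error_rate(per_test, k)
    print(f"{name:<10} per-test alpha = {per_test:.5f}, FWER = {achieved:.4f}")
```

For five independent tests the per-test level of 0.01 gives a family-wise error rate of about 0.049, which is the "approximately 0.05" mentioned in the lecture.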

11:20

however the problem is that when we

11:22

reduce the significance level for each

11:24

test

11:26

we'll increase the risk for type 2 error

11:28

which means that we reduce the

11:30

statistical power

11:33

post hoc tests are a family of tests that

11:35

have been developed to deal with the

11:37

problem when we make several comparisons

11:40

most post hoc tests keep the family-wise

11:43

error rate at the designated level for

11:46

example at 0.05 which means that

11:49

although we make several pairwise

11:51

comparisons

11:52

the risk that we commit at least one type

11:54

one error is still less than five

11:56

percent

11:57

although post hoc tests keep the family

12:00

wise error rate at the designated level

12:02

they have been developed to minimize the

12:04

loss of statistical power

12:07

different post hoc tests should be used

12:09

for different types of study designs

12:12

in the next lecture we'll discuss Fisher's

12:14

LSD test which is basically a set of

12:17

unpaired t-tests

Interactive Summary

This lecture introduces post-hoc tests and the crucial concept of the family-wise error rate (FWER). It begins by reviewing statistical basics, including significance levels, p-values, and Type 1 and Type 2 errors, illustrating them with practical examples. The FWER is defined as the probability of making at least one Type 1 error when conducting multiple comparisons. The video demonstrates how this risk escalates with an increasing number of tests, providing a formula for independent comparisons and explaining when tests are considered dependent. Finally, it explains that post-hoc tests are designed to control the FWER at a designated level, such as 0.05, while simultaneously minimizing the loss of statistical power by adjusting the significance level for individual tests.
