Familywise error rate (FWER) - explained
Welcome to this first basic lecture about post-hoc tests, where we'll discuss the family-wise error rate, a key concept that you need to understand before we go into the details of post-hoc tests. In later videos we'll discuss specific post-hoc tests.
To understand the meaning of the family-wise error rate, we first need to briefly discuss the significance level, p-values, and type 1 and type 2 errors. If these concepts are new to you, I recommend that you first watch my basic videos about them.
The significance level for a test is the threshold we set before we do the test. Common threshold values are 0.1, 0.05, and 0.01; however, 0.05 is by far the most widely used significance level. If our p-value from a statistical test is less than the significance level, we reject the null hypothesis, and if it is greater than the significance level, we do not reject the null hypothesis.
The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one we observed, if the null hypothesis is true.
Let's consider the following data, where one wants to compare the mean systolic blood pressure between young and middle-aged individuals. We have collected four subjects from each group. Based on our sample, the mean blood pressure for the young individuals is 124, whereas the mean blood pressure for the middle-aged individuals is 129. Let's say that we use a t-test to compare the two groups. In this case, the p-value was computed to 0.023: the probability of observing a mean blood pressure difference of 5 or more, if the null hypothesis is true, is about 2.3 percent.
The lower the p-value, the more certain we are that the difference we observe is not just due to chance. Since the p-value is less than the significance level of 0.05, we reject the null hypothesis and conclude that young individuals have a significantly lower systolic blood pressure than middle-aged individuals. However, there is still a 2.3 percent risk that the difference we have observed, or a more extreme one, is due to chance if the null hypothesis is true.
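A comparison like this can be sketched in code. The lecture only reports the two group means, so the individual values below are hypothetical, and an exact permutation test is used instead of the t-test (it needs no statistical library), so its p-value will not match the lecture's 0.023:

```python
from itertools import combinations

# Hypothetical individual values; the lecture only gives the group means.
young = [120, 122, 125, 129]        # mean = 124
middle_aged = [126, 128, 130, 132]  # mean = 129

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = abs(mean(middle_aged) - mean(young))  # 5.0

# Exact permutation test: under the null hypothesis the group labels are
# arbitrary, so count how often a relabeling of the 8 subjects produces a
# mean difference at least as extreme as the one observed.
pooled = young + middle_aged
n = len(young)
count = 0
total = 0
for idx in combinations(range(len(pooled)), n):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(mean(group_a) - mean(group_b)) >= observed_diff:
        count += 1

p_value = count / total  # two-sided permutation p-value
print(f"observed difference: {observed_diff}, p-value: {p_value:.3f}")
```

With four subjects per group there are only C(8, 4) = 70 possible relabelings, which is why such a small study has limited power.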
Suppose that there actually is no difference in blood pressure between young and middle-aged individuals. The null hypothesis, which states that the two groups have equal means, is therefore true. Due to chance, we were unlucky: we happened to select four middle-aged individuals with a relatively high blood pressure and four young individuals with a relatively low blood pressure, which resulted in a p-value that was less than the significance level, so we rejected the null hypothesis. We have therefore committed a type 1 error, because we incorrectly rejected a true null hypothesis.
If we set the significance level to 0.05, then the risk of making a type 1 error, if the null hypothesis is true, is 5 percent.
Let's say that we collect a new data set and compute the p-value to 0.09. Suppose that there now actually is a true difference in blood pressure between young and middle-aged individuals; the null hypothesis is therefore false in this case. However, due to chance and a small sample size, our statistical test results in a p-value of 0.09, which is greater than our significance level of 0.05. We have therefore committed a type 2 error, because we did not reject the null hypothesis even though it was false.
One minus the probability that we commit a type 2 error is called the statistical power. The statistical power is the probability that we correctly reject a false null hypothesis.
When we perform a test, we would like as low a probability as possible of committing a type 1 or a type 2 error. We can easily reduce the probability of a type 1 error by reducing our significance level, for example from 0.05 to 0.01. The problem with reducing the significance level is that we then increase the risk of a type 2 error and therefore reduce the statistical power. This is the reason why we should not simply reduce the significance level. We therefore have to live with the fact that there is a 5 percent risk of committing a type 1 error if the null hypothesis happens to be true. However, we can reduce the risk of a type 2 error, and thereby increase the statistical power, simply by increasing the sample size in our experiment. Note that the sample size does not affect the risk of a type 1 error.
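That last claim, that the type 1 error rate is fixed by the significance level rather than the sample size, can be checked with a small simulation. The sketch below uses a two-sided z-test with a known standard deviation so that the p-value can be computed with only the standard library; the data and function names are my own, not from the lecture:

```python
import math
import random

def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test for equal means, assuming a known standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    diff = sum(sample_a) / na - sum(sample_b) / nb
    se = sigma * math.sqrt(1 / na + 1 / nb)
    z = abs(diff / se)
    # standard normal CDF expressed via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def type1_error_rate(n, reps=20_000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # both groups come from the same distribution: the null is true
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(a, b) < alpha:
            rejections += 1
    return rejections / reps

# the rejection rate stays close to alpha regardless of sample size
print("n = 4: ", type1_error_rate(4))
print("n = 40:", type1_error_rate(40))
```

Both rates come out close to 0.05: a larger sample improves power, not the type 1 error rate.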
We'll now discuss the family-wise error rate, which is something that can be estimated when we do multiple tests. Let's say that you would like to compare more things between young and middle-aged individuals: in addition to the blood pressure, you would also like to test whether there is a difference in body weight and in body height between the two groups. We therefore compute three separate t-tests and get three p-values.
The problem when performing multiple comparisons is that we increase the risk of committing at least one type 1 error. Recall that every time we do a t-test with a significance level of five percent when the null hypothesis is true, we run a five percent risk of committing a type 1 error. The family-wise error rate is defined as the risk that we commit at least one type 1 error. The following equation can be used to estimate this risk when we make several independent tests:

FWER = 1 − (1 − α)^k

where k denotes the number of independent comparisons we make and α represents our significance level.
Let's try this equation on our previous example, where we made three comparisons and used a significance level of 0.05. If we set α to 0.05 and k to 3 and do the math, we see that the family-wise error rate is about 0.143, which means that the risk that we make at least one type 1 error, if the null hypotheses happen to be true, is about fourteen percent.
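The arithmetic is easy to verify directly; this is just the formula from the lecture written out in code:

```python
alpha = 0.05  # significance level per test
k = 3         # number of independent comparisons

# probability of at least one type 1 error across the k tests
fwer = 1 - (1 - alpha) ** k
print(fwer)  # 0.142625, i.e. about a 14% risk
```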
Let's say that we test whether there is a difference in the following 10 variables between young and middle-aged individuals. Then we make 10 comparisons, so we change k from 3 to 10. Suppose the means of all 10 variables are equal between young and middle-aged individuals, which means that all 10 null hypotheses are true. If we then make 10 independent t-tests with a significance level of 0.05, the risk that we commit at least one type 1 error is about 40 percent. For example, we might by chance get a p-value that is less than 0.05 when we compare the BMI between the two groups, although there actually is no true difference.
This table shows the risk of committing at least one type 1 error if the null hypothesis is true for all tests. For example, we see that five tests result in a 23 percent risk, whereas 20 tests result in a 64 percent risk. If you run 100 tests, you can be almost certain that you have committed at least one type 1 error if the null hypothesis is true for all tests.
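A table like the one described can be reproduced with the same formula (the counts of tests are chosen to match the examples mentioned in the lecture):

```python
def fwer(alpha, k):
    # probability of at least one type 1 error in k independent tests
    return 1 - (1 - alpha) ** k

print("tests  FWER (alpha = 0.05)")
for k in (1, 3, 5, 10, 20, 100):
    print(f"{k:5d}  {fwer(0.05, k):.2f}")
```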
Note that this formula for calculating the family-wise error rate is only valid when we perform independent tests.
If we draw a sample from the population and compare the blood pressure, then draw a new sample to compare the body weight, and a third sample to compare the body height, these three tests can be considered independent. However, if we use the same sample to measure all three variables, the tests are no longer independent. For example, if we have a person who deviates a lot from the others, such as a very tall person, we expect that this person also deviates in other related variables, such as body weight. The tests are no longer independent because one person's data is involved in several tests.
We have a similar dependency when we want to compare three or more groups with each other. For example, if we want to compare the systolic blood pressure between young and middle-aged individuals, between middle-aged and old individuals, and between young and old individuals, the same group will be involved in more than one comparison. In this example, each group is involved in two comparisons; for instance, the old individuals are compared to both the middle-aged and the young individuals. These two tests are therefore not independent, because if we happen to randomly select four old individuals who deviate a lot from other old individuals, that will affect two tests instead of just one.
Although we might deal with dependent groups, the equation still works as a rough estimate of the family-wise error rate; when comparing dependent groups, the formula can be seen as an upper bound on the family-wise error rate. For more accurate calculations of the family-wise error rate, one can use simulation studies that fit the experimental design. We'll do this in the last video about post-hoc tests.
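The lecture's simulation approach is shown in a later video; as a minimal sketch of the idea, the simulation below handles only the independent case, where under a true null hypothesis the p-value of a continuous test is uniformly distributed on [0, 1]. A real study would instead simulate data matching its own (possibly dependent) design and run the actual tests:

```python
import random

def simulated_fwer(k, alpha=0.05, reps=100_000, seed=42):
    """Monte Carlo estimate of the family-wise error rate for k
    independent tests, all with true null hypotheses."""
    rng = random.Random(seed)
    families_with_error = 0
    for _ in range(reps):
        # draw k uniform p-values; a rejection of any of them is a type 1 error
        if any(rng.random() < alpha for _ in range(k)):
            families_with_error += 1
    return families_with_error / reps

print(simulated_fwer(3))  # close to the analytic value 0.143
```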
We'll now discuss how we can control the family-wise error rate. One common approach to deal with the problem of multiple comparisons is to adjust the significance level, which is what most post-hoc tests do. We can reduce the significance level for each test to a level that keeps the family-wise error rate at 0.05. For example, if we make 5 independent comparisons and reduce the significance level of each test from 0.05 to 0.01, the family-wise error rate will be approximately equal to 0.05. This means that the risk that we commit at least one type 1 error is now five percent, which is great, because we do not want to make a type 1 error. However, the problem is that when we reduce the significance level for each test, we increase the risk of a type 2 error, which means that we reduce the statistical power.
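The example of using 0.01 per test for 5 comparisons can be checked numerically. Inverting the FWER formula to find the per-test level exactly is known as the Šidák correction; the lecture does not name it, so treat this as a sketch of one such adjustment:

```python
def fwer(alpha_per_test, k):
    # family-wise error rate for k independent tests
    return 1 - (1 - alpha_per_test) ** k

# Lecture example: 5 independent comparisons at alpha = 0.01 each
print(fwer(0.01, 5))  # about 0.049, close to the desired 0.05

def sidak_alpha(fwer_target, k):
    # per-test significance level that yields exactly the target FWER
    return 1 - (1 - fwer_target) ** (1 / k)

print(sidak_alpha(0.05, 5))  # about 0.0102
```

The simpler Bonferroni correction (alpha divided by k, here 0.05 / 5 = 0.01) gives nearly the same per-test level and is slightly more conservative.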
Post-hoc tests are a family of tests that have been developed to deal with the problem of making several comparisons. Most post-hoc tests keep the family-wise error rate at the designated level, for example at 0.05, which means that although we make several pairwise comparisons, the risk that we commit at least one type 1 error is still at most five percent. Although post-hoc tests keep the family-wise error rate at the designated level, they have been developed to minimize the loss of statistical power. Different post-hoc tests should be used for different types of study designs. In the next lecture we'll discuss Fisher's LSD test, which is basically a set of unpaired t-tests.
This lecture introduces post-hoc tests and the crucial concept of the family-wise error rate (FWER). It begins by reviewing statistical basics, including significance levels, p-values, and Type 1 and Type 2 errors, illustrating them with practical examples. The FWER is defined as the probability of making at least one Type 1 error when conducting multiple comparisons. The video demonstrates how this risk escalates with an increasing number of tests, providing a formula for independent comparisons and explaining when tests are considered dependent. Finally, it explains that post-hoc tests are designed to control the FWER at a designated level, such as 0.05, while simultaneously minimizing the loss of statistical power by adjusting the significance level for individual tests.