METR's Benchmarks vs Economics: The AI capability measurement gap – Joel Becker, METR
Hey guys, thank you so much for having me. My name is Joel Becker. I work as a researcher, or member of technical staff, at METR, which stands for Model Evaluation and Threat Research. As we'll see in a second, I'm going to be talking about AI capabilities: how do we know how performant AIs are today, and how performant they might be in the near future, from two different sources of evidence that seem to give somewhat conflicting answers? I could have done this whole talk without reference to METR papers in particular, but we'll look at two papers I've been involved with as examples of benchmark-style evidence and then more economic-style evidence. On the benchmark side, "Measuring AI Ability to Complete Long Tasks" is the paper that comes with the charts many of you will have seen on Twitter and so on, the ones METR is well known for. The second is an RCT measuring how allowing AI affects developer productivity. And then we'll talk about how to reconcile the gap that's implied between these two different kinds of measurements.
As I mentioned, METR stands for Model Evaluation and Threat Research. We are an independent research nonprofit that seeks to inform the public, policymakers, and labs about the degree to which AIs might pose catastrophic risks to society. The model evaluation part means that we seek to understand AI capabilities and propensities, and the threat research part means we try to connect those capabilities and propensities to potential catastrophic risks.
Okay. The first paper we're going to talk about is associated with this chart that many of you, I think, might have seen.
Taking a step back before we dive into the paper: how do we usually think about measuring AI capabilities using benchmarks, on SWE-bench or GPQA and so on? There's some notion of 0% performance, or random performance. For GPQA that's 25%, which corresponds to the floor, the worst you can possibly do. Perhaps there's a human baseline that's below 100%; for GPQA I think this is something like 75%, representing expert human performance. And then of course you can go all the way up to 100% on these kinds of benchmarks. But what does it mean? If I'm getting 50% on GPQA, if I'm halfway from the floor to the expert baseline, what does that really tell me about how performant the AIs are? If I meet the human baseline, does that mean the AIs are now as performant as, or even more performant than, expert humans in a relevant sense that I care about? It's hard to interpret.
Another thing you see from this graph is that benchmarks seem to have less and less time between coming online, giving any signal at all, and being fully saturated. It's harder and harder to create benchmarks that have plenty of signal, that might be informative to us about how capable models are, for an extended period of time. So we're going to go about this a different way.
First, we're going to gather human baseline data for diverse tasks spanning a range of difficulties. You should think of these humans as experienced experts, but on their first day or first week on the job. These are not people with context on the tasks in particular; it's not exactly the kind of thing that's come up in their work before. But if it's a software engineering task, they're relevantly skilled general software engineers, and the same goes for the machine learning tasks and the cybersecurity tasks we'll talk about. The tasks come from three buckets or task distributions. HCAST is a collection of software-based tasks seemingly requiring autonomy: interacting with tools, interacting with the environment, thinking through the problem, not just a Q&A-style dataset. The SWAA suite is a set of atomic problems, problems that maybe GPT-2 can do and maybe it can't, like "here are four files, one of them is called passwords.txt, which file contains the passwords?" And then on the other end of difficulty we have RE-Bench, which consists of challenging, novel, open-ended machine learning research engineering challenges that are very difficult even for top human experts.
In addition to gathering the human baseline data, we'll also, under as close to identical conditions as possible, measure AI performance for the AIs we're interested in on the same set of tasks. And then we're going to convert the time it takes for humans to complete these tasks into an estimate of AI autonomous capabilities, as I'll show you in a second.
Here's an illustrative diagram, in this case for Claude 3.7 Sonnet, which was the frontier model at the time this paper came out. You can see that for the very short tasks, something like 4 minutes or below, Sonnet is getting the answers correct essentially 100% of the time, or maybe even here literally 100% of the time. For the very hardest tasks it's struggling, and then there's some range where we're somewhere in the middle, between 10% and 90%. I'll say that this empirical pattern, where models are less performant at tasks that take humans longer, is not a fact of nature, but it's something we see pretty commonly and pretty robustly across models, at least on this task distribution, and I'd conjecture for other task distributions as well. So we try to fit this dark purple line to the data on how long it took humans to complete the tasks that the models are attempting. And then we call the point on the x-axis, this human-time-to-complete axis, at which we predict the model will succeed 50% of the time, the time horizon of that model. There's much to debate in the 50% number; I can talk later about the reasons why we chose it. And then we'll do the same exercise for the other models.
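As a minimal sketch of that fit (made-up data, not METR's actual code or estimator): fit a logistic curve in log human-time to the per-task success outcomes and read off where predicted success crosses 50%.

```python
# Hypothetical sketch of a time-horizon fit. Assumption: for each task we have
# the human completion time (minutes) and whether the model succeeded; we fit a
# logistic curve in log-time and solve for the 50% crossing point.
import numpy as np
from sklearn.linear_model import LogisticRegression

human_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])   # made-up data
model_succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])          # made-up data

X = np.log(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, model_succeeded)

# p = sigmoid(a * log(t) + b) = 0.5  =>  a * log(t) + b = 0  =>  t = exp(-b / a)
a, b = clf.coef_[0][0], clf.intercept_[0]
time_horizon_minutes = np.exp(-b / a)
print(f"50% time horizon = {time_horizon_minutes:.1f} human-minutes")
```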
Here, Claude 3 Opus has a time horizon of something like 4 minutes; that's where we predict it has a success probability of 50% on this task distribution. For o1-preview I'm seeing something like 15 minutes, and so on. And then of course all these models come out over calendar time. So if we plot the time horizon, the x-coordinate on this set of plots, against calendar time, we find something like this. It looks kind of like an exponential trend going up at some constant rate. In fact, it doesn't just look like an exponential trend: a perfectly straight line here would indicate a perfectly exponential trend, and we see something really remarkably steady, actually much more steady than we were anticipating when we went about doing this research project.
And that's continued to be the case. Many of you will have seen updates that we've made to this graph on Twitter. This goes all the way up to GPT-5.1-Codex-Max, so it's extremely recent, and the predictions from this shockingly straight line have held up very well, I think.
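A minimal sketch of the doubling-time calculation behind that straight line, with made-up numbers: fit a line to log2(time horizon) against release date, and the inverse slope is the doubling time in months.

```python
# Hypothetical sketch with illustrative numbers, not the paper's data.
import numpy as np

months_since_start = np.array([0, 12, 24, 36, 48])   # made-up release dates
horizon_minutes = np.array([1, 4, 15, 60, 240])      # made-up time horizons

slope, intercept = np.polyfit(months_since_start, np.log2(horizon_minutes), 1)
doubling_time_months = 1 / slope
print(f"Doubling time = {doubling_time_months:.1f} months")
```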
Taking a quick step back, what are benchmarks, or this kind of benchmark-like evidence, telling us? Well, one thing is that AIs can succeed at what for humans would be exceedingly difficult tasks. The tasks in RE-Bench are really far beyond my capabilities personally, and the AIs are having a good crack at them some decent percentage of the time. And the second, which is kind of obvious, is that progress is rapid.
On the other hand, how much stock should we put in the evidence suggested by benchmarks? What limitations might they have? Lots, but here are three that I'll note. One is, as I mentioned, that these are humans who are expert in some relevant sense, but low context. It's something like their first week on the job: they haven't seen tasks exactly like this previously, they just have some relevant experience. Presumably people who not only have the relevant experience but are also highly familiar with the set of tasks would complete the tasks even sooner, and relative to those people we'd think the AIs look less performant.
The second is that benchmarks can have a low ceiling. Even GPQA, to use that example again, is getting to the point where it's totally saturated, providing no additional information about marginal models, whereas time horizon provides a nice way to chain benchmarks together, in some sense, over time. But nonetheless, it's still very hard to create ever harder tasks when the time horizon of models is doubling every six to seven months or so. So even time horizon, or the benchmarks underlying time horizon, might be saturated before too long.
And the next one is not a concern limited to the METR tasks behind time horizon; it's also true for SWE-bench, and for many of your favorite agentic benchmarks: the problems aren't very messy, in some sense. They don't require a ton of coordination with humans. They're often in relatively small, contained environments where not much can go wrong; they're not these massive open-source codebases, or other settings where the problems involve more interaction with the real world or are otherwise messy.
So we did this project, and then early this year we were trying to think about how we could attack some of these limitations. What's a different source of evidence that might have its own pros and cons but, importantly, be more externally valid, in the scientific jargon?
Perhaps field experiments are the answer. So, more economic-style evidence. Here we might be interested in very high context developers who are expert on the kind of tasks they're already doing, and in speedup, or some notion of productivity boost, which seems to retain more signal even through the range that benchmarks would call superhuman. Perhaps GPQA is fully saturated and you're getting a 1.5x or 2x speedup, but you can still achieve a 3x, 4x, 5x speedup after that, so we maintain more signal. And the last point is that the tasks are messier. They are tasks that come up in people's real work. They're not synthetic, they're not small and contained. This is a real deployment scenario.
Here's what we're going to do for this paper. We're going to gather 16 experienced developers on large, mature open-source projects that we'll go through in a second. Each of these developers will, on average, complete about 16 tasks from their real work. These are issues on the relevant GitHub repositories, the kind of thing they might otherwise have completed, with the caveat that we're not going to include the very longest issues.
The tasks will be randomly assigned to AI-disallowed or AI-allowed. AI-disallowed means what you think it means: software development in 2019. No AI-powered tab autocomplete, no Cursor agentic coding tools, no LLMs via the web UI. Or the task can be randomly assigned to AI-allowed, in which case everything's on the table: any of the AI tools I just mentioned, or not using AI tools at all. If you're in the AI-allowed condition, you're not compelled to use AI; you just have the option. And we buy these developers Cursor Pro, so for the most part that's the tool they're using, typically with Claude 3.6 or 3.7 Sonnet, which was the frontier model when we conducted this work. And then we're going to record the time it takes for the developers to complete each task and see the degree to which they might save time when AI is allowed versus when it's not.
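A minimal sketch of that comparison (made-up data; the paper's actual estimator is more involved): regress log completion time on the treatment indicator and exponentiate the coefficient to get the multiplicative effect on time.

```python
# Hypothetical sketch: a multiplier above 1 means AI-allowed tasks took longer.
import numpy as np
import statsmodels.api as sm

ai_allowed = np.array([1, 0, 1, 0, 1, 0, 1, 0])              # made-up assignment
hours = np.array([2.5, 2.0, 3.1, 2.4, 1.8, 1.6, 4.0, 3.2])   # made-up completion times

X = sm.add_constant(ai_allowed)
fit = sm.OLS(np.log(hours), X).fit()
effect = np.exp(fit.params[1])
print(f"AI-allowed tasks take {effect:.2f}x as long (values above 1 mean a slowdown)")
```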
These are some of the repositories; many of you will be familiar with them. We've got the Haskell compiler represented, we have scikit-learn, we have Hugging Face Transformers. These are, on average, a million-plus lines of code, and they've been around for 10-plus years. The developers who are going to be working on these repositories as part of this study are, on average, the third-top contributor out of hundreds or, in some cases, thousands of contributors, and they personally have been contributing to the repository for something like five years on average. These are top experts.
Some of you might have seen this graph too, so the punch line's already been spoiled for you; for the rest, here it is. We asked economics experts and machine learning experts (people at major AI companies and labs, top academics, some graduate students, and so on) how much they expect developers to save time when using AI. They say something like 40%, or a little less. We ask the developers themselves, the study participants, how much they expect to be sped up ahead of time, and they say something like 24 or 25%. Then we ask the developers, after the study has been completed, how much they think they were sped up by AI being allowed on the issues they completed as part of this study, and they say it will have sped them up by something like 20%. And the punch line is that we find developers are slowed down by 19%: they take 19% more time when AI is allowed relative to when AI is not allowed.
When I first saw the data coming in, saw early versions of this plot, I presumably thought the same thing many of you might be thinking right now: that we've messed something up, that something's gone wrong, that there's some issue in how we've set up the experiment. How could it possibly be the case? At the very least, these developers have access to the zero point, because they can simply choose not to use AI at any time. So we pored over many, many hours of screen recordings from these developers working on issues as part of the study. We dove into a bunch of hypotheses that might explain what's going on and tried to categorize the things we think are contributing versus not. Much of this is listed in the paper; I'll just quickly go through some of the things we think are contributing.
First, overoptimism about AI usefulness. That seems like an obvious one: even after the study is completed, the developers think AI is going to be helpful to their work, so it makes sense that they might overuse AI on that basis. Two more: implicit repository context and high developer familiarity. These developers come to these problems already knowing the solution; they're so expert in this work that I imagine them not needing to spend a bunch of time thinking through the solution that the AI could work through. Instead, they're just limited by how fast they can type, which means that using AI, instructing AIs to do it, comes with a significant time cost versus how they might otherwise have spent their time.
I think many of us have the sense that AIs might be less performant on large and complex repositories, which is a difference from the benchmark-style evidence, or from some previous work. And then low AI reliability. Maybe the AIs are very performant on these kinds of tasks, but only 50% of the time, or 80% of the time, or 20% of the time. So at the very least you need to check their work afterwards, and perhaps you even need to spend time correcting their work afterwards, which is something we see quite a lot on these issues.
One thing from the factors with an unclear effect that I'll mention briefly, and am happy to talk to people about later, is below-average use of AI tools, which came up in the public discussion. This is in the unclear column because there's evidence both for and against, and that's true for many of the things here. We don't have anything so conclusive to say; we're still working on this line of work.
Here are some caveats, all important. First, obviously, we do not provide evidence about all software developers or tasks. These are extremely experienced developers working on extremely complex, long-lived open-source repositories. In my own work, I'm not as expert in the relevant sense as these people are, and I'm working on much smaller repositories. I feel more comfortable saying that, even at this point in time, I was sped up by AI tools, even if these developers weren't. This setting is weird, and it's weird for the same reasons that it's interesting: this unusual developer population.
Second, the experiment is concentrated in March 2025. As I mentioned, we know that AI progress is rapid; perhaps this result will already have changed by the time I'm giving you this talk.
So there's a kind of puzzle suggested, right? The benchmark-style evidence gives a very impressive sense of what AI capabilities look like today, whereas the more economic-style evidence, and I include labor market impacts here too in addition to our field experiments, looks somewhat more bearish, or unimpressive. Why is the former not translating into the latter? At least naively, there seems to be a clash. How might we go about resolving this puzzle?
One possibility is that, in fact, we messed something up. This is still live and on the table. Maybe the developers really are not very capable at using AI, and if we continue to run this experiment, as in fact we are, they'll gain more familiarity with the tools and so get productivity benefits they weren't getting at the time. I'm a little skeptical of that story, but that's one possibility.
Another, which economists like to bring up, is that we're not incentivizing these developers to finish quickly: we're paying them by the hour, which we do for external validity reasons. Looking through their videos, I really do not think they're developing differently in accordance with these incentives, but that certainly is one possibility that's on the table.
Another possibility, more statistical in nature, is that this is a small study, and you shouldn't over-update so much from small studies. We are doing bigger things that I'm excited to release at some point. Okay, but let's assume we haven't messed something up and this is a result that we think does hold up. How could we resolve the puzzle?
One possibility, as I alluded to briefly, is that reliability needs to be very high to save time. You need to be getting the answers to the problems that developers are putting in correct something like 95 to 99% of the time in order for developers to tab-tab-tab through and not spend lots of time verifying the AI's work, which of course is pretty costly from a time perspective.
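A back-of-the-envelope sketch of that argument, with entirely made-up numbers and my own simplified model: below the reliability level where you can skip review, the review and rework costs can eat the time saved.

```python
# Made-up numbers (minutes per task). Below some reliability threshold you must
# review every output and rework the failures; above it you can accept outputs
# without careful review.
def minutes_with_ai(reliability, prompt=3, review=8, rework=25, skip_review_at=0.98):
    if reliability >= skip_review_at:     # trusted enough to "tab tab tab" through
        return prompt + (1 - reliability) * rework
    return prompt + review + (1 - reliability) * rework

minutes_solo = 20  # made-up cost of doing the task unaided
for p in (0.5, 0.8, 0.95, 0.99):
    print(f"reliability {p:.2f}: {minutes_with_ai(p):.1f} min with AI vs {minutes_solo} min solo")
```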
Another possibility is the difference between SWE-bench-like, or algorithmic, scoring and something more like mergeability-based scoring. SWE-bench scores are not trying to account for whether the code is maintainable by other people in future, or whether it meets quality considerations that aren't covered by the unit tests. Perhaps AIs really are performant according to SWE-bench-like scoring but not according to the kind of more holistic scoring we might care about. Low- versus high-context baseliners: as I mentioned previously, these developers are just much more skilled, high-context humans, and relative to those humans, perhaps the AIs are less capable. Task distribution: maybe these are just different kinds of tasks, in particular messier than the benchmark-style tasks. Maybe that's part of what's going on here. Suboptimal capability elicitation: a huge amount of work has gone in at METR to making the agents as performant as possible, given the underlying models, on our kinds of tasks, and that involves churning through a load of AI tokens. Perhaps that was less the case for Cursor in particular at the time we completed the study.
And then, interdependence across tasks. Maybe humans can complete task A and task B, while AIs can only complete task A, not task B, though of course they can do task A faster. It can still make sense for humans to do both task A and task B, and not delegate task A, because they need to know the outputs, they need to know how task A was completed, in order to reliably complete task B. I think that's part of what's going on: you need to maintain context as you work through these subtasks.
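A toy illustration of that argument, with made-up numbers: delegating the subtask the AI is fast at can still lose time once you price in recovering the context you would otherwise have picked up by doing it yourself.

```python
# Toy illustration with made-up numbers (minutes). Doing task A yourself is
# slower, but the context you gain makes task B cheap; delegating A means
# paying a context-recovery cost before you can reliably do task B.
human_A, human_B = 30, 60          # human does both, picking up context along the way
ai_A, context_recovery = 5, 45     # AI finishes A fast, but B now needs extra ramp-up

do_both_yourself = human_A + human_B                  # 90 minutes
delegate_A_to_ai = ai_A + context_recovery + human_B  # 110 minutes
print(do_both_yourself, delegate_A_to_ai)
```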
Lastly, I will say that we are hiring, not just for the kind of work you've seen being extended here, ever longer tasks, ever more ambitious RCTs, even more sources of evidence from which we can triangulate the truth about AI capabilities, but for much more besides. You can find this at metr.org/careers. In particular, I'm excited about research engineers and research scientists who might be hiding in the current audience. We're excited not just for research types with academic experience, but very much for scrappy startup people as well. And we're also hiring for a director of operations. And with that, thank you very much for listening.