Intelligence Is Legos [Dr. Jeff Beck]
Geometric deep learning is a big part of
the stack, if for no other reason than
that modeling the physical world means
incorporating the symmetries that exist
in the physical world. We're highly
motivated to employ a lot of those
methods and techniques.
>> But is the world written in code or do
you mean exploiting the regularities in
the code that seem to have some
>> Exploiting the regularities. Look, the
world is translation invariant. The
world is rotation invariant. Well, not
really, because there's gravity, so
there is a principal axis, but it's
certainly rotationally invariant in the
xy plane.
>> Yeah.
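One way to make "building a symmetry in" concrete is a layer that commutes with translation. The sketch below is only an illustration, not anything described in the conversation: the helper names `circular_filter` and `shift` are hypothetical, and a wrap-around 1-D filter stands in for a real geometric deep learning layer. It checks that shifting then filtering gives the same result as filtering then shifting.

```python
def circular_filter(signal, kernel):
    """Apply a small filter to a 1-D signal with wrap-around indexing."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Cyclically shift a signal to the right by `steps` positions."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


# Hypothetical toy data, chosen only for illustration.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]
kernel = [0.5, -1.0, 0.5]

shift_then_filter = circular_filter(shift(signal, 2), kernel)
filter_then_shift = shift(circular_filter(signal, kernel), 2)

# Translation equivariance: the two orders of operation agree.
equivariant = all(
    abs(a - b) < 1e-12 for a, b in zip(shift_then_filter, filter_then_shift)
)
print(equivariant)
```

Because the symmetry is part of the layer's structure, it holds for any kernel values and does not have to be discovered from data.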
>> And if you want to have a good model of
the world as it actually is, it should
incorporate those features. Of course,
you can discover them in a brute-force
way, but the mathematician in me really
wants to build the symmetries in. And
fortunately, we've got a lot of great
tools, developed over the last several
years, that can do that.
>> What's your view on agency?
>> If I'm being an FEP purist, I have to
say there's no difference between an
agent and an object in a very real way,
or at least there's nothing structurally
distinct between how we model an agent
and how we model an object. It's really
just a question of degree. An agent is a
really sophisticated object. It has
internal states that represent things
over very long time scales. It has
sophisticated policies that are context
dependent, which is basically saying
really long time scales again, and
things like that.
>> Yeah. There's the philosophical,
highbrow notion of agency, where we
introduce notions of intentionality and
self-causation and things like that. The
really no-nonsense version of an agent
is just a thing which acts and performs
some kind of computation, and I guess
you could model almost anything as an
agent.
>> Yeah. Well, if your definition of an
agent is something that executes a
policy, then anything is an agent. A
rock is an agent. A policy is just an
input-output relationship. When many
people talk about agents, they're adding
a few additional elements that I think
have a lot to do with how the policy is
computed. For example, when we think of
the difference between us and, say,
amoebas, we often cite things like
planning, counterfactual reasoning,
goal-oriented behavior. Those are all
related to how it is we compute our
policies. They're latent variables that
represent policies that are compatible
with, well, reinforcement learning. And
that's the defining characteristic of an
agent. But you could very easily say,
from an outside perspective, if you
can't look at how someone or something
is doing the computations, if the only
thing you observe is the policy, does
that mean you can never conclude that
something's an agent? And I would say
no. You'd still like to be able to
conclude that this is an agent even
though the only thing I ever get to
measure is its policy.
>> But do you think we should have some
notion of the strength of an agent?
>> The strength of an agent, like a
measure of agency? Yeah. I think you
could use notions like transfer entropy
to estimate the timescale over which
something is incorporating information,
or the degree to which it exhibits
context-dependent behavior, and that
would be a pretty good measure. Now, is
it normative? No, it's not. But it is a
measure, and you could use things like
that. At that point, though, you're
really just talking again about policy
sophistication, not whether it has a
reward function or is actually executing
planning.
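As a rough illustration of the transfer entropy idea, the sketch below uses a plug-in estimator for discrete sequences with a history length of one step. Everything here is an assumption made for the example: the function name `transfer_entropy`, the binary "world" signal, and the two toy systems (one that copies the previous world state, one that ignores it). It is not a measure proposed in the conversation, only one simple way such a quantity could be estimated.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of transfer entropy source -> target.

    Uses one step of history: how much does source[t] tell us about
    target[t + 1] beyond what target[t] already tells us?
    """
    triples = Counter()      # (target_next, target_now, source_now)
    now_pairs = Counter()    # (target_now, source_now)
    next_pairs = Counter()   # (target_next, target_now)
    now_counts = Counter()   # target_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        now_pairs[(y_now, x_now)] += 1
        next_pairs[(y_next, y_now)] += 1
        now_counts[y_now] += 1
    total = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_with_source = count / now_pairs[(y_now, x_now)]
        p_without_source = next_pairs[(y_next, y_now)] / now_counts[y_now]
        total += p_joint * math.log2(p_with_source / p_without_source)
    return total


random.seed(0)
world = [random.randint(0, 1) for _ in range(5000)]
responsive = [0] + world[:-1]  # copies the previous world state
oblivious = [random.randint(0, 1) for _ in range(5000)]  # ignores the world

te_responsive = transfer_entropy(world, responsive)
te_oblivious = transfer_entropy(world, oblivious)
print(te_responsive, te_oblivious)
```

The responsive system should score close to one bit and the oblivious one close to zero. As the speaker notes, a number like this describes how much the behavior depends on context; it says nothing about whether planning is happening inside.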
>> Yeah. I mean certainly intuitively
agents to me seem to be kind of causally
disconnected
because they're planning into the
future. They are not impulse response
machines. They're not just, you know,
part of the mass of things going on
around them. They are just obviously
disconnected from the locality.
>> So here the trick is: I've got this
agent and I know exactly what it does.
It takes information into account,
internally rolls out a whole bunch of
future consequences of the various
actions or plans it could take, selects
the best one, and then executes it. All
of those variables occurred inside. From
the outside perspective, it just looks
like a function transformation. Unless
I'm somehow going in, recording, and
demonstrating that the manner in which
it calculates its policy involved doing
those rollouts, I wouldn't be able to
show that it's actually doing them. I
would just be able to conclude it has a
really sophisticated policy. So the
question is how you identify that
something is actually doing planning, as
opposed to having an incredibly
sophisticated policy. And I think that's
a really hard question.
>> My intuition is that a simple
input-output mapping can't be an agent.
In a way this is related to what we were
talking about with grounding. It seems
that when things are physically embedded
in the world, they're more likely to be
agents. This functionalist idea that
just a bit of computer code running on a
machine could be one, it kind of feels
like that can't be an agent.
>> It does. So suppose I coded it up so it
was doing all of that planning. It gets
its inputs, does some massive Monte
Carlo tree search, picks the best policy
possible, and then executes it. Now, you
don't observe any of that. Because you
know what's going on, you could say,
"Oh, it's clearly doing planning and
counterfactual reasoning. Look, there it
is." Because you coded it, you know it's
doing it. But if you're looking at it
from the outside and you don't know
what's happening inside, all you have
access to is the action that it took
given this long series of inputs. And so
it's really hard to identify something
as an agent per se from the outside. You
kind of have to know what's going on
inside. This, by the way, is why I don't
think these prediction-based approaches
to AI are necessarily... well, you could
say it's not really doing anything even
remotely agentic unless it's doing
planning and counterfactual reasoning.
With your chess program it's, "Oh,
clearly it's doing planning and
counterfactual reasoning," because you
know it's doing it. But I could describe
the exact same set of behaviors just
with a policy function.
>> I think the counterfactual thing is an
important feature here, because we could
take something which was conscious, or
which had agency, and just take a trace
of the actual path which was found. It's
a reductio ad absurdum, but now we've
just got a computational trace, and that
thing has clearly lost whatever agency
or consciousness it had. So there's
something about considering all of the
possibilities.
>> Yeah. In my mind that is the fundamental
feature of an agent. If you can show
that it's engaged in planning and
counterfactual reasoning, then it's
definitely an agent. My argument is
simply that that's hard to do unless you
crack it open and see what's going on
inside. Now, you could take a pragmatic
view and say: if the simplest
computational model of the behavior
models it as if it were doing planning
and counterfactual reasoning, then you
can draw the implicit conclusion that,
yes, I may as well say it's an agent.
And that's kind of the approach that
I've taken. One of the things that comes
out of the physics discovery algorithm
is that you apply it to agents and what
do you get? You get a model. Bear in
mind I called them all objects before,
and I didn't change anything to make it
special to an actual agent. But what I
can do, because of the model, is look at
the internal states associated with that
object that I want to call an agent and
look at how sophisticated they are.
>> Right. And that degree of sophistication
is what allows me to say... and I like
the whole idea, it's a great idea to
have a metric, and I'm sure it would
effectively be something like transfer
entropy. We have this metric on how
sophisticated the internal states were
that were necessary to generate this
output, and if it's above some
threshold, we'll call it an agent. I
don't like thresholds, so we just say a
degree of agency, a degree of
sophistication.
>> And coming back to Dennett's intentional
stance: there is a level of
representation which serves as a useful
explanation even though it's not
actually the microscopic causal graph.
Maybe we can agree that no agent can
possibly be the cause of its own
actions. But when there is a degree of
planning sophistication,
macroscopically it's as if it's the
cause of its own actions.
>> Yes. And that's why this "as if" phrase
comes up a lot. It's important to
remember that no matter how clever your
model is, no matter how clever your
approach is, and however clever the
words you use to describe it, a lot of
this stuff is "as if". This is the best
model. This is why I repeat this over
and over again and grind it into the
students: science is about prediction
and data compression and nothing else.
The same thing is going on here. Just
looking at behavior, you'll never know
for sure in any meaningful way whether
it's just doing a function
transformation or whether it's engaged
in planning and counterfactual
reasoning. But suppose you say, well, I
tried to model it as a function
transformation, but god damn it, it had
a lot of parameters. Then I tried to
model it as something that was just
doing Monte Carlo tree search on the
inside and giving the answer, and that
had, you know, 40 parameters. Well,
that's the model I'm going to go with,
and now I'm going to call it an agent.
>> If we had a physical agent in the real
world that was doing all of this
planning and so on, would that have some
kind of primacy over a computer
simulation of agents that were doing all
of this planning?
>> Oh, is this like: if I uploaded my brain
onto a computer and didn't connect it to
the world, would it still be thinking,
even though it's doing all of those
things? Is that the idea here?
>> That works. So, yeah, let's say a
high-fidelity computer simulation of
Jeff. Would Jeff be an agent?
>> No.
>> Oh, I wasn't expecting you to say that.
>> Because I'm the agent, and if you
uploaded... No, I don't know. If you do
a high-fidelity computer simulation and
you put it in my body, then I think I
would have to say it's an agent.
>> Yeah.
>> Right. If it's doing exactly the same
calculations, then from a purely
phenomenological perspective it's the
same. It's indistinguishable.
>> Okay. So agents need to be physical.
>> So I do believe that an agent needs to
be physical. Absolutely. I believe you
can have a model of agency and not have
an agent. You can put that model in a
computer, run it, and make predictions
as to what an agent would do. It might
even be 100% correct, but I still
wouldn't call it an agent. But again,
this is getting into philosophy, and
philosophy frustrates the Bayesian
because philosophy is not probabilistic.
[laughter] Philosophy is really about
drawing clear lines and distinctions,
and in my world those don't really
exist. Everything has an error bar.
There isn't a clear delineation between
an object and an agent. From this
modeling perspective, it's really just a
question of degree, and philosophy is
terrible at handling questions of
degree.
>> My friend Keith is a big fan of
computability, and he thinks that an
agent is basically a type of
computation: it has access to ambient
state, it can take action, and there's
this kind of cybernetic loop. For him
the strength of the agency in the system
is the type of compute the thing is
doing. So if it's a finite state
automaton, it's a weak agent; if it's a
Turing machine, it's a strong agent.
>> Yeah, it's the degree of sophistication
of the compute, right?
>> Pretty much. Does that ring true to you?
>> I mean, if you forced me, at the point
of a gun, to put a measure on agency,
it'd probably look a lot like that.
>> Yes. Jeff, let's talk about energy-based
models.
>> Sure.
>> So, Yann LeCun had a monograph out, I
think in 2006, talking about this. He's
been talking about this for a long time.
>> Oh, yeah. When you fit your neural
network to data via gradient descent,
you have written an energy function in
weight space and you're following it to
its energetic minimum. The advantage of
taking an energy-based approach, as
opposed to a straight-up function
approximation approach, is that an
energy-based model comes with something
that's kind of like an inductive prior.
If you're just doing function
approximation, you're basically saying:
x is my inputs, y is my outputs, any
mapping from x to y is out there, and I
just want to figure out what it is. In
an energy-based model, you're
effectively placing constraints on what
that input-output relationship can be. I
like thinking about the distinction
between an energy-based model and a
traditional feed-forward neural network
in terms of where your cost function is
applied. In a traditional neural
network, you take in your inputs, you've
got your outputs, and the cost function
is just a function of the inputs and the
outputs. The only thing you're
optimizing is the weights. In an
energy-based model, there's another
thing that your cost function operates
on, and that's the internal states of
your model. As a result, you actually
have to do two minimizations: one that
finds the energetic minimum associated
with the part of the cost function that
operates on the internal states, like
the hidden nodes of your network, and
one for your effective prediction error.
This is very much consistent with the
approach a Bayesian would take. You have
a prior probability distribution, which
gives you an energy function over every
single latent variable in your model,
and you are optimizing with respect to
all of them. So you take a probabilistic
approach. A good example of this is a
variational autoencoder. A variational
autoencoder, I think, is the best
example of the most commonly used
energy-based model out there. Why?
Because you have an encoder network, you
have a decoder network, and your cost
function is based on the difference
between inputs and outputs. That's still
regular. But, depending on what flavor
of VAE it is, some part of your cost
function is also a function of the
actual internal representation. In a
traditional VAE, it's how Gaussian it
is. You want that internal
representation to be as Gaussian as
possible. If it's a VQ-VAE, then it's
like a mixture of Gaussians, but it's
still a cost function that is applied on
the internal states as well as on the
inputs and outputs.
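The two-part cost described above can be written down directly for the standard Gaussian VAE: a reconstruction term on inputs and outputs, plus a term on the internal representation measuring how far the encoder's Gaussian is from a standard normal (the closed-form KL divergence). The sketch below is a minimal illustration with hypothetical function names and a hypothetical `beta` weight; the encoder and decoder networks are omitted, and their outputs are passed in as plain lists.

```python
import math


def reconstruction_error(x, x_hat):
    """Cost on inputs and outputs: squared error between x and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def kl_to_standard_normal(mu, log_var):
    """Cost on the internal state: KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def vae_cost(x, x_hat, mu, log_var, beta=1.0):
    """Total cost: reconstruction term plus a weighted penalty on the latent code."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Perfect reconstruction with a latent code that already matches the prior: zero cost.
cost_matched = vae_cost([1.0, 2.0], [1.0, 2.0], mu=[0.0, 0.0], log_var=[0.0, 0.0])

# Same reconstruction, but the latent mean has drifted away from zero.
cost_drifted = vae_cost([1.0, 2.0], [1.0, 2.0], mu=[2.0, 0.0], log_var=[0.0, 0.0])
print(cost_matched, cost_drifted)
```

The second call has zero reconstruction error and still pays a cost, which is the point being made: part of the objective operates on the internal representation rather than on inputs and outputs.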
>> Very cool. So a VAE is a fairly
canonical example of an energy-based
model. And the whole DL world is
obsessed with test-time inference at the
moment, and in a way that is a step
towards what you're talking about.
>> Yeah, you're treating some of the
weights of your model as if they're
latent variables. When you show a new
input, you're allowed to change some of
the weights without looking at the
output. So what are you doing? You're
treating the weights as latent. Which
makes it a great trick, in my opinion.
It's like, oh great, they're moving in
the direction of energy-based models. I
love it. The only thing I don't like
about test-time training is how the vast
majority of the training is done. In a
traditional energy-based model, you
always find the minimum with respect to
the latent variables, which in the case
of test-time training is the subset of
weights that you're allowed to change
during test time.
>> When you do the training for a
traditional energy-based model, you're
allowed to make those changes right
throughout the entire course of
training.
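A toy version of "minimize over the latents throughout training" might look like the sketch below. The energy function, the single weight `w`, the latent `z`, and all step sizes are hypothetical choices made for illustration; none of it comes from the conversation. For every training example the inner loop first settles the latent to its energy minimum, and only then does the outer loop take a gradient step on the weight.

```python
def energy(x, y, z, w, lam=1.0):
    """Toy energy: prediction error from the latent plus a cost on the latent itself."""
    return (y - w * z) ** 2 + lam * (z - x) ** 2


def infer_latent(x, y, w, lam=1.0, steps=200, lr=0.05):
    """Inner minimization: gradient descent on the latent z with the weight held fixed."""
    z = x
    for _ in range(steps):
        grad_z = -2.0 * w * (y - w * z) + 2.0 * lam * (z - x)
        z -= lr * grad_z
    return z


def train(data, w=0.5, lam=1.0, epochs=300, lr=0.02):
    """Outer minimization: update w using latents re-inferred at every step."""
    for _ in range(epochs):
        grad_w = 0.0
        for x, y in data:
            z = infer_latent(x, y, w, lam)    # latent settles first
            grad_w += -2.0 * z * (y - w * z)  # then the weight gradient at that latent
        w -= lr * grad_w
    return w


# Hypothetical data generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_fit = train(data)
print(w_fit)
```

On this data the weight settles near 2, where the latent equals the input and the energy is zero. The contrast with the deployment-only variant is that `infer_latent` is called inside `train`, so the weight is fit with the inner minimization switched on from the start.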
The way that we're often doing test-time
training these days is we just do
regular old neural network learning, and
then finally, when we get to the
deployment phase, we suddenly turn on
these additional latents, which are
basically some of the weights of the
network, and we do an additional bit of
learning at that point. Now again, not
an expert here, but this seems unwise to
me, and the reason it seems unwise is
that you didn't train the original
network with that on. You trained it in
a completely supervised way.
>> Yes.
>> Now, I'm sure that people are aware of
this and it's been addressed in the
literature, but I'm not personally aware
of that, and I don't think that's how
it's used in practice.
>> Super.
>> We should also introduce this term
transduction. My definition of
transduction is that you're actually
doing search or optimization as a
function of the test samples. I
interviewed Clement Bonnet; he had a VAE
on ARC, searching latent spaces, and he
actually searched through the decoder as
a function of the test sample.
>> Yeah.
>> And because these models are maximum
likelihood estimators, they're always
giving you a kind of smoothed-out
average, and there's so much information
in the test sample. Let's just riff on
the relationship between energy-based
models and Bayesian inference. Of course
they have this advantage that you don't
need to do this very expensive,
intractable normalization.
>> Yes.
>> Yes. Tell me about that.
>> My take on it is that an energy-based
model and a Bayesian model have a lot in
common. I mean, literally, in physics,
energy is negative log probability. Now
of course there's the normalization
factor, which you don't need to worry
about if you're just minimizing energy.
In a Bayesian framework, that's like
saying: I'm not actually going to treat
some of these latent variables in a
probabilistic way. I'm just going to do
maximum likelihood or MAP estimation on
some of my variables and be okay with
that. That's one way to interpret the
relationship between an energy-based
model and a properly Bayesian model.
There's a happy medium here, though. You
don't have to just minimize the energy
function; you can calculate the
curvature down there too, do a Laplace
approximation, and call yourself a
Bayesian again. Yes, there is more
computation involved, but we've got a
lot of great tricks for making that
totally tractable.
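The "find the minimum, then measure the curvature" recipe can be sketched in one dimension. The example below is an illustration under stated assumptions: the energy is a hypothetical unnormalized negative log density, E(z) = z - 3 log z, derivatives are taken by finite differences, and the minimum is found with Newton's method. The Laplace approximation is then a Gaussian centered at the minimum with variance equal to one over the curvature there.

```python
import math


def energy(z):
    """Hypothetical energy: negative log of an unnormalized density z**3 * exp(-z), z > 0."""
    return z - 3.0 * math.log(z)


def derivative(f, z, h=1e-4):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


def second_derivative(f, z, h=1e-4):
    """Central finite-difference estimate of f''(z)."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)


def laplace_approximation(f, z0, steps=50):
    """Minimize the energy with Newton's method, then read off the curvature."""
    z = z0
    for _ in range(steps):
        z -= derivative(f, z) / second_derivative(f, z)
    curvature = second_derivative(f, z)
    return z, 1.0 / curvature  # Gaussian mean and variance


mode, variance = laplace_approximation(energy, z0=1.0)
print(mode, variance)
```

For this energy the minimum is at z = 3 and the curvature there is 1/3, so the approximating Gaussian has mean 3 and variance 3. The normalization constant of the original density is never needed, only the location and shape of the energy minimum.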
>> What's the relationship between the free
energy in the free energy principle and
the energy in energy-based models?
>> A regularization term, I think, is the
short answer. If you're being very, very
pedantic, the difference between
minimizing energy and minimizing free
energy is that free energy has this
additional entropy penalty term. Now, if
you're just doing maximum likelihood
estimation, if you're minimizing your
energy function with respect to, well,
we'll pretend there's only one variable,
and I'm just going to get a point
estimate and call it a day, do some kind
of MAP estimation to get that one thing,
there's not that big of a difference,
because there is no probability
distribution over the latent that allows
you to compute that regularization term.
But that's the only difference. Are you
regularizing or not is, I think, the
easiest way to think about it.
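Written out, the distinction looks like the following. The notation is an assumption introduced here for illustration: x is the observed data, z the latent variable, E(x, z) the energy, q(z) a distribution over the latent, and H[q] its entropy.

```latex
\begin{align}
F[q] &= \mathbb{E}_{q(z)}\big[E(x, z)\big] - H[q],
  \qquad H[q] = -\mathbb{E}_{q(z)}\big[\log q(z)\big], \\
E(x, z) &= -\log p(x, z)
  \;\;\Longrightarrow\;\;
  F[q] = \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) - \log p(x).
\end{align}
```

If q is collapsed to a point estimate at some value \hat{z}, the expected energy becomes E(x, \hat{z}) and the entropy term no longer depends on \hat{z}, so minimizing F over \hat{z} is the same as minimizing the energy. The entropy term only does work when there is an actual distribution over the latent.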
>> So LeCun is a big advocate of JEPA, these
joint embedding predictive architectures
using non-contrastive learning, where
essentially the learning objective is
comparing the latents of observed and
unobserved parts of the space. This is
an architectural design.
>> Okay, so what does JEPA stand for? It's
joint embedding predictive architecture.
There we go.
>> So, what's the joint embedding bit
about? Well, I'm going to take my
inputs, I'm going to take my outputs,
and I'm going to embed them in some
space. Then I'm going to learn a
prediction between the two embeddings.
And that's a great idea. It's a great
idea because it has some of the flavor
of what we would like to get out of our
models. In many situations, and I should
be very particular about this, we're not
interested in predicting every single
pixel in the image. We want something
that's a little more gestalt, a little
more high level, a little more
conceptual understanding of what's going
on. Emphasizing the goal of predicting
every single pixel, which is what's
typically done in generative modeling
right now, might lose some of the
abstractive power of the networks. So
the whole point of JEPA as I understand
it, and I'm sure there are other points,
is that you're going to compress your
inputs and compress your outputs and
then do all the learning in this
compressed space. Love it. Science is
about prediction and data compression.
Let's make that compression explicit on
the front end and the back end.
>> The downside of this approach is that it
doesn't work out of the box, because
it's very easy to find an embedding of
the inputs and an embedding of the
outputs for which prediction is perfect:
basically, make both of them zero. So
other tricks need to be employed in
order to make it work.
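The collapse described above is easy to reproduce in a toy setting. In the sketch below, all names and data are hypothetical, the predictor between embeddings is taken to be the identity, and the `spread_penalty` hinge on the standard deviation of the codes is just one simple example of an extra trick, loosely in the spirit of variance-regularization methods rather than any specific published objective.

```python
import math


def prediction_loss(embed_x, embed_y, pairs):
    """Mean squared error between the embedded input and the embedded target."""
    return sum((embed_x(x) - embed_y(y)) ** 2 for x, y in pairs) / len(pairs)


def spread_penalty(embed, values, target_std=1.0):
    """Penalty that is positive when the embeddings have too little spread."""
    codes = [embed(v) for v in values]
    mean = sum(codes) / len(codes)
    std = math.sqrt(sum((c - mean) ** 2 for c in codes) / len(codes))
    return max(0.0, target_std - std)


# Hypothetical paired data where the target is twice the input.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
inputs = [x for x, _ in pairs]


def collapsed(value):
    """Degenerate embedding: everything maps to zero."""
    return 0.0


def informative_x(x):
    """Input embedding that keeps the information needed to predict the target."""
    return 2.0 * x


def informative_y(y):
    """Target embedding: identity."""
    return y


collapsed_loss = prediction_loss(collapsed, collapsed, pairs)
informative_loss = prediction_loss(informative_x, informative_y, pairs)

collapsed_total = collapsed_loss + spread_penalty(collapsed, inputs)
informative_total = informative_loss + spread_penalty(informative_x, inputs)
print(collapsed_loss, informative_loss, collapsed_total, informative_total)
```

Both embeddings achieve zero prediction loss, so the prediction objective alone cannot tell them apart. Only after the spread penalty is added does the collapsed solution cost more than the informative one.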
>> Yes. I remember LeCun was talking about
this. There's the traditional
contrastive method, which is kind of
Hinton's idea, apparently, with negative
sampling and whatnot, and that's very
expensive because you actually have to
do lots and lots of sampling. And then
there's this non-contrastive thing.
>> Yeah. This, by the way, is what he
should have won the Nobel Prize for
>> Right. [laughter]
>> in my opinion. Because the whole point
of the wake-sleep algorithm and
contrastive divergence was that it's
actually biologically plausible. It was
an end run around the need to do
backprop, and that's what made it so
clever and interesting, in my opinion.
>> LeCun is a big fan of this
non-contrastive thing where you work in
the latent space. There are many
different algorithms that do this. We
had a whole load of shows all about
non-contrastive learning. There are
things like VICReg and BYOL and Barlow
Twins, and there's an entire thread of
research around that. In many different
ways, what they're trying to do is avoid
this mode collapse problem that you're
talking about, and they use different
forms of regularization.
>> There's an old-school way of
accomplishing the same thing, and it's
called pre-processing. This is something
that a lot of people do. In fact, we do
this all the time with vision language
models. We want to use an LLM and we
want to predict images, so what do we
do? The first thing we have to do is
tokenize the image.
>> Right. And so what do we do? We run a
VAE. We do the pre-processing, and the
pre-processing step is completely
independent of the actual algorithm
that's going to be tasked with solving
the problem of interest. And that's not
something that we necessarily have to
stick with. It would be very nice if
there was a way of doing it jointly, and
we're getting right back to JEPA again.
What we'd like to do is choose our
pre-processing algorithm not a priori,
not do it first. We'd like to choose the
pre-processor that works the best in
this space.
>> Yeah.
>> And I think that's the ultimate
motivation for a lot of this work:
what's the right embedding? One of my
favorite tricks... of course, I
pre-process with VAEs all the time. In
fact, every time someone hands me a new
neural data set, the first thing I do,
and I'm not ashamed to admit it, is run
PCA on it and pass it through a VAE and
then take a look. It's the first thing
you do with your data because it gives
you a good idea of what the signal to
noise ratio is in the data set itself.
>> Yes.
>> And then what do I do? I subsequently do
most of my analysis right in that
discovered embedding space. I don't see
a huge problem with that from a purely
pragmatic perspective, but it's
certainly cleaner to have a single
algorithm and approach and not just be
stringing these sorts of things together
in an ad hoc way. PCA is a really great
example of this. There's a failure mode
for principal component analysis which
is actually really common in neural
data, because principal component
analysis basically goes: where's the
most variability? Okay, I'll worry about
that. And all the stuff that's not
varying very much, I'm just going to
throw away, as if dimensions in which
there's low variability are not
important. Well, it turns out that in
neural data, the dimensions in which
there's very little variability are some
of the most important dimensions. And so
pre-processing with PCA runs a risk of
throwing out the most valuable
information in your data set.
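The PCA failure mode can be shown with a deliberately constructed two-dimensional data set. Everything in the sketch below is hypothetical and chosen for illustration, not taken from real neural recordings: the first coordinate has large variance and carries no label information, while the second has tiny variance and determines the label completely. The helper `principal_axes` finds the two principal directions of a 2-D point cloud in closed form.

```python
import math


def principal_axes(points):
    """Return the mean and the two principal directions of a 2-D point cloud."""
    n = len(points)
    mean0 = sum(p[0] for p in points) / n
    mean1 = sum(p[1] for p in points) / n
    var0 = sum((p[0] - mean0) ** 2 for p in points) / n
    var1 = sum((p[1] - mean1) ** 2 for p in points) / n
    cov = sum((p[0] - mean0) * (p[1] - mean1) for p in points) / n
    # Angle of the highest-variance direction for a symmetric 2x2 covariance.
    theta = 0.5 * math.atan2(2.0 * cov, var0 - var1)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))
    return (mean0, mean1), first, second


def project(points, mean, axis):
    """Project centered points onto a single direction."""
    return [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1] for p in points]


def sign_accuracy(scores, labels):
    """Best accuracy from thresholding at zero (either sign convention)."""
    hits = sum((s > 0) == bool(label) for s, label in zip(scores, labels))
    accuracy = hits / len(labels)
    return max(accuracy, 1.0 - accuracy)


# High-variance nuisance coordinate, low-variance coordinate that encodes the label.
labels = [i % 2 for i in range(100)]
points = [((i - 49.5) / 10.0, 0.1 if i % 2 else -0.1) for i in range(100)]

mean, first_pc, second_pc = principal_axes(points)
kept_accuracy = sign_accuracy(project(points, mean, first_pc), labels)
discarded_accuracy = sign_accuracy(project(points, mean, second_pc), labels)
print(kept_accuracy, discarded_accuracy)
```

Reducing to one principal component keeps the direction on which the label is at chance and discards the direction that separates the classes perfectly.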
>> Yes. And so there's a lot of wisdom in
jointly fitting your pre-processing
model as well as your inference and
prediction model.
>> On this subject of not throwing things
away: JEPA and non-contrastive learning
are part of this bigger field of
self-supervised learning, and we want to
learn representations that maintain
fidelity and richness. LeCun's
hypothesis is that when you do something
like supervised learning with some
particular downstream task in mind, the
neural network gets wise, and what it
does is discard all of the long-tail
stuff that isn't relevant for that
particular task. So when you train these
models, what you're trying to do is
maintain enough ambiguity so that it
compresses the information but also
maintains enough fidelity to work
broadly for different things.
>> Yes. And that is a laudable goal, and I
certainly share it. Fortunately,
networks are so big that we don't really
run the risk of overfitting as much as
we used to. But the last thing you want
to do is train your network to toss
information that you might need down the
road. That said, the vast majority of
what the brain does, just like these
neural networks, is decide what
information is currently task
irrelevant. But that's all the more
reason to do things in a self-supervised
or unsupervised way, because you're not
telling it what's task relevant and task
irrelevant.
>> So I interviewed Chollet about version
two of the ARC challenge, and one thing
that struck me is that I think of
intelligence as being multi-dimensional.
Version one got saturated. ARC was
actually really amazing because it's the
only intelligence benchmark that
survived for 5 years before being
defeated. Since the advent of these
thinking models, it has been defeated
very quickly. But they're working on
version three, and there'll be version
four, there'll be version five. Will
there always just be something left
over?
>> That sounds like another philosophical.
So yes is my answer. There will always
be there will always be something left
over in the sense that like you know you
know we we we have this has been the
trajectory things have been going for a
really long time, right? It's sort of
like we get algorithms that do amazing
new cool things and then someone comes
along and says, "Yeah, but it can't
build me. It can't pull a rabbit out of
a hat." Right? And then and then of
course what does someone do? they oh
they they figure out the new training
protocol slightly different architecture
or they just train it to pull rabbits
out of hats and then suddenly it can and
then someone proposes a new challenge
and a new challenge and a new challenge
and it's always this game of like
one-upsmanship.
So the question becomes, well, what's
the point at which there are no more new
challenges? And I'm not entirely certain
we're ever going to get there, right? Um
it may very well be the case that we
get, you know, these sort of algorithms
that are capable of replicating the
complete suite of human behaviors and
then someone will come up with some
criticism like, "Yeah, but it's not
really doing X. It's just faking it."
Right? This is just the direction things
go because people really do think
they're important.
>> Yeah. Do you think that the concept
of recursive self-improving intelligence
is a valid one?
>> Yes, I do think that is.
So I think that one of the most
critical missing elements right now is
some form of continual learning, right?
At the end of the day, you really
want an algorithm that doesn't just
learn on the
training set and then just gets
deployed. You want something that
runs around in the world and comes
across things that it doesn't
understand, right? And then is able to
build on, you know, append to its
model in some sense, right?
And there are
some approaches to it. It's all based on
Bayesian nonparametrics and Dirichlet
process priors and stuff like that, where
you sort of see something that's
surprising or unique or different
something you didn't expect and it
causes you to say I need to turn
learning on because I got to figure this
out. That is an absolutely critical
element that we need to be developing.
We are developing that. And it turns out
that that's one of the nice things about
this sort of object-centered physics
discovery thing. Because it's
object-centered, if it comes across a new
situation that it does not understand,
it is capable of instantiating a
completely brand new object just to
explain this new situation.
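The idea of instantiating a brand new object when an observation is too surprising can be illustrated with a Dirichlet-process-style rule. The block below is only a minimal sketch under stated assumptions, not the system described in the conversation: the `ObjectBank` class, the one-dimensional Gaussian "objects", the plug-in running means, and all parameter values are hypothetical choices made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


class ObjectBank:
    """Hypothetical bank of 'objects', each a 1-D Gaussian cluster.

    Uses a Chinese-restaurant-process style score: an existing object is
    weighted by how many observations it already explains, and a new
    object is weighted by the concentration parameter alpha.
    """

    def __init__(self, alpha=1.0, obs_var=1.0, prior_mean=0.0, prior_var=100.0):
        self.alpha = alpha            # willingness to create new objects
        self.obs_var = obs_var        # assumed observation noise
        self.prior_mean = prior_mean  # broad prior over where objects live
        self.prior_var = prior_var
        self.counts = []              # observations assigned to each object
        self.means = []               # running mean of each object

    def observe(self, x):
        """Assign x to the best-scoring object, creating one if needed."""
        scores = [
            n * gaussian_pdf(x, m, self.obs_var)
            for n, m in zip(self.counts, self.means)
        ]
        # Score for "this is something I have never seen before".
        scores.append(
            self.alpha
            * gaussian_pdf(x, self.prior_mean, self.prior_var + self.obs_var)
        )
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(self.counts):
            # Surprising observation: turn learning on and add a new object.
            self.counts.append(1)
            self.means.append(x)
        else:
            # Familiar observation: refine the existing object.
            self.counts[k] += 1
            self.means[k] += (x - self.means[k]) / self.counts[k]
        return k


bank = ObjectBank()
labels = [bank.observe(x) for x in [0.1, -0.2, 0.0, 10.0, 10.3, 0.2]]
print(labels, bank.means)
```

The hard (maximum-score) assignment keeps the sketch deterministic; a fuller Bayesian treatment would sample assignments or keep a posterior over them.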
>> Continually learning agents can acquire
new knowledge autonomously, and the
whole thing just
learns more knowledge. But intelligence
feels different. It feels like in
the system that we've been describing,
the intelligence is the way we're
implementing the Bayesian
updates and, you know, actually
building the algorithms. Could the
systems on their own meta-program
themselves and develop better algorithms
or something like that?
>> That's a very
good question.
Something that would be closer to true
artificial intelligence than what we
currently have would be capable of
building models on the fly to deal with
new situations to taking things that it
knows about right and combining them in
new and different ways. There are
approaches that have some of that aspect
to them. GFlowNets, from Bengio's
work, are a great example of
something that at least in principle is
a generative model of generative models,
right? It's sort of like, "Oh, you
know, I might actually need a new node.
It's time to create a new latent
variable because the current set's
just not cutting the mustard anymore."
Those are things that I think
are hallmarks of true intelligence. I
don't want to ever make the statement,
"As soon as it's got that, it's truly
intelligent." I will never ever ever say
that. But I do think that that is a
critical component that needs to be
present, right? The ability to
generate new models on the fly to deal
with novel situations and data,
as well as
the ability to combine old models,
previous models, in new and interesting
ways. This is actually how the brain
evolved, right? We started out with like
um you know really simple brains and
there were different regions and they
solved sort of different problems and
what eventually happened as we evolved
is is that these different regions of
the brain learned to communicate with
each other in new ways and through that
communication acquired new abilities,
right? And then eventually evolved
new capabilities
and things like that, right? I often
like to point out that I think
olfaction is the sense
that's not studied nearly enough. It's
an incredibly old part of the brain,
and arguably, right, it's the
first part of the brain that evolved the
ability to do proper associative
processing, right? Odor is unlike
visual space, right, where there are
translation symmetries and all that
sort of stuff and things are smooth.
In olfactory space that does not exist,
right? It's really really really
combinatorial and complicated. And the
part of the brain that evolved to solve
the olfactory problem arguably is the
part that evolved into our frontal
cortex. Don't quote me on that. There's
a lot of disagreement there. That's just
my take. Um but it certainly has a lot
of the features that we associate with
associative cortex. Right? Wow,
I just got like three
different uses of the word associate in
that sentence. But I think you see
what I mean, right? It was all
about like taking old capabilities,
right? Combining combining, you know,
simple models and modules to create
something that was more complex and then
over time, right? So, so that was what
made the brain work, right? It was all
about taking little things that worked
and combining them in new and different
ways in order to evolve, you know,
effectively an emergent, you know,
emergent properties, emergent, you know,
computational abilities and an emergent
understanding of the world in which we
live. And I do think that,
you know, when we get to the point
where we start really saying, "Oh, this
is actually truly intelligent," it's
going to have that feature. It's going
to have a modular description of the
world and it's going to have the ability
to to combine those modules in a way
that creates a more sophisticated
understanding. It's like Legos, right? I
can, you know, the the Lego bricks all
connect in certain ways and I can build
like all sorts of new and amazing things
that were never built before, right? Out
of them. That's a capability that we
have and that's the essence of like
creativity. It's why I refer to systems
engineering as the thing we really
want our AI models to be able to
do.
>> Collective intelligence is a bit
different. We have this plasticity,
right? We can adapt our behavior day by
day. We might see some kind of
meta-learning or some kind of change in
our organizational dynamics. You know,
maybe some agents will specialize, and it
might be an existence proof of this kind
of recursive, you know,
superintelligence that we're talking about.
>> Yeah, I do. I think that's
absolutely correct. Right. So
specialization is great. In
fact, I would argue that specialization
is how we got all of this, right? And
I'm pointing at London, in case
there was some confusion there. Right,
it was really about, you know, the
interconnected, highly specialized
intelligences that are people and their
ability to learn how to work
together that, you know, gave
rise to the technological revolution. The
brain is the same way, right? In my
view it's highly specialized little
modules or agents that are capable of
being repurposed, reused, capable
of communicating with one another in
order to solve really complicated
problems. But there's always a benefit
to specialization. I don't believe in
AGI. AGI seems like a bit of
a misnomer to me. What we really want is
not artificial general intelligence. We
want collective
specialized intelligences.
>> What about scientific discovery? Do you
think that we could, you know, what
would the world look like when we could
discover new drugs? We could discover
new knowledge in science.
>> You know, right now the way that we're
doing that is largely focused on
summarizing vast troves of data and
looking for correlations that are
present in it. I think the next major
milestone in this trajectory is
experimental design, right? Not just, "Oh
well, here are some correlations
you may not have seen because they're
really small," and this is what computers
are good at. They're really good at
identifying small but highly relevant
correlations. The next step, of
course, is design, is constructing a
system that tests these hypotheses
explicitly, right, and generates the
experiments that
will fill in the gaps of our
knowledge. And all of this I believe can
in fact be automated in a very sensible
way. You know, I don't see any
major obstacles to automating empirical
inquiry, other than we probably want to
place some safety constraints when we
start
letting the AIs run the labs, right?
Because you never know. You could always
have this AI that's like, well, you know, the
most effective experiment to determine
if this is correct is to set off a nuke
and that that would be bad.
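The experimental design step described above, including the safety constraint, can be sketched as choosing the experiment with the highest expected information gain among those that pass a safety filter. This is a rough illustration under assumptions, not a method stated in the conversation: the hypotheses ("light" versus "heavy" object), the experiment names, the `force` numbers, and the likelihood tables are all invented for the example.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def expected_information_gain(prior, likelihood):
    """Expected reduction in entropy over hypotheses from one experiment.

    likelihood[h][o] is P(outcome o | hypothesis h).
    """
    gain = entropy(prior)
    for o in range(len(likelihood[0])):
        p_o = sum(prior[h] * likelihood[h][o] for h in range(len(prior)))
        if p_o == 0:
            continue  # this outcome can never happen
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(len(prior))]
        gain -= p_o * entropy(posterior)
    return gain


def choose_experiment(prior, experiments, max_force):
    """Pick the most informative experiment that satisfies the safety limit."""
    safe = {
        name: spec for name, spec in experiments.items()
        if spec["force"] <= max_force
    }
    return max(
        safe,
        key=lambda name: expected_information_gain(prior, safe[name]["likelihood"]),
    )


# Hypotheses: index 0 = "light object", index 1 = "heavy object".
PRIOR = [0.5, 0.5]

# Outcomes: index 0 = "it moved", index 1 = "it stayed put".
EXPERIMENTS = {
    "watch": {"force": 0, "likelihood": [[0.0, 1.0], [0.0, 1.0]]},
    "poke": {"force": 1, "likelihood": [[0.9, 0.1], [0.1, 0.9]]},
    "kick": {"force": 10, "likelihood": [[1.0, 0.0], [0.0, 1.0]]},
}

print(choose_experiment(PRIOR, EXPERIMENTS, max_force=2))
print(choose_experiment(PRIOR, EXPERIMENTS, max_force=float("inf")))
```

In this toy setup the unconstrained choice is the most violent experiment because it is the most informative, which is the failure mode the safety filter is there to prevent.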
>> Yes.
>> Right. So pure empirical inquiry, right,
does run risks like that, but I think
that's not the biggest
issue. I think what we need to do is we
just need to have a nice concise
framework for this.
I'll give you an example. So,
we had this problem
that popped up a while back with a gentleman
we were talking to. You've
got these
robots, and the robot sees something it's
never seen before.
So a robot is running around.
It comes across like a beach ball. Never
seen a beach ball in its entire life.
And what you'd like is you'd like the
robot to know how to figure out that
it's a beach ball and to figure out what
its properties are. And if you tell the
robot, "If you see something new,
just stop," right, then
that's no good, right? What you
really want to do is you want to figure
out a relatively non-invasive procedure
for the robot to do what a
child would do. What does a kid do when
they see a beach ball, right? They run
up and they poke it and they say, "Oh,
right. Yeah." And then it moved, and
the child actually
experiments with its environment for the
purposes of identifying the properties
of the objects that exist in it. Now,
I do think we probably want to test this
out virtually before it's deployed in
the real world because you never know.
It might very well be that the optimal
experiment is to run up and
kick it as hard as you possibly can,
and we certainly want to avoid that.
But like something along those lines,
something, you know, a robot that is
able to test the theories that it has um
about how things work in an online way
and learn from those results in an
online way is definitely part of the
goal.
>> Looking forward, what do you
think the future will look like when we
have more autonomous AIs among us? A lot
of people worry about infeeblement, loss
of control, you know, it making us dumb,
all of this kind of stuff.
>> I do worry about AI making us dumb,
right? I mean, offloading
your thinking onto a machine, which is
something that AI allows,
is potentially a big problem.
I don't really want to have a situation
where humans are
just reduced to value
function selectors. They're just
basically going, "Oh, no. I don't like
that outcome. Do this instead." I
do want to see a future
where we have an AI that actually
improves our understanding of the world.
And simply automating everything runs
the risk that you specified, right? It
runs the risk of people becoming couch
potatoes that just watch TV and
occasionally say like, "Yeah, you know,
these chips are no good." Um uh that
seems like a bad outcome to me.
>> Um I worry less about that I think than
some because people are remarkably
adaptable,
>> right? I mean, you know, they have
all these arguments about, "Oh, you
know, this new technology comes along and
it's going to completely destroy this
way of life, and that's
going to be awful for people." And it is,
maybe in the short term. You know, I
think of tractors, right? Just go
back. How many hundred years do you have
to go back to when like 99% of people were
involved in agriculture? And now it's
like, what, two? I consider that a
solid improvement, right? Because it
allowed the rest of us
to do a bunch of other things that we
find more satisfying, that are more
interesting. You
know, I can spend some
time reading a book. I don't have to labor
in the fields all day. That's the
future that I sort of see and that's the
future that I hope for: one
in which, you know, all of these
artificial agents running around and
doing things autonomously
are there to free us up
to improve ourselves in more
interesting ways. But at the end of the
day, at
least initially, it'll just be another
technology like the tractor. Now, 100
years from now, who knows?
>> What will the value of work be if the
AIs can do everything and there's
nothing left for us to do.
>> I don't think that it will ever be the
case that the AIs can do everything.
Like I said, the future I worry about is
one where, you know, the
sole role of people is sitting
around making sure the AIs
aren't going rogue and things
like that, which I don't consider a
good outcome. I would really like to see
human improvement. You know, I
envision a future of, I don't know, this
cybernetic transhumanism, if I'm
going to go sci-fi on this, right? Where,
you know, the technology and us
evolve together in a way that's
beneficial for both. That's the goal.
You know, are there these dystopic
possibilities? What
are humans in a
world where everything can be done by a
robot?
>> Yeah. You know, that's a good
question. At the end of
the day, right, they end up just
becoming reward function selectors,
right? They end up just sort of saying,
"Oh, I don't like this and I do like
that." I mean, this is
another nightmare scenario. I don't like
talking about these dystopian futures
because honestly I think people are too
clever and I think people are too
motivated and people are too interested
in how the world really works and people
are too interested in actually
understanding things. They will
never stop. AI will become
a partner, not an adversary or a crutch,
and that's what I think
will happen. But
that's a statement more about my belief
about humans than it is about my belief
about the development of AI. You know, I
am a techno-optimist, if you will, not
a pessimist. I believe that we
will find a way to adapt to an
ever-changing world, as we have done for
millions of years, including one that
includes technology that alleviates most
of our labors.
>> On that, there's an AI literacy thing
because AI has moved so quickly now that
certainly my parents don't understand
anything about it. But by the same
token, policy makers don't understand
anything about it. And there are people
saying AI is going to kill everyone and
there's people making negative
arguments. There's people making
positive arguments. So, there's a bit of
a fog of war now because there are so
many people saying different things
about AI. How should they make sense of
all of this?
>> We are now well outside my area of
expertise. So, I'm just going to say
that before I say anything else. Um, AI
is developing very quickly, but I am
much more concerned about what people
will do with the new technology than I
am with what the technology will do all
by itself. I don't have this big
concern. I don't really believe
that, you know, Skynet's going to
take over or the internet's going to
suddenly become conscious and kill us
all,
>> right? In part because, you know, AI is
not that advanced, but also because
we are still
in the position where we specify the
goals of the system, and that will likely
continue for a very long time. And it
will always be the case that these
systems
are subject to review. We will
always keep an eye on them. They will
always, at least initially, be released
in relatively restricted domains
where
we're keeping a close eye on what it
is that they are and are not doing. So I
don't worry too much about
going rogue. I worry a lot more about
somebody
building, you know, something like a
virus, which we already have to deal
with. Somebody builds some
insane virus and takes down the
internet. I'm more worried about
malicious human actors than I am about
malicious AI actors, because at the end
of the day, all of these algorithms
simply do what they are told, right? We
train them. We tell them, "Here's your
objective function." As long as we are
specifying the objective function and we
understand the objective function, we're
probably going to be okay. I think the
safest way to deal with AI concerns is
to tell people, "Hey, look, this AI is just
doing what we told it to. You know,
we set it up to make really good
predictions and to achieve these
outcomes." Now, is it dangerous to
specify these outcomes without being
very very very careful? Yes, it is,
right? This is the whole "Hey
Skynet, end world hunger" and it kills all
humans. That
is a real possibility. But whose fault
was that? The fault was the person who
very very naively specified
their goals. There are in fact
relatively straightforward ways to
specify the the reward function that
that don't run that risk nearly as
badly. The best one is this. Are you
familiar with maximum entropy
inverse reinforcement learning? I like
to call it active inference because it's
really similar. So there, what
you're doing is you're basically
observing someone's policy, and then
you're trying to fit a maximum entropy
model of the
reward function itself.
At the end of the day, what ends up
happening when you do this, and this is
why it's basically just like active
inference, is this. You
have some, you know, organism or whatever,
and you're trying to do this for it,
and it's got some stationary
distribution over actions and outcomes,
right? Its inputs and outputs have a
stationary distribution. That becomes
your reward function. Not directly,
there's some math involved, but basically
your reward function is a function of
the steady state distributions over
actions and outcomes. So we could do this,
right? We could
take the current manner in which humans
are making decisions and we could write
down, right,
what is the current estimate of the
stationary distribution of our actions
and outcomes. So this would include
things like, you know,
this number of people are going hungry,
and, you know, all the
stats that describe the inputs and
outputs to
our policy decisions. And then we could
just tell an AI,
"Your reward function is the one that
results in the same outcome that we
currently have, right, on average." And it
would execute it, and to
the extent that it works, right,
it would ultimately result in
an AI algorithm that just sort of is
mimicking human behavior, right? Or
it's at least achieving the same outcome
that we were achieving before. Now,
here's the safe way to improve the
situation. You don't say "end world
hunger," right? You perturb that
distribution
>> over outcomes, right? And just over
outcomes, a little bit,
>> and then you evaluate the consequences,
right? That's all you're doing. You
make these little changes
in an empirically estimated reward
function, right, rather than just sort
of specifying one by hand, because that's
the dangerous thing.
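The recipe just described (estimate the current distribution over outcomes, treat it as the target, then nudge it slightly) can be sketched in a few lines. This is a toy illustration only. The speaker says the reward is "a function of" the steady state distribution with "some math involved"; using the log probability of the target distribution as the reward is an assumption made here, borrowed from how preferences are commonly written in active inference. The outcome names and counts are invented.

```python
import math


def empirical_distribution(counts):
    """Normalize observed outcome counts into a probability distribution."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


def log_reward(target):
    """Assumed reward: log probability of each outcome under the target."""
    return {outcome: math.log(p) for outcome, p in target.items()}


def perturb(target, source, dest, eps):
    """Move a small probability mass eps from one outcome to another."""
    if not 0 <= eps < target[source]:
        raise ValueError("eps must be non-negative and smaller than the source mass")
    nudged = dict(target)
    nudged[source] -= eps
    nudged[dest] += eps
    return nudged


def kl_divergence(p, q):
    """KL(p || q): how far the nudged target has moved from the baseline."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


# Hypothetical observed outcomes under current human decision making.
observed_counts = {"fed": 900, "hungry": 100}

baseline = empirical_distribution(observed_counts)
nudged = perturb(baseline, source="hungry", dest="fed", eps=0.01)

print("baseline reward:", log_reward(baseline))
print("nudged reward:  ", log_reward(nudged))
print("size of change: ", kl_divergence(nudged, baseline))
```

Keeping the divergence between the nudged and baseline targets small is what makes each step easy to evaluate before taking the next one, in contrast to writing down a goal like "no one is hungry" by hand.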
>> Jeff, thank you so much for joining us
today.
>> It's my pleasure.
>> Amazing.