This Is How To Tell if Writing Was Made by AI | Odd Lots
Using your model, are you able to quantify, like, what
percentage of the internet at the moment is AI slop?
It's about 40%.
Oh, how do you... based on what you just read? How'd you get that number?
So a lot of the internet is just like SEO written articles and like much of that.
Yeah, it's articles written for search, basically, so that
your website comes up more often
in search, because it's targeting certain keywords.
And a lot of that industry has switched over to using AI because
then instead of having to pay writers,
you could churn out articles for pennies on the dollar.
But I think that kind of results in a lot of the internet being AI written.
It's also a little bit platform dependent.
So it's about 40% from, like, an internet-page perspective.
About a year and a half ago, we looked at Medium and found that over
50% of newly written Medium articles were AI generated,
which was a crazy high number.
What about Reddit?
Reddit? It was about 7% a year ago, I believe.
A little over 10% today.
Hello and welcome to another episode of the Odd Lots podcast.
I'm Joe Weisenthal and I'm Tracy Alloway.
So, Tracy, you know, do you ever come across some writing where you can't articulate
exactly why, but you're like, I'm pretty sure AI wrote this?
Does this happen to you much?
So full disclosure, I haven't really thought about it that much.
Yeah, because the thing is, I probably should think about it more, but
there's a lot of bad writing out there, and I've become sort of inured to it.
And I also think that, I don't know,
trying to figure out whether or not something was generated by AI.
Nowadays, if you actually dedicate a lot of your own time to doing that,
that is a huge like mental burden.
Truly attempting to, especially since you and I are in the journalism industry.
How many of the pitches
do you think that we get from PR right now are being generated by AI?
I imagine if you're reading each one of those
and trying to figure it out on a daily basis, you know what?
I suppose where I think about it the most is, someone will respond to a tweet.
Yeah.
And I'll be like, well, if this is a real person,
then maybe this person deserves some engagement
and they ask a question or I want to respond.
But if this person is a bot, then obviously I don't.
And that's where I'm like, you know, I don't want to figure it out.
I would like to know the answer.
You know, I have a controversial view
about AI writing, by the way, which is that it's pretty good.
I mean, like by and large,
when I said this, I think maybe in a recent episode,
when you consider the fact that, I don't know, the majority of the population,
like, doesn't know where to put a comma within a sentence, well,
in my opinion it's pretty good.
I mean, yeah,
one thing I will say about AI is it never gets the placement of a comma wrong.
On some level, it's perfect.
Did you do that?
I think it was in the New York Times.
The Times, yeah. I kind of hated that.
Okay. Why? Well, because I'll tell you why.
First of all, it's only five examples.
There's not very much to it. It asks the reader, which do you prefer?
But I think they were different subjects as well.
Yeah.
And also I think most people probably treated
that as can you guess which ones are human?
Because everyone wants to say they prefer the human.
I didn't think it was like a great test nonetheless.
Well, it is. I mean, not only is it often
indistinguishable, often it's fine writing.
Sometimes it can come up with a really remarkable turn of phrase.
Yeah, but I still, by and large, don't like it.
You read, like, a thing, especially a long text
that's AI, and it's like, even if you can't articulate it,
it's like, this feels AI. It has a certain sickly sweetness to it
that is often annoying.
So what I notice about it is it doesn't do style very well.
Right?
So if you ask it to write something in the style of a writer, if you choose
anything other than something really obvious, like Shakespeare,
yeah, it really suffers.
But the text that it actually outputs is pretty clear.
Yeah. Right.
Like for basic understanding.
Totally.
It's probably better than a lot of what's on the internet.
You know, the real people who are going to have to worry about
this are, like, teachers.
Obviously universities, and law students, and maybe lawyers.
It's fine.
But there are sometimes where I'm like, okay, did someone write this or not?
And it'd be nice if, like, we could know the answer.
Well, the other thing that's starting to happen
is have you seen any books out there that actually come with a disclosure
or disclaimer that say, this book has been written only by humans?
No AI used at all.
I saw that for the first time
on a book that we actually read for an Odd Lots episode.
I don't think it's come out yet, but that kind of threw me.
Yeah.
No, it's more and more. Anyway,
as we enter a world in which the vast majority
of words written, if not already, are written by AI,
I was going to be interested in this question of whether we can know.
Anyway, there's this company called Pangram Labs,
and they have a little thing, and you can pay for it,
but also a free service, where you can drop, like, a text in, and they'll say
the odds that it was written by a human or AI, and I'm pretty impressed by it.
I, like, did some samples of my own writing and then some AI outputs.
It got them all right.
But then I did some, like, further... I tried to stump it.
So what I did was I took a piece of AI writing,
and then I had it translated into Chinese.
Okay.
And I said, okay, imagine this is being written in a more formal register.
And then I had that translated into Hebrew,
and then I had that translated into English.
So the original thing is a piece of AI writing telephoned through various translations,
and then I put that output back into Pangram, and it got that right.
It said it was AI.
So even after a series of sort of transformations
designed to obfuscate the original style of the piece,
to see if, you know, eventually it would emerge in something else.
So I was pretty impressed. It seems to work.
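The "translation telephone" experiment Joe describes can be sketched in a few lines. Everything here is hypothetical scaffolding: `translate` and `detect` are placeholders standing in for a real translation service and the Pangram detector, neither of which is shown.

```python
# Hypothetical sketch of the translation-telephone test described above.
# translate() and detect() are placeholder callables, not real API clients.
def telephone_test(ai_text, translate, detect, chain=("zh", "he", "en")):
    """Pass AI text through a chain of translations, then run the detector."""
    current = ai_text
    for lang in chain:
        current = translate(current, target=lang)  # obfuscate the style
    return detect(current)  # e.g. "ai" or "human"
```

The point of the chain is that each translation discards surface style, so if the detector still says "ai" at the end, it is keying on something deeper than word choice.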
And, you know, I think that's interesting
for a couple of reasons, which is maybe there is something that you can just tell,
but two, it sort of worries me because, you know, there have been articles
in the news where people say, like, this is written by AI, and I think one of my big fears would
be that I write something.
I like to use em dashes.
I have always been an em dash fan.
I love em dashes. That's how people talk.
I'm sorry.
And then what if it says this was written by AI, and I'm like, I didn't.
And then here is this black box that is suddenly,
like, judge, jury and executioner for my career.
Potentially.
You wrote this with AI, the lab says so.
You are now done. Like, that worries me.
So I think this raises a lot of very interesting questions
about this model detection thing.
And I want to learn more about how well it works. There are
also a lot of philosophical questions about just what we value in writing.
True as well, because no one's going to yell at you
for using spellcheck or a grammar checker or something like that, right?
Like it's kind of crazy to think that reputational risk
is going to hinge on whether or not you might have used a platform, a chat
platform to, like, do some basic copy editing.
Totally.
Well, very happy to say we do, in fact have the perfect guest.
We're going to be speaking with Max Spero.
He is the founder and CEO of Pangram Labs, and he can answer all of our questions.
So Max, thank you so much for coming on Odd Lots.
Thanks for having me.
How do you know it's right?
So someone puts in a piece of text, and we'll get into the method in a second.
But someone puts in a piece of text and it says human or AI.
What makes you believe that?
You have a very good track record on this question.
So when we started Pangram,
we started by doing the thing we call a human baseline, which is how
well can we, like as a human predict whether something's AI or not?
That's the first step at like learning.
Is this problem tractable?
How hard or easy is it?
And I found like, me personally, I was able to get about 90% accuracy.
And so we figured an AI model should be able to do it much better than that.
So I have a bunch of methodology questions which we can get into.
But just before we get into any of that,
why is AI slop bad, in your opinion?
Why does it need to be tracked and identified?
I think the problem is it's just so easy to generate.
And so, like, it's
very difficult to know, like, what is the intent behind it.
Basically, like, right now, I think we're actually pretty lucky.
We live in a world where the signal to noise ratio on the internet
and in our information channels is pretty high.
We have pretty high signal to noise, but any bad actor can come in
and just flood our information channels with AI slop that looks legitimate.
It looks like somebody put actual effort and thought into it, but really,
it was just like a single prompt, which could have also been automated.
This is something that I think about a lot, which is that
there was a point in time and maybe still is the point in time
where if you read something that was grammatically correct
or the punctuation was strong or the spelling was strong,
there was reason to think that the person who wrote it was a person
of like a certain seriousness and a certain intelligence behind it.
And I think that the issue that you're identifying is that that link
is now being severed, so that we can't use these heuristics anymore,
such as the strict quality of the prose to know, in fact,
whether this was published by someone who was like a serious actor,
intelligent or not.
And now you have people inserting typos into their copy.
I know people that do that, just to establish...
Yeah. Bona fides.
Sorry, just to go back to my original question.
So you mentioned, okay, you were able to get about 90%, right?
But now you've been used a lot more, and you have people
paying for your software, presumably teachers and journalists, etc.
Given all of that, getting from 90% to 100...
I mean, if you get one out of ten wrong, that is clearly an unacceptable error rate.
Of course,
for a piece of commercial software that could call someone an AI creator.
So you have to do a lot better than 90% talk to us about like,
what you've seen so far in your data since releasing it as commercial software.
What makes you believe the software is doing
a correct job of allocating between the two categories?
So we've built out really comprehensive evals.
Okay.
And so in our evaluations, there's two kinds of errors.
There's a false positive,
which is when something is written by a human,
and we say that it's written by an AI, okay.
And there's a false negative, which is if it was AI written and we don't catch it.
And so we track our numbers for both of these,
and for human writing, we're actually pretty fortunate.
We have, like, millions and millions of samples, so we can get
a false positive number
that we have a very high degree of confidence in.
And our number right now is about 1 in 10,000.
Okay.
So if we scan 10,000 documents,
on average one will come back as AI when it was actually human.
And what about in the other direction.
False negative, I would say around 99% accuracy.
So, like, around a 1% false negative rate.
I think this depends a little bit more on like how adversarial
the prompting is, how much they're trying to like what I did.
Exactly right.
We send it through multiple translations to obfuscate the original output.
That would be an example of adversarial prompting.
Exactly.
But in, like, the general case, where we're just looking at
straight outputs from AI, it's above 99%.
Okay, okay.
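To make the two error rates quoted above concrete: given labeled evaluation data, both numbers fall out of a simple count. This is a generic sketch of the definitions, not Pangram's actual eval code.

```python
def error_rates(pairs):
    """pairs: (true_label, predicted_label) tuples, labels 'human' or 'ai'.
    Returns (false_positive_rate, false_negative_rate)."""
    fp = sum(1 for t, p in pairs if t == "human" and p == "ai")
    fn = sum(1 for t, p in pairs if t == "ai" and p == "human")
    n_human = sum(1 for t, _ in pairs if t == "human")
    n_ai = sum(1 for t, _ in pairs if t == "ai")
    # FP rate: fraction of truly-human docs flagged as AI (quoted: 1 in 10,000)
    # FN rate: fraction of truly-AI docs that slip through (quoted: about 1%)
    return fp / n_human, fn / n_ai
```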
So what is your model looking for, exactly,
when it's evaluating a text?
Because as we mentioned in the intro, you know, syntax and grammar
tends to be pretty good on AI-generated copy.
The style is sometimes more of an identifier,
I would argue to your point, Joe, like sometimes it reads very saccharine
and kind of overly earnest in some ways.
So what?
What exactly are you focusing on here? What are the tells? Yeah.
So the style and, the word choices are definitely part of it.
But I think what a lot of people don't realize is they're actually making
a lot of decisions when they write a piece of text.
So there's, you know, dozens or hundreds of ways to phrase
every single sentence.
And over the course of 50 or 100 or 200 words,
you're making thousands of decisions, actually.
And so what we're doing is we're learning the patterns
in how, like, these frontier models make these decisions.
And if the vast majority of these decisions
line up with how the frontier models are doing it, then it's
vanishingly unlikely that this was written by a human.
You would have to just happen to make the same exact decisions that the LLM does
hundreds of times. Interesting.
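A back-of-the-envelope version of that "vanishingly unlikely" claim, with made-up numbers: even if a human happened to agree with the model's preferred phrasing 90% of the time, agreeing on every one of a couple hundred roughly independent choices is essentially impossible.

```python
# Illustrative arithmetic only; both numbers are assumptions, not Pangram's.
p_agree = 0.9      # assumed chance a human matches one LLM phrasing choice
n_choices = 200    # rough number of phrasing decisions in a short text
p_all_match = p_agree ** n_choices  # on the order of 1e-9: vanishingly small
```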
Okay, but this is a really important point.
So everyone at this point has some feel for LLM output, right?
But my understanding is, it's not like you go in and, like, hard code:
if you see a bunch of em dashes, this is AI.
These decisions in many cases, I imagine neither you
nor the model itself can articulate in English what the decisions are.
All you know is that the decision pattern exists.
Is this correct?
This is correct. Okay. Can you explain?
So therefore, what does it mean that your model has learned
these decision patterns.
So what we're doing,
on the very broad scale, is we're training a deep learning model.
So it's a pretty big black box.
But it has the base of a language model.
And then instead of predicting the next token,
it's predicting whether the text is AI or not.
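In other words: a language-model backbone with a binary classification head where the next-token head would normally sit. The toy below uses PyTorch with a tiny stand-in backbone just to show the shape of the idea; the real system is a far larger pretrained transformer, and nothing here is Pangram's code.

```python
import torch
import torch.nn as nn

class AIDetector(nn.Module):
    """Toy stand-in: an embedding + mean pool plays the 'backbone',
    and a linear head produces two logits (0 = human, 1 = AI)."""
    def __init__(self, vocab_size=50000, dim=64):
        super().__init__()
        self.backbone = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 2)

    def forward(self, token_ids):
        h = self.backbone(token_ids).mean(dim=1)  # pool over the sequence
        return self.head(h)                        # classification logits

model = AIDetector()
logits = model(torch.randint(0, 50000, (1, 32)))   # one 32-token document
p_ai = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of "AI"
```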
Okay.
Well, how we train it is, we train on tens of millions of examples.
So it sees millions and millions of human examples.
And for each human example, we also show it an AI example.
So for example, let's say one of these is a five star
review, for Denny's, that's 78 words long.
Then we'll ask an AI to write a five star review about Denny's at 78 words long,
in the style of the first one.
And obviously these two will be different.
And so our model is able to learn, through contrast, what the difference is.
And the important thing,
sorry, just to be clear here, is that you and I might not be able
to articulate the difference.
There will be some difference in maybe the sentence length.
There will be some difference in word choice.
There'll be some difference in, punctuation syntax, whatever.
But you and I wouldn't obviously spot it.
However, after millions of examples of these side by side,
the model learns what the difference is. Exactly.
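The Denny's example, sketched as data-generation code. The prompt wording and the `call_llm` helper are hypothetical; the point is pairing each human sample with a length- and topic-matched AI "mirror" so the classifier has to learn the residual differences rather than surface features.

```python
def make_training_pair(human_text, topic, call_llm):
    """Return one human example (label 0) and one AI 'mirror' (label 1),
    matched on topic, length, and style. call_llm is a placeholder."""
    n_words = len(human_text.split())
    prompt = (
        f"Write a five star review about {topic}, about {n_words} words long, "
        f"in the style of this example:\n\n{human_text}"
    )
    ai_text = call_llm(prompt)
    return [(human_text, 0), (ai_text, 1)]
```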
I think the best that a human can do is
look for some of these, like, really obvious tells. Like,
ChatGPT loves
the "it's not just X, it's Y" framing.
Earlier models really liked some specific words,
like tapestry and intricate and delve.
Yeah. Delve. Tapestry. Yeah.
But yeah, I think by training Pangram,
we're able to go much deeper than this, and look deeper
than the high-level tells, at the, like, document-level signals.
So one thing that kind of reminds me of, and I'm thinking how to phrase this,
but it reminds me of you know, those exercises
people used to do where you would take a bunch of different faces
and meld them all together and come up with like one,
oh yeah, that was attractive.
So like,
to what extent is this basically
a distributional detector, in the sense that you're looking for,
like, certain paths that you think AI would choose?
Could you get a false positive just from someone who's choosing, like, the
average of the average of the average way to state a particular sentence?
Maybe, yeah.
I mean, I think that's... there's a reason
our false positive rate is one in 10,000 and not zero.
It's because you know sometimes we look at the false positive
and it's like oh it reads exactly like an AI generated
review or essay, except that it was written in 2019.
So it was probably a human who just happened to find the exact,
like, mode-collapsed type of way that, like...
Yeah, LLMs write. Right? Yeah.
I think it's a good way to think about, the distribution of writing or writing
as a distribution where, like, you know, there's a space of all human writing,
and then AI writing is really just like a small point within the space.
No matter how much you prompt it, it doesn't go
that far from where it was trained to be.
Yeah. Okay.
What's in the black box?
So I built a little model myself.
I built this thing where you can upload text,
and it says whether it's more resembling of the written word or the spoken word.
Oh, I saw that. Yeah, yeah.
And I used BERT, which is, like, one of these open source
ones from Google.
What is the core model that you trained on? Did you build it yourself?
Like, talk to us about that.
Our very first model was actually built on BERT.
Okay.
But for future models, we needed to up our capacity higher than that.
So basically we were running into capacity limits with our model.
It was capping out at a certain false positive,
false negative rate.
It wasn't learning the deeper signals.
So we had to 10x and then 100x the parameter count,
so that it can learn, like, really deeply
how these frontier models write.
Have you noticed any interesting differences between how the models write?
And actually, is your model trained to identify
different models, as well as whether or not this is just broadly AI generated?
So we don't specifically train it on different models.
We don't say, like, hey, this one is Claude 3 and this one is ChatGPT or GPT-5.
But what we've done, we've done some interpretability work to look at
basically the output embeddings of the model,
and we find that it actually learns which model the text came from.
So you can see, like, little clusters. Like, this is the Claude cluster,
and, like, all the Claudes, yeah, cluster around here.
And then these are, like, DeepSeek and Qwen, and then this is, like, ChatGPT,
and they all kind of cluster into different spaces in embedding space.
So clearly the model is able to learn what the differences
between these frontier models are.
Actually, since you mentioned Qwen, I'm very interested.
Is there anything, like, distinct in terms of how Qwen generates text
versus platforms that have been developed in the US?
I think Qwen is unique because it's trained
on a lot more Chinese and multilingual tokens than other models.
So, you know, I've heard from Chinese friends that it's much better,
like, at being conversationally fluent in Chinese.
Beyond that, I don't know that I can tell.
It would be hard for me to look at a text and say, like, I know that's Qwen,
but I think somebody who is more familiar with it might be able to.
Let's talk about some of the philosophical
or societal implications of this work.
Have you had anyone whose text has been judged to be
AI written by Pangram, and they're like, I swear to God, this isn't AI?
And they, like, really insist.
And what do you think about this?
Or what do you do? Talk to us about that.
I've had a couple times this happened.
I think one time, like there have been times
where I genuinely believe that, you know, this is just a false positive.
We scan... we've scanned hundreds of millions of documents.
So, like, at a certain scale, this will happen.
But I also get people all the time who are just like, AI detectors don't work.
You know, they're like, it's, like, a total fraud.
And then whatever they're putting out on LinkedIn is just 100% AI generated,
and they're just... they're getting called out.
And then you look back, like, farther into their past, into their history,
and everything they're putting out is AI generated until
about, like, 2023. Like, for everyone?
If you look historically,
there's a lot of, like, sloppy accounts that are putting out total slop.
And you can tell: either they, like, weren't posting as much before,
or, if you scan back in time, you see that they were writing
human text at some point.
So there's a number of accounts out there that basically right around
the beginning of 2023, where if you scan the entire corpus of their work,
it very clearly shows a switch right around early 2023.
Yeah, it really like depends on the account.
I think one thing we saw that was interesting was there is,
a writer for The Guardian that was covering the Winter Olympics
and somebody was like, hey, this article is, like, total AI slop.
Ran it through Pangram, and it was AI.
The Guardian was like, no, of course our writers don't use AI.
And then, so we scanned this
single writer's, like, history.
And we found that they really did start picking up
AI, like, mid to late 2024, and were using it more and more in their articles.
I mean, just to play devil's advocate for a second does
does intent matter when it comes to identifying
AI slop in the sense that okay, I get you can have a bad actor who's
maybe trying to influence how people feel about
a particular topic, and maybe they've created a bunch of bots on Twitter,
and they're using AI to just flood the zone
with a bunch of AI slop supporting their particular viewpoints.
On the other hand, if you're a journalist and your business is to write,
you know, like basic, understandable copy about a news topic,
just to be clear, I'm not advocating this at all.
But that intent is very different to I'm going to try
to influence something by just, you know, sheer volume.
Yeah.
I mean, definitely, these are, like, two... one is a lot more severe than the other.
But I think at the same time, if you're a journalist
and you're using AI to basically, shirk your work and, like,
not do your work, I think that's also a problem.
And I think it's a reputational risk, to the outlet, because people can tell
and people are going to call you out, and you know that there's a lot of people
who don't want to read AI slop, kind of regardless of where it's from.
Yeah. This is, definitely true.
Has it changed?
Like, are you ever going to run out of human material to train on?
Right?
Like, you could be pretty confident that if you find some piece of text
that was published on the internet prior to 2023,
but certainly prior to, like 2019 or something like that,
you can be extremely sure that this was human generated.
Do you worry that in the future that like, it's going to be harder
to even establish the provenance of your training data?
Yeah, it's definitely a concern for us.
Talk to us about how to think about that.
So we have a near infinite data reservoir of pre 2023 data.
There's just like more than enough for us to train on for a long, long time.
But part of the problem is we also want to train on modern text.
We want to
there's all this talk about like
if somebody is writing about LLMs or about AI,
we don't want to incorrectly flag that as AI
because our training data has no sense of this topic.
So I think we're looking at different ways to do this, but
most of them are just like figuring out like, who is a trusted actor?
Who do we know is putting out human written content?
And we could use our model for that, like to some degree.
So we have known actors, we know they're putting out
human written content, and then we could use their data as well.
Slightly random question, but using your model, are you able to quantify
like what percentage of the internet at the moment is AI slop?
It's about 40%.
How would you, based on what you just read, how do you get that number?
So a lot of the internet is just like SEO written articles.
And like much of that.
Yeah, it's articles written for search, basically, so that
your website comes up more often in search, because it's targeting certain keywords.
And a lot of that industry has switched over to using AI because
then instead of having to pay writers,
you could churn out articles for pennies on the dollar.
But I think, yeah,
that kind of results in a lot of the internet being AI written.
It's also kind of platform dependent.
It's about 40% from, like, an internet-page perspective.
About a year and a half ago, we looked at Medium and found that over 50% of newly
written Medium articles were AI generated, which is a crazy high number.
What about Reddit?
Reddit?
It was about 7% a year ago, I believe.
A little over 10% today.
Actually, this reminds me. So I'm on Reddit
a lot, and I really enjoy it nowadays as a platform,
but I do worry about how much of it is being generated by AI.
And the thing I don't necessarily understand is
what are the economic incentives to actually write
a bunch of AI generated posts on Reddit and get upvoted?
Like, why does that system or motivation even exist?
So there are startups.
I'm not going to name names because I don't want to promote them,
but they will sell a promise to companies
that we're going to get you organic mentions on Reddit.
We're going to run our AI bots that, seem organic.
And they're just going to, you know, naturally recommend your product or,
you know, just mention your product, in the comments or in a post.
And so I've seen evidence of this.
We can find these. They're basically, like, bot farms
that are mostly engaging,
seemingly organically, just like doing a short reply
and then sometimes they're doing this brand mention.
And so that's why these posts are very valuable.
That's really interesting.
I have to also imagine it's valuable
because all of the models train on Reddit, right?
Yeah.
And if you want your products name to appear in model outputs,
it's like, what is the best, you know, nose hair trimmer or whatever.
And there's a bunch of bots on Reddit that talked about this nose hair trimmer.
And then that's probably more likely to show up in a ChatGPT response.
It's been gamed, right?
Yeah. Yeah, it's been weirdly gamed.
You know, you used to just Google best nose hair trimmer.
Yeah. There's like a thousand.
Well the Reddit search results like show up first nowadays.
Yeah. It's what people are looking for.
Yeah. And then people start searching.
Best nose trimmer Reddit. Yeah.
To get their Reddit comments on it.
And now, it's like,
people have realized that that's what people are searching for.
So you need to populate Reddit with your advertisements.
I'm, I'm on Men's Health:
Are you looking for a nose hair trimmer?
The Panasonic ear and nose hair trimmer is the number one choice.
Men's Health pros:
easy to hold. Anyway.
It's not. Yeah, it's all these affiliate links.
Yeah. Just destroyed the internet.
I know, it's, it's really too bad.
But whatever.
Talk to us more about the whole pipeline.
So I'm very fascinated by this idea.
It's like, okay, you see this review for Denny's,
you have the AI model try to replicate it as best as it can, to see
the subtle differences.
Talk to us more about, like, the whole pipeline.
What are the other tests that you're using to get at the truth?
You know, because what I imagine you're trying to do
is get the most similar data set with an almost imperceptible difference.
To really stress test it. Absolutely.
Yeah. Talk to us really about this whole pipeline. Yeah.
So what we're really trying to do here is...
Speaking as a model maker myself... no, no. Sorry. Yeah.
As an AI expert. Yeah.
As an AI expert, I need to hear some tips of the trade.
Yeah.
So what we're really looking for is, examples that are as close to the boundary
between human and AI as possible so that our model learns better.
Something that's very obviously AI is, you know, our model's
not learning as much.
Same thing for something that's obviously human.
And so step one is,
you know, creating this data set with synthetic mirrors of human examples.
And then we train a model.
And then step two is something called active learning.
So we then take this model and use it to scan a much larger corpus of data
and look for errors: false positives, false negatives.
And then we pull those back
into our training set and are able to train a much better model
because it's seen these errors, and these errors, we believe,
are just much closer to the boundary between human and AI.
Sorry, sorry.
Just to be clear, the first pass is, like, okay, you have known
human writing and known AI, right?
And you train a model,
and then the next pass is, once again, known human, known AI.
Right?
So you already know the answer for each of these,
and therefore you can come up with a list of which ones it got wrong.
And then that gets fed back into the first version.
Exactly.
And so once we retrain, the model gets much, much better.
And then we could do this as many times as we want to kind of
just have a self-improving model that gets better with every training run.
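The retraining loop Max describes, reduced to a sketch. `train` and `predict` are placeholders, not a real pipeline; because the scanned corpus already carries ground-truth labels, "errors" are simply disagreements between the current model and the known answer.

```python
def active_learning_loop(train, predict, seed_set, labeled_corpus, rounds=3):
    """seed_set / labeled_corpus: (text, label) pairs, 0 = human, 1 = AI."""
    dataset = list(seed_set)
    model = train(dataset)
    for _ in range(rounds):
        # Harvest the current model's mistakes (false positives and false
        # negatives), which sit closest to the human/AI boundary.
        errors = [(t, y) for t, y in labeled_corpus if predict(model, t) != y]
        if not errors:
            break
        dataset.extend(errors)   # fold the hard examples back in
        model = train(dataset)   # retrain on the augmented set
    return model
```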
I can also, if you want, go a little bit more into how we deal
with AI edits, because I think that's an increasingly important
problem. Like, I think most writing will be AI assisted in the future.
I think it's already in Google Docs, and it's in, yeah, Google's keyboard.
Grammarly arguably has been doing this for a while.
Exactly.
Yeah, Grammarly uses LLMs on the back end,
and we don't want to just say, like all writing is AI now,
we want to be able to differentiate between AI assisted and AI generated.
So what we do is we also have different prompts.
So, for, like, the human review of Denny's, rather
than saying, generate a review like this, we could say, help improve this,
make it more formal, like, clean up the grammar.
And so we have like a long list of AI editing prompts,
And then we're able to look at basically the cosine difference, the distance
between the original human text and the edited text.
In embedding space. Exactly.
So, how much did the AI change this text?
And then we're able to train our model to say, like, we're just going to, like,
put points on this distance and say, like, this is moderate AI assistance,
this is light AI assistance, and this is heavy assistance.
Interesting.
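The distance-based grading of AI edits might look something like this. The embedding model itself isn't shown (the vectors would come from a real text embedder), and the thresholds are invented for illustration; only the cosine-distance idea comes from the conversation.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def assistance_level(original_vec, edited_vec):
    d = cosine_distance(original_vec, edited_vec)
    if d < 0.05:   # thresholds are illustrative, not Pangram's
        return "light AI assistance"
    if d < 0.20:
        return "moderate AI assistance"
    return "heavy AI assistance"
```

The farther the edited text drifts from the original in embedding space, the heavier the assistance label.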
I'm going to do something I don't think I've ever done before,
which is ask a founder about their corporate mission.
But, you know, you've set up this company.
And when you think about what you're trying to do here, is it
just basic AI detection in the sense that there might be, you know,
a few groups of people like teachers that find this very valuable?
Or is the mission something broader where you're actually trying
to improve the internet and what people see on it?
I believe the technology of being able to detect AI generated
content is immensely valuable, and it's valuable
not just for teachers, but for basically everybody in every profession.
Lawyers, teachers, or just
an individual who consumes content on the internet.
I think it's valuable for all these people.
But ultimately, yeah, our, like, high-level goal is to help mitigate some of the
negative effects of growing AI content.
But, for instance, just using the product review example,
is the vision that, like, a Yelp, for instance,
would want to use this technology to make sure that its system
isn't being gamed? Or is the vision, like, if I am a particularly diligent consumer
who has a lot of time on my hands, and I'm looking to go out to a restaurant,
I can run all these individual restaurant reviews through Pangram
and then, like, actually figure out if it's real or AI?
So I think right now it's a lot of the former.
We work with platforms.
One of our biggest customers is Quora, and they run a bunch of content
through Pangram.
But we have a lot of different platforms that use Pangram
to help moderate, and find AI bad actors, and
get them off their platform.
But I also think, yeah, the individual consumer case has been growing a lot,
and we're really interested in pushing here
the free version of pain gram.com.
Like you get a handful of tests a day or something like that.
If someone had an unlimited number of Pangram responses and maybe had access to the Pangram API at infinite scale,
could they theoretically learn a prompt that they would
then be able to put into an AI to generate human style writing?
I actually had a friend do that.
He put his Claude Code on a loop. I gave him some API credits, and then his Claude Code just basically worked overnight, writing a prompt, trying to get it to write something that's human written, or that Pangram says is human written.
Human written, it got there, but the text
was pretty like, incoherent.
So, so like, yeah, it
was producing more or less long gibberish.
It was like grammatically incorrect.
A lot of the words just didn't really make sense.
Because this was my first thought.
Like when I saw it, I was like, that would be like a fun experiment
to see if you could take
all the output, find the difference, and just keep iterating on the prompt.
You would have to tell the AI, in order to eventually get an output that looked to Pangram like it was human generated.
Yeah, I think there's a way to do it too.
If you also had, like, an LLM judge on coherency, and used, like, Pangram and the coherency judge both to score your text.
I think that's definitely possible.
And I'm excited for someone to try to do it
because we could make our model a lot better and more robust if this existed.
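The overnight experiment described here, plus the LLM-judge variant, amounts to a black-box search against two scorers: minimize the detector's AI score while keeping coherency high. A toy sketch of that loop, with stand-in scoring functions rather than the real Pangram API or any actual LLM judge:

```python
import random

def search(initial, mutate, detector, judge, rounds=200, seed=0):
    """Toy hill-climbing sketch: keep any mutation that fools the
    detector more WITHOUT losing coherency. `detector` and `judge`
    are placeholders, not the real Pangram API or an LLM judge."""
    rng = random.Random(seed)
    best = initial
    best_score = judge(best) * (1.0 - detector(best))
    for _ in range(rounds):
        candidate = mutate(best, rng)
        score = judge(candidate) * (1.0 - detector(candidate))
        if score > best_score:  # only accept improvements
            best, best_score = candidate, score
    return best

# Purely illustrative stand-ins: the "text" is just a number. The fake
# detector flags values far from 7; the fake judge rewards values near 5.
toy_detector = lambda x: min(1.0, abs(x - 7) / 10)
toy_judge = lambda x: max(0.0, 1 - abs(x - 5) / 10)
toy_mutate = lambda x, rng: x + rng.uniform(-1, 1)

result = search(0.0, toy_mutate, toy_detector, toy_judge)
```

The multiplicative score is what makes the second judge matter: a candidate that fools the detector but scores zero on coherency is rejected, which is exactly the failure mode described above (gibberish that passes the detector).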
So I want to know what your personal like token budget is nowadays,
that you're even, like contemplating some of this stuff.
But I feel like, I have the Claude Max plan, you know,
and I don't work like when I'm at work.
I don't work on any of my vibe coding projects,
you know, like when we were kids.
I don't know if you remember.
Like, if you didn't eat all your food, like,
someone would say, oh, there's, like, starving kids in the world. Yeah.
I'm like, oh, there's, like, starving vibe coders, they need the tokens.
It's like, oh, you didn't like,
I have this four hour token window and I'm almost never maxing it out.
And I'm just saying, it's like, there's kids on the other side of the world that wish they had your tokens, and you're not using all of your tokens for the window.
How dare you?
I feel a little guilty when I don't max out my Claude Max token plan.
I also have Claude Max, and, yeah, most days I'm not doing much coding at all.
I'm not maxing it out.
And then some days I'm going, well, we'll see about that though.
It's like, oh, yeah. Yeah.
So can I ask you like, you know, writing is kind of interesting, but like,
what are the prospects of this being able to work on, say,
and you must get this like image and video generation.
Isn't it all theoretically similar?
Is there
reason to think
that it will be replicable, or is this just a different beast of a problem?
I think the approach is definitely doable.
I think some of the economics change, especially if we look at video
and the costs of generating video today, okay, we
we can't generate video at the same scale that we can generate text.
And so we might need a kind of different approach.
But I also believe that if we're able to solve this for Image Plus maybe
like audio, that that could be enough to just solve it for video as well.
Zero shot.
Could you ever envision, I don't know, launching some sort of like
certification program for video because this seems to be
like my dad's a boomer, spends a lot of time on Facebook.
Like this seems to be what society needs, right?
Like a video that comes with a little thing that says, this is not
AI generated and someone has actually like rubber stamp that.
So there's an organization called C2PA, and I think they're doing pretty good work on content provenance.
Basically, they are working with phone makers and hardware makers to,
basically embed like hardware signatures
to prove that image and video are
like were truly taken from the hardware, like watermarks basically.
Yeah, exactly.
So, so rather than marking the AI outputs, we're instead, embedding
like a proof of authenticity in the, the like thing.
That's real. Yeah. Captured in real life.
That's interesting.
All right. So big picture.
Where is the internet going?
You know, you mentioned 40% of the internet is already AI generated, but maybe that's not the end of the world.
Like, you know, it's just your SEO pages that I never read, I don't know, whatever. But give us some high level thoughts about, like, the trajectory of the internet, regardless of the uptake of Pangram and other AI detection models.
I'm a little bit worried about the state of the internet.
I'm going to be honest.
I think like right now,
there's still like so much of it is built around trust and norms in a way
that, like, we're not really well equipped
to suddenly deal with an onslaught of bots at a completely different scale
than we've dealt with before.
I so there's maybe, like a good case and a bad case, I would say like the bad
case is the internet goes the way of dead internet theory, just like every
every space that's open and accessible is just flooded by bots.
And then the only place people are able to communicate authentically
is in, like, very walled garden, like closed,
servers, like, like discord servers, for example, where, you know,
everybody's identity is known and, you know, you know who's in there.
So that's maybe the, like, bad scenario.
Can I tell you an insane thought that I've had?
Go on.
So there's, I forget what they call it, this idea of, like, with bad actors, it's called like heaven mode or heaven banning.
Have you heard of this? Yeah.
So there's this thought that one way you could deal with bad actors
on the internet is suddenly they
they're they you're they're on a version of, say, Twitter
in which there are only bots
and everyone always agrees with them on everything,
and it drives them crazy and stuff like that.
And they would never know it because they're like, oh, it's clever.
And then it's like, slowly, yeah. This is like, oh, you could punish people by putting them on a version of the internet where they will never get any flak.
You can get heaven banned and put into basically jail.
You're talking to a bunch of bots. That's right, that's right.
That would be jail. But you haven't been.
But I thought and again, this is you know, like I built this little AI model myself
and I showed it to my friend. I was like, oh, that's really cool, Joe.
I'm really impressed. Like, I'm really impressed by like that.
You're able to do this well.
And I was like, are people being honest with me?
Have I been heaven banned because I just like
like you can be honest with me if it sucks and I'm and I sort of have this fear
of the biggest humblebrag ever, like, oh,
I did this thing and everyone thought it was great.
I'm just saying, like, people are like, I think people I'm worried
that, like, people are being nice to me because like, oh, cool.
Yeah, this will impress you. Like, did that.
And I have this like deep anxiety that like, people aren't giving it to me
straight about it. I'm not.
I know that sounds like a humblebrag, but it's really not. That's why you can never get, like, too successful, like surrounded by a bunch of yes men.
Oh yeah.
Like, oh, this is a first try at doing something with AI coding.
I'm like deeply anxious, like, no, you can just tell me if it sucks, that's fine.
That's my worry.
I don't worry about this.
If I tweet that I'm eating a steak, I will get like a hundred people criticizing me for something. The meat crew.
Yeah, yeah.
So that's the other thing, which is that the two things you are never
allowed to tweet about
meat preparation and enjoying life.
Because if you ever enjoy life, this is over. Enjoy.
And if you ever prepare meat, people will flip out at you on the internet.
Those are the two things that you're not allowed to do online.
Very true.
This sort of related question, but just going back to the methodology,
if you're focused on this sort of like path dependent idea,
I'm kind of envisioning it as like a giant decision tree.
Right.
Is there a possibility that as the models get better and better and we know that
they're already injecting like some degree of randomness into their output?
Although I know there's going to be a pedant out there who like messages
me and says like, well,
you know, computers can't do like, true randomness, but, you know,
setting that aside, setting that aside, like, we know that
they're adjusting, they're becoming more sophisticated at an incredible rate.
We know that they're trying to adjust and inject
some randomness in order to avoid exactly this kind of detection.
Do you worry about their own adaptation at all?
I have noticed that the models, as they get more capable,
I believe it's like their output distribution gets more complex.
Yeah, it's harder to learn with a simple model, which is why
we've been increasing our model size to capture a greater complexity.
Higher complexity function, that can capture the output.
So I think we may have to continue to, make our models better.
We're going to have to work to keep up with it. Yeah.
We can't just rest on our laurels. What?
What about burstiness and perplexity?
Yeah.
This is a metric that's used by some AI detectors, but not Pangram.
Okay. And so I can explain a bit about how it works.
So perplexity is basically a measure.
And this is not Perplexity AI, the website. This is a technical term, okay.
This is a metric.
This is a measure of how confusing a piece of text is to a language model.
So basically, for example, with every token we can calculate some perplexity, which is basically, like, how expected this token is.
So for example, like if it's I went home to my pet
and then the next token is chinchilla, that would be a much higher perplexity
token than my pet dog.
So LLM outputs tend to be low perplexity.
They're not going to produce outputs that are surprising to themselves.
This is a decent way to get an AI detector that's around 90 to 95% accurate.
But it has some problems.
The main one is that you can't improve upon it.
Basically, it has false positives.
Text written by non-native English speakers often is low perplexity, just because when you're—
They don't take as many risks.
Exactly. Yeah. Yeah. Interesting.
That's why a lot of the early AI detectors had a bunch of false positives
with ESL speakers.
It's because their text was low perplexity.
So I think, like, this is a very cool metric, but it is not the path for Pangram. Instead, we went the deep learning approach, so we can do better than what perplexity gives us.
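The per-token calculation described above can be sketched directly. This is a generic illustration using made-up token probabilities, not Pangram's code or any particular model's numbers.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each token: exp of the mean negative log-probability.
    Low perplexity means the model found the text unsurprising."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# "I went home to my pet ..." -- a model would assign "dog" a much
# higher probability than "chinchilla", so the chinchilla ending
# yields higher perplexity. Probabilities below are invented.
ppl_dog = perplexity([0.9, 0.8, 0.7, 0.6])
ppl_chinchilla = perplexity([0.9, 0.8, 0.7, 0.001])
```

Since LLMs sample mostly high-probability tokens, their own output scores low on this measure, which is what the 90-95% accurate detectors exploit.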
Is that just the opposite side of the coin?
Yeah, burstiness is basically.
Actually, yeah.
I don't know if I can define it. Okay, fine. Yeah. Okay.
Burstiness just sounds like one of those, like, sort of, I guess, manosphere terms, doesn't it? Like, oh yeah, he's been looksmaxing with high burstiness.
Yeah. Something like smug.
Yeah that's great.
Yeah I think it might just be like a measure of like sentence length and.
Got it. Like how?
Yeah. The ups and downs of the text.
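The rough definition given here, sentence-length ups and downs, can be sketched as the spread of sentence lengths. This is one common formulation for illustration only, not necessarily any detector's exact metric.

```python
import re
import statistics

def burstiness(text):
    """Spread of sentence lengths (in words). Human writing tends to
    mix short and long sentences; LLM output is often more uniform,
    so lower values suggest flatter, more machine-like rhythm."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The chinchilla, having escaped its cage during the "
          "night, was finally discovered behind the refrigerator.")
```

Here `uniform` scores zero (every sentence is four words) while `varied` scores high, matching the intuition that texture, not just vocabulary, distinguishes writers.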
I have one more question, which is if we assume that the world
is collectively concerned about AI slop and wants to do something about it,
what would be like the single biggest change to the system, either in terms
of like the economics of the internet or regulation or technology
like what you're developing, that would actually help reduce slop?
Yeah, I think the biggest one is norms.
So there have been a couple great blog posts written about how it is rude to send
other people undisclosed AI outputs,
and, I think I like completely agree here.
I think, you know,
if somebody, like, asks a question on the internet and then somebody else, like, goes and puts it into ChatGPT and then, like, pastes the answer, that's kind of rude.
Like, like I was going here to ask the opinions of my friends or,
you know, my followers, not just like, not ChatGPT.
I could have done that myself.
And so I think like, building this norm is something that,
you know, it's very new technology.
So we need to do it quickly.
But I think this would help a lot for society.
Well, then actually, this gets to a question that I have, which is, I feel as though the major internet platforms are actually moving in the exact
opposite direction.
I mean, I'm sure maybe I accidentally clicked on something at some point, but the frequency with which I get an email
and then I open it up to respond in Gmail, and there's that ghost text there that's like, do you just want Gemini to respond to this?
I've never done that.
I also, I think that would be extremely rude. I've never responded to any email with an AI response, but they're basically telling you to do that.
They're doing the exact opposite. They're blowing up these norms.
And so I'm curious, from your perspective, you mentioned you work with Quora,
but from your impression, do the major internet platforms think this is a problem worth solving, or is their attitude like, you know what? The more content, the better.
There's mixed incentives there for the big companies.
It's funny because like Google seems to be playing both sides.
So like on one hand they had that, advertisement
which people kind of blew up about where it's like,
oh, children can now send their heroes, notes on like how much
they respect them by using AI instead of, like, writing the note themselves.
I'm like, this is wrong. This is like societally bad.
But at the same time, they're working very hard to deal with the
AI slop on the internet in search results, to make sure people get
served real content and not AI slop content.
So I think, I mean, I think obviously there's a lot of incentives at play, like product people who are incentivized to push AI because, yeah, that is the corporate mandate.
But yeah, I think overall, even like in my sphere of a bunch of people
who are AI researchers, generally, consensus is that, like
AI is a powerful tool, but like slop is bad.
This reminds me, my parents used to make me, do these,
like, handmade greeting cards for, you know, for Christmas
for, like, all of our relatives and stuff.
And it was supposed to be a demonstration of my commitment to communicating,
I think to family.
No, no, it traumatized me forever.
And I hate greeting cards as a result of them,
of doing this, just spending hours manufacturing these things.
But then secondly, the funniest thing was once we got e-cards,
my parents immediately switched to using e-cards and just.
And now this is also the funniest thing my dad uses.
He figured out that the e-card system can tell him whether or not you opened it,
so he just uses it as like day to day communication. Now.
It's so funny.
I just sent an email to your daughter.
Yeah, do it via e-card.
It's like I noticed
you haven't opened up my e-card for, International Hot Dog Day.
Please, let me know what's going on.
I was terrible at writing as a kid, and my mother made me write all of these
handwritten notes to thank people for the gift I got for my bar mitzvah.
Yeah, I hated it. But you know what?
I have deep connections with all of those people that have lasted over the years.
And that miserable one week where I just wrote, and I got, you know, hand cramps, I think it paid off.
So, all right, well, imagine doing that for like 16
years, basically, in a never ending stream.
Max Spero, thank you so much for coming on Odd Lots.
That was a lot of fun. I'm fascinated by this conversation.
Thanks so much for having me. Yeah, really exciting to talk about this.
And I think slop is a growing problem.
So hopefully also we're able to deal with it. 40% of the internet.
I can't tell if I'm surprised by that or not.
And what's it going to be next year at this time?
Oh man, I don't know.
It'll be, like, hard to say. Over a majority?
Oh for sure. Yeah, almost certainly crazy.
Thanks for coming on Odd Lots.
Thanks
Tracy, I love that conversation.
I just think it's like a really fun puzzle, right? Yeah. No, totally.
It's very like, it seems like a fun question to solve.
And I'm fascinated by this idea of how, like, with both humans and AI, there is going to be this inevitable gap between what we know and what we can articulate. Even setting aside AI versus human text,
There are things that we both know.
For example, this is newsworthy.
And this is a good episode of a podcast. This is a credible sounding guest.
And this isn't, the gap between that and then being able
to like, explain why it's like, well, you just sort of know it, right?
You just sort of have this feeling there's an intuition
and that intuition is built up from numerous examples,
which is the same way in a sense that like the AI is trained,
it's like these things that you only know from patterns, and you can see them
without fully being able to, like, articulate exactly what's going on.
Well, the other question I would have on that is,
is it even going to matter in the long run?
If you think about like, so much of the internet
is already built on bots and the sort of false attention economy,
like if, if our entire like world view
becomes shaped by AI driven drivel.
Yeah.
Does it matter if, like, the economics of the internet
are still attached
to individual bot accounts and things like that, I don't know if I'm in
if I'm explaining this, but no, no, I think it makes a lot of sense.
And I do think, like, it is important, and, like, we're going to have to change the entire way we think, like Max said at the beginning, which is,
and I've thought about this, which is that it used to be
that if you came across a piece of writing and the punctuation was excellent
and the spelling was excellent, and it was like cogent sounding,
you're like, okay, this has been written by a smart person.
I will take it seriously. Yeah, right.
And now there's this complete severance of sort of like craft and output
because you could and you do this like, ask Claude
to write an argument in favor of the most absurd proposition.
Yeah, I've asked Claude to write an argument for me that,
you know, write an argument for me
that the reason why, Reagan wanted to do tax cuts in the early 1980s
related to these reports of UFO sightings in the 1970s.
And it will write something.
Yeah, and not only is it grammatically correct, I'd actually be straining to come up with a better version of this argument myself. And before, having read it, you'd think, oh, maybe this person took this argument seriously,
but now this argument is just created ex nihilo.
We're going to have to really, like, change our heuristics about this stuff.
We've created an unlimited stream of basically cranks with really good grammar.
Yeah. That's right, that's right.
Because it used to be we knew the cranks.
They had bad grammar where they would email us
and, like, half the words would be in yellow and the other half would be underlined in green. These were classic examples of the tells that we used to just be like, oh, this person was a crank. Like, you know, half the words are in all caps and stuff like that. Those don't work anymore.
All right, on that note, shall we leave it there?
Let's leave it there.
This has been another episode of the Odd Lots podcast.
I'm Tracy Alloway. You can follow me @tracyalloway
And I'm Joe Weisenthal. You can follow me @thestalwart
Follow our guest Max Spero he’s @Max_Spero_
Follow our producers Carmen Rodriguez @carmenarmen,
Dashiel Bennett @dashbot and Cale Brooks @calebrooks
And if you want more Odd Lots content, you should definitely check out our daily, human-generated newsletter over at bloomberg.com/oddlots
And you can chat about all of these topics 24-7
in our discord, discord.gg/oddlots
And if you enjoyed this video
then please leave a comment or like or better yet, subscribe!
Thanks for watching.
The video discusses the increasing prevalence of AI-generated content on the internet, with estimates suggesting around 40% of web pages might be AI-written. It explores the challenges and implications of this trend, including the difficulty in distinguishing between human and AI-generated text, the potential for AI to be used for malicious purposes like spreading misinformation, and the impact on various professions like journalism and education. The discussion features Max Spero, CEO of Pangram Labs, who developed a tool to detect AI-generated content. He explains the technology behind his model, its accuracy rates, and the ongoing race between AI development and detection methods. The conversation also touches on the philosophical and societal implications of AI-generated content, the changing norms around online communication, and the future trajectory of the internet.