Programming Isn't Math, It's Linguistics

Transcript

0:02

[Music]

0:07

Huh?

0:10

[Music]

0:18

[Music]

0:28

[Applause]

0:29

[Music]

0:35

What you just saw is a quirk in every

0:37

single language. This is actually a

0:38

subset of issues called linguistic

0:40

ambiguity. If I'd had more context, I

0:43

might have understood the original

0:45

intent. But you know we don't always

0:46

have those kinds of luxuries.

0:48

Interestingly enough, programming

0:50

languages face very similar challenges.

0:53

We as programmers instead of

0:55

communicating person to person are now

0:58

communicating person to compiler.

1:00

Whether you're coding in C++, Java, Rust

1:03

or any programming language, compilers

1:06

being a human-made invention are also

1:08

imperfect at interpretation. This is why

1:11

since they can face very similar issues,

1:14

programming can also fundamentally be

1:16

considered a linguistics issue. First of

1:18

all, I have a question for you. I've got

1:21

a very small program over here. What do

1:23

you think the output of this program is

1:25

going to be? We've got a few options

1:27

here. First would be it prints out

1:29

partially true. Second would be it

1:31

prints out totally false. Or I guess a

1:33

third option could be it prints out

1:35

absolutely nothing. So, pause for a

1:37

second and give me your best guess.

1:39

It might not be exactly what you would

1:40

expect, which you probably got because

1:42

I'm sitting here asking that as a

1:44

question.

1:46

So, let's go over. I'm going to compile

1:47

my program. We're going to ignore our

1:50

warning. I'm going to do /hm.

1:53

We've got our answer totally false. Now,

1:56

you might be thinking, looking back at

1:57

the code, this really doesn't make any

1:59

sense. I would probably guess that the

2:01

output would be absolutely nothing. But

2:03

instead, we're getting this totally

2:05

false statement. And that's because this

2:07

is part of a language ambiguity in the

2:09

syntax called a dangling else. And we're

2:12

going to get into a real world example

2:13

of this a little bit later. Nearly every

2:15

distinct programming language issue can

2:17

be traced back to a very similar human

2:19

language issue. In all these examples,

2:22

I'm going to talk about the English

2:23

language quirk and then I'm going to

2:25

talk about the very equivalent

2:27

programming language quirk and I'm going

2:29

to kind of sort them into different

2:31

buckets of the three distinct

2:33

compilation phases which I'll also talk

2:35

about in a second. Now, English, and all

2:37

languages for that matter, can have a lot

2:39

of different ambiguities and one of my

2:42

favorite examples of this is actually a

2:44

book from 1967 called Beyond Language:

2:47

Adventures in Word and Thought. And this

2:50

explores recreational linguistics. And

2:53

actually, I think that would be a really

2:54

good name for coming up with insults,

2:57

like an official name. I'm not insulting

3:00

you. I'm just practicing my recreational

3:02

linguistics. One of the best sentences

3:04

from the book, I think, is buffalo

3:06

buffalo buffalo buffalo buffalo buffalo

3:10

buffalo buffalo. Yes, that is a real

3:12

sentence. Don't even question me. Just

3:14

go look it up. But that's because the

3:16

word buffalo can act as both a noun and

3:18

a verb. But relating this back to

3:20

programming languages with compilers,

3:22

these kinds of ambiguities are even more

3:25

serious and important to clarify. If you

3:28

think about it, compilers need to be

3:29

correct 100% of the time, but they also

3:32

don't have the advantage of being able

3:33

to ask for more context when they need

3:35

it. Let's take a minute to think about

3:36

how compilers actually parse the syntax

3:39

of a program. Now, compilers have it a

3:41

little bit worse off than humans because

3:43

they have to do this in multiple

3:45

different phases of parsing. And you can

3:47

kind of break this up into three

3:48

distinct stages. The first stage is

3:50

going to be the lexical analysis. The

3:53

second is going to be the parsing stage.

3:55

And then the final stage is going to be

3:57

the semantic analysis stage. So during

3:59

the lexical analysis, it's going through

4:02

and it's breaking up the tokens of your

4:04

program. So like the variables, the

4:07

operators, and it's parsing those out

4:09

into distinct tokens. And then the next

4:12

phase is going to take all of those

4:13

tokens in your program and turn those

4:15

into an abstract syntax tree, or an AST.

4:19

And during that second parsing phase,

4:21

this is going to be the part where

4:23

it's finding all those syntax errors

4:25

that are so irritating in your code,

4:27

like if you forgot to add a semicolon or

4:29

something in your C++ program. Side

4:32

note, when I was in school, I remember I

4:34

came up with the ten commandments of

4:36

computing and the first one was remember

4:38

the semicolon to keep it holy.

4:42

Don't forget, guys. During the final

4:44

stage, the compiler uses that AST and

4:47

parses it to look for semantic

4:49

correctness in your code. For example,

4:51

it does type checking or maybe checks

4:53

for scope resolution inside of your

4:55

code. So if you have any kind of logical

4:57

issues with your code, this is where the

4:59

compiler discovers those. Like for

5:01

example, if you're trying to use a

5:03

variable that's out of scope. Now, if

5:05

you're looking for a very understandable

5:07

example of how a compiler works, I

5:09

highly encourage you to take a look at

5:10

the local C compiler, which is available

5:12

open source on GitHub, and this is going

5:14

to be a lot more readable than trying to

5:16

understand a really big compiler like

5:19

GCC or Clang, for example, that might be

5:22

spread across like literally hundreds of

5:24

different files. So, this is really well

5:26

documented and easy to understand. I'm

5:28

going to go over to the LCC GitHub

5:31

repository. And just side note on here,

5:34

this has to be the oldest commit I've

5:37

ever seen in my life from 23 years ago.

5:40

That's absolutely insane. That just

5:41

blows my mind. I'm going to go over to

5:44

the source code and you can see we have

5:47

all of the different files that kind of

5:49

make sense to put it into the three

5:51

different phases of the compiler

5:53

analysis. You can see like the tree

5:55

parsing it into an abstract syntax tree

5:57

and you can see the process where it's

5:59

doing both semantic and syntactic

6:01

analysis. If you look at tree, I think

6:04

it's very descriptive over here. You can

6:06

see it actually going through and

6:08

parsing the different opcodes inside of

6:11

this and then constructing that AST so it

6:15

can do the remaining analysis on it.

6:17

Going back over to our linguistics

6:18

challenges, let's think about how these

6:20

ambiguities can come into existence.

6:23

Now, older languages are going to face a

6:25

lot of these challenges because if you

6:27

think about it, you have an old existing

6:29

language and you're constantly adding

6:31

new additions on top of that. So, it

6:34

kind of makes sense how these issues can

6:36

happen. But this can also happen a lot

6:38

inside of younger languages as well,

6:40

which are also not perfect. I think a

6:43

really good example of this is the word

6:45

literally. Now, if you think about it in

6:47

old original English, literally

6:50

literally meant not figuratively. So it

6:53

actually happened. But as English has

6:55

progressed, literally has started to

6:57

change meaning to actually mean

7:00

figuratively.

7:01

And a lot of people use it now as kind

7:04

of a synonym for very, which drives me

7:07

literally nuts.

7:11

Now, going back over to some C++

7:14

examples, let's take a look at them. And

7:16

we're going to use C++ since that's

7:18

literally the best programming language.

7:20

Remember those three different compiler

7:22

phases I talked about previously? Let's

7:24

take a look at each one of those

7:26

individually. So for our first bucket,

7:28

those would be syntactic quirks. And in

7:30

linguistics terms, you could call this a

7:32

grammar ambiguity, aka what happens if

7:35

the same token fits into multiple

7:38

different parse trees. And C++ has a

7:40

really good example of this one, and

7:42

it's very fittingly called the most

7:45

vexing parse. Think about this English

7:47

sentence for a second. Visiting

7:49

relatives can be boring. Now, if you

7:51

look at the word visiting, is visiting

7:53

acting as a verb here, like the act of

7:56

visiting your relatives, or is visiting

7:58

acting as an adjective describing a

8:00

noun, as in your visiting relatives

8:03

can be boring. If you see, the word

8:05

visiting can have multiple different

8:07

meanings, completely changing the

8:09

structure of the sentence. Now, keeping

8:10

that English example in your mind, let's

8:12

go over to a quick C++ example. So for

8:15

this example, I have a very short

8:16

program right here. I have a couple

8:18

declarations up top. We have a simple

8:21

function definition that's going to be

8:22

declaring a function named m that's

8:24

going to be returning a vector of

8:25

integers. And then we have our

8:27

definition for that function down here.

8:29

Keeping that in mind for the next few

8:31

seconds or minutes, probably

8:34

seconds, I hope seconds. Let's go down

8:37

to our main function and instead let's

8:39

say I want to declare a variable. And

8:42

we're going to name our variable n. And

8:44

this is of type vector. And if I want to

8:46

default initialize this with a few

8:47

values, I can initialize this with let's

8:50

say 10 values all initialized to the

8:52

integer five. And if I go down here,

8:55

adding a couple of extra values just

8:57

because why not. And I'm going to be

8:59

printing out the size of that vector as

9:02

well as all of the numbers inside of it.

9:04

So let's just run this for sanity and

9:07

make sure this works.

9:09

So we do ./vex. And you can see

9:12

we've just got our vector printed out as

9:14

well as the size just like we would

9:16

expect. But now take a second and think,

9:18

what if we didn't want to have default

9:20

initialized values inside of our vector?

9:22

What if we just wanted to have an empty

9:24

n vector? You might think we can just

9:27

get rid of those values. And let's do

9:29

something like

9:32

this for example. This looks right,

9:33

right? But why am I getting some syntax

9:37

errors over here? Let me save this and

9:40

run this. We'll compile

9:43

and you see doesn't compile. What's

9:46

going on here? So if I go up here, I can

9:49

see invalid range expression of type

9:52

vector int. No viable begin function

9:54

available. Well, that's very

9:56

interesting. See what's happening. If

9:58

you look at this closely, there is no

10:00

difference in the declaration of this

10:02

vector n as well as this vector m over

10:06

here. So the compiler is basically

10:08

saying, I don't know whether this is a

10:10

function definition or a function

10:12

declaration or if this is declaring a

10:14

new variable for a vector of integers.

10:17

This is an example of a syntax

10:19

ambiguity. Now the way to fix this,

10:22

let's get rid of this. It looks

10:24

really right. So if you're a beginner,

10:26

you would look at this and you would

10:27

think, huh, what's wrong? And it's

10:28

probably why it got the name the most

10:30

vexing parse. But the way to fix this in

10:34

old C++ was like this where you have to

10:38

set your new variable equal to a vector

10:40

of integers. But this is just like the

10:43

longest syntax in the world. But it does

10:45

indeed get rid of our syntax errors. But

10:48

if you go over to C++11, luckily in

10:51

modern times we have this declaration

10:55

option available. We can just use curly

10:57

braces and it default initializes our

11:00

vector to nothing. No default values,

11:03

just calls the default constructor. And

11:05

if we want to make sure that this does

11:07

actually compile, got my dummy values

11:09

down here. So we should have our vector

11:11

size two. So I'm going to recompile.

11:15

Let's run our vex. And here we go.

11:17

Worked just fine. Now we can

11:20

unambiguously declare a vector of

11:22

integers as well as a function returning

11:25

a vector of integers. Confused?

11:28

Hopefully not. Our next syntactic quirk

11:31

is going to be the dangling else, which

11:33

we talked about briefly at the beginning

11:34

of this video. So, let's think about

11:36

this sentence for a second. If you see

11:39

Sam, if he's busy, leave a note.

11:41

Otherwise, call me. Now, this is kind of

11:44

ambiguous, but it's mostly related to a

11:46

punctuation problem here. Like what if I

11:48

don't see Sam at all? Am I supposed to

11:50

call you or not? So if we rearrange the

11:52

punctuation and the structure a little

11:54

bit and we get this, if you see Sam,

11:57

leave a note if he's busy, otherwise

11:59

call me. The meaning becomes a lot

12:01

clearer. So this kind of ambiguity of

12:03

association happens a lot in programming

12:06

as well. Going over to our C++ example

12:08

for our dangling else, I've got a really

12:10

simple program here. Now, if you got the

12:14

question wrong in the beginning of this

12:16

video, now you have a second chance. If

12:18

you can't remember what the question

12:19

was, you should rewind and come back

12:21

because you'll probably get this one

12:22

right. So, we have in our simple

12:25

program, this is kind of simulating a

12:27

real world scenarioish

12:29

of logging into a machine. So, we have a

12:31

couple statements here basically saying

12:33

like login was successful, login wasn't

12:35

successful, and let's go to our main

12:37

function. I've set this to is logged in

12:40

is true. So, we're going to successfully

12:43

log in and we're not an admin. So, we

12:45

should not be able to log in as an

12:47

admin. And then this is our nice little

12:49

login check over here. And then at the

12:51

end after we've logged in, we're going

12:53

to get like a nice little goodbye

12:54

message. So, let's take a closer look at

12:56

this if statement. And again, tell me

12:59

what you think is going to print here. I

13:02

should have kind of a Boolean AND over

13:05

here. It should be if is logged in and

13:07

is admin, then we're going to log in as

13:09

an admin. Otherwise, we're just going to

13:10

completely fail. So, if I'm logged in,

13:13

but I'm not admin, I should really just

13:15

be able to log in and then get the

13:16

goodbye message. But let's go and

13:18

execute this. So, I'm going to compile

13:21

this.

13:22

Ignore our warning. It's going to be the

13:25

phrase of the day: ignore all

13:27

warnings.

13:28

But what you see here, even though we

13:31

should be logged in, we're getting a

13:32

failed to log in. Get out. And then we

13:35

get our thanks for logging in with us

13:37

today. I didn't code that part. Great.

13:40

So, what's happening? If we look at this

13:42

if else, it looks like it should be

13:44

working. Right now, what's happening is

13:46

this else is ambiguous. It doesn't know

13:49

whether to associate this else with this

13:51

if statement or with this if statement.

13:55

Kind of a side statement here. I

13:58

honestly, personal pet peeve. I cannot

14:01

stand when you don't use the curly

14:03

braces. Like, always use the curly braces

14:06

because what space are you trying to

14:08

save? You're saving like a

14:10

byte of space. It's just absolutely not

14:12

worth it and it makes your code so much

14:15

less readable. So, let's go over

14:17

to the fix of this. So, I'm going to log

14:19

in or get rid of this login right here.

14:23

And if we fix this with some beautiful

14:27

curly braces, we can make this so much

14:31

less ambiguous. Now we know if logged in

14:35

and is admin. We have our login admin

14:38

right here. Otherwise, we're going to

14:40

fail that login if our login was

14:42

unsuccessful. So if I save this and

14:46

recompile,

14:49

you can see we've managed to fix all of

14:52

our challenges with the ambiguous else

14:55

association. All right, drum roll.

14:58

It's time to look at our next bucket.

15:00

And this is a whole different can of

15:02

worms.

15:08

It's not worms, guys. It's just monster.

15:11

It's not even monster worms.

15:15

So, for our next phase, remember the

15:17

first phase of the compiler parsing was

15:20

that lexical analysis phase. So, that's

15:22

taking in all of your characters and

15:24

parsing them into their respective

15:26

tokens. But this parsing can be pretty

15:28

tricky if you don't have the proper

15:30

context to know what token some

15:33

characters are going to map out to.

15:34

Think about this sentence for a second.

15:36

He said, she said hi. Now, we have the

15:39

auditory advantage of being able to hear

15:41

like my tone and inflection, so you can

15:44

probably understand what meaning I'm

15:46

trying to convey. But let's say you read

15:48

this sentence instead inside of a book

15:50

or something and you're looking at all

15:52

of the different punctuation inside of

15:54

it. In English, we have this concept of

15:56

nested generics. So, this is really

15:58

convenient for conveying meaning because

15:59

you can have your single quotes inside

16:02

of your double quotes to show what

16:04

you're actually trying to say. Now,

16:06

programming languages, especially

16:08

earlier ones, faced these same kind of

16:11

trickiness and challenges. Now, relating

16:13

this back over to our C++ example, we

16:17

can also have this kind of idea of

16:18

nested generics inside of the C++

16:21

programming language as well. So if I

16:22

took my previous vector of integers n

16:25

and I instead decided to nest this with

16:27

an additional type and make this a

16:29

vector of vector of integers then I can

16:32

put this one type inside of another type

16:35

and nest this. Now this looks perfectly

16:39

valid and it's going through I'm adding

16:41

some different vectors to this printing

16:43

this out once again. But let's see what

16:46

happens when we compile this. And I'm

16:47

going to go back in time for a second.

16:49

And I'm going to go compile this with

16:50

C++ 98. So let's do this.

16:54

Traveling back in time. And something

16:57

interesting is happening. So we're

16:59

actually getting an error here. And

17:00

let's read that error message cuz it's

17:02

really interesting. So error. A space is

17:04

required between consecutive right angle

17:07

brackets. What does that mean? Basically

17:09

that means that these two right angle

17:13

brackets are being recognized as a shift

17:16

operation. So this is going to be a

17:18

shift right operation and the compiler

17:20

during the syntax parsing process is

17:23

basically saying I don't know whether

17:25

this is a nested type or if this is a

17:28

shift operation. So it basically can't

17:30

tell what these tokens are supposed to

17:32

be mapped to. And as our C++ 98 compiler

17:35

is so kindly telling us, the old fix for

17:38

this was instead of doing this, you had

17:41

to actually add a space inside of here

17:44

so that it could tell it wasn't supposed

17:46

to be this shift right operation. And

17:49

you can do this and recompile with C++

17:51

98, but it won't accept any of my nice

17:54

pretty syntax over here in my for loops.

17:57

and you have to do this awful

18:01

iterator syntax. So, I'm not going to

18:03

put you through that. Instead, we're

18:05

going to travel forward in time and go

18:07

to C++ 20 and compile this. So, I'm

18:09

going to have my original version cuz

18:12

they figured out how to parse this

18:14

properly eventually. And instead of

18:17

doing C++ 98,

18:19

jumping forward to 20. And here we go.

18:22

We have successfully compiled

18:24

this and we can execute this perfectly

18:27

fine. Wow, fascinating vectors. All

18:29

right, let's move on to our next bucket.

18:32

And this is probably going to be our

18:33

worst bucket, and this is semantics.

18:36

Now, context, context, context is so

18:39

important here, but I think English is

18:41

probably one of the worst languages for

18:43

this since we just have so many

18:45

overloaded terms. So, take a second and

18:47

think about this sentence. Polish

18:50

ambassadors are rare. Imagine you were

18:52

reading this and you couldn't hear how I

18:54

was pronouncing the word Polish. Well,

18:57

since I have the capital P at the

18:58

beginning, do I mean Polish or polish? I

19:02

would say polish ambassadors are

19:04

probably pretty rare. So rare they might

19:07

even be non-existent. Now, through

19:09

pronunciation and context, you can tell

19:11

what I'm trying to say, but when it's

19:13

written down, the only subtle

19:15

differentiator that's not always

19:17

available to us as a differentiator is

19:19

the capital P. So consider this sentence

19:22

instead. Meeting Polish ambassadors is

19:25

rare. Now we've changed this so we're

19:28

able to explicitly typecast the

19:31

capital P to the word Polish. So we can

19:34

say for sure we do mean a Polish

19:36

ambassador. I think I'd enjoy being a

19:38

polish ambassador. When I was a kid and

19:41

I was picking chores, I'd always pick

19:42

polishing as a chore. It's very, very

19:44

satisfying. I don't think I'm qualified

19:47

to be a Polish ambassador

19:50

unless you guys will take me. Now, a

19:52

great programmatic example of this is

19:54

dependent type names. So, if I go over

19:56

to my C++ program, you can see I have a

19:58

simple function template right here,

20:00

passing in a container, getting an

20:02

iterator for that container, and then

20:04

this allows me to print out the first

20:06

element inside of that. If I go down to

20:08

my main function, you can see I've got

20:09

two different vectors. One is a vector

20:11

of integers, and one is a vector of

20:13

strings. And I'm able to call the same

20:16

function and pass in those different

20:18

types of containers. Now, if I go up

20:20

here, let's focus on this line just a

20:23

little bit. I want to try to compile

20:26

this code and let's see what happens.

20:27

So, I'm going to press enter. Oh, and

20:30

we're getting an error. And the error

20:31

message is: missing 'typename' prior to

20:34

dependent type name. So, it's really

20:36

complaining about this particular line.

20:39

Now I want to copy this and let's go

20:41

over to cppreference and look at

20:43

our specific container that I'm using in

20:45

this case which is a vector. And if I go

20:48

down

20:50

I can see part of our member types we

20:53

have this const_iterator available. Now

20:55

if I go back over to my code you can see

20:57

I'm expecting this to be a type,

20:59

const_iterator. The problem is this right

21:04

here. These two colons can also be

21:06

referring to potentially let's say a

21:08

static member variable. So the compiler

21:10

is saying I don't know which one you're

21:12

trying to do. And how do we fix that? We

21:15

can be explicit about it just like we

21:17

were inside of the Polish/polish example

21:19

where we used the capital P in the

21:21

sentence to distinguish that we meant

21:23

Polish. All we need to do is simply

21:26

specify explicitly that this is a type

21:29

name that we're expecting. So save this

21:35

and now if we compile our code,

21:38

compilation works perfectly and we can

21:41

run this and see we can either print out

21:44

the first element as 42 the integer or

21:47

42 the string. Sometimes the smallest

21:49

tokens make the biggest difference about

21:52

how a statement is parsed. Really good

21:54

example of this is the humble colon

21:56

operator. Consider this sentence. Pack

21:58

the following: snacks, water, maps versus

22:02

pack the following snacks, water, maps.

22:05

Very big difference between the two

22:07

interpretations. So the colon here is

22:09

kind of like a whoop whoop to your brain

22:11

of telling you what you're supposed to

22:13

expect. Otherwise, in the second

22:15

sentence, I kind of feel like I'm about

22:16

to start eating water or maps. Not

22:19

really not really the most tasty meal

22:22

I've ever had. If you like C++, don't we

22:25

all? then you'll know that the template

22:27

feature is one of the most powerful

22:29

parts of the C++ programming language.

22:32

And much like the colon, the keyword

22:35

template in C++ is an extremely

22:38

important marker for the language to

22:40

understand. If you use the word

22:41

template, it completely changes the

22:44

interpretation of the C++ compiler. Now,

22:46

let's move on to our last C++ example.

22:49

And yes, it includes templates. Going to

22:51

go over to my program. I have a struct

22:53

tool right here with a function template

22:55

inside called perform action. And if I

22:58

go down here, I have another function

22:59

template. It's going to be our run

23:01

processor that's going to trigger our

23:03

perform action over here. And then I

23:05

have a processor struct that contains my

23:07

tool struct declared inside. And my main

23:10

function is simply going to trigger this

23:12

entire run processor process. How many

23:14

times can I say the word process in a

23:16

sentence?

23:18

So, if I go back up, let's find the

23:20

problematic line right here inside of

23:22

our run processor function template when

23:25

we're calling this perform action. Pay

23:27

attention to this particular syntax

23:29

right here. I'm going to compile this

23:31

and see what happens. Now,

23:33

interestingly, I'm getting an error

23:35

here. It says use template keyword to

23:37

treat perform action as a dependent

23:39

template name. Basically, what's

23:41

happening here is the compiler does not

23:43

realize that this is actually a

23:46

template. Now, it's looking at this

23:48

particular symbol right here, our less

23:50

than sign, and it's thinking that it is

23:52

actually a less than sign, because it's

23:54

not taking in the entire context of

23:57

this, that we're simply performing an

23:59

action on an integer. Now, how do we fix

24:02

this? So, what we need to do is we need

24:03

to be explicit with the compiler and

24:06

tell this you need to treat this as a

24:08

template. So simply enough, we'll just

24:12

say that this is a template and then we

24:14

can trigger our perform action template

24:16

with our integer and run our processor

24:20

on our tool ID 42. Makes a lot of sense,

24:22

guys. Trust me.

24:25

So I'm going to recompile this. Here we

24:27

go. Worked perfectly fine. If I run

24:30

this, perform action on tool 42

24:33

successfully. Being explicit is very

24:37

helpful when you're programming. Now,

24:38

I've been throwing a lot of shade at

24:39

compilers in this video, but compilers

24:42

are extremely complicated and super

24:45

difficult to write. So, it makes sense

24:47

why they would have so many ambiguities

24:49

and syntax problems. I actually have an

24:52

entire video that talks about like the

24:53

history of compilers and how they can be

24:56

self-hosted that you should definitely

24:58

take a look at. But, if I'm going to be

25:00

talking so much smack about compilers, I

25:02

should probably try writing my own kind

25:03

of compiler, which you can do inside of

25:06

C++. So I have created my own custom

25:10

parser for my extremely sophisticated

25:14

language and this is called

25:15

Remmbercript.

25:18

So if I go over to my main.cpp, this is

25:22

going to run my Remmber program. So it's

25:24

going to take in the input file that

25:26

it's going to execute. And if I go over

25:29

to my parser.y, you can see my extremely

25:32

sophisticated parser. What this is going

25:34

to do, this has a vector of strings that

25:37

has really good song lyrics inside of

25:39

it. And here are my tokens. I have my

25:42

Rember token and my semicolon token. So,

25:45

you remember that tokenizing portion of

25:47

the compilation process. Here I have my

25:50

own custom tokens inside of this. It's

25:52

actually really cool that you can do

25:54

this. I think it's really fun. Now, if I

25:56

go down here, here's the logic for my

25:59

interpreter. You can see every time I am

26:01

encountering the Remmber semicolon

26:04

tokens together I am going to print a

26:07

random line of this song. Now let's go

26:10

through and let's execute rememberers

26:11

script. Now I have my repository here

26:14

locally. I've already made my parser so

26:17

I can run dot /erscript. It makes me

26:21

laugh every time.

26:24

And then I need to pass in my custom

26:27

remember code. Well first of all let me

26:29

show you. Let's cat one of these files.

26:31

Let's do test remembermber.

26:34

Every time you can see I've got my uh

26:38

tokens here. Remember semicolon remember

26:41

semicolon. So each one of these is going

26:42

to print a random song lyric. So let's

26:45

do dot / remembermber script

26:48

and let's pass in test.ber.

26:52

Here we go. We have our song lyrics that

26:55

are printing randomly to the console.

26:57

And if we want to get really crazy, let

27:00

me show you. We can even do inline

27:03

remember script.

27:06

So we can do remember remember all in

27:10

the same line.

27:11

So let's execute this dot / remembermber

27:15

and then we'll pass in our test inline

27:18

remember.

27:19

Here we go. Now we've got our three

27:22

random song lyrics both in line and on

27:25

separate lines. Tell me you haven't seen

27:27

that kind of sophistication before. I

27:29

really got to stop making these examples

27:31

really late at night with no sleep and

27:34

multiple monsters.

27:37

I think this was only funny in my head

27:39

when I did it. I don't I don't remember

27:41

why.

27:43

Anyway, all jokes aside, I highly

27:46

encourage you to try to write your own

27:47

custom parser in C++ because I think

27:49

it's really cool to be able to do. It

27:52

kind of shows you the challenges of it

27:54

and it's just very powerful when you can

27:57

write your own custom syntax. I think

27:58

that's really nice to be able to declare

28:00

your own tokens. Do you remember 21st

28:04

day December

28:06

always remember happy day? The real

28:10

question is what can we do about all

28:11

these different ambiguities and

28:13

complexities in languages? And actually

28:16

formal language theory is a really

28:17

fascinating field to study. Now,

28:20

although there aren't really many true

28:22

examples of context-free languages, we

28:25

can get pretty close. And some languages

28:27

are going to be better or worse than

28:29

other languages at reducing these kinds

28:31

of lexical complexities. A really neat

28:33

example of this is ASD-STE100, also

28:36

known as Simplified Technical English.

28:38

And this was created by the European

28:40

airline industry as a standard form of

28:43

English for writing technical manuals

28:46

in. And this was created to be a very

28:49

easily understandable language for

28:51

non-English speakers. You can actually

28:53

find this language online. It's super

28:56

simple and really neat. It's only

28:57

limited to a total of about 900 words.

29:00

As for programming, outside of toy and

29:02

experimental languages, Lisp and Lisp-like

29:05

programming languages are kind of

29:07

the clear winners here. Now, a lot of

29:09

people really don't like the multitude

29:12

of parentheses inside of Lisp, but it

29:15

makes it really clear and super

29:17

unambiguous for the compiler. I remember

29:20

my programming teacher in college

29:22

talking about when she was learning to

29:24

program and she was learning in Lisp and

29:26

she had to count. She was told by the

29:28

teacher to count on her fingers opening

29:30

and closing parentheses so that you

29:33

could keep track of how many parentheses

29:35

you needed to close at like the very end

29:37

of a statement which is absolutely

29:39

crazy. But other than Lisp, Ada and

29:42

Haskell are pretty good about this. And

29:44

while C++ might be my favorite

29:47

programming language, historically it's

29:49

not great at this. It's had a lot of

29:51

lexical confusion in the past. So what

29:53

do you think? In this video, I mostly

29:55

stuck to C++ and English examples, but

29:58

I'm sure there's a ton of other

29:59

interesting lexical ambiguities in many

30:01

different languages. So, feel free to

30:03

leave yours in the comments section

30:05

below. In any case, this is a super

30:07

tricky field, but it's also very

30:09

interesting. So, as always, thank you so

30:11

much for watching everyone. I hope you

30:12

enjoyed this video.
