
LTX 2.3 Tutorial Part 1: Master Pro Image-to-Video (I2V) Workflow


Transcript


0:00

Hi everyone, welcome back. I'm Edo. New setting: I changed my camera, so I hope this works better. Thank you for being here with me today. Today we're going to look at something cool, I hope: the new LTX 2.3 video model. We already know the previous update, LTX 2.0.

0:37

It's quite good, but this new open-source video model is amazing, in my opinion, for many reasons. Some of those reasons are listed here: you can prompt audio, as you can with the 2.0 version; you can create video clips up to 20 seconds long; you can encode at up to 50 frames per second; and you can generate at up to 4K resolution.

1:26

Today we're going to see everything, I hope: the image-to-video model and the first-frame/last-frame workflow. I want to show you some of my renders, some of my videos, and talk with you about the pros and cons of this model. I'm working locally.

1:59

So, I'm working with my Ryzen 9 and a 5060 Ti with 16 GB of VRAM, plus 64 GB of system RAM, on Windows. Those are my specs, and I'm working locally. I just want to create something that works for me, for my specs, for my config, for my computer. I'll guide you through the installation process, and now we can go right here.

2:50

This is the ComfyUI blog post, with everything: the highlights of the model, some example outputs (very cool), and this controversial workflow. First of all, I don't work with text-to-video workflows; I only work with image-to-video. I think it's the only way to work in a professional way. Prompting is not so simple when you want control of your scene. And if you want something more specific, this is the GitHub page with the paper and everything.

3:43

Okay, you have to install a bunch of of

3:47

things in order to work with this LTX.

3:51

But uh but but but but uh we see

3:56

everything um to to to work with. Okay.

4:01

So um

4:04

these are all the checkpoints the 2.3

4:09

uh 20 billions dev or distilled if you

4:12

want to have some uh um save for your

4:16

memory. This is the upscaler. We talk

4:18

about the upscaler um in a second

4:21

moment. This is the Laura uh to uh to

4:28

in order to work better with um spatial

4:32

interpolation. This is the text encoder

4:35

Gemma. It's very heavy in terms of

4:38

memory and space but whatsoever.

4:41

And these are all the if you want to

4:46

have the the precise control

4:49

um uni union control or motion track

4:54

control. Okay, I will not use these two

4:58

in the workflow but you can whatever you

5:02

want. So
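Before wiring anything up, it can help to verify that the downloaded files actually landed where ComfyUI expects them. Here is a minimal sketch in Python; the file names are placeholders (use whatever checkpoint, LoRA, text encoder, and upscaler you actually downloaded), and the folder layout assumes a standard ComfyUI install, so adjust both to your setup.

```python
from pathlib import Path

# Adjust to your ComfyUI install location.
COMFYUI_ROOT = Path(r"C:\ComfyUI")

# Placeholder file names: substitute the exact files you downloaded.
# The target folders are the usual ComfyUI model folders; some packs
# ship the latent upscaler as a checkpoint instead, so check the
# model card if a file is not found where you expect it.
REQUIRED_FILES = {
    "checkpoints": ["ltx-2.3-dev.safetensors"],            # or the distilled variant
    "loras": ["ltx-2.3-spatial-upscale-lora.safetensors"],
    "text_encoders": ["gemma-text-encoder.safetensors"],
    "upscale_models": ["ltx-2.3-latent-upscaler.safetensors"],
}

def check_models(root: Path) -> bool:
    """Print every expected file that is missing and return True if all exist."""
    ok = True
    for folder, names in REQUIRED_FILES.items():
        for name in names:
            path = root / "models" / folder / name
            if not path.exists():
                print(f"MISSING: {path}")
                ok = False
    return ok

if __name__ == "__main__":
    print("All files present." if check_models(COMFYUI_ROOT) else "Some files are missing.")
```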

5:04

This is the workflow you can find here. I think this is the one, this one right there. We can download it, and you can just drag and drop it into your ComfyUI, and voila. Yes, it's the same. Okay.

5:34

This is the simple workflow. Everything you need to know is here: your model checkpoint, your LoRAs, and the upscaler. It's a simplified version of the LTX workflow. Right here we have an upscale node so it can work fast: LTX first downsizes your frames and then upscales them. It can work very quickly on the downsized version of your images, and then the upscaler brings everything back up, so generation is quick.

6:34

This is the subgraph. It looks very simple: the checkpoint; the LoRA (I'm using another version of the LoRA, but it's the same one, the distilled LoRA, this one right there); the Gemma text encoder; and the upscaler. Then the value of your frame rate, 24, and the values for the image: height and width.

7:16

I just click this button here to go into the subgraph. This is a cropped version of a specific workflow, which is this one. It seems quite demanding in terms of understanding, and it's quite heavy.

7:46

One very important thing: this version here is a modified version of the ComfyUI one. This is the ComfyUI workflow you can download, and in that version you'll find a prompt enhancement node. For me it's a mess: every time I use that prompt enhancement I face error after error, so I deleted it. It's completely useless; use Gemini or whatever else to write a good prompt instead.
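If you go the external-LLM route, a small script can do the prompt writing for you. This is only an optional sketch, assuming the google-generativeai Python package and an API key of your own; the model name and the instruction text are placeholders, and nothing here is part of the LTX workflow itself.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # your own key, kept out of the workflow
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

# Describe the still image and the motion you want; the LLM expands it
# into a detailed video prompt you can paste into the LTX text input.
brief = (
    "Write a single detailed video-generation prompt for an image-to-video model. "
    "Scene: a girl standing in the rain at night, neon reflections. "
    "Motion: slow push-in, rain falling, she looks toward the camera. "
    "Include audio cues: rain sound, distant traffic."
)

response = model.generate_content(brief)
print(response.text)  # copy this into the workflow's prompt field
```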

8:32

And the second thing

8:35

um for me for me I had some many

8:39

problems with the first key sampler. I

8:42

don't know why. I don't know why. So for

8:46

the gener low resolution generation

8:48

process I swap with the LTX normalizing

8:52

sampler. So it's a same simple um you

8:57

know um

8:59

um node for LTX and remember you have to

9:03

update your confi of course everything

9:06

is in confi but you have to update

9:08

confui. So the the idea of the the
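If you prefer to make the same swap outside the editor, you can patch an exported API-format workflow JSON. This is only a rough sketch: the old and new node class names are assumptions (check the actual class names in your ComfyUI build), and the two samplers may not take identical inputs, so re-open the patched workflow and verify the connections afterwards.

```python
import json
from pathlib import Path

# Assumed node class names; verify them in your own ComfyUI install.
OLD_CLASS = "KSampler"
NEW_CLASS = "LTXVNormalizingSampler"  # placeholder, the real name may differ

def swap_sampler(workflow_path: str, out_path: str) -> None:
    """Replace the sampler class on every matching node of an
    API-format export (a dict mapping node id -> node definition)."""
    workflow = json.loads(Path(workflow_path).read_text())
    for node_id, node in workflow.items():
        if isinstance(node, dict) and node.get("class_type") == OLD_CLASS:
            node["class_type"] = NEW_CLASS
            print(f"Swapped sampler on node {node_id}")
    Path(out_path).write_text(json.dumps(workflow, indent=2))

if __name__ == "__main__":
    swap_sampler("ltx_i2v_api.json", "ltx_i2v_api_patched.json")
```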

9:13

The idea of the process is to reduce the scale of your image. There is a step right here: the less compression you use, the more detail the model will find and the crisper everything looks. You can work with a compression of 18, 20, or 24 to get something well balanced in terms of quality.

9:46

This part reduces and crops the image to the right ratio, then this part generates a low-resolution video, and then there is an upscale step that produces the right resolution. So it reduces the resolution enough and then multiplies it by two; it's a trick to get to a very specific result quickly. And this is the high-resolution generation part, with the decode and everything, right here.
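To make the trick concrete, here is a rough sketch of the resolution math. It assumes the low-resolution pass runs at half the target size and that dimensions get snapped to a multiple of 32; the actual constraint and scale factor are set by the nodes in the workflow, so treat these numbers as illustrative.

```python
def snap(value: int, multiple: int = 32) -> int:
    """Round a dimension down to the nearest accepted multiple."""
    return max(multiple, (value // multiple) * multiple)

def plan_resolutions(target_w: int, target_h: int):
    """First pass at roughly half size, second pass at the snapped target."""
    low = (snap(target_w // 2), snap(target_h // 2))
    high = (snap(target_w), snap(target_h))
    return low, high

if __name__ == "__main__":
    low, high = plan_resolutions(1280, 720)
    print("low-res pass:", low)       # (640, 352)
    print("after 2x upscale:", high)  # (1280, 704)
```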

10:35

In my case, with that node deleted and with this change here, it works. You can leave this one as provided in the simple workflow from ComfyUI, but I had to replace it.

11:03

Sometime confi is magic. I surfing on

11:07

Reddit, surfing on github in order to

11:10

resolve my problem. I fix it with this.

11:14

I don't know if this will fix your LTX,

11:18

your confi.

11:20

Uh guys, I hope so. Um I'm trying to do

11:24

my best in order to um show my process

11:28

to find the resolution. You can also uh

11:31

call Gemini Gemini or code whatsoever in

11:34

order to solve your problems. But it's

11:37

very uh stressful because everything

11:40

with Python could be different in every

11:43

single um computer. So my my problem my

11:48

pro my solution could be not your

11:51

solution. But for me this workflow works

11:54

fine for the generation of one you know

11:58

one

12:00

um

12:02

one video

12:06

The value of 168 is the time value, the length of your generation: it comes out to about seven seconds. And you can adjust it however you like for your model and your specs.
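For reference, the arithmetic is just frames = seconds x fps. A quick sketch, with the caveat that some video models only accept certain frame counts, so if a value is rejected, check the constraint on the length widget and nudge it:

```python
def frame_count(seconds: float, fps: int = 24) -> int:
    """Number of frames for a clip of the given length at the given frame rate."""
    return round(seconds * fps)

# The 168 in the workflow corresponds to 7 seconds at 24 fps.
assert frame_count(7, 24) == 168

# Some video models restrict the frame count (for example to a multiple
# of 8 plus 1); if the node rejects a value, adjust to the nearest
# count it accepts rather than changing the fps.
```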

12:32

uh some tests. As you can see, I use

12:35

this

12:36

uh girl here

12:39

for a simple test. And you can see

12:47

okay, there is a simple camera movement.

12:51

But we have a nice quality and um a nice

12:56

sound with the rain. I create something

12:59

with the rain. Uh beautiful girl rain

13:02

socks or four clothes blah blah blah. Uh

13:05

a pandon. Um I done something like this

13:10

also.

13:12

Here I used another single image. As you can see, we get a nice video with nice quality, with the music, with the sound of the rain, and precise eye movement. I think it's quite impressive, quite good. Let me see... sorry, here's another test, and this is another one.

13:54

Okay I think it's it's impressive. Okay,

13:57

it's a open source video generation

14:01

model

14:02

and in with my spec with my confi with

14:06

my 5060Ti.

14:09

Okay, I generate this video in about

14:14

150 seconds, two minutes

14:19

for me is is impressive. And the

14:21

resolution, sorry, the resolution is

14:25

uh right there

14:30

and HD resolution.

14:33

So you can simply upscale that afterwards with an upscaler; I made a video, which I'll link right here, on choosing a good one if you want. But even at this resolution, the sound is very impressive, very good.

14:54

And there's also a voice, in Italian, that I made. This is Sara, and she says something like, "Hi, I'm Sara. Nice to meet you," in Italian. It's powerful, and there's good lip sync. You can add music, you can add speech, and it's very sensitive to prompting, so be careful.

15:43

The next thing I want to show you is this workflow here. It's a different workflow; I'll leave the link in the description. It's great work: I think this workflow is from the CQ channel, but I'll put the link in the description. Thank you for your work, it's powerful. Just a second, I want to show you the page if I'm able to... well, sorry guys, I just wanted to show you the page, but either way I'm going to link it in the description because it's very useful. There it is. Thank you, What Dreams Cost, thank you for your incredible nodes for first-frame and last-frame generation.

17:18

It's very powerful, and I will create another video for this part. It's very cool, very powerful, and I think it's worth the length of a dedicated video just to build the right process around this node.

17:48

So uh for today it's all I think it's

17:52

impressive quite powerful and very um

17:56

handy in terms of demand of spec. So you

18:00

can try with your with your um specific

18:04

with your GPU and I think uh worth the

18:08

the the

18:11

you know the time um it's I'm I'm using

18:16

also for professional work um just to

18:20

have um a very precise

18:24

um way to look at um um camera movement

18:29

or uh you know a mode. I think it's very

18:32

very impressive, very cool and I think

18:36

uh it's uh funny to use. So I hope this

18:40

video could help you to work inside, you

18:44

know, um inside LTX, inside this um this

18:50

uh workflow. I hope some tricks could

18:53

help you uh to

18:57

to to have something to to work with.

19:02

I'm just trying to do my best to provide something cool for this kind of content. I hope to see you in the next video, where I think we'll talk about this first-and-last-frame workflow, and I'll post it as soon as I can. That's all for today. Thank you for being here with me, and also for giving me feedback and tips; I appreciate that. I'm trying to do my best, and I'll answer you as well as I can. I hope to see you again in the next video.


Interactive Summary

The video introduces the new LTX 2.3 open-source video generation model, an upgrade from the 2.0 version. It highlights key features such as audio prompting, video clips up to 20 seconds, 50 frames per second encoding, and 4K resolution. The presenter, working locally on a Ryzen 9 system with a 16 GB VRAM GPU and 64 GB of RAM, demonstrates the installation process and a simplified workflow focusing on image-to-video rather than text-to-video for better control. The video covers the model checkpoints, upscaler, LoRAs for spatial interpolation, and the Gemma text encoder. It explains a simplified workflow involving downscaling, low-resolution generation, and upscaling for speed. The presenter shares troubleshooting tips, including replacing the first KSampler with the LTX normalizing sampler and removing the problematic prompt-enhancement node. Several video examples are shown, generated with good quality, sound, and even lip-syncing in Italian, all produced locally in a relatively short time on moderate hardware. The presenter also briefly mentions a first- and last-frame generation workflow from another creator, promising a future video on it.
