LTX 2.3 Tutorial Part 1: Master Pro Image-to-Video (I2V) Workflow
Hi guys, hi everyone, welcome back. I'm Edo. New setting: I changed my camera, and I hope this works better. Thank you for being here with me today. Today we're going to see something cool, I think, I hope: the new LTX 2.3 video engine. We already know the last update, LTX 2.0, and it's quite good. But this new open-source video model is amazing, in my opinion, for many reasons. Some of those reasons are right here: you can prompt audio, as you can with the 2.0 version; you can create video clips up to 20 seconds long; you can encode at up to 50 frames per second; and you can create something at 4K resolution.
Today we're going to see everything, I hope: the image-to-video model and the first-frame/last-frame workflow. I want to show you some of my renders, some of my videos, and talk with you about the pros and cons of this model.

I'm working locally. I'm on a Ryzen 9 with an RTX 5060 Ti with 16 GB of VRAM, and 64 GB of system RAM for Windows. Those are my specs, and I'm working locally: I just want to create something that works for me, for my specs, for my configuration, for my computer. I will guide you through the installation process, and now we can go right here.
This is the ComfyUI blog post, you know, the part with everything: the highlights of the model, some example outputs (very cool), and this controversial workflow. First of all, I don't work with text-to-video workflows; I only work with image-to-video. I think it's the only way to work in a professional way: prompting alone is not enough when you want control of your scene. And if you want to see something more specific, this is the GitHub page with the paper and everything.
Okay, you have to install a bunch of things in order to work with this LTX, but we'll go through everything you need. So, these are all the checkpoints for 2.3: the 20-billion-parameter dev model, or the distilled one if you want to save some memory. This is the upscaler; we'll talk about the upscaler in a moment. This is the LoRA that helps it work better with spatial interpolation. This is the text encoder, Gemma: it's very heavy in terms of memory and disk space, but so be it. And these are the models if you want precise control, the union control and the motion-track control. I won't use those two in this workflow, but you can use whatever you want.
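To keep things tidy, here is a minimal sketch of where these files usually land in a standard ComfyUI install. The folder names follow ComfyUI's usual layout, but every filename below is a placeholder of my own, not the actual name of an LTX 2.3 release file; check each loader node in the workflow for the folder it reads from.

```python
from pathlib import Path

# Standard ComfyUI model folders; the filenames are placeholders, not the
# real LTX 2.3 release names. Adjust them to whatever you downloaded.
MODELS = Path("ComfyUI/models")
expected = [
    MODELS / "checkpoints" / "ltx-2.3-dev-or-distilled.safetensors",   # main video model
    MODELS / "upscale_models" / "ltx-2.3-upscaler.safetensors",        # upscaler (folder may differ)
    MODELS / "loras" / "ltx-2.3-spatial-interpolation.safetensors",    # spatial-interpolation LoRA
    MODELS / "text_encoders" / "gemma-text-encoder.safetensors",       # Gemma, heavy on disk and RAM
]

for path in expected:
    print(f"{'ok' if path.exists() else 'MISSING':8s} {path}")
```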
This is the workflow, and you can find it here. I think this is it, this one right there. We can download it, and you know, you can just drag and drop it into your ComfyUI and, voilà, it's the same.
So, this is the simple workflow. Everything you need to know is here: your model checkpoint, your LoRAs, and the upscaler. It's a simplified version of the LTX workflow. This node right here is a scale node, and it's there to make things fast: LTX first downsizes your frames and then upscales them, so it can work very quickly on the downsized version of your images, and then the upscaler brings everything back up. It's all in order to make generation very quick, very fast.
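The trick is easier to see as arithmetic. This is not the nodes' actual code, just a sketch of the two-pass idea, with example target values of my own:

```python
# Two-pass trick: generate at half resolution for speed, then let the
# upscaler multiply the result by two. Example target values, not defaults.
TARGET_W, TARGET_H = 1280, 704
FPS = 24  # the frame rate set in the subgraph

# Pass 1: fast low-resolution generation.
low_w, low_h = TARGET_W // 2, TARGET_H // 2
print(f"pass 1: generate at {low_w}x{low_h}, {FPS} fps")

# Pass 2: the upscaler doubles the resolution back to the target.
up_w, up_h = low_w * 2, low_h * 2
assert (up_w, up_h) == (TARGET_W, TARGET_H)
print(f"pass 2: upscale to {up_w}x{up_h}")
```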
This is the subgraph, and it seems to be very simple: the checkpoint; the LoRA (I'm using another version of the LoRA, but it's the same one, the distilled LoRA, the one we found a moment ago); the Gemma text encoder; and the upscaler. Then the value of your frame rate, 24, and the values for the image: height and width.
I just click this button here to go into the subgraph. This is a cropped version of a specific workflow, which is this one. It seems to be quite demanding in terms of understanding, and it's quite heavy. One very important thing: this version here is a modified version of the ComfyUI one. This is the one you can download from ComfyUI, and in the ComfyUI version you'll find this prompt-enhancement part. For me it's a mess: every time I used this prompt enhancement I faced errors, errors, and more errors, so I deleted it. It's completely useless; use Gemini or whatever else to craft a perfect prompt. This thing is a mess.
And the second thing: I had many problems with the first KSampler, I don't know why, I don't know why. So for the low-resolution generation pass I swapped it for the LTX sampler node; it's a simple node made for LTX. And remember, you have to update your ComfyUI, of course; everything ships with ComfyUI, but you have to update it.

So the idea of the process is to reduce the scale of your image. There's a compression setting right here: the less compression you use, the more detail the model will find, and the crisper everything will be. So you can work with a compression of 18, 20, or 24 to have something well balanced in terms of quality.
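As an aside on that setting: compression knobs like this one typically round-trip the conditioning image through JPEG-style compression, since these video models tend to expect slightly compressed inputs. That interpretation, and the value-to-quality mapping below, are my assumptions, not the node's actual code; it's just a way to feel out what 18 versus 24 does to an input frame:

```python
from io import BytesIO
from PIL import Image

def simulate_compression(img: Image.Image, compression: int) -> Image.Image:
    """Round-trip an image through JPEG to mimic a compression setting.
    Higher `compression` -> lower JPEG quality -> softer, less crisp detail.
    The real node's mapping is unknown; this is purely illustrative."""
    quality = max(1, 100 - compression)  # assumed mapping
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()

# The balanced range suggested above: compression around 18-24.
frame = Image.new("RGB", (1280, 704), "gray")
preview = simulate_compression(frame, compression=20)
```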
Back in the workflow: this part crops the image to the right aspect ratio. Then this part generates a low-resolution video, and then there's an upscale, an upscale that creates something at the right resolution. So it reduces the resolution enough and then multiplies it by two: it's a trick to create something very specific, very cool, very fast. And this is the high-resolution generation part, with the decode and everything; it's all right here.

So, in my case: with that node deleted, removed, and with this change here. You can leave this one as provided in the simple workflow from ComfyUI, but I had to replace this one. Sometimes ComfyUI is magic. I was surfing Reddit, surfing GitHub, in order to resolve my problem, and I fixed it with this. I don't know if this will fix your LTX, your ComfyUI.
Guys, I hope so. I'm trying to do my best to show my process for finding the solution. You can also call Gemini, or a coding assistant, whatever, in order to solve your problems. But it's very stressful, because everything with Python can be different on every single computer, so my problem, my solution, might not be your solution. For me, though, this workflow works fine for generating one video.

The value of 168 is the value of time: the length of your generation in frames, which at 24 fps comes out to seven seconds. And you can work with whatever suits your model and your specs, of course.
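The math is just frames divided by frame rate, which also tells you what to set for a clip of a given length:

```python
# Clip length = frame count / frame rate.
frames, fps = 168, 24
print(frames / fps)   # 7.0 seconds, the length used above

# Going the other way: frames needed for a 10-second clip at 24 fps.
print(10 * fps)       # 240
```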
Now, some tests. As you can see, I used this girl here for a simple test, and you can see there's just a simple camera movement, but we have nice quality and a nice sound with the rain. I created something with the rain; the prompt was something like "beautiful girl, rain, socks" and so on. I did something like this as well, using another single image. As you can see, we have a nice video with nice quality, with the music, the sound of the rain, precise eye movement. I think it's quite impressive, quite good. Let me see... sorry, this is another test, and this is another one. I think it's impressive: this is an open-source video generation model, and with my specs, my ComfyUI, my 5060 Ti, I generated this video in about 150 seconds, two minutes.
For me, that's impressive. And the resolution, sorry, the resolution is right there: HD resolution. So you can simply upscale it afterwards with an upscaler; I made a video, which I'll link right here, about choosing the best one if you want. But even at this resolution, the sound is very, very impressive, very good. And there is also a voice, in Italian, that I made. This is Sara, who says something like, "Hi, I'm Sara. Nice to meet you," in Italian. It's powerful, and the lip sync is good. You can add music, you can add speech, and it's very sensitive in terms of prompting, so be careful.
The next thing I want to show you is this workflow here. It's another workflow; I'll leave the link in the description. It's great work; I think this workflow is from the CQ channel, but I'll leave it in the description. Thank you for your work, it's powerful. I think I'll find it, just a second, I want to show you, if I'm able... well, sorry guys, I just wanted to show you the page, but anyway I'm going to link it in the description, because it's very useful. Right there. Thank you, What Dreams Cost, thank you for your incredible nodes for first-frame and last-frame generation; they're very powerful. I will make another video for this part: it's very cool, very powerful, and I think it's worth the length of a whole video just to build the right process around this node.
So, that's all for today. I think this model is impressive, quite powerful, and very reasonable in terms of hardware demands, so you can try it with your own specific GPU, and I think it's worth the time. I'm also using it for professional work, just to have a very precise way to handle camera movement or a mood. I think it's very impressive, very cool, and fun to use. So I hope this video can help you work inside LTX, inside this workflow, and I hope some of these tricks give you something to work with. I'm just trying to do my best to provide something cool for this kind of content. I hope to see you in the next video, where I think we'll talk about this first-and-last-frame workflow; I'll post it as soon as I can. For today, that's all. Thank you, guys, for being here with me, and for giving me feedback and tips; I appreciate it. I'm trying to do my best, and if I can, I'll answer you to the best of my abilities. I hope to see you again in the next video.
The video introduces the new LTX 2.3 open-source video generation model, an upgrade from the 2.0 version. It highlights key features such as audio prompting, video clips up to 20 seconds, 50-frames-per-second encoding, and 4K resolution. The presenter, working locally on a Ryzen 9 system with an RTX 5060 Ti (16 GB VRAM) and 64 GB of RAM, demonstrates the installation process and a simplified workflow focused on image-to-video rather than text-to-video for better control. The video covers the model checkpoints, upscaler, LoRAs for spatial interpolation, and the Gemma text encoder. It explains a simplified workflow involving downscaling, low-resolution generation, and upscaling for speed. The presenter shares troubleshooting tips, including replacing the first KSampler with an LTX sampler node and removing the error-prone prompt-enhancement stage. Several impressive video examples are shown, with good quality, sound, and even lip-syncing in Italian, all produced locally in a relatively short time on moderate hardware. The presenter also briefly mentions a first-and-last-frame generation workflow from another creator, promising a future video on it.