Thursday 17 October 2024

So hello there, and welcome to another tutorial. My name is Tanmay Bakshi, and today we're going to go over how you can implement your own variant of the PULSE algorithm in TensorFlow 2.0 using Python.

Now, before we get deeper into this, I do want to say that if you enjoy this kind of content and want to see more of it, please make sure to subscribe to the channel and turn on notifications; that way you actually know whenever I release a new video just like this one.
Today we're going to be covering some really fascinating stuff, but first of all we have to take a little bit of a step back. What is PULSE? Simply put, PULSE is an algorithm that enables you to use GANs, or generative adversarial networks, to upscale images of, really, whatever the GAN generates. It's currently been applied to faces, but it could be applied to anything a GAN has been trained to generate. But how exactly does PULSE use GANs to upscale pre-existing images? To understand that, we have to take another step back.
How do GANs work in the first place? Simply put, generative adversarial networks are built from two separate neural networks: a generator and a discriminator. The generator tries to produce the most realistic fakes it possibly can, and the discriminator tries to determine whether a given image was generated by the generator or is a genuine image from the dataset. As the discriminator becomes better and better, and because it is itself a neural network and therefore differentiable, we can actually calculate gradients for the generator through the discriminator, treating it as a loss function. We are essentially evolving the loss function for the generator alongside the generator itself.
It's really fascinating stuff, but here's the thing: how does the generator work? I mean, the generator is just math; it can't generate things out of nowhere. That is where what's called a latent vector comes into play. The latent vector is really interesting because all it does is define a point in space. Not three-dimensional space as we know it, but rather a 100-dimensional or 512-dimensional space, so way more dimensions than our minds can possibly fathom. Basically, you pick a completely random point in that space and you tell the GAN: all right, here's the point, I want you to generate whatever data you have at that point. Kind of like how a regular neural network takes an image and maps it to a feature space that is then classified by a linear classifier, the GAN has a sort of input manifold of all these different varieties of images, and you're just picking a random point in that space and telling it to generate whatever it has been trained to generate, in this case faces.
Now, this GAN was already trained on what's known as the CelebA aligned dataset, and it generates images at approximately 128x128 resolution. This is definitely not a state-of-the-art trained version, which would take weeks to produce on a cluster of GPU machines; however, it does generate realistic faces that we can use for the PULSE example today.
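To make that concrete, here's a minimal sketch of what "pick a random latent point and ask the generator for the face at that point" looks like. It assumes the publicly available progan-128 module on TensorFlow Hub, a progressively grown face generator with a 512-dimensional latent space; it isn't necessarily the exact module or variable naming used in the video's code.

```python
import tensorflow as tf
import tensorflow_hub as hub

LATENT_DIM = 512  # dimensionality of the generator's latent space

# The module's 'default' signature maps a [batch, 512] latent batch to
# [batch, 128, 128, 3] images with pixel values in the range [0, 1].
generator = hub.load("https://tfhub.dev/google/progan-128/1").signatures["default"]

latent = tf.random.normal([1, LATENT_DIM])   # a completely random point in latent space
face = generator(latent)["default"]          # the face the GAN places at that point
print(face.shape)                            # (1, 128, 128, 3)
```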
Now, the way PULSE itself works is really interesting. Traditionally, to train a neural network you calculate the gradient through a loss function with respect to the network's weights. So, for example, if we were training the discriminator, we would put something like the sigmoid cross-entropy, also known as binary cross-entropy, loss function at the end of our neural network, calculate gradients through it with respect to the weights of the network, and then use those gradients to move the network in the right direction.
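For reference, a conventional discriminator update looks roughly like this: gradients are taken with respect to the weights. This is purely illustrative; `discriminator`, `real_batch`, and `fake_batch` are hypothetical placeholders, not code from this project.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(1e-4)

def discriminator_step(discriminator, real_batch, fake_batch):
    with tf.GradientTape() as tape:
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Real images should be classified as 1, generated fakes as 0.
        loss = (bce(tf.ones_like(real_logits), real_logits)
                + bce(tf.zeros_like(fake_logits), fake_logits))
    # Gradients with respect to the *weights*; PULSE turns this around.
    grads = tape.gradient(loss, discriminator.trainable_variables)
    optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))
    return loss
```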
But with PULSE, we reverse that. We already have a pre-trained GAN, and theoretically, if you already have an image and you want to know where that image lies within the GAN's manifold of images, you can find the gradient of the loss function with respect to the input to the network. The input is your latent vector, so you start off at a random point in space; you have the final resulting face that you know you want to find; you calculate the gradient with respect to the input; you iterate, you optimize, and eventually you find out where your face is located in that vector space, the latent space.
Now, downsampling is technically a differentiable operation, because it's really just averaging a bunch of pixels. So what if you had an already downscaled image, let's just say a 32x32 image? It lacks a lot of detail, and because it has fewer pieces of information than a 128x128 image, multiple different 128x128 images can map to the same 32x32 image. That's why it's impossible to go from 32x32 back to the exact original 128x128 image: there are multiple different images it could map to. Regular neural networks would try to average these, which is why you get weird blurring and smoothing, and why upscaling has never really worked well with neural nets in the past. But this technique is different: because it's using a GAN, it can actually precisely find a point within that vector space, that latent space, and tell the neural network to generate exactly that point, so it can hallucinate exactly the details it needs to hallucinate. They may not be the exact same as the original image, but they're close enough.
Now, as I mentioned, downsampling is a differentiable operation. What does that mean? It means you have an input, a downsampled image, say 32x32. Your neural network starts off with a random latent vector that is fed into the network; you get a 128x128 output; you scale that down. That downsampling is differentiable, so it's part of your graph of operations. You calculate the loss based on what you know you are expecting from that downscaled output, then all you do is calculate the gradients with respect to the latent vector and optimize that input over and over again, until you find a latent vector whose output, when downsampled, is nearly equivalent to the input to the model. That is really a pretty genius way of upscaling images, because it forces the network to hallucinate exactly the details it wants to hallucinate, instead of averaging things out.
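The claim that downsampling is differentiable is easy to verify directly: a loss computed on the 32x32 result still yields a well-defined gradient with respect to the 128x128 image it came from. A minimal check (the tensors here just stand in for a generator output and a known low-res target):

```python
import tensorflow as tf

hi_res = tf.random.uniform([1, 128, 128, 3])   # stand-in for a 128x128 generator output
target = tf.random.uniform([1, 32, 32, 3])     # stand-in for the known low-res input

with tf.GradientTape() as tape:
    tape.watch(hi_res)
    down = tf.image.resize(hi_res, [32, 32])   # averaging pixels, still differentiable
    loss = tf.reduce_mean(tf.abs(down - target))

grad = tape.gradient(loss, hi_res)             # gradient flows back through the resize
print(grad.shape)                              # (1, 128, 128, 3)
```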
But here's the thing: PULSE is pretty complex, and if you take a look at the PyTorch codebase, you'll notice there are hundreds of lines of code. It can be difficult to go through, and it's difficult to really understand the way it works from the code. So what I've done is boil that down to approximately 20 lines of TensorFlow 2.0 code, which enables you to load in an entire generative adversarial network, use the generator to generate images, and implement PULSE at the same time. And so now, without any further ado, let's take a look at how you can implement PULSE in TensorFlow 2.0.

All right, let's dive right into the implementation. As you can see, I'm currently on my Mac, but this is pretty compute intensive, so we're going to need a GPU. What I'm going to do is SSH into a server that provides us with GPU power. This specific server has two GPUs, but we're only going to be using one, and we're only going to be working with a batch size of one. Technically you could scale up the batch size and get an essentially free performance increase, but we're not going to, because we're only working with a single image in this case.
So now what we're going to do is exec into a Docker container; I'm going to go ahead and exec into my Docker container. This container is based on a TensorFlow image, and it does of course still support GPU access from within the container.

Within this container, I'm going to open up a little Python file that I've already got set up. This Python file is great because it only has four imports: TensorFlow and TensorFlow Hub, as well as NumPy and Pillow. After those imports, as you can see, I've also got a very simple function called show_image. Basically, it takes output from the model and converts it to a Pillow image, which can then be saved to disk or shown to the user. Because we're SSH'd into a headless server, it's going to need to be saved to disk and then transported back to my Mac.

Let's take a look at what we do in this function. The output of the model is really simple: it's just regular image output, except that instead of being on a scale of 0 to 255, as 8-bit unsigned integers, it's on a scale of 0 to 1, so it has decimal values and is stored as 32-bit floating point. This is because neural network operations wouldn't really work with 8-bit unsigned integers, so we have to use floating point values. However, after you multiply all the values by 255 to rescale them and convert them to 8-bit integers, you can simply call Image.fromarray and it converts the array to an image for you.
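A sketch of what a helper like that could look like (the exact implementation in the video may differ slightly):

```python
import numpy as np
from PIL import Image

def show_image(model_output):
    """Convert a [128, 128, 3] float array with values in [0, 1] to a PIL image."""
    pixels = np.asarray(model_output) * 255.0      # rescale from [0, 1] to [0, 255]
    return Image.fromarray(pixels.astype(np.uint8))

# Usage on a headless server: show_image(face[0].numpy()).save("real.png")
```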
Now, this line of code over here is the one that really makes the program special. It's loading in something called a progressively growing generative adversarial network, by NVIDIA, and this ProGAN, as it's called, basically enables you to use a really high-fidelity image synthesis model. What does "progressively growing" mean? Basically, it means the generator and the discriminator are trained at incrementally increasing resolutions. We start off by generating just 8x8 images that don't really have any recognizable faces in them, but that give you an approximate estimate of what the final manifold is going to look like. Then we keep those layers and add a few extra layers to the generator and the discriminator, which takes us to 16x16, and then to 32x32, and in between we keep training until we're good at each of these levels. We continue to add layers until we get to, say, 1024x1024, and suddenly we've got super high-fidelity images. The networks are trained this way because it's a lot easier to train GANs progressively rather than all at once at 1024x1024, and NVIDIA had some really great innovation with that.
The next thing I'm doing is just a little bit of code structure: I'm defining a small constant called RD, for "resolution, downsampled", which is currently 32. That means we're going to feed into the network an image that is only 32x32 pixels large, and we're going to expect the network to help us upscale it from 32x32 to 128x128. As a matter of fact, on screen right now you're looking at some samples of what this exact code, which you can go ahead and download from GitHub right now, has already generated. It's really fascinating, and I'm going to show you this code actually running, how it works, and how you can generate your own images too.
From there, I've got three lines of code that actually work with TensorFlow. I'm starting off by generating what I mentioned earlier: the random point in the latent space that tells the GAN where to generate. Now, this is not a necessary step, and I really want to highlight that. These three lines of code are simply telling the GAN to generate an image; this is just so that we have an image of a face to work with. If you wanted to, you could take a picture of your own face, scale it down to 32x32, and feed it into the down_image variable. So instead of these three lines of code, you could just load your own image into down_image; you could very easily do that. However, for the sake of this example, and just so you can get an idea of how these GANs work, I want to actually generate an image using the GAN, cut it off from the network so that it's no longer part of the operational graph and is its own standalone image, then load it back in through down_image as a downsampled image. We pretend we don't have the high-resolution copy, and try to hallucinate a high-resolution copy. Let's continue, keeping that assumption in mind.

What I then do, just so that we have some ground truth, is save both the real high-quality 128x128 image and the downsampled 32x32 image to disk, as real.png and down.png respectively.
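Here's a rough sketch of that ground-truth setup under the same assumptions as before (the progan-128 generator and the show_image helper from the earlier sketches); the variable names are illustrative rather than the video's exact ones:

```python
RD = 32  # resolution of the downsampled input

real_latent = tf.random.normal([1, 512])                # random point in latent space
real_image = generator(real_latent)["default"]          # [1, 128, 128, 3], values in [0, 1]
real_image = tf.constant(real_image.numpy())            # detach it from the generator graph
down_image = tf.image.resize(real_image, [RD, RD])      # pretend this is all we have

show_image(real_image[0].numpy()).save("real.png")
show_image(down_image[0].numpy()).save("down.png")
```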
From there, I go ahead and generate another random point in the latent space, in this case 512-dimensional. Why exactly do we do that? Well, remember, we want to optimize, so we're starting again from some completely random point in that space, and now we're going to use the power of gradient descent and TensorFlow to help us optimize this prediction, to get it as close to the original as we possibly can. In an ideal world, after the next few lines of code, the prediction coordinate would be exactly equal to the real coordinate. Then again, we don't live in a perfect world, so it's not going to land exactly there; it's going to be in the vicinity, or, and this is a completely valid outcome, there could be another local minimum that's close enough to the original image but closer to where our new random starting point was in the manifold. We can't really say exactly what's going to happen, because it's stochastic, but all we want is the output, so we don't really care what the latent space vectors themselves are.
So now we have the main loop of the program. In this main loop, we start by defining a TensorFlow gradient tape. A gradient tape is essentially a data structure that watches different variables, and once it has watched them, at the end you're able to tell TensorFlow: all right, you've been noting down the operations I've run on these variables; here's my output value; can you help me determine the gradients with respect to whatever variables you've been watching, according to the function you've written down on your tape?

So in this case, I create a new TensorFlow gradient tape and tell it to watch the predicted coordinate. After telling it to watch, I feed that coordinate into the model, then take the output of the model and downsample it to 32x32, because remember, we want the downsampled images to look the same, and by proxy the high-quality images should then be somewhat similar. After resizing, I calculate a loss value, slightly different from what the PULSE paper originally used: we take the absolute difference between our target and what we're currently predicting, and calculate the mean of that across all pixels. After we have the loss, I print it out; that's just housekeeping. Then I tell the gradient tape: hey, you've been recording all the operations I ran on that tensor, and now I have this final loss value; can you give me the gradients with respect to the predicted coordinate?

With that gradient we now have a vector pointing in the direction of steepest ascent, but we want to do gradient descent, so we multiply that gradient by negative 1.5. Why 1.5, why that magic number? Well, traditionally in machine learning we have a learning rate, and that learning rate stabilizes training by making sure we don't take steps that are too big in a single direction, to avoid overshooting. But in this case the gradients are so small that we actually have to amp them up, so I'm multiplying by 1.5 to train a little bit faster. At the same time, it has to be negative so that we're pointing in the direction of steepest descent, not ascent. I then add that scaled gradient to our predicted coordinate, and finally I check if the loss is less than 0.035, which is our target quality threshold; if so, we break.
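Putting those pieces together, the core loop looks roughly like this. It's a condensed sketch, not the repository's exact code, and it reuses `generator`, `RD`, and `down_image` from the earlier sketches:

```python
pred_coord = tf.random.normal([1, 512])        # fresh random starting point in latent space

while True:
    with tf.GradientTape() as tape:
        tape.watch(pred_coord)
        generated = generator(pred_coord)["default"]        # [1, 128, 128, 3]
        downsampled = tf.image.resize(generated, [RD, RD])  # differentiable downsample
        loss = tf.reduce_mean(tf.abs(downsampled - down_image))
    print(float(loss))                                       # housekeeping
    grad = tape.gradient(loss, pred_coord)                   # gradient w.r.t. the latent input
    pred_coord = pred_coord + grad * -1.5                    # amplified step in the descent direction
    if loss < 0.035:                                         # "good enough" quality threshold
        break
```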
If we wanted the images to be of even higher quality, we could push this loss as low as we wanted it to go, but even this takes quite some time to compute on an NVIDIA Tesla P100 GPU at only 128x128, so I would say this is a very achievable target for something that doesn't need to be pushed to production. Then all I do is take that final computed predicted coordinate, feed it into the neural net, take the output, and save it to disk as pred.png.
Now that we've got our three PNG files, it's time to take a look at some results, but first we've got to actually run the code. I'm going to go ahead and quit Vim; looks like I made a change, so we're going to have to write as well as quit. Now I'm just going to quickly run the script with python, not python3, because this TensorFlow Docker image comes with Python 2. If I run this, it's going to initialize a few things: it takes a look at the GPUs to make sure they exist, it initializes cuDNN, the CUDA Deep Neural Network library, and off it goes. As you can see, it prints out the loss value per iteration. At first you're going to see some larger jumps in the loss going down; over time it's going to slow down. That's natural, in the sense that as you get closer and closer to what you actually want, there's a lot more jittering happening, because remember, multiple different initial inputs can result in the same downsampled output, and it's very difficult to resolve those differences at that small scale. Another thing is that, generally, as you get closer you start to plateau in the gradient space; it's not as steep, so you're not taking as big steps, and you have to take very, very small steps at the end. That's also why we wanted to amp up the learning rate. But I don't want you to have to watch paint dry, so we're going to speed up this clip, and I'll be back in a minute or two, right as we reach our target loss. I'll be right back.
All right, so as you can see, the neural network has just finished its whole optimization procedure, TensorFlow has ended the program, and now let's take a look at what our results actually look like. What I'm going to do is quickly remove the previous prediction and then zip up all of our PNG files into a single zip. Then, of course, because we're doing this on an external machine, I'm going to copy that over to the actual host machine that the Docker container is running on, and then run a secure copy of that zip over to my home directory on my Mac. From there, we go into Finder, if Finder ever loads up the new file. Oh, does that not work? Do we need to do chflags nohidden? There we go. So now I'm going to unzip the file, and let's take a look at what we uncover.
As you can see, even just from the thumbnails, before I open them up, you can tell there's definitely a resemblance between all three. But just how far does that resemblance go? Well, let's first start by taking a look at the real image. Now, this in and of itself is pretty remarkable: the fact that we can generate such a convincing fake image of a person is just really incredible. This human doesn't exist; there's literally a website, thispersondoesnotexist.com, and now we've been able to generate a face like that locally. Now let's see what happens when we take a look at the same face, but very much downsampled. As you can see, we have a very pixelated version of that face.
You wouldn't want to put that in a presentation or anything of that sort. But what if we wanted to upscale that same image? Well, let's take a look at what that result looks like. Oh wow, that is actually really incredible; that's one of the best results I've gotten from this network so far. That is really close, take a look at that. So, what happened? Remember, we have this ground-truth original image; we know what it looks like. But the neural network doesn't have that image; it doesn't have this ground-truth sample. The only thing the neural network sees is this super weird, pixelated face, and what the network is trying to do is generate a latent vector for the GAN such that when you take the output of the generator and downscale it like this super pixelated image, the two downscaled versions look the same. Once again, by proxy, that means the super high-resolution versions should look similar as well.
You can actually take a look at some of the features here; they look super similar. The facial structure, of course, is incredibly similar, the hair is pretty much exactly the same, the lighting matches; it's really incredible the way this works. As a matter of fact, if I go ahead and resize this in Preview to 32x32 and then scale it back up, it should be pretty difficult to tell the difference between the two downsampled images, because remember, we were optimizing the input precisely so that the two downscaled versions of the images look the same. And as you can see, they look essentially exactly the same. There are some differences, you know, around the eyes; you can see these two pixels over here. In fact, I think that's the only noticeable difference, apart from the hair over here.
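If you'd rather not eyeball it, a quick numeric version of that comparison is to downsample the prediction and measure how far it is from the 32x32 input the optimizer was given (reusing `generated` and `down_image` from the sketches above):

```python
pred_down = tf.image.resize(generated, [32, 32])
print(float(tf.reduce_mean(tf.abs(pred_down - down_image))))  # at or below the ~0.035 threshold
```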
But by using this technique, we've been able to upscale a face without using traditional methods like variational autoencoders and U-Nets, which introduce their own sort of smoothing and noise into the solutions. GANs are really incredible, really flexible architectures. I will say they're very, very picky about the way they like to be trained, because you've got two competing neural networks, and that is inherently a very difficult thing to stabilize. However, there are going to be more tutorials on this channel in the future about how exactly you can do that.
And so, thank you very much everyone for joining in today. I do hope you enjoyed today's tutorial on the PULSE system and how it works. If you have any questions or suggestions, feel free to leave them down in the comments section below, and I will definitely get back to you. This code is already on GitHub, and the link is in the description below. Apart from that, if you do like this sort of content and want to see more of it, please make sure to subscribe to the channel and turn on notifications, so you're actually notified whenever I release new content. And without any further ado, thank you very much everyone for joining in today, hope you enjoyed, goodbye!
