
Using autograd in TensorFlow to Solve a Regression Problem

We usually use TensorFlow to build a neural network. However, TensorFlow is not limited to this. Behind the scenes, TensorFlow is a tensor library with automatic differentiation capability. Hence we can easily use it to solve a numerical optimization problem with gradient descent. In this post, we are going to show how TensorFlow's automatic differentiation engine, autograd, works.

After finishing this tutorial, you will learn:

  • What autograd is in TensorFlow
  • How to make use of autograd and an optimizer to solve an optimization problem

Let’s get started.

Using autograd in TensorFlow to Solve a Regression Problem
Photo by Lukas Tennie. Some rights reserved.


This tutorial is in three parts; they are:

  • Autograd in TensorFlow
  • Using autograd for Polynomial Regression
  • Using autograd to Solve a Math Puzzle

Autograd in TensorFlow

In TensorFlow 2.x, we can define variables and constants as TensorFlow objects and build an expression with them. The expression is essentially a function of the variables. Hence we may derive its derivative function, i.e., the differentiation or the gradient. This feature is one of the many fundamental features of TensorFlow. A deep learning model makes use of it in the training loop.

It is easier to explain autograd with an example. In TensorFlow 2.x, we can create a constant matrix as follows:
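A minimal sketch of creating such a constant, with its printed output shown as a comment:

```python
import tensorflow as tf

# A constant integer vector as a Tensor object
x = tf.constant([1, 2, 3])
print(x)
# tf.Tensor([1 2 3], shape=(3,), dtype=int32)
```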

The printed output shows that we created an integer vector (in the form of a Tensor object). This vector can work like a NumPy vector in most cases. For example, we can do x+x or 2*x, and the result is just what we would expect. TensorFlow comes with many functions for array manipulation that match NumPy, such as tf.transpose or tf.concat.
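The NumPy-like behavior can be sketched as follows (outputs shown as comments):

```python
import tensorflow as tf

x = tf.constant([1, 2, 3])

# Elementwise arithmetic works just like NumPy
print(x + x)  # tf.Tensor([2 4 6], shape=(3,), dtype=int32)
print(2 * x)  # tf.Tensor([2 4 6], shape=(3,), dtype=int32)

# Array manipulation functions that match NumPy
print(tf.concat([x, x], axis=0))  # tf.Tensor([1 2 3 1 2 3], shape=(6,), dtype=int32)
```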

Creating variables in TensorFlow is just the same, for example:
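A minimal sketch of creating a variable, with its printed representation as a comment:

```python
import tensorflow as tf

# A variable holds a value that can be changed later
x = tf.Variable([1, 2, 3])
print(x)
# <tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([1, 2, 3], dtype=int32)>

x.assign([4, 5, 6])  # variables are mutable; constants are not
```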

This prints the variable's representation, showing its shape, dtype, and value. The operations (such as x+x and 2*x) that we can apply to Tensor objects can also be applied to variables. The only difference between variables and constants is that the former allows the value to change while the latter is immutable. This distinction is important when we run a gradient tape, as follows:
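A minimal sketch of running a gradient tape, with the printed output as a comment:

```python
import tensorflow as tf

x = tf.Variable(3.6)           # a variable with initial value 3.6

with tf.GradientTape() as tape:
    y = x * x                  # y = x^2, recorded by the tape

dy = tape.gradient(y, x)       # dy/dx = 2x, evaluated at x = 3.6
print(dy)
# tf.Tensor(7.2, shape=(), dtype=float32)
```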

This prints a tensor with value 7.2. What it does is the following: we defined a variable x (with value 3.6) and then created a gradient tape. While the gradient tape is active, we compute y=x*x, or $$y=x^2$$. The gradient tape monitors how the variables are manipulated. Afterward, we ask the gradient tape for the derivative $$\dfrac{dy}{dx}$$. We know $$y=x^2$$ means $$y'=2x$$, hence the output gives us the value $$3.6\times 2=7.2$$.

Using autograd for Polynomial Regression

How is this feature of TensorFlow helpful? Let's consider the case where we have a polynomial in the form of $$y=f(x)$$ and we are given a number of $$(x,y)$$ samples. How can we recover the polynomial $$f(x)$$? One way to do it is to assume random coefficients for the polynomial and feed in the samples $$(x,y)$$. If the polynomial is found, we should see the value of $$y$$ match $$f(x)$$. The closer they are, the closer our estimate is to the correct polynomial.

This is indeed a numerical optimization problem, in which we want to minimize the difference between $$y$$ and $$f(x)$$. We can use gradient descent to solve it.

Let's consider an example. We can build a polynomial $$f(x)=x^2 + 2x + 3$$ in NumPy as follows:
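A sketch using NumPy's polynomial package (coefficients are listed in increasing order of power):

```python
import numpy as np

# f(x) = 3 + 2x + x^2; coefficients in increasing order of power
polynomial = np.polynomial.polynomial.Polynomial([3, 2, 1])
print(polynomial)
```

The printed form varies slightly between NumPy versions, but it represents the polynomial 3 + 2·x + 1·x².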

This prints a representation of the polynomial.

We can use the polynomial as a function, such as:
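For instance, evaluating it at x = 1.5:

```python
import numpy as np

polynomial = np.polynomial.polynomial.Polynomial([3, 2, 1])
print(polynomial(1.5))
# 8.25
```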

And this prints 8.25, for $$(1.5)^2+2\times(1.5)+3 = 8.25$$.

Now we can generate a number of samples from this function, using NumPy:
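One plausible way to draw the samples (the sampling range [-10, 10] is an assumption):

```python
import numpy as np

polynomial = np.polynomial.polynomial.Polynomial([3, 2, 1])

# Draw 20 random x values and evaluate f(x) at each of them
N = 20
X = np.random.uniform(-10, 10, size=(N, 1))
Y = polynomial(X)
print(X.shape, Y.shape)
# (20, 1) (20, 1)
```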

In the above, both X and Y are NumPy arrays of shape (20,1), and they are related as $$y=f(x)$$ for the polynomial $$f(x)$$.

Now assume we do not know anything about our polynomial except that it is quadratic, and we would like to recover the coefficients. Since a quadratic polynomial is of the form $$Ax^2+Bx+C$$, we have three unknowns to find. We can find them using a gradient descent algorithm that we implement ourselves, or using an existing gradient descent optimizer. The following demonstrates how it works:
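A sketch of the training loop; the optimizer choice (Nadam with learning rate 0.01), the sampling range, and the iteration count are assumptions:

```python
import numpy as np
import tensorflow as tf

# Generate the (x, y) samples as before
polynomial = np.polynomial.polynomial.Polynomial([3, 2, 1])
N = 20
X = np.random.uniform(-10, 10, size=(N, 1))
Y = polynomial(X)

# Columns of x^2, x, and 1, so that XX @ w computes A*x^2 + B*x + C
XX = np.hstack([X * X, X, np.ones_like(X)])

w = tf.Variable(tf.random.normal((3, 1)))   # coefficients A, B, C to be learned
x = tf.constant(XX, dtype=tf.float32)
y = tf.constant(Y, dtype=tf.float32)
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)
print(w)

for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w                       # matrix product gives Ax^2 + Bx + C
        mse = tf.reduce_mean(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)             # d(mse)/dw
    optimizer.apply_gradients([(grad, w)])

print(w)
```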

The print statement before the for loop gives three random numbers, while the one after the for loop gives us coefficients very close to those of our polynomial (approximately 1, 2, and 3).

What the above code does is the following: First, we create a variable vector w of three values, namely the coefficients $$A,B,C$$. Then we create an array of shape $$(N,3)$$, where $$N$$ is the number of samples in our array X. This array has three columns, which are respectively the values of $$x^2$$, $$x$$, and 1. We build such an array from the vector X using the np.hstack() function. Similarly, we build the TensorFlow constant y from the NumPy array Y.

Afterward, we use a for loop to run gradient descent for 1,000 iterations. In each iteration, we compute the matrix product of x and w to find $$Ax^2+Bx+C$$ and assign it to the variable y_pred. Then we compare y and y_pred to find the mean squared error. Next, we derive the gradient, i.e., the rate of change of the mean squared error with respect to the coefficients w. Based on this gradient, we use gradient descent to update w.

In essence, the above code finds the coefficients w that minimize the mean squared error.

Putting everything together, the following is the complete code:
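A consolidated sketch of the full workflow; the optimizer, learning rate, sampling range, and iteration count are assumptions:

```python
import numpy as np
import tensorflow as tf

# The target polynomial f(x) = 3 + 2x + x^2 and samples drawn from it
polynomial = np.polynomial.polynomial.Polynomial([3, 2, 1])
N = 20
X = np.random.uniform(-10, 10, size=(N, 1))
Y = polynomial(X)

# Feature matrix with columns x^2, x, and 1
XX = np.hstack([X * X, X, np.ones_like(X)])

# Unknown coefficients A, B, C, initialized randomly
w = tf.Variable(tf.random.normal((3, 1)))
x = tf.constant(XX, dtype=tf.float32)
y = tf.constant(Y, dtype=tf.float32)
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)
print(w)

# Gradient descent on the mean squared error
for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w
        mse = tf.reduce_mean(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)
    optimizer.apply_gradients([(grad, w)])

print(w)   # should be close to [1, 2, 3]
```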

Using autograd to Solve a Math Puzzle

In the above, we used 20 samples, which is more than enough to fit a quadratic equation. We can use gradient descent to solve some math puzzles as well. For example, consider the following problem:

In other words, we would like to find the values of $$A,B,C,D$$ such that:

$$\begin{aligned} A + B &= 9 \\ C - D &= 1 \\ A + C &= 8 \\ B - D &= 2 \end{aligned}$$

This can also be solved using autograd, as follows:
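A sketch of solving the puzzle with autograd; the optimizer (Nadam, learning rate 0.1) and the iteration count are assumptions:

```python
import tensorflow as tf

# Four unknowns, each with a random initial value
A = tf.Variable(tf.random.normal(()))
B = tf.Variable(tf.random.normal(()))
C = tf.Variable(tf.random.normal(()))
D = tf.Variable(tf.random.normal(()))
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.1)

for _ in range(1000):
    with tf.GradientTape() as tape:
        # Residual of each equation; all four are zero at an exact solution
        y1 = A + B - 9
        y2 = C - D - 1
        y3 = A + C - 8
        y4 = B - D - 2
        sqerr = y1*y1 + y2*y2 + y3*y3 + y4*y4
    # One call to tape.gradient() returns all four gradients
    gradA, gradB, gradC, gradD = tape.gradient(sqerr, [A, B, C, D])
    optimizer.apply_gradients([(gradA, A), (gradB, B), (gradC, C), (gradD, D)])

print(A.numpy(), B.numpy(), C.numpy(), D.numpy())
```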

There can be multiple solutions to this problem. One solution is $$A=4.68$$, $$B=4.32$$, $$C=3.32$$, and $$D=2.32$$. We can verify that this solution satisfies all four equations.

What the above code does is define the four unknowns as variables with random initial values. Then we compute the results of the four equations and compare them to the expected answers. We then sum up the squared errors and ask TensorFlow to minimize the sum. The minimum possible squared error is zero, attained when our solution matches the problem exactly.

Note the way we ask the gradient tape to produce the gradient: we ask for the gradient of sqerr with respect to A, B, C, and D in a single call, so four gradients are found at once. We then apply each gradient to the respective variable in each iteration. Rather than seeking the gradients in four separate calls to tape.gradient(), the single call is required because, by default, the gradient can be retrieved from a tape only once.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.



In this post, we demonstrated how TensorFlow's automatic differentiation works. This is the building block for carrying out deep learning training. Specifically, you learned:

  • What automatic differentiation is in TensorFlow
  • How a gradient tape can carry out automatic differentiation
  • How to use automatic differentiation to solve an optimization problem
