We usually use TensorFlow to build a neural network. However, TensorFlow is not limited to this. Behind the scenes, TensorFlow is a tensor library with automatic differentiation capability. Hence we can easily use it to solve a numerical optimization problem with gradient descent. In this post, we are going to show how TensorFlow's automatic differentiation engine, autograd, works.

After finishing this tutorial, you will learn:

- What autograd is in TensorFlow
- How to make use of autograd and an optimizer to solve an optimization problem

Let's get started.

## Overview

This tutorial is in three parts; they are:

- Autograd in TensorFlow
- Using autograd for polynomial regression
- Using autograd to solve a math puzzle

## Autograd in TensorFlow

In TensorFlow 2.x, we can define variables and constants as TensorFlow objects and build an expression with them. The expression is essentially a function of the variables. Hence we may derive its derivative function, i.e., the differentiation or the gradient. This feature is one of the many fundamental features of TensorFlow. A deep learning model makes use of it in the training loop.

It is easier to explain autograd with an example. In TensorFlow 2.x, we can create a constant matrix as follows:

```python
import tensorflow as tf

x = tf.constant([1, 2, 3])
print(x)
print(x.shape)
print(x.dtype)
```

The above prints:

```
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
(3,)
<dtype: 'int32'>
```

This means we created an integer vector (in the form of a Tensor object). This vector can work like a NumPy vector in most cases. For example, we can do `x+x` or `2*x`, and the result is just what we would expect. TensorFlow also comes with many functions for array manipulation that match NumPy, such as `tf.transpose` or `tf.concat`.
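As a quick illustration (a minimal snippet, not part of the tutorial's own listings), these operations behave as expected on small tensors:

```python
import tensorflow as tf

x = tf.constant([1, 2, 3])

# Element-wise arithmetic works like NumPy
print(x + x)    # values [2 4 6]
print(2 * x)    # values [2 4 6]

# NumPy-like array manipulation functions
m = tf.constant([[1, 2], [3, 4]])
print(tf.transpose(m))            # rows and columns swapped
print(tf.concat([x, x], axis=0))  # values [1 2 3 1 2 3]
```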

Creating variables in TensorFlow is just the same, for example:

```python
import tensorflow as tf

x = tf.Variable([1, 2, 3])
print(x)
print(x.shape)
print(x.dtype)
```

This should print:

```
<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([1, 2, 3], dtype=int32)>
(3,)
<dtype: 'int32'>
```

and the operations (such as `x+x` and `2*x`) that we can apply to Tensor objects can also be applied to variables. The only difference between variables and constants is that the former allows the value to change while the latter is immutable. This distinction is important when we run a **gradient tape**, as follows:

```python
import tensorflow as tf

x = tf.Variable(3.6)

with tf.GradientTape() as tape:
    y = x*x

dy = tape.gradient(y, x)
print(dy)
```

This prints:

```
tf.Tensor(7.2, shape=(), dtype=float32)
```

What it does is the following: We defined a variable `x` (with value 3.6) and then created a gradient tape. While the gradient tape is working, it computes `y=x*x` or $$y=x^2$$. The gradient tape monitored how the variables were manipulated. Afterward, we ask the gradient tape to find the derivative $$\dfrac{dy}{dx}$$. We know $$y=x^2$$ means $$y'=2x$$. Hence the output would give us a value of $$3.6\times 2=7.2$$.
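One related detail, beyond what the example above shows: by default, the tape only tracks variables, so differentiating with respect to a constant requires opting it in with `tape.watch()`. A minimal sketch of the same derivative computed on a constant:

```python
import tensorflow as tf

# Constants are not tracked automatically; tape.watch() opts them in
x = tf.constant(3.6)
with tf.GradientTape() as tape:
    tape.watch(x)   # without this line, the gradient below would be None
    y = x * x
dy = tape.gradient(y, x)
print(dy)           # a scalar tensor with value 7.2
```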

## Using autograd for Polynomial Regression

How is this feature in TensorFlow helpful? Let's consider a case where we have a polynomial in the form of $$y=f(x)$$, and we are given a few $$(x,y)$$ samples. How can we recover the polynomial $$f(x)$$? One way to do it is to assume random coefficients for the polynomial and feed in the samples $$(x,y)$$. If the polynomial is found, we should see the value of $$y$$ matching $$f(x)$$. The closer they are, the closer our estimate is to the correct polynomial.

This is indeed a numerical optimization problem, in which we want to minimize the difference between $$y$$ and $$f(x)$$. We can use gradient descent to solve it.

Let's consider an example. We can build a polynomial $$f(x)=x^2 + 2x + 3$$ in NumPy as follows:

```python
import numpy as np

polynomial = np.poly1d([1, 2, 3])
print(polynomial)
```

This prints the polynomial in NumPy's two-line text layout:

```
   2
1 x + 2 x + 3
```

We may use the polynomial as a function, such as `polynomial(1.5)`. And this prints `8.25`, for $$(1.5)^2+2\times(1.5)+3 = 8.25$$.
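The evaluation referenced above can be sketched as follows (a minimal example, reusing the same polynomial object):

```python
import numpy as np

polynomial = np.poly1d([1, 2, 3])  # represents x^2 + 2x + 3

print(polynomial(1.5))   # 8.25
print(polynomial(0))     # 3, the constant term
```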

Now we may generate a number of samples from this function using NumPy:

```python
N = 20   # number of samples

# Generate random samples roughly between -10 to +10
X = np.random.randn(N,1) * 5
Y = polynomial(X)
```

In the above, both `X` and `Y` are NumPy arrays of shape `(20,1)`, and they are related as $$y=f(x)$$ for the polynomial $$f(x)$$.

Now, assume we do not know what our polynomial is, except that it is quadratic, and we would like to recover the coefficients. Since a quadratic polynomial is in the form of $$Ax^2+Bx+C$$, we have three unknowns to find. We can find them using the gradient descent algorithm we implement or an existing gradient descent optimizer. The following demonstrates how it works:

```python
import numpy as np
import tensorflow as tf

# Assume samples X and Y are prepared elsewhere

XX = np.hstack([X*X, X, np.ones_like(X)])

w = tf.Variable(tf.random.normal((3,1)))   # the 3 coefficients
x = tf.constant(XX, dtype=tf.float32)      # input sample
y = tf.constant(Y, dtype=tf.float32)       # output sample
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)
print(w)

for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w
        mse = tf.reduce_sum(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)
    optimizer.apply_gradients([(grad, w)])

print(w)
```

The `print` statement before the for loop gives three random numbers, such as:

```
<tf.Variable 'Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[-2.1450958 ],
       [-1.1278448 ],
       [ 0.31241694]], dtype=float32)>
```

but the one after the for loop gives us coefficients very close to those of our polynomial:

```
<tf.Variable 'Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[1.0000628],
       [2.0002015],
       [2.996219 ]], dtype=float32)>
```

What the above code does is the following: First, we create a variable vector `w` of 3 values, namely the coefficients $$A,B,C$$. Then we create an array of shape $$(N,3)$$, in which $$N$$ is the number of samples in our array `X`. This array has 3 columns, which are respectively the values of $$x^2$$, $$x$$, and 1. We build such an array from the vector `X` using the `np.hstack()` function. Similarly, we build the TensorFlow constant `y` from the NumPy array `Y`.

Afterward, we use a for loop to run gradient descent in 1,000 iterations. In each iteration, we compute $$x \times w$$ in matrix form to find $$Ax^2+Bx+C$$ and assign it to the variable `y_pred`. Then, we compare `y` and `y_pred` and find the mean square error. Next, we derive the gradient, i.e., the rate of change of the mean square error with respect to the coefficients `w`. And based on this gradient, we use gradient descent to update `w`.

In essence, the above code finds the coefficients `w` that minimize the mean square error.

Putting everything together, the following is the complete code:

```python
import numpy as np
import tensorflow as tf

N = 20   # number of samples

# Generate random samples roughly between -10 to +10
polynomial = np.poly1d([1, 2, 3])
X = np.random.randn(N,1) * 5
Y = polynomial(X)

# Prepare input as an array of shape (N,3)
XX = np.hstack([X*X, X, np.ones_like(X)])

# Prepare TensorFlow objects
w = tf.Variable(tf.random.normal((3,1)))   # the 3 coefficients
x = tf.constant(XX, dtype=tf.float32)      # input sample
y = tf.constant(Y, dtype=tf.float32)       # output sample
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)
print(w)

# Run optimizer
for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w
        mse = tf.reduce_sum(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)
    optimizer.apply_gradients([(grad, w)])

print(w)
```
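As a sanity check beyond the tutorial itself: because the model is linear in the coefficients $$A,B,C$$, the same values can also be recovered in closed form with ordinary least squares. A minimal sketch:

```python
import numpy as np

# Rebuild the same design matrix and solve XX w = Y by least squares
polynomial = np.poly1d([1, 2, 3])
X = np.random.randn(20, 1) * 5
Y = polynomial(X)
XX = np.hstack([X*X, X, np.ones_like(X)])
w, *_ = np.linalg.lstsq(XX, Y, rcond=None)
print(w.flatten())   # approximately [1. 2. 3.]
```

This does not replace the gradient descent demonstration; it only confirms the target the optimizer is converging toward.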

## Using autograd to Solve a Math Puzzle

In the above, we used 20 samples, which is more than enough to fit a quadratic equation. We may use gradient descent to solve some math puzzles as well. For example, the following problem:

```
[ A ] + [ B ] = 9
  +       -
[ C ] - [ D ] = 1
  =       =
  8       2
```

In other words, we would like to find the values of $$A,B,C,D$$ such that:

$$\begin{aligned}
A + B &= 9 \\
C - D &= 1 \\
A + C &= 8 \\
B - D &= 2
\end{aligned}$$

This can also be solved using autograd, as follows:

```python
import random

import tensorflow as tf

A = tf.Variable(random.random())
B = tf.Variable(random.random())
C = tf.Variable(random.random())
D = tf.Variable(random.random())

# Gradient descent loop
EPOCHS = 1000
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.1)
for _ in range(EPOCHS):
    with tf.GradientTape() as tape:
        y1 = A + B - 9
        y2 = C - D - 1
        y3 = A + C - 8
        y4 = B - D - 2
        sqerr = y1*y1 + y2*y2 + y3*y3 + y4*y4
    gradA, gradB, gradC, gradD = tape.gradient(sqerr, [A, B, C, D])
    optimizer.apply_gradients([(gradA, A), (gradB, B), (gradC, C), (gradD, D)])

print(A)
print(B)
print(C)
print(D)
```

There can be multiple solutions to this problem. One solution is the following:

```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=4.6777573>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=4.3222437>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=3.3222427>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.3222432>
```

This means $$A=4.68$$, $$B=4.32$$, $$C=3.32$$, and $$D=2.32$$. We can verify that this solution fits the problem.

What the above code does is define the four unknowns as variables with random initial values. Then we compute the result of the four equations and compare it to the expected answer. We then sum up the squared error and ask TensorFlow to minimize it. The minimum possible squared error is zero, attained when our solution fits the problem exactly.
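Plugging the printed values back into the four equations confirms the fit (a quick check, using the solution printed above):

```python
# Values taken from the printed solution above
A, B, C, D = 4.6777573, 4.3222437, 3.3222427, 2.3222432

print(round(A + B, 4))  # 9.0
print(round(C - D, 4))  # 1.0
print(round(A + C, 4))  # 8.0
print(round(B - D, 4))  # 2.0
```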

Note the way we ask the gradient tape to produce the gradient: we ask for the gradient of `sqerr` with respect to `A`, `B`, `C`, and `D` in a single call. Hence four gradients are found, and we apply each gradient to the respective variable in each iteration. We do this rather than looking for the gradients in four different calls to `tape.gradient()` because, by default, the gradient of `sqerr` can only be recalled once per tape.
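If separate calls to `tape.gradient()` are genuinely needed, the tape can be created with `persistent=True` so it can be queried more than once. A minimal sketch, with made-up starting values:

```python
import tensorflow as tf

A = tf.Variable(1.0)
B = tf.Variable(2.0)

# persistent=True keeps the tape's resources alive across gradient calls
with tf.GradientTape(persistent=True) as tape:
    sqerr = (A + B - 9)**2

gradA = tape.gradient(sqerr, A)  # first call
gradB = tape.gradient(sqerr, B)  # second call: valid only because persistent=True
del tape                         # release the tape explicitly when done
print(gradA, gradB)              # both equal 2*(A+B-9) = -12.0
```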

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

**Articles:**

## Summary

In this post, we demonstrated how TensorFlow's automatic differentiation works. This is the building block for carrying out deep learning training. Specifically, you learned:

- What automatic differentiation is in TensorFlow
- How we can use gradient tape to carry out automatic differentiation
- How we can use automatic differentiation to solve an optimization problem