<code_&_cosine>
Root Finding #2: Newton-Raphson Method

Photo by Dan Cristian Pădureț on Unsplash

Root Finding #2: Newton-Raphson Method

24 January 2024

A Quick Recap

In the previous post, we delved into the Bisection Method and implemented our own version of the algorithm. In this post, we will build on top of that knowledge and move a step ahead towards a little bit more advanced method: Newton-Raphson Method.


If you have any questions regarding the intention of finding roots, what are roots, why do we need to understand numericals methods for root finding or why its important to even implement these algorithms, please go back to the previous post and read the introductory section of that post. That should answer most of your queries.

Before We Proceed

Before we begin our dive into Newton-Raphson Method, we need to clear our understanding of a few concepts. These are essential to grasp the method completely. If you already know about these concepts, feel free to move ahead. However, I'am assuming most of us are starting from scratch and don't have clear understanding of these concepts.


In this section, we will talk about slopes, what they mean, line equations, and Slope-point form of line equation.


What is slope?

Slope is how steep a line is. By comparing some lines on the graph, we can guess which line is comparatively steeper.

Remember, y=f(x)y = f(x).


In the above graph, we can see visually that y=3xy = 3x is steeper than y=2xy = 2x and y=xy = x. But how do we define the slope mathematically? Take a look at this graph:

Here, we can see that the slope can be defined using change in xx and change in yy. A more mathematical term for slope is rate of change. Slope gives us the rate of change of a function, how much the value of yy increases as xx changes. For example, in y=2xy = 2x, if xx increases 1 unit, the value of yy increases 2 unit. Even if the change in xx is smaller than 1, this relationship will hold. So, a mathematical identity of slope can be defined as, "For every change in xx, how much yy changes." This relationship can be written like this:Slope, m=change in ychange in x=y2y1x2x1=ΔyΔx\text{Slope, m} = \frac{\text{change in y}}{\text{change in x}} = \frac{y_2 - y_1}{x_2 - x_1}= \frac{\Delta y}{\Delta x}For a straight line, the slope mm is constant. This measures how steep the line is and how much yy changes with respect to xx.


However, for a curve, the slope is not constant, it changes at different points. To measure the slope of a curve at a specific point (since curve's don't have constant slope), we use the concept of a tangent line. The tangent line is a straight line that touches the curve at a single point.

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

In the graph below, we can see that the slope is different at every point.

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

When we want to find the slope of a curve at a specific point, we begin by taking two points on the curve:

  • P(x0,f(x0))P(x_0, f(x_0)): The point of interest.
  • Q(x0+h,f(x0+h))Q(x_0 + h, f(x_0 + h)): Another point close to PP.

The line that passes through PP and QQ is called a secant line. Its slope is: msecant=f(x0+h)f(x0)x0+hx0=f(x0+h)f(x0)hm_{secant} = \frac{f(x_0 + h) - f(x_0)}{x_0 + h - x_0} = \frac{f(x_0 + h) - f(x_0)}{h}Here, x0=x1x0+h=x2f(x0)=y1f(x0+h)=y2h=x2x1\\ x_0 = x_1 \\ x_0 + h = x_2 \\ f(x_0) = y_1 \\ f(x_0 + h) = y_2 \\ h = x_2 - x_1, distance betwen the two points on x-axis


So, we can see that the formula for msecantm_{secant} is actually the general formula for slope that we derived from a straight line. Now, to make this slope represent the slope of the curve (instead of the secant line), we bring QQ closer and closer to PP. Mathematically, this means: h0, h tends to 0 or h is infinitly close to 0.h \to 0, \text{ h tends to 0 or h is infinitly close to 0.}As hh gets smaller, the secant line becomes closer to the tangent line, which just "touches" the curve at PP.

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

Take a look at the consecutive graphs as QQ gets closer to PP

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

f(x)=12x2+2xf(x) = -\frac{1}{2}x^{2}+2x

When h=0h=0, PP and QQ merge into one point, and the secant becomes the tangent line. The slope of the tangent line is the limit of the secant slope as h0h \to 0: mtangent=limh0f(x0+h)f(x0)hm_{tangent} = \lim_{h \to 0} \frac{f(x_0+h)−f(x_0)}{h}In the above expression, the value of hh never reaches zero as it will make the slope undefined. Instead, it is a value that is infinitly close to zero. This is the definition of the derivative, f(x0)f^{\prime}(x_0) which is basically mtangentm_{tangent}.


Now, let's define line equations.


Line Equations

Suppose, we have two points A(x1,y1)A(x_1, y_1) and B(x2,y2)B(x_2, y_2) on a straight line. Now, we want to find the general equation of the line so that any arbitrary value of xx gives us a point P(x,y)P(x, y) that falls on that line.

To form the equation, we need minimum two points. A single point gives us a location but doesn't provide the direction (slope) of the line. To form the equation, we need to know the direction/slope of the line, mm.


Now, if AA, BB and PP are on the same line, then their slope will be constant and equal. Thus, we can say,


yy1xx1=y2y1x2x1yy1=y2y1x2x1(xx1)(1)yy1=m(xx1)(2) \begin{aligned} \frac{y - y_1}{x - x_1} &= \frac{y_2 - y_1}{x_2 - x_1} \\ \Rightarrow y - y_1 &= \frac{y_2 - y_1}{x_2 - x_1} \cdot (x - x_1) \dots (1) \\ \Rightarrow y - y_1 &= m \cdot (x - x_1) \dots (2) \end{aligned}

Here, (1)(1) is the general formula for line equation. We can see that the term y2y1x2x1\frac{y_2 - y_1}{x_2 - x_1} is actually the slope of the line. This leads us to (2)(2), the slope-point form of line equation.


However, if the function y=f(x)y = f(x) is a curve or any other function including lines, then the slope mm will be defined as f(x)f^{\prime} (x). The derivative of the original function f(x)f(x) is used as the general formula for any slope whether its a straight line or a curve. So, we can rewrite the last expression (2)(2) as,


yy1=f(x1)(xx1)f(x)f(x1)=f(x1)(xx1)f(x)=f(x1)+f(x1)(xx1)(3) \begin{aligned} y - y_1 &= f^{\prime}(x_1) \cdot (x - x_1) \\ \Rightarrow f(x) - f(x_1) &= f^{\prime}(x_1) \cdot (x - x_1) \\ \Rightarrow f(x) &= f(x_1) + f^{\prime}(x_1) \cdot (x - x_1) \dots (3) \end{aligned}

Here, (3)(3) is the Tangent Line Equation (slope-point form of line equation).


Now, equiped with this knowledge, we will begin our dive into Newton-Raphson Method.

Introduction to Newton-Raphson Method

Newton-Raphson Method finds a root by repeatedly drawing the tangent line to a curve, using its x-intercept as the next better guess where the root might be.

Core Idea

Imagine we're on a roller coaster. At the peak of a hill, the track is curving sharply. The tangent line at that peak would be horizontal, reflecting the fact that we're momentarily at a standstill before plunging down. Even though the roller coaster track is curvy, the tangent line gives us a good sense of the immediate direction.

y=sin(x)+cos(2x)+110x+120sin(5x)y=\sin\left(x\right)+\cos\left(2x\right)+\frac{1}{10}x+\frac{1}{20}\sin\left(5x\right)

Prediction: At a standstill on the peak

If we want to predict where the roller coaster might go if it went a very short distance forward or backward, following the tangent line can give us a pretty good idea.

Prediction: Going down after slightly moving forward

Prediction: Going up after slightly moving backward

Observing the above graphs, its understandable that if we pick an arbitrary point on our roller coaster journey, by moving very slightly forward or backward, we can "guess" where our roller coaster is headed. However, if our roller coaster is at a standstill, we cannot determine where it will go but if our roller coaster is already going towards a specific direction, we make rough guess about where its headed. This "guess" is basically an approximation.


When you look very closely at the roller coaster track, that tiny piece of the track (curve) looks almost like a straight line because the track doesn't curve much in that very small stretch. This is the idea of "local linear approximation", even if the overall function is curvy, very near a particular point, the tangent line gives a good approximation of the function.


Suppose we want to know where the track (the function) hits the ground (the x-axis). We might not be able to find this exactly because the road is curvy. But if we pick a point xnx_n on the road and draw the tangent line there, we can see where that straight line hits the xaxisx-axis (ground). Since the tangent line is a good approximation of the track right at xnx_n, the point where it hits the ground is a "good enough" guess for where the track might actually hit the ground.


Now, from Eqn (3)Eqn \ (3), we know that the Tangent Line Equation is f(x)=f(x1)+f(x1)(xx1)f(x) = f(x_1) + f^{\prime}(x_1) \cdot (x - x_1)Since we are finding roots, f(x)f(x) will equal to 0.0=f(x1)+f(x1)(xx1)0 = f(x_1) + f^{\prime}(x_1) \cdot (x - x_1)Now, we solve for xx,x=x1f(x1)f(x1)  (4)x = x_1 - \frac{f(x_1)}{f^{\prime}(x_1)} \ \dots \ (4)This gives us the formula for the next guess, x2x_2,x2=x1f(x1)f(x1)x_2 = x_1 - \frac{f(x_1)}{f^{\prime}(x_1)}Now, we can define a general formula like so,xn+1=xnf(xn)f(xn)  (5)x_{n+1} = x_n - \frac{f(x_n)}{f^{\prime}(x_n)} \ \dots \ (5)

Steps

  1. Find the derivative: Find the derivative f(x)f^\prime(x) of the function. This is needed for the formula.
  2. Choose an initial guess: Pick a starting value xnx_n. This should be as close as possible to the expected root.
  3. Apply the Newton-Raphson formula: Use the formula to get the next approximation: xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f(x_n)}{f^{\prime}(x_n)}
  4. Repeat: Repeat step 3 until the difference between consecutive guesses is small enough (we're close to the root):xn+1xn<tolerance|x_{n+1} - x_n| < tolerance Or until:f(xn)<tolerance|f(x_n)| < tolerance Here, tolerancetolerance is the maximum amount of acceptable error.

Practical Example

Let's find a root of the function f(x)=ex(x3+x21)+sin(x)f(x) = e^{-x}(x^{3}+x^{2}-1)+\sin(x). We can see from the graph that the obvious root will be at 0.456.

f(x)=ex(x3+x21)+sin(x)f(x) = e^{-x}(x^{3}+x^{2}-1)+\sin(x)

The derivative of our function will be:ex(x3+x21)+ex(3x2+2x)+cos(x)−e^{−x}(x^3+x^2−1)+e^{−x}(3x^2+2x)+cos(x)Suppose,
initial guess,  x0=1.5\text{initial guess, } \ x_0 = 1.5

Iteration 1
xn=x0=1.5xn+1=x1=0.17133873020154078 x_n = x_0 = 1.5 \\ x_{n+1} = x_{1} = -0.17133873020154078
Iteration 2
xn=x1=0.17133873020154078xn+1=x2=0.5502184091814264 x_n = x_1 = -0.17133873020154078 \\ x_{n+1} = x_{2} = 0.5502184091814264
Iteration 3
xn=x2=0.5502184091814264xn+1=x3=0.4566726757952543 x_n = x_2 = 0.5502184091814264 \\ x_{n+1} = x_{3} = 0.4566726757952543
Iteration 4
xn=x3=0.4566726757952543xn+1=x4=0.45667580732770807 x_n = x_3 = 0.4566726757952543 \\ x_{n+1} = x_{4} = 0.45667580732770807

As we can see, we start from an initial guess x0x_0 and using the derivative, we close in on the root. Unlike Bisection Method, we don't need intervals. Using only one value (our guess), we can approximate the root.

Implementation

Below is the psuedocode for the implementation. Try to understand the steps and implement it on your own before looking at the solution.

1. Define the function f(x) that you want to find the root of.

2. Define the derivative fPrime(x) of the function f(x).

3. Input:
x0 - Initial guess
tolerance - Acceptable error in the root value
max_iteration - Maximum number of times the loops should run

4. Set xn = x0, xNext = xn

5. For i=0; i < max_iteration; i++ :
a. If fPrime(xn) = 0, then break loop can't divide by 0
b. Set xNext = xn - f(xn) / fPrime(xn)
c. If abs(xNext - nx) is smaller than tolerance, break loop, root found.

6. Return xn

Here's the implementation of Newton-Raphson Method in a few commonly used languages:

const f = x => Math.exp(-x) * (x ** 3 + x ** 2 - 1) + Math.sin(x);
const fPrime = x => Math.exp(-x) * ((3 * x ** 2 + 2 * x) - (x ** 3 + x ** 2 - 1)) + Math.cos(x);

function newtonRaphson(x0, tol = 1e-10, maxIter = 100) {
  let xn = x0;
  for (let i = 0; i < maxIter; i++) {
    const fx = f(xn), fpx = fPrime(xn);
    if (fpx === 0) return null;
    const xNext = xn - fx / fpx;
    if (Math.abs(xNext - xn) < tol) return xNext;
    xn = xNext;
  }
  console.log("Did not converge.");
  return null;
}

const root = newtonRaphson(1.5);
console.log("Root:", root);

Conclusion

The Newton-Raphson method stands out as one of the most powerful and widely-used techniques for finding roots of real-valued functions. Its speed and elegance lie in its use of both the function and its derivative, allowing it to converge rapidly when given a good initial guess. However, it's not without limitations—issues like division by zero, divergence, or poor initial estimates can hinder performance. To address these shortcomings, several modified versions of the Newton-Raphson method have been developed. For instance, Maehly's procedure, a stabilized version of the Newton-Raphson method designed specifically for finding complex roots of polynomials. Another example is Chebyshev's third-order method, a variant of Newton’s method that used cubic approximations. It looks like this: xn+1=xnf(xn)f(xn)(1+f(xn)f(xn)2f(xn)2) x_{n+1} = x_n - \frac{f(x_n)}{f^\prime(x_n)} \bigg( 1 + \frac{f(x_n) f^{\prime \prime}(x_n)}{2 f^\prime(x_n)^2} \bigg) You can look at the concept and derivation from here.

Part of series: Numerical Methods for Root Finding

Part 2 of 2