Optimizing Pi Estimation Code

Question

I'm trying to write a program in Python that estimates pi, but to get a more accurate estimation in a feasible amount of time, I want to make it faster.

I'm using a method that estimates the area of the circle by counting all the coordinates that land inside of a circle with radius r and dividing that result by r**2 to get pi. So far to speed it up, I've:

cut the count requirement in quarter by only using a quarter of the circle (the upper right quarter)
skipping chunks of each row if both ends are inside the circle only starting from the previous position instead of restarting from 0
starting from halfway down the arc and disregarding the big square in the middle from the count (only counts about 30% of the quarter circle but uses it to work out the rest)

So far I've got it to effectively work out 100 trillion coordinates (radius of 10,000,000) in ~15 seconds and yielding 11 correct digits. This is cut down to ~13 seconds if you remove the progress check print statements. Below is my code, please help in any way to optimize. It is greatly appreciated.

from time import time

r = 10000000
rs = r**2
aCount = 0
prevy = 8008135
prevTime = "meow"

#=============================================#
y = 0.5 * r
x = 0.5 * r
prevX = x
while x**2 + y**2 <= rs:
    if (x*1.00001)**2 + (y*1.00001)**2 <= rs:
        x = int(x*1.0001)                       # move up from the origin at a 45 degree angle to find
        y = int(y*1.0001)                       # the middle of the arc
    x+=1
    y+=1
    if x == prevX:
        break
    prevX = x
#=============================================#

startX = x
start = time()
endX = 0
while y > 0:
    if int(time()) != prevTime:
            print(f"{(int((-y/r)*100))+100}% complete, {-int((y/(y-prevy))/2)} Seconds to go   [{-(y-prevy)}]")
            prevy = y
            prevTime = int(time())
    ys = y**2
    aCount += ((x - 1) - startX)
    while x <= r:
        xs = x**2
        if xs + ys > rs:
            break
        if xs + ys <= rs and (x+99999)**2 + ys <= rs:
            aCount += 100000
            x += 100000
        elif xs + ys <= rs and (x+9999)**2 + ys <= rs:
            aCount += 10000
            x += 10000
        elif xs + ys <= rs and (x+999)**2 + ys <= rs:
            aCount += 1000
            x += 1000
        elif xs + ys <= rs and (x+99)**2 + ys <= rs:
            aCount += 100
            x += 100
        elif xs + ys <= rs and (x+9)**2 + ys <= rs:
            aCount += 10
            x += 10
        elif xs + ys <= rs:
            aCount += 1
            x += 1
        else:
            x += 1
        endX = x
    y -= 1

aCount *= 2             #work out the quarter using the right-most fragment (see diagram)
aCount += startX ** 2   #add the missing square between the two fragments

aCount *= 2             #double the quarter circle and add the y=0 (where x is greater than 0) line
aCount += r

aCount *= 2             #double the half circle and add the x=0 (where y is greater than or less than 0) line
aCount += 2*r + 1       #then add the (0,0) point for the full area

print(aCount/(rs))
print("That took "+str(int(time()-start))+" seconds")

There's a reason that no serious Pi calculator uses this method. So I guess my question is: why this method, and how amenable are you to learning the math behind the better ones? — Reinderien, Commented Jul 4 at 16:56
im mostly doing this for fun and a challenge for problem solving so i did it with no research. im less doing it for the factor of finding pi, but more so for the optimizations practice from using a more ineffective method. plus i came up with this method myself — Rhys Bradshaw, Commented Jul 4 at 17:02
honestly i have no clue of what this code is doing. but if i just delete all the if/elif part (except the guard) the result is exactly the same (3.14159265351885) and the time is cut in half? — folen gateis, Commented Jul 4 at 17:30
Dear moderators: My answer suggested that a picture would be helpful, and I am very very happy to see the addition of a labeled diagram here. @folengateis: Yeah, I agree with you that the logic was a bit obscure. — J_H, Commented Jul 4 at 21:13

MT0 · Accepted Answer · 2024-07-06 21:35:11Z

You can calculate Pi using numerical integration of the circular arc-segment centred at (0,0), from the point (radius, 0) to (radius/sqrt(2), radius/sqrt(2)) (the green area on your diagram).

To do that, you do not need to repeatedly calculate squares; instead you can use the property that:

x**2 - (x-1)**2 = 2*x - 1

Therefore, as you decrease x by 1 you will decrease x**2 by 2*x - 1 and, given Pythagoras theorem, then the maximum y**2 will correspondingly increase by 2*x - 1.

With each iteration, decrease the x co-ordinate by 1 and increase the y co-ordinate while it still lies within the circle and increase the total by the number of points on that vertical line. Repeat until x < y when you will have found the area under an eighth of a circular arc-segment (the green area on your diagram).

Like this:

def calculate_pi(radius: int) -> float:
    """Calculate Pi by numerical integration.

    Perform numerical integration over the circle segment centered at `(0,0)` from
    `(radius, 0)` to `(radius/sqrt(2), radius/sqrt(2))` and then calculate the total
    area of the circle and divide by `radius**2` to approximate Pi.

    Args:
        radius: The radius of the circle to calculate.

    Returns:
        The appoximation of Pi.

    """
    total_points_under_arc = 0

    # Starting point
    x = radius
    y = 0

    radius_squared = radius ** 2
    x_squared = radius_squared
    next_y_squared = 1
    max_y_squared = 0

    while x >= y:
        # Stop at an eighth of the circle, when x == y

        total_points_under_arc += y
        # Include the line (x,1) .. (x,y) in the total

        x -= 1
        max_y_squared += 2 * x + 1

        while next_y_squared <= max_y_squared:
            # If the (x,y+1) point is inside the circle then move y outwards.
            y += 1
            next_y_squared += 2 * y + 1

    total_points_inside_circle = (
        (
            # To get a quarter circle, double the area under the arc
            2 * total_points_under_arc
            # Include the (1,1) .. (x-1,x-1) box (x has already been decremented)
            + x**2
            # Include the (0,1) .. (0, radius) line.
            + radius
        )
        # Multiply by 4 to get a full circle
        * 4
        # Include the point (0,0)
        + 1
    )

    return total_points_inside_circle / radius_squared

You can then test the code using:

if __name__ == "__main__":
    from math import pi
    from time import time

    start_time = time()
    value = calculate_pi(10_000_000)
    end_time = time()
    print(value)
    print(pi)
    print(f"That took {end_time - start_time} seconds")

Which outputs:

3.14159265350589
3.141592653589793
That took 1.4651856422424316 seconds

You can speed up the calculation using, as commented by @nocomment, the Midpoint Circle Algorithm:

def calculate_pi(radius: int) -> float:
    """Calculate Pi by numerical integration.

    Perform numerical integration over the circle segment centered at `(0,0)` from
    `(radius, 0)` to `(radius/sqrt(2), radius/sqrt(2))` and then calculate the total
    area of the circle and divide by `radius**2` to approximate Pi.

    Args:
        radius: The radius of the circle to calculate.

    Returns:
        The appoximation of Pi.

    """
    total_points_under_arc = 0

    # Starting point
    x = radius
    y = 0
    t1 = radius >> 4;

    while x >= y:
        # Stop at an eighth of the circle, when x == y

        # https://en.wikipedia.org/wiki/Midpoint_circle_algorithm#Jesko's_Method
        y += 1
        t1 += y
        t2 = t1 - x
        if t2 > 0:
            # Include the line (x,1) .. (x,y-1) in the total
            total_points_under_arc += y - 1
            t1 = t2
            x -= 1

    total_points_inside_circle = (
        (
            # To get a quarter circle, double the area under the arc
            2 * total_points_under_arc
            # Include the (1,1) .. (x-1,x-1) box (x has already been decremented)
            + x**2
        )
        # Multiply by 4 to get a full circle
        * 4
    )

    return total_points_inside_circle / radius**2

Which outputs:

3.14159269412492
3.141592653589793
That took 1.1812567710876465 seconds

Note: this will overestimate Pi whereas the previous version will slightly underestimate pi.

Might be faster to integrate the non-green part instead. Requires fewer updates of y, and the while would be an if. Also see Wikipedia for perhaps more improvements. — no comment, Commented Jul 6 at 1:15

J_H · Accepted Answer · 2024-07-04 17:58:32Z

scientific notation

effectively work out 100 trillion coordinates (radius of 10,000,000)

Prefer to describe these numbers as \$10^{14}\$ and \$10^7\$, which immediately makes the "squared" relationship apparent. You could phrase the radius as "half a million score", but that's hardly useful.

r = 10000000

Similarly, prefer
r = 10_000_000, or r = int(1e7), as an aid to the Gentle Reader.

naming

aCount = 0
prevy = 8008135
prevTime = "meow"

Pep-8 asked you nicely to call these a_count, prev_y, and prev_time. Similarly for prev_x.

Prefer a value of None over "woof! woof!" for no useful value yet. Better still, based on subsequent usage, initialize it to int(time()).

The counter name is obscure and unhelpful. Minimally, tack on a # comment, if you don't rename it to something more instructive. The 8_008_135 magic number similarly demands an explanation.

ASCII art

#=============================================#

Resist the temptation.

If you feel your code is in need of better organization, then Extract Helper so you can name each section of code. You can even associate a """docstring""" with such code, if you want.

floating point

y = 0.5 * r
x = 0.5 * r

The code and the Review Context don't really make it clear if python's BigNum integer support is important here, for correctness or perhaps for elapsed time. I don't understand why these two lines are converting to FP, as opposed to assigning r // 2.

epsilon

    if (x * 1.00001) ** 2 + (y * 1.00001) ** 2 <= rs:

This is essentially

    eps = 1e-5
    if (x * (1 + eps)) ** 2 + (y * (1 + eps)) ** 2 <= rs:

The value of epsilon is not well justified. We keep moving by a larger and larger increment on each iteration. Wouldn't you rather zoom in on the point of interest using binary search?

Also, please please stop pasting in that Magic Number, for a third and a fourth time. Assign it a symbolic name already.

ULP

    x += 1 ...
    if x == prev_x:
        break
    prev_x = x

You gave us a comment about "the middle of the arc" but you won't describe what's going on here?

Are we concerned about exhausting the 53 bits of a double? So that incrementing gives us the same number? How would a radius of 1e7 manage to exhaust the significand? Is this maybe leftover code, from when radius was much larger?

We're not very near \$2^{53}\$:

>>> x = 9007199254740991.
>>> x
9007199254740991.0
>>> x + 1
9007199254740992.0
>>> x + 2
9007199254740992.0
>>> x + 3
9007199254740994.0

elapsed interval

    if int(time()) != prev_time:

This is a curious way of (approximately) telling that a second has elapsed. Prefer to assign elapsed = time() - prev_time, and use that.

extract helper

        if xs + ys <= rs and (x+99999)**2 + ys <= rs:
            aCount += 100000
            x += 100000

Please DRY up this code by putting that logic in a helper function, which the loop logic calls half a dozen times.

Why are we even doing that? Wouldn't you like to rely on binary search? Or Newton's method?

math

In the Review Context you took a stab at explaining your approach and its motivation. Write a module-level """docstring""" which explores that in more depth.

Given that a picture is worth more than a few words, consider sketching out the approach on a napkin and adding a photo of that to the codebase and to the question. As presented, the what and the why of your computations are not completely clear, nor how the various Magic Numbers dovetail with that. Since we're focused on an approximation, it would be very helpful to talk about error bounds, and about what happens to them on each iteration.

We're moving both X and Y, which corresponds to a slope. To move one by numeric increments and the other according to how much computation time has elapsed is, ummm, an interesting design choice.

define a function

Please package up this logic in a defined function. This will avoid polluting the module level namespace with lots of globals, and will allow an automated test suite to exercise both your main and your helper functions. That might help you to parameterize, so you're solving sub-problems that have a lower accuracy target than the OP code's final target.

progress bar

You mentioned there's a noticeable slowdown due to reporting on progress thus far. Consider adding a tqdm progress bar. It adapts to the calling code so it avoids using more than a fraction of a percent of the total time.

JIT

Consider using numba to compile this fairly simple numeric code down to machine language instead of interpreted bytecode.

"Pep-8 asked you" - Did it? Its first sentence is "This document gives coding conventions for the Python code comprising the standard library in the main Python distribution". Also, it's called PEP 8. — no comment, Commented Jul 6 at 1:20
Looks like you missed the point. You said it asked them. Are they writing "Python code comprising the standard library in the main Python distribution"? — no comment, Commented Jul 6 at 1:27
Btw ironic to misspell its name in your "naming" section :-P — no comment, Commented Jul 6 at 1:29

folen gateis · Accepted Answer · 2024-07-04 23:05:36Z

on the pure optimization part you must precompute more

while x**2 + y**2 <= rs:
    if (x*1.00001)**2 + (y*1.00001)**2 <= rs:

can be faster if you precompute epsilon and the squares like @j-h said

    epsilon = 1.00001**2
    while (xs:=x**2) + (ys:=y**2) <= rs:
        if epsilon*(xs + ys) <= rs:

you can make your code way more DRY. after your guard, you already know that xs + ys <= rs so you can skip all those tests. (and yes a helper function is needed here)

        rs_minus_ys = rs-ys
        if (x+99999)**2 <= rs_minus_ys:
            do_stuff

your last else is unreachable (?)

this sums up to ~20% gain

but like i said in the comments, deleting your whole test block doesn't change the result and that's ~60% gain

i completely missed a major enhancement: putting your code into functions

def generate_acount(x):
    startX = y = x
    aCount = endX = 0
    while y > 0:
        rs_minus_ys = rs-y**2
        aCount += ((x - 1) - startX)
        while x <= r:
            if x**2 > rs_minus_ys:
                break
            aCount += 1
            x += 1
            endX = x
        y -= 1
    return startX, aCount

i started at 11s, now it's ~3s

i think that your lack of precision is due to the first part of your code where you compute x,y. the value is roughly sqrt(rs/2) with the magic salt. with your algorithm, pi is a bit farther than 7071081 somewhere between 7071100 and 7071101.

Stack Exchange Network

Optimizing Pi Estimation Code

3 Answers 3

scientific notation

naming

ASCII art

floating point

epsilon

ULP

elapsed interval

extract helper

math

define a function

progress bar

JIT

Not the answer you're looking for? Browse other questions tagged
python
performance
mathematics
or ask your own question.

Hot Network Questions

3 Answers 3

scientific notation

naming

ASCII art

floating point

epsilon

elapsed interval

extract helper

math

define a function

progress bar

JIT

Not the answer you're looking for? Browse other questions tagged pythonperformancemathematics or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
python
performance
mathematics
or ask your own question.