1 year ago
#339119
Pychopath
20% faster with unused variable? Why?
I do a lot of benchmarking, and I've never seen anything like this. I'm stumped. Creating an extra global variable that isn't used at all makes part of my code about 20% faster. Why?
I'm benchmarking a function that produces iterables, measuring how long it takes to consume (iterate) them. I have two ways of consuming them. Typical times from runs where I get a high CPU share:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.74 s consume_1     | 0.72 s consume_1
       | 0.96 s consume_2     | 0.77 s consume_2
       |                      |
       | 0.74 s consume_1     | 0.75 s consume_1
       | 0.96 s consume_2     | 0.78 s consume_2
       |                      |
       | 0.73 s consume_1     | 0.73 s consume_1
       | 0.95 s consume_2     | 0.78 s consume_2
-------+----------------------+-------------------
Debug  | Real time: 5.110 s   | Real time: 4.560 s
       | User time: 4.546 s   | User time: 4.386 s
       | Sys. time: 0.535 s   | Sys. time: 0.150 s
       | CPU share: 99.43 %   | CPU share: 99.47 %
Creating the pointless variable makes consumption with consume_2 about 0.2 seconds faster (from roughly 0.97 to 0.77 s). There are also significant differences in the "Debug" statistics, the most drastic being "Sys. time": without the variable it's consistently around 0.5 seconds, and with the variable it's consistently around 0.14 seconds.
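Since the biggest difference is in system time, one thing that would help narrow it down is reading per-call user/sys CPU time from inside the process instead of relying on the external timer. A minimal sketch, Unix-only, using the standard resource module (cpu_seconds and report are just made-up helper names):

import resource

def cpu_seconds():
    # User and system CPU time this process has consumed so far (Unix only).
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime

def report(label, func):
    # Run func() once and print how much user/sys CPU time it used.
    u0, s0 = cpu_seconds()
    func()
    u1, s1 = cpu_seconds()
    print('%-10s user %.3f s  sys %.3f s' % (label, u1 - u0, s1 - s0))

Wrapping each consume(solver(lst)) call from the code below in report(consume.__name__, lambda: consume(solver(lst))) should show whether the extra sys time is attributable to consume_2 specifically.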
I'm doing this on TIO, and you can reproduce it there yourself:
Without the variable / With the variable
Here's the code; I called the extra variable foobar. Also note that consume_1 loads the global deque 10000 times, while consume_2 has just a handful of global loads, so if anything I'd expect consume_1 to be the one affected. (There's a quick dis check of this right after the code.)
from timeit import timeit
from operator import itemgetter
from itertools import repeat
from collections import deque

def each_with_others_1(iterable):
    # Yield each element paired with a tuple of all the other elements.
    xs = tuple(iterable)
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i+1:]

consume_0 = None

def consume_1(each_with_others):
    # Consume the "others" of every pair, one deque(..., 0) call per pair.
    for each, others in each_with_others:
        deque(others, 0)

def consume_2(each_with_others):
    # Consume everything through lazy maps feeding zero-length deques.
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)

lst = list(range(10000))

foobar = None

for solver in [each_with_others_1] * 3:
    for consume in consume_1, consume_2:
        t = timeit(lambda: consume(solver(lst)), number=1)
        print('%.2f s ' % t, consume.__name__)
    print()
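To back up the note above about global loads, dis shows where the LOAD_GLOBAL instructions sit: the one in consume_1's loop body runs once per yielded pair (so 10000 times here), while consume_2 does all of its global lookups before any iteration starts. The exact bytecode varies by CPython version, so this is only a sanity check:

import dis

dis.dis(consume_1)  # LOAD_GLOBAL deque sits inside the for-loop body
dis.dis(consume_2)  # LOAD_GLOBAL map/itemgetter/deque/repeat each appear once, before iteration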
Update: I also reproduced this on a Google Compute Engine instance after installing Python 3.8.2; creating the variable made consume_2 about 15% faster:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.64 s consume_1     | 0.65 s consume_1
       | 0.80 s consume_2     | 0.68 s consume_2
       |                      |
       | 0.64 s consume_1     | 0.65 s consume_1
       | 0.80 s consume_2     | 0.68 s consume_2
       |                      |
       | 0.64 s consume_1     | 0.64 s consume_1
       | 0.78 s consume_2     | 0.68 s consume_2
-------+----------------------+-------------------
Debug  | real 0m 4.327s       | real 0m 3.987s
       | user 0m 3.987s       | user 0m 3.902s
       | sys  0m 0.340s       | sys  0m 0.084s
The "Debug" came from calling it as time python test.py
. For "without", sys
is consistently around 0.32 seconds. For "with" it's consistently around 0.09 seconds.
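Since both machines show the same drop in sys time, one cheap extra data point would be the kernel-side counters that the resource module exposes. A big difference in page-fault counts between the two configurations would at least hint whether memory (de)allocation is involved; that's only a guess worth checking, and note that on Linux ru_maxrss is reported in kilobytes:

import resource

# Print kernel-side counters at the end of the run, for both configurations.
ru = resource.getrusage(resource.RUSAGE_SELF)
print('user %.3f s  sys %.3f s  minor faults %d  major faults %d  max RSS %d kB'
      % (ru.ru_utime, ru.ru_stime, ru.ru_minflt, ru.ru_majflt, ru.ru_maxrss))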
python, performance, cpython, python-internals
0 Answers