1 year ago
#339119
Pychopath
20% faster with unused variable? Why?
I do a lot of benchmarking, and I've never seen anything like this. I'm stumped. Creating an extra global variable that isn't used at all makes part of my code about 20% faster. Why?
I'm benchmarking a function that produces iterables, measuring how long it takes to consume (iterate) them. I have two ways of consuming them. Typical times from runs where I get a high CPU share:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.74 s consume_1     | 0.72 s consume_1
       | 0.96 s consume_2     | 0.77 s consume_2
       |                      |
       | 0.74 s consume_1     | 0.75 s consume_1
       | 0.96 s consume_2     | 0.78 s consume_2
       |                      |
       | 0.73 s consume_1     | 0.73 s consume_1
       | 0.95 s consume_2     | 0.78 s consume_2
-------+----------------------+-------------------
Debug  | Real time: 5.110 s   | Real time: 4.560 s
       | User time: 4.546 s   | User time: 4.386 s
       | Sys. time: 0.535 s   | Sys. time: 0.150 s
       | CPU share: 99.43 %   | CPU share: 99.47 %
Creating the pointless variable makes consumption with consume_2 about 0.2 seconds faster (from roughly 0.97 to 0.77 s). There are also significant differences in the "Debug" statistics, the most drastic being "Sys. time": without the variable it's consistently around 0.5 seconds, and with the variable it's consistently around 0.14 seconds.
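Since the biggest difference is in system time, one thing that would help narrow it down is reading per-call user/sys CPU time from inside the process instead of relying on the external timer. A minimal sketch, Unix-only, using the standard resource module (cpu_seconds and report are just made-up helper names):

import resource

def cpu_seconds():
    # User and system CPU time this process has consumed so far (Unix only).
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime

def report(label, func):
    # Run func() once and print how much user/sys CPU time it used.
    u0, s0 = cpu_seconds()
    func()
    u1, s1 = cpu_seconds()
    print('%-10s user %.3f s  sys %.3f s' % (label, u1 - u0, s1 - s0))

Wrapping each consume(solver(lst)) call from the code below in report(consume.__name__, lambda: consume(solver(lst))) should show whether the extra sys time is attributable to consume_2 specifically.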
I'm doing this on TIO, and you can reproduce it there yourself:
Without the variable / With the variable
Here's the code; I called the extra variable foobar. Also note that consume_1 loads the global deque 10000 times, while consume_2 has just a handful of global loads, so if anything I'd expect consume_1 to be the one affected. (There's a quick dis check of this right after the code.)
from timeit import timeit
from operator import itemgetter
from itertools import repeat
from collections import deque

def each_with_others_1(iterable):
    # Yield each element paired with a tuple of all the other elements.
    xs = tuple(iterable)
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i+1:]

consume_0 = None

def consume_1(each_with_others):
    # Consume the "others" of every pair, one deque(..., 0) call per pair.
    for each, others in each_with_others:
        deque(others, 0)

def consume_2(each_with_others):
    # Consume everything through lazy maps feeding zero-length deques.
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)

lst = list(range(10000))

foobar = None

for solver in [each_with_others_1] * 3:
    for consume in consume_1, consume_2:
        t = timeit(lambda: consume(solver(lst)), number=1)
        print('%.2f s ' % t, consume.__name__)
    print()
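To back up the note above about global loads, dis shows where the LOAD_GLOBAL instructions sit: the one in consume_1's loop body runs once per yielded pair (so 10000 times here), while consume_2 does all of its global lookups before any iteration starts. The exact bytecode varies by CPython version, so this is only a sanity check:

import dis

dis.dis(consume_1)  # LOAD_GLOBAL deque sits inside the for-loop body
dis.dis(consume_2)  # LOAD_GLOBAL map/itemgetter/deque/repeat each appear once, before iteration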
Update: I also reproduced this on a Google Compute Engine instance after installing Python 3.8.2; creating the variable made consume_2 about 15% faster:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.64 s consume_1     | 0.65 s consume_1
       | 0.80 s consume_2     | 0.68 s consume_2
       |                      |
       | 0.64 s consume_1     | 0.65 s consume_1
       | 0.80 s consume_2     | 0.68 s consume_2
       |                      |
       | 0.64 s consume_1     | 0.64 s consume_1
       | 0.78 s consume_2     | 0.68 s consume_2
-------+----------------------+-------------------
Debug  | real 0m 4.327s       | real 0m 3.987s
       | user 0m 3.987s       | user 0m 3.902s
       | sys  0m 0.340s       | sys  0m 0.084s
The "Debug" came from calling it as time python test.py
. For "without", sys
is consistently around 0.32 seconds. For "with" it's consistently around 0.09 seconds.
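Since both machines show the same drop in sys time, one cheap extra data point would be the kernel-side counters that the resource module exposes. A big difference in page-fault counts between the two configurations would at least hint whether memory (de)allocation is involved; that's only a guess worth checking, and note that on Linux ru_maxrss is reported in kilobytes:

import resource

# Print kernel-side counters at the end of the run, for both configurations.
ru = resource.getrusage(resource.RUSAGE_SELF)
print('user %.3f s  sys %.3f s  minor faults %d  major faults %d  max RSS %d kB'
      % (ru.ru_utime, ru.ru_stime, ru.ru_minflt, ru.ru_majflt, ru.ru_maxrss))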
python, performance, cpython, python-internals
0 Answers