1 year ago
#179309
Frank Liu
Why does a function run faster when I call it continuously?
I am developing a low-latency service in C++ on Linux. I ran two groups of performance tests:
- Sending 1 request per second, the average latency is 3.5 microseconds.
- Sending 10 requests per second, the average latency is 2.7 microseconds.
I cannot understand why. My guess is that a function runs faster when it is called more frequently, so I wrote a demo to test this.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
#include <syscall.h>
#include <pthread.h>
#include <sched.h>
#include <chrono>
#include <thread>
using namespace std;

long long get_curr_nsec()
{
    struct timespec now;
    ::clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec * 1000000000LL + now.tv_nsec;
}

long long func(int n)
{
    long long t1 = get_curr_nsec();
    int sum = 0;
    for (int i = 0; i < n; i++)
    {
        //make sure sum *= (sum+1) is not optimized away by the compiler
        __asm__ __volatile__("": : :"memory");
        sum *= (sum+1);
    }
    return get_curr_nsec() - t1;
}

bool bind_cpu(int cpu_id, pthread_t tid)
{
    int cpu = (int)sysconf(_SC_NPROCESSORS_ONLN);
    cpu_set_t cpu_info;
    //valid cpu ids are 0 .. cpu-1
    if (cpu_id < 0 || cpu_id >= cpu)
    {
        printf("bind cpu failed: cpu_id[%d] out of range, cpu num[%d]\n", cpu_id, cpu);
        return false;
    }
    CPU_ZERO(&cpu_info);
    CPU_SET(cpu_id, &cpu_info);
    int ret = pthread_setaffinity_np(tid, sizeof(cpu_set_t), &cpu_info);
    if (ret)
    {
        printf("bind cpu failed, ret=%d\n", ret);
        return false;
    }
    return true;
}

int main(int argc, char **argv)
{
    //first argv: number of calls
    //second argv: interval between calls, in milliseconds
    if (argc < 3 || ::atoi(argv[1]) <= 0)
    {
        printf("usage: %s <times> <interval_ms>\n", argv[0]);
        return 1;
    }
    //make sure the program does not switch CPUs
    bind_cpu(3, ::pthread_self());
    int times = ::atoi(argv[1]);
    int interval = ::atoi(argv[2]);
    long long sum = 0;
    for (int i = 0; i < times; i++)
    {
        if (interval > 0)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(interval));
        }
        sum += func(100);
    }
    printf("avg elapse:%lld ns\n", sum / times);
    return 0;
}
The compile command is: g++ --std=c++11 ./main.cpp -O2 -lpthread
I then ran the following tests:
- Call the function 100 times without sleep: ./a.out 100 0, output: avg elapse:35 ns
- Call the function 100 times with a 1 ms sleep: ./a.out 100 1, output: avg elapse:36 ns
- Call the function 100 times with a 10 ms sleep: ./a.out 100 10, output: avg elapse:40 ns
- Call the function 100 times with a 100 ms sleep: ./a.out 100 100, output: avg elapse:45 ns
- Call the function 100 times with a 1000 ms sleep: ./a.out 100 1000, output: avg elapse:50 ns
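The demo only prints an average, which cannot show whether every call gets slower after a sleep or only a few calls are much slower. A minimal sketch of a per-call summary follows. The names LatencySummary and summarize are hypothetical and not part of the demo above; the idea is to store each value returned by func(100) in a vector and summarize it at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of the original demo.
struct LatencySummary
{
    long long min;
    long long median;
    long long max;
    long long avg;
};

// Takes the samples by value so the caller's vector is left unsorted.
LatencySummary summarize(std::vector<long long> samples)
{
    LatencySummary s = {0, 0, 0, 0};
    if (samples.empty())
        return s;
    std::sort(samples.begin(), samples.end());
    long long total = 0;
    for (std::size_t i = 0; i < samples.size(); i++)
        total += samples[i];
    s.min = samples.front();
    s.median = samples[samples.size() / 2]; // upper median for even counts
    s.max = samples.back();
    s.avg = total / (long long)samples.size();
    return s;
}
```

In main, one could push each func(100) result into a std::vector<long long> and print the four fields. If the median stays near the no-sleep figure while the max grows with the interval, a few slow calls are pulling the average up; if the median itself shifts, every call is affected.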
My OS is CentOS Linux release 7.6.1810 (Core). My CPU is an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz.
I am confused and do not know the cause. Is it the CPU, the OS, or the system call (sleep)?
Afterwards I used perf stat to count branches:
- perf stat ./a.out 100 1: 241779 branches, 7091 branch-misses.
- perf stat ./a.out 100 100: 241791 branches, 7636 branch-misses.
The run with the 100 ms sleep seems to have more branch-misses, but I am still not certain that this is the reason, and I do not know why sleeping 100 ms would cause more branch-misses.
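To put the two perf runs side by side, the arithmetic can be sketched as below. The helper names miss_rate and extra_misses_per_call are hypothetical. Note that perf stat counts the whole process, so these numbers include program startup and the sleep path, not only func.

```cpp
// Hypothetical helper: fraction of branches that were mispredicted.
double miss_rate(long long misses, long long branches)
{
    return branches > 0 ? (double)misses / (double)branches : 0.0;
}

// Hypothetical helper: additional misses per call when going from run A to run B.
double extra_misses_per_call(long long misses_a, long long misses_b, int calls)
{
    return calls > 0 ? (double)(misses_b - misses_a) / calls : 0.0;
}
```

With the figures above, miss_rate(7091, 241779) is roughly 2.9% and miss_rate(7636, 241791) is roughly 3.2%, while extra_misses_per_call(7091, 7636, 100) gives about 5.5 additional mispredictions per call. Whether that alone accounts for the 9 ns gap between the two runs is not something these counts can confirm.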
c++
linux
performance
cpu
low-latency