1 year ago

#179309

Frank Liu

Why does a function run faster when I call it continuously?

I am developing a low-latency service in C++ on Linux. I ran two groups of performance tests:

  1. Sending 1 request per second, the average latency is 3.5 microseconds.
  2. Sending 10 requests per second, the average latency is 2.7 microseconds.

I cannot understand why. My guess is that a function may run faster when it is called more frequently, so I wrote a demo to test this.

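#include <stdio.h>
#include <stdlib.h>     // atoi
#include <unistd.h>     // sysconf
#include <time.h>       // clock_gettime
#include <pthread.h>    // pthread_setaffinity_np
#include <sched.h>      // cpu_set_t, CPU_ZERO, CPU_SET
#include <chrono>
#include <thread>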

using namespace std;

long long get_curr_nsec()
{
    struct timespec now;
    ::clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec * 1000000000LL + now.tv_nsec;
}

long long func(int n)
{
    long long t1 = get_curr_nsec();
    int sum = 0;
    for(int i = 0; i < n ;i++)
    {
        //compiler barrier intended to keep sum *= (sum+1) from being optimized away
        __asm__ __volatile__("": : :"memory");
        sum *= (sum+1);
    }

    return get_curr_nsec() - t1;
}

bool bind_cpu(int cpu_id, pthread_t tid)
{
    int cpu = (int)sysconf(_SC_NPROCESSORS_ONLN);
    cpu_set_t cpu_info;
    
    //valid cpu ids are 0 .. cpu-1
    if (cpu <= cpu_id)
    {
        printf("bind cpu failed: cpu num[%d] <= cpu_id[%d]\n", cpu, cpu_id);
        return false;
    }
    
    CPU_ZERO(&cpu_info);
    CPU_SET(cpu_id, &cpu_info);
    
    int ret = pthread_setaffinity_np(tid, sizeof(cpu_set_t), &cpu_info);
    if (ret)
    {
        printf("bind cpu failed, ret=%d\n", ret);
        return false;
    }
    
    return true;
}
int main(int argc, char **argv)
{
    //make sure the program does not switch CPUs
    bind_cpu(3, ::pthread_self());

    //argv[1]: number of calls
    //argv[2]: interval between calls, in milliseconds
    if (argc < 3)
    {
        printf("usage: %s <times> <interval_ms>\n", argv[0]);
        return 1;
    }
    int times = ::atoi(argv[1]);
    int interval = ::atoi(argv[2]);

    long long sum = 0;
    for(int i = 0; i < times; i++)
    {
        if(interval > 0)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(interval));
        }
        sum +=  func(100);
    }

    printf("avg elapse:%lld ns\n", sum/ times);
    return 0;
}

The compile command is g++ --std=c++11 ./main.cpp -O2 -lpthread. I then ran the following tests:

  1. Call the function 100 times without sleep, ./a.out 100 0, output: avg elapse:35 ns
  2. Call the function 100 times with a 1 ms sleep, ./a.out 100 1, output: avg elapse:36 ns
  3. Call the function 100 times with a 10 ms sleep, ./a.out 100 10, output: avg elapse:40 ns
  4. Call the function 100 times with a 100 ms sleep, ./a.out 100 100, output: avg elapse:45 ns
  5. Call the function 100 times with a 1000 ms sleep, ./a.out 100 1000, output: avg elapse:50 ns
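
The figures above are averages over 100 calls, so a few very slow calls and a uniform slowdown of every call would look the same. A minimal sketch of a per-call summary follows. LatencySummary and summarize are hypothetical names that are not part of the demo, and the sketch assumes each func(100) result is stored in a vector instead of being added to sum.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: summary of per-call latencies in nanoseconds.
struct LatencySummary {
    long long min_ns;
    long long median_ns;
    long long max_ns;
    long long mean_ns;
};

// Takes the samples by value so the caller's vector keeps its call order.
// Assumes at least one sample.
LatencySummary summarize(std::vector<long long> samples)
{
    std::sort(samples.begin(), samples.end());

    long long total = 0;
    for (long long s : samples) {
        total += s;
    }

    const std::size_t n = samples.size();
    LatencySummary out;
    out.min_ns = samples.front();
    out.median_ns = samples[n / 2];   // upper median when n is even
    out.max_ns = samples.back();
    out.mean_ns = total / static_cast<long long>(n);
    return out;
}
```

If the median stays flat across sleep intervals while the maximum grows, the average is being pulled up by a few outliers; if the median itself grows, every call is getting slower.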

My OS is CentOS Linux release 7.6.1810 (Core). My CPU is an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz.

I am confused and do not know the cause. Is it the CPU, the OS, or the system call (sleep)?

Afterwards I used perf stat to count branches:

  1. perf stat ./a.out 100 1: 241779 branches, 7091 branch-misses.
  2. perf stat ./a.out 100 100: 241791 branches, 7636 branch-misses.

It seems that sleeping 100 ms produces more branch-misses. But I am still not certain that this is the reason, and I do not know why sleeping 100 ms would produce more branch-misses.
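
One way to narrow this down would be to time two back-to-back calls after each sleep and compare the first (cold) call with the second (warm) one. If only the first call is slow, that would suggest some state is lost while the thread is idle, rather than the call rate as such; which state is not something this sketch can tell. The names work, timed_call_ns, ColdWarm and measure_cold_warm are hypothetical. The sketch uses std::chrono::steady_clock instead of clock_gettime, and a "+r"(sum) asm constraint (GCC/Clang) instead of the demo's memory-only barrier, so that the multiply is kept.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the loop inside func(), without the timing.
inline void work(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        // Tells the compiler that sum may be read and modified here,
        // so the multiply below cannot be folded away (GCC/Clang syntax).
        __asm__ __volatile__("" : "+r"(sum) : : "memory");
        sum *= (sum + 1);
    }
}

// Times a single call of work(n) in nanoseconds.
inline long long timed_call_ns(int n)
{
    auto t0 = std::chrono::steady_clock::now();
    work(n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

struct ColdWarm {
    std::vector<long long> cold_ns;  // first call after each sleep
    std::vector<long long> warm_ns;  // call made immediately after the first
};

// Sleeps interval_ms before each round, then times two back-to-back calls.
ColdWarm measure_cold_warm(int rounds, int interval_ms, int n)
{
    ColdWarm out;
    out.cold_ns.reserve(static_cast<std::size_t>(rounds));
    out.warm_ns.reserve(static_cast<std::size_t>(rounds));
    for (int i = 0; i < rounds; i++) {
        if (interval_ms > 0) {
            std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
        }
        out.cold_ns.push_back(timed_call_ns(n));
        out.warm_ns.push_back(timed_call_ns(n));
    }
    return out;
}
```

Calling measure_cold_warm(100, 100, 100) would mirror the ./a.out 100 100 run, with the two vectors compared afterwards.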

c++

linux

performance

cpu

low-latency

0 Answers
