1 year ago

#382913

Devios

Job on self-hosted GitLab Runner (AWS EC2 autoscaled spot instances) sometimes gets stuck indefinitely while downloading the cache

I have a rather specific problem.

Occasionally, a job on our runner gets stuck indefinitely at the cache download step.

The runner uses the docker+machine executor to run jobs on autoscaled AWS EC2 spot instances.

Here's the config.toml:

concurrent = 32
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-runner-xxx-xx"
  url = "https://gitlab.com/"
  token = "xxxx"
  executor = "docker+machine"
  limit = 32
  request_concurrency = 100
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "xxxx"
      SecretKey = "xxxxx"
      BucketName = "xxxxx"
      BucketLocation = "eu-west-3"
  [runners.machine]
    IdleCount = 0
    IdleTime = 500
    MaxBuilds = 20
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=xxx",
      "amazonec2-secret-key=xxxx",
      "amazonec2-region=eu-west-3",
      "amazonec2-vpc-id=vpc-xxxx",
      "amazonec2-subnet-id=subnet-xxxx",
      "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
      "amazonec2-security-group=xxxxx",
      "amazonec2-instance-type=m5.large",
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.07",
      "amazonec2-use-private-address=true",
    ]
    [[runners.machine.autoscaling]]
      Periods = ["* * 8-18 * * mon-fri *"]
      IdleCount = 1
      IdleTime = 1000
      Timezone = "UTC"
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 0
      IdleTime = 60
      Timezone = "UTC"

And here are the job logs:

https://pastebin.com/5TRr2AJ2

The job hangs there indefinitely.

There isn't much in the gitlab-runner logs for that time period:

Apr 06 13:25:59 ip-172-31-24-13 gitlab-runner[155844]: Machine removed                                     lifetime=53m13.857477797s name=runner-k9ee2cag-gitlab-docker-machine-1649248366-b8c70535 now=2022-04-06 13:25:59.996569815 +0000 UTC m=+1109081.553801666 reason=too many idle machines retries=0 used=1.295241679s usedCount=0
Apr 06 13:35:08 ip-172-31-24-13 gitlab-runner[155844]: Checking for jobs... received                       job=2299342280 repo_url=https://gitlab.com/ads-development/awa.git runner=k9ee2CAg
Apr 06 13:35:09 ip-172-31-24-13 gitlab-runner[155844]: Using existing docker-machine                       created=2022-04-06 07:39:37.533117067 +0000 UTC m=+1088299.090348926 docker=tcp://35.180.75.59:2376 job=2299342280 name=runner-k9ee2cag-gitlab-docker-machine-1649230777-c10facd5 now=2022-04-06 13:35:09.469844968 +0000 UTC m=+1109631.027076818 project=27672694 runner=k9ee2CAg usedcount=17
Apr 06 13:35:11 ip-172-31-24-13 gitlab-runner[155844]: Running pre-create checks...                        driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:11 ip-172-31-24-13 gitlab-runner[155844]: Creating machine...                                 driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:11 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a) Launching instance...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:13 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a) Waiting for spot instance...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:29 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a) Created spot instance request sir-9e86a6bn  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:30 ip-172-31-24-13 gitlab-runner[155844]: Waiting for machine to be running, this may take a few minutes...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:30 ip-172-31-24-13 gitlab-runner[155844]: Detecting operating system of created instance...   driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:30 ip-172-31-24-13 gitlab-runner[155844]: Waiting for SSH to be available...                  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:41 ip-172-31-24-13 gitlab-runner[155844]: Detecting the provisioner...                        driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:42 ip-172-31-24-13 gitlab-runner[155844]: Provisioning with ubuntu(systemd)...                driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:35:53 ip-172-31-24-13 gitlab-runner[155844]: Installing Docker...                                driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:29 ip-172-31-24-13 gitlab-runner[155844]: Copying certs to the local machine directory...     driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:30 ip-172-31-24-13 gitlab-runner[155844]: Copying certs to the remote machine...              driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:31 ip-172-31-24-13 gitlab-runner[155844]: Setting Docker configuration on the remote daemon...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:32 ip-172-31-24-13 gitlab-runner[155844]: Checking connection to Docker...                    driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:33 ip-172-31-24-13 gitlab-runner[155844]: Docker is up and running!                           driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:33 ip-172-31-24-13 gitlab-runner[155844]: To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a operation=create
Apr 06 13:36:33 ip-172-31-24-13 gitlab-runner[155844]: Machine created                                     duration=1m22.376991683s name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a now=2022-04-06 13:36:33.185950721 +0000 UTC m=+1109714.743182572 retries=0
Apr 06 13:39:33 ip-172-31-24-13 gitlab-runner[155844]: Starting docker-machine build...                    created=2022-04-06 07:39:37.533117067 +0000 UTC m=+1088299.090348926 docker=tcp://35.180.75.59:2376 job=2299342280 name=runner-k9ee2cag-gitlab-docker-machine-1649230777-c10facd5 now=2022-04-06 13:39:33.033567365 +0000 UTC m=+1109894.590799240 project=27672694 runner=k9ee2CAg usedcount=17
Apr 06 13:53:17 ip-172-31-24-13 gitlab-runner[155844]: Checking for jobs... received                       job=2299447035 repo_url=https://gitlab.com/ads-development/izanami-proxy.git runner=k9ee2CAg
Apr 06 13:53:18 ip-172-31-24-13 gitlab-runner[155844]: Using existing docker-machine                       created=2022-04-06 13:35:10.808885536 +0000 UTC m=+1109632.366117387 docker=tcp://13.36.172.133:2376 job=2299447035 name=runner-k9ee2cag-gitlab-docker-machine-1649252110-5691836a now=2022-04-06 13:53:18.579111892 +0000 UTC m=+1110720.136343742 project=24129043 runner=k9ee2CAg usedcount=1
Apr 06 13:53:20 ip-172-31-24-13 gitlab-runner[155844]: Running pre-create checks...                        driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create
Apr 06 13:53:20 ip-172-31-24-13 gitlab-runner[155844]: Creating machine...                                 driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create
Apr 06 13:53:20 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2) Launching instance...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create
Apr 06 13:53:22 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2) Waiting for spot instance...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create
Apr 06 13:53:37 ip-172-31-24-13 gitlab-runner[155844]: (runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2) Created spot instance request sir-t6vp8pzm  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create
Apr 06 13:53:39 ip-172-31-24-13 gitlab-runner[155844]: Waiting for machine to be running, this may take a few minutes...  driver=amazonec2 name=runner-k9ee2cag-gitlab-docker-machine-1649253200-fcbdf5d2 operation=create

The only notable thing is a 3-minute pause, in both the job log and the gitlab-runner log, between 13:37 and 13:39.
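To see whether that pause is the only silence in the runner journal, the gaps between consecutive timestamps can be listed mechanically. A minimal Python sketch, assuming the lines use the journalctl short format shown above and all fall in the same year (the prefix carries no year, so 2022 is assumed); find_gaps and SAMPLE are hypothetical names, and the sample lines are truncated copies of three lines from the log above.

```python
import re
from datetime import datetime, timedelta

# Matches prefixes such as
# "Apr 06 13:36:33 ip-172-31-24-13 gitlab-runner[155844]: Machine created ..."
PREFIX = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ gitlab-runner\[\d+\]: (.*)$")


def find_gaps(lines, threshold=timedelta(minutes=2), year=2022):
    """Yield (start, end, gap, message_after_gap) for silences >= threshold."""
    prev = None
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # skip lines that are not gitlab-runner journal entries
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if prev is not None and ts - prev >= threshold:
            # Keep only the message text, dropping the padded key=value fields.
            message = m.group(2).split("  ")[0].strip()
            yield prev, ts, ts - prev, message
        prev = ts


SAMPLE = [
    "Apr 06 13:36:33 ip-172-31-24-13 gitlab-runner[155844]: Machine created                                     duration=1m22.376991683s",
    "Apr 06 13:39:33 ip-172-31-24-13 gitlab-runner[155844]: Starting docker-machine build...                    job=2299342280",
    "Apr 06 13:53:17 ip-172-31-24-13 gitlab-runner[155844]: Checking for jobs... received                       job=2299447035",
]

if __name__ == "__main__":
    for start, end, gap, msg in find_gaps(SAMPLE):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S} ({gap}) then: {msg}")
```

On the full journal (for example exported with journalctl -u gitlab-runner, assuming the systemd unit is named gitlab-runner), the same function lists every silence longer than the threshold, which makes it easier to line them up with the job log timestamps.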

My guess is that the spot instance is terminated during the cache download, which is why the job gets stuck there. But it may be something else. Does anyone have an idea of what's happening?
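One way to test the spot-termination hypothesis is to watch for an interruption notice on the machine itself. AWS exposes it through the instance metadata service at /latest/meta-data/spot/instance-action, which returns HTTP 404 while nothing is scheduled and a small JSON document (action and time) roughly two minutes before the instance is interrupted. A minimal Python sketch, assuming IMDSv2 is reachable and the script runs on the spot instance (e.g. via docker-machine ssh), not on the runner manager; the function names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl: int = 60) -> str:
    """Fetch a short-lived IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def parse_instance_action(body):
    """Return (action, time) from an instance-action JSON body, or None if empty."""
    if not body:
        return None
    doc = json.loads(body)
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    when = datetime.fromisoformat(doc["time"].replace("Z", "+00:00"))
    return doc["action"], when


def pending_interruption():
    """Return (action, time) if AWS has scheduled an interruption, else None."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404: no interruption is currently scheduled
            return None
        raise


def watch(poll_seconds: float = 5.0):
    """Poll until an interruption notice appears, then return it."""
    while (notice := pending_interruption()) is None:
        time.sleep(poll_seconds)
    return notice
```

The notice only exists while the instance is still alive. After the fact, the status of the spot request (e.g. sir-9e86a6bn from the log above) in the EC2 console or via the DescribeSpotInstanceRequests API is the more practical place to look for an interruption.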

Thank you very much!

amazon-web-services

gitlab-ci-runner

spot-instances
