GitLab runtimes

Many of the MinGW autotools CI build jobs that I’m currently working on take more than the allowed 1 hour to build and thus fail with: ERROR: Job failed: execution took longer than 1h0m0s seconds.

I’ve measured the build time in a job that does finish before the deadline, and the build itself (excluding installation) takes 45 minutes. The same process takes around 15 minutes on my local Windows 10 laptop with this CPU spec: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz 2.69 GHz.

Is it reasonable that the GitLab servers are that slow in comparison?

All the jobs can be found in this pipeline.

It’s possible. These are backed by GCP, right? My experience with other GCP-backed CI providers is that there is a massive difference depending on what machine tier the CI provider is using. Empirically, for the same task:

  • Travis CI: regularly hit a 50-minute timeout. I had to shard the task into 5 separate jobs, each of which still occasionally hit the 50-minute timeout.
  • Cirrus CI: the same unsharded task finishes in <25 minutes.

Yeah, I’m also not very surprised to hear that the cloud VMs we get for free might be underpowered, particularly in how much CPU they can access.


On an internal GitLab server we were able to alter the default timeout; I’m not sure if you can on the public version: Configuring runners | GitLab
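
For what it’s worth, recent GitLab versions also accept a per-job timeout keyword in .gitlab-ci.yml, though the runner’s own limit still caps it. A minimal sketch, with an illustrative job name:

# Hypothetical .gitlab-ci.yml fragment: raise the per-job timeout.
# The runner-level limit still wins if it is lower.
build-mingw:
  timeout: 2h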

There seems to be a hard limit of 3600 seconds on the Windows cloud runners:

I was going to suggest my favorite solution for accelerating any system: do less. We could avoid building Lefty on MinGW. Though, looking at ci/build.sh, it looks like we already don’t build Lefty.

Do you know how many cores these runners have? We could run make -j …. It will make the build output a bit unreadable, but I don’t think we’re in the phase of paying down MinGW compiler warnings yet, so maybe this is bearable.
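
A minimal sketch of what I mean, assuming the job runs in the MSYS2 shell where coreutils’ nproc is available (the job name is illustrative):

# Hypothetical .gitlab-ci.yml fragment: give make one job slot per
# CPU the runner exposes instead of building serially.
build-mingw:
  script:
    - ./configure
    - make -j"$(nproc)"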

No, and I’m not sure that’s how they work. We might be sharing their capacity with other jobs.

I’ve tried that now. It might have given a 5% improvement. The fastest of the problematic jobs, which took very close to 60 minutes without -j and sometimes succeeded and sometimes failed, now took 57 minutes, but I did a rebase on main in between so the comparison might not be significant. The other three jobs still failed, though.

57-min job:

All jobs:

What I’m hearing is that you’d like to join me in the dead code removal quest :wink:

Your hearing is extraordinary :grin:

This problem seems to be MinGW-specific:

Some of the linked answers seem to indicate that Cygwin has a better approach. Possibly that’s why we don’t see this problem with Cygwin in CI.

FWIW, here is /proc/cpuinfo from the GitLab runners:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 17
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht constant_tsc rep_good nopl xtopology cpuid pni pclmuldq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 17
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht constant_tsc rep_good nopl xtopology cpuid pni pclmuldq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

My thoughts on what we could try:

  • -j2. May not help; I’ve come across other CI environments where the number of virtual CPUs is basically a lie and there aren’t enough physical CPUs backing them to give any meaningful parallelism. The dump above suggests the same here: 2 logical processors (siblings: 2) sharing a single physical core (cpu cores: 1).
  • persistent pacman cache
  • persistent ccache cache

I suspect the last one may be the most valuable thing to go after; a rough sketch of how it could be wired up follows below.
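
Something like this, assuming an MSYS2-based job; the job name, paths, and package names are illustrative, not taken from our actual configuration:

# Hypothetical .gitlab-ci.yml sketch: persist the pacman package cache and
# the ccache compiler cache in the project directory so GitLab's cache:
# mechanism can carry them between pipelines.
build-mingw:
  variables:
    CCACHE_DIR: "$CI_PROJECT_DIR/.ccache"
  cache:
    key: "$CI_JOB_NAME"
    paths:
      - .ccache/
      - .pacman-cache/
  script:
    # --cachedir points pacman's downloads at a directory we can persist
    - pacman --noconfirm -S --needed --cachedir "$CI_PROJECT_DIR/.pacman-cache" base-devel mingw-w64-x86_64-toolchain
    # route the compilers through ccache; assumes a ccache package
    # (e.g. mingw-w64-x86_64-ccache) is installed
    - export CC="ccache gcc" CXX="ccache g++"
    - ./configure
    - make -j"$(nproc)"

Note that a ccache only helps once a first pipeline has populated it, so the first run after such a change would still be slow.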
