GitLab runtimes

Many of the MinGW autotools CI build jobs that I’m currently working on take more than the allowed 1 hour to build and thus fail with: ERROR: Job failed: execution took longer than 1h0m0s seconds.

I’ve measured the build time in a job that does finish before the deadline, and the build itself (excluding installation) takes 45 minutes. The same process takes around 15 minutes on my local Windows 10 laptop with this CPU spec: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz 2.69 GHz.

Is it reasonable that the GitLab servers are that slow in comparison?

All the jobs can be found in this pipeline.

It’s possible. These are backed by GCP, right? My experience with other GCP-backed CI providers is that there is a massive difference depending on what machine tier the CI provider is using. Empirically, for the same task:

  • Travis CI: regularly hit a 50-minute timeout. I had to shard the task into 5 separate jobs, each of which still occasionally hit the 50-minute timeout.
  • Cirrus CI: the same unsharded task finishes in <25 minutes.

Yeah, I’m also not very surprised to hear that the cloud VMs we get for free might be underpowered, particularly in how much CPU they can access.


On an internal GitLab server we were able to alter the default timeout; I’m not sure if you can on the public version: Configuring runners | GitLab
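
For what it’s worth, recent GitLab versions also accept a per-job timeout keyword in .gitlab-ci.yml, though the runner’s own limit still caps it. A minimal sketch, with an illustrative job name:

# Hypothetical .gitlab-ci.yml fragment: raise the per-job timeout.
# The runner-level limit still wins if it is lower.
build-mingw:
  timeout: 2h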

There seems to be a hard limit of 3600 seconds on the Windows cloud runners:

I was going to suggest my favorite solution for accelerating any system: do less. We could avoid building Lefty on MinGW. Though, looking at ci/build.sh, it looks like we already don’t build Lefty.

Do you know how many cores these runners have? We could run make -j …. It will make the build output a bit unreadable, but I don’t think we’re in the phase of paying down MinGW compiler warnings yet, so maybe this is bearable.
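
A minimal sketch of what I mean, assuming the job runs in the MSYS2 shell where coreutils’ nproc is available (the job name is illustrative):

# Hypothetical .gitlab-ci.yml fragment: give make one job slot per
# CPU the runner exposes instead of building serially.
build-mingw:
  script:
    - ./configure
    - make -j"$(nproc)"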

No, and I’m not sure that’s how they work. We might be sharing their capacity with other jobs.

I’ve tried that now. It might have given a 5% improvement. The fastest of the problematic jobs, which took very close to 60 minutes without -j and sometimes succeeded and sometimes failed, now took 57 minutes, but I did a rebase on main in between so the comparison might not be significant. The other three jobs still failed, though.

57-min job:

All jobs:

What I’m hearing is that you’d like to join me in the dead code removal quest :wink:

Your hearing is extraordinary :grin:

This problem seems to be MinGW-specific:

Some of the linked answers seem to indicate that Cygwin has a better approach. Possibly that’s why we don’t see this problem with Cygwin in CI.

FWIW, here is /proc/cpuinfo from the GitLab runners:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 17
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht constant_tsc rep_good nopl xtopology cpuid pni pclmuldq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 17
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht constant_tsc rep_good nopl xtopology cpuid pni pclmuldq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

My thoughts on what we could try:

  • -j2. May not help; I’ve come across other CI environments where the number of virtual CPUs is basically a lie and there aren’t enough physical CPUs backing them to give any meaningful parallelism. The dump above suggests the same here: 2 logical processors (siblings: 2) sharing a single physical core (cpu cores: 1).
  • persistent pacman cache
  • persistent ccache cache

I suspect the last one may be the most valuable thing to go after; a rough sketch of how it could be wired up follows below.
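
Something like this, assuming an MSYS2-based job; the job name, paths, and package names are illustrative, not taken from our actual configuration:

# Hypothetical .gitlab-ci.yml sketch: persist the pacman package cache and
# the ccache compiler cache in the project directory so GitLab's cache:
# mechanism can carry them between pipelines.
build-mingw:
  variables:
    CCACHE_DIR: "$CI_PROJECT_DIR/.ccache"
  cache:
    key: "$CI_JOB_NAME"
    paths:
      - .ccache/
      - .pacman-cache/
  script:
    # --cachedir points pacman's downloads at a directory we can persist
    - pacman --noconfirm -S --needed --cachedir "$CI_PROJECT_DIR/.pacman-cache" base-devel mingw-w64-x86_64-toolchain
    # route the compilers through ccache; assumes a ccache package
    # (e.g. mingw-w64-x86_64-ccache) is installed
    - export CC="ccache gcc" CXX="ccache g++"
    - ./configure
    - make -j"$(nproc)"

Note that a ccache only helps once a first pipeline has populated it, so the first run after such a change would still be slow.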
