Docker containers for Windows builds

TL;DR

Several problems make me very pessimistic about actually being able to use a Windows Docker image on GitLab.com in the near future, and I may soon give up, at least temporarily. Hence this write-up.

Introduction

This is perhaps more of a “note to self” (hence the somewhat unstructured contents), but I feel that I have to write up the current status of my attempts to create and use a Windows Docker image containing the Graphviz build dependencies, because at the moment I’m very pessimistic about being able to accomplish this in the near future. There are several reasons for my pessimism, and I describe them below in case someone else (including my future self) wants to have a stab at it or has useful insights that will allow us to get further.

Current status

First I tried to create a Docker image containing all the dependencies, but ran into problems 1, 2, and 3 below. I then tried to create a bare-minimum image containing only what is needed to even start a CMake compilation, but ran into problem 1 again and also found problem 4 below. I guess all problems except problem 1 can be solved by someone with enough knowledge and/or time, but problem 1 makes that rather fruitless since it seems to be a showstopper.

The Dockerfiles under ci/windows-x86 and ci/windows-x64 in https://gitlab.com/magjac/graphviz/-/commits/gitlab-ci-windows-docker are these minimum images. This pipeline shows the (non) result.

EDIT: In this job, the whole image was pulled and the container had only just started before the timeout.

Problem 1: Not possible to use large Windows Docker images on GitLab.com

TL;DR

The resulting Windows Docker images are so large that pulling them takes longer than the time currently allowed for a single job on GitLab.com (1 h). I filed this issue about it.

Detailed causes

Cause 1A: No caching of Docker images

TL;DR

Images are not cached and must be pulled for each job.

Details

From https://docs.gitlab.com/ee/user/gitlab_com/#limitations-and-known-issues:

“The Windows Shared Runner virtual machine instances do not use the GitLab Docker executor. This means that you will not be able to specify image or services in your pipeline configuration.”

This means that in order to use Windows Docker images, you need to explicitly use docker run .... This is in itself not a problem, but it leads to other problems detailed below.

(I’m assuming here that since our Linux images are pulled within a minute there’s some caching involved when using the images keyword, but maybe there’s some other mechanism that makes it fast?)

Cause 1B: The size of the Docker image

TL;DR

We must start from the .NET Framework SDK base image, which is 10.2 GB. With VCTools & vcpkg it’s at 15+ GB before adding the Graphviz build dependencies.

Details

From https://docs.microsoft.com/en-us/visualstudio/install/build-tools-container?view=vs-2019#create-and-build-the-dockerfile:

# Use the latest Windows Server Core image with .NET Framework 4.8.
FROM mcr.microsoft.com/dotnet/framework/sdk:4.8-windowsservercore-ltsc2019

Warning: If you base your image directly on microsoft/windowsservercore, the .NET Framework might not install properly and no install error is indicated.

I have tried the much smaller microsoft/windowsservercore-ltsc2019 image (only 5 GB and already preloaded in Docker) and can confirm that it indeed doesn’t work. Everything appears to go fine, but you end up without the C compiler.

The .NET Framework SDK image is 10.2 GB by itself, and once you’ve installed the Visual Studio build tools in it, it’s 14.6 GB. Add git and vcpkg and you’re at 15.6 GB before you even start to install the Graphviz dependencies.

~$ docker images
REPOSITORY                               TAG                                      IMAGE ID            CREATED             SIZE
graphviz                                 windows-visualstudio-vctools-git-vcpkg   a2f2e9730d0c        42 minutes ago      15.6GB
graphviz                                 windows-visualstudio-vctools             c2b4a9f3bfd5        43 minutes ago      14.6GB
mcr.microsoft.com/dotnet/framework/sdk   4.8-windowsservercore-ltsc2019           be6035551084        2 days ago          10.2GB
mcr.microsoft.com/windows/servercore     ltsc2019                                 987b1d5e0abf        9 days ago          4.99GB
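To see where the size comes from, the listing above can be turned into per-step increments. This is only a rough sketch: it assumes that each image is built on top of the previous one in the list below, and it uses the sizes reported by docker images, not the compressed sizes that are actually transferred on a pull. The shortened image names and the helper function are mine, for illustration only.

```python
# Size added by each step, derived from the `docker images` listing.
# Assumption: each image is built on top of the one listed before it here.
sizes_gb = [
    ("windows/servercore:ltsc2019", 4.99),
    ("dotnet/framework/sdk:4.8-windowsservercore-ltsc2019", 10.2),
    ("graphviz:windows-visualstudio-vctools", 14.6),
    ("graphviz:windows-visualstudio-vctools-git-vcpkg", 15.6),
]


def added_sizes(images):
    """Return (name, total, added) tuples; `added` is the growth over the previous image."""
    rows = []
    previous = 0.0
    for name, total in images:
        rows.append((name, total, round(total - previous, 2)))
        previous = total
    return rows


for name, total, added in added_sizes(sizes_gb):
    print(f"{name:<52} {total:>5.2f} GB total  (+{added:.2f} GB)")
```

Under that assumption, most of the size comes from the .NET Framework SDK layer and the Visual Studio build tools. Git and vcpkg add comparatively little.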

Cause 1C: The time it takes to pull the image

TL;DR

It takes around one hour to pull a 15 GB image.

Details

I’ve tried pulling both from hub.docker.com and from registry.gitlab.com. The speed is similar. If anything, hub.docker.com was slightly faster.

Currently there is a soft (10GB) size restriction for Registry on GitLab.com. I don’t know exactly what that means, but it’s possible that the pull bandwidth is throttled when you get above 10GB. I haven’t seen any error messages saying that the image is too large, just timeouts when it fails to pull within one hour.

Some data is collected in this pipeline.

Cause 1D: The current GitLab.com Windows shared runner timeout

TL;DR

The timeout is currently one hour.

Details

Because the image is not cached by GitLab, it needs to be pulled every time a job starts, and this takes longer than the MaximumTimeout = 3600 (seconds) that is the hard limit on the current shared Windows runners.
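A back-of-the-envelope calculation makes the squeeze concrete. This is a sketch under stated assumptions, not a measurement: it takes the 15.6 GB image size and the roughly one-hour pull time from above, treats the pull rate as constant, and uses on-disk sizes, so the "effective rate" is not the real network bandwidth. The build durations are hypothetical.

```python
# Rough pull-time budget for the shared Windows runner (a sketch, not a measurement).
# Sizes are on-disk sizes from `docker images`; the bytes actually transferred are
# compressed, so the "effective rate" below is not the network bandwidth.

IMAGE_SIZE_GB = 15.6      # graphviz:windows-visualstudio-vctools-git-vcpkg
OBSERVED_PULL_S = 3600    # "around one hour"
JOB_TIMEOUT_S = 3600      # MaximumTimeout on the shared Windows runners


def effective_rate_mb_per_s(size_gb: float, seconds: float) -> float:
    """On-disk megabytes made available per second of pulling."""
    return size_gb * 1000 / seconds


def max_image_size_gb(rate_mb_per_s: float, timeout_s: float, build_s: float) -> float:
    """Largest image that can be pulled while leaving build_s seconds for the build."""
    pull_budget_s = max(timeout_s - build_s, 0)
    return rate_mb_per_s * pull_budget_s / 1000


rate = effective_rate_mb_per_s(IMAGE_SIZE_GB, OBSERVED_PULL_S)
print(f"effective pull rate: {rate:.2f} MB/s")

# Hypothetical build durations, purely for illustration.
for build_minutes in (0, 15, 30):
    limit = max_image_size_gb(rate, JOB_TIMEOUT_S, build_minutes * 60)
    print(f"{build_minutes:2d} min build -> image must stay under ~{limit:.1f} GB")
```

At that rate, a job that needs 30 minutes for the actual build would only have room for an image of about 7.8 GB. That is smaller than the .NET Framework SDK base image alone.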

Problem 2: Installation of GTK hangs

Installing GTK with vcpkg never finishes. I’ve tracked this down to its dependency libepoxy and filed this issue.

Problem 3: Pixman doesn’t (any longer?) exist as a dynamic library (.dll)

The CMake build expects to find pixman-1.dll, but vcpkg installs only a pixman-1.lib.

I expect this to be solvable, either by switching to the static lib or by somehow installing a dynamic lib (perhaps through pacman), but I currently know too little about CMake and linking on Windows for this to be a quick fix. Given the problems above, I don’t even know if it’s worth the effort.

Problem 4: ‘aaglval’: undeclared identifier

When building in the minimum Docker images mentioned above, the following errors occur:

scan.l(79,9): error C2065: 'aaglval': undeclared identifier [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(79,10): error C2224: left of '.str' must have struct/union type [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(84,9): error C2065: 'aaglval': undeclared identifier [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(84,10): error C2224: left of '.str' must have struct/union type [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(214,10): error C2065: 'aaglval': undeclared identifier [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(214,11): error C2224: left of '.str' must have struct/union type [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(215,44): error C2065: 'aaglval': undeclared identifier [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]
scan.l(215,45): error C2224: left of '.str' must have struct/union type [C:\graphviz\build\lib\cgraph\cgraph.vcxproj]

I expect this also to be solvable, but not worth the effort at the moment.

Yet unknown problems

There are a lot of dependencies that I haven’t even tried to install yet and I expect that some of them will cause problems.

What to do?

Alternative 1: Pursue Docker images

Work around or solve the problems above. Perhaps await a better Shared Windows Runner?

Alternative 2: Continue to use Git submodules

Install all dependencies in a Git repo à la Erwin Janssen’s graphviz-windows-dependencies

Alternative 3: Move to another CI/CD cloud provider

I’ll note another alternative: we could give up on GitLab Runners for this and use another cloud provider? We’d need to get some credits, but perhaps we can get some open source project credits.

Yes, we could do that. Not all of them have good integration with GitLab though. Moving the whole project back to GitHub would make life easier.

I’ve used Travis and CircleCI with GitHub, but CircleCI does not support GitLab.

I’ve added this as a third alternative to the OP.