Seemingly at random, some Docker containers on a client’s Linux machine wouldn’t start - they’d fail with this error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown.

It didn’t matter which container it was, nor where or by whom it was started. Sometimes it happened with GitLab CI jobs, sometimes with manually run containers. It also didn’t happen every time: sometimes it just worked, other times it didn’t work for hours and I’d end up ignoring it.
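
Even a throwaway test container would trigger it on a bad day. Something like the following (hello-world here is just a stand-in for any image) would fail with exactly the message above:

root@h2939459:~# docker run --rm hello-world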

The last time it happened I googled the error again and, once more, ended up at this huge open issue on Docker’s GitHub issue tracker. This time I read every response in the thread and came upon this answer:

Okay, just found another interesting post on another forum. If you are running a VPS which is virtualized with Virtuozzo, your hosting provider may have locked your tasks…

I’m using Strato and it seems that they have limited my server. Under /proc/user_beancounters you can find those settings. The numprocs is set to 700 and my actual held is 661. Starting a bigger docker stack seems to be impossible…

You can find more in this post https://serverfault.com/questions/1017994/docker-compose-oci-runtime-create-failed-pthread-create-failed/1018402

It seems there is no bug…

This sounded like my kind of problem, as my client is using a Strato server, too!

Now, my native language isn’t English, but I remembered the term “beancounter” from the BOFH, and I really do hope that this feature somehow originates from there (though I know the term itself is older than the BOFH).

Looking at the server’s beancounters gave me the following output:

root@h2939459:~# cat /proc/user_beancounters 
Version: 2.5
    uid  resource                     held              maxheld              barrier                limit              failcnt
2939459: kmemsize                353345536           1649811456  9223372036854775807  9223372036854775807                    0
            lockedpages                     0                   32  9223372036854775807  9223372036854775807                    0
            privvmpages               5861268              6833651  9223372036854775807  9223372036854775807                    0
            shmpages                  2222328              2255102  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numproc                       541                  541                 1100                 1100                    0
            physpages                 1461605              8319910              8388608              8388608                    0
            vmguarpages                     0                    0  9223372036854775807  9223372036854775807                    0
            oomguarpages              1525697              8388608                    0                    0                    0
            numtcpsock                      0                    0  9223372036854775807  9223372036854775807                    0
            numflock                        0                    0  9223372036854775807  9223372036854775807                    0
            numpty                          1                    2  9223372036854775807  9223372036854775807                    0
            numsiginfo                     16                  147  9223372036854775807  9223372036854775807                    0
            tcpsndbuf                       0                    0  9223372036854775807  9223372036854775807                    0
            tcprcvbuf                       0                    0  9223372036854775807  9223372036854775807                    0
            othersockbuf                    0                    0  9223372036854775807  9223372036854775807                    0
            dgramrcvbuf                     0                    0  9223372036854775807  9223372036854775807                    0
            numothersock                    0                    0  9223372036854775807  9223372036854775807                    0
            dcachesize              264458240           1497632768  9223372036854775807  9223372036854775807                    0
            numfile                      4284                 7930  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0  9223372036854775807  9223372036854775807                    0
            numiptent                     1997                  2000                2000                 2000                    74

Take a look at the last line:

uid  resource  held maxheld barrier limit failcnt
     numiptent 1997 2000    2000    2000  74

This shows that the resource “numiptent” currently holds 1997 “units”, out of a maximum of 2000 that can be held in total. “failcnt” is the number of refused allocations. When I ran another container, this count increased, so this had to be the problem!
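
An easy way to confirm this is to watch that one line while starting a container. A rough sketch (plain grep on the proc file, nothing Virtuozzo-specific needed); if the failcnt column has gone up between the two greps, the kernel refused an allocation in between:

root@h2939459:~# grep numiptent /proc/user_beancounters
root@h2939459:~# docker run --rm hello-world
root@h2939459:~# grep numiptent /proc/user_beancounters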

A quick search revealed that “numiptent” is the number of NETFILTER (IP packet filtering) entries, or in simpler terms: the number of iptables rules.
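
That number can be roughly cross-checked against the actual rule count on the host, for example with iptables-save (the count won’t map one-to-one onto the beancounter, since numiptent covers netfilter entries in general, but the order of magnitude should match):

root@h2939459:~# iptables-save | grep -c '^-A'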

I immediately knew the reason for this: the client is using fail2ban, which blocks IP addresses that try to brute-force SSH logins on the server. Looking at the fail2ban overview, I noticed that 1900 IP addresses were blocked, conspicuously close to the beancounter limit of 2000!
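
fail2ban itself can report how many addresses it is currently banning. Assuming the usual jail name sshd (it may be named differently depending on the setup):

root@h2939459:~# fail2ban-client status sshd

Among other things this prints a “Currently banned” count, which is where the roughly 1900 addresses mentioned above show up.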

Restarting fail2ban threw out all of these IP addresses, which brought the “numiptent” beancounter back down.
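
In concrete terms that boils down to a service restart followed by another look at the counter (assuming a systemd-based setup):

root@h2939459:~# systemctl restart fail2ban
root@h2939459:~# grep numiptent /proc/user_beancounters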

Reason found! Now, why Strato decided to limit the number of iptables entries and what to do about it (apart from disabling fail2ban), I don’t know yet.


