Unable to upgrade the Kubernetes version on the nodes

I'm unable to successfully upgrade my Kubernetes cluster using KubeOne. I get the following error:

ubuntu@ip-10-50-1-39:~/kubeone/examples/terraform/kube$ kubeone apply --manifest kubeone.yaml -t tf.json --upgrade-machine-deployments
INFO[14:14:02 UTC] Determine hostname…
INFO[14:14:02 UTC] Determine operating system…
INFO[14:14:02 UTC] Running host probes…
INFO[14:14:02 UTC] Electing cluster leader…
INFO[14:14:02 UTC] Elected leader "ip-10-50-1-139.ec2.internal"…
INFO[14:14:03 UTC] Building Kubernetes clientset…
INFO[14:14:03 UTC] Running cluster probes…
The following actions will be taken:
Run with --verbose flag for more information.
~ upgrade control plane node "ip-10-50-1-139.ec2.internal" (10.50.1.139): 1.18.2 -> 1.18.3
~ upgrade control plane node "ip-10-50-1-134.ec2.internal" (10.50.1.134): 1.18.2 -> 1.18.3
~ upgrade control plane node "ip-10-50-1-77.ec2.internal" (10.50.1.77): 1.18.2 -> 1.18.3
~ ensure nodelocaldns
~ ensure CNI
~ ensure credential
~ ensure machine-controller
~ upgrade MachineDeployments
Do you want to proceed (yes/no): yes
INFO[14:14:15 UTC] Determine hostname…
INFO[14:14:15 UTC] Determine operating system…
INFO[14:14:15 UTC] Generating kubeadm config file…
INFO[14:14:15 UTC] Uploading config files… node=10.50.1.77
INFO[14:14:15 UTC] Uploading config files… node=10.50.1.139
INFO[14:14:15 UTC] Uploading config files… node=10.50.1.134
INFO[14:14:15 UTC] Building Kubernetes clientset…
INFO[14:14:15 UTC] Running preflight checks…
INFO[14:14:15 UTC] Verifying that Docker, Kubelet and Kubeadm are installed…
INFO[14:14:15 UTC] Verifying that nodes in the cluster match nodes defined in the manifest…
INFO[14:14:15 UTC] Verifying that all nodes in the cluster are ready…
INFO[14:14:15 UTC] Verifying that there is no upgrade in the progress…
INFO[14:14:15 UTC] Verifying is it possible to upgrade to the desired version…
INFO[14:14:15 UTC] Labeling leader control plane… node=10.50.1.139
INFO[14:14:15 UTC] Draining leader control plane… node=10.50.1.139
INFO[14:14:49 UTC] Upgrading kubeadm binary on the leader control plane… node=10.50.1.139
INFO[14:14:50 UTC] Running 'kubeadm upgrade' on leader control plane node… node=10.50.1.139
WARN[14:24:56 UTC] Task failed, error was: failed to run 'kubeadm upgrade' on leader control plane: export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
sudo kubeadm upgrade apply -y --certificate-renewal=true 1.18.3 --config=./kubeone/cfg/master_0.yaml
W0825 14:14:51.025123 4192 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0825 14:14:51.040512 4192 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!
W0825 14:14:51.042473 4192 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
: Process exited with status 1
WARN[14:25:01 UTC] Retrying task…
INFO[14:25:01 UTC] Labeling leader control plane… node=10.50.1.139
INFO[14:25:01 UTC] Draining leader control plane… node=10.50.1.139
INFO[14:25:02 UTC] Upgrading kubeadm binary on the leader control plane… node=10.50.1.139
INFO[14:25:04 UTC] Running 'kubeadm upgrade' on leader control plane node… node=10.50.1.139
WARN[14:25:07 UTC] Task failed, error was: failed to run 'kubeadm upgrade' on leader control plane: export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
sudo kubeadm upgrade apply -y --certificate-renewal=true 1.18.3 --config=./kubeone/cfg/master_0.yaml
W0825 14:25:04.486319 5332 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0825 14:25:04.530437 5332 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!
W0825 14:25:04.532465 5332 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
[ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [ip-10-50-1-139.ec2.internal]
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
: Process exited with status 1
WARN[14:25:17 UTC] Retrying task…
INFO[14:25:17 UTC] Labeling leader control plane… node=10.50.1.139
INFO[14:25:17 UTC] Draining leader control plane… node=10.50.1.139
INFO[14:25:17 UTC] Upgrading kubeadm binary on the leader control plane… node=10.50.1.139
INFO[14:25:19 UTC] Running 'kubeadm upgrade' on leader control plane node… node=10.50.1.139
WARN[14:25:21 UTC] Task failed, error was: failed to run 'kubeadm upgrade' on leader control plane: export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
sudo kubeadm upgrade apply -y --certificate-renewal=true 1.18.3 --config=./kubeone/cfg/master_0.yaml
W0825 14:25:19.519746 5391 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0825 14:25:19.535200 5391 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!
W0825 14:25:19.537205 5391 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
[ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [ip-10-50-1-139.ec2.internal]
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
: Process exited with status 1

The cluster nodes are all running Flatcar Linux.
Also, is there a way to add user data to the worker nodes?
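
For anyone else hitting this: the preflight error above means the drained leader never went back to Ready, so the first thing worth checking is the node status and the kubelet on that node, roughly like this (standard kubectl/systemd commands; the node name is taken from the log above):

# From a machine with cluster access: which control plane node is NotReady?
kubectl get nodes -o wide

# On the affected node (over SSH): is the kubelet actually running, and if not, why?
systemctl status kubelet
journalctl -u kubelet --since "30 min ago" --no-pager | tail -n 50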

I have some updates on this issue.

I was able to upgrade the cluster from version 1.18.3 to 1.18.8. However, it was not a clean upgrade, as I had to perform the following steps.

First of all, my cluster is running Flatcar Linux 2512.3.0 on AWS EC2 instances.

I had to set the hostname of each node in the cluster to its internal FQDN (for example, ip-10-50-1-60.ec2.internal).
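
Concretely, the per-node change boils down to something like this (assuming the instance metadata service is reachable the plain IMDSv1 way; how this gets persisted or automated is up to you):

# Set the node's hostname to the internal FQDN that AWS reports for it,
# e.g. ip-10-50-1-60.ec2.internal
sudo hostnamectl set-hostname "$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"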

Also, I noticed that the upgrade freezes while the process is waiting for the kubelet to restart the Kubernetes components, as shown below:

INFO[15:31:53 UTC] Upgrading Kubernetes binaries on follower control plane…  node=10.50.1.31
[10.50.1.31] + export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
[10.50.1.31] + PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
[10.50.1.31] + HOST_ARCH=
[10.50.1.31] + case $(uname -m) in
[10.50.1.31] ++ uname -m
[10.50.1.31] + HOST_ARCH=amd64
[10.50.1.31] + source /etc/kubeone/proxy-env
[10.50.1.31] + sudo mkdir -p /opt/cni/bin
[10.50.1.31] + curl -L https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
[10.50.1.31] + sudo tar -C /opt/cni/bin -xz
[10.50.1.31]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[10.50.1.31]                                  Dload  Upload   Total   Spent    Left  Speed
100   658  100   658    0     0  12185      0 --:--:-- --:--:-- --:--:-- 12415
100 35.1M  100 35.1M    0     0  26.3M      0  0:00:01  0:00:01 --:--:-- 31.4M
[10.50.1.31] + RELEASE=v1.18.8
[10.50.1.31] + sudo mkdir -p /var/tmp/kube-binaries
[10.50.1.31] + cd /var/tmp/kube-binaries
[10.50.1.31] + sudo curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/v1.18.8/bin/linux/amd64/kubeadm
[10.50.1.31]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[10.50.1.31]                                  Dload  Upload   Total   Spent    Left  Speed
100 37.9M  100 37.9M    0     0  71.6M      0 --:--:-- --:--:-- --:--:-- 71.7M
[10.50.1.31] + sudo mkdir -p /opt/bin
[10.50.1.31] + cd /opt/bin
[10.50.1.31] + sudo systemctl stop kubelet
[10.50.1.31] + sudo mv /var/tmp/kube-binaries/kubeadm .
[10.50.1.31] + sudo chmod +x kubeadm
INFO[15:31:56 UTC] Running 'kubeadm upgrade' on the follower control plane node…  node=10.50.1.31
[10.50.1.31] + export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
[10.50.1.31] + PATH=/usr/bin:/bin:/usr/sbin:/sbin:/sbin:/usr/local/bin:/opt/bin
[10.50.1.31] + sudo kubeadm upgrade node --certificate-renewal=true
[10.50.1.31] [upgrade] Reading configuration from the cluster...
[10.50.1.31] [upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[10.50.1.31] [upgrade] Upgrading your Static Pod-hosted control plane instance to version "v1.18.8"...
[10.50.1.31] Static pod: kube-apiserver-ip-10-50-1-31.ec2.internal hash: d47cf9f09213846a8a4166d21ab6ec9f
[10.50.1.31] Static pod: kube-controller-manager-ip-10-50-1-31.ec2.internal hash: 308a8dd207043bc03d15f609228945a8
[10.50.1.31] Static pod: kube-scheduler-ip-10-50-1-31.ec2.internal hash: a8caea92c80c24c844216eb1d68fe417
[10.50.1.31] [upgrade/etcd] Upgrading to TLS for etcd
[10.50.1.31] [upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.8" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[10.50.1.31] [upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests079843721"
[10.50.1.31] W0901 15:31:57.731151   29670 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[10.50.1.31] [upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[10.50.1.31] [upgrade/staticpods] Renewing apiserver certificate
[10.50.1.31] [upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[10.50.1.31] [upgrade/staticpods] Renewing front-proxy-client certificate
[10.50.1.31] [upgrade/staticpods] Renewing apiserver-etcd-client certificate
[10.50.1.31] [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-15-31-56/kube-apiserver.yaml"
[10.50.1.31] [upgrade/staticpods] Waiting for the kubelet to restart the component
[10.50.1.31] [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[10.50.1.31] Static pod: kube-apiserver-ip-10-50-1-31.ec2.internal hash: d47cf9f09213846a8a4166d21ab6ec9f
[10.50.1.31] Static pod: kube-apiserver-ip-10-50-1-31.ec2.internal hash: d47cf9f09213846a8a4166d21ab6ec9f
[10.50.1.31] Static pod: kube-apiserver-ip-10-50-1-31.ec2.internal hash: d47cf9f09213846a8a4166d21ab6ec9f
[10.50.1.31] Static pod: kube-apiserver-ip-10-50-1-31.ec2.internal hash: d47cf9f09213846a8a4166d21ab6ec9f
... (the same "Static pod" line keeps repeating with an unchanged hash until the 5m0s timeout) ...

I then had to SSH into each of the master nodes and manually start the kubelet systemd service (see the kubelet service status below). After starting the kubelet service, the upgrade completes successfully.

core@ip-10-50-1-31 ~ $ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead) since Tue 2020-09-01 15:31:55 UTC; 1min 45s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 5438 ExecStart=/opt/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 5438 (code=exited, status=0/SUCCESS)
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.202810    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/libcontainer_29176_systemd_test_default.slice": 0x400001>
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.353165    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_29221_systemd_test_default.slice": 0x400>
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.353230    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/libcontainer_29221_systemd_test_default.slice": 0x400001>
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.385362    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_29228_systemd_test_default.slice": 0x4000>
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.385442    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_29228_systemd_test_default.slice": 0x400>
Sep 01 15:31:47 ip-10-50-1-31.ec2.internal kubelet[5438]: W0901 15:31:47.385469    5438 watcher.go:87] Error while processing event ("/sys/fs/cgroup/pids/libcontainer_29228_systemd_test_default.slice": 0x400001>
Sep 01 15:31:55 ip-10-50-1-31.ec2.internal kubelet[5438]: I0901 15:31:55.624514    5438 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 7e65b6d9fe2928c38237d85924ddcf0f15b42fd1e5493c2>
Sep 01 15:31:55 ip-10-50-1-31.ec2.internal systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Sep 01 15:31:55 ip-10-50-1-31.ec2.internal systemd[1]: kubelet.service: Succeeded.
Sep 01 15:31:55 ip-10-50-1-31.ec2.internal systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
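
For reference, the manual step on each master node was nothing more than starting the unit again and watching it stay up:

# On each master node where the upgrade left the kubelet stopped
sudo systemctl start kubelet
systemctl status kubelet --no-pager

# Once the kubelet is running again, the static pods are re-created and the
# 'kubeadm upgrade' wait loop shown above starts making progress.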

However, if I do not manually start the kubelet service on each master node, the upgrade fails with the error below:

[10.50.1.32] error execution phase control-plane: couldn't complete the static pod upgrade: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
[10.50.1.32] To see the stack trace of this error execute with --v=5 or higher
WARN[15:47:19 UTC] Task failed, error was: failed to upgrade follower control plane: Process exited with status 1
WARN[15:47:29 UTC] Retrying task…

Is there a better way to do this, or how can I ensure that the KubeOne upgrade process starts the kubelet service on each master node?
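
The only stop-gap I have come up with myself (it is not something KubeOne provides) is to keep a small loop running on each master node for the duration of the upgrade, restarting the kubelet whenever it is left stopped:

# Crude workaround: run on each master node while 'kubeone apply' is upgrading,
# then stop it (Ctrl-C) once the upgrade has finished.
while true; do
  systemctl is-active --quiet kubelet || sudo systemctl start kubelet
  sleep 5
done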

Any input will be appreciated.

Our AWS terraform example includes this line:

So we force the hostnames to be exactly what AWS sees, which is what Kubernetes expects. Does your config have this?

Yes, it is. See below:

output "kubeone_hosts" {
  description = "Control plane endpoints to SSH to"

  value = {
    control_plane = {
      cluster_name         = var.cluster_name
      cloud_provider       = "aws"
      private_address      = aws_instance.control_plane.*.private_ip
      hostnames            = aws_instance.control_plane.*.private_dns
      ssh_agent_socket     = var.ssh_agent_socket
      ssh_port             = var.ssh_port
      ssh_private_key_file = var.ssh_private_key_file
      ssh_user             = var.ssh_username
    }
  }
}

And here is my Terraform output:

ubuntu@ip-10-50-7-144:~/mounts/terraform-cluster-deployments/testk8s$ ~/terraform output
kubeone_api = {
  "endpoint" = "internal-testk8s-api-lb-395431455.us-east-1.elb.amazonaws.com"
}
kubeone_hosts = {
  "control_plane" = {
    "cloud_provider" = "aws"
    "cluster_name" = "testk8s"
    "hostnames" = [
      "ip-10-50-1-60.ec2.internal",
      "ip-10-50-1-61.ec2.internal",
      "ip-10-50-1-62.ec2.internal",
    ]
    "private_address" = [
      "10.50.1.60",
      "10.50.1.61",
      "10.50.1.62",
    ]
    "ssh_agent_socket" = "env:SSH_AUTH_SOCK"
    "ssh_port" = 22
    "ssh_private_key_file" = "/home/ubuntu/.ssh/id_rsa"
    "ssh_user" = "core"
  }
}
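
A quick sanity check that these hostnames really line up (standard commands, nothing KubeOne-specific) is to compare what the metadata service reports with the hostname the node actually has and with the node names registered in the cluster:

# On the node itself: what AWS reports vs. what the node is actually called
curl -s http://169.254.169.254/latest/meta-data/local-hostname
hostnamectl --static

# From a machine with cluster access: the names the nodes registered with
kubectl get nodes -o wide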