MachineController Webhook on KubeOne

I’ve tried KubeOne with the AWS example and it was working fine. However, I wrote a new Terraform configuration (the Kubernetes cluster plus some other resources), and now the deployment fails on the MachineController webhook at the end of the KubeOne run, with a rather cryptic message:

error was: failed to deploy Machines: failed to ensure MachineDeployment: failed to create *v1alpha1.MachineDeployment: Internal error occurred: failed calling webhook “machine-controller.kubermatic.io-machinedeployments”: the server is currently unable to handle the request

In my Terraform output, I have this (I removed the last two kubeone_workers entries since they are the same):

kubeone_api = {
  "endpoint" = "k8s-cluster-lb-502536948.eu-west-1.elb.amazonaws.com"
}
kubeone_hosts = {
  "control_plane" = {
    "bastion" = "34.240.143.238"
    "bastion_port" = 22
    "bastion_user" = "ubuntu"
    "cloud_provider" = "aws"
    "cluster_name" = "automation-leonardo"
    "hostnames" = [
      "ip-192-168-1-113.eu-west-1.compute.internal",
      "ip-192-168-2-211.eu-west-1.compute.internal",
      "ip-192-168-3-238.eu-west-1.compute.internal",
    ]
    "private_address" = [
      "192.168.1.113",
      "192.168.2.211",
      "192.168.3.238",
    ]
    "ssh_agent_socket" = "env:SSH_AUTH_SOCK"
    "ssh_port" = 22
    "ssh_private_key_file" = "./k8s-sandbox_keypair.pem"
    "ssh_user" = "ubuntu"
  }
}
kubeone_workers = {
  "automation-leonardo-eu-west-1a" = {
    "providerSpec" = {
      "cloudProviderSpec" = {
        "ami" = "ami-09652a7c0d6ff41a3"
        "assignPublicIP" = true
        "availabilityZone" = "eu-west-1a"
        "diskIops" = 500
        "diskSize" = 20
        "diskType" = "gp2"
        "ebsVolumeEncrypted" = false
        "instanceProfile" = "kubernetes-cluster-profile"
        "instanceType" = "t3.medium"
        "isSpotInstance" = false
        "region" = "eu-west-1"
        "securityGroupIDs" = [
          "sg-04d3d134cf56e50f8",
        ]
        "subnetId" = "subnet-062fb58020631c383"
        "tags" = {
          "Environment" = "Automatisation Leonardo"
          "Owner" = "...by Terraform"
          "Project" = "Leonardo"
          "automation-leonardo-workers" = ""
          "kubernetes.io/cluster/automation-leonardo" = "shared"
        }
        "vpcId" = "vpc-0f0376c0a0e4cdd37"
      }
      "operatingSystem" = "ubuntu"
      "operatingSystemSpec" = {
        "distUpgradeOnBoot" = false
      }
      "sshPublicKeys" = [
        "snip",
      ]
    }
    "replicas" = 1
  }
}

And I have this in my cluster.yaml:

apiVersion: kubeone.io/v1alpha1
kind: KubeOneCluster
versions:
  kubernetes: "1.16.10"
cloudProvider:
  name: "aws"
clusterNetwork:
  cni:
    provider: weave-net
machineController:
  deploy: true

And I don’t really understand why it’s failing.

To add more information to this ticket, you will find the Terraform configuration I used to deploy in terraform.zip.
You’ll need to provide AWS credentials and a public SSH key.

Link to the zip : https://www.transfernow.net/eQU0j3082020

Do you have access to the machine-controller.machine-controller-webhook logs?
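
If it helps, something along these lines should pull them (just a sketch; it assumes the default kube-system namespace and the deployment names KubeOne normally uses):

# logs of the webhook (deployment name assumed)
kubectl -n kube-system logs deployment/machine-controller-webhook
# logs of the machine-controller itself
kubectl -n kube-system logs deployment/machine-controller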

A quick peek into the Terraform did not reveal anything wrong with regard to the network configuration:

  • egress to 0.0.0.0/0 on port 443 is allowed (AWS APIs)
  • the VPC doesn’t have a NAT gateway, but the control-plane instances live in a public subnet
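
That error (“the server is currently unable to handle the request”) usually means the API server cannot reach the webhook Service, e.g. because it has no ready endpoints behind it. So it might also be worth checking that the webhook pod is running and that its Service has endpoints. A rough sketch (the service name is an assumption on my side):

# is the webhook pod up, and does its Service have ready endpoints? (names assumed)
kubectl -n kube-system get pods | grep machine-controller
kubectl -n kube-system get svc,endpoints machine-controller-webhook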

I suspect that the IAM policy is lacking access to iam and/or sts.
Can you please try adding access to iam:* and sts:*, and if that works, gradually narrow it down again?
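
Something like this would do as a quick test (just a sketch; the role name behind the kubernetes-cluster-profile instance profile is an assumption on my side, so adjust it to yours):

# attach a temporary inline policy granting iam:* and sts:* (role name assumed)
aws iam put-role-policy \
  --role-name kubernetes-cluster-role \
  --policy-name kubeone-debug-iam-sts \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      { "Effect": "Allow", "Action": ["iam:*", "sts:*"], "Resource": "*" }
    ]
  }'

Once the worker machines come up, the inline policy can be removed again with aws iam delete-role-policy.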

Hello!

I tried adding AdministratorAccess to the role (not through Terraform), and I got the same error :frowning:

INFO[10:14:02 DST] Creating worker machines…
WARN[10:14:27 DST] Task failed…
WARN[10:14:27 DST] error was: failed to deploy Machines: failed to ensure MachineDeployment: failed to create *v1alpha1.MachineDeployment: Internal error occurred: failed calling webhook “machine-controller.kubermatic.io-machinedeployments”: the server is currently unable to handle the request
WARN[10:14:32 DST] Retrying task…
INFO[10:14:32 DST] Creating worker machines…
WARN[10:14:57 DST] Task failed…
WARN[10:14:57 DST] error was: failed to deploy Machines: failed to ensure MachineDeployment: failed to create *v1alpha1.MachineDeployment: Internal error occurred: failed calling webhook “machine-controller.kubermatic.io-machinedeployments”: the server is currently unable to handle the request
WARN[10:15:07 DST] Retrying task…
INFO[10:15:07 DST] Creating worker machines…
WARN[10:15:32 DST] Task failed…
WARN[10:15:32 DST] error was: failed to deploy Machines: failed to ensure MachineDeployment: failed to create *v1alpha1.MachineDeployment: Internal error occurred: failed calling webhook “machine-controller.kubermatic.io-machinedeployments”: the server is currently unable to handle the request

Logs of the Machine Controller Webhook

W0807 08:08:12.045315 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0807 08:08:13.085585 1 plugin.go:95] looking for plugin “machine-controller-userdata-centos”
I0807 08:08:13.085618 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-centos”
I0807 08:08:13.086341 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-centos’
I0807 08:08:13.086353 1 plugin.go:95] looking for plugin “machine-controller-userdata-coreos”
I0807 08:08:13.086372 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-coreos”
I0807 08:08:13.086403 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-coreos’
I0807 08:08:13.086408 1 plugin.go:95] looking for plugin “machine-controller-userdata-ubuntu”
I0807 08:08:13.086421 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-ubuntu”
I0807 08:08:13.086436 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-ubuntu’
I0807 08:08:13.086445 1 plugin.go:95] looking for plugin “machine-controller-userdata-sles”
I0807 08:08:13.086457 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-sles”
I0807 08:08:13.086472 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-sles’
I0807 08:08:13.086477 1 plugin.go:95] looking for plugin “machine-controller-userdata-rhel”
I0807 08:08:13.086489 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-rhel”
I0807 08:08:13.086503 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-rhel’

Logs of the Machine Controller

W0807 08:08:13.571646 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0807 08:08:13.572385 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0807 08:08:13.627558 1 leaderelection.go:235] attempting to acquire leader lease kube-system/machine-controller…
I0807 08:08:13.651748 1 leaderelection.go:245] successfully acquired lease kube-system/machine-controller
I0807 08:08:13.752792 1 shared_informer.go:176] caches populated
I0807 08:08:13.752958 1 shared_informer.go:176] caches populated
I0807 08:08:13.753426 1 reflector.go:122] Starting reflector *v1beta1.CustomResourceDefinition (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:13.753479 1 reflector.go:160] Listing and watching *v1beta1.CustomResourceDefinition from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:13.853501 1 shared_informer.go:176] caches populated
I0807 08:08:13.853537 1 migrations.go:147] CRD machines.machine.k8s.io not present, no migration needed
I0807 08:08:13.853557 1 migrations.go:53] Starting to migrate providerConfigs to providerSpecs
I0807 08:08:13.950091 1 migrations.go:135] Successfully migrated providerConfigs to providerSpecs
I0807 08:08:13.950146 1 plugin.go:95] looking for plugin “machine-controller-userdata-centos”
I0807 08:08:13.950181 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-centos”
I0807 08:08:13.950878 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-centos’
I0807 08:08:13.950895 1 plugin.go:95] looking for plugin “machine-controller-userdata-coreos”
I0807 08:08:13.951026 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-coreos”
I0807 08:08:13.951055 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-coreos’
I0807 08:08:13.951061 1 plugin.go:95] looking for plugin “machine-controller-userdata-ubuntu”
I0807 08:08:13.951178 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-ubuntu”
I0807 08:08:13.951199 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-ubuntu’
I0807 08:08:13.951204 1 plugin.go:95] looking for plugin “machine-controller-userdata-sles”
I0807 08:08:13.951227 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-sles”
I0807 08:08:13.951346 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-sles’
I0807 08:08:13.951353 1 plugin.go:95] looking for plugin “machine-controller-userdata-rhel”
I0807 08:08:13.951373 1 plugin.go:123] checking “/usr/local/bin/machine-controller-userdata-rhel”
I0807 08:08:13.951389 1 plugin.go:136] found ‘/usr/local/bin/machine-controller-userdata-rhel’
I0807 08:08:13.954253 1 reflector.go:122] Starting reflector *v1alpha1.Machine (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:13.954271 1 reflector.go:160] Listing and watching *v1alpha1.Machine from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.054363 1 shared_informer.go:176] caches populated
I0807 08:08:14.054824 1 reflector.go:122] Starting reflector *v1.Node (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.054933 1 reflector.go:160] Listing and watching *v1.Node from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.154933 1 shared_informer.go:176] caches populated
I0807 08:08:14.155339 1 reflector.go:122] Starting reflector *v1alpha1.MachineSet (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.155399 1 reflector.go:160] Listing and watching *v1alpha1.MachineSet from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.255496 1 shared_informer.go:176] caches populated
I0807 08:08:14.255496 1 shared_informer.go:176] caches populated
I0807 08:08:14.256046 1 reflector.go:122] Starting reflector *v1alpha1.MachineDeployment (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.256101 1 reflector.go:160] Listing and watching *v1alpha1.MachineDeployment from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.356120 1 shared_informer.go:176] caches populated
I0807 08:08:14.356411 1 shared_informer.go:176] caches populated
I0807 08:08:14.356744 1 reflector.go:122] Starting reflector *v1beta1.CertificateSigningRequest (5m0s) from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.356762 1 reflector.go:160] Listing and watching *v1beta1.CertificateSigningRequest from k8s.io/client-go@v0.15.10/tools/cache/reflector.go:98
I0807 08:08:14.456629 1 shared_informer.go:176] caches populated
I0807 08:08:14.456836 1 shared_informer.go:176] caches populated
I0807 08:08:14.457005 1 main.go:438] machine controller startup complete
I0807 08:08:14.557294 1 shared_informer.go:176] caches populated
I0807 08:08:14.557555 1 node_csr_approver.go:96] Reconciling CSR csr-2bf68
I0807 08:08:14.557679 1 node_csr_approver.go:101] CSR csr-2bf68 already approved, skipping reconciling
I0807 08:08:14.557866 1 node_csr_approver.go:96] Reconciling CSR csr-nzd9w
I0807 08:08:14.557999 1 node_csr_approver.go:101] CSR csr-nzd9w already approved, skipping reconciling
I0807 08:08:14.558150 1 node_csr_approver.go:96] Reconciling CSR csr-x8c4q
I0807 08:08:14.558212 1 node_csr_approver.go:101] CSR csr-x8c4q already approved, skipping reconciling

So… is there a way to get even more logs than this? Increase the “v” level?

I have also tried again with the example provided in the kubeone repository, and I got the same error. So… I suspect a problem with the Machine Controller itself?

@Wihrt which kubeone version are you using?

> kubeone version

Result of the kubeone version command:

{
  "kubeone": {
    "major": "0",
    "minor": "11",
    "gitVersion": "0.11.2",
    "gitCommit": "a457d799479fa761149f1fc8e6dbffb70d2cd8ee",
    "gitTreeState": "",
    "buildDate": "2020-05-21T16:22:12Z",
    "goVersion": "go1.14.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "11",
    "gitVersion": "v1.11.3",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

I just tried your KubeOne configuration YAML + KubeOne’s example AWS Terraform with v1.0.0-beta.3 and it worked fine. May I suggest using https://github.com/kubermatic/kubeone/releases/tag/v1.0.0-rc.1?
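
In case it helps, a rough way to grab that build on Linux (the exact asset name is an assumption on my side, please check the release page):

# download and unpack the kubeone v1.0.0-rc.1 release (asset name assumed)
curl -LO https://github.com/kubermatic/kubeone/releases/download/v1.0.0-rc.1/kubeone_1.0.0-rc.1_linux_amd64.zip
unzip kubeone_1.0.0-rc.1_linux_amd64.zip -d kubeone_1.0.0-rc.1
sudo mv kubeone_1.0.0-rc.1/kubeone /usr/local/bin/kubeone
kubeone version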