What to do if certificates are rotten and the cluster turns into a pumpkin?

If in response to the kubectl get pod command you get:

 Unable to connect to the server: x509: certificate has expired or is not yet valid

then, most likely, a year has passed, your Kubernetes certificates have expired, the cluster components have stopped accepting them, communication between them has broken down, and your cluster has turned into a pumpkin.






What to do and how to restore a cluster?



First, we need to understand where the certificates that need to be updated are located.



Depending on how the cluster was installed, the location and names of the certificate files may vary. For example, when creating a cluster, kubeadm lays out the certificate files according to best practices: all certificates live in /etc/kubernetes/pki, in files with the .crt extension, and the corresponding private keys in .key files. In addition, /etc/kubernetes/ contains .conf files with access configuration for the administrator, controller-manager, scheduler and kubelet accounts on the master node. The certificates in the .conf files sit in the user.client-certificate-data field in base64-encoded form.
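That embedded certificate can be pulled out and decoded by hand. A minimal sketch, run here against a throwaway kubeconfig built on the spot (all file names and the CN are placeholders, not real cluster files):

```shell
# All file names here are throwaway placeholders, not real cluster files.
tmp=$(mktemp -d)

# Create a scratch certificate and embed it the way a kubeconfig does:
# base64-encoded in the user.client-certificate-data field.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=kubernetes-admin" \
  -keyout "$tmp/admin.key" -out "$tmp/admin.crt" 2>/dev/null
printf 'users:\n- name: kubernetes-admin\n  user:\n    client-certificate-data: %s\n' \
  "$(base64 -w0 "$tmp/admin.crt")" > "$tmp/admin.conf"

# Decode the field back into PEM and read the certificate's Subject
grep 'client-certificate-data:' "$tmp/admin.conf" | awk '{print $2}' \
  | base64 -d | openssl x509 -noout -subject

rm -rf "$tmp"
```

On a real cluster you would point the same grep/base64/openssl pipeline at /etc/kubernetes/admin.conf (or any of the other .conf files).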



You can look at the expiration date, to whom a certificate was issued and by whom it was signed using this small shcert script:

shcert
 #!/bin/bash
 [ -f "$1" ] || exit
 if [[ $1 =~ \.(crt|pem)$ ]]; then
     openssl x509 -in "$1" -text -noout
 fi
 if [[ $1 =~ \.conf$ ]]; then
     certfile=$(mktemp)
     grep 'client-certificate-data:' "$1" | awk '{ print $2 }' | base64 -d > "$certfile"
     openssl x509 -in "$certfile" -text -noout
     rm -f "$certfile"
 fi
      
      







There are also the certificates that kubelet on worker nodes uses for authentication against the API. If you used kubeadm join to add nodes to the cluster, then most likely the nodes were connected via the TLS bootstrapping procedure, and in that case kubelet can renew its certificate automatically, provided it was started with the --rotate-certificates option. In recent Kubernetes versions this option is enabled by default.

Checking that a node was connected via the TLS bootstrap procedure is quite simple: in that case the client-certificate field in the /etc/kubernetes/kubelet.conf file usually points to /var/lib/kubelet/pki/kubelet-client-current.pem, which is a symlink to the current certificate.
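The rotation mechanism is easy to illustrate with a throwaway symlink (the paths below are placeholders, not the real files under /var/lib/kubelet/pki/):

```shell
# Placeholder paths; on a real node the files live in /var/lib/kubelet/pki/.
tmp=$(mktemp -d)

# Simulate two rotations: "current" is just a symlink to the newest cert,
# exactly how kubelet-client-current.pem works.
touch "$tmp/kubelet-client-2023-01-01.pem" "$tmp/kubelet-client-2024-01-01.pem"
ln -sf "$tmp/kubelet-client-2024-01-01.pem" "$tmp/kubelet-client-current.pem"

# Resolves to the newest certificate file
readlink -f "$tmp/kubelet-client-current.pem"

rm -rf "$tmp"
```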



You can also view the expiration dates of this certificate using the shcert script.



We return to the problem of renewing certificates.



If you installed the cluster using kubeadm, then I have good news for you. Starting with version 1.15, kubeadm can renew almost all control plane certificates with a single command:

 kubeadm alpha certs renew all
      
      





This command will renew all certificates in the /etc/kubernetes directory, even if they have already expired and everything has broken.



Only the kubelet certificate will not be renewed: it is the one embedded in the /etc/kubernetes/kubelet.conf file!

To renew this certificate, use the command that generates a kubeconfig for a user account:



 kubeadm alpha kubeconfig user --client-name system:node:kube.slurm.io --org system:nodes > /etc/kubernetes/kubelet.conf
      
      





If the account already exists in the system, this command renews its certificate. Do not forget to specify the correct host name in the --client-name option; you can look up the host name in the Subject field of the existing certificate:



 shcert /etc/kubernetes/kubelet.conf
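If you only need the Subject line rather than the full dump, openssl can print it directly. A sketch on a throwaway certificate (the node name and file names are placeholders):

```shell
# Throwaway certificate; the node name below is a placeholder.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/O=system:nodes/CN=system:node:demo-node" \
  -keyout "$tmp/kubelet.key" -out "$tmp/kubelet.crt" 2>/dev/null

# The CN in the Subject is exactly the value to pass to --client-name
openssl x509 -in "$tmp/kubelet.crt" -noout -subject

rm -rf "$tmp"
```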
      
      





And of course, after renewing the certificates you need to restart all control plane components, either by rebooting the entire node or by stopping the etcd, api, controller-manager and scheduler containers with docker stop, and then restarting kubelet with systemctl restart kubelet.



If your cluster runs an old version, 1.13 or earlier, simply upgrading kubeadm to 1.15 will not work: it pulls in the kubelet and kubernetes-cni dependencies, and cluster components whose versions differ by more than one minor release are not guaranteed to work together. The easiest way out of this situation is to install kubeadm 1.15 on some other machine, take the /usr/bin/kubeadm binary, copy it to the master nodes of the deceased cluster and use it only to renew the certificates. And after the cluster has been revived, upgrade it step by step by the regular methods, installing kubeadm one minor version newer each time.



And finally, starting with version 1.15 kubeadm learned to renew all of the certificates during a cluster upgrade with the kubeadm upgrade command. So if you upgrade your cluster at least once a year, your certificates will always stay valid.



But if the cluster was not installed using kubeadm, then you will have to pick up openssl and renew all the certificates individually.



The problem is that the certificates contain extension fields, and different cluster installation tools can add their own set of extensions. Moreover, the names of these fields in the openssl configuration and in the dump of the certificate contents correlate, but only loosely. You have to google and match them up.



I will give an example openssl configuration in which separate sections describe the extension attributes specific to each type of certificate. We will refer to the corresponding section when creating and signing a CSR. This configuration was used to revive a cluster set up a year earlier by Rancher.

openssl.cnf
 [req]
 distinguished_name = req_distinguished_name
 req_extensions = v3_req

 [v3_req]
 keyUsage = nonRepudiation, digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth

 [client]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth

 [apiproxyclient]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth

 [etcd]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth
 subjectAltName = @alt_names

 [api]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth
 subjectAltName = @alt_names

 [alt_names]
 DNS.1 = ec2-us-east-1-1a-c1-master-2
 DNS.2 = ec2-us-east-1-1a-c1-master-3
 DNS.3 = ec2-us-east-1-1a-c1-master-1
 DNS.4 = localhost
 DNS.5 = kubernetes
 DNS.6 = kubernetes.default
 DNS.7 = kubernetes.default.svc
 DNS.8 = kubernetes.default.svc.cluster.local
 IP.1 = 10.0.0.109
 IP.2 = 10.0.0.159
 IP.3 = 10.0.0.236
 IP.4 = 127.0.0.1
 IP.5 = 10.43.0.1
      







The actual attributes and alternative names in a certificate can be viewed with the command:

 openssl x509 -in cert.crt -text -noout
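A self-contained sketch of what that dump looks like for a certificate with SANs, imitating an API server certificate on a throwaway file (the CN, names and paths are placeholders; req -addext requires openssl 1.1.1 or newer):

```shell
# Throwaway certificate with SANs, imitating an API server cert
# (requires openssl 1.1.1+ for "req -addext").
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:kubernetes,DNS:kubernetes.default,IP:10.43.0.1" \
  -keyout "$tmp/api.key" -out "$tmp/api.crt" 2>/dev/null

# SANs show up under "X509v3 Subject Alternative Name" in the text dump
openssl x509 -in "$tmp/api.crt" -noout -text | grep -A1 "Subject Alternative Name"

rm -rf "$tmp"
```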
      
      





When renewing the API server certificate I ran into a problem: the renewed certificate did not work. The solution was to issue a certificate whose validity started one year in the past.

With openssl you cannot issue a back-dated certificate with a simple command: the code hard-wires that a certificate is valid only from the current moment. But you can travel back in time locally using the libfaketime library:

 yum install libfaketime
 LD_PRELOAD=/usr/lib64/faketime/libfaketime.so.1 FAKETIME="-365d" openssl x509 -req ...
      
      





We issue the certificates with their extensions according to the following algorithm.



Create a CSR from the existing certificate, specifying the desired section with the list of extension attributes in the configuration file:



 openssl x509 -x509toreq -in "node.cert" -out "node.csr" -signkey "node.key" -extfile "openssl.cnf" -extensions client
      
      





Sign it with the corresponding root certificate, shifting the time one year into the past and again specifying the desired extensions section in the configuration file:



 LD_PRELOAD=/usr/lib64/faketime/libfaketime.so.1 FAKETIME="-365d" openssl x509 -req -days 36500 -in "node.csr" -CA "kube-ca.pem" -CAkey "kube-ca-key.pem" -CAcreateserial -out "node.new.cert" -extfile "openssl.cnf" -extensions client
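The two steps can be rehearsed end to end on throwaway material. In this sketch the scratch CA, file names and the minimal [client] section are all placeholders, and libfaketime is omitted so it runs anywhere:

```shell
tmp=$(mktemp -d)
(
cd "$tmp" || exit 1

# Minimal stand-in for openssl.cnf with just the [client] section
cat > openssl.cnf <<'EOF'
[client]
keyUsage = critical,digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth
EOF

# Scratch CA and a "node" certificate to renew (placeholders, not real files)
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 -subj "/CN=kube-ca" \
  -keyout kube-ca-key.pem -out kube-ca.pem 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=node1" \
  -keyout node.key -out node.cert 2>/dev/null

# Step 1: CSR from the existing certificate
openssl x509 -x509toreq -in node.cert -out node.csr -signkey node.key 2>/dev/null

# Step 2: sign with the CA, attaching the [client] extensions
openssl x509 -req -days 365 -in node.csr -CA kube-ca.pem -CAkey kube-ca-key.pem \
  -CAcreateserial -out node.new.cert -extfile openssl.cnf -extensions client 2>/dev/null

# The renewed certificate should chain to the CA
openssl verify -CAfile kube-ca.pem node.new.cert
)
rm -rf "$tmp"
```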
      
      





We check the attributes and restart the components of the control plane.



Sergey Bondarev,

instructor at Slurm

slurm.io


