Advanced Configuration
Docker login for private registry
If you need to use a Docker registry with authentication, you will need to create a Kubernetes secret holding these credentials. You then configure the CRD with the secret name, so that it is provided to each StatefulSet, which in turn propagates it to each created Pod.

Create the secret:
```bash
kubectl create secret docker-registry yoursecretname \
  --docker-server=yourdockerregistry \
  --docker-username=yourlogin \
  --docker-password=yourpass \
  --docker-email=yourloginemail
```
Then add an `imagePullSecrets` parameter to the CRD definition, whose value is the name of the previously created secret. You can give several secrets:
```yaml
imagePullSecrets:
  name: yoursecretname
```
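For context, here is a minimal sketch of where this parameter sits in a CassandraCluster manifest; the cluster name and `nodesPerRacks` value are placeholders, and the apiVersion follows typical CassKop examples, so it may vary with your operator version:

```yaml
apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  name: cassandra-demo   # placeholder cluster name
spec:
  nodesPerRacks: 3       # placeholder value
  # Name of the docker-registry secret created above
  imagePullSecrets:
    name: yoursecretname
```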
Management of allowed Cassandra node disruptions
CassKop makes use of the Kubernetes PodDisruptionBudget object to specify how many Cassandra node disruptions are allowed on the cluster. By default, we only tolerate 1 disrupted pod at a time, and we prevent new actions if there is already an ongoing disruption on the cluster.
In some edge cases it can be useful to force the operator to continue its actions even if there is already a disruption ongoing. This can be tuned by updating the `spec.maxPodUnavailable` parameter of the `cassandracluster` CRD.
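For example, to tolerate two simultaneously disrupted pods instead of the default of one, the spec would be updated as follows (the value 2 is only illustrative):

```yaml
spec:
  # Allow up to 2 simultaneously disrupted Cassandra pods
  # (default is 1; change with caution)
  maxPodUnavailable: 2
```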
It is recommended not to touch this parameter unless you know what you are doing.
Cross IP Management
Global mechanism
Cassandra works on IPs, not on hostnames, so if two Cassandra nodes swap their IPs, neither node will be able to run properly and each will loop on the following error:
```
cassandra Exception (java.lang.RuntimeException) encountered during startup: A node with address /10.100.150.35 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
cassandra java.lang.RuntimeException: A node with address /10.100.150.35 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
cassandra at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:577)
cassandra at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
cassandra at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
cassandra at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
cassandra at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388)
cassandra at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
cassandra at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732)
cassandra ERROR [main] 2020-02-21 08:29:44,398 CassandraDaemon.java:749 - Exception encountered during startup
cassandra java.lang.RuntimeException: A node with address /10.100.150.35 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
cassandra at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:577) ~[apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
cassandra at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
```
As reported in issue #170, this can happen at least with Kubernetes and Project Calico, for example when using a fixed IP pool size.
To manage this case we introduced the `restartCountBeforePodDeletion` CassandraCluster spec field, which takes an int32 value. If you set it to a value lower than or equal to 0, or if you omit it, no action is performed.
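As an illustrative sketch, a threshold of 3 restarts would be configured like this (the value is arbitrary; pick one that matches how quickly you want pods recycled):

```yaml
spec:
  # Delete the pod (so the StatefulSet recreates it with a new IP)
  # once the cassandra container has restarted more than 3 times
  # while an IP cross situation is detected
  restartCountBeforePodDeletion: 3
```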
When this field is set, the operator checks, for each CassandraCluster, whether a pod is in a restart situation (based on the restart count of the cassandra container inside the pod). If the pod's restart count is greater than the value of the `restartCountBeforePodDeletion` field and we are in an IP cross situation, the operator deletes the pod, which is then recreated by the StatefulSet. With Project Calico, this forces the pod to get another available IP, which fixes the bug.
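To observe the restart count that the operator compares against, you can query the pod's container status directly; the pod name below is a hypothetical example:

```bash
kubectl get pod cassandra-demo-dc1-rack1-0 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="cassandra")].restartCount}'
```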
IP cross situation detection
To detect that we are in an IP cross situation, we added a new status field, `CassandraNodeStatus`, which maintains a cache mapping each ready pod to its node IP and its hostId. For more information about this status field, see the CassandraCluster Status documentation.
So when we check pods, we perform a Jolokia call to get a map of the cluster nodes' IPs with their corresponding hostIds. If a pod is failing under the constraints described above, we compare the hostId associated with the pod's IP in that map to the hostId associated with the pod's name in the `CassandraNodeStatus`:
- if they match, or if there is no entry for the pod's IP in the map returned by Jolokia, we are not in an IP cross situation;
- if they differ, we are in an IP cross situation.
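As an illustration of the cached data, the resulting status could look like the following; the pod name, hostId, and IP are made-up values, and the exact field names may differ between CassKop versions:

```yaml
status:
  # Cache of ready pods with the hostId and IP last seen for each
  cassandraNodesStatus:
    cassandra-demo-dc1-rack1-0:
      hostId: ca716bef-dc68-427d-be27-b4eeede1e072
      nodeIp: 10.100.150.35
```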