Troubleshooting install issues
If the installation fails, please check the information in the following sections.
No pods or pods not in Running state
The status sub-section of the Cloudflow custom resource records any errors encountered during installation. Use the following kubectl command to view the current status:
kubectl get cloudflows.cloudflow-enterprise-installer.lightbend.com -n cloudflow -o jsonpath='{ range .items[*]}{"Cloudflow status: "}{.status.status}{"\nMessage: "}{.status.message}{"\n"}{end}'
If the status is the following, then the installer has completed successfully:
Cloudflow status: Installed
Message:
Any other status indicates a problem during installation. Please contact Lightbend support and supply the whole Cloudflow custom resource object and the log from the installer.
Use the following command to extract the Cloudflow custom resource as YAML.
kubectl get cloudflows.cloudflow-enterprise-installer.lightbend.com -n cloudflow -o yaml
Use the following command to extract the installer log:
kubectl logs -n cloudflow-enterprise-installer deployment/cloudflow-enterprise-installer-deployment
Failure to log in to the cluster
If kubectl is not currently connected to a running cluster, or the cluster cannot be reached, the script will print the following error message:
Unable to access the Kubernetes cluster using `kubectl`. Please check the cluster connection.
Please refer to your cluster vendor's documentation for creating a kubectl login configuration.
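Before re-running the installer, you can verify the kubectl connection yourself. A few standard kubectl commands serve as a quick sanity check:

```shell
# Show which cluster context kubectl is currently pointed at.
kubectl config current-context

# Verify that the Kubernetes API server is reachable.
kubectl cluster-info

# Confirm that your credentials are accepted by the cluster.
kubectl get nodes
```

If any of these commands fail, fix the kubeconfig or credentials before running the installation bootstrap script again.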
No verified storage classes
If all storage classes found are marked Unknown by the installation bootstrap script, you will have to verify that the storage class you want to choose supports the requested access mode. There is no complete central list of all possible storage classes or the access modes they support. The following Kubernetes documentation page lists the most common provisioners and their supported access modes:
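To see which storage classes your cluster offers and which provisioner backs each one, you can inspect them with kubectl. The class name `standard` below is only an example; substitute one of the names returned by the first command:

```shell
# List all storage classes available in the cluster.
kubectl get storageclass

# Show details of one class, including its provisioner; the access modes
# supported depend on that provisioner, not on the StorageClass object itself.
kubectl describe storageclass standard
```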
Spark operator pod stuck in ContainerCreating state
Sometimes the Spark operator pod fails to mount a volume it needs to configure its webhook. As a result, the pod gets stuck in the ContainerCreating state for an extended period of time.
You can see the following error event in the namespace event log using kubectl get events -n cloudflow:
MountVolume.SetUp failed for volume "webhook-certs" : secrets "spark-webhook-certs" not found
The problem is caused by a rare race condition and can be solved by removing the Cloudflow CR and running the installation bootstrap script again.
kubectl delete cloudflow -n cloudflow default
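After deleting the custom resource and re-running the bootstrap script, you can watch the namespace to confirm that the operator pod leaves ContainerCreating and reaches Running:

```shell
# Watch pod status in the cloudflow namespace; press Ctrl-C to stop watching
# once the Spark operator pod reports Running.
kubectl get pods -n cloudflow -w
```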
A Cloudflow Spark streamlet pod is stuck in Pending state after deployment
If you find a Spark pod stuck in the Pending state after deployment, it may indicate that you specified a storage class that does not support the ReadWriteMany access mode.
> kubectl cloudflow status call-record-aggregator
Name: call-record-aggregator
Namespace: call-record-aggregator
Version: 31-d8f6ad3
Created: 2019-12-18 14:29:40 +0100 CET
Status: Pending
STREAMLET        POD                                                      STATUS    RESTARTS  READY
cdr-generator2   call-record-aggregator-cdr-generator2-driver             Unknown   0
cdr-generator1   call-record-aggregator-cdr-generator1-driver             Unknown   0
cdr-aggregator   call-record-aggregator-cdr-aggregator-driver             Unknown   0
error-egress     call-record-aggregator-error-egress-5fc5477c57-hxmjq     Running   0         True
cdr-validator    call-record-aggregator-cdr-validator-fcbb78c6f-ljft4     Running   0         True
cdr-ingress      call-record-aggregator-cdr-ingress-6b5f5b8ddf-zmh6n      Running   0         True
console-egress   call-record-aggregator-console-egress-85d789bb49-67fwz   Running   0         True
merge            call-record-aggregator-merge-84c88989c7-s2rf5            Running   0         True
You can verify this by looking at the PVC in the application namespace:
> kubectl get pvc -n call-record-aggregator
NAME                         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
call-record-aggregator-pvc   Pending                                      standard       4m22s
Here we can see the PVC stuck in Pending, preventing the Spark application from starting.
To fix this problem, run the bootstrap script again and select a storage class that supports the ReadWriteMany access mode.
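Once you have re-run the bootstrap script with a suitable storage class and redeployed the application, you can confirm that the PVC binds and requests the correct access mode. The names below are taken from the example above:

```shell
# The PVC should now report Bound rather than Pending.
kubectl get pvc -n call-record-aggregator

# Inspect the access modes requested by the claim; expect ReadWriteMany (RWX).
kubectl get pvc call-record-aggregator-pvc -n call-record-aggregator \
  -o jsonpath='{.spec.accessModes}'
```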