Troubleshooting install issues

If the installation fails, please check the information in the following sections.

No pods or pods not in Running state

The status section of the Cloudflow custom resource records any errors encountered during installation. Use the following kubectl command to view the current status:

kubectl get cloudflow -n cloudflow -o jsonpath='{range .items[*]}{"Cloudflow status: "}{.status.status}{"\nMessage: "}{.status.message}{"\n"}{end}'

If the status is the following, then the installer has completed successfully:

Cloudflow status: Installed
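If you want to script this check, for example in a CI job, the following minimal sketch may help. The function name `cloudflow_installed` is hypothetical. It reads the output of the status command above on standard input and succeeds only if that output contains the exact success line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 only if stdin contains the exact line
# "Cloudflow status: Installed" (as printed by the jsonpath command above).
cloudflow_installed() {
  grep -qx 'Cloudflow status: Installed'
}

# Example:
#   kubectl get cloudflow -n cloudflow -o jsonpath='{range .items[*]}{"Cloudflow status: "}{.status.status}{"\n"}{end}' \
#     | cloudflow_installed && echo "Cloudflow is installed"
```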

Any other status indicates a problem during installation. Please contact Lightbend support and supply the whole Cloudflow custom resource object and the log from the installer.

Use the following command to extract the Cloudflow custom resource as YAML:

kubectl get cloudflow -n cloudflow -o yaml

Use the following command to extract the installer log:

kubectl logs -n cloudflow-enterprise-installer deployment/cloudflow-enterprise-installer-deployment
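To hand both artifacts to support in one go, you could wrap the two commands in a small function. This is a sketch under stated assumptions: the function name `collect_cloudflow_diagnostics` and the output file names are illustrative, and the custom resource type is assumed to be `cloudflow`, as in the `kubectl delete cloudflow` command used elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the Cloudflow CR (as YAML) and the installer
# log into one directory (default: ./cloudflow-diagnostics) and prints its path.
collect_cloudflow_diagnostics() {
  local outdir="${1:-cloudflow-diagnostics}"
  mkdir -p "$outdir" || return 1
  # Cloudflow custom resource in the cloudflow namespace, as YAML.
  kubectl get cloudflow -n cloudflow -o yaml > "$outdir/cloudflow-cr.yaml" || return 1
  # Log of the enterprise installer deployment.
  kubectl logs -n cloudflow-enterprise-installer \
    deployment/cloudflow-enterprise-installer-deployment > "$outdir/installer.log" || return 1
  echo "$outdir"
}
```

Running `collect_cloudflow_diagnostics` then leaves both files in one directory that you can archive and attach to the support request.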

Failure to login to the cluster

If kubectl is not configured for a running cluster, or the cluster cannot be reached, the installation bootstrap script prints the following error message:

Unable to access the Kubernetes cluster using `kubectl`. Please check the cluster connection.

Refer to your cluster vendor's documentation to create a kubectl login configuration.
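Before running the bootstrap script, you can check the connection yourself. The sketch below is an assumption about a reasonable pre-flight check, not what the bootstrap script itself does: the function name `check_cluster_access` is hypothetical, and it relies on `kubectl cluster-info` failing when the API server cannot be reached (which also assumes your user is allowed to run `cluster-info`).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirms that a kubectl context is set
# and that the cluster behind it answers.
check_cluster_access() {
  local ctx
  # Prints the active context name; fails if no context is configured.
  if ! ctx=$(kubectl config current-context 2>/dev/null); then
    echo "No current kubectl context is configured." >&2
    return 1
  fi
  # Fails if the API server for this context cannot be reached.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "Cannot reach the cluster for context '$ctx'." >&2
    return 1
  fi
  echo "Connected using context '$ctx'."
}
```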

No verified storage classes

If the installation bootstrap script marks all the storage classes it finds as Unknown, you must verify yourself that the storage class you want to use supports the requested access mode.

There is no complete, central list of all storage classes and the access modes they support. The Kubernetes documentation on persistent volumes lists the most common provisioners and their supported access modes.
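One empirical way to check a storage class is to create a small throwaway claim that requests ReadWriteMany and see whether it binds. This is a sketch, not part of the Cloudflow tooling: the function name `probe_rwx`, the claim name `rwx-probe` and the 1Gi size are illustrative assumptions, and `kubectl wait --for=jsonpath=...` needs kubectl 1.23 or later. A storage class with `volumeBindingMode: WaitForFirstConsumer` keeps the claim Pending until a pod uses it, so a timeout is not conclusive for such classes.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeeds if a ReadWriteMany PVC using storage class $1
# binds within 60s in namespace $2 (default: "default"). Cleans up afterwards.
probe_rwx() {
  local sc="$1" ns="${2:-default}"
  kubectl apply -n "$ns" -f - <<EOF || return 1
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-probe
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ${sc}
  resources:
    requests:
      storage: 1Gi
EOF
  # A class without RWX support usually leaves the claim Pending.
  kubectl wait -n "$ns" --for=jsonpath='{.status.phase}'=Bound pvc/rwx-probe --timeout=60s
  local rc=$?
  # Remove the probe claim regardless of the outcome.
  kubectl delete -n "$ns" pvc/rwx-probe --wait=false >/dev/null
  return $rc
}
```

For example, `probe_rwx standard` tests the `standard` class in the `default` namespace.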

Spark operator pod stuck in ContainerCreating state

Occasionally, the Spark operator pod fails to mount a volume that it needs to configure its webhook.

As a result, the pod stays in the ContainerCreating state for an extended period of time.

To confirm, list the events in the cloudflow namespace:

kubectl get events -n cloudflow

The event log contains the following error event:

MountVolume.SetUp failed for volume "webhook-certs" : secrets "spark-webhook-certs" not found
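If you want to detect this condition from a script, a simple text match on the namespace events is one option. In this sketch the function name `has_webhook_cert_mount_failure` is hypothetical, and matching on the message text is an assumption: event wording may differ between Kubernetes versions, and old events eventually expire from the event log.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds if the cloudflow namespace events contain the
# "spark-webhook-certs not found" mount failure shown above.
has_webhook_cert_mount_failure() {
  kubectl get events -n cloudflow 2>/dev/null \
    | grep -q 'secrets "spark-webhook-certs" not found'
}
```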

This problem is caused by a rare race condition. To resolve it, delete the Cloudflow custom resource (CR) with the following command and then run the installation bootstrap script again:

kubectl delete cloudflow -n cloudflow default

A Cloudflow Spark streamlet pod is stuck in Pending state after deployment

If a Spark pod is stuck in the Pending state after deployment, you may have specified a storage class that does not support the ReadWriteMany access mode, as in the following example:

> kubectl cloudflow status call-record-aggregator
Name:             call-record-aggregator
Namespace:        call-record-aggregator
Version:          31-d8f6ad3
Created:          2019-12-18 14:29:40 +0100 CET
Status:           Pending

STREAMLET         POD                                                    STATUS            RESTARTS          READY
cdr-generator2    call-record-aggregator-cdr-generator2-driver           Unknown           0
cdr-generator1    call-record-aggregator-cdr-generator1-driver           Unknown           0
cdr-aggregator    call-record-aggregator-cdr-aggregator-driver           Unknown           0
error-egress      call-record-aggregator-error-egress-5fc5477c57-hxmjq   Running           0                 True
cdr-validator     call-record-aggregator-cdr-validator-fcbb78c6f-ljft4   Running           0                 True
cdr-ingress       call-record-aggregator-cdr-ingress-6b5f5b8ddf-zmh6n    Running           0                 True
console-egress    call-record-aggregator-console-egress-85d789bb49-67fwz Running           0                 True
merge             call-record-aggregator-merge-84c88989c7-s2rf5          Running           0                 True
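In applications with many streamlets, you can pick out the affected pods from this output with a short filter. This sketch assumes the column layout shown above (STREAMLET, POD, STATUS, ...) and that the pod table starts at the `STREAMLET` header row; the function name `non_running_streamlets` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `kubectl cloudflow status <app>` output on stdin
# and prints "<streamlet><TAB><status>" for every pod not in Running state.
non_running_streamlets() {
  awk '
    $1 == "STREAMLET" { in_table = 1; next }   # header row starts the pod table
    in_table && NF >= 3 && $3 != "Running" { print $1 "\t" $3 }
  '
}
```

Applied to the output above, it would list the three `Unknown` driver pods (cdr-generator2, cdr-generator1 and cdr-aggregator).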

You can verify this by looking at the PVC in the application namespace:

> kubectl get pvc -n call-record-aggregator
NAME                         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
call-record-aggregator-pvc   Pending                                      standard       4m22s

The PVC is stuck in the Pending state, which prevents the Spark application from starting.
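To see which storage class and access modes a pending claim requests, and why provisioning stalled, something like the following sketch can help. The function name `inspect_pending_pvcs` is hypothetical; it uses only standard `kubectl get` output options and `kubectl describe`, whose Events section usually names the provisioning error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for application namespace $1, list every PVC with its
# phase, storage class and requested access modes, then describe the ones
# that are still Pending.
inspect_pending_pvcs() {
  local ns="$1"
  kubectl get pvc -n "$ns" \
    -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName,MODES:.spec.accessModes'
  local pvc
  for pvc in $(kubectl get pvc -n "$ns" \
      -o jsonpath='{range .items[?(@.status.phase=="Pending")]}{.metadata.name}{"\n"}{end}'); do
    # The Events section of describe usually explains why provisioning stalled.
    kubectl describe pvc -n "$ns" "$pvc"
  done
}
```

For the example above, you would run `inspect_pending_pvcs call-record-aggregator`.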

To fix this problem, run the bootstrap script again and select a storage class that supports the ReadWriteMany mode.