Confirm root cause with service and host system metrics

Drilling into the problematic service confirms the issue. The port check for the Redis service instance failed immediately after the instance's memory consumption shot up. The memory spike was caused by an increase in the number of commands processed:


Back on the service metrics page, the spikes in commands processed [1] and memory consumed [2] coincide with the bgsave_in_progress metric [3] dropping to 0. This indicates that there was no memory available to run the Redis BGSAVE command, which saves the database to disk in the background. If this situation persists for long enough, the Redis service instance is likely to crash. That is the root cause in this example.
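The same correlation can be checked by hand against the raw output of the Redis INFO command. The sketch below is a minimal illustration, not part of the monitoring product: it assumes the bgsave_in_progress metric corresponds to the `rdb_bgsave_in_progress` field of INFO, and the helper names, the 90% threshold, the memory limit, and the sample values are all hypothetical.

```python
def parse_redis_info(text):
    """Parse the 'key:value' text returned by the Redis INFO command into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def memory_pressure_signals(info, memory_limit_bytes, threshold=0.9):
    """Summarize the signals discussed above (hypothetical helper).

    memory_limit_bytes and threshold are assumptions chosen by the caller;
    they are not values reported by Redis itself.
    """
    used = int(info.get("used_memory", 0))
    ratio = used / memory_limit_bytes
    return {
        "memory_ratio": ratio,
        "memory_high": ratio >= threshold,
        "bgsave_running": info.get("rdb_bgsave_in_progress") == "1",
        "last_bgsave_ok": info.get("rdb_last_bgsave_status", "ok") == "ok",
        "commands_processed": int(info.get("total_commands_processed", 0)),
    }


# Made-up sample in the INFO text format, for illustration only.
SAMPLE_INFO = """\
# Memory
used_memory:1020054732
# Persistence
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:err
# Stats
total_commands_processed:48210377
"""

signals = memory_pressure_signals(
    parse_redis_info(SAMPLE_INFO),
    memory_limit_bytes=1024 ** 3,  # assumed 1 GiB limit for the instance
)
```

In this sample, high memory usage combined with no running background save and a failed last save matches the pattern described above. On a live instance the text would come from `redis-cli INFO`, and the thresholds would need tuning for the actual deployment.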

Additional metrics can be added for analysis as needed: