Monitor health of non-Prod services with Jenkins & Slack

The product I currently work on depends on 3 different vendor APIs and, as you would expect, their non-Prod environments go down fairly regularly.

I found it very frustrating that I never knew when their services came back online. So I decided to build a healthcheck endpoint which integrated with Jenkins and Slack and notified me, via Slackbot, once their services came back up. This meant I could switch back and start testing tickets again ASAP. When you have tight deadlines, and services regularly go down, every minute counts!

Overview

The job built in this tutorial pings our health check endpoint every 15 minutes and sends a message to a Slack channel or your Slackbot if, and only if, the service/endpoint’s status has changed, i.e. the service changed from down to up or from up to down. There are 2 prerequisites to getting this solution to work for you.

  1. You will need permissions to create and configure Jenkins jobs
  2. Your instance of Jenkins will need the Slack plugin
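
In this tutorial the "only notify on a change" behaviour is left to Jenkins and the Slack plugin. Purely to illustrate the idea, here is a rough sketch of that logic in shell, assuming the previous status is kept in a small state file. The function name status_changed and the state file are hypothetical and are not part of Jenkins or the Slack plugin:

```shell
#!/bin/bash
# Hypothetical helper: succeeds (returns 0) only when the status differs from the one
# recorded on the previous run, then records the new status for next time.
status_changed() {
  local current="$1" state_file="$2" previous=""
  # On the very first run the state file is missing or empty, so previous stays empty
  [[ -f "$state_file" ]] && previous=$(<"$state_file")
  printf '%s\n' "$current" > "$state_file"
  [[ "$current" != "$previous" ]]
}

# Example run against a temporary state file: only the changes produce a message
STATE_FILE=$(mktemp)
for status in up up down down up; do
  if status_changed "$status" "$STATE_FILE"; then
    echo "status changed to: $status"
  fi
done
rm -f "$STATE_FILE"
```

Note that in this sketch the very first check counts as a change, because there is no earlier status to compare against.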

Choose endpoint to monitor

First off, find an endpoint that you want to monitor. This tutorial works with JSON responses but you can easily tweak it for XML data.
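
For an XML response, the status value has to be pulled out differently. Below is a minimal sketch, assuming a hypothetical response shaped like <health><status>up</status></health> (the element names are made up for illustration and do not come from any real vendor API). It uses sed rather than a proper XML parser, so treat it as a rough starting point that only suits very simple responses:

```shell
#!/bin/bash
# Hypothetical helper: print the text of the first <status> element found in XML read from stdin.
# Assumes the element sits on a single line and has no attributes.
extract_xml_status() {
  sed -n 's:.*<status>\(.*\)</status>.*:\1:p' | head -n 1
}

# Example with a made-up XML response; in a real job you would pipe the endpoint's response in instead
echo '<health><status>up</status></health>' | extract_xml_status
```

For anything more complex than this (namespaces, attributes, multi-line elements), a real XML tool would be the safer choice.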

Creating a Jenkins Job

There are 3 stages within the Jenkins job that need to be configured.

  1. Build Triggers – I used a cron job to kick off this health check every 15 minutes (I would recommend using Crontab.guru to make your cron expressions easier to read). Click the “Build periodically” checkbox and paste the cron expression below into the “Schedule” text field.
*/15 08-18 * * 1-5

FYI, the above job will run every 15 minutes, during every hour from 8am through 6pm, on every day of the week from Monday through Friday.

2. Build Stage – you will want to create a simple “execute shell” job for the “Build Stage”

Our sample healthcheck endpoint returns the following JSON response…

{
    "status": "up"
}

… so our sample Shell script now looks like…

#!/bin/bash
# -s hides curl's progress output; -r makes jq print the raw string without quotes
HEALTH=$(curl -s "https://my.sample.api.com/health/vendor_1" | jq -r '.status')

# Exit code 0 passes the Jenkins job; a non-zero exit code fails it
if [[ "$HEALTH" == "up" ]]; then
  exit 0
else
  exit 1
fi

For anyone else who isn’t that familiar with Unix syntax, I’ve tried to explain each step in a bit more detail.

| Shell script component | Description | Resource |
|---|---|---|
| curl -s "https://my.sample.api.com/health/vendor_1" | Make a network call to the healthcheck endpoint (-s hides the progress output) | cURL tutorial |
| jq -r '.status' | Read the JSON response piped in from curl and return the value of “status”, i.e. return the word “up” (-r drops the surrounding quotes) | jq tutorial, jq cheatsheet |
| HEALTH=$( ….. ) | Store the output of everything inside ( ….. ) in a variable called “HEALTH” | |
| if [[ "$HEALTH" == "up" ]]; then | Check if HEALTH equals the string “up” | Unix “if statement” tutorial |
| exit 0 | “0” makes the Jenkins job pass | |
| exit 1 | “1” (or any other non-zero exit code) makes the Jenkins job fail | |
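
Vendor environments that are down often fail in messier ways than returning a different status value: the request may hang, return an HTTP error, or send back something that isn’t JSON. A slightly more defensive sketch of the same check is below. The function names, the 10 second timeout and the case-insensitive comparison are assumptions for illustration; the URL is the sample one used above:

```shell
#!/bin/bash
# Hypothetical helper: print the endpoint's status, or "down" if anything goes wrong.
# curl: -s silent, -f treat HTTP errors as failures, --max-time gives up after 10 seconds (assumed value)
# jq: -r raw output, "// empty" prints nothing when the "status" field is missing
fetch_status() {
  local url="$1" body status
  body=$(curl -sf --max-time 10 "$url") || { echo "down"; return; }
  status=$(jq -r '.status // empty' <<<"$body" 2>/dev/null)
  echo "${status:-down}"
}

# Hypothetical helper: succeeds only for "up", ignoring case ("up", "UP" and "Up" all pass)
is_up() {
  [[ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" == "up" ]]
}

# Quick demonstration of the pass/fail decision without touching the network
for s in up UP down ""; do
  if is_up "$s"; then echo "'$s' -> pass"; else echo "'$s' -> fail"; fi
done

# In the Jenkins "Execute shell" step you would then finish with something like:
#   is_up "$(fetch_status "https://my.sample.api.com/health/vendor_1")" || exit 1
```

Treating every failure as "down" keeps the Jenkins job result binary, which is all the Slack notification needs.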

3. Post-build Actions – like I mentioned previously, you will need the Slack plugin for Jenkins. Now click the “Add Post-Build Action” button and select “Slack notifications” from the dropdown. Configure as follows:

Slack integration

The final piece of the puzzle is to connect Jenkins with your Slack workspace. The next few steps are a bit painful but bear with me!

Firstly, in the “Slack notifications” section of the “Post-build Actions” stage, enter the workspace as well as the channel and/or memberId.

Next, go to your browser and log in to your Slack workspace. Go to “https://your_workspace_name.slack.com/apps/manage” (swap the workspace name for yours), search for the Jenkins CI app and click on it. Once that page loads, click the green “Add to Slack” button and follow the prompts.

Finally, return to Jenkins, navigate to the “Slack notifications” section again and click the “Test Connection” button near the bottom of the page. You should now see a message from Jenkins appear in your nominated Slack channel or in your personal Slackbot.

Slackbot setup

If you would prefer to receive messages directly to your Slackbot, you should add your memberId to the “Channel / memberId” text field within the “Slack notifications” section. To find your memberId, go to “Profile & account” -> click on the vertical three-dot button -> your memberId should now be displayed.

Unexpected benefits

This simple Slack/Jenkins solution brought huge benefits to everyone on our team, not just the Devs and Testers. We were now able to build reports (in Splunk) off the back of these Jenkins build results so our Delivery team could now confidently communicate outages and downtime with senior management and our Vendors.

Uncertainty and decision making

I recently discovered the Farnam Street website and something about the podcast with Annie Duke really struck a chord with me. She’s an ex-professional poker player who now gives talks on decision making. It got me thinking about how I make my own decisions and how I can make better ones.

Difficult decisions are always based on missing information. If you knew an outcome was guaranteed, there wouldn’t really be a decision to make. I think we should reframe how we approach discussions about complex decisions so that it’s front and centre in people’s minds that we’re dealing with imperfect knowledge.

Instead of thinking of decisions (and their outcomes) in terms of good and bad, we should talk about them in terms of better and worse. Good/bad decisions imply certainty, which is very rarely the case. Changing the conversation, so people talk about decisions as being better or worse, helps clarify in people’s minds that we’re dealing with a lack of information which will invariably result in compromises or trade-offs.

The real world is full of uncertainty and our discussions should accurately reflect this reality. Peddling certainty is something best left to religion and salespeople!

The “403 Forbidden” card game

There is a subtle but very important difference between naming something and explaining something.

I’m sure you’ve heard people use the phrase “technical work” in lots of different situations but have you ever stopped to question what it really means? Could you explain “technical work” to someone else in plain, simple language? Could you do it without using the words “technical” or “work”? Now how many of these familiar-sounding terms do you think you could explain to a friend in 90 seconds?

Naming URL and JSON components

These are some simple visuals I quickly threw together. I hope they will help to improve conversations which will lead to better-informed decisions. In IT we have a habit of using lots of names for exactly the same thing, which frequently leads to longer and more confusing conversations! I hope the visual breakdown of URLs and JSON objects into their constituent parts will be of benefit to newbies and non-techies alike.

Creating test data with Calabash

On returning to my old iOS team after a brief stint working on our services team, I wanted to implement what I had learned there and improve the stability of our Calabash scenarios. I wanted to shift away from our dependence on test data stored in config files and databases and create test data at the same time as executing our Calabash scenarios.

The whole process was surprisingly simple and straightforward. We used the “Rest-Client” gem and then called some simple data manager classes by using cucumber’s built-in hooks.

Annotating Test notes

I’m lucky in my current role to have the independence to test how I see fit. I don’t have somebody forcing me to use a tool that doesn’t work for me. After weeks of plaguing my boss to pay for a JIRA Capture licence, I eventually got my way!! I had used JIRA Capture Test Sessions briefly at a previous company and found it to be a simple but very effective tool for capturing testing notes.
