Building a reminder bot with Elixir

Introduction

It is Friday and our sprint has ended: not the way we wanted it to end, but it has ended. We have a few hours left before we close our laptops and get ready to enjoy the weekend, and it is time for our retro. While discussing the sprint, we realised that our pull requests were costing us a good part of our productive time.

We are a startup with a microservice-based architecture, which means that we work across several git repositories and have multiple pull requests spread across them.

As a result, we kept forgetting which PRs were open and which had to be reviewed, and we kept going back and forth between reviewing and applying review changes. The problem was most visible at the end of a sprint, when we had to review the PRs, merge them, test on staging and deploy.

Elixir to the rescue!

We recognized the problem but didn't plan any solution, so we moved on.

The weekend was almost here and I didn't have any plans, so I started wondering whether I could automate this pull request problem with a side project, just for fun.

I'm a fan of Elixir, and I wanted a use case where I could use Supervisors, GenServers, and OTP and the BEAM in general, so that I could understand them better; I like Erlang/Elixir's philosophy and architecture. So I went the Elixir route and decided to keep third-party libraries to a minimum, especially for the scheduling part, which is discussed later.

Architecture overview

Our repositories are hosted on GitHub and we use Slack for communication. This means the bot has to gather information about our pull requests from GitHub, transform it, and send it to a specific Slack channel. Both services provide an API, so the bot can talk to each of them. The most important part of the application is the scheduling of its actions:

When should it query the pull requests?
When should it forward them to the slack channel?
Should it forward everything it queries?
How can we persist knowledge about the queried results and use it for something other than forwarding it to Slack?

These are the questions I will try to answer by explaining the structure of the bot I built and the reasoning behind my choices.

Dependencies

Since I wanted to stick with "plain" Elixir, there aren't many dependencies. These are the ones I used:

Jason - for JSON serialization
Finch - HTTP client
Mox - for mocking in tests

GitHub API

The first step for the bot is to fetch the open pull requests using the GitHub API.

To do so, I had to create a GitHub App, which gives the bot access to the API with its own access token. I should mention that GitHub can also issue expiring tokens, which would have to be renewed with a refresh token each time.
I wanted to keep the bot simple and there was no real need for refreshing tokens, so I disabled this option in the web dashboard.

*** The procedure for generating a token is described here. Generating the token is not part of the bot's architecture or implementation, so I chose to skip it.

Part 1: PRWorker

The general idea is that, for a specific repository, you make a GET request to the API using your access token. The application can handle multiple repositories, but the concept is the same.
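As a rough illustration of that GET request, here is a minimal sketch that only builds the request pieces for one repository, assuming repositories are represented as {owner, repo} tuples (the module and function names are hypothetical, not the bot's Http module). The endpoint and the Accept / X-GitHub-Api-Version headers follow GitHub's documented REST conventions; GitHub also rejects requests that carry no User-Agent header, so one is included.

```elixir
defmodule GitHubRequestSketch do
  # Hypothetical helper: builds {method, url, headers} for
  # GitHub's REST endpoint GET /repos/{owner}/{repo}/pulls.
  @api "https://api.github.com"

  def open_pulls_request({owner, repo}, token) do
    url = "#{@api}/repos/#{owner}/#{repo}/pulls?state=open"

    headers = [
      {"accept", "application/vnd.github+json"},
      {"authorization", "Bearer #{token}"},
      {"x-github-api-version", "2022-11-28"},
      # GitHub requires a User-Agent; the value here is arbitrary.
      {"user-agent", "chronbot"}
    ]

    {:get, url, headers}
  end
end
```

The resulting tuple can be handed to Finch, for example `Finch.build(method, url, headers) |> Finch.request(MyFinch)`, using the MyFinch pool started by the root supervisor.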

To do this, I created two modules, PrWorker and Http. PrWorker is a GenServer responsible for everything related to pull requests, which in our case just means fetching them from GitHub. Http abstracts the HTTP calls, using Finch under the hood.

Since PrWorker is a GenServer, it has to run under a supervisor. My first thought was to use the application's root Supervisor, but since I will eventually add more "workers", it is better to have a separate Supervisor that handles all similar (worker) GenServers.

So I ended up with this:

						
defmodule Chronbot.Supervisor do
	use Supervisor

	def start_link(opts) do
		Supervisor.start_link(__MODULE__, :ok, opts)
	end

	@impl true
	def init(:ok) do
		children = [
			{Finch, name: MyFinch},
			{Chronbot.Supervisors.WorkerSupervisor, name: Chronbot.Supervisors.WorkerSupervisor},
			# rest genservers we will add later
		]

		Supervisor.init(children, strategy: :one_for_one)
	end
end

						
defmodule Chronbot.Supervisors.WorkerSupervisor do
	use Supervisor
	require Logger
	alias Chronbot.Supervisors.SupervisorHelper
	@module __MODULE__

	def start_link(opts) do
		Supervisor.start_link(@module, :ok, opts)
	end

	@impl true
	def init(:ok) do
		children = [
			{Chronbot.Workers.PrWorker, name: Chronbot.Workers.PrWorker}
		]
		worker_names = SupervisorHelper.do_get_worker_names(children)
		Logger.info("Initiating Workers: #{worker_names}")
		Supervisor.init(children, strategy: :one_for_one)
	end
end
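The WorkerSupervisor calls SupervisorHelper.do_get_worker_names/1, which the article doesn't show. A minimal sketch of what it might look like, assuming children are declared as {Module, opts} tuples or bare module names, as in the list above:

```elixir
defmodule Chronbot.Supervisors.SupervisorHelper do
  # Hypothetical sketch: turns child specs such as {Mod, opts} or Mod
  # into a comma-separated string of module names for logging.
  # Map-style child specs are not handled.
  def do_get_worker_names(children) do
    Enum.map_join(children, ", ", fn
      {module, _opts} -> inspect(module)
      module when is_atom(module) -> inspect(module)
    end)
  end
end
```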

Splitting the supervisors this way allowed me to separate the concerns of the application.

The WorkerSupervisor deals only with workers, so it is responsible for restarting them if they fail, while the root Supervisor supervises the WorkerSupervisor itself. This keeps the root Supervisor clean and the supervision tree more structured.
Now that the workers' supervision is in place, I can make the calls to the API. I chose to make the first call as soon as the GenServer starts.

						
@impl true
def init(:ok) do
	state = %{
		"repos" => do_build_repo_pairs(),  # which repositories we will check for
		"checkup_period" => Application.fetch_env!(:chronbot, :checkup_period) # how often should we check
	}
	Process.send(self(), :list, [])  # IMPORTANT
	{:ok, state}
end
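The body of do_build_repo_pairs/0 isn't shown either. One possible sketch, assuming the repositories are configured as "owner/repo" strings (for example under a hypothetical `config :chronbot, repos: [...]` key) and turned into {owner, repo} tuples; the caller would pass `Application.fetch_env!(:chronbot, :repos)`:

```elixir
defmodule RepoPairsSketch do
  # Hypothetical: converts ["owner/repo", ...] into [{"owner", "repo"}, ...].
  # An entry without a "/" raises a MatchError, failing loudly on bad config.
  def do_build_repo_pairs(repos) do
    Enum.map(repos, fn full_name ->
      [owner, repo] = String.split(full_name, "/", parts: 2)
      {owner, repo}
    end)
  end
end
```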

The next line is a vital part of the flow:

						
Process.send(self(), :list, [])

For people not familiar with Erlang and Elixir: processes (which is what GenServers are) communicate only through messages.
In this case the process sends a message to itself, which is handled by implementing one of the GenServer callbacks, handle_info/2.

					
	@impl true
	def handle_info(:list, state) do
		Map.get(state, "repos")
		|> Stream.map(fn repository_tuple -> 
			Task.async(
				fn -> 
					do_fetch_pull_requests(repository_tuple) 
					|> do_handle_fetch_response()
				end) 
		end)
		|> Enum.to_list()
		|> Task.await_many()
		|> Enum.each(fn items -> PRQueue.push(items, :pr) end)

		# checkup_period is in seconds; send_after/4 expects milliseconds
		Process.send_after(self(), :list, Map.get(state, "checkup_period") * 1000, [])

		{:noreply, state}
	end

The code above will:

make a request to the GitHub API to fetch the open pull requests of each repository, running one Task per repository concurrently
map the results to structs
and, if the response is OK, push them to the PRQueue, which we will see in the next part.

After that, a simple yet magical thing happens, which lets the app schedule tasks without the third-party library most languages would need. Since GenServers are processes, it schedules a message to itself using

&Process.send_after/4

This means that the handler will run again after the specified interval, without the developer having to worry about how to schedule it. OTP and BEAM magic!
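To see the self-message pattern and the send_after loop in isolation, here is a small standalone GenServer (hypothetical, not part of the bot) that sends itself a first :tick from init/1 and re-arms a timer on every tick:

```elixir
defmodule TickerDemo do
  use GenServer

  def start_link(interval_ms), do: GenServer.start_link(__MODULE__, interval_ms)
  def ticks(pid), do: GenServer.call(pid, :ticks)

  @impl true
  def init(interval_ms) do
    # Delivered right after init/1 returns, so start_link/1 is not blocked.
    send(self(), :tick)
    {:ok, %{interval: interval_ms, ticks: 0}}
  end

  @impl true
  def handle_info(:tick, state) do
    # The periodic work (fetching PRs, in the bot's case) would go here.
    new_state = %{state | ticks: state.ticks + 1}

    # Re-arm the timer once the work is done. A GenServer handles one
    # message at a time, so two runs never overlap.
    Process.send_after(self(), :tick, state.interval)
    {:noreply, new_state}
  end

  @impl true
  def handle_call(:ticks, _from, state), do: {:reply, state.ticks, state}
end
```

Because the timer is re-armed at the end of each run, the interval is measured from when the previous run finished, so the schedule drifts slightly over time; for a reminder bot that is harmless.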

One important thing to note: if GitHub responds that the token is invalid, the application shuts down gracefully using :init.stop(). I made this decision because retrying is pointless; there will never be a successful response.
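A sketch of how the fetch response could be mapped to structs and how an invalid token could be surfaced to the caller. It assumes the HTTP layer returns {:ok, %{status: ..., body: ...}} with the JSON body already decoded (for instance with Jason.decode!/1); the struct and its field names are hypothetical, although the JSON keys are the ones GitHub returns for a pull request.

```elixir
defmodule PullRequestSketch do
  # Hypothetical struct for a fetched pull request.
  defstruct [:id, :title, :url, :author, :created_at]

  # 200 with a list body: map every GitHub JSON object to a struct.
  def handle_fetch_response({:ok, %{status: 200, body: prs}}) when is_list(prs) do
    {:ok, Enum.map(prs, &from_github/1)}
  end

  # 401: the token is invalid, so retrying will never succeed.
  def handle_fetch_response({:ok, %{status: 401}}), do: {:error, :unauthorized}
  def handle_fetch_response(other), do: {:error, other}

  def from_github(pr) do
    %__MODULE__{
      id: pr["id"],
      title: pr["title"],
      url: pr["html_url"],
      author: get_in(pr, ["user", "login"]),
      created_at: pr["created_at"]
    }
  end
end
```

A caller such as PrWorker could then match on {:error, :unauthorized}, log it and call :init.stop().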

Part 2: Queue & Database

The next step is to decide how to handle the fetched pull requests. One route is to forward the created structs immediately to the Slack channel. The other route, the one I chose, is to push the records to a queue and let pushing to and pulling from the queue happen asynchronously, at different time intervals.
The queue stores only pull request structs, so I called it PRQueue.
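A minimal in-memory sketch of such a queue, without the database part. It mirrors the push(items, :pr) cast and the draining get_all call used in this article, but takes the queue pid explicitly and stores pull requests in a map keyed by id instead of a list, which is one way of getting the same de-duplication; all names here are hypothetical.

```elixir
defmodule PRQueueSketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Asynchronous push: the caller does not wait for the queue.
  def push(pid, items, :pr), do: GenServer.cast(pid, {items, :pr})

  # Synchronous drain: returns everything and empties the queue.
  def get_all(pid), do: GenServer.call(pid, :get_all)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({items, :pr}, state) do
    # Keyed by id, so a re-fetched PR replaces its older copy.
    {:noreply, Enum.reduce(items, state, fn pr, acc -> Map.put(acc, pr.id, pr) end)}
  end

  @impl true
  def handle_call(:get_all, _from, state), do: {:reply, Map.values(state), %{}}
end
```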

This again keeps the application's concerns decoupled. One thing the application has to do (which is not implemented yet) is collect statistics about the pull requests (how many we closed in the sprint, the average number of days each PR stays open, etc.). This means I had to persist some information about the pull requests, and the first logical thought is to use a database such as PostgreSQL, MySQL, SQLite and so on.

But this is a fun side project, nothing serious, and those options seemed like overkill. That's why I went with Mnesia. To be honest, I was impressed when I learnt that Erlang has its own built-in database, and the same goes for its built-in in-memory storage, ETS.
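For readers who haven't used Mnesia, here is a sketch of a pull request table and a function in the spirit of DB.add_pull_requests/1, assuming records only need an id, title and created_at (the table layout and module name are hypothetical). It creates an in-memory (ram_copies) table, so nothing is written to disk; in a Mix project, :mnesia should also be listed under extra_applications.

```elixir
defmodule MnesiaSketch do
  @table :pull_request

  def setup do
    :ok = :mnesia.start()

    # Default storage is ram_copies; the first attribute is the key.
    case :mnesia.create_table(@table, attributes: [:id, :title, :created_at]) do
      {:atomic, :ok} -> :ok
      {:aborted, {:already_exists, @table}} -> :ok
    end

    :ok = :mnesia.wait_for_tables([@table], 5_000)
  end

  # Writing the same id twice overwrites the record (the table is a set).
  def add_pull_requests(prs) do
    :mnesia.transaction(fn ->
      Enum.each(prs, fn pr ->
        :mnesia.write({@table, pr.id, pr.title, pr.created_at})
      end)
    end)
  end

  def get(id), do: :mnesia.transaction(fn -> :mnesia.read(@table, id) end)
end
```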

So now we have a new GenServer that works as a queue, where you push and get pull requests. It is also responsible for adding them to the database under the hood.
But before adding the records to the database, I had to figure out how to deal with the state of the queue. Fetching happens continuously at a configurable interval, which means the same pull requests can appear in the API response multiple times. This turned out to be easy to fix, since each object returned by the API has an ID, which makes duplicates easy to filter out.

Pushing to the queue, and thus to the database, is implemented like this:

					
	@impl true
	def handle_cast({items, :pr}, state) do
		new_state =
			synchronize_new_pull_requests(items, state)
			|> DB.add_pull_requests()
			|> do_handle_db_add_result()

		{:noreply, new_state}
	end

	defp synchronize_new_pull_requests(new_pull_requests, current_prs) do
		new_pr_ids = MapSet.new(new_pull_requests, fn pr -> pr.id end)
		# drop queued pull requests that are also in the new batch, so we always keep the latest version
		filtered_current_pull_requests =
			Enum.reject(current_prs, fn current_pull_request ->
				MapSet.member?(new_pr_ids, current_pull_request.id)
			end)

		new_pull_requests ++ filtered_current_pull_requests
	end

Now I know that there are no duplicate records in the queue and that they are also stored in the database. This architecture also let me deal with failures in different parts of the pipeline effectively, because I can isolate the part that causes a problem and fix it on its own.

Part 3: Forwarding to Slack

The records are fetched and stored, and now a post has to be sent to the Slack channel as a reminder that "this PR is open and someone has to review it". As with the GitHub API, I had to create a Slack app so I could use its API, and then install it in my organization's workspace, which again I won't explain here. Here is the needed documentation.

So I created one more GenServer, the SlackManager, and added it under the ManagerSupervisor. ManagerSupervisor is the Supervisor that also has CredentialsManager as a child, which I used to store the API keys and tokens. This means the Http module asks CredentialsManager for the token it needs whenever it accesses the GitHub or Slack API.
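The CredentialsManager could be as small as the following sketch. The article uses a GenServer; an Agent is swapped in here because the process only holds state. The module name, the map shape and the service keys are assumptions (the map could be built from System.get_env/1 at startup).

```elixir
defmodule CredentialsSketch do
  use Agent

  # credentials: a map such as %{github: "...", slack: "..."}.
  def start_link(credentials, opts \\ []) do
    Agent.start_link(fn -> credentials end, opts)
  end

  # Returns {:ok, token} or :error, like Map.fetch/2.
  def get(pid, service), do: Agent.get(pid, &Map.fetch(&1, service))
end
```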

SlackManager is the part of the system that connects the application with Slack, so every interaction with Slack goes through this GenServer. Since, for now, the application only posts messages about open pull requests, it has only one function for that.

So now, when the SlackManager GenServer starts, it sends a message to itself

						
		Process.send_after(self(), :check, 2000, [])

which will trigger a function that pulls everything from the queue.

						
	@impl true
	def handle_info(:check, state) do
		if should_forward?() do
			PRQueue.get_all()
			|> Stream.filter(fn pull_request ->
				{:ok, created_at, _} = DateTime.from_iso8601(pull_request.created_at)
				diff_in_seconds = DateTime.diff(DateTime.now!("Etc/UTC"), created_at)
				# post a reminder only for PRs that have been open for more than 2 days
				diff_in_seconds > 2 * 86_400
			end)
			|> Stream.map(fn pull_request -> Task.async(fn -> post_pr_reminder(pull_request) end) end)
			|> Enum.to_list()
			|> Task.await_many()
		end

		# remind only 3 times each day MAXIMUM
		Process.send_after(self(), :check, 3600 * 3 * 1000, [])

		{:noreply, state}
	end

Before pulling the records from the queue, it checks whether it should do so, i.e. whether the function is running within our working hours:

						
	defp should_forward?() do
		is_work_time?() && is_work_day?()
	end

	defp is_work_day?() do
		# 6 = Saturday
		# 7 = Sunday
		# The messages should be sent only from Monday to Friday
		DateTime.now!("Etc/UTC") |> Date.day_of_week() < 6
	end

	defp is_work_time?() do
		now = Time.utc_now()
		work_hour_start = Time.new!(7, 0, 0)
		work_hour_end = Time.new!(15, 0, 0)

		Time.diff(now, work_hour_start) > 0 && Time.diff(now, work_hour_end) < 0
	end
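Because should_forward?/0 reads the clock itself, it is hard to test. A sketch of a variant that receives the current time as an argument (the module name is hypothetical), keeping the same Monday-to-Friday, 07:00 to 15:00 UTC window:

```elixir
defmodule WorkHoursSketch do
  @start ~T[07:00:00]
  @finish ~T[15:00:00]

  def should_forward?(%DateTime{} = now) do
    work_day?(now) and work_time?(DateTime.to_time(now))
  end

  # Date.day_of_week/1 returns 1 for Monday ... 7 for Sunday.
  defp work_day?(now), do: Date.day_of_week(DateTime.to_date(now)) < 6

  # Strictly after 07:00 and strictly before 15:00.
  defp work_time?(time) do
    Time.compare(time, @start) == :gt and Time.compare(time, @finish) == :lt
  end
end
```

In SlackManager this would be called as `WorkHoursSketch.should_forward?(DateTime.utc_now())`, while tests can pass fixed timestamps.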

If it is indeed within our working hours, it pulls everything from the queue (which empties it) and runs &post_pr_reminder/1 for each pull request that has been open long enough. Its job is to build the message (using an EEx template, which lets us create a message in Slack's markdown-like mrkdwn format) and then send it with an HTTP call.
Whether or not anything was forwarded, it then uses Process.send_after/4 to schedule the next check.
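The EEx template isn't shown in the article, so the following is only a sketch of what post_pr_reminder/1 might render, assuming pull requests with url, title, author and created_at fields (hypothetical names). Slack's mrkdwn link syntax is <url|text>, and Slack asks for &, < and > in message text to be escaped, which the helper does for titles. The resulting map would be JSON-encoded and POSTed to Slack's chat.postMessage Web API method.

```elixir
defmodule ReminderTemplateSketch do
  # Hypothetical template; EEx ships with Elixir.
  @template """
  *Open pull request:* <<%= url %>|<%= title %>>
  Author: <%= author %>, opened at <%= created_at %>
  """

  def render(pr) do
    EEx.eval_string(@template,
      url: pr.url,
      title: escape(pr.title),
      author: pr.author,
      created_at: pr.created_at
    )
  end

  # Body for chat.postMessage (POST https://slack.com/api/chat.postMessage).
  def post_message_body(channel, pr), do: %{channel: channel, text: render(pr)}

  # Escape "&" first so the entities added afterwards are not re-escaped.
  defp escape(text) do
    text
    |> to_string()
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end
end
```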

To keep the bot from spamming the channel, I used the following configuration:

Fetch pull requests: every 20 minutes
Post a message to the channel: every 3 hours, which means that within our 8-hour working window it posts two or three times, depending on when the timer happens to fire.
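In config terms, the fetch interval could look like the following sketch. checkup_period is the key PrWorker reads in init/1; its value is assumed to be in seconds, because the scheduling call in handle_info(:list, ...) multiplies the interval by 1000 before calling Process.send_after/4. The 3-hour posting interval is hardcoded in SlackManager.

```elixir
# config/config.exs (sketch)
import Config

# Fetch open pull requests every 20 minutes (value in seconds).
config :chronbot, checkup_period: 20 * 60
```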

The resulting architecture looks like this:

[Figure: Architecture]

Wrapping up

That's the end of the article, and I would like to share my thoughts on this small adventure. First, I'm sure there are proper, effective ways to fix the problem we had, and they are obviously much better than my fun project.

This small bot was created mostly for me, as a use case that would let me explore the Erlang/Elixir OTP ecosystem better, and I can say that I did learn more about it, especially about Mnesia and supervision trees.

I built a bot with scheduling, HTTP communication and markdown message generation, using mostly the standard library, and in my opinion this shows how powerful Erlang and Elixir are.

Maybe I will add more features to the bot, maybe I will forget about it and move on to the next side project.
Whatever happens, I know that I improved my Elixir/Erlang skills and advanced my knowledge of distributed and concurrent systems in general.
It was even more enjoyable because I did it with Elixir.

Thanks for reading!