I recently built a recurring report that required matching a bunch of data from an external API to data from a local database, and I wanted to show you a good way to do that.

Say your external API returns data that looks something like this:

%{ "transactions" => [ %{"amount" => 1948, "id" => "8195dd0b", "user_id" => 1}, %{"amount" => 2048, "id" => "04226e26", "user_id" => 4}, %{"amount" => 482, "id" => "24bc99b8", "user_id" => 2}, %{"amount" => 2304, "id" => "cc86e26b", "user_id" => 1}, %{"amount" => 394, "id" => "f58dbc79", "user_id" => 3}, %{"amount" => 4028, "id" => "562ef532", "user_id" => 4}, ] } 

Let’s say you need to create a report with the user’s full name with the amount for each transaction, and the users in your database look something like this:

[ %User{id: 1, first_name: "Harry", last_name: "Potter"}, %User{id: 2, first_name: "Ginny", last_name: "Weasley"}, %User{id: 3, first_name: "Ron", last_name: "Weasley"}, %User{id: 4, first_name: "Hermione", last_name: "Granger"} ] 

Get the user from the database

In order to make this work, your code might look something like this:

Enum.map(transactions, fn txn -> user = Repo.get(User, txn["user_id"]) %{ amount: txn["amount"], user: "#{user.first_name} #{user.last_name}" } end) 

This would work, but you’re going to be running a database query for every transaction that gets returned. That’s not a huge deal if there are only the 6 transactions from the example, but if you’ve got 10,000 transactions in the list your database won’t be too happy.

Load all the users

Instead you could load all the users into memory and then find the matching user locally with code that looks like this:

users = Repo.all(User) Enum.map(transactions, fn txn -> user = Enum.find(users, &(&1.id == txn["user_id"])) %{ amount: txn["amount"], user: "#{user.first_name} #{user.last_name}" } end) 

The code looks simple, but it’s run time is quadratic (n²) in terms of big O complexity. Enum.map/2 will run once for every transaction in the list, and Enum.find/2 will run through all the users until it finds the one that matches for every transaction.

When the users are in a map, you can perform the user match in constant time, which means that the function as a whole will run in linear time.

Selecting database records to a map

If you ever provide a map to Enum.map/2 you’ll notice that the value you receive looks like this:

[{"key", "value"}, {"key2", "value2"}] 

You can convert that list of tuples back into a map using Enum.into/2:

Enum.into([{"key", "value"}, {"key2", "value2"}], %{}) # %{"key" => "value", "key2" => "value2"} 

Ecto’s select macro will allow you to get records from the database formatted in whatever way you choose, whether that be a list of structs (the default), a list of custom maps, a list of strings, a list of lists, or (our choice today) a list of tuples.

Because you want to search the user records by their ID, you’ll want a map with user IDs as keys and user records as values. In order to do that, select your records into a list of tuples and then convert that list into a map:

users = User |> select([u], {u.id, u}) |> Repo.all() |> Enum.into(%{}) 

So now your function for matching transactions with users will look like this:

Enum.map(transactions, fn txn -> user = Map.get(users, txn["user_id"]) %{ amount: txn["amount"], user: "#{user.first_name} #{user.last_name}" } end) 

While this won’t make much difference when working with small data sets, it’s good to keep this trick in mind to keep your function from becoming a bottleneck when your lists get a lot longer.

Talk to me

As always, let me know what you think of this post. Do you know a better way to do this? Would you like to see more tips like this? Let me know on Twitter!