Elixir Trickery: Cheating on Structs, And Why It Pays Off

While we can’t say cheating on anyone is okay, we’re not quite as absolutist when it comes to cheating on Elixir at times.

Structs are there for a reason (we’ll start from a brief overview), and that’s certainly not for us to cheat on them. But we can if we have to – and we’ll sometimes even justify that and get away with it!

Today’s article will come in handy especially for those who are interested in developing libraries for Elixir and making them usable across different dependency versions, which is always a problem when writing code intended to be pluggable into different applications.

Welcome to Elixir Trickery, a series of articles telling stories about utilizing little-known language features, applying out-of-the-box thinking to programming, and going inventive and creative in your coding.

Introduction to Structs

What is a struct? According to Elixir’s Getting Started tutorial, “Structs are extensions built on top of maps that provide compile-time checks and default values.” So first there are maps, one of Elixir’s basic data structures, which provide a means to store key-value pairs. Just to recap, or to show them off:

# Defining a map
> map = %{:key => :value}
%{key: :value} # alternative syntax when a key is an Atom

# Retrieving a value from a map
> Map.get(map, :key)
:value
> map.key
:value
> map[:key] # This is the Access behaviour - we'll talk about it later
:value

# Trying to retrieve nonexistent key
> Map.get(map, :foo)
nil
> map.foo
** (KeyError) key :foo not found in: %{key: :value}
> map[:foo]
nil

Values can be retrieved from maps in three ways: Map.get/3 (the optional third argument is a default value), the “dot” syntax (which is, as you can see, quite strict, because it fails when the given key isn’t in the map), and the [] syntax, which is courtesy of Elixir’s Access behaviour; we’ll return to that.
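To make the optional default concrete, here is a minimal sketch. Map.fetch/2 is not mentioned above; it is included only as a stricter alternative that tells a missing key apart from a key holding nil.

```elixir
map = %{key: :value, empty: nil}

# Map.get/3 returns the default only when the key is absent
IO.inspect(Map.get(map, :foo, :fallback))   # :fallback
IO.inspect(Map.get(map, :empty, :fallback)) # nil, because the key exists

# Map.fetch/2 distinguishes a missing key from a key holding nil
IO.inspect(Map.fetch(map, :empty)) # {:ok, nil}
IO.inspect(Map.fetch(map, :foo))   # :error
```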

When it comes to updating maps, you’re really not doing what you might be used to in all sorts of different languages, because – since values in Elixir are immutable – you’re creating a new map.

# Returning a new map with a new key, or an updated value under :key
> Map.put(map, :new_key, :new_value)
%{key: :value, new_key: :new_value}
> Map.put(map, :key, :new_value)
%{key: :new_value}

# Merging maps
> Map.merge(map, %{new_key: :new_value, foo: :bar})
%{foo: :bar, key: :value, new_key: :new_value}

# Shorthand for returning a new map with updated value under :key
> %{map | key: :new_value}
%{key: :new_value}

# ...the shorthand doesn't work for putting new keys, though:
> %{map | new_key: :value}
** (KeyError) key :new_key not found in: %{key: :value}
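Since every "update" returns a new map, the original binding stays untouched until you rebind it. A small sketch of that, with variable names chosen only for illustration:

```elixir
original = %{key: :value}
updated = Map.put(original, :key, :new_value)

IO.inspect(original) # %{key: :value}, unchanged
IO.inspect(updated)  # %{key: :new_value}

# Rebinding the name is the usual way to "keep" a change
original = Map.put(original, :new_key, :new_value)
IO.inspect(original) # %{key: :value, new_key: :new_value}
```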

Maps can be pattern matched on:

# Pattern matching on a map
> %{key: matched_value} = map
%{key: :value}
> matched_value
:value

# Pattern matching on a map in a function argument
> function = fn %{key: matched_value} ->
>   String.upcase(matched_value)
> end
#Function<6.128620087/1 in :erl_eval.expr/5>
> function.(%{key: "Awesome!"})
"AWESOME!"

The pattern matching part is particularly awesome, because you can pattern match on nested maps as well:

> map = %{outer_key: :outer_value, inner_map: %{inner_key: :inner_value}}
> %{outer_key: outer_match, inner_map: %{inner_key: inner_match}} = map
> inner_match
:inner_value
> outer_match
:outer_value

Finally, Structs!

Now, structs are an extension of maps. Defining a struct like this:

defmodule CuriosumTime do
  defstruct [:hour, :minute, :second]
end

…allows you to create maps on steroids, that is, maps that must only contain specific keys. In this case, we’ve created a module named CuriosumTime, which uses the Kernel.defstruct/1 macro to define a set of fields that all structs following the CuriosumTime contract will be restricted to. How to use this restriction? Here’s an example:

> time1 = %CuriosumTime{hour: 21, minute: 37, second: 42}
%CuriosumTime{hour: 21, minute: 37, second: 42}

# Missing values will be filled with nil
> time2 = %CuriosumTime{}
%CuriosumTime{hour: nil, minute: nil, second: nil}

# Unknown keys will be rejected
> %CuriosumTime{foo: 1}
** (KeyError) key :foo not found

As you can see, default values for defined struct keys are nil, unless you use defstruct with a keyword list:

defstruct [hour: 12, minute: 0, second: 0] # [] can be omitted

…so that these will default to what you’ve specified. So when you access the structs’ keys, the following will be returned:

> time1.hour
21

> time2.hour
nil
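For comparison, here is a sketch of a struct that mixes nil defaults with explicit ones. DefaultedTime is a hypothetical module, not part of the examples above; struct/2 is used instead of the %DefaultedTime{} literal so that the snippet also runs as a single script file.

```elixir
defmodule DefaultedTime do
  # Hypothetical variant of CuriosumTime: :label defaults to nil,
  # the remaining fields get explicit defaults
  defstruct [:label, hour: 12, minute: 0, second: 0]
end

# struct/2 builds the struct at runtime
noon = struct(DefaultedTime)
IO.inspect(noon.hour)  # 12
IO.inspect(noon.label) # nil

half_past = struct(DefaultedTime, minute: 30)
IO.inspect({half_past.hour, half_past.minute}) # {12, 30}
```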

The standard way to retrieve values under struct keys is to use the dot syntax, because it raises an error when you try to retrieve the value of a nonexistent key. You can also use Map.get/3 if you need to. How about the [] syntax, though?

> time1[:hour]
** (UndefinedFunctionError) function CuriosumTime.fetch/2 is undefined (CuriosumTime does not implement the Access behaviour)
    CuriosumTime.fetch(%CuriosumTime{hour: 21, minute: 37, second: 42}, :hour)

This is because the [] syntax goes through Elixir’s Access module, which for a struct calls the fetch/2 function of the struct’s module, here CuriosumTime.fetch/2, and fetch/2 is a callback of the Access behaviour. For a struct to be accessible with [], you need to implement this behaviour in your struct’s module, which means e.g. defining the fetch/2 function. We won’t go into much detail on it, but a library named StructAccess can serve as an example of how you can do that.
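A minimal sketch of implementing the Access callbacks by hand could look like the following. AccessibleTime is a hypothetical module, this is not how StructAccess itself is written, and the choice to reset a popped field to nil is an assumption made for illustration.

```elixir
defmodule AccessibleTime do
  # Hypothetical struct that opts into the [] syntax
  @behaviour Access
  defstruct [:hour, :minute, :second]

  @impl Access
  def fetch(time, key), do: Map.fetch(time, key)

  # Note: this simple version does not reject keys unknown to the struct
  @impl Access
  def get_and_update(time, key, fun), do: Map.get_and_update(time, key, fun)

  # A struct cannot drop its fields, so "popping" resets the field to nil
  @impl Access
  def pop(time, key), do: {Map.get(time, key), Map.replace(time, key, nil)}
end

time = struct(AccessibleTime, hour: 21, minute: 37, second: 42)
IO.inspect(time[:hour]) # 21
IO.inspect(time[:foo])  # nil

updated = put_in(time[:hour], 10)
IO.inspect(updated.hour) # 10
```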

To cap off our brief introduction to structs, let’s stress that you can also pattern match on a value being of a specific struct type:

def process_time(%CuriosumTime{} = time) do # our custom time struct
  # ...
end

def process_time(%Time{} = time) do # Elixir's native time struct
  # ...
end

This is useful for cases where you need a single function to process differently structured data.
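As a fuller sketch of that idea, the hypothetical TimeUtils module below handles both struct types in one function; the function name and the conversion to seconds are illustrative assumptions.

```elixir
defmodule CuriosumTime do
  defstruct [:hour, :minute, :second]
end

defmodule TimeUtils do
  # Clause for our custom struct: compute from its fields
  def seconds_since_midnight(%CuriosumTime{hour: h, minute: m, second: s}) do
    h * 3600 + m * 60 + s
  end

  # Clause for Elixir's native struct: delegate to the Time module
  def seconds_since_midnight(%Time{} = time) do
    Time.diff(time, ~T[00:00:00])
  end
end

custom = struct(CuriosumTime, hour: 21, minute: 37, second: 42)
IO.inspect(TimeUtils.seconds_since_midnight(custom))       # 77862
IO.inspect(TimeUtils.seconds_since_midnight(~T[21:37:42])) # 77862
```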

Lastly, and this matters for our further reasoning, internally a struct is just a map with the __struct__ key referring to a specific module. Simple, ain’t it?

> time = %CuriosumTime{hour: 10, minute: 0, second: 0}
> time.__struct__
CuriosumTime

Pattern matching: %StructName{} vs. %{__struct__: StructName}

As we’ve noted, in Elixir, the defstruct construct is used to define a specific structure that describes which keys a Map must contain, as well as their default values. For example:

> defmodule Dog, do: defstruct breed: :mongrel, age: nil
> dog = %{__struct__: Dog, age: 5, breed: :husky}
%Dog{age: 5, breed: :husky}

What’s underlying is just an ordinary Map where Dog is put under the :__struct__ key. This means that you can match it with both of the following syntaxes:

> %Dog{} = dog
%Dog{age: 5, breed: :husky}
> %{__struct__: Dog} = dog
%Dog{age: 5, breed: :husky}

Is the %Dog{} syntax just a syntactic sugar, then? Well, not exactly. Suppose you have an animal variable and you want to check whether it is a Dog or a Cat… but you don’t have the Cat struct defined yet.

> case animal do
>   %Dog{} -> IO.puts("Woof!")
>   %Cat{} -> IO.puts("Meow!")
> end
** (CompileError) iex:37: Cat.__struct__/0 is undefined, cannot expand struct Cat

> case animal do
>   %{__struct__: Dog} -> IO.puts("Woof!")
>   %{__struct__: Cat} -> IO.puts("Meow!")
> end
Woof!
:ok

See the difference? The %Cat{} syntax introduces an additional compile-time check that the matched struct actually exists, while when simply matching the __struct__ key, Dog and Cat are just plain Erlang atoms!

This can make a huge difference when developing a library that needs to be compatible with multiple versions of a dependency. For instance, when dealing with an Ecto.Query’s from key, which was a tuple in Ecto 2, but is an Ecto.Query.FromExpr struct (undefined in Ecto 2) from Ecto 3 on.
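A version-agnostic matcher for that case might be sketched as below. FromSource is a hypothetical helper, the sample values are plain maps standing in for real queries, and the assumption that the FromExpr struct keeps the tuple under a :source field should be checked against the Ecto version you target.

```elixir
defmodule FromSource do
  # Ecto 3 style (assumed shape): `from` is a struct. Matching on the
  # :__struct__ key treats Ecto.Query.FromExpr as a plain atom, so this
  # compiles even where that module does not exist.
  def source(%{from: %{__struct__: Ecto.Query.FromExpr, source: source}}), do: source

  # Ecto 2 style: `from` is a {table, schema} tuple
  def source(%{from: {_table, _schema} = source}), do: source
end

# Stand-ins for queries, built without depending on Ecto
ecto2_like = %{from: {"users", nil}}
ecto3_like = %{from: %{__struct__: Ecto.Query.FromExpr, source: {"users", nil}}}

IO.inspect(FromSource.source(ecto2_like)) # {"users", nil}
IO.inspect(FromSource.source(ecto3_like)) # {"users", nil}
```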

Cheats (never) prosper

As we’ve proven that you can cheat on Elixir when it comes to using struct definitions, you can also do it with the keys of a defined struct. Consider the following example, where we define a struct that has an enforced key – note that it is merely a compile-time check and doesn’t come with any kind of validation, hence we’re able to do this:

defmodule Foo do
  @enforce_keys [:bar]
  defstruct @enforce_keys
end

good_foo = %Foo{bar: 1337} # OK
bad_foo = %Foo{} # error - enforced key missing
bad_foo = %Foo{bar: 1337, baz: 42} # error - key not found
cheat_foo = %{__struct__: Foo} # apparently OK!
cheat_foo = %{__struct__: Foo, bar: 1337, baz: 42} # apparently OK!

Fine, but where to look for practical applications of this hack? Library developers usually avoid removing keys when creating new library versions, but this may not always be the case. While it’s rare, it might turn out that an expected list of a struct’s fields, often representing e.g. configuration options, will have an item removed or renamed in a future library revision. This might not sound exciting, but, realistically, you could find it handy in the future when pattern matching against such structs.
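To sketch what that could look like, suppose a dependency renamed a configuration field between releases. LibConfig, ConfigReader and both field names are hypothetical and exist only for this illustration.

```elixir
defmodule LibConfig do
  # Stand-in for a dependency's struct in its newer release.
  # Imagine an older release named the field :legacy_timeout instead.
  defstruct [:timeout]
end

defmodule ConfigReader do
  # %LibConfig{legacy_timeout: t} would not compile against this version
  # of LibConfig; the plain-map patterns below compile against either one.
  def timeout(%{__struct__: LibConfig, timeout: t}), do: t
  def timeout(%{__struct__: LibConfig, legacy_timeout: t}), do: t
end

new_style = struct(LibConfig, timeout: 5_000)
old_style = %{__struct__: LibConfig, legacy_timeout: 3_000}

IO.inspect(ConfigReader.timeout(new_style)) # 5000
IO.inspect(ConfigReader.timeout(old_style)) # 3000
```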

Structs from Maps: Kernel.struct/2

When dealing with data coming from external sources, perhaps provided from an import or an external API, the need to sanitize the data often arises, and structs provide the basic means to do this.

So let’s suppose that you’ve parsed a dataset into a map. You can call Kernel.struct/2 to annotate it as a specific struct and, importantly, you can control how unknown keys are handled.

Specifically, there are two similar functions defined in Kernel. struct/2 will filter out keys undefined in the struct’s defstruct definition, and will not fail on missing keys listed in @enforce_keys. In contrast, struct!/2 is stricter: it fails on encountering an unknown key or when an enforced key is not present.

defmodule CuriosumTime do
  @enforce_keys [:hour, :minute, :second]
  defstruct @enforce_keys
end

Since @enforce_keys is just a module attribute, you can directly reuse it in defstruct/1; alternatively, you can just provide a plain list, if you only want specific keys to be enforced.

> data = %{hour: 12, minute: 30, millisecond: 45} # missing :second, extra :millisecond

> struct(CuriosumTime, data)
%CuriosumTime{hour: 12, minute: 30, second: nil}

> struct!(CuriosumTime, data)
** (KeyError) key :millisecond not found in: %CuriosumTime{hour: 12, minute: nil, second: nil}

> struct!(CuriosumTime, data |> Map.delete(:millisecond))
** (ArgumentError) the following keys must also be given when building struct CuriosumTime: [:second]

Interestingly, a well-adopted library for parsing JSON data named Poison contains a decode!/2 function that will do the struct wrapping for you directly from a JSON dataset when passing a specific :as option. However, it looks to be flawed. While the following examples indicate that it’s working:

defmodule CuriosumTime do
  defstruct [:hour, :minute, :second]
end

json = ~s([
  {
    "hour": 12,
    "minute": 30,
    "second": 40
  },
  {
    "hour": 23,
    "minute": 15,
    "second": 50
  }
])

> Poison.decode!(json)
[
  %{"hour" => 12, "minute" => 30, "second" => 40},
  %{"hour" => 23, "minute" => 15, "second" => 50}
]

> Poison.decode!(json, keys: :atoms)
[%{hour: 12, minute: 30, second: 40}, %{hour: 23, minute: 15, second: 50}]

> Poison.decode!(json, keys: :atoms, as: [%CuriosumTime{}])
[
  %CuriosumTime{hour: 12, minute: 30, second: 40},
  %CuriosumTime{hour: 23, minute: 15, second: 50}
]

…problems arise when trying to use @enforce_keys:

defmodule CuriosumTime do
  @enforce_keys [:hour, :minute, :second]
  defstruct @enforce_keys
end

> Poison.decode!(json)
# same as above

> Poison.decode!(json, keys: :atoms)
# same as above

> Poison.decode!(json, keys: :atoms, as: [%CuriosumTime{}])
** (ArgumentError) the following keys must also be given when building struct CuriosumTime: [:hour, :minute, :second]

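Something’s looking rather off here. Note that the error lists all three keys even though the JSON provides them, which suggests it is raised while building the bare %CuriosumTime{} passed in the :as option rather than while decoding. This is just to indicate that while Poison.decode!/2 is fine to use with most of its options, when it comes to creating structs and not just maps from your JSON data, it’s better to use Kernel.struct/2 to process data in a way that you control.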
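One way to keep that control is sketched below. StrictTime and from_decoded/1 are hypothetical names, and the input is hard-coded in the string-keyed shape that Poison.decode!(json) returns above, so the snippet does not depend on Poison itself.

```elixir
defmodule StrictTime do
  @enforce_keys [:hour, :minute, :second]
  defstruct @enforce_keys

  # Builds a struct from one string-keyed map of decoded JSON
  def from_decoded(%{} = decoded) do
    # Only known fields are looked up, so no atoms are created from input
    fields =
      for key <- @enforce_keys,
          {:ok, value} <- [Map.fetch(decoded, Atom.to_string(key))],
          do: {key, value}

    {:ok, struct!(__MODULE__, fields)}
  rescue
    # struct!/2 raises ArgumentError when an enforced key is missing
    error in ArgumentError -> {:error, Exception.message(error)}
  end
end

decoded = [
  %{"hour" => 12, "minute" => 30, "second" => 40},
  %{"hour" => 23, "minute" => 15, "extra" => true}
]

[first, second] = Enum.map(decoded, &StrictTime.from_decoded/1)
IO.inspect(first)  # an {:ok, struct} tuple
IO.inspect(second) # an {:error, message} tuple, because "second" is missing
```

Returning tagged tuples instead of raising is a design choice of this sketch; calling struct!/2 directly works just as well if you prefer failures to crash.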

To go further…

So we’ve discussed what Elixir has at its core about structs – they’re very useful and used extensively throughout all sorts of well-adopted libraries such as Ecto, where each object retrieved from the database is represented as a struct.

There are also several cool ways to build upon structs. As you may have noticed, structs are untyped, which means that our CuriosumTime struct can take :ten, "Ten" or anything as the hour. Hell, in fact, Elixir’s native Time struct also can. If you’re into typed structs, it might be worth having a look at a library named typed_struct, though be aware that it relies on typespecs, which are not a true replacement for the type systems known from statically typed languages.
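For reference, this is roughly what a hand-written typespec for such a struct could look like, without any library. TypedTime and valid?/1 are hypothetical; the spec documents intent for tools such as Dialyzer, while the runtime check is a separate, explicit function.

```elixir
defmodule TypedTime do
  defstruct [:hour, :minute, :second]

  # The typespec is not enforced when the struct is built
  @type t :: %__MODULE__{
          hour: 0..23 | nil,
          minute: 0..59 | nil,
          second: 0..59 | nil
        }

  # Explicit runtime validation, independent of the typespec
  @spec valid?(t) :: boolean
  def valid?(%__MODULE__{hour: h, minute: m, second: s}) do
    h in 0..23 and m in 0..59 and s in 0..59
  end
end

# Building a struct with nonsensical values still succeeds...
odd = struct(TypedTime, hour: :ten, minute: "Ten", second: 0)
IO.inspect(TypedTime.valid?(odd)) # false

# ...so validity has to be checked separately
fine = struct(TypedTime, hour: 10, minute: 0, second: 0)
IO.inspect(TypedTime.valid?(fine)) # true
```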

If you’ve got something interesting to add to the topic of structs – let us know and drop a comment below!
