DirtyInformation

Two Duplication Refactorings in Elixir

18 Jan 2017

The other day I was working on Elixir and preparing a PR when I noticed a little duplication in my changes.

  @doc """
  Converts the given string to CamelCase format.

  This function was designed to camelize language identifiers/tokens,
  that's why it belongs to the `Macro` module. Do not use it as a general
  mechanism for camelizing strings as it does not support Unicode or
  characters that are not valid in Elixir identifiers.

  ## Examples

      iex> Macro.camelize "foo_bar"
      "FooBar"

  """
  @spec camelize(String.t) :: String.t
  def camelize(string)

  def camelize(""),
    do: ""

  def camelize(<<?_, t::binary>>),
    do: camelize(t)

  def camelize(<<h, t::binary>>),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<?_, t::binary>>, _),
    do: do_camelize(t, ?_)

  defp do_camelize(<<h, t::binary>>, prev) when prev >= ?a and prev <= ?z and h >= ?A and h <= ?Z,
    do: <<h>> <> do_camelize(t, h)

  defp do_camelize(<<h, t::binary>>, prev) when prev == ?_ or prev == ?.,
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<?/, t::binary>>, _),
    do: <<?.>> <> do_camelize(t, ?.)

  defp do_camelize(<<h, t::binary>>, _),
    do: <<to_lower_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<>>, _),
    do: <<>>

The duplication is subtle, but if you take a close look you might notice it too. There are two clauses where the body of the function is exactly the same. Here are just those two copies.

  def camelize(<<h, t::binary>>),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<h, t::binary>>, prev) when prev == ?_ or prev == ?.,
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

These were pretty far apart in the original code, and I probably would not have noticed the duplication if I hadn’t originally had a third instance of the exact same body of code. Originally, instead of a guard clause on do_camelize, I had two different clauses.

  defp do_camelize(<<h, t::binary>>, ?_),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<h, t::binary>>, ?.),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

I went ahead and combined them with a guard, since the meat of the algorithm did not change from one clause to the other. I call this type of refactoring “Combined Pattern with Guard.” This is something that I had noticed and done in a few places in the past, so it was immediately apparent to me.
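As a minimal sketch of that refactoring outside of Macro, the hypothetical module below (the module name, function names, and return tuples are mine, not from the PR) writes the same decision twice: once with one clause per literal and once with a single guarded clause.

```elixir
defmodule GuardSketch do
  # Hypothetical example module; not part of the Elixir PR.

  # Before: two clauses whose bodies are identical; only the literal differs.
  def separate_clauses(<<h, t::binary>>, ?_), do: {:upcase, h, t}
  def separate_clauses(<<h, t::binary>>, ?.), do: {:upcase, h, t}
  def separate_clauses(<<h, t::binary>>, _), do: {:keep, h, t}

  # After: the two literal patterns collapse into one clause with a guard.
  def guarded_clause(<<h, t::binary>>, prev) when prev == ?_ or prev == ?.,
    do: {:upcase, h, t}

  def guarded_clause(<<h, t::binary>>, _), do: {:keep, h, t}
end
```

Both functions should return the same tuple for any previous character. The guard could also be written as when prev in [?_, ?.], which reads a little better once more than two literals are involved.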

After digging through and finding that first set of duplication, I decided to look through the bodies of the remaining clauses for any that appeared to be the same. Here is the duplication I found, one more time.

  def camelize(<<h, t::binary>>),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<h, t::binary>>, prev) when prev == ?_ or prev == ?.,
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

The two function bodies are identical. The second clause only looked at prev in its guard and then dropped it. It was not immediately apparent to me what to do to “fix” the situation, so I started to rearrange the code. I took the clauses that deal with the two characters in the guard, ?_ and ?. (the latter is produced from a / in the input), and brought them together with camelize since it shared a function body.

  def camelize(<<h, t::binary>>),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<h, t::binary>>, prev) when prev == ?_ or prev == ?.,
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<?/, t::binary>>, _),
    do: <<?.>> <> do_camelize(t, ?.)

  defp do_camelize(<<?_, t::binary>>, _),
    do: do_camelize(t, ?_)

At this point I could see that any time I hit a _ or a /, I was restarting the algorithm as if nothing had been processed yet. This meant that I didn’t need to split the first character off the string in order to capitalize it; camelize already did that for me. I did have to take care of converting the / to a ., but after that it was also starting over with a new split. This moment felt really good: I had found a loop in the algorithm that I had not noticed before. Here is the final result of that change.

  def camelize(<<h, t::binary>>),
    do: <<to_upper_char(h)>> <> do_camelize(t, h)

  defp do_camelize(<<?/, t::binary>>, _),
    do: <<?.>> <> camelize(t)

  defp do_camelize(<<?_, t::binary>>, _),
    do: camelize(t)

This change was so simple that I just redid it here instead of copying and pasting. I was able to drop one clause and simplify the others. It was nice to be able to reduce duplication in FP as cleanly as I could in OOP. I hope these two refactorings help you find and eliminate duplication in your code.
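To try the result without patching Elixir itself, here is a stand-alone sketch. It assumes the guarded clause is the one that was dropped and that the remaining clauses are unchanged. The module name CamelizeSketch is hypothetical, and the two ASCII-only helpers are my own stand-ins for the private to_upper_char and to_lower_char functions that the code above calls.

```elixir
defmodule CamelizeSketch do
  # Hypothetical stand-alone module; the real function lives in Macro.

  def camelize(""), do: ""
  # Leading underscores are skipped.
  def camelize(<<?_, t::binary>>), do: camelize(t)
  def camelize(<<h, t::binary>>), do: <<to_upper_char(h)>> <> do_camelize(t, h)

  # A separator restarts the algorithm on the rest of the string,
  # so camelize/1 takes care of capitalizing the next character.
  defp do_camelize(<<?_, t::binary>>, _), do: camelize(t)
  defp do_camelize(<<?/, t::binary>>, _), do: <<?.>> <> camelize(t)

  # A capital letter directly after a lowercase letter is kept as is.
  defp do_camelize(<<h, t::binary>>, prev)
       when prev >= ?a and prev <= ?z and h >= ?A and h <= ?Z,
       do: <<h>> <> do_camelize(t, h)

  # Everything else is lowercased.
  defp do_camelize(<<h, t::binary>>, _), do: <<to_lower_char(h)>> <> do_camelize(t, h)
  defp do_camelize(<<>>, _), do: <<>>

  # Assumed ASCII-only helpers (stand-ins for the private ones in Macro).
  defp to_upper_char(c) when c >= ?a and c <= ?z, do: c - 32
  defp to_upper_char(c), do: c

  defp to_lower_char(c) when c >= ?A and c <= ?Z, do: c + 32
  defp to_lower_char(c), do: c
end

IO.inspect(CamelizeSketch.camelize("foo_bar"))
IO.inspect(CamelizeSketch.camelize("foo/bar"))
```

One thing worth checking under that assumption: with the guarded clause gone, only _ and / restart through camelize, so a literal . already present in the input no longer capitalizes the letter after it.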

I want to thank Devon Estes who made his own PR. I wish I had seen it before making mine. Also a big thanks to Ekspreimental for pointing out the issue in the first place.

Contact Me

Company: Binary Noggin
Email: contact@binarynoggin.com
Github: Adkron; BinaryNoggin
Twitter: @adkron; @binarynoggin
Podcast: This Agile Life