For quite a while we have been maintaining an application that processes XML and JSON data. Usually the maintenance consists of fixing defects and adding minor features, but sometimes it requires refactoring old code.
Consider, for example, a function that extracts an XML node by path:
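The listing itself is not reproduced in this text. As a rough sketch of what such a function might look like, the snippet below walks the path recursively. It assumes a hypothetical stand-in case class XmlNode that models only the label and child members of scala.xml.Node, so that it runs without extra dependencies.

```scala
// Hypothetical stand-in for scala.xml.Node: only `label` and `child` are modelled.
final case class XmlNode(label: String, child: Seq[XmlNode] = Nil)

// Walk the path one name at a time; the result is None as soon as a child is missing.
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  path match {
    case name :: names =>
      for {
        node1 <- root.child.find(_.label == name) // first child with a matching label
        node2 <- getByPath(names, node1)          // continue with the rest of the path
      } yield node2
    case Nil => Some(root)                        // empty path: the root itself
  }

// Sample tree <a><b><c/></b></a>, built by hand for illustration.
val doc = XmlNode("a", Seq(XmlNode("b", Seq(XmlNode("c")))))
val found = getByPath(List("b", "c"), doc)
val missing = getByPath(List("x"), doc)
```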
This function works fine, but requirements change and now we need to:
Extract nodes from JSON and other tree-like data structures, not only XML
Return a descriptive error message if the node is not found
This post explains how to refactor getByPath to meet the new requirements.
Let’s factor out a piece of code that creates a function to extract a child node by name. We could name it createFunctionToExtractChildNodeByName, but let’s name it child for brevity.
val child: String => XmlNode => Option[XmlNode] = name => node => node.child.find(_.label == name)
Now we can make the key observation: our getByPath is a sequential composition of functions that extract child nodes. The code below shows an implementation of this composition:
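One way to write this composition by hand is sketched below. The names Extract, compose and unit are illustrative assumptions, and XmlNode is again a hypothetical stand-in for scala.xml.Node.

```scala
// Hypothetical stand-in for scala.xml.Node.
final case class XmlNode(label: String, child: Seq[XmlNode] = Nil)

// Shorthand (an assumption of this sketch) for "function that may extract a node".
type Extract = XmlNode => Option[XmlNode]

val child: String => Extract = name => node => node.child.find(_.label == name)

// Run f, then feed its result (if there is one) to g.
val compose: (Extract, Extract) => Extract = (f, g) => node => f(node).flatMap(g)

// Starting value for the fold: returns the node itself, so an empty path yields the root.
val unit: Extract = node => Some(node)

def getByPath(path: List[String]): Extract =
  path.map(child).foldLeft(unit)(compose)

val doc = XmlNode("a", Seq(XmlNode("b", Seq(XmlNode("c")))))
val found = getByPath(List("b", "c"))(doc)
```

The drawback of this hand-written compose is that it is tied to both XmlNode and Option.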
Fortunately, the Scalaz library provides a more generic way to compose functions of the form A => M[B], where M is a monad. The library defines Kleisli[M, A, B], a wrapper for A => M[B], whose method >=> chains Kleisli wrappers in the same way as andThen chains regular functions. We will call this chaining Kleisli composition. The code below provides a composition example:
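As a small illustration, assuming Scalaz 7.x on the classpath, two Option-returning functions can be chained as follows. The functions parseNumber and reciprocal are made up for this example.

```scala
import scalaz._, Scalaz._
import scala.util.Try

// Two illustrative functions of the form A => Option[B].
val parseNumber: String => Option[Int] = s => Try(s.toInt).toOption
val reciprocal: Int => Option[Double] = i => if (i == 0) None else Some(1.0 / i)

// Kleisli composition: the result of parseNumber (if any) is passed to reciprocal.
val parseThenInvert: Kleisli[Option, String, Double] =
  Kleisli(parseNumber) >=> Kleisli(reciprocal)

val ok = parseThenInvert.run("4")      // Some(0.25)
val notANumber = parseThenInvert.run("four")
val divByZero = parseThenInvert.run("0")
```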
Note the point-free style we are using here. It is very common for functional programmers to write functions as a composition of other functions, never mentioning the actual arguments they will be applied to.
The Kleisli composition is exactly what we need to implement our getByPath as the composition of functions extracting child nodes.
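A possible implementation along these lines, assuming Scalaz 7.x and the same hypothetical XmlNode stand-in for scala.xml.Node, wraps each child extractor in a Kleisli and folds the list with >=>:

```scala
import scalaz._, Scalaz._

// Hypothetical stand-in for scala.xml.Node.
final case class XmlNode(label: String, child: Seq[XmlNode] = Nil)

val child: String => XmlNode => Option[XmlNode] =
  name => node => node.child.find(_.label == name)

def getByPath(path: List[String]): Kleisli[Option, XmlNode, XmlNode] =
  path
    .map(name => Kleisli(child(name)))         // one Kleisli per path element
    .fold(Kleisli.ask[Option, XmlNode])(_ >=> _) // chain them; ask is the starting value

val doc = XmlNode("a", Seq(XmlNode("b", Seq(XmlNode("c")))))
val found = getByPath(List("b", "c")).run(doc)
```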
Note the use of Kleisli.ask[Option, XmlNode] as the neutral element of the fold. We need it to handle the special case when path is Nil. Kleisli.ask[Option, XmlNode] is simply a Kleisli that wraps the function mapping any node to Some(node).
Let’s generalize our solution and abstract it over XmlNode. We can rewrite it as the following generic function:
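A sketch of such a generic function, assuming Scalaz 7.x, could take the child extractor as a parameter and replace XmlNode with a type parameter A:

```scala
import scalaz._, Scalaz._

// Same fold as before, but the node type A and the way to get a child are parameters.
def getByPathGeneric[A](child: String => A => Option[A])(path: List[String]): Kleisli[Option, A, A] =
  path
    .map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, A])(_ >=> _)
```

Nothing in the body refers to XML any more; all knowledge about the tree lives in the child argument.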
Now we can reuse this generic function to extract a node from JSON (we use json4s here):
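A hedged sketch of the JSON case follows. It assumes json4s (where a JObject holds a list of name/value pairs) and Scalaz 7.x, builds the sample document by hand instead of parsing it, and repeats the generic function so that the snippet is self-contained.

```scala
import org.json4s.{JInt, JObject, JValue}
import scalaz.Kleisli
import scalaz.std.option._ // Monad instance for Option

def getByPathGeneric[A](child: String => A => Option[A])(path: List[String]): Kleisli[Option, A, A] =
  path
    .map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, A])(_ >=> _)

// Child lookup for JSON: only objects have named children.
val child: String => JValue => Option[JValue] = name => {
  case JObject(fields) => fields.collectFirst { case (`name`, value) => value }
  case _               => None
}

// Hand-built equivalent of {"a": {"b": 42}}.
val json: JValue = JObject(List("a" -> JObject(List("b" -> JInt(42)))))

val found = getByPathGeneric(child)(List("a", "b")).run(json)
val missing = getByPathGeneric(child)(List("a", "x")).run(json)
```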
Note that we wrote a new function, child: String => JValue => Option[JValue], to handle JSON instead of XML, but getByPathGeneric remains unmodified and handles both XML and JSON.
We can generalize getByPathGeneric even further and abstract it over Option. All the function really needs is a scalaz.Monad instance for the container type, and Scalaz already provides an instance of scalaz.Monad[Option]. So we can rewrite getByPathGeneric as follows:
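One way this might look, assuming Scalaz 7.x, is to add a type parameter M with a Monad context bound in place of the hard-coded Option:

```scala
import scalaz._, Scalaz._

// M is any type constructor with a Monad instance; Option is just one possible choice.
def getByPathGeneric[M[_]: Monad, A](child: String => A => M[A])(path: List[String]): Kleisli[M, A, A] =
  path
    .map(name => Kleisli[M, A, A](child(name)))
    .fold(Kleisli.ask[M, A])(_ >=> _)
```

The type arguments of Kleisli are spelled out here only to keep type inference out of the picture; the fold itself is unchanged.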
Now we can implement our original getByPath with getByPathGeneric:
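A sketch of that implementation is shown below. It assumes Scalaz 7.x and the hypothetical XmlNode stand-in for scala.xml.Node, and includes the generic function so that the snippet compiles on its own.

```scala
import scalaz._, Scalaz._

// Hypothetical stand-in for scala.xml.Node.
final case class XmlNode(label: String, child: Seq[XmlNode] = Nil)

def getByPathGeneric[M[_]: Monad, A](child: String => A => M[A])(path: List[String]): Kleisli[M, A, A] =
  path
    .map(name => Kleisli[M, A, A](child(name)))
    .fold(Kleisli.ask[M, A])(_ >=> _)

val child: String => XmlNode => Option[XmlNode] =
  name => node => node.child.find(_.label == name)

// The original signature, now a thin wrapper: pick M = Option and A = XmlNode.
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  getByPathGeneric[Option, XmlNode](child)(path).run(root)

val doc = XmlNode("a", Seq(XmlNode("b", Seq(XmlNode("c")))))
val found = getByPath(List("b", "c"), doc)
```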
Next we can reuse getByPathGeneric to return an error message if the node is not found.
To do this, we will use scalaz.\/ (aka disjunction), which is a monadic, right-biased version of scala.Either. On top of that, Scalaz enriches Option (via OptionOps) with the method toRightDisjunction[B](b: B), which converts Option[A] to B \/ A so that Some(a) becomes \/-(a), the right case, and None becomes -\/(b), the left case. You can find more info about \/ in other blogs.
Thus we can write a function that reuses getByPathGeneric to return an error message instead of None if the node is not found:
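The sketch below shows one way to do this, assuming Scalaz 7.x. The names Result, childOrError and getResultByPath, as well as the wording of the error message, are illustrative assumptions, and XmlNode is the hypothetical stand-in for scala.xml.Node.

```scala
import scalaz._, Scalaz._

// Hypothetical stand-in for scala.xml.Node.
final case class XmlNode(label: String, child: Seq[XmlNode] = Nil)

def getByPathGeneric[M[_]: Monad, A](child: String => A => M[A])(path: List[String]): Kleisli[M, A, A] =
  path
    .map(name => Kleisli[M, A, A](child(name)))
    .fold(Kleisli.ask[M, A])(_ >=> _)

val child: String => XmlNode => Option[XmlNode] =
  name => node => node.child.find(_.label == name)

// Either an error message (left) or a value (right).
type Result[A] = String \/ A

// Same lookup as child, but a missing node becomes a left value with a message.
val childOrError: String => XmlNode => Result[XmlNode] =
  name => node => child(name)(node).toRightDisjunction(s"$name not found")

def getResultByPath(path: List[String], root: XmlNode): Result[XmlNode] =
  getByPathGeneric[Result, XmlNode](childOrError)(path).run(root)

val doc = XmlNode("a", Seq(XmlNode("b", Seq(XmlNode("c")))))
val found = getResultByPath(List("b", "c"), doc)  // right value holding node c
val failed = getResultByPath(List("b", "x"), doc) // left value holding "x not found"
```

Because the composition stops at the first left value, the message names the first path element that could not be resolved.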
The original getByPath function handled only XML data and returned None if the node was not found. We also needed it to handle JSON and return a descriptive message instead of None.
We have seen how Kleisli composition, provided by Scalaz, lets us factor out the generic function getByPathGeneric, which we then abstracted over the node type (to support JSON) and over the monad (to replace Option with a disjunction that carries an error message).