Round tripping JSON with partial isomorphisms (1/2)
Posted on March 1, 2015
This two-part post will demonstrate how Thomas Sutton and I applied a paper from 2010 to a real-world problem (it was super effective!). I’ll detail the problem, summarise the paper, and then walk you through our solution. After reading these posts I hope that you’ll be inspired apply this technique when you notice the following problem.
Writing isomorphic round-trip printer/parsers with the get/put idiom is redundant and error prone.
By get/put I mean that we write two distinct functions to and from the destination format and by isomorphic I mean that going to your foreign format, and then back again should be the same as doing nothing.
In order to clarify, let’s look at an example of get/putting when building an isomorphic printer/parser. Pretend if you will that we’ve been given the task of converting some balls to JSON, the balls can either be bouncy or lumpy.
data Ball
= Lumpy { _colour :: Text
, _lumps :: [[Bool]]
}
| Bouncy { _bouncyness :: Double }
It is common knowledge that bouncy balls are happy, whilst lumpy ones are not. This is reflected in the following serialization format.
[ Lumpy {_colour = "Rainbow"
, _lumps = [[True,False],[False,False]]}
, Bouncy {_bouncyness = 3.141592653589793}]
[
{
"colour" : "Rainbow",
":D" : false,
"lumps" : [[true,false],[false,false]]
},
{
"bouncyness" : 3.14159265358979,
":D" : true
}
]
How shall we achieve this? By writing toJSON/fromJSON instances by hand, of course! You may wonder why we don’t just use Generics or TH. The answer to this is that the destination format really should be (and often is) arbitrary.
instance FromJSON Ball where
parseJSON :: Value -> Parser Ball
parseJSON (Object o) = do
happy_ball <- o .: ":D"
if happy_ball then parseBouncy else parseLumpy
where
parseBouncy =
Bouncy <$> o .: "bouncyness"
parseLumpy =
Lumpy <$> o .: "colour" <*> o .: "lumps"
instance ToJSON Ball where
toJSON :: Ball -> Value
toJSON (Lumpy colour lump_map) =
object [ ":D" .= True
, "colour" .= colour
, "lumps" .= lump_map
]
toJSON (Bouncy bouncyness) =
object [ ":D" .= True
, "bouncyness" .= bouncyness
]
What’s wrong here? We wrote the same information twice. Not only did I write the same information twice, but it doesn’t even look like they’re the same information! As such, it’s difficult to tell if we made a mistake.
Furthermore, in order to be sure that this is correct we will now have to write a bunch of Abitrary instances and tests so that we can an enter a lucky draw to find a possible problem. Wouldn’t it be nice if we could be correct by construction? Doesn’t it seem theoretically possible to at least palm off the hard work to library authors?
This problem is either a library design issue or a library mis-use issue. Libraries that provide a get/put interface are just fine if they expect users to only ever go one way. This is often not the case and hence the non-optimal interface is uncovered when you want to write a get that is intended to be the inverse of your put. When this use case is considered, the library interface is poor in that it provides a way to define a get that is not equivalent to put. This is the problem we will try to solve.
Thomas and I came up against this exact problem when dealing with “enterprise” JSON (the kind of JSON produced by enormous Java CRM products that like to spit out boolean values as “T”, or “F”). In our dismay, we began searching for solutions with the general idea that:
We want to define get and put at the same time This probably has something to do with composing isomorphisms in some kind of applicative-like syntax. We quickly found Invertible syntax descriptions: Unifying parsing and pretty printing and instantly recognised its relevance. Here’s an excerpt from the abstract:
Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program.
The paper starts off with a List data type and two implementations of a many combinator that operates on this List. One for Printers, one for Parsers.
data List a
= Nil
| Cons a (List a)
type Printer a = a -> Doc
printMany :: Printer a -> Printer (List a)
printMany p list
= case list of
Nil -> text ""
Cons x xs -> p x
<> printMany p xs
newtype Parser a = Parser (String -> [(a, String)])
parseMany :: Parser a -> Parser (List a)
parseMany p
= const Nil <$> text ""
<|> Cons <$> p
<*> parseMany p
You can probably see where we’re going here, but let me join the dots for you anyway: printMany and parseMany both define the same information with different syntax, wouldn’t it be nice to define them both at once? If we could define them under some kind of unified syntax, then there would be only one definition and there would be no place for inconsistency or redundancy to live. Something like this.
combined :: Unicorn x => x a -> x (List a)
combined p
= magic Nil <$> fairies ""
<|> Cons <$> p
<*> parseMany p
The first bit of syntax we will tackle is the fmap (<$>) operator, before reading this you will want to understand co/contravariant functors. I recommend reading the beginning of this post
Implementing <$> for parseMany is trivial.
newtype Parser a = Parser (String -> [(a, String)])
(<$>) :: (a -> b) -> Parser a -> Parser b
f <$> Parser p = Parser $ (fmap . first) f . p
Unfortunately, <$> for printer is impossible to define due to it being contravariant. The difference here is that the universally quantified a appears on the opposite side of the arrow in the data type.
type Printer a = a -> Doc
(<$>) :: (a -> b) -> Printer a -> Printer b
(f <$> Printer p) a = error "impossible"
Oh noes! We need access to both a function a→b and b→a in order to implement both of these under the same operator. Also, what if we wanted our functions to be partial? Printer and Parser have no way of respecting such partiality intelligently. This is terrible.
Let’s jam a→b and b→a into a box and bolt on some partiality. This is an instance of Category, quite trivially. We will call it Iso (not ideal in my opinion). Done, we can go home now.
data Iso a b = Iso
{ apply :: a -> Maybe b
, unapply :: b -> Maybe a
}
Wait! We don’t have our syntax yet. Let’s define a typeclass for functors from the category of partial isos to Hask (restricted to f).
class IsoFunctor f where
(<$>) :: Iso a b -> f a -> f b
Now our syntax can be unified and we can just replace (→) with Iso. I would apologise for the overloading of <$> but it wasn’t my idea and I’m equally offended. Let’s not look a gift horse in the mouth, though.
In order to parse product types (multiple fields) and recurse, we’re going to need something like an apply (<*>) operator. Let’s start by trying to use the same trick that worked for Iso.
class Applicative where
(<*>) :: f (a -> b) -> f a -> f b
class UnhelpfulIsoApplicative where
(<*>) :: f (Iso a b) -> f a -> f b
We now get stuck trying to define an instance, the type signature is ludicrous.
type Printer a = a -> Doc
instance UnhelpfulIsoApplicative Printer where
(<*>) :: (Iso a b -> Doc) -> (a -> Doc) -> b -> Doc
(f <*> g) b = error "impossible!"
The solution? Just throw tuples at it and frob the fixity, duh!
class Applicative f where
(<*>) :: f (a -> b) -> f a -> f b
class ProductFunctor f where
infixr 6 <*>
(<*>) :: f a -> f b -> f (a, b)
To make it obvious what we’re doing here, I’ll show you how this can look much like the applicative you are used to seeing. Let’s define a function f that takes a constructor of arity three and returns a thing. Then we take these arguments wrapped in functors and produce a thing wrapped in a functor. This is clearer in code. Again, take note that <*> is being overloaded here and means two different things.
f :: Applicative f
=> (a -> b -> c -> d)
-> f a -> f b -> f c -> f d
f ctor fa fb fc = ctor <$> fa <*> fb <*> fc
f ctor fa fb fc = ((ctor <$> fa) <*> fb) <*> fc
f :: (ProductFunctor f, IsoFunctor f)
=> Iso (a, (b, c)) d
-> f a -> f b -> f c -> f d
f ctor fa fb fc = ctor <$> fa <*> fb <*> fc
f ctor fa fb fc = ctor <$> (fa <*> (fb <*> fc))
Note that we build a tree of tupples, for multiple arguments. This is just leaving them in their uncurried form (with a different fixity).
λ :t uncurry . uncurry
uncurry . uncurry :: (a -> b1 -> b -> c) -> ((a, b1), b) -> c
In order to lift data types (like our Ball from earlier) we will need to define partial isomorphisms for each of the branches of the product. This is boring and we hate typing, so we just shoot some TH at it. Pew pew pew!
data List a
= Nil
| Cons a (List a)
defineIsomorphisms ''List
nil :: Iso () (List a)
cons :: Iso (a, List a) (List a)
Alternative is easy, we just drop the Applicative constraint.
class Alternative where
(<|>) :: f a -> f a -> f a
Now we can glue all of these constraints together as a Syntax (and bolt on pure).
class (IsoFunctor s, ProductFunctor s, Alternative s)
=> Syntax s where
pure :: Eq a => a -> s a
Remember our ugly (and immoral) parseMany and printMany definitions from earlier? They’re now both specific cases of a more general and much prettier definition.
parseMany :: Parser a -> Parser (List a)
parseMany p
= const Nil <$> text ""
<|> Cons <$> p
<*> parseMany p
printMany :: (a -> Doc) -> (List a -> Doc)
printMany p list
= case list of
Nil -> text ""
Cons x xs -> p x
<> printMany p xs
many :: Syntax s => s a -> s (List a)
many p
= nil <$> pure ()
<|> cons <$> p <*> many p
Nailed it! Unicorns and fairies do exist. Q.E.D.
I’ll leave the Printer instances here, have a look at the paper if you would like to see the Parser.
instance IsoFunctor Printer where
iso <$> Printer p
= Printer (\b -> unapply iso b >>= p)
instance ProductFunctor Printer where
Printer p <*> Printer q
= Printer (\(x, y) -> liftM2 (++) (p x) (q y))
instance Alternative Printer where
Printer p <|> Printer q
= Printer (\s -> mplus (p s) (q s))
instance Syntax Printer where
pure x
= Printer (\y -> if x == y then Just "" else Nothing)
Thats it for now.