Deeply Incoherent and Generic

The code for this article is available as a gist, giving an example implementation of all the concepts mentioned below.

The higgledy library started out as an internal project in an old job. The company was an online mortgage broker, and as you can imagine, this comes with a lot of paperwork. A large part of a userā€™s time in the application was spent filling out a lengthy form requiring all sorts of personal and financial information. Because of this, we designed everything assuming that users would spend days filling in this form in no particular order as they collected all the relevant data.

This meant that, while we would eventually be able to populate a type to represent a mortgage application, we would have to handle a lot of partial information to get there. Figuring out how to do this became the engineering departmentā€™s favourite puzzle, and countless hours were spent devising all sorts of approaches, from which a lot of very interesting projects were extracted1.

Higher-kinded data

One possible approach is to use higher-kinded data (HKD), an idea I first discovered from Sandy Maguireā€™s blog post on the subject. In short, we index a type by a type constructor, and then wrap all its fields with that constructor:

type UserF :: (Type -> Type) -> Type
data UserF f
  = User
      { name      :: f String
      , age       :: f Int
      , likesDogs :: f Bool
      }

type PartialUser :: Type
type PartialUser = UserF Maybe

type UserValidation :: Type
type UserValidation = UserF (Either String)

type User :: Type
type User = UserF Identity

By changing the f type, we can get all sorts of interesting versions of the same type. The barbies library, built by another colleague at the same job, provides all sorts of useful tools for working with HKDs, too:

-- Print each error to the screen.
reportErrors :: UserValidation -> IO ()
reportErrors = bfoldMap (either putStrLn mempty)

-- Turn `UserF (Either String)` into `UserF Maybe`.
ignoreErrors :: UserValidation -> PartialUser
ignoreErrors = bmap (either (const Nothing) Just)

Where I think this starts to get exciting is when we think about nested HKDs:

type ApplicationF :: (Type -> Type) -> Type
data ApplicationF f
  = ApplicationF
      { userDetails  :: UserF f
      , workHistory  :: WorkF f
      , isRemortgage :: f Bool
      }

The magic of higher-kinded data and the barbies library is that the nesting here works without an issue: we can still implement all the barbies classes and have features like partiality wherever we want throughout the type.

Generic higher-kinded data

While this approach definitely works, the types are quite cumbersome to work with, especially when f is Identity. In this case, I really just want to forget about the constructor and focus instead on the content within. Furthermore, deriving simple instances like Eq, Ord, and Show end up being rather more verbose and technical in this representation. There are a few approaches to this problem, but mine is what eventually became higgledy: what if I just define the type without the f parameter, and then use a wrapper to inject it back in?

type User :: Type
data User
  = User
      { name      :: String
      , age       :: Int
      , likesDogs :: Bool
      }
  deriving (Generic, Show)

type PartialUser :: Type
type PartialUser = HKD User Maybe

The HKD type stores a modified version of the typeā€™s Generic representation in which the leaves are wrapped with the given f type. We then use things like generic-lens to provide an API for accessing and mutating these types. Nowadays, features like OverloadedRecordUpdate could go even further to make the internals virtually invisible.

The problem

If youā€™re wondering how the nesting I mentioned earlier works in this world, youā€™ve figured out the fatal flaw in this approach: higgledy derives the HKD automatically, so we canā€™t tell it which leaf types are themselves meant to become HKDs. If we factor in the constraints of the library, there are two conditions under which weā€™d want to make something a nested HKD:

  • It has a Generic instance, which is required by higgledy to generate the HKD representation.

  • Itā€™s a type we actually want to be a nested HKD. String is a good example of a Generic type we probably want to leave as a leaf type, rather than turning it into an HKD. So, we need a way of skipping these.

Generic deriving of generic traversals is a fantastic paper for many reasons, many of which Iā€™ll surely end up discussing in future posts, but one thing that stood out for me was the notion of ā€œinteresting typesā€: when we create or traverse a generic type, we can do so equipped with a list of types we consider ā€œleavesā€, and thus wonā€™t explore further. In practice - or to strawman a justification for this blog post - this can become unwieldy as we add more and more leaf domain types; currencies, refined numeric values, and validated identifiers quickly add up to a hefty list.

This is the point at which I had an idea: why donā€™t we instead recurse into all Generic types unless theyā€™re in an ā€œuninteresting typesā€ list? This list, in theory, should grow a lot more slowly as most of our leaf types donā€™t need to have Generic instances, but those that do can still be explicitly marked to ignore. It sounds like a good idea, but with this approach we immediately hit a wall: how do we detect whether a type is Generic?

Looks like weā€™re stuck

The detection of a Generic instance should be impossible. We shouldnā€™t be able to write a function whose behaviour changes based on whether we implement a particular class, and solving this problem has always required plugins like constraints-emerge to extend the language. However, thanks to the miracle of incoherence, Adam Gundry figured out how to detect Generic instances using a couple of the darker corners of GHC.

Type families can be thought of as type-level functions, but with a few key differences. The most relevant to us is that type families donā€™t have to be total: we donā€™t need to write equations for all possible inputs2. Of course, we should immediately ask what happens when we provide an input not covered by the type family. For example, GHC.Generics.Rep is the type family that maps types to their generic representations:

ghci> :k! Rep Bool
Rep Bool :: * -> *
= M1
    D
    (MetaData "Bool" "GHC.Types" "ghc-prim" False)
    (M1 C (MetaCons "False" PrefixI False) U1
     :+: M1 C (MetaCons "True" PrefixI False) U1)

The result of applying the Rep function to Bool is the generic representation of Bool. However, the result of applying Rep to Int - a type with no Generic implementation - seems to be a little less helpful:

ghci> :k! Rep Int
Rep Int :: * -> *
= Rep Int

In this instance, GHC has no idea how to compute an answer for Rep Int, and so it instead parrots it back to us. We refer to this evaluation as stuck: GHC has got to a point where it can make no further progress with this expression. We can use stuck values in functions to return larger stuck values, but until we figure out what Rep Int is, weā€™reā€¦ well, stuck.

This is an issue for us: how do we know whether a type has a Rep if asking gets us stuck? Really, stuckness should be undetectable, so if we try to branch on a stuck value, the branching logic will get stuck, and so on. However, there is one quirk that plays to our advantage: instance head resolution still works on stuck families.

type IsGeneric :: (Type -> Type) -> Constraint
class IsGeneric rep where isGeneric :: Bool

instance IsGeneric (M1 D m x) where
  isGeneric = True

instance {-# INCOHERENT #-} IsGeneric x where
  isGeneric = False

ghci> isGeneric @(Rep Bool) -- True
ghci> isGeneric @(Rep Int) -- False

Thereā€™s one key insight needed to understand whatā€™s happening here: when we match on two instances and one is INCOHERENT, we automatically pick the other one. In other words, we only pick an INCOHERENT instance when we have no other choice.

When we call IsGeneric with Rep Bool, GHC evaluates Rep Bool to the structure we saw above: M1 D (MetaData ...) ..., which just so happens to match both instances. One is incoherent, so we default to the first instance, making isGeneric then True.

When we call IsGeneric with Rep Int, however, GHC immediately gets stuck: in this instance, weā€™ve no idea if the result would match M1 D m x, so weā€™re left with one instance: the one in which isGeneric is False. Using an unholy unity of stuckness and incoherence, weā€™ve found ourselves a very sneaky way to figure out whether weā€™re dealing with a Generic type.

Stuckness as an input

We may be able to detect a Generic instance, but weā€™re not out of the woods yet: to solve the higgledy problem, we need a way of saying, ā€œif the type has a Generic rep, then it should be wrapped in HKD, otherwise in f. When I have a problem like this, I immediately think about functional dependencies3:

type Leaf :: (Type -> Type) -> (Type -> Type) -> Type -> Constraint
class Leaf f rep leaf output | f rep leaf -> output

instance Leaf f (M1 D m x) l (HKD l f)
instance {-# INCOHERENT #-} Leaf f x l (f l)

This would be ideal, but our generic detection trick comes with a price: as far as GHC is concerned, the output type isnā€™t uniquely determined by the input types. The coverage checker is not clever enough to detect the dark arts weā€™re using, and instead simply complains that the instances overlap, and thus the functional dependency is invalid. Unfortunately, it looks like both these ideas are off the table, so weā€™ll have to dabble in the dark arts one more time.

Almost ten years ago, Chris Done wrote a post on the constraint trick for instances, and it took me about three reads over several years to understand the implications. Rather than arrogantly assume I can do a better job in a paragraph, Iā€™ll assume youā€™ve read the post and skip to the punchline: instance head matching canā€™t prompt unification, but solving the constraints of a matched instance can. In other words:

type C :: Type -> Type -> Constraint
class C x y where f :: x -> y

instance C () () where f = id
instance x ~ Bool => C Bool x where f = id
ghci> :t f ()
f () :: C () y => y

ghci> :t f True
f True :: Bool

C () () is an instance head that will only ever match if both values are known to be (). For this reason, the type of f () isnā€™t known to be () - instead, we have to specify what y is to match the instance, and so we get no inference help at all.

By contrast, C Bool x is an instance head that will match if the first value is known to be Bool regardless of the second value. So, when we call f True, GHC can immediately solve the constraint and select the second instance. When it does so, we then get a new constraint to solve: x ~ Bool. Applying this constraint teaches us that the output of f True must be a Bool too, and so the ambiguity is removed.

What makes this interesting to us is that, despite there being no x -> y functional dependency on C, ghci tells us that the type has indeed been fully determined! Using the same trick, we can write code whose type changes depending on the presence of a Generic instance as long as the output type is only ever defined by the constraints, rather than the instance head:

type IsGeneric :: (Type -> Type) -> Type -> Constraint
class IsGeneric rep x where problem :: x

instance o ~ () => IsGeneric (f x) o where
  problem = ()

instance {-# INCOHERENT #-} (o ~ String, Typeable x)
    => IsGeneric x o where
  problem = show (typeRep @x) ++ " has no rep!"

examples :: IO ()
examples = do
  print (problem @(Rep Bool)) -- ()
  print (problem @(Rep Int)) -- "Int has no rep!"

Bringing it home

Having learnt several dirty tricks, we can return to the higgledy problem. Most of the code isnā€™t too interesting: we walk the generic representation of the type, and at the leaves, we use the aforementioned dirty tricks to decide whether they should be wrapped in f or HKDs themselves. Because weā€™re not allowed to use type families, the HKD type becomes a little more creative:

type HKD :: Type -> (Type -> Type) -> Type
data HKD x f where
  HKD :: forall o x f. GHKD (Rep x) f o => o Void -> HKD x f

Iā€™ve removed the ā€œinteresting typesā€ references to make the point a bit clearer, but theyā€™re visible in the gist.

The inner representation of HKD is now determined by a constraint, and we want that type to be hidden from the outside world, so we package up the GHKD dictionary inside the HKD type. Now, all thatā€™s left to do is get all the other machinery to work (barbies, generic-lens, and so on), and for thatā€¦ weā€™re going to need a way to work with the HKD type generically.

Deriving Generic for undetermined types

The immediate problem is that Generic is a class with a type family, and weā€™ve said several times that type families are out of the question. This is a non-starter: thereā€™s nothing we can really do (to my knowledge) to solve this problem directly. However, we could side-step the problem by defining a very simple type for wrapping generic representationsā€¦

type RepWrapper :: (Type -> Type) -> Type
newtype RepWrapper o = RepWrapper { unRepWrap :: o Void }

instance (Contravariant o, Functor o)
    => Generic (RepWrapper o) where
  type Rep (RepWrapper o) = o

  from (RepWrapper o) = phantom o
  to = RepWrapper . phantom

This type isnā€™t very interesting at all: it stores a generic representation, and its generic representation is defined as whatever representation it stores. Itā€™s effectively a transparent newtype. However, with this type, we can solve all our other Generic problems simply by first transforming our HKD type into our Wrapped type:

repWrapper :: forall xs p x f. GHKD xs (Rep x) f p => Iso' (RepWrapper p) (HKD xs x f)
repWrapper = L.iso (HKD . unRepWrap) \(HKD x) ->
  RepWrapper (unsafeCoerce @(_ Void) @(p Void) x)

type HasField' t f o s a
    = ( GHKD Uninteresting (Rep s) f o
      , G.HasField' t (RepWrapper o) a
      )

field :: forall t x f i o r. HasField' t f o x r => Lens' (HKD_ x f) r
field = L.from (repWrapper @Uninteresting @o) . G.field' @t

These may look like an awful lot of type variables, but thatā€™s because weā€™re talking in extremely abstract terms. As soon as we know what x is here, everything becomes a lot less polymorphic, and we are at last rewarded for doing the inference dance4:

-- Found type wildcard ā€˜_ā€™ standing for ā€˜f [Char]ā€™
eg1 :: Applicative f => HKD_ User f -> _
eg1 u = u ^. field @"userName"

-- Found type wildcard ā€˜_ā€™ standing for ā€˜HKD_ Pet fā€™
eg2 :: Applicative f => HKD_ User f -> _
eg2 u = u ^. field @"userPet"

At long last, weā€™ve made it. We have a working version of higgledy that allows for nested HKD fields, while also being able to support all the external, Generic-driven tools that are already present in the library. Add to that the delightfully good type inference demonstrated in the above examples, and I think weā€™ve done pretty well here!

Review

Itā€™s been a long time since I left the job and the colleagues that inspired this library. These days, itā€™s looked after by the wonderful Jonathan King, a far more reliable maintainer than Iā€™ve ever been. However, I do occasionally find myself daydreaming about this library and various ways to improve it, so this little experiment has been a lot of fun.

At this point, itā€™s good to ask whether such a change is worth it. On balance, I would say yes: these details donā€™t leak to the outside world any more than the current implementation does, and itā€™s substantially more flexible. I find that the litmus test for things like this is quite simple: will a user of the library ever need to understand how this works internally? If so, itā€™s probably not a good idea to add it. The one thing I still think is rather clunky here is the ā€œuninteresting typesā€ notion. We havenā€™t really discussed it much in this post, but if you read through the gist, the xs of uninteresting types is everywhere, and Iā€™m sure thereā€™s a neater way to determine what should and shouldnā€™t be expanded.

In any case, thanks for reading along, and please feel free to ask questions - if you open an issue on the siteā€™s repository, Iā€™ll try to respond and update the posts if need be!

  1. Csongor Kissā€™ generic-lens is perhaps the most notable example, though itā€™s not the only example. Iā€™ll probably talk about others in future blog posts.Ā 

  2. More upsettingly still, we canā€™t write total type families even if we want to: there are no closed kinds.Ā 

  3. Associated type families are another option, but they run into exactly the same hurdle as described in this article for functional dependencies.Ā 

  4. You may be wondering, reasonably, why Iā€™ve glossed over the unsafeCoerce in repWrapper. If GHKD had a functional dependency to determine the representation of our HKD, then these two types would unify. However, as weā€™ve already discussed that we canā€™t have one, we need GHC to trust us.Ā