5 tips for better Hedgehog tests

Have some Hedgehog tests to write? Here are five useful features you may not know about!


If you only try one thing on the list this should be it!

When you serialize something, no matter the format, you probably want to be able to deserialize it again and get the same result.

Even if you’re lucky enough to be deriving your encode/decode from a single specification (e.g. using TemplateHaskell or GHC.Generics), it’s a good idea to check that this actually works. These deriving tools could well be referring to other instances which are broken, or the deriving could be broken for your particular use case. Ultimately, it’s you who is responsible for your code working in production.

While you can easily implement this check by hand, Hedgehog provides a tripping function which gives prettier output, including the intermediate value when the roundtrip fails.

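tripping takes the value to roundtrip and the encode/decode functions and checks that decoding the encoded value gives back the original, i.e. decode . encode == id (up to the decoder’s Applicative wrapper, such as Either or Maybe).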
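To make that concrete, here is a minimal sketch that puts a hand-written roundtrip check next to the tripping version. It uses show and readMaybe from base as a stand-in encoder/decoder pair; that pairing and the property names are my own choices for illustration.

```haskell
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import           Text.Read (readMaybe)

-- Hand-rolled roundtrip: decoding the encoded value should give Just x.
prop_roundtrip_manual :: Property
prop_roundtrip_manual =
  property $ do
    x <- forAll $ Gen.int (Range.linear (-1000) 1000)
    readMaybe (show x) === Just x

-- The same check via tripping, which also reports the intermediate
-- String when the roundtrip fails.
prop_roundtrip_tripping :: Property
prop_roundtrip_tripping =
  property $ do
    x <- forAll $ Gen.int (Range.linear (-1000) 1000)
    tripping x show readMaybe
```

Either property can be run from ghci with check, e.g. check prop_roundtrip_tripping.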

Let’s say you want to test your Aeson ToJSON/FromJSON instances. The test will look something like this:

prop_roundtrip :: Property
prop_roundtrip =
  property $ do
    x <- forAll genUser
    tripping x Aeson.encode Aeson.eitherDecode

-- genUser is a generator that you have to write
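If you want something to start from, the following is one possible shape for genUser, assuming a User record with two Text fields and Generic-derived Aeson instances. The record definition and the field generators are assumptions for illustration; your real type will differ.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import           Data.Aeson (FromJSON, ToJSON)
import qualified Data.Aeson as Aeson
import           Data.Text (Text)
import           GHC.Generics (Generic)
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical record, stands in for whatever User is in your code.
data User =
  User
    { userId   :: Text
    , userName :: Text
    } deriving (Eq, Show, Generic)

-- Generic-derived instances, so encode and decode share one specification.
instance ToJSON User
instance FromJSON User

-- Five-digit ids and short lowercase names (arbitrary choices).
genUser :: Gen User
genUser =
  User
    <$> Gen.text (Range.singleton 5) Gen.digit
    <*> Gen.text (Range.linear 1 20) Gen.lower

prop_roundtrip :: Property
prop_roundtrip =
  property $ do
    x <- forAll genUser
    tripping x Aeson.encode Aeson.eitherDecode
```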

The tripping function’s general type is a bit scary, but things become clearer when you constrain the types of your encode/decode functions.

-- general type
tripping :: (
     MonadTest m
   , Show b
   , Show (f a)
   , Eq (f a)
   , Applicative f
   ) => a
   -> (a -> b)
   -> (b -> f a)
   -> m ()

-- specialized for aeson
\x -> tripping x Aeson.encode Aeson.eitherDecode
  :: (MonadTest m, Show a, Eq a, ToJSON a, FromJSON a) => a -> m ()

Failures include the intermediate value, which makes it much quicker to see whether it’s the encode or the decode that has a problem.

   ┏━━ test/Article.hs ━━━
44 ┃ prop_roundtrip :: Property
45 ┃ prop_roundtrip =
46 ┃ property $ do
47 ┃ x <- forAll genUser
User { userId = "00000" , userName = "stephanie" }
48 ┃ tripping x Aeson.encode Aeson.eitherDecode
│ ━━━ Intermediate ━━━
│ "{\"userName\":\"stephanie\",\"userId\":\"00007\"}"
│ ━━━ - Original) (+ Roundtrip ━━━
- Right User { userId = "00000" , userName = "stephanie" }
+ Right User { userId = "00007" , userName = "stephanie" }

In the above example you can easily see that the incorrect id 00007 is already in the intermediate data, so it must be the encoder that is broken.


Love them or hate them, exceptions are a fact of life in Haskell.

Having exceptions occur during a test is great! Hopefully it means you’ve found a bug. What’s not so great is that you probably have no idea which line caused the exception. This can be really annoying as tests get larger and involve more IO.

Hedgehog has your back with evalIO. The general rule is whenever you would use liftIO, just use evalIO instead.

evalIO fails the test if an exception is thrown, but the practical difference from liftIO is that the location of the exception will be shown in the failure output. With liftIO the failure is instead attributed to the test as a whole, so you’ll need to do some detective work.

In this liftIO example, the cause of the failure isn’t obvious.

   ┏━━ test/Article.hs ━━━
57 ┃ prop_launch :: Property
58 ┃ prop_launch =
59 ┃ withTests 1 . property $ do
│ ━━━ Exception (KeyStuckLaunchFailed) ━━━
│ KeyStuckLaunchFailed
60 ┃ liftIO $ launchMissiles 3
61 ┃ liftIO $ launchMissiles 2
62 ┃ liftIO $ launchMissiles 1

With evalIO, you can see the exact line.

   ┏━━ test/Article.hs ━━━
57 ┃ prop_launch :: Property
58 ┃ prop_launch =
59 ┃ withTests 1 . property $ do
60 ┃ evalIO $ launchMissiles 3
61 ┃ evalIO $ launchMissiles 2
│ ━━━ Exception (KeyStuckLaunchFailed) ━━━
│ KeyStuckLaunchFailed
62 ┃ evalIO $ launchMissiles 1
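For reference, a property like the one in that report could be set up roughly as follows. The launchMissiles action and its exception type are hypothetical stand-ins named after the output above; the real ones are not shown in this post.

```haskell
import           Control.Exception (Exception, throwIO)
import           Control.Monad (when)
import           Hedgehog

-- Hypothetical exception type, named after the one in the report.
data LaunchError = KeyStuckLaunchFailed
  deriving (Show)

instance Exception LaunchError

-- Pretend launcher that throws when asked to launch from silo 2.
launchMissiles :: Int -> IO ()
launchMissiles n =
  when (n == 2) $ throwIO KeyStuckLaunchFailed

-- This property is expected to fail, so the failure report has
-- something to point at.
prop_launch :: Property
prop_launch =
  withTests 1 . property $ do
    evalIO $ launchMissiles 3
    evalIO $ launchMissiles 2 -- the exception is thrown by this action
    evalIO $ launchMissiles 1
```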

Hedgehog has a family of evalXXX functions which offer this functionality, depending on the situation. If you are using ExceptT for errors I highly recommend trying evalExceptT.

-- Fails the test if the value throws an exception when evaluated to weak
-- head normal form (WHNF).
eval :: (MonadTest m, HasCallStack) => a -> m a

-- Fails the test if the action throws an exception.
evalM :: (MonadTest m, MonadCatch m, HasCallStack) => m a -> m a

-- Fails the test if the IO action throws an exception.
evalIO :: (MonadTest m, MonadIO m, HasCallStack) => IO a -> m a

-- Fails the test if the Either is Left, otherwise returns the value in
-- the Right.
evalEither ::
  (MonadTest m, Show x, HasCallStack) => Either x a -> m a

-- Fails the test if the ExceptT is Left, otherwise returns the value in
-- the Right.
evalExceptT ::
  (MonadTest m, Show x, HasCallStack) => ExceptT x m a -> m a
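As a small sketch of how two of these read inside a property, the example below runs an Either-returning function through evalEither and an ExceptT-returning one through evalExceptT. Both helper functions are invented for the example.

```haskell
import           Control.Monad.Trans.Except (ExceptT, throwE)
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical validation helpers, here only so something can fail.
parsePositive :: Int -> Either String Int
parsePositive n
  | n > 0     = Right n
  | otherwise = Left ("not positive: " ++ show n)

halveEven :: Monad m => Int -> ExceptT String m Int
halveEven n
  | even n    = pure (n `div` 2)
  | otherwise = throwE ("odd input: " ++ show n)

prop_eval_family :: Property
prop_eval_family =
  property $ do
    n <- forAll $ Gen.int (Range.linear 1 100)
    -- A Left fails the test at this line and shows the error value.
    p <- evalEither (parsePositive n)
    -- Same idea for ExceptT-based code.
    h <- evalExceptT (halveEven (p * 2))
    h === p
```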

Resource Cleanup

Real world tests use real world resources. Cleaning up temporary files and temporary databases is even more important when using property-based testing as your test code will be exercised many times more than with traditional testing approaches.

Brand new for 2020, Hedgehog finally has a MonadBaseControl instance for PropertyT. This gives you a lot more flexibility in which libraries can work right inside a property.

So, with the hedgehog-1.0.2 release, try using bracket from lifted-base to do resource cleanup in your tests.

The code below creates a temporary directory that will be cleaned up whether the test passes or fails.

prop_use_tmpdir :: Property
prop_use_tmpdir =
  property $ do
    x <- forAll ...
    y <- forAll ...
    tmpdir <- liftIO Temp.getTemporaryDirectory -- aka TMPDIR
    bracket -- from lifted-base (Control.Exception.Lifted)
      (liftIO $ Temp.createTempDirectory tmpdir "my_prop_files")
      (liftIO . Directory.removeDirectoryRecursive) $ \dir -> do
        -- test assertions begin
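A fuller, self-contained sketch of the same idea is shown here with its imports spelled out. It assumes hedgehog-1.0.2 or later plus the lifted-base, temporary, directory and filepath packages; the file name and the assertion are placeholders of my own.

```haskell
import           Control.Exception.Lifted (bracket) -- from lifted-base
import           Control.Monad.IO.Class (liftIO)
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import qualified System.Directory as Directory
import           System.FilePath ((</>))
import qualified System.IO.Temp as Temp             -- from temporary

prop_use_tmpdir_sketch :: Property
prop_use_tmpdir_sketch =
  property $ do
    contents <- forAll $ Gen.string (Range.linear 0 100) Gen.alphaNum
    tmpdir <- liftIO Directory.getTemporaryDirectory
    bracket
      (liftIO $ Temp.createTempDirectory tmpdir "my_prop_files")
      (liftIO . Directory.removeDirectoryRecursive) $ \dir -> do
        -- Write a file into the scratch directory and check it is there.
        let path = dir </> "sample.txt"
        evalIO $ writeFile path contents
        exists <- evalIO $ Directory.doesFileExist path
        assert exists
```

Using evalIO inside the bracket body keeps the benefit from the previous tip: an exception from the file operations is reported at the line that caused it.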

Hedgehog traditionally used ResourceT for this purpose, but as of resourcet-1.2 it is no longer a viable option. Hedgehog will never be able to implement MonadUnliftIO due to its transformer stack.

ResourceT still works but you’ll be forever limited to resourcet-1.1.


Ever think to yourself, “How do I know if my generators are good enough?”

Hedgehog’s coverage combinators will help you answer that question.

cover records the number of times a predicate is satisfied and displays the result as a percentage. If the percentage doesn’t meet your threshold then the test fails.

Below is a small example of the kind of output you get when a test fails its coverage obligations.

✗ prop_cover_number failed
after 101 tests.
⚠ small number 25% ████▉··············· ✗ 50%
medium number 57% ███████████▍········ ✓ 15%
big number 10% █▉·················· ✓ 5%
>10 number 71% ██████████████▎·····

┏━━ hedgehog-example/src/Test/Example/Coverage.hs ━━━
71 ┃ prop_cover_number :: Property
72 ┃ prop_cover_number =
73 ┃ withTests 101 . property $ do
74 ┃ number <- forAll (Gen.int $ Range.linear 1 100)
75 ┃ cover 50 "small number" $ number < 10
│ Failed (25% coverage)
76 ┃ cover 15 "medium number" $ number >= 20
77 ┃ cover 5 "big number" $ number >= 70
78 ┃ when (number > 10) $ label ">10 number"
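Here is a smaller property in the same style that should meet its coverage thresholds, as a sketch. The property itself, the labels, and the 25% thresholds are arbitrary choices for illustration, and OverloadedStrings is assumed for the label literals.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

prop_cover_parity :: Property
prop_cover_parity =
  withTests 200 . property $ do
    n <- forAll $ Gen.int (Range.constant 1 100)
    -- Require at least 25% of generated inputs on each side of the split.
    cover 25 "even" $ even n
    cover 25 "odd"  $ odd n
    -- Placeholder assertion for whatever the property really checks.
    assert (n >= 1)
```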

Oskar Wickström has published a wonderful tutorial on applying property-based testing (PBT) to the real world problem of form validation which I highly recommend! He makes use of cover to test that his generators are producing inputs which adequately exercise the edge cases involved around dates and leap years.

classify works the same as cover but is purely informational and doesn’t have a threshold below which it will fail the test.

classify "even" $ n `mod` 2 == 0

label is like classify but doesn’t have a predicate, so it simply tracks the percentage of tests run which hit a certain line of code.

label "branch-x"

collect is like label but uses Show on its argument to create the label name.

collect someCounter
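All three can live side by side in one property. The sketch below uses a made-up list property to show where each one fits; the label names are arbitrary and OverloadedStrings is assumed for the literals.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad (when)
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

prop_labels :: Property
prop_labels =
  property $ do
    xs <- forAll $ Gen.list (Range.linear 0 10) (Gen.int (Range.linear 0 100))
    -- classify: informational, driven by a predicate.
    classify "empty list" $ null xs
    -- label: counts how often this branch is reached.
    when (length xs > 5) $ label "long list"
    -- collect: one label per distinct shown value.
    collect (length xs)
    reverse (reverse xs) === xs
```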


Hedgehog’s major point of difference with QuickCheck is how it approaches shrinking. For the most part you don’t have to worry about it, but if you want to construct high-quality generators that produce easy-to-understand counterexamples and don’t take a million iterations to do so, then you should be looking at what happens when they shrink.

Check out Gen.print, which generates a random sample from your generator along with the first level of shrinks. It is a great way to understand what is going on under the hood of your generator. It can produce a lot of output for big top-level data types, so it’s better used for making sure your more primitive generators are shrinking well. Things like string identifiers often produce too many shrinks, testing cases that aren’t that interesting.

Here we generate the number 64. You can see that Hedgehog would first try to shrink to the smallest possible value in the range (i.e. 1) and would gradually cut its distance to 64 by half each time it doesn’t find a failure.

ghci> Gen.print (Gen.int (Range.constant 1 100))
=== Outcome ===
=== Shrinks ===

Here we generate the string "acc", which is a bit more interesting. We first try the empty string and then all combinations of the 2-letter substrings, then we try keeping the length of the string the same and shrinking the characters themselves.

ghci> Gen.print (Gen.string (Range.constant 0 3) (Gen.element ['a','b','c']))
=== Outcome ===
=== Shrinks ===

As with Gen.print, Gen.printTree generates a random sample from your generator, but this time we can see the entire shrink tree. This is rarely practical for even modest generators, but it’s super useful for debugging if you can isolate your shrinking problem down to a smaller generator. In theory a web interface for this could take advantage of the shrink tree’s laziness, and it would be practical for displaying all generators.

ghci> Gen.printTree (Gen.string (Range.constant 0 3) (Gen.element ['a','b','c']))
 │  ├╼""
 │  ├╼"a"
 │  │  └╼""
 │  ├╼"c"
 │  │  ├╼""
 │  │  ├╼"a"
 │  │  │  └╼""
 │  │  └╼"b"
 │  │     ├╼""
 │  │     └╼"a"
 │  │        └╼""
 │  ├╼"aa"
 │  │  ├╼""
 │  │  ├╼"a"
 │  │  │  └╼""
 │  │  └╼"a"
 │  │     └╼""
 │  └╼"ba"
.. snip ..

It’s a good idea to check your lower-level combinators like this, especially things involving strings, as they can easily cause a blow-up in the space of values you’re trying to search. Remember that your shrink search space is basically the product of all the generators you use in it. So think about what you’re trying to test; there may be places which don’t benefit from a highly variable generator.

ghci> Gen.print (Gen.string (Range.constant 0 10) Gen.alphaNum)
=== Outcome ===
=== Shrinks ===
.. _many_ more lines ..

For things like user ids you can usually get away with using Gen.element on a handful of strings, as you just want a few different possibilities and not totally random data.
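A minimal sketch of that approach is shown below. The ids are made up, and OverloadedStrings is assumed for the Text literals.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import           Hedgehog
import qualified Hedgehog.Gen as Gen

-- A handful of fixed ids (invented for illustration). Gen.element
-- shrinks towards the first element of the list.
genUserId :: Gen Text
genUserId =
  Gen.element ["00000", "00001", "00007", "12345"]
```

Because the shrink space is just the earlier entries in the list, counterexamples involving these ids stay short and readable.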

The hedgehog-corpus package has a few fun lists to try. It’s always entertaining when your test failures are complaining about "agile alligator".

ghci> Gen.print $ mconcat [Gen.element Corpus.agile, pure (" " :: Text), Gen.element Corpus.animals]
=== Outcome ===
"test driven elephant"
=== Shrinks ===
"agile elephant"
"pair programming elephant"
"scrum master elephant"
"standup elephant"
"story points elephant"
"test driven alligator"
"test driven chimpanzee"
"test driven dog"
"test driven duck"
"test driven eagle"

Try it

I hope you learned something! I’m sure even Oskar Wickström didn’t know about all of these things.

Subscribers should feel free to email me directly about any of this stuff. I may not be able to respond immediately, but I’m always happy to help!


Photo by Sierra Narvaeth on Unsplash