Help! My Haskell program consumes more memory the longer it runs

Are you building a long-running app like a WebSocket server? Is its memory usage increasing steadily the longer it runs?

You’ve got a memory leak.

You’ve stared at your code for hours and everything looks fine.

You start to question the libraries you’re using.

Why am I seeing unbounded memory growth when making calls to Network.WebSockets.sendTextData?

After all… half the libraries on Hackage seem to be abandoned experiments! How are you supposed to know which ones are production-ready? Maybe you picked a dud.

You decide to learn how to do this profiling voodoo, and it tells you that the memory used by your state update function is steadily growing.

Unfortunately, that doesn’t obviously point to the problem. You look at the offending code and you’re updating a few maps, but there doesn’t seem to be anything going on that would use a lot of memory.

Well, at least it’s not the WebSocket library…

If this story sounds familiar, you could be falling into a very common Haskell trap.

Your data types are lazy!

This is a problem and I’ll show you why.

Hypothetically, let’s say you have defined some data types like the following, to keep track of the metal available on the stars in your game universe.

import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.Text (Text)

newtype StarId = StarId Int

data Star =
  Star {
      starId :: StarId
    , starName :: Text
    , starMetal :: Double
    }

data Universe =
  Universe {
      universeStars :: IntMap Star
    }

And you have a loop which is constantly updating the state of the world.

mineMetal :: Double -> Star -> Star
mineMetal amount star =
  star {
    starMetal =
      starMetal star - amount
  }

doMiningStar :: StarId -> Double -> Universe -> Universe
doMiningStar (StarId sid) amount universe =
  universe {
    universeStars =
      IntMap.adjust (mineMetal amount) sid (universeStars universe)
  }

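-- needs Control.Monad (forever) and Data.IORef (IORef, readIORef, writeIORef)
gameLoop :: IORef Universe -> IO ()
gameLoop universeRef =
  forever $ do
     universe <- readIORef universeRef

     -- figuring out what to do next..

     -- doing nothing that needs to know the value of `starMetal`

     -- ah! mine some metal from a star
     -- (`sid` and `amount` are chosen by the decision logic elided above)
     let new = doMiningStar sid amount universe

     writeIORef universeRef new
     -- send names of available stars to clients..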

Pretty normal stateful app code, right?

doMiningStar is happily updating starMetal and reducing the amount of metal available, but nothing in the program is consuming that value.

This is a problem because starMetal is lazy.

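Every time you update starMetal, the new value is not actually computed. Instead, a thunk is allocated that refers to the previous (also unevaluated) value, so the chain of thunks grows and uses more memory. This continues every loop until something demands the value of starMetal. Which could be never.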
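Here is a minimal sketch of that build-up. It assumes a cut-down, hypothetical Star with a single lazy field, and uses foldl' as a stand-in for the game loop (runTicks is a name made up for this example). foldl' only evaluates each intermediate Star as far as its constructor, so, at least without optimisations, the field ends up holding a chain of suspended subtractions until the final print demands it.

```haskell
import Data.List (foldl')

-- Cut-down stand-in for the article's Star: one lazy field.
data Star = Star { starMetal :: Double }

mineMetal :: Double -> Star -> Star
mineMetal amount star = star { starMetal = starMetal star - amount }

-- Hypothetical stand-in for the game loop: mine 1.0 units per tick.
-- foldl' evaluates each new Star to WHNF (the constructor only), so the
-- field holds ((1000000 - 1) - 1) - ... as unevaluated thunks.
runTicks :: Int -> Star -> Star
runTicks ticks star0 =
  foldl' (\star _ -> mineMetal 1.0 star) star0 [1 .. ticks]

main :: IO ()
main = do
  let final = runTicks 100000 (Star 1000000)
  -- Only here is the whole chain of 100000 subtractions evaluated.
  print (starMetal final)
```

The answer is correct either way. The cost is the memory held by all those pending subtractions while nothing asks for the result.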

This can be a totally non-obvious consequence of laziness.

Strict data

The fix for this is to make all of your data strict.

data Star =
  Star {
      starId :: !StarId
    , starName :: !Text
    , starMetal :: !Double
    }

data Universe =
  Universe {
      universeStars :: !(IntMap Star)
    }
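If you want to see for yourself what the bang changes, here is a small sketch. LazyStar, StrictStar and throwsOnForce are hypothetical names made up for this example. It puts an `error` call in the field and checks whether evaluating the record to its constructor trips over it.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical cut-down types: identical apart from the bang.
data LazyStar   = LazyStar   { lazyMetal   :: Double }
data StrictStar = StrictStar { strictMetal :: !Double }

-- True if evaluating the argument to WHNF throws an ErrorCall.
throwsOnForce :: a -> IO Bool
throwsOnForce x = do
  result <- try (evaluate x)
  pure $ case result of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  -- Lazy field: the record is built around an unevaluated thunk.
  lazyResult   <- throwsOnForce (LazyStar (error "metal not computed"))
  -- Strict field: building the record evaluates the field first.
  strictResult <- throwsOnForce (StrictStar (error "metal not computed"))
  print (lazyResult, strictResult)  -- (False,True)
```

The lazy record happily carries the unevaluated field around, which is exactly how the thunks pile up. The strict record cannot exist without its field having been evaluated.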

You should also make sure you’re using the strict interface of any containers you’re updating in the loop. In the case of the IntMap container this is exposed by Data.IntMap.Strict.
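The same experiment works for the container itself. This sketch assumes a toy map of Double values rather than the full Universe, and throwsOnForce is a helper made up for the example. It compares adjust from Data.IntMap.Lazy with adjust from Data.IntMap.Strict when the update function has not really been run yet.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.IntMap.Lazy as Lazy
import qualified Data.IntMap.Strict as Strict

-- True if evaluating the argument to WHNF throws an ErrorCall.
throwsOnForce :: a -> IO Bool
throwsOnForce x = do
  result <- try (evaluate x)
  pure $ case result of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  let stars = Lazy.fromList [(1, 10.0), (2, 20.0)] :: Lazy.IntMap Double
      -- Stand-in for an update whose result nobody has looked at yet.
      boom _ = error "new value not computed yet"
  -- Lazy interface: the map stores the update as a thunk.
  lazyForced   <- throwsOnForce (Lazy.adjust boom 1 stars)
  -- Strict interface: the new value is evaluated before it is stored.
  strictForced <- throwsOnForce (Strict.adjust boom 1 stars)
  print (lazyForced, strictForced)  -- (False,True)
```

Both modules share the same IntMap type, so switching is usually just a matter of changing the import.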

That’s it.

Now when you run the profiler, all is well.

A good default

You could have avoided all of that if you had used strict data types from the start.

Strict data and lazy functions is the default you should be using for all production Haskell code. This, for the most part, gives you the best of both worlds.

You get the benefits of lazy evaluation, but with strict data, thinking about your memory usage becomes much more straightforward.

You will know that when you force a value, its entire structure has been evaluated, and therefore there will be no hidden delayed computations to worry about.

This is not entirely true, as base data types like lists, tuples and Maybe are always lazy, but this hasn’t caused me many problems over the years. Just adding strictness to your own data types is usually enough. You may want to look into deepseq if you think these things are causing you problems.
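To make that caveat concrete, here is a sketch with a hypothetical Reading type (not from the game code above). Its field is strict, but the field's type is the lazy Maybe. The bang only evaluates as far as the Just constructor, while force from deepseq goes all the way down.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical type: the field is strict, but Maybe itself is lazy.
data Reading = Reading { readingValue :: !(Maybe Double) }

-- True if evaluating the argument to WHNF throws an ErrorCall.
throwsOnForce :: a -> IO Bool
throwsOnForce x = do
  result <- try (evaluate x)
  pure $ case result of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  let reading = Reading (Just (error "value not computed yet"))
  -- The bang stops at the Just constructor; the payload is still a thunk.
  whnfOnly <- throwsOnForce reading
  -- force evaluates the Maybe and its contents completely.
  deep     <- throwsOnForce (force (readingValue reading))
  print (whnfOnly, deep)  -- (False,True)
```

I would only reach for this when the profiler points at such a field, since deep evaluation walks the whole structure every time.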

I first learned this rule of thumb from performance guru Johan Tibell many years ago. I have rarely had to think about space leaks since I started applying it.

There are obviously still good reasons to use laziness in data types: you may want to create streams or other infinite structures. Any use of laziness should, however, be an explicit choice. It should probably have a comment explaining why, and it should certainly set off warning bells when it goes through code review.

Lazy data types are a bug waiting to happen.

StrictData

I remember Johan once telling me that whenever he was asked by someone to analyze their program to see why it was slow, or used too much memory, 99% of the time lazy data was involved! Over the years I have also found this to be true.

Putting bangs on every field can be tedious and error-prone, however, so you can have GHC do it for you by enabling the StrictData language extension.
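A rough sketch of what that looks like in a module. The Star here is a simplified, hypothetical version (String instead of Text, plus an invented starNotes field) so the example only needs base. With the pragma on, fields are strict without any bangs, and a `~` marks the rare field you deliberately want lazy.

```haskell
{-# LANGUAGE StrictData #-}
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)

-- Every field declared in this module is strict by default.
data Star = Star
  { starName  :: String
  , starMetal :: Double
  , starNotes :: ~[String]  -- `~` explicitly opts this field back into laziness
  }

-- True if evaluating the argument to WHNF throws an ErrorCall.
throwsOnForce :: a -> IO Bool
throwsOnForce x = do
  result <- try (evaluate x)
  pure $ case result of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  strictField <- throwsOnForce (Star "Sol" (error "metal not computed") [])
  lazyField   <- throwsOnForce (Star "Sol" 42 (error "notes not computed"))
  print (strictField, lazyField)  -- (True,False)
```

The pragma only affects data types declared in the module that enables it, so types imported from elsewhere keep whatever strictness their authors gave them.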

If you want to learn more about StrictData, a good place to start is Johan’s post on the design of the Strict Haskell pragma.

Still got problems?

Of course, even though lazy data is the cause of a large number of Haskell space leaks, it’s not the only thing that can go wrong.

If you’ve still got issues, I highly recommend Neil Mitchell’s technique for easily detecting space leaks. It has helped me on a number of occasions.


Photo by Christopher Burns on Unsplash