
Stop fearing the Monad


I’ve been deep-diving into Haskell for an upcoming exam, and I’ve found that the best way to master a concept is to explain it. This post is my ‘Plain English’ guide to Monads, stripping away the academic jargon to find the practical pattern underneath.

Monads: You’re already using them (you just don’t know it) #

If you mention “Monad” in a room full of developers, half will tremble in fear, and the other half will start talking about space suits or burritos. But if you strip away the Category Theory math, a Monad can be understood as a design pattern for dealing with “context.”

In programming we rarely deal with raw values. We deal with values wrapped in some kind of context.

  • The context of failure: the value might not be there (think of None, Nil, and Null)
  • The context of time: the value isn’t there yet (think of async, promises, and futures)
  • The context of multiplicity: there are many values (think of Lists, Sets, etc.)

A Monad is just a standard way to chain operations together while respecting that context, so you don’t have to manually unwrap and re-wrap the value at every step.

The Wrapper #

Imagine you have a value wrapped in some context, let’s say a “Box”. You have a function that takes a raw value but returns a new Box.

  1. You have a Box: Box(5)
  2. You have a function: makeBox(x) = Box(x + 1)

If you just map that function over your box (like a Functor would), you end up with a mess: Box(Box(6)). You don’t want a box inside a box; you just want Box(6).

To see why we need a Monad, we have to look at the difference between a function that returns a raw value and a function that returns a new context.

  • The Functor Case: If your function is a simple transformation like \(x + 1\), a Functor is perfect. You map it over Box(5) and get Box(6). The structure stays the same.
  • The Monad Case: If your function itself returns a new context, for example makeBox(x), mapping it over Box(5) creates a box in a box: Box(Box(6)). This is the core purpose of a Monad: it provides the machinery to take a value out, apply a context-returning function, and “flatten” the layers back into one.

A monad doesn’t just flatten context, it defines how effects propagate and interact as computations are sequenced.

The rule of thumb: If your function returns a raw value (\(a \to b\)), use a Functor. If your function returns a new context (\(a \to m \space b\)), you need a Monad.
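To make this concrete before we dive into Haskell proper, here is the Box story as a minimal Haskell sketch (Box, makeBox, and bind are the hypothetical examples from above, not standard library names):

data Box a = Box a deriving Show

instance Functor Box where
  fmap f (Box x) = Box (f x)      -- map keeps the structure: one Box in, one Box out

makeBox :: Int -> Box Int
makeBox x = Box (x + 1)           -- a function that returns a new context

nested :: Box (Box Int)
nested = fmap makeBox (Box 5)     -- Box (Box 6): the Functor case nests

bind :: Box a -> (a -> Box b) -> Box b
bind (Box x) f = f x              -- take the value out, apply, stay flat

flat :: Box Int
flat = Box 5 `bind` makeBox       -- Box 6: the Monad case flattens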

The Haskell way #

We’ll first look at how Monads work in Haskell and then look at how you’ve actually been using Monads all along in other programming languages.

Haskell handles these structures elegantly thanks to its powerful and expressive type system. We don’t create new methods for every type; we implement the Monad typeclass for our type, which lets us use the bind operator (>>=).

The definition looks scary, but trust me, it’s not.

$$ (\gg\!=) :: m \space a \to (a \to m \space b) \to m \space b $$

It says:

  1. Take a wrapped value \(m \space a\)
  2. Take a function that turns a raw \(a\) into a wrapped \(b\) (\(a \to m \space b\))
  3. Return the resulting wrapped \(b\) (\(m \space b\))
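For instance, instantiating \(m\) with Maybe (half is a made-up helper that fails on odd numbers):

half :: Int -> Maybe Int
half x = if even x then Just (x `div` 2) else Nothing

-- Just 8 >>= half          == Just 4
-- Just 8 >>= half >>= half == Just 2
-- Just 3 >>= half          == Nothing (the chain stops at the first failure)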

Because this is so common, Haskell gives us do-notation, which makes monadic code look more like imperative code.

Without do-notation:

maybeResult :: Maybe Int
maybeResult =
    Just 5 >>= \x ->
    Just (x + 1) >>= \y ->
    return (y * 2)

With do-notation (syntactic sugar):

maybeResult :: Maybe Int
maybeResult = do
    x <- Just 5         -- unwrap the `5`
    y <- Just (x + 1)   -- unwrap the `6`
    return (y * 2)      -- re-wrap the result

Note: return in Haskell does not exit a function like in imperative languages. It simply injects a value into the monadic context (a -> m a).

Beyond the Box #

Up until now we’ve looked at Monads like Maybe or lists as containers: boxes that hold a value. These are sometimes known as data-like monads. This is a great starting point, but it’s only half the story. If you only see Monads as “wrappers,” you’ll eventually run into control-flow Monads like State, Reader, or IO and get stuck. In these cases, the Monad isn’t a noun (a box); it’s a verb (a recipe).

When we work with control-flow monads, the monad isn’t a container holding some data; it’s actually a function (think of s -> (a, s) for the State monad). We can’t visualize a function as a box: it’s a description of a computation that hasn’t happened yet. Instead of a “Box”, try thinking of a control-flow monad as a “programmable semicolon.” In imperative code, the semicolon just means “do the next thing.” In a monad, the “semicolon” (the bind >>=) has logic. It can say “stop if there’s an error” (think Maybe) or “pass the config to the next line” (think Reader). The bind operator becomes a piece of plumbing that automatically passes information between steps behind the scenes.

In these Monads, you aren’t “wrapping” a value. You are composing a script. The monad doesn’t do anything until you finally hit “play” (like runState or running your Haskell main).
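To see the “programmable semicolon” at its simplest, this is essentially how the standard library defines bind for Maybe:

-- A failed step short-circuits everything after it;
-- a successful step feeds its result to the next one.
bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing
bindMaybe (Just x) f = f x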

Control-flow Monads are used to cleanly model effects. In Haskell a function should remain pure, meaning it should have no side effects. An effect is anything a function does besides returning a value. For example:

  • a pure function calculates \(2+2=4\) and returns it
  • an effectful function calculates \(2+2=4\)… and also prints to the console, or reads from a database, or mutates a global state. These are side effects.

In Haskell we want to keep functions pure, so we don’t let them actually do those things. Instead the function returns a description of the effect. A function doesn’t print “Hello”, it returns an IO () value, which is a script that says “if you run me, I will do IO operations.”

Monads are the glue for these scripts. If you have a function that reads a filename from the user and a function that opens that file and reads its contents, the Monad allows you to sequence them: it ensures that the file-opening function does not start before the filename is present, and it conveniently handles the “plumbing” of passing the filename along.
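Concretely, that filename example is a one-liner built from real standard-library scripts:

-- getLine :: IO String, readFile :: FilePath -> IO String, putStrLn :: String -> IO ()
-- (>>=) guarantees each script runs only after the previous one has produced its value.
main :: IO ()
main = getLine >>= readFile >>= putStrLn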

The State Monad #

In a pure language like Haskell, you can’t have a global variable that changes. To simulate such a “state,” you have to pass a value into a function and return a new, updated value.

A stateful function typically looks like this: s -> (a, s). It takes an initial state s and returns a result a plus a new state s. The State Monad automates passing that s around. The State monad type looks like this:

newtype State s a = State { runState :: s -> (a, s) }

A State is literally just a wrapper around a function that takes an input state s and returns a pair (a, s). When you call get, it simply returns the current state as the result and leaves the state unchanged, while put replaces the current state.

get = State $ \s -> (s, s)
put s = State $ \_ -> ((), s)

Let’s now look at an example: say we have a simple game where a player’s score increases. In a pure language we have to pass the score around manually, which is tedious and easy to mess up:

-- every function has to take the old state and return the new state
incrementScore :: Int -> (Int, Int)
incrementScore score = (score + 1, score + 1)

doubleScore :: Int -> (Int, Int)
doubleScore score = (score * 2, score * 2)

gameResult =
    let (res1, state1) = incrementScore 0
        (res2, state2) = doubleScore state1
    in res2

The State monad hides this “state passing” logic inside the do block.

gameStep :: State Int Int
gameStep = do
    score <- get        -- get the current score
    put (score + 1)     -- put the new score into the state
    newScore <- get     -- get the new score
    put (newScore * 2)
    get                 -- return the final score

finalScore = runState gameStep 0 -- Result (2, 2)

We call runState on the gameStep we just defined (gameStep is a monadic value, a stateful computation) with an initial value of 0. Until this moment nothing has happened; gameStep is just a “recipe”. Only when we call runState is the computation actually run. This way we keep our logic pure and monadic as long as possible and only “execute” the effects at the very end.

The Parser Monad #

If Maybe is a data-like monad and State is a control-flow monad, we can combine them to create some sort of hybrid monad. This is typically done when building parsers. A parser is essentially a function that looks like this:

\[ \text{Parser } a = \text{String} \to \text{Maybe } (a, \text{String}) \]

Or, as a type:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

It attempts to run a computation on a string; if it succeeds, it returns the result a and the leftover string.

Why it’s a Control Monad: It manages the “state” of the input text. When you chain two parsers together using do notation, the Monad automatically takes the “leftover” string from the first parser and feeds it into the second. It also handles failure: if the first parser fails, the “programmable semicolon” stops the script before the second parser even runs.
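As a sketch, the instances for the Parser type above could look like this (modern GHC requires Functor and Applicative alongside Monad, so all three are included):

instance Functor Parser where
  fmap f p = Parser $ \s -> case runParser p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure x = Parser $ \s -> Just (x, s)            -- succeed without consuming input
  pf <*> pa = Parser $ \s -> case runParser pf s of
    Nothing        -> Nothing
    Just (f, rest) -> runParser (fmap f pa) rest

instance Monad Parser where
  p >>= f = Parser $ \s -> case runParser p s of
    Nothing        -> Nothing                    -- failure stops the script
    Just (a, rest) -> runParser (f a) rest       -- the leftover string feeds the next parser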

-- A parser that expects a digit, then a slash, then another digit
dateParser :: Parser (Int, Int)
dateParser = do
    d <- doubleDigitParser    -- Parses "12" from "12/05"
    _ <- charParser '/' -- Receives "/05", parses "/", leaves "05"
    m <- doubleDigitParser    -- Receives "05", parses "05"
    return (d, m)

Why it’s a Data Monad: Once the “script” is finished, the end result is a data structure (like a Date or a Syntax Tree). It uses the control of moving through a string to fill a box of data.
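The helper parsers used above aren’t defined in this post; minimal sketches might look like this (charParser and doubleDigitParser are illustrative names):

import Data.Char (isDigit)

-- Consume one specific character, or fail
charParser :: Char -> Parser Char
charParser c = Parser $ \s -> case s of
  (x:rest) | x == c -> Just (c, rest)
  _                 -> Nothing

-- Consume exactly two digits and read them as an Int
doubleDigitParser :: Parser Int
doubleDigitParser = Parser $ \s -> case s of
  (a:b:rest) | isDigit a && isDigit b -> Just (read [a, b], rest)
  _                                   -> Nothing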

A quick note on the Monad laws #

Not every type with a return and a >>= is a Monad. To behave predictably, these operations must obey three simple laws:

  1. Left identity
    Wrapping a value and immediately binding it to a function is the same as just calling the function. return a >>= f ≡ f a

  2. Right identity
    Binding a monadic value to return does nothing. m >>= return ≡ m

  3. Associativity
    The way you group chained monadic operations doesn’t matter. (m >>= f) >>= g ≡ m >>= (\x -> f x >>= g)

These laws ensure that monadic code composes reliably and that do-notation behaves the way you intuitively expect. You usually don’t need to think about them day-to-day, but they are what separate a true Monad from a “monad-like” API.

Practically, the monad laws guarantee that mechanical refactorings of do-notation (regrouping chained binds, extracting helper functions, or inlining expressions) do not change the program’s meaning.
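To make this concrete, here is a quick sanity check of all three laws using Maybe (f and g are arbitrary helpers picked for illustration):

f, g :: Int -> Maybe Int
f x = Just (x + 1)
g x = Just (x * 2)

leftIdentity  = (return 5 >>= f) == f 5                                   -- True
rightIdentity = (Just 5 >>= return) == Just 5                             -- True
associativity = ((Just 5 >>= f) >>= g) == (Just 5 >>= (\x -> f x >>= g)) -- True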

You’ve been using them all along #

Let’s take a look at other languages, where you’ve been using Monads all along; you just didn’t know it yet! As an exercise, read the title of each section (the programming language we’ll look at) and first think about which Monads that language might be hiding.

1. JavaScript #

In JavaScript we deal with the context of latency: a value might arrive later. We wrap this in a Promise context.

Chaining such promises is essentially using monadic operations.

// The wrapper
const value = Promise.resolve(5);

// The chain (bind)
value
  .then((x) => {
    // This function returns a new promise (a new context)
    return Promise.resolve(x + 1);
  })
  .then((y) => {
    // We don't get Promise<Promise<number>>, we are using monads!
    return Promise.resolve(y * 2);
  })
  .then((result) => {
    console.log(result);
  });

Promise.resolve is exactly what Haskell calls pure (putting a value inside the monadic context), while .then is roughly what Haskell calls >>= (bind): it takes a wrapped value, unwraps it, feeds it to the function, and handles the re-wrapping.

NB: Promises behave like control-flow monads, even though their exact semantics are language-defined. More precisely, Promises resemble monads operationally, but they violate the monad laws in edge cases (most famously left identity, because resolve automatically flattens thenables), which is why Haskell separates IO from asynchronous abstractions.

2. Rust #

Rust does not have exceptions; instead we use the context of success or failure. This is the Result (or Option) type.

Imagine trying to parse a string and then divide by it. Both can fail.

fn str_to_int(s: &str) -> Option<i32> { ... }
fn divide_ten_by(n: i32) -> Option<i32> { ... }

let input = Some("5");          // Our context

let result = input
    .and_then(str_to_int)       // unwrap "5", parse it; on failure, stop
    .and_then(divide_ten_by);   // unwrap 5, compute 10 / 5; if n is 0, stop

// Result: Some(2)

Some(T) is return / pure while and_then is the monadic bind (>>=). If Rust did not have this feature, you’d have to write nested match statements for every single step to check for errors.

The ? operator plays much the same role as Haskell’s <- inside do-notation.

// With the ? operator (Syntactic Sugar)
fn calculate() -> Option<i32> {
    let val = str_to_int("5")?;     // unwrap or return None
    let res = divide_ten_by(val)?;  // unwrap or return None
    Some(res)                       // rewrap, like return in Haskell
}

3. Java #

Java 8 introduced Optional to handle the context of missing values without crashing on a NullPointerException.

Optional<String> name = Optional.of("Haskell");

Optional<String> upper = name
    .flatMap(s -> Optional.of(s.toUpperCase()));

Once again, Optional.of is the monadic pure / return while flatMap is the literal definition of a monadic bind. It maps the function and then flattens the Optional<Optional<String>> into Optional<String>.

4. C# #

C# is arguably the mainstream language that uses monads most heavily, thanks to LINQ and the ?. operator.

The list monad (IEnumerable): in C#, SelectMany is exactly Haskell’s bind (or flatMap) for lists; it flattens nested lists. Say we have a List<School>, each school has a List<Student>, and we want one big list of all students.

var schools = GetSchools();

// Monadic Bind: flattens the nested list automatically
var allStudents = schools.SelectMany(s => s.Students);

C#’s LINQ query syntax (from ... in ...) is syntactic sugar, just like Haskell’s do-notation.

var pairs = from x in list1
            from y in list2
            select x + y;

This desugars to list1.SelectMany(x => list2.Select(y => x + y)). It is functionally identical to Haskell’s:

pairs = do
    x <- list1
    y <- list2
    return (x + y)

Null propagation (Maybe-like behavior) is similar to JavaScript’s ?. (optional chaining) operator. Imagine you have a deep hierarchy: a Customer has an Address, and an Address has a City. Any of these could be null. Without monads you’d have to manually check every step to avoid a crash.

string city = null;
if (customer != null) {
    if (customer.Address != null) {
        city = customer.Address.City;
    }
}

With the ?. operator (monadic chaining), it can be done in a much cleaner way.

// If customer is null? Stop and return null.
// If Address is null? Stop and return null.
// Otherwise, give me the City.
string? city = customer?.Address?.City;

This is often described as a “Null monad,” but more precisely it behaves like automatic propagation of a Maybe-style context. The ?. operator short-circuits evaluation when a null is encountered, much like how Maybe stops a computation when it hits Nothing.

Unlike Haskell’s Maybe, this behavior is built into the language, is not user-definable, and isn’t checked against monad laws. Still, from a usage perspective, it captures the same intuition: chaining computations while safely handling missing values.

NB: ?. is not quite a monad. While the null-conditional operator mimics monadic behavior by short-circuiting on null, it is a built-in language feature, not a true Monad: a true Monad lets you chain arbitrary context-returning functions, while ?. only flattens property access on possibly-null values.

5. Python #

Python doesn’t natively emphasize monads, mainly because it is dynamically typed. But, as in most languages, a list can be seen as a monad, and in Python we can use list comprehensions to simulate the monadic bind.

ranks = ['A', 'K', 'Q', 'J']
suits = ['♠', '♥', '♦', '♣']

# flatten the result into a single list of strings, not a list of lists
deck = [r + s for r in ranks for s in suits]
# Result: ['A♠', 'A♥', ... 'J♣']

6. C++ #

For years, C++ didn’t really have any proper monads (except for lists). But C++23 introduced monadic operations for std::optional, very similar to Rust’s monadic Option.

#include <optional>
#include <string>

std::optional<int> stringToInt(std::string s) { /* ... */ }
std::optional<int> divideTenBy(int n) { /* ... */ }

std::optional<std::string> input = "5";

auto result = input
    .and_then(stringToInt)
    .and_then(divideTenBy);

7. Go #

Go does not really have any monads, hence I include it here as a counterexample. We saw in the Rust and C++ examples that errors can be handled really cleanly with monads. In Go you have to write the early-exit logic on failure yourself, every single time.

func process(input string) (int, error) {
    val, err := strToInt(input)
    if err != nil {
        return 0, err       // Manual bind logic
    }

    result, err := divideTenBy(val)
    if err != nil {
        return 0, err       // Manual bind logic
    }

    return result, nil
}

Go chooses explicit error propagation over abstract sequencing, effectively forcing developers to write monadic bind logic by hand. This makes control flow obvious, but at the cost of boilerplate and composability.
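For contrast, here is the same pipeline in Haskell, where Either’s bind does the early-exit plumbing for us (strToInt and divideTenBy are hypothetical helpers mirroring the Go version):

import Text.Read (readMaybe)

strToInt :: String -> Either String Int
strToInt s = maybe (Left ("not a number: " ++ s)) Right (readMaybe s)

divideTenBy :: Int -> Either String Int
divideTenBy 0 = Left "division by zero"
divideTenBy n = Right (10 `div` n)

process :: String -> Either String Int
process input = do
  val <- strToInt input     -- a Left here short-circuits the rest
  divideTenBy val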

Back to category theory #

A monad is a monoid in the category of endofunctors.

This sentence is a bit of a shibboleth: often quoted, rarely understood, but quite useful for understanding the monad laws. If we break it down, it connects perfectly to what we just discussed in the code.

To understand this we only need to understand three things: Categories, Monoids and Endofunctors.

A Category #

A category consists of:

  • Objects
  • Morphisms (arrows) between objects
  • A way to compose morphisms
  • An identity morphism for every object

Composition must be associative, and identities must behave as expected.

In Haskell, we usually work in the category Hask:

  • Objects are the types
  • Morphisms are the functions
  • Composition: (.)
  • Identity: id

A monoid #

You actually use monoids every day. A monoid is just a fancy name for a set of things that can be combined.

To have a monoid you need three things:

  1. a collection of things (e.g. integers \(\mathbb{Z}\))
  2. a rule to combine them, an associative binary operation (e.g. addition \(+\))
  3. a neutral element that changes nothing (e.g. zero \(0\))

If you add zero to a number, it stays the same (\(5 + 0 = 5\)). If you add two numbers, you get a new number (\(6 + 7 = 13\)).

Lists are also monoids:

  • collection: lists of things [a]
  • combination: appending (++)
  • neutral element: empty list []
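In Haskell this is captured by the Monoid typeclass; a few quick examples:

import Data.Monoid (Sum(..))

combined :: [Int]
combined = [1, 2] <> [3, 4]        -- [1,2,3,4]: (<>) is (++) for lists

neutral :: [Int]
neutral = [1, 2] <> mempty         -- [1,2]: mempty is [] for lists

total :: Sum Int
total = Sum 6 <> Sum 7 <> mempty   -- Sum 13: integers under addition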

Endofunctors #

In category theory, we don’t just combine numbers or lists; we combine Endofunctors. A Functor is a mapping between categories that preserves structure.

In Haskell, a Functor is a type that maps types to other types (its kind is Type -> Type, or * -> *). Because these Functors map Haskell types back to Haskell types (they map over the same category, Hask; they never leave the language), we call them Endofunctors (“endo” meaning “inside” or “within”).

So instead of a set of integers, imagine we have a set of Endofunctors.

The category of endofunctors #

Now comes the important shift. There is a category whose:

  • objects are endofunctors on Hask (* -> *)
  • morphisms are natural transformations
  • composition is functor composition
  • identity is the identity functor Id

This category is often written as End(Hask). In this category, objects look like Maybe, [], and IO, and they are combined using functor composition.

Key idea #

When we talk about monoids, we usually think of elements being values and an operation like (+) or (++). But category theory generalizes this idea.

A monoid in a category is an object in that category together with a multiplication morphism and a unit morphism satisfying associativity and identity laws.

Crucially: the elements of such a monoid are not values; they are objects of a category.

A monad is a monoid #

Now let’s try to treat our Endofunctors (let’s use List or Maybe) as a monoid. We need three ingredients:

  1. The collection: Instead of a set of integers, our collection is the Endofunctor m (like Maybe).

  2. The neutral element (identity): With numbers, we need a way to have a value that doesn’t change anything \((0)\). In Monads, we need a way to have a context that is “minimal” or “pure.” return :: a -> m a wraps the value into a neutral context.

  3. The combination rule (multiplication): With numbers, we take two numbers and squash them into one (\(6 + 7\) becomes \(13\)). With Monads, we end up with layers of context, like Maybe (Maybe Int) or [[Int]], and we need a way to “squash” these layers into one. join :: m (m a) -> m a takes something like Just (Just 5) and squashes it to Just 5, or [[1, 2], [3, 4]] to [1, 2, 3, 4]. In fact, >>= doesn’t introduce any fundamentally new power: it can be defined entirely in terms of fmap and join as m >>= f = join (fmap f m). So when you use >>=, what’s really happening is (a demonstration follows this list):

    1. Map a function that returns a context (fmap)
    2. Collapse the resulting nested context (join)
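A quick demonstration, using the real join from Control.Monad:

import Control.Monad (join)

flatMaybe :: Maybe Int
flatMaybe = join (Just (Just 5))     -- Just 5

flatList :: [Int]
flatList = join [[1, 2], [3, 4]]     -- [1,2,3,4]

-- bind really is "map, then flatten":
viaJoin :: Maybe Int
viaJoin = join (fmap (\x -> Just (x + 1)) (Just 5))   -- Just 6, same as Just 5 >>= \x -> Just (x + 1)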

These three points combined form a monoid over the endofunctors. But they also form a monad! We set out to treat our Endofunctors as a monoid, but accidentally re-invented the Monad.

When Category Theory nerds say “A monad is a monoid in the category of endofunctors,” they aren’t trying to confuse you. They are simply stating that Monads follow the same structural rules as addition:

  • Integers combine values: \(6 + 7\).
  • Monads combine layers of structure: Just (Just 5).

The “Objects” aren’t numbers; they are the effect types themselves. The “Multiplication” isn’t addition; it’s the collapsing of layers (join).

This is why Monads are so powerful. They aren’t about “wrapping values”; they are about composing computational structures. The laws governing them aren’t arbitrary; they are the fundamental laws of how things combine (exactly the monoid laws).

To wrap up, in one sentence:

A monad is a principled way of composing effects, and category theory explains this by showing that monads are simply monoids, not over values, but over the effects themselves (endofunctors).

Summary #

We started this journey by talking about how developers often tremble at the word “Monad.” But as we’ve seen, you’ve likely been using monad-shaped APIs for years in JavaScript, Rust, or C# without needing a PhD in mathematics, whether you were handling the context of data (like a missing value) or the context of time (like a Promise).

The beauty of Haskell is that it doesn’t just let you use the pattern; it gives that pattern a name and a rigorous set of rules. We’ve seen that Monads are more than just “boxes” or “wrappers” for data. They are programmable semicolons: engines of control that allow us to model effects like state, logic, and hardware interaction as pure, predictable scripts.

By treating context, whether it’s failure, multiplicity, or the “recipe” for a stateful computation, as a first-class citizen, we can:

  • Stop writing boilerplate: No more manual if err != nil checks or passing state variables through ten different functions.
  • Compose complex logic: Chain together “scripts” of instructions (like Parsers or IO) without losing track of the underlying “plumbing.”
  • Trust the consistency: When we say a Monad is a Monoid in the category of Endofunctors, we mean that the rules for combining effects are just as reliable as the rules for adding numbers.

Operationally, a monad is a type constructor equipped with return and >>= that sequences computations, not just by unwrapping values, but by defining exactly how one step of a program should transition into the next.

Monads aren’t burritos, and they aren’t space suits. They are a principled way of composing effects. So, next time you see that >>= operator, don’t see a scary math symbol. See a bridge that lets you move from one instruction to the next, keeping your code flat, clean, and, most importantly, mathematically sound.