Answer by Tikhon Jelvis:

There are a lot of monad tutorials out there. Too many, probably. I'm not terribly happy with most of the explanations though, so I'll throw my hat in too!The core problem, as I see it, is that the idea of a monad is

very abstract. Trying to pin it down to a single concrete idea, a single analogy is doomed to fail.I've been thinking about how to explain abstract concepts like this for a while. To me, a good explanation has to have two parts:

examplesand anactual definition. Far too often, I've found people leave out one or the other, but I need both to understand! If you just give me an abstract mathematical definition without context, I won't actuallyunderstandit, even if I can understand the parts. And if you only give me a bunch of examples but forget to explicitlydefinethe idea, I might understand how it behaves, but I won't have any idea of what it actuallyis.So a good explanation has to have these two parts. The remaining question, then, is how to order them. Examples first, to motivate a definition? Start with the definition and "flesh it out" with examples? Somehow interleave the two?

I've found that for

me, the best approach is to open with a definition and expand on it. I won't understand the definition immediately, but I'll have it in the back of my mind for each example, and I'll refer back to it periodically.If you like the other order more, take a look at one of the best monad tutorials around: You Could Have Invented Monads! (And Maybe You Already Have.)

So I'll introduce monads in two parts: first, I'll tell you

whata monad is; next, I'll tell youwhy we care; finally, I'll wak through a few concrete examples of monads.

WhatSo what is a monad, at least in functional programming? Fundamentally, it's just a type

`m`

that has three particular functions defined on it. Note that

`m`

has to accept a type argument itself: you'd always use it as

`m Int`

,

`m String`

or

`m a`

. Then it just has to have some functions defined for it:

1 2 3`return :: a -> m a`

`fmap :: (a -> b) -> m a -> m b`

`join :: m (m a) -> m a`

These functions also have to follow some laws. I usually don't think about these directly—it's enough to know that they relate to each other "reasonably"—so I'll talk about them later.

The most important part is

`join`

, which captures a notion of "flattening" a structure. You go from a nested

`m (m a)`

to a single

`m a`

: you flatten out a level. If you want a single idea to hang onto, this is it: monads let you flatten them.

As I alluded earlier, the definition probably seems dry and arbitrary right now. Don't worry about it. Just look back here as you're reading the rest of my post. Most of all, remember: a monad is just

a type with some functions. That's all.

WhySo why do we care about these

particularfunctions? What's so special about them? How do they flow together? Why is the notion of "flattening" so important?The first two functions,

`return`

and

`fmap`

both serve a simple role: they allow us to transition from normal code

`a`

to code that uses

`m`

.

`return`

wrapsa normal value, lifting it into the monad;`fmap`

lifts normal functions to operate over these wrapped values. This becomes clearer when you add an extra set of parentheses to its type signature:

1`fmap :: (a -> b) -> (m a -> m b)`

But what about flattening? That one's a bit less obvious.

Composition

Perhaps the single most important idea in functional programming (or really all programming ever) iscomposition. And this is exactly what monads help with! In particular, the three functions let us use`m`

as a customizable way to compose parts of our program, more versatile than just function composition.

Normally, we just compose normal functions: given

`a -> b`

and

`b -> c`

, we can first apply one then the other to get

`a -> c`

. Simple but surprisingly useful.

If a type

`m`

is a monad it means it gives us a

newway to compose functions. It lets us compose functions of the form`a -> m b`

: given

`a -> m b`

and

`b -> m c`

, we can get

`a -> m c`

. Note how the

`m`

in the middle gets swallowed up: this is the main difference with this form of composition.

Since we have an instance of our

`m`

in the middle, we can insert different behavior as things get composed. This means that, for every different monad

`m`

, we get a new

regime of composition.And this is exactly what

`join`

enables!

`join`

specifies a bit of computation that lets us "get rid" of an extra layer of

`m`

, and this is exactly what happens inside the composition. To compose two monadic functions like this, we start with

`fmap`

:

1`f >=> g = \ x -> fmap g (f x)`

Try following allong in GHCi, checking types as you go along.Remember that

`f :: (a -> m b)`

and

`g :: b -> m c`

, which means that the type of the function above is:

1`(>=>) :: (a -> m b) -> (b -> m c) -> (a -> m (m c))`

We start with an

`a`

, pass it into

`f`

to get an

`m b`

and map over that to get an

`m (m c)`

. And how do we go from this to the actual type we want? We need to flatten it!

So we make the whole composition work properly and not have an extra

`m`

by throwing in a call to

`join`

:

1`f >=> g = \ x -> join (fmap g (f x))`

So: a monad gives us a new kind of "join-enabled composition", letting us customize what happens when things get composed by defining an appropriate

`join`

function.

We can now view

`a -> m b`

as a

special kindof function between`a`

and

`b`

which just gets composed in a different way.

Bind

So, we have a new idea of composition:`>=>`

. But we can actually have a new idea of function

applicationtoo! This is calledbindand is written as`>>=`

. It's very useful for programming, but I find it harder to think about than

`join`

and

`>=>`

.

What does this new sort of application mean? Well, it can't just be applying

`a -> m b`

to a normal

`a`

because we can already do that. Instead, it's applying

`a -> m b`

to an

`m a`

:

1`(>>=) :: m a -> (a -> m b) -> m b`

In spirit, it's very similar to

`>=>`

: our new notion of application also "swallows" an intermediate

`m`

, so we can customize how it works.

In fact, the definition is just like

`>=>`

but not wrapped in an extra lambda:

1`x >>= f = join (fmap f x)`

This operator will come up in some of the practical examples.

Laws

Now, just before jumping into the examples, lets cover the monad laws. If you can't remember the details, don't worry: most of the time, how the laws need to apply in any given case is pretty intuitive.These laws are most easily described in terms of

`>=>`

:

1 2 3`return >=> f = f`

`f >=> return = f`

`f >=> (g >=> h) = (f >=> g) >=> h`

In other words,

`return`

is both a right and a left identity for

`>=>`

and

`>=>`

is

associative.Hey, these laws look very similar to how

`id`

and

`.`

(

normalfunction composition) behave!'

1 2 3`id . f = f`

`f . id = f`

`f . (g . h) = (f . g) . h`

This is just more evidence that monad provide a special kind of composition!

## Examples

Now that we have the main idea, lets go on to see how it's used in practice with some examples!

Maybe

Perhaps the simplest meaningful monad is`Maybe`

, defined as follows:

1 2`data Maybe a = Just a`

`| Nothing`

`Maybe`

wraps a normal type and also allows it to be null (ie

`Nothing`

): it's Haskell's option type.

How do we make it a monad? Well, we know the three functions we need, so lets just implement those. We know that

`return`

just "wraps" a normal value in

`Maybe`

; the most natural way to do this is just

`Just`

(heh):

1 2`return :: a -> Maybe a`

`return x = Just x`

What about

`fmap`

? How do we apply a function to a

`Maybe`

value? Well, if we have a

`Just`

, we can just apply the function to its contents. But if we have

`Nothing`

, we don't have anything to apply the function to! We just have to return

`Nothing`

again:

1 2 3`fmap :: (a -> b) -> Maybe a -> Maybe b`

`fmap f (Just x) = Just (f x)`

`fmap f Nothing = Nothing`

`join`

is the final function. Here, we just have to consider all the possible cases and implement them in the most natural way:

1 2 3 4`join :: Maybe (Maybe a) -> Maybe a`

`join (Just (Just a)) = Just a`

`join (Just Nothing) = Nothing`

`join Nothing = Nothing`

`Maybe`

is a nice example because, at every step, there was only one "reasonable" thing to do. In this case, "reasonable" meant not needlessly throwing away real values. As long as we keep values whenever we can, we only have one possible implementation of the monad functions.

The next question to ask is "what does

`a -> Maybe b`

mean? It's a function that takes an

`a`

and may or may not return a

`b`

. It's a function that can

fail.How do we compose two functions that can fail? Well, if they both succeed, they're just like normal functions. And if any one of them fails, if we ever get a

`Nothing`

well, we have no choice but to return

`Nothing`

for the whole thing.

So

`Maybe`

gives us a type of composition and application for functions that

can failbut propagating the failure through for you. It's very useful for avoiding deeply nested`case`

expressions. We can transform

1 2 3 4 5 6 7 8 9`case f x of`

`Nothing -> Nothing`

`Just res ->`

`case g res of`

`Nothing -> Nothing`

`Just res' ->`

`case h res' of`

`Nothing -> Nothing`

`Just res'' -> ...`

into a much nicer

1`x >>= f >>= g >>= h`

`Maybe`

abstracts over function application and composition that automatically pipes possible

`Nothing`

s through the whole computation, saving us from doing it manually and making the code much less noisy. It saves us from the equivalent of

1 2 3 4 5`if (x != null) {`

`return ...;`

`} else {`

`return null;`

`}`

Either`Maybe`

is great, but sometimes we want to fail in

multiple different ways. This is where`Either`

comes in: it's like

`Maybe`

but with the

`Nothing`

case annotated with an extra argument:

1 2`data Either err a = Left err`

`| Right a`

It uses the generic names

`Left`

and

`Right`

to show it isn't

justabout error handling, but we'll pretend it is.It actually forms a monad almost exactly like

`Maybe`

. If anything, it's easier: since we can't come up with a value of type

`err`

from thin air, we can only really implement these functions one way:

1 2 3 4 5 6 7 8 9 10 11`return :: a -> Either err a`

`return x = Right x`

`fmap :: (a -> b) -> Either err a -> Either err b`

`fmap f (Right x) = Right (f x)`

`fmap f (Left err) = Left err`

`join :: Either err (Either err a) -> Either err a`

`join (Right (Right x)) = Right x`

`join (Right (Left err)) = Left err`

`join (Left err) = Left err`

In fact,

`Maybe`

can be seen as a special case of

`Either`

where the

`Left`

case doesn't carry extra information:

1`type Maybe a = Either () a`

`Either`

lets us abstract over error checking via return types. We can avoid the

`Go`

error-handling pattern:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16`res, err = f(x)`

`if err != nil {`

`return nil`

`}`

`res2, err = g(res)`

`if err != nil {`

`return nil`

`}`

`res3, err = h(res)`

`if err != nil {`

`return nil`

`}`

`...`

The moral equivalent of the above in Haskell could just be written as

1`x >>= f >>= g >>= h`

even though they're both using return types to manage errors!

`Either`

abstracts over all the plumbing for us.

List

So far, we've seen how monads let us compose functions that might fail. What else can we do?Well, a simple one is the list type which is written as

`[a]`

in Haskell. We're going to approach lists with the same heuristic as for

`Maybe`

and

`Either`

: never throw away data unnecessarily. We also don't want to

copyorrearrangedata unnecessarily.With these conditions,

`return`

is pretty easy:

1 2`return :: a -> [a]`

`return x = [x]`

`fmap`

, as the name implies, is just

`map`

1 2 3 4 5`fmap :: (a -> b) -> [a] -> [b]`

`fmap = map`

`-- or`

`fmap f [] = []`

`fmap f (x:xs) = f x : fmap f xs`

Finally, we have

`join`

which needs to flatten a list. Here's the most reasonable definition:

1 2 3`join :: [[a]] -> [a]`

`join [] = []`

`join (ls:rest) = ls ++ join rest`

Basically, we just take our list of lists and concatenate each of its items with

`++`

.

Now our next question: what does

`a -> [b]`

mean? It's a function that can return

any number of results. In math, this is often called anondeterministic function.What does it mean to compose non-deterministic functions like this? Well, to compose

`f :: a -> [b]`

with

`g :: b -> [c]`

, we first pass in a value into

`f`

to get a bunch of

`b`

s, then we pass each

`b`

into

`g`

to get a whole bunch of

`c`

s and finally we return

allof those`c`

s. This is exactly what

`>=>`

for lists does.

So the list monad gives us clean composition of functions that have any number of results. One cool thing to note is that this corresponds very closely to list comprehensions! In fact, we can rewrite a list comprehension in terms of

`return`

and

`>>=`

pretty easily:

1`[f a b | a <- as, b <- bs]`

becomes

1 2 3`bs >>= \ b ->`

`as >>= \ a ->`

`return (f a b)`

As you can see, this transformation is actually not list-specific at all. In fact, you can do this for any monad at all. Haskell's MonadComprehensions extension actually allows this: you can use list comprehension syntax for any monad at all, which is sometimes quite nice.

Reader

Now lets look at a much trickier one. This is thefunctionmonad in the form of`r -> a`

for some fixed

`r`

. (Just like

`Either err a`

had a fixed

`err`

.)

What does

`return`

mean for this? Let's look at the type we want:

1`return :: a -> (r -> a)`

We get a value and want to turn it into a function from

`r`

. Since we don't know anything particular about

`r`

, we can't do anything with that argument but ignore it:

1`return x = (\ r -> x)`

How about

`fmap`

? Figuring it out on your own is actually a good exercise. Try it on your own before continuing.

Again, we want to look at the type:

1`fmap :: (a -> b) -> (r -> a) -> (r -> b)`

Hey, that type looks a familiar! It's just like function composition:

1`(.) :: (b -> c) -> (a -> b) -> (a -> c)`

And, in fact, that's exactly what

`fmap`

is for

`(r -> a)`

:

1 2 3`fmap = (.)`

`-- or`

`fmap f g = \ x -> f (g x)`

Finally, we need

`join`

. Again, the type is going to be very helpful:

1`join :: (r -> (r -> a)) -> (r -> a)`

We take in a function

`f`

with

twoarguments and need to use it to produce a function ofoneargument. Since we can't magically produce a value of type`r`

, there is actually only one reasonable implementation of

`join`

:

1`join f = \ x -> f x x`

So now we see that, indeed,

`(r -> a)`

is a monad. But what does this mean? We can think of value of type

`(r -> a)`

as values of type

`a`

in the contextof`r`

. That is, they're normal values of

`a`

that can also

dependon a value of`r`

. They can "read" from the environment, which is why

`(r -> a)`

is often called the "reader monad".

The reader monad allows us to pipe this environment through a whole bunch of values and functions that all depend on it. The result of

`x >>= f >>= g >>= h`

lets us pass in a single value of

`r`

that is first given to

`x`

then passed into the result of

`f`

then the into the result of

`g`

and so on.

Writer

Another type to look at is`(w, a)`

for a fixed

`w`

. This might seem a little weird, but as we'll see it naturally forms a monad and can actually be pretty useful!

So how do we do

`return`

? Immediately, we run into a problem: we would need to manufacture a value of type

`w`

, but we can't. So, in fact, we need to add a constraint to

`w`

: it has to be a type that has a special value of some sort to use with

`return`

. This is provided by the

`Monoid`

class which has

`mempty`

:

1`mempty :: Monoid w => w`

We'll just use this; as we'll see later, the other part of the monoid will also come in useful.

Given a free

`w`

value,

`return`

is straightforward:

1 2`return :: Monoid w => a -> (w, a)`

`return x = (mempty, x)`

Now

`fmap`

, which we can only really do in one way:

1 2`fmap :: (a -> b) -> (w, a) -> (w, b)`

`fmap f (w, x) = (w, f x)`

Finally, we need to do

`join`

:

1`join :: (w, (w, a)) -> (w, a)`

We have *two* values of type

`w`

. We could just throw one away, but, as a rule, losing information unnecessarily is bad. So, instead, we will use the other part of

`Monoid`

:

1`mappend :: Monoid w => w -> w -> w`

It's just an arbitrary way to combine two values of the type into a third. We can use it to turn the two levels of

`w`

in

`(w, (w, a))`

into one:

1`join (w_1, (w_2, x)) = (mappend w_1 w_2, x)`

So, as long as

`w`

is a

`Monoid`

,

`(w, a)`

is a monad. But what does this mean?

It lets us string an extra channel of output through our whole computation, combining it using

`mappend`

at every step. It's often called the "writer monad" because we can "write" to this extra channel of output at every step. To be useful, we have to actually have an extra function for writing:

1 2`tell :: Monoid w => w -> (w, ())`

`tell output = (output, ())`

This lets us inject values into the output stream. This newly added output will be

`mappend`

ed onto the rest of the output from our computation.

This is useful for purely functional logging. If we have a bunch of function

`f`

,

`g`

,

`h`

that all want to log some

`String`

s in addition to returning something, we can write them by using

`tell ["Message"]`

and then automatically pipe all the strings through using the same code we've already seen a few times before:

1`x >>= f' >>= g' >>= h'`

For lists (ie

`[String]`

in this example),

`mappend`

is just

`++`

, so this expression becomes:

1`(log ++ logF ++ logG ++ logH, (h (g (f x))))`

(Where

`f`

,

`g`

and

`h`

are the function parts of

`f'`

,

`g'`

and

`h'`

that only do the actual computation and not the logging.)

It's pretty neat how we assembled the

`log`

in parallel to actually applying the base functions. And thanks to laziness, we will only evaluate as much of the log as we use: we aren't wasting too many resources on making a log we will never use!

State

So, we've seen how to compose functions thatreadand functions thatwrite. Can we do both at once? Why, that would be a function that can both readandwrite! Sounds like mutable state.This is, in fact, exactly what the

`State`

type is: it's a combination of both reader and writer using the same type for both:

1`type State s a = s -> (s, a)`

Conceptually, a value of type

`s -> (s, a)`

is an

`a`

that can depend on and/or modify a value of type

`s`

.

Since we're always going to have a value of type

`s`

passed in—that's the

`s ->`

part of the type—we don't need the

`Monoid`

constraint any more.

With that in mind, here are the monadic functions which are largely the combinations of their reader and writer versions. Note how

`return`

and

`fmap`

don't modify the state at all; only

`join`

can affect it.

1 2 3 4 5 6 7 8 9`return :: a -> (s -> (s, a))`

`return x = \ s -> (s, x)`

`fmap :: (a -> b) -> (s -> (s, a)) -> (s -> (s, b))`

`fmap f x = \ s -> let (s', a) = x s in (s', f a)`

`-- remember that the stateful value is a function itself!`

`join :: (s -> (s, (s -> (s, a)))) -> (s -> (s, a))`

`join x = \ s -> let (s', x') = x s in x' s'`

`join`

is rather confusing, so take the time to work out what it does on paper. Basically, we start with a nested state function that depends on

`s`

twice. To turn this into a single level of state dependency, we have to take a value of type`s`

and string it through

bothlevels: that's all the`join`

code is doing.

We also need some primitive ways to access and modify the state, similar to

`tell`

. It's easiest to think about two of them:

`get`

to read in the current state and

`set`

to change it:

1 2`get :: (s -> (s, s))`

`get = \ s -> (s, s)`

Note how

`get`

doesn't change the state at all. It just takes the state and moves it to the value "channel" as well, exposing it to the functions in the computation.

1 2`set :: s -> (s -> (s, ()))`

`set newS = \ s -> (newS, ())`

`set`

takes the new value of the state and creates a new stateful value. This value

ignoresthe state that's passed into it, replacing it with the new state. We also don't have a meaningful result for this, so we just put a`()`

into the result channel.

We can combine

`get`

and

`set`

into some useful functions like

`modify`

:

1 2 3 4`modify :: (s -> s) -> (s -> (s, ()))`

`modify f = get >>= set . f`

`-- or`

`modify f = \ s -> (f s, ())`

The state monad lets us compose

statefulfunctions, and manages passing the state through each one automatically. Hopefully a pattern is now emerging: monads give us new ways of composing functions, usually with more structure.

Procedures

Now we're going to talk about the only monad here that has any magic:`IO`

. The idea is that

`IO a`

represents a procedure that, when run, produces a value of type

`a`

. The procedure has access to the runtime system, so it can do "extra-language" things like talk to the operating system, get input, print output and so on: all these are impossible to define in terms of

justpure Haskell.Since

`IO`

is an abstract type, we don't know—and

don't care—about how the monadic functions are implemented. Instead, I'll just talk about what theydo.`return :: a -> IO a`

creates the empty procedure that just does nothing except return the given value. This is essentially where its name comes from, coincidentally.

`fmap :: (a -> b) -> IO a -> IO b`

takes a procedure that gives us an

`IO a`

and creates a new one that first runs the

`IO a`

then passes its result into the function to get a

`b`

. Since this whole thing is still a procedure itself, it's an

`IO b`

: the

`b`

never "escapes" into normal code.

Finally, we have

`join :: IO (IO a) -> IO a`

. This is a slightly odd way of

sequencingprocedures: we take in a procedure that returns a procedure and create one that runs both of them.Honestly,

`join`

does not make too much sense for

`IO`

. But

`>>=`

does: it's a way to apply procedures, as if they were functions! This lets us write complicated programs by combining these procedures in a systematic way to produce a single big procedure.

In fact, this is how the entire Haskell program actually gets run.

`main :: IO a`

is just the big procedure; when you run a Haskell executable, the runtime system executes

`main`

, which likely involves both executing

`IO`

procedures

inside`main`

and evaluating normal Haskell expressions.

Just like the writer and state monads,

`IO`

needs some primitive values like

`tell`

,

`get`

and

`set`

. But, unlike the earlier examples, it doesn't have one primitive value or two primitive values: it has hundreds. Every system call, every runtime functions are all primitive

`IO`

values.

`getLine`

,

`getChar`

,

… The monad machinery just lets us combine these primitive operations in different ways as well as letting us glue them together with normal (ie pure) Haskell code.

Do-notation

One thing I haven't really mentioned is do-notation, which is some syntax sugar Haskell provides to make code using monads look more like an imperative program. It actually works like the list comprehension, but in reverse:

1 2 3`do x <- xs`

`y <- ys`

`return (f x y)`

becomes

`xs >>= \ x -> ys >>= \ y -> return (f x y)`

It also allows you to ignore the value of an expression:

1 2`do something`

`somethingElse`

is the same as

1`something >>= \ _ -> somethingElse`

This turns every monad into an

imperative-looking DSL. Once you understand the stuff I explained above, I think do-notation is quite easy to deal with.

Conclusion

There are actually a whole bunch more monads I haven't talked about. We can use them for parsing, for logic programming, for mutable references, for`callCC`

… Lots of interesting things.

Moreover, we can actually

combinemonads. This gives rise to "monad transformers". For example, we can layer`Maybe`

onto another monad by wrapping the inner monad's functions in checks for

`Nothing`

.

Ultimately, just remember that a monad is a

type with several functions defined on it. This type gives us anew way to compose and apply functionswith custom behavior during the application/composition "step".I hope this gives you a good understanding of monads in practice. I know the answer's a bit long, but I've thought a lot about how to explain this idea. I just hope it wasn't too much! I think this is the longest Quora answer I've ever written, by a fair margin.