What are monads and why are they useful?

Answer by Tikhon Jelvis:

There are a lot of monad tutorials out there. Too many, probably. I'm not terribly happy with most of the explanations though, so I'll throw my hat in too!

The core problem, as I see it, is that the idea of a monad is very abstract. Trying to pin it down to a single concrete idea, a single analogy is doomed to fail.

I've been thinking about how to explain abstract concepts like this for a while. To me, a good explanation has to have two parts: examples and an actual definition. Far too often, I've found people leave out one or the other, but I need both to understand! If you just give me an abstract mathematical definition without context, I won't actually understand it, even if I can understand the parts. And if you only give me a bunch of examples but forget to explicitly define the idea, I might understand how it behaves, but I won't have any idea of what it actually is.

So a good explanation has to have these two parts.  The remaining question, then, is how to order them. Examples first, to motivate a definition? Start with the definition and "flesh it out" with examples? Somehow interleave the two?

I've found that for me, the best approach is to open with a definition and expand on it. I won't understand the definition immediately, but I'll have it in the back of my mind for each example, and I'll refer back to it periodically.

If you like the other order more, take a look at one of the best monad tutorials around: You Could Have Invented Monads! (And Maybe You Already Have.)

So I'll introduce monads in three parts: first, I'll tell you what a monad is; next, I'll tell you why we care; finally, I'll walk through a few concrete examples of monads.

What

So what is a monad, at least in functional programming? Fundamentally, it's just a type m that has three particular functions defined on it. Note that m has to accept a type argument itself: you'd always use it as m Int, m String or m a. Then it just has to have some functions defined for it:

return :: a -> m a
fmap :: (a -> b) -> m a -> m b
join :: m (m a) -> m a

These functions also have to follow some laws. I usually don't think about these directly—it's enough to know that they relate to each other "reasonably"—so I'll talk about them later.

The most important part is join, which captures a notion of "flattening" a structure. You go from a nested m (m a) to a single m a: you flatten out a level. If you want a single idea to hang onto, this is it: monads let you flatten them.

As I alluded earlier, the definition probably seems dry and arbitrary right now. Don't worry about it. Just look back here as you're reading the rest of my post. Most of all, remember: a monad is just a type with some functions. That's all.

Why

So why do we care about these particular functions? What's so special about them? How do they flow together? Why is the notion of "flattening" so important?

The first two functions, return and fmap, both serve a simple role: they allow us to transition from normal code working with plain a values to code that uses m. return wraps a normal value, lifting it into the monad; fmap lifts normal functions to operate over these wrapped values. This becomes clearer when you add an extra set of parentheses to its type signature:

fmap :: (a -> b) -> (m a -> m b)

But what about flattening? That one's a bit less obvious.

Composition
Perhaps the single most important idea in functional programming (or really all programming ever) is composition. And this is exactly what monads help with! In particular, the three functions let us use m as a customizable way to compose parts of our program, more versatile than just function composition.

Normally, we just compose normal functions: given a -> b and b -> c, we can first apply one then the other to get a -> c. Simple but surprisingly useful.

If a type m is a monad, it means it gives us a new way to compose functions. It lets us compose functions of the form a -> m b: given a -> m b and b -> m c, we can get a -> m c. Note how the m in the middle gets swallowed up: this is the main difference with this form of composition.

Since we have an instance of our m in the middle, we can insert different behavior as things get composed. This means that, for every different monad m, we get a new regime of composition.

And this is exactly what join enables! join specifies a bit of computation that lets us "get rid" of an extra layer of m, and this is exactly what happens inside the composition. To compose two monadic functions like this, we start with fmap:

f >=> g = \ x -> fmap g (f x)

Try following along in GHCi, checking types as you go.

Remember that f :: a -> m b and g :: b -> m c, which means that the type of the function above is:

(>=>) :: (a -> m b) -> (b -> m c) -> (a -> m (m c))

We start with an a, pass it into f to get an m b and map over that to get an m (m c). And how do we go from this to the actual type we want? We need to flatten it!

So we make the whole composition work properly and not have an extra m by throwing in a call to join:

f >=> g = \ x -> join (fmap g (f x))

So: a monad gives us a new kind of "join-enabled composition", letting us customize what happens when things get composed by defining an appropriate join function.

We can now view a -> m b as a special kind of function between a and b which just gets composed in a different way.

Bind
So, we have a new idea of composition: >=>. But we can actually have a new idea of function application too! This is called bind and is written as >>=. It's very useful for programming, but I find it harder to think about than join and >=>.

What does this new sort of application mean? Well, it can't just be applying a -> m b to a normal a because we can already do that. Instead, it's applying a -> m b to an m a:

(>>=) :: m a -> (a -> m b) -> m b

In spirit, it's very similar to >=>: our new notion of application also "swallows" an intermediate m, so we can customize how it works.

In fact, the definition is just like >=> but not wrapped in an extra lambda:

x >>= f = join (fmap f x)

This operator will come up in some of the practical examples.

Laws
Now, just before jumping into the examples, let's cover the monad laws. If you can't remember the details, don't worry: most of the time, how the laws need to apply in any given case is pretty intuitive.

These laws are most easily described in terms of >=>:

return >=> f = f
f >=> return = f
f >=> (g >=> h) = (f >=> g) >=> h

In other words, return is both a right and a left identity for >=>, and >=> is associative.

Hey, these laws look very similar to how id and . (normal function composition) behave!

id . f = f
f . id = f
f . (g . h) = (f . g) . h

This is just more evidence that monads provide a special kind of composition!

Examples

Now that we have the main idea, let's go on to see how it's used in practice with some examples!

Maybe
Perhaps the simplest meaningful monad is Maybe, defined as follows:

data Maybe a = Just a
             | Nothing

Maybe wraps a normal type and also allows it to be null (ie Nothing): it's Haskell's option type.

How do we make it a monad? Well, we know the three functions we need, so let's just implement those. We know that return just "wraps" a normal value in Maybe; the most natural way to do this is just Just (heh):

return :: a -> Maybe a
return x = Just x

What about fmap? How do we apply a function to a Maybe value? Well, if we have a Just, we can just apply the function to its contents. But if we have Nothing, we don't have anything to apply the function to! We just have to return Nothing again:

fmap :: (a -> b) -> Maybe a -> Maybe b
fmap f (Just x) = Just (f x)
fmap f Nothing  = Nothing
join is the final function. Here, we just have to consider all the possible cases and implement them in the most natural way:

join :: Maybe (Maybe a) -> Maybe a
join (Just (Just a)) = Just a
join (Just Nothing)  = Nothing
join Nothing         = Nothing

Maybe is a nice example because, at every step, there was only one "reasonable" thing to do. In this case, "reasonable" meant not needlessly throwing away real values. As long as we keep values whenever we can, we only have one possible implementation of the monad functions.

The next question to ask is: what does a -> Maybe b mean? It's a function that takes an a and may or may not return a b. It's a function that can fail.

How do we compose two functions that can fail? Well, if they both succeed, they're just like normal functions. And if either of them fails, that is, if we ever get a Nothing, we have no choice but to return Nothing for the whole thing.
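Here's a tiny sketch of that kind of composition. The names half and quarter are made up for illustration, and the >=> used here is the one exported by base's Control.Monad, which behaves just like the one defined above:

import Control.Monad ((>=>))

-- succeeds only on even numbers
half :: Int -> Maybe Int
half x = if even x then Just (x `div` 2) else Nothing

quarter :: Int -> Maybe Int
quarter = half >=> half

-- ghci> quarter 12   ==>  Just 3
-- ghci> quarter 6    ==>  Nothing   (half 3 fails, so the whole thing fails)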

So Maybe gives us a kind of composition and application for functions that can fail, propagating the failure through for us. It's very useful for avoiding deeply nested case expressions. We can transform

case f x of
  Nothing  -> Nothing
  Just res ->
    case g res of
      Nothing  -> Nothing
      Just res' ->
        case h res' of
          Nothing -> Nothing
          Just res'' -> ...

into a much nicer

x >>= f >>= g >>= h

Maybe abstracts over function application and composition in a way that automatically pipes possible Nothings through the whole computation, saving us from doing it manually and making the code much less noisy. It saves us from the equivalent of

if (x != null) {
  return ...;
} else {
  return null;
}
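For a concrete sketch of that pipeline shape, using Prelude's lookup (which returns a Maybe) over some made-up association lists:

phoneBook :: [(String, String)]
phoneBook = [("alice", "555-1234")]

areaCodes :: [(String, String)]
areaCodes = [("555", "Springfield")]

cityOf :: String -> Maybe String
cityOf name =
  lookup name phoneBook >>= \ number ->
  lookup (take 3 number) areaCodes

-- ghci> cityOf "alice"   ==>  Just "Springfield"
-- ghci> cityOf "bob"     ==>  Nothing

A Nothing at either step short-circuits the rest, with no explicit case analysis in sight.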

Either

Maybe is great, but sometimes we want to fail in multiple different ways. This is where Either comes in: it's like Maybe but with the Nothing case annotated with an extra argument:

data Either err a = Left err
                  | Right a

It uses the generic names Left and Right to show it isn't just about error handling, but we'll pretend it is.

It actually forms a monad almost exactly like Maybe. If anything, it's easier: since we can't come up with a value of type err from thin air, we can only really implement these functions one way:

return :: a -> Either err a
return x = Right x

fmap :: (a -> b) -> Either err a -> Either err b
fmap f (Right x)  = Right (f x)
fmap f (Left err) = Left err

join :: Either err (Either err a) -> Either err a
join (Right (Right x))  = Right x
join (Right (Left err)) = Left err
join (Left err)         = Left err

In fact, Maybe can be seen as a special case of Either where the Left case doesn't carry any extra information:

type Maybe a = Either () a
Either lets us abstract over error checking via return types. We can avoid the Go error-handling pattern:

res, err := f(x)
if err != nil {
  return nil, err
}

res2, err := g(res)
if err != nil {
  return nil, err
}

res3, err := h(res2)
if err != nil {
  return nil, err
}

...

The moral equivalent of the above in Haskell could just be written as

x >>= f >>= g >>= h

even though they're both using return types to manage errors!

Either abstracts over all the plumbing for us.
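As a small, made-up sketch of what that buys us (readMaybe comes from base's Text.Read; the validation rules are just for illustration):

import Text.Read (readMaybe)

parseAge :: String -> Either String Int
parseAge s = case readMaybe s of
  Just n  -> Right n
  Nothing -> Left ("not a number: " ++ s)

checkAdult :: Int -> Either String Int
checkAdult n = if n >= 18 then Right n else Left "too young"

-- ghci> parseAge "42" >>= checkAdult    ==>  Right 42
-- ghci> parseAge "x"  >>= checkAdult    ==>  Left "not a number: x"
-- ghci> parseAge "12" >>= checkAdult    ==>  Left "too young"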

List
So far, we've seen how monads let us compose functions that might fail. What else can we do?

Well, a simple one is the list type, which is written as [a] in Haskell. We're going to approach lists with the same heuristic as for Maybe and Either: never throw away data unnecessarily. We also don't want to copy or rearrange data unnecessarily.

With these conditions, return is pretty easy:

return :: a -> [a]
return x = [x]

fmap, as the name implies, is just map:

fmap :: (a -> b) -> [a] -> [b]
fmap = map
-- or
fmap f []     = []
fmap f (x:xs) = f x : fmap f xs

Finally, we have join, which needs to flatten a list. Here's the most reasonable definition:

join :: [[a]] -> [a]
join [] = []
join (ls:rest) = ls ++ join rest

Basically, we just take our list of lists and concatenate each of its items with ++.

Now our next question: what does a -> [b] mean? It's a function that can return any number of results. In math, this is often called a nondeterministic function.

What does it mean to compose non-deterministic functions like this? Well, to compose f :: a -> [b] with g :: b -> [c], we first pass a value into f to get a bunch of bs, then we pass each b into g to get a whole bunch of cs, and finally we return all of those cs. This is exactly what >=> for lists does.

So the list monad gives us clean composition of functions that have any number of results. One cool thing to note is that this corresponds very closely to list comprehensions! In fact, we can rewrite a list comprehension in terms of return and >>= pretty easily:

[f a b | a <- as, b <- bs]

becomes

as >>= \ a ->
bs >>= \ b ->
return (f a b)

As you can see, this transformation is not actually list-specific at all. In fact, you can do it for any monad. Haskell's MonadComprehensions extension takes advantage of this: it lets you use list comprehension syntax for any monad, which is sometimes quite nice.
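A quick check in GHCi (the lists and the pairing function here are just made-up examples):

ghci> [ (a, b) | a <- "xy", b <- [1, 2] ]
[('x',1),('x',2),('y',1),('y',2)]
ghci> "xy" >>= \ a -> [1, 2] >>= \ b -> return (a, b)
[('x',1),('x',2),('y',1),('y',2)]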

Reader
Now let's look at a much trickier one. This is the function monad, in the form of r -> a for some fixed r. (Just like Either err a had a fixed err.)

What does return mean for this? Let's look at the type we want:

return :: a -> (r -> a)

We get a value and want to turn it into a function from r. Since we don't know anything particular about r, we can't do anything with that argument but ignore it:

return x = (\ r -> x)

How about fmap? Figuring it out on your own is actually a good exercise. Try it before continuing.

Again, we want to look at the type:

fmap :: (a -> b) -> (r -> a) -> (r -> b)

Hey, that type looks familiar! It's just like function composition:

(.) :: (b -> c) -> (a -> b) -> (a -> c)

And, in fact, that's exactly what fmap is for (r -> a):

fmap = (.)
-- or
fmap f g = \ x -> f (g x)

Finally, we need join. Again, the type is going to be very helpful:

join :: (r -> (r -> a)) -> (r -> a)

We take in a function f with two arguments and need to use it to produce a function of one argument. Since we can't magically produce a value of type r, there is actually only one reasonable implementation of join:

join f = \ x -> f x x

So now we see that, indeed, (r -> a) is a monad. But what does this mean? We can think of values of type (r -> a) as values of type a in the context of r. That is, they're normal values of type a that can also depend on a value of type r. They can "read" from the environment, which is why (r -> a) is often called the "reader monad".

The reader monad allows us to pipe this environment through a whole bunch of values and functions that all depend on it. The result of x >>= f >>= g >>= h lets us pass in a single value of r that is first given to x, then passed into the result of f, then into the result of g, and so on.
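In fact, GHC's base library already defines a Monad instance for functions that behaves exactly like the definitions above, so you can play with it directly. A small sketch (summary is a name I made up); the environment here is a list of numbers that both sum and length read from:

summary :: [Double] -> (Double, Int)
summary = sum >>= \ total -> length >>= \ count -> return (total, count)

-- ghci> summary [1, 2, 3]
-- (6.0,3)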

Writer
Another type to look at is (w, a) for a fixed w. This might seem a little weird, but as we'll see, it naturally forms a monad and can actually be pretty useful!

So how do we do return? Immediately, we run into a problem: we would need to manufacture a value of type w, but we can't. So, in fact, we need to add a constraint to w: it has to be a type that has a special value of some sort to use with return. This is provided by the Monoid class, which has mempty:

mempty :: Monoid w => w

We'll just use this; as we'll see later, the other part of the monoid will also come in useful.

Given a free w value, return is straightforward:

return :: Monoid w => a -> (w, a)
return x = (mempty, x)

Now fmap, which we can only really do in one way:

fmap :: (a -> b) -> (w, a) -> (w, b)
fmap f (w, x) = (w, f x)

Finally, we need to do join:

join :: (w, (w, a)) -> (w, a)

We have two values of type w. We could just throw one away but, as a rule, losing information unnecessarily is bad. So, instead, we will use the other part of Monoid:

mappend :: Monoid w => w -> w -> w

It's just an arbitrary way to combine two values of the type into a third. We can use it to turn the two levels of w in (w, (w, a)) into one:

join (w_1, (w_2, x)) = (mappend w_1 w_2, x)

So, as long as w is a Monoid, (w, a) is a monad. But what does this mean?

It lets us string an extra channel of output through our whole computation, combining it using mappend at every step. It's often called the "writer monad" because we can "write" to this extra channel of output at every step. To be useful, we have to actually have an extra function for writing:

tell :: Monoid w => w -> (w, ())
tell output = (output, ())

This lets us inject values into the output stream. This newly added output will be mappended onto the rest of the output from our computation.

This is useful for purely functional logging. If we have a bunch of functions f, g and h that all want to log some Strings in addition to returning something, we can write them using tell ["Message"] and then automatically pipe all the strings through using the same code we've already seen a few times before:

x >>= f' >>= g' >>= h'

For lists (ie [String] in this example), mappend is just ++, so this expression becomes:

(log ++ logF ++ logG ++ logH, h (g (f x)))

(Where f, g and h are the function parts of f', g' and h' that only do the actual computation and not the logging.)

It's pretty neat how we assembled the log in parallel to actually applying the base functions. And thanks to laziness, we will only evaluate as much of the log as we use: we aren't wasting too many resources on making a log we will never use!
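As a concrete sketch: recent versions of GHC's base library ship a Monad instance for pairs ((,) w) that matches these definitions whenever w is a Monoid, so the following should work as-is. The names tell' and double are mine, chosen to avoid clashing with the real Writer library:

tell' :: w -> (w, ())
tell' w = (w, ())

-- doubles a number and logs what it did
double :: Int -> ([String], Int)
double x = tell' ["doubling " ++ show x] >>= \ _ -> return (x * 2)

-- ghci> double 3 >>= double
-- (["doubling 3","doubling 6"],12)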

State
So, we've seen how to compose functions that read and functions that write. Can we do both at once? Why, that would be a function that can both read and write! Sounds like mutable state.

This is, in fact, exactly what the State type is: it's a combination of both reader and writer, using the same type for both:

type State s a = s -> (s, a)

Conceptually, a value of type s -> (s, a) is an a that can depend on and/or modify a value of type s.

Since we're always going to have a value of type s passed in—that's the s -> part of the type—we don't need the Monoid constraint any more.

With that in mind, here are the monadic functions, which are largely combinations of their reader and writer versions. Note how return and fmap don't modify the state at all; only join can affect it.

return :: a -> (s -> (s, a))
return x = \ s -> (s, x)

fmap :: (a -> b) -> (s -> (s, a)) -> (s -> (s, b))
fmap f x = \ s -> let (s', a) = x s in (s', f a)

-- remember that the stateful value is a function itself!
join :: (s -> (s, (s -> (s, a)))) -> (s -> (s, a))
join x = \ s -> let (s', x') = x s in x' s'
join is rather confusing, so take the time to work out what it does on paper. Basically, we start with a nested state function that depends on s twice. To turn this into a single level of state dependency, we have to take a value of type s and string it through both levels: that's all the join code is doing.

We also need some primitive ways to access and modify the state, similar to tell. It's easiest to think about two of them: get to read the current state and set to change it:

get :: (s -> (s, s))
get = \ s -> (s, s)

Note how get doesn't change the state at all. It just takes the state and moves it to the value "channel" as well, exposing it to the functions in the computation.

set :: s -> (s -> (s, ()))
set newS = \ s -> (newS, ())

set takes the new value of the state and creates a new stateful value. This value ignores the state that's passed into it, replacing it with the new state. We also don't have a meaningful result for this, so we just put a () into the result channel.

We can combine get and set into some useful functions like modify:

modify :: (s -> s) -> (s -> (s, ()))
modify f = get >>= set . f
-- or
modify f = \ s -> (f s, ())

The state monad lets us compose stateful functions, and manages passing the state through each one automatically. Hopefully a pattern is now emerging: monads give us new ways of composing functions, usually with more structure.
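In practice, Haskell code usually reaches for the State type from the transformers/mtl libraries rather than writing this plumbing by hand. A minimal sketch, assuming those libraries are installed (their State pairs the result as (a, s) rather than the (s, a) used above, and labelItem is a name I made up):

import Control.Monad.State

-- get and put play the roles of the get and set defined above
labelItem :: String -> State Int String
labelItem name =
  get >>= \ n ->
  put (n + 1) >>= \ _ ->
  return (show n ++ ": " ++ name)

-- ghci> runState (labelItem "apple" >>= \ a -> labelItem "pear" >>= \ b -> return [a, b]) 0
-- (["0: apple","1: pear"],2)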

Procedures
Now we're going to talk about the only monad here that has any magic: IO. The idea is that IO a represents a procedure that, when run, produces a value of type a. The procedure has access to the runtime system, so it can do "extra-language" things like talk to the operating system, get input, print output and so on: all these are impossible to define in terms of just pure Haskell.

Since IO is an abstract type, we don't know—and don't care—about how the monadic functions are implemented. Instead, I'll just talk about what they do.

return :: a -> IO a creates the empty procedure that does nothing except return the given value. This is essentially where its name comes from, coincidentally.

fmap :: (a -> b) -> IO a -> IO b takes a function and a procedure (an IO a) and creates a new procedure that first runs the IO a, then passes its result into the function to get a b. Since this whole thing is still a procedure itself, it's an IO b: the b never "escapes" into normal code.

Finally, we have join :: IO (IO a) -> IO a. This is a slightly odd way of sequencing procedures: we take in a procedure that returns a procedure and create one that runs both of them.

Honestly, join does not make too much sense for IO. But >>= does: it's a way to apply procedures, as if they were functions! This lets us write complicated programs by combining these procedures in a systematic way to produce a single big procedure.

In fact, this is how the entire Haskell program actually gets run. main :: IO a is just the big procedure; when you run a Haskell executable, the runtime system executes main, which likely involves both executing IO procedures inside main and evaluating normal Haskell expressions.

Just like the writer and state monads, IO needs some primitive values like tell, get and set. But, unlike the earlier examples, it doesn't have one or two primitive values: it has hundreds. Every system call and every runtime function is a primitive IO value: getLine, getChar, print… The monad machinery just lets us combine these primitive operations in different ways, as well as letting us glue them together with normal (ie pure) Haskell code.
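For example, gluing two of those primitives together with >>= (this is ordinary Prelude code):

main :: IO ()
main =
  putStrLn "What's your name?" >>= \ _ ->
  getLine >>= \ name ->
  putStrLn ("Hello, " ++ name)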

Do-notation
One thing I haven't really mentioned is do-notation, which is some syntax sugar Haskell provides to make code using monads look more like an imperative program. It actually works like the list comprehension, but in reverse:

do x <- xs
   y <- ys
   return (f x y)

becomes

xs >>= \ x -> ys >>= \ y -> return (f x y)

It also allows you to ignore the value of an expression:

do something
   somethingElse

is the same as

something >>= \ _ -> somethingElse

This turns every monad into an imperative-looking DSL. Once you understand the stuff I explained above, I think do-notation is quite easy to deal with.
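For instance, the small IO example from the previous section reads quite naturally in do-notation:

main :: IO ()
main = do
  putStrLn "What's your name?"
  name <- getLine
  putStrLn ("Hello, " ++ name)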

Conclusion
There are actually a whole bunch more monads I haven't talked about. We can use them for parsing, for logic programming, for mutable references, for callCC… Lots of interesting things.

Moreover, we can actually combine monads. This gives rise to "monad transformers". For example, we can layer Maybe onto another monad by wrapping the inner monad's functions in checks for Nothing.

Ultimately, just remember that a monad is a type with several functions defined on it. This type gives us a new way to compose and apply functions with custom behavior during the application/composition "step".

I hope this gives you a good understanding of monads in practice. I know the answer's a bit long, but I've thought a lot about how to explain this idea. I just hope it wasn't too much! I think this is the longest Quora answer I've ever written, by a fair margin :).

View Answer on Quora

Why is 0.999… equal to 1?

Answer by Sridhar Ramesh:

One must distinguish between notation and what that notation represents. Different notation can represent the same entity, as in, for example, the equality of "1/3" and "2/6": they are not equal as notation, but the fractions they denote are equal.

Now, does "0.9999…" denote the same thing as "1"?

Well… first off, a disclaimer: of course, one could invent an interpretation of this notation on which they denoted different things, just as one could invent an interpretation of notation on which "1/3" and "2/6" denoted different things (for example, they denote different dates…). But I'm not going to talk about that sort of thing right now. Instead, I'm going to talk about the standard, conventional interpretation of infinite decimal notation, the one that mathematicians mean when they use this notation, and the one which justifies the claim that "0.9999…" denotes 1.

When a mathematician gives an infinite decimal as notation for a number, what they mean by it is this: the* number which is >= the rounding downs of the infinite decimal at each decimal place, and <= the rounding ups of the infinite decimal at each decimal place. This is the definition of what infinite decimal notation means; it's true because we say it is, just as the three letter word "dog" refers to a particular variety of four-legged animal because we say it does.

So, for example, when a mathematician says "0.166666…", what they mean, by definition, is "The number which is >= 0, and also >= 0.1, and also >= 0.16, and also >= 0.166, and so on, AND also <= 1, and also <= 0.2, and also <= 0.17, and also <= 0.167, and so on." What number satisfies all these properties? 1/6 satisfies all these properties. Thus, when a mathematician says "0.16666…", what they mean, by this definition, is 1/6.

Similarly, when a mathematician says "0.9999…", what they mean, by that same definition, is "The number which is >= 0, and also >= 0.9, and also >= 0.99, and also >= 0.999, and so on, AND also <= 1, and also <= 1.0, and also <= 1.00, and also <= 1.000, and so on." What number satisfies all these properties? 1 satisfies all these properties. Thus, when a mathematician says "0.9999…", what they mean, by definition, is 1.

[*: Of course, when one says a thing like "THE number which is...", this may be taken to involve an implicit claim that there is a unique such number. So when mathematicians use infinite decimal notation, they also generally have a very particular number-system in mind in which these uniqueness claims are all justified. But, there are many other number-systems (just as useful ones, or even more useful ones, for many purposes; the world is diverse and our mathematical analyses needn't be shoehorned into "one size fits all" form) in which there may be no number or many different numbers satisfying such systems of constraints; in such contexts, infinite decimal notation is generally less useful as a way to denote numbers, though it can still be used in essentially the same way to denote certain intervals instead.]

View Answer on Quora

How does a computer program convert different types of data like images or text files into binary data?

Answer by Alon Amit:

Whatever thing you wish to represent in a computer, you need to find a way of converting it into numbers. This conversion process is sometimes completely faithful, meaning you can recover the original object precisely from the numbers, or it can be an approximation. In the latter case, the digital representation of your original object is incomplete in some ways, and the trick is to make it close enough in the areas that matter, meaning close enough so that under ordinary circumstances, we can hardly tell the difference, or not at all.

Text

Text files are a simple example of an object that can generally be represented faithfully. A text file is just a sequence of letters in some language and other characters (spaces, punctuation marks, maybe a few special characters). The first order of business is to agree, once and for all, on a numerical representation for those characters – what number we use to represent 'A', what represents 'j', what is the number for space ' ' and so on.

One of the most common such schemes is called ASCII. This is just a simple table that lists 256 more-or-less useful characters including the English alphabet, the digits 0-9, symbols like '@' or '=' and so on. Most text files actually utilize a good deal less than 256 different symbols, and ASCII is really mostly used for the values between 0 and 127.

For example, in ASCII, 'A' is 65, 'B' is 66, 'C' is 67 and so on. The lower case letters start at 'a' (97) and end at 'z', 122. The digits 0-9 span the numbers 48 through 57. Space is 32. A line break
like
these
is actually represented by two symbols in ASCII, one called "line feed" or LF (10) and one called "carriage return" CR (13). This is a carry-over from old typewriter systems and is a well-known nuisance when dealing with text files; some systems insist on having a CR/LF combo at any line break, some don't, and hilarity ensues.

Anyway, if you have a text file and you wish to encode it in binary data, you first scan it from beginning to end, converting each character to its ASCII code. Now you have a sequence of numbers; each such number takes no more than 3 decimal digits to write down (like 122), and if you write it in base 2 instead of base 10 (which is what "binary" means) you need at most 8 digits (called "bits"). Thus every character in a text file requires 8 bits. Computer people like uniformity, so all numbers are represented using all 8 bits, even those which could be written with fewer. For example, CR is 13 which in binary is 1101 (eight plus four plus (skip the twos column, so zero) plus one), but when storing a text file we would store this character as 00001101. This is just as if we had used 013 instead of 13 in the decimal representation. The advantage is that you don't need to have any sort of separator between numbers: every 8 bits is a number, and then comes the next one.

A short piece of text like 'Quora' becomes the sequence 81, 117, 111, 114, 97 which in binary is 0101000101110101011011110111001001100001. So here's a binary encoding of a tiny text file.
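If you'd like to check that yourself, here's a small sketch in Haskell (toBits is a name I made up; ord and the base-conversion helpers come from the standard library):

import Data.Char (ord, intToDigit)
import Numeric (showIntAtBase)

-- a character's ASCII code, padded out to a full 8-bit binary string
toBits :: Char -> String
toBits c = let bits = showIntAtBase 2 intToDigit (ord c) ""
           in replicate (8 - length bits) '0' ++ bits

-- ghci> map ord "Quora"
-- [81,117,111,114,97]
-- ghci> concatMap toBits "Quora"
-- "0101000101110101011011110111001001100001"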

Of course, once you enlarge the scope of "text files" to cover things with a higher variety of characters, letter sizes, tables and stuff, you'll need more elaborate representation schemes. Let's stop here for now and move on to more exciting objects.

Images

Images begin as physical objects in our physical world: patterns of color and light hitting our retinae. The first order of business is to capture those patterns somehow, which is what cameras do. Older, "analog" cameras capture the light and imprint it on various kinds of film; newer, "digital" cameras employ A/D converters in the body of the camera to transform the real-life color signal into numbers.

The way this happens is, roughly, this. Imagine your field of view is divided into a fine grid of little squares.

Every tiny square on the grid has a color which is more or less uniform across the entire square. The tinier the squares, the more accurate this is. If the squares are large, you may see a shift from dark to light or from red to lighter red inside of a square, so if anyone asks you "what is the color in that square" you'd be hard pressed to give a definite answer. But if the grid is very very fine, most of the time a square will be close enough to having just one single color; in fact, if you replace the real image with one where each square has precisely that one color, a person won't be able to tell the difference.

This apple isn't really an apple: It's just an array of 256 rows and 256 columns of little squares, and each square has a specific, uniform color. Can you see the little squares? Not really, but if we used a much coarser grid, we would have gotten something like this:

This looks a lot less like an apple and a lot more like an array of squares. We call those squares "pixels", for "picture elements".

Ok. So now we have lots of pixels and each pixel has a color. We need to represent each pixel as a number (or a few numbers), and then we can store those numbers as bits just as we did before.

There are various ways of doing that. A common way uses a color scheme relying on Red, Green and Blue, and measures how much of each are in each square (this is done with color filters, following which the intensity of the light is captured with a sensor). Each color is measured on a scale of 0 to 255, say (which is 8 bits), so you get 24 bits in all for each pixel. Once you've done this, you have an array of 24-bit numbers instead of an array of squares. You arrange those numbers in sequence, add some extra information to explain how the file is structured (for instance, how many rows and how many columns it has), and that's it.

The process of converting the original image into numbers can be seen as a sequence of "sampling" or "making something discrete". ("Discrete" means that it has a definite number of possible values, instead of a continuum of infinitely many). We divided the image both horizontally and vertically into strips and pixels, and then we divided "color space" into a finite number of possible values. This process of sampling is what lies behind most analog-to-digital conversion schemes.

In practice, most image file formats employ an additional step called compression. The reason is this: the relatively small apple image we started with has 256 x 256 = 65,536 pixels. Each such pixel needs 24 bits, so just this apple would require 1,572,864 bits. That's quite a lot, if you think about the number of photos you have on your computer or Facebook account. It therefore behooves us to find ways of using fewer bits per image, and this is achieved via compression. JPEG, GIF and PNG files utilize various such compression schemes. That's a whole other can of worms which we should save for a separate answer.

Audio and Music

Our audio perception is based on sensing changes in air pressure inside of our ears. We have two ears, so we hear things in "stereo"; the main issue here is how to represent what we hear in one ear (a "mono" file) and then we can take care of our two ears just by using two such representations.

An audio signal (in one ear) is, therefore, merely "air pressure as a function of time". A microphone converts those changing pressures into an analog electric signal, and we now need to convert those analog signals into – as always – a sequence of numbers. We need, again, to sample.

A good way to visualize the process is like this:

We have a continuously-varying signal (air pressure, or electric current), represented here as a graph with time flowing to the right and magnitude going up. We now divvy up time into a finite but dense sequence of sample points, and at each such point we take a reading of the magnitude and store its value – approximately – as a number.

The rate at which we sample is called the sampling frequency. A typical sampling frequency for audio signals is 48,000 samples per second (44,100 was the standard for audio CDs). Why 48,000 and not 100 or 1,000,000? Well, humans hear sound frequencies up to 20kHz, which is when the air pressure vibrates 20,000 times per second (most people are less sensitive, but it's good to be safe). It turns out that if you wish to catch something that goes up and down X times per second, you had better sample it at 2X times per second. This is kind of intuitive: if you only sampled at X times per second, you might always catch the signal at the same point in its cycle – say, its peak – and you wouldn't even notice it's oscillating. Mathematically this is known as the Nyquist–Shannon sampling theorem.

So, we sample 48,000 times per second. The vertical range of magnitudes is sampled into 256 levels (8 bit) or 65536 levels (16 bit) or, most accurately, about 16 million levels (24 bit). For each sample point we now have a number, and the whole audio signal is no more than a sequence of numbers. Take two sequences for a stereo signal (or more for a spatial signal, like we sometimes do in home theater systems), convert them to binary, add metadata to delineate the structure of the file, and you're done.

Once again, compression is often used to make the files more manageable: mp3 files, for example, use a common compression scheme.

Video

The audio track of a movie or YouTube clip is an audio file, which we've covered, so let's focus on the "moving image" part (I'm suppressing for now the messy issue of synchronizing the video with the audio. Trust me, it's messy. Google "drop frame" or "29.97" for the gory details.)

By now you've gotten the hang of sampling, so you can guess what happens next: a video is just an image with an added dimension of time. In the physical world time is continuous, but for the benefit of our computer we need to sample time, much like we did with audio.

So we take our moving images as seen through the video camera, and snap them every once in a while. How frequently? It turns out that in this respect, our eyes are a lot less finicky than our ears. Capturing a still image 20-30 times per second and then playing it back at the same pace yields a fairly convincing illusion of continuous movement. This was known to the early filmmakers, although to save clutter they sometimes opted for even slower rates (The Lumière brothers made do with 16 frames per second).

http://www.youtube.com/watch?v=JGugm8Dzmuc

Modern, popular video formats capture images at 24fps or 30fps (fps = frames per second). Your HDTV does 50 or 60fps, which is quite a bit more than the minimum necessary but helps create even more fluid, crisp images (side note: the fact that multiple standards exist for the frame rate, including 24fps, 25fps, 29.97fps, 30fps, 60fps and others is another source of serious trouble – resampling one of these into another is a terrible mess).

So, the basics are simple: capture lots of images at a sufficiently high rate, convert each image into numbers as covered above, add the necessary metadata and you have your video file.

Video files provide an opportunity to discuss another cool nuance: the file format may leave room for the encoder to make clever decisions. Here's how it works with video files.

Remember how we said that image files may need to get compressed? Well obviously, video provides ample motivation for us to compress even more diligently. Let's do the math for a standard definition, 90-minute movie:

  • 90 minutes
  • 60 seconds per minute
  • 30 frames per second
  • 480 rows per image
  • 640 pixels per row
  • 3 bytes (24 bits) per pixel

Uncompressed, this would take up 150 gigabytes (GB) of disk space, and that's just standard def; go HD and you're looking at 8 times that (double rows, double columns and double fps). More than a terabyte per movie isn't going to fly.
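(The arithmetic: 90 × 60 seconds × 30 frames × 640 × 480 pixels × 3 bytes per pixel ≈ 149 billion bytes, or roughly 150 GB.)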

So we've already mentioned that individual images can be compressed, but for video we can (and need to) do better. It should be clear that most frames in a video are very similar to the ones that came just before them. Every once in a while there'll be a "cut" where the frame changes completely, but most of the time, what you see in front of you is very similar to what you see 1/30 of a second later.

How can we utilize this? One approach which I will describe just roughly (this answer is long enough) is to hunt for blocks inside frame N+1 that are almost identical to those in frame N, only shifted a little bit (think of a panning shot, or a still room with some people walking). Then, frame N is stored in full, but frame N+1 stores just the block locations, the vertical and horizontal shifts of each block, and the in-block changes which are mostly 0's. Chunks of digits that are mostly 0 can be efficiently compressed, so frame N+1 will take up a lot less space than it otherwise would have.

What this means is that the file format needs to specify how those blocks are described, how the shifts are stored, etc., but it doesn't say anything about how to actually compress. A dumb video encoder can simply not use this feature at all, or use it only rarely, and that yields a perfectly valid video file – albeit a very large one. A smarter encoder would work hard to optimize the division into blocks, the correct matching of the next block and the previous block, and the differences, and would create a much smaller file that could be opened by the same video decoder and would look almost exactly the same.

(this image demonstrates how the block-finding algorithm of H.265, aka High Efficiency Video Coding, is more flexible than that of H.264, making it significantly more effective in achieving high image quality at low bandwidth).


Those are the very basics – there are of course lots of variations, lots of details I had omitted and many other types of information we learned to digitize effectively. I'm leaving room for future (or past) questions.

View Answer on Quora

How can I develop my kids’ curiosity?

Answer by Eva Glasrud:

When I'm around kids, I ask them questions all the time. The point is to make them wonder and help them think critically. A lot of adults like to tell kids things. In fact, I'll often ask a child a question, and a nearby adult will answer for/to the child.

But I really think it's better to ask kids questions about everything — even when the original question came from them.

For example:

Child: "How do I draw a dog?"
Adult: "That's a really great question, [child's name]! Where do you think we should start? What's the first part of the dog we should draw? Then what? Want to try it? We can always try again if we mess up."

And if they get it wrong, don't stop them and say, "No, that's wrong. Do this." Let them make mistakes. And then ask them,

Adult: "Uh oh! It looks like we did something wrong. Does any part of the dog look wrong? How can we fix it? What should we do differently next time?"

If you're having fun and it seems appropriate, you can even ask questions like:

Adult: "Great work! You put a lot of thought into your drawing, and it shows! But I wonder if that's the only way to draw a dog. What do you think? Is that the only way? Or might there be other ways?"

This gets them thinking — and teaches them to test, iterate, and try again. It shows them that many problems have more than one solution.

And, just as important, it teaches them to persevere when things don't go right the first time. It teaches them that it's okay to take a risk, and that it sometimes takes a few tries to get it right.

I really can't speak highly enough of how great it is to engage kids through dialogue. Here are some other examples of conversations I've had with kids recently that will help explain why:

While playing outside on a longboard

Adult: "[Child's name], where do you think the skateboard will go faster — on the dirt, or on the sidewalk?
Child: "The dirt!"
Adult: "Why do you think it will go faster on the dirt?"
Child: (says some explanation)
Adult: "That's a very interesting idea. Do you want to try it out to see if you're right?"
(We test it – the child's hypothesis was wrong)
Adult: "So what happened? Where did the skateboard go faster? Why?"

It was so great to set up a little experiment with this child and explore her thoughts with her along the way.

It's also awesome that when you ask kids questions, you often end up thinking differently, too. Kids have really interesting ideas and strange senses of humor. Here's something that happened at a playground recently:

While at the playground

Child: "I want to look for caterpillars!"
Adult: "Catepillars? Cool! Where do you look when you want to find a caterpillar?"
Child: "The air!"

The answer most people would have expected is probably, "The ground!" But in northern California, this happens in the springtime:

(He's hanging from a little piece of caterpillar string.)

So maybe that's what she meant. Or maybe she meant that caterpillars turn into butterflies. The only way to see into the child's mind is by asking — not telling.

But often, like I mentioned before, my conversations with kids go more like this:

Child: "I want to look for caterpillars!"
Adult 1: "Catepillars? Cool! Where do you look when you want to find a caterpillar?"
Adult 2: "The ground. Caterpillars live on the ground. Right, [child's name]?"

It's also important to note that when a child is working on something, it's great to give them praise and feedback for their effort. It sends the message that their hard work (rather than their natural ability) pays off. It makes them more entrepreneurial and curious than careful and risk-averse.

You can often combine this effort praise with questions about the work. Try to avoid yes or no questions. Keep them more open. Like this:

Adult: "I like that you are spending lots of time covering the whole paper with paint. Can you tell me about your painting?"

Versus:

Adult: "You're so good at painting. Is that a house?"

I like the first option for three reasons.

1. Effort praise.
2. What if the kid isn't painting a house? You might accidentally embarrass them. There might be a really cool story or feeling about this painting, but you'll never know now, because the child doesn't want to talk about it anymore.
3. Asking them if it's a house limits their answer to a yes or no. Leaving the question open allows them to be more creative in how they answer.

Be there to guide, but not always lead, the discussion.

You should also provide them with opportunities and resources for growth. If your kid loves animals, take them to the zoo. Or even the creek in your local park. See how many different kinds of animals you can find living there. Ask your child about what relationships the different animals might have with each other.

Let them get their clothes dirty. The cognitive and motor skills they develop playing at this hypothetical creek are so much more important than a little bit (or even a lot) of mud.

And let them love what they love. Even if what they love is toilets.

http://www.youtube.com/watch?v=T4i-L8_Ks6c&feature=player_embedded (From Why Enriched Education is Crucial for All Children.)

This kid's parents are awesome. They could tell Dustin, "Ew, no! Toilets are gross! Don't touch them." But instead they did this — and I really wish I saw more of this and less of the nature/animal/germ/possibility of getting hurt-ophobia.

View Answer on Quora

Online Sharing


#statingTheObvious
I’ve been observing a trend lately, and the geek in me is not very fond of it. A lot of people who share links online are doing it wrong. We have a (natural) tendency to just copy and paste the link we want to share into the box which says “Share”. What some of us completely ignore is that Google provides a specific button to share a hyperlink.

Even Facebook had this system in place for sharing links, but they removed it a few months back. I don’t want Google to go in the same direction.

Of course, you could ask: what difference does it make? Well, it is not pleasing to the eye. Let me show you an example:

The post on the top is definitely more pleasing to the eye than the second one. You have to agree!

Of course I can’t force anyone to do this, but it would be nice if we all followed the same protocol (which is in place because it is good).