Concurrency in the Go Programming Language

This article is based on the second half of a presentation I gave at PyMNtos in December, 2011.

Go sports two features specifically designed to ease concurrent programming: Goroutines and Channels.

What is a Goroutine?

Rob Pike describes goroutines as being “kind of like threads, but lighter-weight.” They’re similar to what other languages might call “green threads” or “fibers,” but have enough distinguishing features to merit their own name.

The basic idea is that your Go program can automatically spawn a handful of native threads or processes, and then automatically allocate goroutines between them at runtime. Any goroutine can execute on any thread or process, and any goroutine can dynamically move from one thread or process to another.

That means that as soon as a thread blocks, the other goroutines on that thread can be seamlessly moved to other, unblocked threads, and continue running.

The language itself provides great syntactic support for kicking off goroutines during the execution of your program. For example, imagine you wanted to build an application that timed how long it took to brew some coffee or tea. It might look like this:

package main

import "time"

func IsReady(what string, seconds int64) {
    time.Sleep(seconds * 1e9) // nanoseconds
    println(what, "is ready!")
}

func main() {
    println("Let's go!")
    IsReady("Coffee", 6)
    IsReady("Tea", 2)
    println("I'm done here.")
}

The program exposes one function, IsReady, which takes in two variables: what it’s waiting for, and how many seconds to wait. After the specified time passes, it prints out a message notifying you that something is ready.

Running this program produces the following output, annotated with the time at which each line gets printed:

$ ./brew1
Let's go!         # 00:00
Coffee is ready!  # 00:06
Tea is ready!     # 00:08
I'm done here     # 00:08

Everything happens synchronously and in order: “Let’s go!” gets printed immediately, then the program pauses for 6 seconds, prints out “Coffee is ready!”, pauses another two seconds, prints out “Tea is ready!”, then prints “I’m done here.” and exits.

But what if we wanted to start brewing the tea and coffee at the same time? We’d need some way to start both calls to IsReady asynchronously. That’s where goroutines come in. Prefixing any expression or function invocation with the keyword go causes it to be executed in its own goroutine, on an aribitrary thread. Let’s try it:

func main() {
    println("Let's go!")
    go IsReady("Coffee", 6)
    go IsReady("Tea", 2)
    println("I'm done here.")
}

Running this yields:

$ ./brew2
Let's go!      # 00:00
I'm done here. # 00:00

Wait! That’s not right! It immediately printed “Let’s go!”, then “I’m done here.” and then exited. What happened to the IsReady calls? Both of those calls fired up a goroutine which began executing in the background. Unforuntately, before those goroutines could do anything useful, execution hit the end of the main function and the program terminated, taking the goroutines with it.

We really want to wait until both goroutines finish before exiting, but we don’t know how to do that yet. For now, just to prove that everything is working, let’s just try waiting a little bit at the end before we exit:

func main() {
    println("Let's go!")
    go IsReady("Coffee", 6)
    go IsReady("Tea", 2)
    println("I'm done here.")
    time.Sleep(7 * 1e9)
}

Which, when run, prints out:

$ ./brew3
Let's go!        # 00:00
I'm done here.   # 00:00
Tea is ready!    # 00:02
Coffee is ready! # 00:06

So just like before, the two println statements in main get executed right off the bat. Two seconds later, we see “Tea is ready!”. Four seconds later it’s followed by “Coffee is ready!”. Then, one second later, the call to time.Sleep(7 * 1e9) finishes and the program exits.

Using time.Sleep like that is a terrible idea, but we’re kind of limited without being able to send or receive messages from the goroutines to know when they’re done. However, we actually don’t need that to do useful things!

Imagine a web server: It simply loops forever waiting for a request. When one comes in, it processes it, sends a response, and goes back to waiting. If you wanted to get fancy, you might handle each request in its own thread, which keeps things neat and tidy, at least in concept. The downside is that you then have to deal with gnarly thread management issues. Ick.

What if we used goroutines instead instead of threads? Each goroutine works, conceptually, like its own thread, but they can be multiplexed on top of any number of native threads or process! And the Go language automatically handles resource management for you.

There’s even a built in http package which uses this kind of architecture behind the scenes:

package main

import (
    "fmt"
    "http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hi there, I love %s!", r.URL.Path[1:])
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}

If you run that and visit http://localhost:8080/cake, you’ll see “Hi there, I love cake!” in your browser, since http.HandleFunc("/", handler) ensures that all requests get passed off to the handler function, which simply displays that message. Digging in to the source for the http package will reveal that it takes each incoming request, spins up a goroutine to handle it, and then goes back to listening.

Thus, a great use of goroutines without needing to know how to communicate with them while they run. Still, there are situations where you need to communicate with the goroutines, as per the coffee / tea brewing code above. Luckily, Go provides channels to do just that.

What is a channel?

A channel is a type-safe pipe for communicating between different parts of your program. Let’s start with an improved version of the brewing program as an example:

package main

import "time"

func IsReady(what string, seconds int64, ch chan<- bool) {
    time.Sleep(seconds * 1e9) // nanoseconds
    println(what, "is ready!")
    ch <- true
}

func main() {
    ch := make(chan bool)
    println("Let's go!")
    go IsReady("Coffee", 6, ch)
    go IsReady("Tea", 2, ch)
    println("I'm done here.")

    for i := 0; i < 2; i++ {
        <-ch
    }
}

Take a second to read that and see if you can spot the differences.

For one, IsReady now takes a third parameter, ch, which is a chan<- bool, meaning “a channel for sending bool data.” Secondly, after printing out that it’s done brewing whatever, it calls ch <- true, which is an expression meaning “send true down the channel ch.”

Now let’s look at main, which starts by creating a bool channel with ch := make(chan bool). This same channel is then passed to each of the two IsReady calls.

Lastly, instead of doing something terrible like waiting for 7 seconds, the program calls <-ch twice, which just means “read a value from from ch.” Since channels are blocking and unbuffered by default, reading from a channel with <-ch will sit and wait until some other goroutine tries to write to the channel with, say, ch <- true. The inverse is also true: senders will block and wait until a receiver is ready.

Since we know we only spun up two goroutines, we can safely exit after two values pop out of ch, since sending on ch is the last thing each goroutine does before reaching completion.

To step back a bit, you can think of goroutines as being similar to type-safe Unix pipes. Let’s take that analogy and extend it.

$ echo "hello world" | wc -c
      12
$ echo "hello world" | sed -e 's/hello/goodbye/g' | wc -c
      14

The shell commands above work like they do since echo just takes a string, slaps a \n character on the end, and spits it back out. Meanwhile wc -c reads in data and counts the number of characters. Thus, it counts 12 in the first instance (5 + 5 + a space + a newline). Similarly, the sed invocation takes its input and replaces any occurances of “hello” with “goodbye”, lengthening the original string by 2.

What if we wanted to simulate that sort of a pipeline in Go? First, let’s define functions to replace echo, sed, and wc. Each one will take a channel for input, and a channel for output:

package main

import (
    "regexp"
    "strconv"
)

func echo(in chan string, out chan string) {
    out <- (<-in + "\n")
}

func sed(in chan string, out chan string) {
    re := regexp.MustCompile("hello")
    out <- re.ReplaceAllString(<-in, "goodbye")
}

func wc(in chan string, out chan string) {
    out <- strconv.Itoa(len(<-in))
}

In the above functions, echo simply reads from its input channel, adds a newline to the end of it with <-in + "\n", and then sends the result to the out channel.

The sed function works similarly, first compiling a regular expression that matches hello, and then calling that regex’s ReplaceAllString method. The method is told to grab a string from the in channel and use “goodbye” as the replacement for matches. The result gets sent to the out channel.

Lastly, wc simply reads from the input channel (<-in), then gets the length of the input with len, and lastly converts the integer length into a string with strconv.Itoa before sending the result on the out channel.

The main function is simple, but a bit verbose.

func main() {
    ch1 := make(chan string)
    ch2 := make(chan string)
    ch3 := make(chan string)
    ch4 := make(chan string)

    go echo(ch1, ch2)
    go sed(ch2, ch3)
    go wc(ch3, ch4)

    ch1 <- "hello world"
    println(<-ch4)
}

It simply makes four channels, and hooks them up so that the output of echo is the input of sed, and the output of sed is the input of wc.

Each of those subfunctions get started in their own goroutines, and then sit and wait for input. The string “hello world” gets sent on ch1, which flows through echo, then sed, then wc and out on ch4. The result, “14”, gets printed:

$ /pipes1
14

Explicitly specifying the input and output channels is needlessly verbose. We have to give each function an input channel, but if it made and returned its own output channel, then we could trivially chain together as many functions as we want.

Let’s do that:

package main

import (
    "regexp"
    "strconv"
)

func echo(in chan string) chan string {
    out := make(chan string)
    go func() {
        out <- (<-in + "\n")
    }()
    return out
}

func sed(in chan string) chan string {
    out := make(chan string)
    re := regexp.MustCompile("hello")
    go func() {
        out <- re.ReplaceAllString(<-in, "goodbye")
    }()
    return out
}

func wc(in chan string) chan string {
    out := make(chan string)
    go func() {
        out <- strconv.Itoa(len(<-in))
    }()
    return out
}

This might look complex, but it’s really not too different from the previous code. The signatures changed to just accepting a single chan string, each function starts with creating an output channel, and each function ends by returning that channel. In the middle, the “real work” gets wrapped in an anonymous function which gets invoked in a goroutine. This means that when you call something like echo, it immediately returns its newly-made output channel, and as a side effect, spins off a goroutine to process any data that gets sent on the in channel. It’s basically saying “Okay, I’m ready and waiting on in. When I get something, I’ll give it back to you on this channel that I’m returning.”

Note that because functions in Go are closures, we didn’t have to explicitly pass in a reference to the input channel, in, when we defined an anonymous worker function. Rather, because in is visible in scope where we defined the anonymous function, that function is able to see it.

With that restructuring in place, our main function looks much more sensible:

func main() {
    in := make(chan string)
    out := wc(sed(echo(in)))
    in<-"hello world"
    println(<- out)
}

We make an input channel, then we get an output channel by composing our filters: out := wc(sed(echo(in))). Again, because the output of echo(in) is a channel, we can use it as the input to sed, and so on. And it works exactly like you would expect:

$ ./pipes2
14

Functions that start goroutines and return channels are a somewhat unique and interesting Go idiom.

How do channels and goroutines work together?

Channels and goroutines work together to implement a much more sane model of concurrency than traditional threads and shared memory. The central idea is that, for unbuffered channels, the act of reading and writing across a channel forces two goroutines to, at that moment, be synchronized. Otherwise, things are free to execute concurrently or in parallel.

This paradigm is often expressed as:

Do not communicate by sharing memory; instead, share memory by communicating. —Effective Go

For a somewhat silly example of channels and goroutines working together, check out this prime number sieve:

package main

func Generate(ch chan<- int) {
    for i := 2; ; i++ {
        ch <- i
    }
}

func Filter(in <-chan int, out chan<- int, prime int) {
    for {
        i := <-in
        if i%prime != 0 {
            out <- i
        }
    }
}

func main() {
    ch := make(chan int)
    go Generate(ch)
    for i := 0; i < 10; i++ {
        prime := <-ch
        print(prime, "\n")
        ch1 := make(chan int)
        go Filter(ch, ch1, prime)
        ch = ch1
    }
}

Generate simply starts at 2 and sends ever increasing numbers out on a given channel. First 2, then 3, then 4, and so on.

Filter sits in between two channels and silently drops any number that’s divisible by its prime input parameter. Otherwise, it just passes the number along. A Filter(in, out, 2), for instance, would not let any multiples of 2 through it. Thus, when hooked up to the output of Generate, first 3, then 5, then 7, would pop out the other end.

Can you see where this is going? We want to build a pipeline that successively filters each prime, so that with each number popping out the end, we save it, and then add on a filter for its own multiples:

[Generate] -> 2

[Generate] -> [Filter(2)] -> 3

[Generate] -> [Filter(2)] -> [Filter(3)] -> 5

…and so on.

Hence the loop in the main function, which successively reads from the rightmost end of the pipeline, prints that value, and then adds on a filter for it.

$ ./primes
2
3
5
7
11
13
17
19
23
29

Because all of the work is happening in goroutines, as soon as any two adjacent goroutines are ready, a new value can begin traversing through the pipeline. The adjacent goroutines synchronize upon exchanging data.

Channels can also be multiplexed

Any number of goroutines can write to a single channel, and any number of goroutines can read from a single channel. Go even has a special select statement which, given a list of channels, reads from and executes a block for a random, waiting channel. This allows channels to be easily de-multiplexed.

For another example, let’s use the random selection amongst waiting channels to implement a ridiculous random number generator:

package main

func main() {
    ch1 := make(chan int)
    ch2 := make(chan int)

    go func() {
        for {
            <-ch1
        }
    }()

    for {
        select {
        case ch1 <- 0:
            print(0)
        case ch1 <- 1:
            print(1)
        case ch2 <- 2:
            print(2)
        }
    }
}

Executing this prints out a random series of 1’s and 0’s. Why? Well, first we set up two channels for integers, and then kick off a goroutine that does nothing but drain one of those channels.

Then we loop infinitely, selecting one of the ready cases. Since the goroutine is always waiting to consume any value sent on ch1, either of the two case statements that send on ch1 could run. By definition, select chooses one those at random, creating a stream of random data.

Because nothing is ever waiting to read from ch2, the case that sends on ch2 is never executed.

These features make it easy to implement things like queues, signaling, etc.

Broader Applications

Even if you never plan on using Go, just learning about goroutines and channels can provide a rich mental model for designing concurrent systems. I’m particularly fond of this quote describing the benefits of good notation:

By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race. —Alfred North Whitehead, An Introduction to Mathematics, 1911.

And if you’d like to go straight to the notation’s source, goroutines and channels take their inspiration from C. A. R. Hoare’s “Communicating Sequential Processes”, a formal language for describing patterns of interaction in concurrent systems.

Python and Concurrency

Quite a few projects exist to make concurrent programming easier in Python. In particular:

Other Resources

Using CSP is Tony Hoare’s book on CSP, available as a free PDF.

The Go website at golang.org has absolutely fantastic resources, including great docs, a browser-based “playground” / pastebin, an interactive tour of the language, a great blog, and much more.

Credits

Random Bit Generator and Coffee / Tea Brewing applications inspired by the slides from day 3 of Rob Pike’s Go course. The Concurrent Prime Sieve is from golang.org.