Let’s Go Python

I love Python. This object-oriented, high-level programming language has everything you need: a dynamic type system, a garbage collector, a large and comprehensive standard library, readable code, and many more great features. However, even the best things have their disadvantages. One example is the Global Interpreter Lock (GIL), which prevents multiple native threads from executing Python bytecode at once.

Parallelism in Python 

Let’s write a simple application that counts down from 100 million to zero (it could just as well be a program that sends many requests to a server, performs calculations, etc.):

from datetime import datetime

def doSomethingWeird(counter, number):
    print("Thread %s started" % number)
    while counter > 0:
        counter -= 1
    print("Thread %s ends" % number)
        

if __name__ == '__main__':
    start = datetime.now()
    counter = 100000000

    doSomethingWeird(counter, 1)

    print("Execution took %s" % (datetime.now() - start)) 

This results in: 

Thread 1 started
Thread 1 ends
Execution took 0:00:05.573309 

As you can see, running the whole application took around 5.5 seconds.

Now, let’s modify our code to run the same function in 4 threads (a quad-core Intel Core i7 in my case). Each thread should count down from 25 million to zero. If the work were split perfectly across the four cores, the execution time should drop to about 1.4 seconds (5.5 s / 4).

from datetime import datetime
from threading import Thread

def doSomethingWeird(counter, number):
    print("Thread %s started" % number)
    while counter > 0:
        counter -= 1
    print("Thread %s ends" % number)
        

if __name__ == '__main__':
    start = datetime.now()
    counter = 100000000

    run_x_times = 4
    threads = list()
    for i in range(run_x_times):
        # Integer division keeps the per-thread counter an int on Python 3 as well
        threads.append(Thread(target=doSomethingWeird, args=(counter // run_x_times, i + 1)))

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    print("Execution took %s" % (datetime.now() - start)) 

Results I received: 

Thread 1 started
Thread 2 started
Thread 3 started
Thread 4 started
Thread 3 ends
Thread 4 ends
Thread 2 ends
Thread 1 ends
Execution took 0:00:08.961999 

The threads ran concurrently, but what happened to the execution time? It increased! How is this possible? The answer is the Global Interpreter Lock: the threads ran concurrently, but not in parallel.

Concurrency and parallelism in Python is a great topic for another article, so let me just state the bare facts. In CPython, regardless of the number of threads and processors, only one thread executes Python bytecode at any given time.

GIL Go Away! 

I looked for other languages that would help me build a fast, multithreaded application. Scala, Clojure, Erlang, Go… Go? What’s Go?

“Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.”

Go is a programming language initially developed at Google and announced in November 2009. Go(lang) looks like C on steroids, with the goals of simplicity, succinctness, and safety. For more details about the language specification, check out the official Go website. In the meantime, let’s install Go so I can show you the magic of concurrency.

Get Go 

Go is available for Linux, Mac, Windows, and many other platforms. Just visit the official website and download the version for your system. If you are short on time, you can use the Go Playground, which is an awesome place to write and run your first program using only your browser! A simple example:

package main

import "fmt"

func main() {
	var title string
	title = "Mr"
	name := "Dude"
	fmt.Printf("Hello, %s %s", title, name)
}

To run the program, just hit the Run button on the playground or type go run main.go (if your file is named main.go).

Parallelism in Go 

Now that we know the basics of Go, let’s rewrite the same code that we wrote in Python. To begin with, it is the same simple program that counts down to zero.

package main

import "fmt"
import "time"

func doSomethingWeird(counter int, number int) {
	fmt.Printf("Routine %d started\n", number)
	for ; counter > 0; counter-- {
	}
	fmt.Printf("Routine %d ends\n", number)
}

func main() {
	start := time.Now()

	counter := 100000000

	doSomethingWeird(counter, 1)

	elapsed := time.Since(start)
	fmt.Printf("Execution took %s\n", elapsed)
} 

This results in: 

Routine 1 started
Routine 1 ends
Execution took 31.880747ms 

It works properly. Let’s not focus on the absolute execution time, since compiled programs are, in general, faster than interpreted ones.

Now let’s modify our code to run in parallel using goroutines. A goroutine is a lightweight thread of execution and, in my opinion, the most common example of Go awesomeness:

package main

import "fmt"
import "time"
import "sync"
import "runtime"

var wg sync.WaitGroup

func doSomethingWeird(counter int, number int) {
	defer wg.Done()
	fmt.Printf("Routine %d started\n", number)
	for ; counter > 0; counter-- {
	}
	fmt.Printf("Routine %d ends\n", number)
}

func main() {
	start := time.Now()

	counter := 100000000
	run_x_times := 4 // How many logical processors we want to use
	runtime.GOMAXPROCS(run_x_times)

	wg.Add(run_x_times)

	for i := 1; i <= run_x_times; i++ {
		go doSomethingWeird(counter/run_x_times, i)
	}

	wg.Wait() // Wait until all goroutines have finished

	elapsed := time.Since(start)
	fmt.Printf("Execution took %s\n", elapsed)
} 

And now let’s see the results: 

Routine 3 started
Routine 1 started
Routine 2 started
Routine 4 started
Routine 1 ends
Routine 3 ends
Routine 4 ends
Routine 2 ends
Execution took 8.626793ms 

Woohoo! Success! Using 4 cores, the time dropped from ~32ms to ~8ms. All we did was set GOMAXPROCS and launch the work in goroutines. The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code.
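To make the pattern a bit easier to verify, here is a minimal sketch of the same idea in which every goroutine reports how many steps it counted, so the total can be checked at the end. The names countDown and runParallel are hypothetical helpers, not part of the program above. The sketch also keeps the sync.WaitGroup local instead of package-level, and it only reads the current setting with runtime.GOMAXPROCS(0), which returns the value without changing it. On recent Go versions (1.5 and later) GOMAXPROCS defaults to the number of available logical CPUs, so setting it explicitly is often unnecessary.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countDown counts from n down to zero and returns the number of steps taken.
func countDown(n int) int {
	steps := 0
	for ; n > 0; n-- {
		steps++
	}
	return steps
}

// runParallel (hypothetical helper) splits total across the given number of
// goroutines and returns the sum of the steps they counted.
func runParallel(total, workers int) int {
	var wg sync.WaitGroup
	results := make([]int, workers) // one slot per goroutine, so no locking is needed

	chunk := total / workers
	for i := 0; i < workers; i++ {
		n := chunk
		if i == workers-1 {
			n += total % workers // the last worker also takes the remainder
		}
		wg.Add(1)
		go func(idx, n int) {
			defer wg.Done()
			results[idx] = countDown(n)
		}(i, n)
	}
	wg.Wait() // block until every goroutine has called Done

	sum := 0
	for _, r := range results {
		sum += r
	}
	return sum
}

func main() {
	// GOMAXPROCS(0) only queries the current value; it does not modify it.
	fmt.Printf("Logical CPUs: %d, GOMAXPROCS: %d\n", runtime.NumCPU(), runtime.GOMAXPROCS(0))
	fmt.Printf("Total steps: %d\n", runParallel(100000000, 4))
}
```

Writing each result into its own slice element avoids sharing a single counter between goroutines, which would otherwise need a mutex or atomic operations.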

Summary 

As David Beazley writes in The Unwritten Rules of Python:

1. You do not talk about the GIL. 
2. You do NOT talk about the GIL. 
3. Don’t even mention the GIL. No seriously.

Now we at least know why!