Golang lock-free values with atomic.Value
Earlier this week, I had the privilege of attending a sneak preview of Andrew Gerrand’s talk “Stupid Gopher Tricks”, which he should be presenting at the Golang UK conference as I publish these lines. It’s a great talk about lesser-known Go features. I won’t spoil it here; you should instead attend the next conference if you can, or check out the slides once they become available. But there was one element in the talk that particularly caught my attention: atomic.Value.
While one of Go’s mantras is “share memory by communicating”, every now and then a problem still calls for shared state. Sometimes this is for performance reasons; sometimes the resource in question is shared by nature, like application configuration.
Go’s canonical way to safely share state is to protect it with a mutex. It was surprising to me that, since Go 1.4, the standard library has offered an alternative way to achieve thread safety for arbitrary types: the atomic.Value type. The sync/atomic package has always been there providing some very low-level atomic primitives, and it is conventional wisdom to stay away from it unless you know very well what you are doing.
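For context, here is a minimal sketch of the kind of low-level primitive the package provides: a shared counter bumped with atomic.AddInt64 instead of a mutex (the hits variable and the loop are just illustration):

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var hits int64 // access only through the atomic functions
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&hits, 1) // lock-free increment
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&hits)) // lock-free read: prints 10
}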
But atomic.Value seemed to have a very simple and friendly interface:
package main

import (
	"fmt"
	"sync/atomic"
)

type User struct {
	FirstName string
	LastName  string
}

var GlobalUser atomic.Value

func main() {
	user := User{"Ramsay", "Bolton"}
	GlobalUser.Store(user)           // atomic/thread-safe
	data := GlobalUser.Load().(User) // atomic/thread-safe
	fmt.Printf("%+v", data)
}

// Outputs:
// {FirstName:Ramsay LastName:Bolton}
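One caveat worth noting right away: if Load is called before any Store, it returns nil, so a plain type assertion like the one above would panic. The comma-ok form guards against that, as in this small sketch:

package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var v atomic.Value
	// Load before any Store returns nil; a plain v.Load().(string) here would panic.
	if s, ok := v.Load().(string); ok {
		fmt.Println(s)
	} else {
		fmt.Println("nothing stored yet") // this branch runs
	}
}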
That looked pretty simple to me: fewer lines, and not only thread-safe but lock-free! Using it, I would never have to worry about some gone-rogue-teen™ failing to release a lock. I promptly put together a not-very-scientific benchmark to compare its performance against the mutex.
package value_test

import (
	"sync"
	"sync/atomic"
	"testing"
)

type Config struct {
	sync.RWMutex
	endpoint string
}

func BenchmarkPMutexSet(b *testing.B) {
	config := Config{}
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			config.Lock()
			config.endpoint = "api.example.com"
			config.Unlock()
		}
	})
}

func BenchmarkPMutexGet(b *testing.B) {
	config := Config{endpoint: "api.example.com"}
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			config.RLock()
			_ = config.endpoint
			config.RUnlock()
		}
	})
}

func BenchmarkPAtomicSet(b *testing.B) {
	var config atomic.Value
	c := Config{endpoint: "api.example.com"}
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			config.Store(c)
		}
	})
}

func BenchmarkPAtomicGet(b *testing.B) {
	var config atomic.Value
	config.Store(Config{endpoint: "api.example.com"})
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_ = config.Load().(Config)
		}
	})
}
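Running them with go test -bench=. on my 4-core laptop (hence the -4 suffix in the names) gave the following: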
# go version go1.5 darwin/amd64
BenchmarkPMutexSet-4      20000000    90.2 ns/op     0 B/op    0 allocs/op
BenchmarkPMutexGet-4      30000000    57.9 ns/op     0 B/op    0 allocs/op
BenchmarkPAtomicSet-4     20000000    72.2 ns/op    48 B/op    1 allocs/op
BenchmarkPAtomicGet-4    100000000    12.8 ns/op     0 B/op    0 allocs/op
That is indeed faster, although not mind-blowingly so, and likely negligible compared to the rest of your logic. The mutex was plenty fast to begin with, so there wasn’t a ton of room for improvement.
Should I use it?
Well, probably not. Don’t be fooled by the simple interface. As I mentioned before, the atomic package is not for most people, and even an innocent-looking feature like this one has some rough edges. Quoting the documentation:
All calls to Store for a given Value must use values of the same concrete type. Store of an inconsistent type panics, as does Store(nil).
or
Once Store has been called, a Value must not be copied.
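Both constraints are easy to trip over. Here is a minimal sketch of what they mean in practice; each commented-out line would panic with the message shown:

package main

import "sync/atomic"

func main() {
	var v atomic.Value
	v.Store("a string")

	// Storing a different concrete type than the first Store panics:
	// v.Store(42) // panic: sync/atomic: store of inconsistently typed value into Value

	// Storing nil panics too:
	// v.Store(nil) // panic: sync/atomic: store of nil value into Value

	// And once Store has been called, the Value must not be copied,
	// so share it by pointer rather than by value:
	read(&v)
}

func read(v *atomic.Value) {
	_ = v.Load()
}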
Now that you’ve just read about it you know, but someone else on your team might not, or you might not remember it six months from now. This is why you shouldn’t use this unless you really need to worry about contention, or you are operating at a scale where these performance improvements make a difference.
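If you do end up needing it, one way to contain the sharp edges is to hide the atomic.Value behind a small typed wrapper, so the type assertion and the don’t-copy rule live in exactly one place. A sketch of that pattern (the ConfigStore name is mine, not a standard API):

package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct {
	endpoint string
}

// ConfigStore hides the atomic.Value behind a typed API, so callers
// can’t store the wrong type or forget the type assertion.
type ConfigStore struct {
	v atomic.Value // holds a Config; never copy a ConfigStore after the first Store
}

func (s *ConfigStore) Store(c Config) { s.v.Store(c) }

func (s *ConfigStore) Load() Config {
	c, ok := s.v.Load().(Config)
	if !ok {
		return Config{} // nothing stored yet: return the zero value
	}
	return c
}

func main() {
	var store ConfigStore
	store.Store(Config{endpoint: "api.example.com"})
	fmt.Println(store.Load().endpoint)
}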
“So, what kind of applications am I talking about?”, you may ask. “Why is this slightly tricky thing there in the first place?” It turns out that this benchmark on my modest laptop was a bit beside the point.
Dmitry Vyukov, the author, made the case for it on the golang-dev mailing list:
I’ve received a private report about poor encoding/gob performance on a 80-core machine. encoding/gob has a bunch of global mutexes. By removing just one of them, I’ve got 2x speedup on 16-core machine. If all mutexes are removed on the 80-core machine it can easily make 20x difference.
…
This kind of scalable synchronization algorithms inherently requires unsafe package today. This means that it is incompatible with appengine and other safe contexts. While it is not actually unsafe. Looks like a quite unfortunate situation to me. Today people use 80-core machines, tomorrow they will use 200-core machines. Mutexes don’t work there in any way, shape or form.
…
If Go wants to continue to be positioned as “the way to program modern multicore machines” (which I believe it was initially), then it must provide relevant means for that and its base libraries must not impose scalability bottlenecks.
Until you have that kind of problem, though… you know, use caution.
UPDATE: Stupid Gopher Tricks slides are up, with details about the internals.