Developer experiences from the trenches
Wed 01 January 2020 by Michael Labbe
I am using the new decade to reflect on the most important choices I made as a developer in the 2010s. For the sake of context, I have been hobby programming since 1986, and have had a programming career since 1999. Games have been the central context for my problem solving, and for the 2000s I largely worked on C++ in core engines and networking on AAA titles, with a side of Python tooling. My work targeted Xbox 360, PlayStation 3, Windows and Linux.
I am a developer who does not change languages, roles or rebuild codebases lightly. The benefit of the new approach has to offset the abandonment of the investment in the old, which is a tough thing to justify.
That being the acknowledged high bar, there have been a number of improvements I have permitted over the past decade. Many changes optimize for efficiency, but let’s be honest — some are about comfort but do not substantially move the productivity needle in one direction or another. I believe these are still worthwhile for lifelong coders to pursue, within measure.
If it wasn’t obvious to me by the end of the 1990s, it was certainly obvious by the end of the 2000s: the things that I liked most about making games were happening in middleware and on engine teams, and not very consistently on actual game teams.
Services (as in hosted software for development teams) are another area where I have been able to make an impact, and I really saw the strength they can provide and what it takes to keep them running at scale when I consulted with Final Strike Games.
As we enter 2020, traditional engine programmers who also deeply understand how to build and deploy highly available web services at scale are still extremely rare. Many of the interesting problems I intend to solve over the next few years are in the intersection of these two specializations. I see this as a space full of unsolved problems that have a lot of value to any game development team.
From ‘04-‘15 I coded, ported and maintained a game engine which I largely wrote myself. It had everything in a single codebase (and to a large degree, the same address space at runtime!). This included a UI toolkit, an in-engine level editor, an original HTTP client (predating Curl!), a texture compressor and many other goodies. None of these could be separated or built without the whole.
Whenever it was time to try some new experimental direction, I would carve into this monolith, hack in the new experiment, and slowly integrate it into the whole. The result was predictable: a hard-to-slim-down, hard-to-port codebase that also took a while to compile and had a number of external dependencies which often did very little to help the shipping feature set.
Mid-decade, in light of my shift to tools and services and away from game engine programming, I committed to moving all development to multiple, smaller codebases. The rise of single header C libraries (and indeed, source/header pairs) helped persuade me that a non-monolithic codebase was the best investment a coder on a small team could make in many cases.
The increased nimbleness and changeability that comes from having less intertwined code has made it far easier to ship real features to customers.
Many IDEs like Visual Studio and Xcode have project generators. The projects they generate are almost always the wrong thing, and you have to go about deleting a bunch of cruft, which is worse than starting from an empty project. However, it turns out that if you create your own project templates they can be huge productivity boosts. You are freed from the burden of specifying how your project and all of its dependencies build every single time you start a new codebase. This strikes out a massive disincentive to build smaller projects.
I ratified the Native Project Standards, a set of thoughtful, partially arbitrary places to put all of the files in a project and then wrote a Project Generator for Native Project Standards which effectively enables me to build my project and all of its dependencies on a number of OSes and IDEs before I even write the first line of bespoke code.
You may disagree with the minutiae of NPS and the project generator may not do what you need it to. They haven’t been adopted by anyone but myself as far as I know. That said, I cannot overstate how impactful it has been to have every codebase I work on have exactly the same build commands with the exact same directory structures. Here is a short list of really meaningful benefits this has yielded:
I also learned that it is possible to go overboard with project generation, wherein you end up needing to fix a bug in many different codebases. Don’t go wide with your project generator until you’ve pounded out all of the major issues, and try not to generate code where referencing library code improves your agility.
While I’ll still pull C++ out when necessary, I default to C99 as my implementation language. When Visual Studio added quasi-C99 support in 2013, I began seriously considering this option. It compiles quickly, you can usually tell what is going to be executed next by just reading source and, in general, languages that are simpler to parse are easier to statically analyze.
Writing a library in C produces an evergreen product that can be called from almost every language. The C ABI is the lingua franca of programming languages. This makes maintaining FFIs for popular languages like Rust and Go that much simpler.
The two main things C99 misses on that cause problems for me:
No generic containers, so each new type needs its own realloc() scheme.
No defer keyword and no RAII makes cleaning up a multiple-return function a perilous endeavour.
I live with these sharp edges and use tooling to detect problems early.
I’ve written a few small programs in Rust but it is still over the horizon for me. (See my earlier comment about not making changes lightly).
While I still use Python for the glue-it-together tasks I used Perl for in the 1990s, Go is now the language I use for service-oriented development. A strongly typed language with explicit error returns goes a long way to ensuring runtime integrity of data.
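To illustrate the point about strong typing and explicit error returns, here is a minimal sketch. parsePort is a hypothetical helper invented for this example; it is not taken from any codebase mentioned in this post.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it converts untrusted text into a
// typed port number and returns an explicit error instead of a bad value.
func parsePort(s string) (uint16, error) {
	// A bit size of 16 makes ParseUint reject anything above 65535.
	n, err := strconv.ParseUint(s, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if n == 0 {
		return 0, fmt.Errorf("invalid port %q: must be non-zero", s)
	}
	return uint16(n), nil
}

func main() {
	for _, in := range []string{"8080", "70000", "http"} {
		port, err := parsePort(in)
		if err != nil {
			// The caller cannot use the value without first facing the error.
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", port)
	}
}
```

The type system keeps an out-of-range value from ever being stored in the uint16, and the error return forces the decision about bad input to happen at the call site.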
A standalone, multi-threaded executable that cross compiles to every PC operating system with virtually no effort is nothing to sneeze at. It is incredibly refreshing after trying to portably distribute Python programs to non-Python programmers. This amazing feature is why Go is the language I intend to use to solve highly networked problems that need to execute in the cloud or, alternatively, on the game developer’s workstation.
Many people consider Go a quickly executing language. I have a more balanced view of where it sits on the performance spectrum. A tri-colour mark and sweep garbage collector implies volatility about collectable objects. But, I digress.
Tmux is incredibly powerful and all of the time I have invested in learning and configuring it has been given back to me in productivity and convenience. It’s not the first of its type, but it’s the first that clicked into gear for me.
If you haven’t used it, Tmux sits between your terminal (Xterm, Gnome-Terminal, etc.) and the programs you’re running in the terminal. You can split the screen, run multiple windows (tabs, basically), nest Tmux sessions, or even display the same window across multiple sessions (for remote, textual screen sharing).
All of this is useful, but the killer feature is the ability to detach and reattach a session across devices. Using this and a small Linux vm in the cloud, I am able to work anywhere in the world, provided I use console-oriented programs. (Confession: I am a 25+ year Emacs guy so this is an easy fit for me).
There are times where I’d be working on my triple monitor home workstation, and then have to go out and wait for an appointment. I’d grab my iPad, and using Blink SSH, I’d reattach the session. All of my open code files, log tails and build commands would be right there, albeit in a smaller rectangle.
Tmux won’t help you as much if you are working with graphics or if you are only developing on Visual Studio but if the type of programming you are doing works in a console window, it is extremely handy to be able to detach and reattach at will, across any device.
Almost every engine has debug line drawing which is indispensable for spatial or geometric problems. However, when faced with spatial or geometric edge cases, it is useful to glean details that emerge from sub-frame stepping through an algorithm and to be able to rewind and replay the debug visualization.
None of this is particularly possible in engines I have used, so when faced with some complex tessellation work I built my own line drawing library which streamed the lines out to a file on disk. From there, a separate tool visualized the debug lines and let you scrub through them in either direction.
I went from poring over vertex values in a debugger for hours to immediately seeing collinear points. Since then I’ve used it to build an efficient tessellator, to implement edge case-free clipping routines and even to visualize Doom subsector creation.
Being able to replay each step in an algorithm and see the results has reduced development time and possibly even brought some of these solutions into reach for me. Visual replay debugging, where the replay happens outside of the address space of the program performing the work, has been a clear win for me.
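The record-then-replay idea can be sketched in a few lines. What follows is not the library described above; it is a minimal Go illustration under assumed names (DebugLine, Recorder, LinesUpTo and recordTriangle are all hypothetical), using newline-delimited JSON as an assumed file format and an in-memory buffer in place of a file on disk.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// DebugLine is one recorded segment, tagged with the algorithm step that drew it.
type DebugLine struct {
	Step           int
	X0, Y0, X1, Y1 float64
}

// Recorder streams debug lines to a writer as newline-delimited JSON.
type Recorder struct {
	enc  *json.Encoder
	step int
}

func NewRecorder(w io.Writer) *Recorder {
	return &Recorder{enc: json.NewEncoder(w)}
}

// NextStep marks a sub-frame boundary in the algorithm being debugged.
func (r *Recorder) NextStep() { r.step++ }

// Line appends one segment to the stream under the current step.
func (r *Recorder) Line(x0, y0, x1, y1 float64) error {
	return r.enc.Encode(DebugLine{Step: r.step, X0: x0, Y0: y0, X1: x1, Y1: y1})
}

// LinesUpTo replays a recorded stream and returns every line drawn at or
// before step, so a viewer can scrub forwards or backwards.
func LinesUpTo(data []byte, step int) ([]DebugLine, error) {
	var out []DebugLine
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		var l DebugLine
		if err := json.Unmarshal(sc.Bytes(), &l); err != nil {
			return nil, err
		}
		if l.Step <= step {
			out = append(out, l)
		}
	}
	return out, sc.Err()
}

// recordTriangle records a three-step example: one triangle edge per step.
func recordTriangle() []byte {
	var file bytes.Buffer // stands in for a file on disk
	rec := NewRecorder(&file)
	// Writes to a bytes.Buffer do not fail, so errors are ignored here.
	rec.Line(0, 0, 1, 0)
	rec.NextStep()
	rec.Line(1, 0, 1, 1)
	rec.NextStep()
	rec.Line(1, 1, 0, 0)
	return file.Bytes()
}

func main() {
	data := recordTriangle()
	// Scrub to the end, rewind to the start, then step forward once.
	for _, step := range []int{2, 0, 1} {
		lines, err := LinesUpTo(data, step)
		if err != nil {
			fmt.Println("replay failed:", err)
			return
		}
		fmt.Printf("step %d: %d line(s) visible\n", step, len(lines))
	}
}
```

Because the viewer only reads the recorded stream, it can live in a separate process from the code being debugged, which is the property the paragraph above relies on.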
Sat 23 February 2019 by Michael Labbe
Kubernetes is so good at maintaining a user-facing veneer of a stable service that you might not even know that you are periodically crashing until you set up log aggregation and do a keyword search for panic. You can miss crash cues because pods spin up so transparently.
Okay, so your application can crash. You are using Go. What can you do about it? In practice, here are the steps we have found useful:
Handle SIGSEGV gracefully.
Trap SIGTERM messages.
If you write a C program and do not explicitly handle SIGSEGV with signal(2), the receipt of SIGSEGV terminates the process.
Go is different from C. Go’s runtime has a default panic handler that catches these signals and turns them into a panic. Defer, Panic and Recover on the official blog covers the basic mechanism.
SIGSEGV (“segmentation violation”) is the most common one. Go will happily compile this SIGSEGV-generating code:
var diebad *int
*diebad++ // oh, no
The full list of panic reasons is described in the official panic.go source.
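To see the fault arrive as a panic rather than a process-ending signal, here is a minimal sketch. The increment function is a hypothetical name used only for this example; recover and runtime.Error are standard Go.

```go
package main

import (
	"fmt"
	"runtime"
)

// increment dereferences p. A nil p faults; the Go runtime turns the
// fault into a panic, which the deferred function converts to an error.
func increment(p *int) (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return // no panic underway
		}
		// Runtime-generated panics implement runtime.Error.
		if re, ok := r.(runtime.Error); ok {
			err = fmt.Errorf("recovered runtime panic: %w", re)
			return
		}
		panic(r) // not a runtime error: re-raise it
	}()
	*p++
	return nil
}

func main() {
	n := 41
	err := increment(&n)
	fmt.Println("valid pointer:", n, err)

	var diebad *int
	err = increment(diebad)
	fmt.Println("nil pointer:", err)
}
```

Recovering like this is useful for seeing the mechanism. In a real service you would usually log and exit rather than carry on after a nil dereference.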
Not every signal produces a Go panic — not by a long shot. Linux has over 50 signals. Version 7 had 15 different signals; SVR4 and 4.4BSD both have 31 different signals. Signals are a kernel interface exposed in userspace, and a primary means for processes to contend with their role in the larger operating system.
Let’s go over the non-panic inducing signals and discuss what they mean to our Kubernetes-driven Go program:
Unignorable signals: SIGKILL and SIGSTOP can’t be caught or ignored. They are provided by the kernel as a surefire way of killing or suspending a process. If SIGKILL is received, the process terminates without warning and we have to rely on logging coming from external sources. It is not recommended to use unignorable signals in automating your process restarts.
Flow-related signals: Many signals can be classified as supporting thread execution. These include SIGCONT and SIGPIPE. They do not interact with Kubernetes and we can safely ignore them or reserve them for any process-specific needs that come up.
Kubernetes-generated signals: Kubernetes sends SIGTERM to PID 1 in your container thirty seconds before shutting down a pod. If you weren’t trapping this previously (and also not using a preStop hook), you are missing an opportunity to gracefully shut down your pod. By default, SIGTERM terminates the process in a Go program. The more aggressive SIGKILL is sent to your pod if it is still running after the grace period.
We’ve established that crashing signals in Go are received by its runtime panic handler, and that we want to override this behaviour to provide our own logging, stack tracing, and HTTP response to a calling client.
In some environments you can globally trap exceptions. For instance, on Windows in a C++ environment you can use Structured Exception Handling to unwind the stack and perform diagnostics.
Not so in Go. We have one technique: defer. We can set up a defer function near the top of our goroutine stack that is executed if a panic occurs. Once there, we can detect if a panic is currently in progress. There are a number of gotchas with this technique:
defer does not run if os.Exit() is called. Make sure all error paths out of your process call panic or use runtime.Goexit().
defer (and recover) operate on goroutines, not processes. If you set a defer to run in main and then spawn a goroutine which panics, the defer will not be called.
We can use the latter trait to our advantage in our web service, providing a generic panic handler that logs, and a second panic handler inside the goroutine that responds to a web request, which returns a 500 error to the user.
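The per-goroutine behaviour can be demonstrated with a small sketch. runGuarded is a hypothetical helper written for this example: it gives the spawned goroutine its own deferred recover, because a defer in main cannot see that goroutine's panic.

```go
package main

import (
	"fmt"
	"sync"
)

// runGuarded runs fn in a new goroutine with its own deferred recover and
// returns the recovered panic value, or nil if fn finished normally.
func runGuarded(fn func()) (recovered any) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// Deferred calls run last-in, first-out: recover first, then Done.
		defer wg.Done()
		defer func() {
			recovered = recover()
		}()
		fn()
	}()
	wg.Wait()
	return recovered
}

func main() {
	defer func() {
		// This handler only covers panics raised on main's own stack.
		// Without the recover inside runGuarded, the worker's panic
		// would crash the process and this function would never run.
		if r := recover(); r != nil {
			fmt.Println("main recovered:", r)
		}
	}()

	fmt.Println("worker recovered:", runGuarded(func() { panic("boom") }))
	fmt.Println("worker recovered:", runGuarded(func() {}))
}
```

An unrecovered panic in any goroutine takes the whole process down, so every goroutine that can panic needs a handler of its own near the top of its stack.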
The global panic handler is your opportunity to employ your logger to provide all relevant crash diagnostics that occur outside of responding to an HTTP request:
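//
// Sample code to catch panics in the main goroutine
//
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

func main() {
	defer func() {
		r := recover()
		if r == nil {
			return // no panic underway
		}
		fmt.Printf("PanicHandler invoked because %v\n", r)
		// print debug stack
		debug.PrintStack()
		// exit non-zero so the crash is visible to whatever restarts us
		os.Exit(1)
	}()

	// ... the rest of main runs here, covered by the handler above
}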
Most (if not all) Go RESTful packages use a per-request goroutine to respond to incoming requests so they can run in parallel. The top of this stack is under package control, and so it is up to the RESTful package maintainer to provide a panic handler.
go-restful defaults to doing nothing but offers an API to trap a panic, calling your designated callback. From there, it is up to you to log diagnostics and respond to the user. Check with your RESTful package for similar handlers.
go-restful’s default panic handler (implemented in logStackOnRecover) logs the stack trace back to the caller. Don’t use it. Write your own panic handler that leverages your logging solution and does not expose internals at a crash site to a client.
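As an illustration of that advice, here is a minimal sketch built on the standard net/http package rather than go-restful's own recover API. recoverMiddleware, statusFor, crashy and healthy are hypothetical names for this example, and log.Printf stands in for whatever logging solution you use.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"runtime/debug"
)

// recoverMiddleware wraps a handler so that a panic is logged server-side
// and the client receives a generic 500 with no internals.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			rec := recover()
			if rec == nil {
				return // no panic underway
			}
			// Full diagnostics go to the log only.
			log.Printf("panic serving %s %s: %v\n%s",
				r.Method, r.URL.Path, rec, debug.Stack())
			// The response body is just the standard status text.
			http.Error(w, http.StatusText(http.StatusInternalServerError),
				http.StatusInternalServerError)
		}()
		next.ServeHTTP(w, r)
	})
}

// crashy dereferences a nil pointer, like the earlier example.
var crashy = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	var diebad *int
	*diebad++
})

// healthy responds normally.
var healthy = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// statusFor issues one in-memory GET against h and reports the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("crashy handler status:", statusFor(recoverMiddleware(crashy)))
	fmt.Println("healthy handler status:", statusFor(recoverMiddleware(healthy)))
}
```

One limitation of this sketch: if the handler has already written its response headers before panicking, the 500 status cannot replace them, so the log entry is the only reliable record of the crash.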
Okay, at this point we are logging crash diagnostics, but what about amicable pod termination? Kubernetes is sending SIGTERM and because we are not yet trapping it, it is causing our process to silently exit.
Consider the case of a DB connection over TCP. If our process has open TCP connections, a TCP connection sits idle until one side sends a packet. Killing the process without closing a TCP socket results in a half-open connection. Half-open connections are handled deep in your database driver and explicit disconnection is not necessary, but it is nice.
It avoids the need for application-level keepalive round trips to discover a half-open connection. Correctly closing all TCP connections ensures your database-side connection count telemetry is accurate. Further, if a starting pod initializes a large enough database connection pool in the timeout window, it may temporarily exceed your max db connections because the half-open ones have not timed out yet!
//
// Sample code to trap SIGTERM
//
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	go func() {
		// before you trapped SIGTERM your process would
		// have exited, so we are now on borrowed time.
		//
		// Kubernetes sends SIGTERM 30 seconds before
		// shutting down the pod.
		sig := <-sigs
		// Log the received signal
		fmt.Printf("LOG: Caught sig ")
		fmt.Println(sig)
		// ... close TCP connections here.
		// Gracefully exit.
		// (Use runtime.Goexit() if you need to call defers)
		os.Exit(0)
	}()

	// ... run the service here. main must not return,
	// or the process exits before the signal arrives.
	select {}
}
You may also want to trap SIGINT, which usually occurs when the user types Control-C. These don’t happen in production, but if you see one in a log, you can quickly recognize you aren’t looking at production logs!
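Trapping both signals on one channel might look like the sketch below. The trapped list, the describe helper and the log wording are hypothetical, and main delivers a signal to its own channel only so that the example finishes without outside input; a real service would block until the operating system delivers one.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// trapped lists the signals this sketch handles: SIGTERM from Kubernetes
// and SIGINT (os.Interrupt) from a developer typing Control-C.
var trapped = []os.Signal{syscall.SIGTERM, os.Interrupt}

// describe returns a log line that makes the origin of a shutdown obvious.
func describe(sig os.Signal) string {
	switch sig {
	case syscall.SIGTERM:
		return "LOG: caught SIGTERM, shutting down gracefully"
	case os.Interrupt:
		return "LOG: caught SIGINT, this is probably not production"
	default:
		return "LOG: caught unexpected signal " + sig.String()
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, trapped...)
	defer signal.Stop(sigs)

	// Simulated delivery so the sketch terminates on its own.
	sigs <- os.Interrupt

	fmt.Println(describe(<-sigs))
	// ... close TCP connections here, then exit.
}
```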
At this point we have deeply limited the number of ways your application can silently fail in production. The resiliency of Kubernetes and the default behaviours of the Go runtime can sweep issues under the rug.
With just a few small code snippets, we are back in control of our exit conditions.
Crashing gracefully is about leaving a meaningful corpse for others to find.
Thu 05 July 2018 by Michael Labbé
Intellect, the ability to focus in on a problem and sheer time committed to the craft of programming are critical and pretty obvious elements that make a programmer good. Having these things on your side is partly luck and partly an expensive time commitment. However, I believe there are further traits that can be developed through habit-forming practice that make a programmer excellent.
Some programmers transcend being merely good; they are highly effective. This often becomes apparent when you see them becoming the team’s de facto problem solver, or when they reliably design and implement excellent-fit solutions, topping their previous attempts.
In the teams I’ve participated in and built I have found three traits that recur in highly effective programmers. When I find even one of them they often go on to live up to great promise. Any one of them is a strong tell, and more is a sign of a programmer with serious potential to be impactful.
The first trait is intellectual curiosity. When you find someone who tinkers because they are curious about new results you are engaging someone who has internalized the impetus for pioneering solutions. Internalization of curiosity is key because it is the surest driver of tangential exploration. A programmer who has exercised solutions to problems they dreamt up themselves out of pure interest in discovery has strengthened their abilities in excess of the rigours of standard professional performance. Professional programming makes you strong enough to stand tall in full gravity. Intellectual curiosity exceeds that; it is like training with a weight belt on.
The second trait is tenacity. Tenacity is the sworn enemy of “Cool, it works! We’re done!”. Those who internalize this trait never spitball their way to a final solution. If multiplying by negative one solves the problem but they don’t know why, they remove it and figure out why the sign inversion makes everything seemingly work. Inherent to this behaviour is the inclination to traverse underneath abstractions. Making it work is no longer the quest; the search is for a deeper understanding, one that makes the answer readily apparent. Illuminate the problem with a hard-earned understanding of the facts and the rest is small muscle movements.
An example of tenacity is spending three weeks tracking down a memory leak in ostensibly mature system libraries. Working through source, compiling it yourself, poring over machine code, examining the compiler, and then reading your processor instruction manual. Rewriting portions of libc to verify results. Thermal imaging in your data center. Whatever it takes.
The final trait is a willingness to self-criticize. Most programmers eventually have the experience of looking at code from a few years back and cringing. While syntax choices evolve, the cringe truly comes from a looking-in view of a naive problem solver doing their best and missing their mark. When a priori derived solutions are mismatched with the present understanding of a problem, personal growth is felt at a gut level.
An unprompted individual who consistently criticizes their own solutions is going to blossom quickly. Any valuable solution space is enormous, and the ability to criticize from a positive vantage point is the natural habitat of an always improving programmer.
Those are the three traits I’ve seen that suggest a programmer is going to be promising and impactful. Next time I am going to ponder the question that affects your effectiveness more than anything else: How do you decide what to work on?
Mon 07 August 2017 by Michael Labbe
Between the cloud, VMs, Docker and cheap laptops, I run into more unconfigured shell environments than I ever did before. In the simple old days you used to get a computer and configure it. You reaped the productivity of that configuration for years. Nowadays environments are ridiculously disposable. The tyranny of the default has become incredibly powerful.
I decided to take the power back and create a self-installing bootstrap script which I could use to configure any new system with a Bash shell. This ended up being a one-day hack that has made my life a lot more sane. My requirements were:
It must self-install and self-configure without needing to type any commands.
It must be accessible everywhere from an easy-to-remember URL so I don’t need to copy/paste.
It must optionally let me choose how each system is configured.
It must run anywhere — inside stripped down Docker containers, etc.
To build this, I decided on using Bash scripting. Perl, Python and Ruby are not always available. Bash, while not quite as ubiquitous as /bin/sh and Busybox, is close enough.
Makeself creates self extracting archives. You can download a shell script that unarchives to a tempdir, running a setup script with access to a hierarchy of files. This is exactly what we needed.
In order to centrally host the bootstrap script, I used Amazon S3. S3 buckets have notoriously long names, but Amazon gives you the ability to use a CNAME for a subdomain that you own. This means I could use a subdomain like https://bootstrap.frogtoss.com that is backed by S3, guaranteeing the bootstrap is accessible virtually anywhere in the free world.
What remained is a long day of enjoyable hacking that produced a set of very personal dotfiles, emacs tweaks and sed manipulations that converted a basic install into something as usable as my most tweaked workstation.
Now I have a chained command that is similar to the following which highly configures any Linux instance:
rm -f bootstrap.sh;
wget http://bootstrap.frogtoss.com/bootstrap.sh;
chmod +x bootstrap.sh ; sudo ./bootstrap.sh