Developer experiences from the trenches
Fri 09 August 2024 by Michael Labbe
tags code
Sometimes you want to parse a fragment from a string and all you have is C. Parsers for things like rfc3339 timestamps are handy, reusable pieces of code. This post suggests a convention for writing stack-based fragment parsers that can be easily reused or composed into a larger parser.
It’s opinionated, but tends to work for most things, so adopt or adapt it to your needs.
The idea is pretty simple.
// can be any type
typedef struct {
    // fields go here
} type_t;

int parse_type(char **stream, size_t len, type_t *out);
Pass in a pointer to a pointer to a null-terminated string. On return, *stream points to the location of an error, or past the end of the parse on success. This means it can point to the null terminator.
Pass in the length of the string to parse to avoid needing to call strlen, or to indicate if the end of a successful parse occurs before the null terminator.
The return value can be an int as depicted, or an enum of parse failure reasons. The key thing is that zero means success. This allows the results of multiple parses to be ORed together and tested for error once, keeping the calling code trivial.
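For example, a caller composing two hypothetical fragment parsers (parse_date and parse_time are placeholders, not defined in this post) might accumulate results like this sketch:

// zero means success, so several parses can be ORed and checked once
int err = 0;
err |= parse_date(&cursor, date_len, &date);
err |= parse_time(&cursor, time_len, &time);
if (err)
    return err;  // cursor points at the first character that failed to parse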
That’s the whole interface. You can compose a larger parser out of smaller versions of these. So, if you want to parse a float (a deceptively hard thing to do) in a document, or key value pairs with quotes or something, you can build, test and reuse them by following this convention.
When you implement a fragment parser you end up needing the same few support functions. This suggests a convention.
Testing whether the stream was fully parsed works well with a macro containing a single expression:

#define did_fully_parse_stream \
    (*stream - start == (ptrdiff_t)len)
int parse_type(char **stream, size_t len, type_t *out) {
    char *start = *stream;

    // ... parse, advancing *stream as tokens are consumed ...

    if (!did_fully_parse_stream)
        return 1;

    return 0;
}
Test the next token for a match:
static int is_token(const char **stream, char ch) {
    return **stream == ch;
}
Test the next token and bypass it if it matches. By convention, use this if a token failing to match is not an error.
static int was_token(const char **stream, char ch) {
    if (is_token(stream, ch)) {
        (*stream)++;
        return 1;
    }
    return 0;
}
Test that the next token is ‘ch’, returning zero when it matches. While this is functionally just the inverse of was_token, it is semantically useful: use it when a failure to match means an error has occurred.
static int expect_token(const char **stream, char ch) {
    return !was_token(stream, ch);
}
Token classification is very easy to implement using C99’s designated initializers. A zero-filled lookup table can be used to test token class and to convert tokens to values.
static char digits[256] = {
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4, ['5'] = 5,
    ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9,
};
void func(const char **stream)
{
    // is it a digit? '0' maps to zero in the table, so test for it explicitly
    if (**stream == '0' || digits[(unsigned char)**stream]) {
        // yes, convert the token to its stored integral value
        int value = digits[(unsigned char)**stream];
    }

    // skip the token stream ahead to the first non-digit
    while (**stream == '0' || digits[(unsigned char)**stream])
        (*stream)++;
}
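Putting the pieces together, a composed fragment parser might look like the sketch below. The mmss_t type, parse_number and parse_mmss are hypothetical (not from this post); they build on the helpers above, take const-qualified stream pointers to match them, and keep the zero-is-success convention so errors can be ORed together.

// assumes the digits table, expect_token and did_fully_parse_stream above,
// plus <stddef.h> for size_t and ptrdiff_t
typedef struct {
    int minutes;
    int seconds;
} mmss_t;

static int parse_number(const char **stream, int *out)
{
    if (**stream != '0' && !digits[(unsigned char)**stream])
        return 1; // expected at least one digit

    *out = 0;
    while (**stream == '0' || digits[(unsigned char)**stream]) {
        *out = *out * 10 + digits[(unsigned char)**stream];
        (*stream)++;
    }
    return 0;
}

// hypothetical fragment parser for "MM:SS", e.g. "12:34"
int parse_mmss(const char **stream, size_t len, mmss_t *out)
{
    const char *start = *stream;
    int err = 0;

    err |= parse_number(stream, &out->minutes);
    err |= expect_token(stream, ':');
    err |= parse_number(stream, &out->seconds);

    if (err || !did_fully_parse_stream)
        return 1;

    return 0;
}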
Fri 02 August 2024 by Michael Labbe
tags rant
Recently I had a conversation with a composer who was planning on buying a $5,499 Mac Studio to record music. “It’s the only computer I’ll need to run all of my VSTs and play back all of my tracks”, he remarked. With 24 cores and 64GB of RAM, it sure seemed likely to me. “Are you sure you couldn’t do that on a MacBook Air?” I prompted, genuinely curious about where the resources were going. He seemed taken aback that it might even be a possibility.
Whether or not he needed the extra headroom — and you can make the argument that you would weigh down a lighter recording computer with VSTs and track layering — it was a good reminder that marketing like Apple’s makes people equate professional significance with higher-end devices. Today, most base model CPUs are good enough for most people. Most professionals do not quantify their computing needs before making a purchase, and so many computers being sold are unnecessarily overpowered. Device marketing encourages this. Over the past couple of decades we benefited from those hardware gains, but I don’t believe that holds anymore for many tasks if you choose the right software.
A hobby of mine is to achieve my intended computing result with the least amount of computing power and dollars I reasonably can. For example, this blog post is being written on a refurbished $250 Thinkpad humming along on a Linux Mint MATE desktop running only Emacs.
It is refreshing to not be precious about an expensive laptop, and to be able to just toss it in a bag. Unlike modern buggy gaming laptops that lack ports, it also wakes from sleep with 100% consistency. I still own a high-end workstation, but I am finding many computing needs can be covered by devices that shipped 5+ years ago: browsing, editing, messaging, some music composing and even coding smaller-scale (read: not Unreal) projects.
Microsoft has announced the end of life of Windows 10 on October 14, 2025. Many highly capable computers, including Threadrippers, Dell XPS laptops and other older high-end configurations, will be unable to run Windows 11 in a secure, supported way. That said, the Steam Hardware Survey, as of this writing, shows Windows 10 as more popular than Windows 11.
This situation has created a rising tension, and one of two things is likely to happen:
Next year will be a great opportunity to pick up refurbished hardware that can do most computing tasks after installing FreeBSD or a Linux distribution. Linux Mint is my preference — it feels supportive of the user like Windows 2000 did, has no obvious subversive agenda, is Ubuntu package compatible, and is entirely snappy on low-end hardware that is slated for deprecation by Microsoft.
Turn Microsoft’s e-waste into your next workhorse computer.
On-device AI is being shoehorned where it has no business going because it is perceived as being able to push tech company valuations. It is being foisted on consumers whether they understand it or not. Meanwhile, we are being told we have to upgrade to new processors and operating systems to receive these fun new experiences.
There is a lot to be said about AI, but as far as my computing device goes — I’m totally fine with staying on the beach while the corporate agendafied first wave hits everybody who jumps in the water. Consumer on-device AI is not going to be a part of my professional workflows until the waters have settled and the hype has passed.
The refurbished device market stands to become very saturated if AI features motivate users to abandon their existing computers. Buy the dip!
By using a refurbished device on Linux you are virtually guaranteed to avoid the first generation of consumer on-device AI which is likely to involve annoying or even dangerous missteps.
Users coming from Apple to other operating systems seem to demonstrate a sensibility — they want to love their new PC, tablet or phone. This is because the device is the nexus of the experience in the Apple world. The hardware is second to none and you can end up experiencing entirely bug-free workdays if you stay on a well-manicured path. Jumping from a MacBook Pro to an unconfigured Thinkpad on Ubuntu would be like going from an Americano with cream to gritty camping coffee prepared with a hangover.
A shift in mindset about what matters is helpful. I have found it productive to not focus on the device so much as I focus on getting the result I am looking for in my work. Loving the hardware is not the point, and it can be freeing to find the workflow that gets you the result you need outside of loving a device. Imagine how much you can achieve outside the binds of device love!
October 2025 is looking like a great time to pick up a dirt cheap first generation 16-core Threadripper, install Linux on it and have it perform phenomenally for a decade or longer. Now, if Microsoft could just deprecate some of those previous-gen GPUs…
Sat 22 June 2024 by Michael Labbe
tags meta
Just a quick note to say I improved Pelican’s RSS generator for Labs. You can now read full articles in your RSS reader if you subscribe to this blog. Previously they were truncated, which forced users to go to the site. Now you can read the posts anywhere you want.
I also cropped the number of posts in RSS down to five so RSS readers will not need to mark a ton of really old posts as read. There has never been a better time to subscribe. :)
Sun 09 June 2024 by Michael Labbe
tags code
Modern web applications are a façade consisting of many smaller programs and libraries that are configured to run in concert to produce a result. To developers outside of games, this has been acutely obvious for a long time. Games have largely been spared the configuration needs this brings due to a focus on producing a monolithic runtime. However, many modern games ship proprietary logic outside of the code that runs on the disc, such as backend services, so this has been affecting games for some time as well.
At the heart of all this is the need for configuration. Having personally worked in professional devops roles, I have seen a lack of deep thinking about configuration. This article hopes to inspire deeper thinking about configuration design for programs.
Application configuration is our opportunity to affect runtime state before a program begins its main execution. Static declarations are easily definable, immutable, loggable, can be stored in revision control and can be easily reviewed by a team. Runtime state, on the other hand, is ephemeral and mutable. Through configuration, we have the opportunity to wield the runtime state of large, distributed applications in predictable, efficient ways. Most programs do not seize this opportunity.
We treat configuration like it is simple and easy. It is time to start respecting configuration in application design and maintenance.
What is the ground truth configuration for a program? Is it the config file? Not even remotely close. It is the portion of in-memory state that is necessary to cause an (approximately) deterministic, repeatable execution of the program. This is what I call the “ground truth” of an application’s configuration. It usually includes values drawn from config files, environment variable overrides and command line arguments.
Commonly, programs read configuration from many sources: a bespoke search path for configuration, starting from system-wide locations and moving into home directories; environment variables as an override; then, command line arguments.
This process differs for each program which is why you’ll see each program document it. Even specifying the system hostname requires addressing multiple files, deprecations and symlinks on Linux.
What happens if there is a system-wide config file but it is not readable because of the permissions of the current user? Pass over it? Throw an error because it exists? This, too, is ambiguous and varies from program to program.
The bottom line is that most programs accumulate a ground truth configuration haphazardly, and then begin executing, perhaps destructively, with no means to review the configuration before it starts.
Writing code is commonly less time consuming than maintaining and debugging the same code. The same is true of configuring software versus troubleshooting it. A misconfigured application produces errors for end users. Many of the configuration formats that are commonly in use (JSON, YAML, TOML) prioritize convenient authorship over unambiguous runtime states. This allows for rapid configuration in exchange for potential risks: implicit defaults that silently take effect, and unrecognized keys that are accepted without error.
Implicit defaults are exceptionally bad when ground truth configuration is not reviewable. You may not even know that you are operating on a bad default, or that an option exists.
Consider:
secrue=true
An insufficiently rigorous program can be misconfigured to breach security without error due to these two aforementioned properties.
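As a sketch of what a more rigorous program can do (the key list and check below are hypothetical, not from this article): reject unknown keys outright rather than silently falling back to an implicit default, so a typo like secrue becomes a hard error.

#include <stdio.h>
#include <string.h>

// Hypothetical allow-list of keys this program understands.
static const char *known_keys[] = { "secure", "port", "hostname" };

static int is_known_key(const char *key)
{
    for (size_t i = 0; i < sizeof(known_keys) / sizeof(known_keys[0]); i++)
        if (strcmp(key, known_keys[i]) == 0)
            return 1;
    return 0;
}

int main(void)
{
    // A misspelled key becomes a hard error instead of a silent default.
    const char *key = "secrue";
    if (!is_known_key(key)) {
        fprintf(stderr, "config error: unknown key '%s'\n", key);
        return 1;
    }
    return 0;
}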
YAML, in particular, has a lot of known pitfalls. The point of this article is not to debate popular config file formats. A good developer can overcome YAML’s problems with knowledge and practice, but the problem of contending with underspecified ground truth configuration state is a lifelong drag which can only be overcome through good program design.
JSON, YAML and TOML all have versioned file format specs, but those specifications say nothing about how a parsed document should map onto program state. Some examples of ambiguities: how duplicate keys are resolved, or what range and precision a numeric value is given.
Every program behaves differently as a result of this underspecification.
When folks debug a program, they have a mental model of its execution in their heads. Consider:
b = 1;
if (cfg.a)
    b += do_optional_thing();
// code continues to do complex things with b
When a developer reads this code, they will either consider b to be augmented by config option a or not. Their mental model of the code necessarily includes this mutating state. Therefore, removing as much uncertainty as possible about the state of a is important to someone attempting to ascertain why they are seeing the result of b on their screen.
The rest of this article’s solutions emphasize the need for reducing the size of the mental model necessary for proper configuration troubleshooting.
Configuration can be expressed imperatively or declaratively, and which one is right for your application depends on your context. Imperative configuration is a turing-complete program that configures a program. Keeping a mental model of its config state requires mentally interpolating variables, simulating loops in your head and jumping through nested function calls.
Declarative configuration lays it all out flat, which lets you see what things are. However, almost everything declarative ends up becoming awkwardly complex when it layers in imperative concepts. See: HCL for_each loops or Ansible adding Python dictionary lookups to YAML files.
A better approach is to think of declarative configuration as a funnel. A data table, perhaps nested, of configuration values can be derived from all sources and fed as input to the ground truth configuration. This table could be declared up front, or imperatively derived.
The healthy thing is to arrive at a data table of explicit program configuration before core execution of the program starts — a declarative funnel that can be arrived at imperatively.
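As a minimal sketch of that funnel (the struct, key names and override sources here are hypothetical, not from this article): defaults, then the environment, then the command line all resolve into one explicit table, which is dumped for review before the core of the program runs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical ground truth table for a small program.
typedef struct {
    int secure;
    int port;
} app_config_t;

static void dump_config(const app_config_t *cfg)
{
    printf("config: secure=%d port=%d\n", cfg->secure, cfg->port);
}

int main(int argc, char **argv)
{
    app_config_t cfg = { .secure = 1, .port = 8080 };    // explicit defaults

    const char *env_port = getenv("APP_PORT");           // environment override
    if (env_port)
        cfg.port = atoi(env_port);

    for (int i = 1; i < argc; i++)                        // command line override
        if (strncmp(argv[i], "--port=", 7) == 0)
            cfg.port = atoi(argv[i] + 7);

    dump_config(&cfg);   // the ground truth is reviewable before execution
    // ... core execution would use only cfg from here on ...
    return 0;
}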
Schemas are for constraining config file formats, not for constraining ground truth configuration. Ground truth configuration is subjected to underspecified parsers, config file search orders, environment variable and command line overrides and more. Therefore, a schema for a config file does not solve the larger program configuration problem by itself. It doesn’t necessarily hurt it, either, though.
When someone says “we need schemas”, it is useful to explore the root reasoning of that statement before jumping in.
In structured languages, a ground truth configuration can be typed and could be used to produce a schema. The right choice is to keep as much of the program’s ground truth configuration as possible in one reviewable structure.
Most importantly, provide the best tooling for your in-context situation to edit and review the program’s ground truth.
Configuration has a way of becoming layered, especially in devops. For example:
a values.yaml file overrides the Helm Chart for a forked Docker image.
In this case, we reap the benefits of a highly-available program that is configured to our specification, compiled by somebody else and made to work for our purposes. This reduces a large one-time up-front cost. However, we incur a cost of five configuration files, implicitly depending on values from each other to derive whole program state. This is a drag on efficiency for the lifetime of the product. It is an important tradeoff in where you spend effort — one to commit to consciously.
Each small program comes with its own configuration files and state. Since your application consists of multiple programs, you end up producing configuration files that require similar values across them. This is brittle when those values change.
Further, if there are multiple versions of an application (e.g. test and production), there is an n by m problem, where each dependent configuration must exist for each version of the application.
This can be addressed by having a single source of truth for each application configuration, used to produce the smaller configurations for each program.
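A sketch of that idea, with hypothetical names and output formats: one structure holds the application’s truth, and the smaller per-program configurations are generated from it so they can never drift apart.

#include <stdio.h>

// Hypothetical single source of truth for the whole application.
typedef struct {
    const char *hostname;
    int         port;
} app_truth_t;

static void emit_proxy_conf(const app_truth_t *t, FILE *out)
{
    fprintf(out, "upstream app { server %s:%d; }\n", t->hostname, t->port);
}

static void emit_app_env(const app_truth_t *t, FILE *out)
{
    fprintf(out, "APP_HOST=%s\nAPP_PORT=%d\n", t->hostname, t->port);
}

int main(void)
{
    app_truth_t truth = { .hostname = "127.0.0.1", .port = 8080 };

    // Both derived configurations agree because they share one source of truth.
    emit_proxy_conf(&truth, stdout);
    emit_app_env(&truth, stdout);
    return 0;
}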
For the remainder of my life I will depend on large applications that are made up of many small, configured programs operating in concert. Making configuration correct, safe and expressive is an opportunity to wield large numbers of these programs with minimal cost and overhead.
Many of these smaller programs came from programming cultures that emphasized getting something up and running and quick-and-dirty scripting over long-term maintainability and loose coupling. As computing complexity increases, it is my hope that the sort of rigorous values that spurred the creation of languages like Rust are applied to configuration management.