Programming Tradeoffs

One of my favorite programming quotes is from Rich Hickey: "Programmers know the benefits of everything and the tradeoffs of nothing."

It's easier to give blanket advice like "composition over inheritance" or "don't repeat yourself" than to weigh tradeoffs, but those rules don't tell you the contexts in which they apply best.

Idea #1: Concurrency

There are several models for concurrency:

  • Thread-per-request - this is a very common pattern for servers. The downside is that threads are heavyweight, but some languages / platforms have solved this with very lightweight thread equivalents like goroutines in Go. Java is currently working on adding fibers to the JVM (Project Loom), and Google internally has fibers support in C++. This has the simplest mental model because it's very similar to programming in a synchronous style (a minimal sketch follows this list).
  • Event loop - this is the standard model in web programming, but it shows up in various other contexts as well. For example, Chromium had an async task scheduler that essentially functioned like an event loop.
  • Dependency graph - at work I've used an asynchronous dependency-graph framework for Java that lets you specify nodes and their dependencies, and "unwraps" future nodes by scheduling asynchronous work. The advantage is that it lets you reach the theoretical maximum concurrency, but in practice it creates a very complex programming model for simple CRUD-like operations.
  • Streams - I have the least experience with this model, but it has been popularized by libraries like Rx. I think it works well for complex dataflows: if handling a user event requires extensive knowledge of prior events (e.g. dragging, throttling), then modeling the input as a stream gives you a coherent way of dealing with a series of events. On the other hand, most of my web programming experience involves simple user events like clicks and keydowns. Server-side, streams may be useful when they're central to your use case (e.g. a server streaming live results for a multiplayer game, or streaming video data). In general, though, streams seem to be more of the exception.
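
To ground the thread-per-request model, here's a minimal Java sketch using a plain thread pool (the server and handler names are hypothetical). With lightweight threads like goroutines or fibers, the same synchronous style scales to far more concurrent requests:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPerRequestServer {
    // One (pooled) OS thread per request: the handler is written in a
    // plain synchronous style and may block on I/O freely.
    private final ExecutorService pool = Executors.newFixedThreadPool(100);

    public void onRequest(String request) {
        pool.submit(() -> handle(request));
    }

    private void handle(String request) {
        // Blocking here only stalls this one thread, not the whole server.
        System.out.println("Handling: " + request);
    }

    public static void main(String[] args) {
        ThreadPerRequestServer server = new ThreadPerRequestServer();
        server.onRequest("GET /");
        server.pool.shutdown();
    }
}
```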

Idea #2: Testing

There's a huge range of opinions on testing from "unit testing is a waste of time" to "integration testing is a scam" and everything in between.

What makes a test good?

  • Fidelity - does the test exercise realistic scenarios? For example, how easily can your test data drift from reality over time?
  • Coverage - does the test fail when something incorrect happens? This can be much subtler than it seems: if event handlers are left dangling or resources are leaked, will the test still pass?
  • Speed - how fast does the test run?
  • Determinism - does the test flake?
  • Maintenance - is it easy to read and update the test? When you see a test failure, how long does it take you to fix it?

Unit tests are good for speed and determinism, but fidelity and coverage are often weaker.

Integration tests are good for fidelity, and depending on how test data is managed, coverage may be OK too. However, they're usually much slower and tend to be much flakier.
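
To make the speed/fidelity tension concrete, here's a minimal JUnit-style sketch (PriceCalculator and its repository are hypothetical names). The test is fast and deterministic because the repository is faked with a lambda, but the frozen fake data is exactly where fidelity can quietly erode:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class PriceCalculatorTest {
    interface PriceRepository {
        int basePriceCents(String sku);
    }

    // The unit under test: pure logic on top of the repository.
    static class PriceCalculator {
        private final PriceRepository repo;

        PriceCalculator(PriceRepository repo) {
            this.repo = repo;
        }

        int priceWithTaxCents(String sku) {
            return (int) Math.round(repo.basePriceCents(sku) * 1.08);
        }
    }

    @Test
    void appliesTax() {
        // Fast and deterministic: the fake returns a frozen 1000 cents.
        // But if real prices change shape (e.g. become tax-inclusive),
        // this test keeps passing, which is the fidelity risk above.
        PriceCalculator calc = new PriceCalculator(sku -> 1000);
        assertEquals(1080, calc.priceWithTaxCents("sku-123"));
    }
}
```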

Idea #3: APIs

A thought process for coming up with good APIs:

  1. Start with concrete use cases from likely users.
  2. Imagine all the ways the API might evolve. This can be very speculative.
  3. Delete everything that won't be used right away and keep your notes on how the API might evolve.

Some random ideas on APIs:

  • Think about the lifetimes of entities. Is there a strict 1:1 mapping between the lifetimes of two entities (for example, will users always create and delete them at the same time)? If so, consider merging the two entities; perhaps you need a higher-level concept (see the sketch after this list).
  • Imagine the future, but don't contort the present. Think about how things might evolve: where you'll want to add fields, and which fields might become obsolete. The key is to not let that contort the present API into something overly complex that doesn't make sense.
  • When introducing a new entity, think about how it ties in with all the other entities in your system. This will 1) make sure the new entity is essential and 2) help explain its usage to your users.
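
As a sketch of the lifetime point (all of these names are hypothetical): if a document and its permissions are always created and deleted together, merging them into one higher-level entity removes an impossible state from the API.

```java
import java.util.List;

// Before: two entities whose lifetimes are strictly 1:1, so every caller
// has to remember to create and delete them in lockstep.
record Document(String id, String body) {}
record DocumentPermissions(String documentId, List<String> readers) {}

// After: one higher-level entity. A "document without permissions" is
// now unrepresentable, and the API has one fewer concept.
record DocumentWithAcl(String id, String body, List<String> readers) {}
```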

Idea #4: Functional core, imperative shell

This idea is from Gary Bernhardt, and it's something I've thought about a lot and tried to use in my projects. Keep the core business logic as functionally pure as possible: map inputs -> outputs without side effects, asynchronous work, etc., and push the I/O and mutation out to a thin imperative shell.
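
Here's a minimal sketch of the pattern with a made-up billing domain: the core is a pure function from data to a decision, and the shell gathers inputs and performs the side effects.

```java
import java.time.LocalDate;
import java.util.List;

public class BillingApp {
    record Invoice(String customerEmail, long dueEpochDay, boolean paid) {}

    // Functional core: a pure decision. No clock, no network, no database;
    // the same inputs always produce the same output, so it's trivial to test.
    static List<String> overdueEmails(List<Invoice> invoices, long todayEpochDay) {
        return invoices.stream()
                .filter(inv -> !inv.paid() && inv.dueEpochDay() < todayEpochDay)
                .map(Invoice::customerEmail)
                .toList();
    }

    // Imperative shell: gathers inputs (the clock) and performs the side
    // effects. Kept thin so there's little here that needs unit tests.
    public static void main(String[] args) {
        long today = LocalDate.now().toEpochDay();
        List<Invoice> invoices = List.of(
                new Invoice("a@example.com", today - 3, false),
                new Invoice("b@example.com", today + 5, false));
        for (String email : overdueEmails(invoices, today)) {
            System.out.println("Would send reminder to " + email); // stand-in for real I/O
        }
    }
}
```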

Idea #5: Make coherent changes

I don't think the goal when changing code is to make as many tiny commits as possible. Doing so 1) makes the rationale behind a series of logical changes harder to follow (subjective, sometimes) and 2) creates overhead when rolling back changes.

Small CLs:

  • Good for trivial maintenance changes.
  • Good for incrementally migrating to a new API as part of a large-scale change.

Large CLs:

  • Good for atomically migrating to a new API. When programming in the small, changing everything atomically is good because 1) you avoid the extra work of supporting two APIs concurrently during the transition and 2) it makes the scope of the change clear (e.g. does it cover most call sites?). At a certain scale this becomes 1) too risky and 2) logistically impossible (e.g. the underlying files keep changing).
  • Good for introducing a new, low-risk component. If the component is behind a feature toggle or definitely won't affect existing components, then the risk of introducing a bug is small.

Idea #6: Indirection

The two main types of indirection I see are:

  1. Functions
  2. Interfaces

Some programmers, like Uncle Bob, advocate for lots of little functions, while others, like Steve McConnell of Code Complete, say that large functions can be good (er... never mind).

For me, I think splitting things into small private methods is OK as long as:

  • The function name isn't just an exact restatement of the implementation. This failure is obvious when the function name is almost as long as the function body itself.
  • The private methods are kept as functionally pure as possible. Relying on instance variables in private methods makes things hard to follow, particularly when private method #1 sets instance variable A and then calls private method #2, which reads instance variable A. Instead, just pass in parameters and make the contract between the functions clear (a sketch follows this list). The exception is when a value would require a lot of plumbing to pass through (e.g. unrelated intermediate functions would need to know about it).
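
Here's a sketch of that exact trap and the parameter-passing fix (the names are hypothetical):

```java
import java.util.List;

// Hard to follow: #1 communicates with #2 through hidden instance state.
class ReportBuilderOpaque {
    private List<String> rows;

    void build(List<String> input) {
        normalize(input);   // sets this.rows as a side effect
        render();           // silently depends on this.rows
    }

    private void normalize(List<String> input) {
        rows = input.stream().map(String::trim).toList();
    }

    private void render() {
        rows.forEach(System.out::println);
    }
}

// Clearer: the contract between the helpers is visible in the signatures.
class ReportBuilder {
    void build(List<String> input) {
        render(normalize(input));
    }

    private List<String> normalize(List<String> input) {
        return input.stream().map(String::trim).toList();
    }

    private void render(List<String> rows) {
        rows.forEach(System.out::println);
    }
}
```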

For interfaces, I think it's only worth creating an interface when 1) there are multiple implementations or 2) there will very likely be multiple implementations during the lifetime of the program.

Otherwise, it adds an indirection that makes things unnecessarily confusing. That said, you can still use a concrete class directly while hiding as much implementation detail as possible behind a thoughtful public API.
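
For example (hypothetical names): rather than a speculative Cache interface with a single implementation, a concrete class with a narrow public surface hides just as much. If a second implementation really appears later, extracting an interface at that point is a mechanical refactor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// No interface needed while there's only one implementation: the public
// API is the abstraction, and the HashMap is an invisible detail.
public final class InMemoryCache {
    private final Map<String, String> entries = new HashMap<>();

    public void put(String key, String value) {
        entries.put(key, value);
    }

    public Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }
}
```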

Idea #7: Structs vs. Classes

This is a big divide between functional and OO programming: FP tends to prefer manipulating plain data structures (structs) directly, while OOP prefers to encapsulate data structures in classes.

In practice, both of these techniques are useful and I think the trade-offs are:

  • Structs are great for internal representation. Manipulating them is simpler as you can create generic utilities.
  • Classes are great for external representation. They let you avoid leaking implementation details.
  • Within a component / change boundary, dealing with structs is a productive choice. If you can easily change all the usages of a struct, then it's OK.
  • Across a component / change boundary, you want to return classes so you have more flexibility. Accepting structs as config options is OK since inputs should be kept fairly simple (and accepting class objects is OK too). A sketch of both roles follows this list.
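
In Java terms, a record can play the struct role internally while a class with a curated API crosses the boundary; here's a sketch with hypothetical names:

```java
import java.util.List;

public class Orders {
    // Struct-like: plain data, freely created and reshaped inside the
    // component; generic utilities (copying, diffing) are easy to write.
    record OrderRow(String sku, int quantity, int unitPriceCents) {}

    // Class-like: what crosses the component boundary. Callers see totals,
    // not the row representation, so the internals can change freely.
    public static final class OrderSummary {
        private final List<OrderRow> rows;

        OrderSummary(List<OrderRow> rows) {
            this.rows = List.copyOf(rows);
        }

        public int totalCents() {
            return rows.stream()
                    .mapToInt(r -> r.quantity() * r.unitPriceCents())
                    .sum();
        }
    }
}
```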

