Programming Tradeoffs

One of my favorite programming quotes is by Rich Hickey, who said, "Programmers know the benefits of everything and the tradeoffs of nothing."

It's easy to give blanket advice like "composition over inheritance" or "don't repeat yourself", but those rules don't tell you which contexts they're most appropriate for.

Idea #1: Concurrency

There are several models for concurrency:

  • Thread-per-request - this is a very common pattern for servers. It has the simplest mental model because it's very similar to programming in a synchronous style. The downside is that a thread is heavyweight, but some languages / platforms have solved this with very lightweight thread equivalents like goroutines in Go. Java is currently working on adding fibers to the JVM. Google, internally, has fiber support in C++.
  • Event loop - this is used in web programming, but is also used in various other contexts. For example, Chromium had an async task scheduler that essentially functioned like an event loop.
  • Dependency Graph - at work I've used an asynchronous dependency graph framework for Java that allows you to specify various nodes and their dependencies and "unwraps" future nodes by scheduling asynchronous work. The advantage is that it allows you to reach the theoretical best in terms of concurrency, but in practice it creates a very complex programming model for simple CRUD-like operations.
  • Streams - I have the least experience with this model, which has been popularized by libraries like Rx. I think it works well when you have complex dataflows. For example, if your approach to user events requires extensive knowledge of prior events (e.g. dragging, throttling), then modeling it as streams gives you a coherent way of dealing with a series of events. On the other hand, most of my experience with web programming is dealing with simple user events like clicks, keydown, etc. On the server side, streams may be useful if streaming is central to your use case (e.g. a server streaming live results for a multiplayer game, or streaming video data). In general, though, streams seem to be more of the exception.
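
To make the first two models concrete, here is a minimal Python sketch (my own illustration, not tied to any particular server framework). The handler names and the 50 ms sleep standing in for I/O are assumptions. Both versions serve the same requests: the threaded one keeps the plain synchronous style, while the event-loop one interleaves everything on a single thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def handle_blocking(request_id: int) -> str:
    """Thread-per-request style: plain synchronous code that blocks its thread."""
    time.sleep(0.05)  # stand-in for a blocking I/O call
    return f"response {request_id}"


async def handle_async(request_id: int) -> str:
    """Event-loop style: awaiting yields control so other requests can progress."""
    await asyncio.sleep(0.05)  # stand-in for a non-blocking I/O call
    return f"response {request_id}"


def serve_with_threads(request_ids: list[int]) -> list[str]:
    # One worker thread per in-flight request.
    workers = max(1, len(request_ids))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_blocking, request_ids))


async def _gather_all(request_ids: list[int]) -> list[str]:
    return list(await asyncio.gather(*(handle_async(i) for i in request_ids)))


def serve_with_event_loop(request_ids: list[int]) -> list[str]:
    # A single thread interleaves all requests on one event loop.
    return asyncio.run(_gather_all(request_ids))
```

The tradeoff shows up in the handlers themselves: the blocking handler reads like ordinary code but ties up a thread, while the async handler needs async / await throughout its call chain.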

Idea #2: Testing

There's a huge range of opinions on testing, from "unit testing is a waste of time" to "integration testing is a scam" and everything in between.

What makes a test good?

  • Fidelity - does the test execute actual scenarios? For example, is it easy for your test data to drift from reality over time?
  • Coverage - does the test fail when something incorrect happens? This can be much subtler than it seems. For example, if there are dangling event handlers or leaked resources, will the test continue to pass?
  • Speed - how fast does the test run?
  • Determinism - does the test flake?
  • Maintenance - is it easy to read and update the test? When you see a test failure, how long does it take you to fix it?

Unit tests are good for speed and determinism, but oftentimes fidelity and coverage are not as good.

Integration tests are good for coverage and, depending on how test data is managed, fidelity may be OK. However, they're usually much slower and tend to be much flakier.
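
As a small illustration of the unit-test side of this tradeoff, here is a sketch in Python. The RateLimiter and FakeClock classes are hypothetical, made up for this example. Injecting the clock makes the test fast and deterministic, at the cost of some fidelity, because the fake clock is not the real one.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Hypothetical limiter: allows one call per `interval` seconds."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._clock = clock  # injected so tests can control time
        self._last: Optional[float] = None

    def allow(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


class FakeClock:
    """Test double: time only moves when the test says so."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_rate_limiter() -> None:
    clock = FakeClock()
    limiter = RateLimiter(10.0, clock)
    assert limiter.allow()      # first call is allowed
    assert not limiter.allow()  # no time has passed
    clock.now = 10.0            # "wait" 10 seconds instantly
    assert limiter.allow()
```

An integration-style version would use the real clock and actually wait, which is closer to production behavior but slower and more prone to flaking.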

Idea #3: APIs

My thought process for coming up with good APIs:

  1. Start with concrete use cases from likely users.
  2. Imagine all the places where the API might evolve. This can be very speculative.
  3. Delete everything that won't be used right away and keep your notes on how the API might evolve.

Some random ideas on APIs:

  • Think about the lifetime of entities. Is there a strict 1:1 mapping in the lifetime of two entities? (for example, will users always create and delete them at the same time?) If so, consider merging the two entities together. Perhaps you need a higher-level concept.
  • Imagine the future, but don't contort the present. Think about how things might evolve, where you'll want to add fields, and which fields might become obsolete. But the key is to not let that contort the present API to the point where things are overly complex and don't make sense.
  • When introducing a new entity, think about how it ties in with all the other entities in your system. This 1) makes sure that the new entity is essential and 2) helps explain the usage of this entity to your users.

Idea #4: Functional core, imperative shell

This idea is from Gary Bernhardt and it's been something that I've thought about a lot and tried to use in my projects. Keep the core business logic as functionally pure as possible. Map it as inputs -> outputs without side-effects, asynchronous work, etc.
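
A minimal sketch of what this can look like in Python. The Order type, the 10% member discount, and the JSON file layout are all made up for illustration. The core is a pure function from input to output, and the shell does the I/O around it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    is_member: bool


def total_cents(order: Order) -> int:
    """Functional core: pure, no I/O. The 10% member discount is a made-up rule."""
    discount = order.subtotal_cents // 10 if order.is_member else 0
    return order.subtotal_cents - discount


def process_order_file(in_path: str, out_path: str) -> None:
    """Imperative shell: reads and writes files, delegates decisions to the core."""
    with open(in_path) as f:
        raw = json.load(f)
    order = Order(subtotal_cents=raw["subtotal_cents"], is_member=raw["is_member"])
    with open(out_path, "w") as f:
        json.dump({"total_cents": total_cents(order)}, f)
```

The business rule can be tested by calling total_cents directly, with no files or mocks involved. Only the thin shell needs an integration-style test.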

Idea #5: Make coherent changes

I don't think the goal for changing code is to make as many tiny commits as possible. Doing so 1) makes the rationale for a series of logical changes harder to follow (subjective, sometimes) and 2) creates overhead when rolling back changes.

Small CLs:

  • Good for trivial maintenance changes.
  • Good for incrementally migrating to a new API as part of a large-scale change.

Large CLs:

  • Good for atomically migrating to a new API. When programming in the small, changing atomically is good because 1) you avoid the extra work of supporting two APIs concurrently during the transition and 2) it makes the scope of the change (e.g. is this helping most callsites) clear. At a certain scale, this becomes 1) too risky and 2) logistically impossible (e.g. underlying files change).
  • Good for introducing a new, low-risk component. If there's a feature toggle, or the change definitely won't affect existing components, then the risk of introducing a bug is small.

Idea #6: Indirection

The two main types of indirection I see are:

  1. Functions
  2. Interfaces

Some programmers, like Uncle Bob, advocate for lots of little functions, while others, like Steve McConnell of Code Complete, say that large functions are good (er... never mind).

For me, I think splitting things into small private methods is OK as long as:

  • The function name isn't an exact description of the implementation. This is obvious when the function name is almost as long as the function body itself.
  • The private methods are kept as functionally pure as possible. Relying on instance variables in private methods makes things hard to follow, particularly when private method #1 sets instance variable A and then calls private method #2, which reads instance variable A. Instead, just pass in parameters and make it clear what the contract is between the functions. The exception to this is if there's one instance variable that would otherwise require a lot of plumbing (e.g. unrelated intermediate functions need to know about it).
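
To illustrate the second point, here is a small Python sketch with two hypothetical report builders that produce the same output. In the first, the helpers communicate through instance variables. In the second, each helper's contract is visible in its parameters and return value.

```python
class ReportBuilderImplicit:
    """Harder to follow: helpers communicate through instance variables."""

    def build(self, scores: list[int]) -> str:
        self._scores = scores
        self._compute_average()  # sets self._average as a side effect
        return self._format()    # silently depends on self._average

    def _compute_average(self) -> None:
        self._average = sum(self._scores) / len(self._scores)

    def _format(self) -> str:
        return f"average: {self._average:.1f}"


class ReportBuilderExplicit:
    """Easier to follow: each helper takes what it needs and returns a value."""

    def build(self, scores: list[int]) -> str:
        average = self._compute_average(scores)
        return self._format(average)

    @staticmethod
    def _compute_average(scores: list[int]) -> float:
        return sum(scores) / len(scores)

    @staticmethod
    def _format(average: float) -> str:
        return f"average: {average:.1f}"
```

In the implicit version, calling _format before _compute_average fails at runtime, and nothing in the signatures warns you about the ordering. The explicit version makes that dependency impossible to miss.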

For interfaces, I think it's only worth creating an interface when 1) there are multiple implementations or 2) there will very likely be multiple implementations in the lifetime of the program.

Otherwise, it provides an indirection that makes things unnecessarily confusing. That said, it's still possible to use a concrete class directly but hide as much implementation detail as possible with a thoughtful public API.

Idea #7: Structs vs. Classes

This is a big divide between functional programming and OO programming. FP tends to prefer manipulating struct-like data structures directly, whereas OOP prefers to encapsulate data structures within classes.

In practice, both of these techniques are useful and I think the trade-offs are:

  • Structs are great for internal representation. Manipulating them is simpler as you can create generic utilities.
  • Classes are great for external representation. They allow you to avoid leaking implementation details.
  • Within a component / change boundary, dealing with structs is a productive choice. If you can easily change all the usages of a struct, then it's OK.
  • Across components / change boundary, you want to return classes so you have more flexibility. Accepting structs as config options is OK as your input should be kept fairly simple (and accepting class objects is OK too).
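
Here is a rough Python sketch of that split. Point, translate, and Shape are hypothetical names for this example. Inside the component, a plain dataclass acts as the struct and generic helpers operate on it directly. At the boundary, callers only get a class that hides the representation.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Struct-like: plain data, used freely inside the component."""
    x: float
    y: float


def translate(points: list[Point], dx: float, dy: float) -> list[Point]:
    # Generic utility that works directly on the plain data.
    return [Point(p.x + dx, p.y + dy) for p in points]


class Shape:
    """Class at the component boundary: callers never see the point list."""

    def __init__(self, points: list[Point]) -> None:
        self._points = list(points)  # copy so callers cannot alias internals

    def moved(self, dx: float, dy: float) -> "Shape":
        return Shape(translate(self._points, dx, dy))

    def width(self) -> float:
        xs = [p.x for p in self._points]
        return max(xs) - min(xs)
```

If the internal representation later changes (say, to separate x and y arrays), only this component needs updating, because external callers depend on Shape's methods rather than on the list of points.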


Immutable Objects

Why is immutability great?

  • Avoids sneaky state-related bugs. I recently saw a bug at work where a single array reference was being used by multiple objects, and the mutation of this array led to a very surprising bug.
  • Encourages testability. Immutable objects lead you to write pure functions which are always easier to test than impure functions. Think "pit of success".
  • Concurrency. Much easier to do concurrency with immutable objects.
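
The shared-array bug from the first bullet is easy to reproduce. Below is a minimal Python sketch with hypothetical playlist classes (not the actual code from that bug). The mutable version shares one list between two objects. The frozen version uses a tuple and returns a new object instead of mutating.

```python
from dataclasses import dataclass


class MutablePlaylist:
    def __init__(self, tracks: list[str]) -> None:
        self.tracks = tracks  # stores the caller's list, not a copy


@dataclass(frozen=True)
class FrozenPlaylist:
    tracks: tuple[str, ...]  # tuples cannot be mutated in place

    def with_track(self, track: str) -> "FrozenPlaylist":
        # "Mutation" returns a new object and leaves this one untouched.
        return FrozenPlaylist(self.tracks + (track,))


shared = ["intro"]
a = MutablePlaylist(shared)
b = MutablePlaylist(shared)
a.tracks.append("outro")  # b changes too, which is the surprising part

base = FrozenPlaylist(("intro",))
extended = base.with_track("outro")  # base is unchanged
```

With the frozen version, assigning to base.tracks raises an error, so the aliasing bug cannot be written by accident.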

See Effective Java on how to make classes immutable.

TypeScript - there's a way to almost make classes immutable through the readonly modifier & conditional types.

Boundaries by Gary Bernhardt

FP vs OOP: Choose Two by Brian Goetz


Learning Canvas

Ref:

  • https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Basic_usage
  • Canvas Tutorial Series: https://www.youtube.com/watch?v=8pNzjUjvNsY&t=401s
  • Comparing SVG, canvas, webgl: https://www.linkedin.com/pulse/svg-canvas-webgl-visualization-options-web-sebastian-m%C3%BCller

Libraries:

  • Fabric:
    • http://fabricjs.com/fabric-object-caching
    • https://www.slideshare.net/kangax/fabricjs-building-acanvaslibrarybk
  • http://paperjs.org/
  • https://docs.cornerstonejs.org/concepts/libraries.html
    • https://github.com/cornerstonejs/cornerstone/pull/26/files

Engineering Lessons

  • Focus on the big win. A big win 1) makes the work rewarding, 2) keeps you focused on what's important, and 3) encourages you when you encounter challenges.
  • Think about the evolvability of your APIs. If you make a mistake, which is inevitable sometimes, how hard will it be to fix the mistake? Days, weeks, months or never?
  • Compatibility over purity. It's almost always better to be pragmatic and pick something that's compatible with existing patterns than it is to focus on purity.
  • We are farmers, not builders. Even though we engineers like to think of ourselves as building new features and systems, in reality most work is about maintaining and cultivating existing systems, which inherently involves the messiness of real-world systems.
  • "Plans are worthless, but planning is everything". Dwight Eisenhower said this, and I think it holds true for engineering. Even though designs will always evolve over time, it's still worth thinking about design up-front rather than letting it become a sprawling architecture.
  • API: Principle of least surprise. Even without reading documentation, users of your API should not be surprised by the outcome of your API.
  • API: when in doubt, leave it out. When you're unsure about whether an API is needed or whether the current design is sufficient, it's usually better to do nothing. Doing nothing gives you the flexibility to do the right thing later.
  • API: Don't let implementation details leak out. For example, if you have a database interface, you don't want the underlying implementation to throw a SQL error. Make sure your exceptions match the same level of abstraction (Joshua Bloch).
  • API: You can’t make everyone happy but you shouldn’t make everyone unhappy with your changes.
  • Documentation is critical for widely used APIs. Without excellent documentation, it's highly unlikely anyone will want to use your API, and even if they do, it's even more unlikely they will be able to use it correctly.
  • Implementation: keep it simple silly. When you're unsure whether an abstraction is beneficial, avoid it. Do the simple, obvious thing first like copy and paste code. When you see the pattern emerge, then abstract.
  • Inheritance is not primarily for code re-use. Inheritance is for creating a hierarchy of classes. If you don't want a hierarchy, then use composition instead. This is why people say "favor composition over inheritance". It's not that inheritance is inherently bad, but when you see deep hierarchies with lots of incoherent overrides in some of the chains, then it's a sign that inheritance was misused.
  • Good testing is mostly about testability. There's no trick to writing really great tests if the underlying production code is written in a very non-testable manner.
  • Refactor the part that has debt & is changing. A lot of people do code cleanups for parts with technical debt without assessing whether the refactoring effort is worthwhile. If a component is stable, it's usually not worth refactoring. Don't refactor for the sake of refactoring.
  • Understand the context for a tool. Tools like languages, frameworks, libraries, etc. almost always have a particular context for which they are suitable. Understand your own context and understand which contexts are suitable for the tool.
  • Keep functions pure. This means side-effect free. Don’t mutate global variables, parameters, etc. A pure function has no observable side-effect (e.g. an internal cache might cause a side effect, but this isn't observable from the caller).
  • Don't underestimate how hard change is. No one likes change, particularly when it's small but noticeable and has no obvious improvements from before.
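
As one concrete example from the list above, here is a sketch of the "don't let implementation details leak out" lesson in Python. UserStore and UserStoreError are hypothetical names, and SQLite is only an assumed backend for illustration. The store catches the low-level database error and re-raises one at its own level of abstraction, keeping the original as the cause for debugging.

```python
import sqlite3


class UserStoreError(Exception):
    """Error at the storage abstraction's own level (hypothetical name)."""


class UserStore:
    """Hypothetical store backed by SQLite; callers should not need to know that."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add_user(self, name: str) -> None:
        try:
            with self._conn:  # commits on success, rolls back on error
                self._conn.execute(
                    "INSERT INTO users (name) VALUES (?)", (name,)
                )
        except sqlite3.Error as exc:
            # Translate the low-level error; keep the original as __cause__.
            raise UserStoreError(f"could not add user {name!r}") from exc


def make_store() -> UserStore:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    return UserStore(conn)
```

Callers only ever catch UserStoreError, so swapping SQLite for another backend later would not change their error handling.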