What Is Yak-shaving?

A miserable little pile of secrets.

May 15, 2024

Yak-shaving is a reasonably common term in the old tech world, but like many of the old wisdoms, it has been used and misused, and largely failed to make the jump into games.

The term had existed for some time when it was popularized by Seth Godin with his 2005 blog post “Don’t Shave That Yak”, in which he described a series of increasingly bizarre tasks that diverge further and further from the task you’re actually trying to complete.

“I want to wax the car today.”

“Oops, the hose is still broken from the winter. I’ll need to buy a new one at Home Depot.”

“But Home Depot is on the other side of the Tappan Zee bridge and getting there without my EZPass is miserable because of the tolls.”

“But, wait! I could borrow my neighbor’s EZPass…”

“Bob won’t lend me his EZPass until I return the mooshi pillow my son borrowed, though.”

“And we haven’t returned it because some of the stuffing fell out and we need to get some yak hair to restuff it.”

And the next thing you know, you’re at the zoo, shaving a yak, all so you can wax your car.

I prefer a shorter definition.

“When the prerequisites for a task recursively generate more prerequisites.”

Illustration by David Revoy of the webcomic “Pepper & Carrot”

Yak-shaving occurs for one singular reason.

Lack of pre/intra-task scoping evaluation

Yak-shaving can occur when the depth of the recursion required for the original task is unknown. In practice, the first sign of yak-shaving is often having to go back after sprint planning to report “actually this task is much more complex than I first thought, and now I have to do [OTHER TASK] in order to start”. If this happens more than once in the chain of tasks spawned by the original task, you are probably yak-shaving.

The root cause of yak-shaving is a lack of understanding of the chain of dependencies involved, and those dependencies not being raised during grooming or planning when your task was prioritized and scheduled.
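
To make "depth of the recursion" concrete, here is a minimal sketch that encodes the car-waxing story as a map from each task to its direct prerequisites. The names `PREREQS`, `prerequisite_depth`, `unplanned_prerequisites`, and `looks_like_yak_shaving` are all hypothetical, and the sketch assumes the prerequisites form a simple chain or tree with no cycles. The "more than once" threshold mirrors the rule of thumb above and is a heuristic, not a formal test.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the car-waxing story: task -> direct prerequisites.
PREREQS: Dict[str, List[str]] = {
    "wax the car": ["buy a hose"],
    "buy a hose": ["borrow an EZPass"],
    "borrow an EZPass": ["return the pillow"],
    "return the pillow": ["restuff the pillow"],
    "restuff the pillow": ["shave a yak"],
    "shave a yak": [],
}


def prerequisite_depth(task: str, prereqs: Dict[str, List[str]]) -> int:
    """Length of the longest prerequisite chain below `task` (assumes no cycles)."""
    children = prereqs.get(task, [])
    if not children:
        return 0
    return 1 + max(prerequisite_depth(child, prereqs) for child in children)


def unplanned_prerequisites(
    task: str, prereqs: Dict[str, List[str]], planned: Set[str]
) -> List[str]:
    """Every prerequisite below `task` that nobody raised during planning."""
    found: List[str] = []
    for child in prereqs.get(task, []):
        if child not in planned:
            found.append(child)
        found.extend(unplanned_prerequisites(child, prereqs, planned))
    return found


def looks_like_yak_shaving(
    task: str, prereqs: Dict[str, List[str]], planned: Set[str]
) -> bool:
    """Heuristic: more than one surprise prerequisite in the chain."""
    return len(unplanned_prerequisites(task, prereqs, planned)) > 1


if __name__ == "__main__":
    planned = {"wax the car"}  # the only task anyone scoped up front
    print(prerequisite_depth("wax the car", PREREQS))
    print(unplanned_prerequisites("wax the car", PREREQS, planned))
    print(looks_like_yak_shaving("wax the car", PREREQS, planned))
```

If every task in the chain had been raised during planning, the same check would come back false, even though the chain is just as deep. The depth is not the problem; the surprise is.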

What should happen:

A need for a particular task is raised.
A subject-matter expert flags that the task has prerequisites that prevent the work from being completed.
An optimized chain of work is created, with each task listed as a blocker or sub-task of its parent.
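
The last step, turning known blockers into an ordered chain of work, can be sketched with Python's standard-library `graphlib` module (Python 3.9+). The task names and the `work_order` helper are hypothetical, and most issue trackers express the same idea with "blocked by" links rather than code.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical tasks: each key maps to the tasks that block it.
BLOCKED_BY: Dict[str, List[str]] = {
    "ship feature": ["update API"],
    "update API": ["migrate schema"],
    "migrate schema": [],
}


def work_order(blocked_by: Dict[str, List[str]]) -> List[str]:
    """Return the tasks in an order where every blocker comes before its parent."""
    return list(TopologicalSorter(blocked_by).static_order())


if __name__ == "__main__":
    for step, task in enumerate(work_order(BLOCKED_BY), start=1):
        print(step, task)
```

If two tasks block each other, `static_order()` raises `graphlib.CycleError`, which is a useful signal in itself that the chain needs a human to look at it.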

What actually happens?

A Real World Example

This is a paraphrasing of a real yak-shaving incident that I was the sole participant in.

I wanted to distribute an app in the Mac app store.
So I needed to sign my code. (known)
So I applied for a code signing certificate from Comodo. (known)
Comodo needed to validate my business entity, which they wouldn’t do because that entity was not locally registered. Everything past this point was unknown work outside of scope.
So I needed to gather materials for personal validation instead.
I had changed my name when I got married, and the name on my locally-verifiable identity had not been updated yet, so I needed to go to the DMV to get a new driver’s license.
The DMV would not accept my name change certificate because it was issued out-of-state, and because as an immigrant, my “foundational document” was my green card.
My green card had also not been updated with my new name, and in order to do that, I needed an FBI background check.
In order to get an FBI background check, I needed to get my biometrics (fingerprints) verified to make sure I wasn’t a criminal, so I had to go to a third-party fingerprinting business.
To park outside the fingerprinting business, I needed to use a particular app, but that app was rejecting my credit card information, so I had to call my bank, and it turned out my card had been flagged for fraud and blocked.

And that’s how I ended up trawling my transaction history to validate my purchases, so I could unblock my credit card, so I could pay for parking, so I could get my fingerprints taken, so I could get a background check, so I could get a new green card, so I could get a new driver’s license, so I could validate my identity, so I could get a code signing certificate, so I could publish a completed app to the app store.

The Real Truth About Yak-shaving

Yak-shaving, at its core, is the recursive resolution of technical debt. That debt can occur in many forms. It can be as simple as this: the process you’re attempting has not been done before, and you are unaware of the dependencies of what you’re attempting to do.

Since we can’t simply wish our way into complete knowledge of the world and our code-bases, instead we must have strategies to manage yak-shaving. In practice, this means identifying when you are experiencing task prerequisite recursion as early as possible.

In my example, if I’d understood the depth of the recursion, I would have just registered a local business entity the second that Comodo told me I needed one. It would have been faster and easier. Instead, I went down a rabbit hole that recursively opened up further rabbit holes, and the rest of those tasks could have been completed at my leisure without blocking app publication.

Yak-avoidant scoping opens up new possibilities for how to achieve your goals. You might be able to assign multiple engineers to work on parts of the chain concurrently, allowing you to still deliver on time. You might discover that it isn’t worth the amount of time you’d have to spend fixing the system, and instead make the call to rewrite it or replace it.
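
Once the chain is known, working out which parts can run concurrently is mechanical. Here is a rough sketch using `graphlib.TopologicalSorter` from the Python standard library: it groups tasks into batches where everything in a batch is unblocked at the same time. The task names and the `parallel_batches` helper are hypothetical, and the sketch ignores how long each task takes or who is qualified to do it.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical tasks: each key maps to the tasks that block it.
BLOCKED_BY: Dict[str, List[str]] = {
    "ship feature": ["update UI", "update API"],
    "update API": ["migrate schema"],
    "update UI": [],
    "migrate schema": [],
}


def parallel_batches(blocked_by: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into batches; tasks within a batch do not block each other."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything currently unblocked
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unblocking its parents
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(BLOCKED_BY), start=1):
        print(number, batch)
```

The number of batches is the minimum number of sequential steps, and the widest batch is the most engineers you could usefully put on the chain at once.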

What About When The Chain Is Unavoidable?

When I was working on Dauntless, I needed to add a new type of Hunt to a player menu. Three weeks later, I was tearing the codebase apart, pulling out 30+ references to a kind of daily/weekly challenge system that had not been used in years, because a chain of dependencies in the software necessitated decommissioning that system in order to update three more levels of system that all needed changes to allow for an alternate data source. It took six times longer than we’d budgeted, and it was pure technical debt. This was unavoidable, but it shouldn’t have been unknown. The work was “punted over the fence” to me by the design team, who simply wanted to be able to selectively show or hide new types of Hunts.

What I should have done was push back on the original task and set up an investigation task to correctly scope the work. If I’d taken the time to evaluate it properly, I would have contacted the original UI engineer who built the menu system, since I was unfamiliar with the systems involved, and they could have correctly scoped that work.

What I actually did was just start work on it, and sequentially run headfirst into the limitations of each of the four layers of systems that constructed the menus.

The fix is the same: Correctly scope the work so you know what you’re signing up for, and so you can correctly set your stakeholder expectations.

Don’t Shave That Yak

We’ve discussed technical debt before in the context of rapid prototyping. Technical debt should be resolved at the point at which the outcome of resolving it justifies the work involved.

These four statements will help you identify if you are yak-shaving:

The work must depend on tasks that were not known during scoping.
Attempting those tasks must generate additional tasks.
The work may not be delivered in the expected timeframe.
The work may not be the optimal path to resolution of the initial task.

Using these criteria during retros will assist you in identifying yak-shaving incidents, and once you’ve done that, you can start adjusting your workflow to prevent them.
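
For teams that like to keep retro notes in a structured form, the four statements translate into a small checklist. This is a sketch only: the `RetroItem` fields and the helper names are hypothetical, and it reads the two "must" statements as required and the two "may" statements as supporting signals.

```python
from dataclasses import dataclass


@dataclass
class RetroItem:
    """One piece of work under review in a retro (hypothetical structure)."""

    name: str
    had_unknown_prerequisites: bool   # statement 1: must be true
    prerequisites_spawned_more: bool  # statement 2: must be true
    missed_expected_timeframe: bool   # statement 3: may be true
    took_suboptimal_path: bool        # statement 4: may be true


def is_yak_shaving(item: RetroItem) -> bool:
    """Both 'must' statements have to hold for the item to count."""
    return item.had_unknown_prerequisites and item.prerequisites_spawned_more


def supporting_signals(item: RetroItem) -> int:
    """How many of the two 'may' statements also apply (0, 1, or 2)."""
    return int(item.missed_expected_timeframe) + int(item.took_suboptimal_path)


if __name__ == "__main__":
    item = RetroItem(
        name="example task",
        had_unknown_prerequisites=True,
        prerequisites_spawned_more=True,
        missed_expected_timeframe=True,
        took_suboptimal_path=False,
    )
    print(item.name, is_yak_shaving(item), supporting_signals(item))
```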

How Do We Avoid This, Organizationally?

A short, but non-exhaustive list:

Design your codebase in a way that takes knowability into account.
- Remember, if you have a live game and a junior cannot fix a bug in your code at 3am without you, you are signing yourself up to be called at 3am.
Publish high level dependency charts of how your game fits together.
- Don’t forget to include historical context of how and why particular systems and arrangements came to be. This can be valuable and is often missed.
Think through the entire process of a task and what it touches before you start to generate a list of dependencies.
- What systems in the codebase does the thing I’m touching rely on?
- Are there things outside the game code this depends on? Kubernetes, services, and deployment environments often get forgotten here.
Be aware when you are making assumptions about how a system works.
Have your processes allow for flagging of those assumptions for validation.
Keep an eye out for tasks that are recursing, and flag them early.

—

Stay tuned for next time, when we’ll discuss my favorite red flag:

“We only hire seniors”.

// for those we have lost
// for those we can yet save

Load-bearing Tomato
