Problem Domain Naming

Code ultimately has to run on a computer, but most code is designed to solve other real-world problems. The space surrounding that problem is called the “problem domain.” How we go about solving those problems with software is the “solution domain.”

Many computer programmers grew up enthralled with computer science and thus intimately familiar with all sorts of computer terminology. They tend to be less familiar (at least initially) with the domains of the real-world problems their software is trying to solve.

Crafting names in the problem domain, rather than the solution domain, has many benefits that we’ll discuss. However, this is something many people struggle with, understandably so when writing code within a new, unfamiliar problem domain. Rather than settling for limited understanding, strive to improve your problem domain knowledge. When thinking about naming variables in the problem domain, it can be helpful to ask questions like "What is the problem domain for this particular piece of code?" If you can't answer that question, it may be worth seeking out further knowledge before settling on a variable name.

Benefits of Problem Domain Naming

To be effective, computer programmers have to mentally span levels of detail all the way from the operation of computer hardware to the real-world business problems being solved by their code. Edsger Dijkstra described this span as a baffling nine orders of magnitude and indicated that human minds are not capable of intellectually bridging a range that large. Thus, techniques that help human programmers minimize their mental burden are immensely valuable.

When code is trying to solve real-world problems, you want to be able to “read” about those real-world problems in the code. That is important for understanding the software and for verifying that it actually does what’s intended for real-world users. Extraneous concepts or terms get in the way and increase the mental burden on readers, because there is more to process and filter out.

Therefore, as much as possible, strive to have variable names use terminology that exists within the problem domain your code is operating in. Understand the terms familiar to users. Using such terms in your variable names will lower the mental burden for making sure your code solves the problems of real-world users.

Naming in the problem domain also has benefits beyond the programming team. Problem domain variable names can invite collaboration with less-technical audiences involved in a software project, such as testers, customer support, project managers, and business staff. Collaborating with consistent terminology can improve communication and make it easier to adapt code as the details of the real-world problem change.

Problems with Solution Domain Naming

The main problem with solution domain naming is that it introduces extra concepts - typically complex concepts of lower-level computing details - into code. Those extra concepts impose extra mental burden, getting in the way of reading code to understand the real-world problems it is attempting to solve. When that extra mental burden exists, it makes it much more difficult to ensure code is reliably and correctly solving the real-world problems it is supposed to.

Solution vs. Problem Domain Names

A common problem is including programming language concepts, data structures, or type information in variable names. Err on the side of leaving such solution domain information out of variable names unless absolutely needed for sufficient specificity. “Computerish” terms tend to be a hint that a name is focused more on the solution domain than the problem domain.

When such additional solution domain terms are included in variable names, they tend to communicate more of how the problem is being solved (at a computer level). While not always the case, terms like "list", "dictionary", etc. in variable names typically refer to the particular programming data structure that happens to have been chosen to implement a solution in code ("how" some problem domain concept is represented in the code).

Such names can also be more fragile. If the underlying data structure needs to change to some different collection type, for example, those variable names can instantly become inaccurate unless extra effort is put in to update them. Time spent on such efforts is usually best spent elsewhere.
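A minimal sketch of that fragility follows. The attendee names and variables are hypothetical, chosen only for illustration:

```python
# Hypothetical example: attendee names, originally stored in a list.
attendee_list = ["Ana", "Ben", "Ana"]

# Later, duplicates need to be dropped, so the collection becomes a set.
# The name still says "list", so it is inaccurate until someone renames it.
attendee_list = set(attendee_list)

# A problem domain name makes no claim about the data structure,
# so the same change leaves the name accurate.
attendees = ["Ana", "Ben", "Ana"]
attendees = set(attendees)
```

In the second case, the change of collection type requires no renaming at all.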

Instead, focus on making your names more problem domain oriented. This is done by focusing more on the what of the problem domain, rather than how you may be choosing to solve some problem in code. For example, you can make collections plural (name_list -> names) or communicate problem domain information for the keys/values in a dictionary (age_dictionary -> ages_by_name, which also has the advantage of enabling reading for correctness by spotting mismatches in keys/values).
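The renames above might look like this in Python. The sample data and the `oldest_name` helper are assumptions added for illustration:

```python
# Solution domain names: they describe the data structures being used.
name_list = ["Ana", "Ben"]
age_dictionary = {"Ana": 34, "Ben": 29}

# Problem domain names: they describe what the data means.
names = ["Ana", "Ben"]
ages_by_name = {"Ana": 34, "Ben": 29}


def oldest_name(ages_by_name):
    """Return the name of the person with the greatest age."""
    return max(ages_by_name, key=ages_by_name.get)


# Reading for correctness: looking up by a name matches "by_name".
ana_age = ages_by_name["Ana"]

# Looking up by an age would contradict the variable name at a glance:
# ages_by_name[34]
```

Because the name states what the keys and values are, a mismatched lookup stands out during review without having to trace where the dictionary was built.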

Remember That Code Exists in Levels of Abstraction

Obviously, code has to run on a computer eventually, so there will be times when code needs to leverage computing concepts to actually work and solve a problem on a computer. When this happens, it can seem like there is a conflict between avoiding solution domain “computerish” terms (in favor of problem domain naming) and having sufficiently clear and accurate variable names.

In reality, there is no conflict.

All names in computer code are abstractions, and any non-trivial piece of software will be divided into multiple levels of abstraction. At the top level, you’ll have code that is fundamentally focused on solving the user’s real-world problems. At the bottom level, you’ll have code that needs to interoperate with the actual computer. Between those levels, there may be multiple other levels of abstraction.

To minimize mental burden, what you want is a gradual descent from one level of abstraction to another. As you move from a top level focused on solving the core real-world problem, you may start interacting with various lower-level classes that represent core objects in the problem domain. As you go lower, those lower-level classes may be implemented using computer science data structures. As you go even lower, those computer science data structures will have to be implemented using lower-level primitives a given programming/computing environment provides.

The way to think about this is that as you descend through levels of abstraction, the problem domain for each level may change. For example, referencing “bytes” in a name may be wholly inappropriate at the top level of a shopping cart app. But at lower levels where data structures directly manipulate bytes for high performance, referencing “bytes” in a name would be appropriate. The problem you’re solving at that level of abstraction is high-performance manipulation of bytes, not a user shopping.
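The sketch below illustrates the idea with two levels of a hypothetical shopping cart app. The `ShoppingCart` class, the 4-byte big-endian price encoding, and all names are assumptions made for this example, not a recommended design:

```python
from dataclasses import dataclass, field


# Lower level: the problem being solved here is packing prices into bytes,
# so "bytes" is a problem domain term at this level.
def price_to_bytes(price_cents: int) -> bytes:
    """Encode a price in cents as 4 big-endian bytes."""
    return price_cents.to_bytes(4, byteorder="big")


def price_from_bytes(price_bytes: bytes) -> int:
    """Decode a price in cents from big-endian bytes."""
    return int.from_bytes(price_bytes, byteorder="big")


# Top level: the problem being solved is a user shopping,
# so the names speak of carts, items, and prices only.
@dataclass
class ShoppingCart:
    prices_by_item: dict = field(default_factory=dict)

    def add_item(self, item: str, price_cents: int) -> None:
        self.prices_by_item[item] = price_cents

    def total_cents(self) -> int:
        return sum(self.prices_by_item.values())


cart = ShoppingCart()
cart.add_item("notebook", 450)
cart.add_item("pen", 120)
order_total_cents = cart.total_cents()
```

Each level uses the vocabulary of the problem it solves. The cart code never mentions bytes, and the encoding functions never mention shopping.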

Of course, if your software is focused on solving a “computer problem” (like many software development tools), using various “computer” terms may be appropriate, because that is your problem domain.

As you program at different levels of abstraction, make sure you’re naming variables at the right level. Intermixing too many different levels of abstraction in the same body of code at once can make naming more difficult and complicate understanding.

Tips for Problem Domain Naming

The following sections give more specific tips on naming variables in the problem domain:
