Let’s talk about line length limits in programming. Common values are 80, 100, and 120 characters. What is the correct limit on characters per line (CPL)?
This is part of the Zen of Coding series.
Various big-name industry players publish their formatting guidelines for different programming languages. Let’s start by just looking at what industry does.
Let’s review the various arguments that come into play one at a time.
I can’t necessarily bring myself to stand behind the argument that 80 characters should be adopted simply because it is the de facto standard. I do think it’s true that within a single project, standards should be consistent. I also think that in a particular team or organization, there should be a generally followed standard to prevent friction for developers who work on multiple projects.
That said, I don’t think the fact that 80 characters is the most commonly used is sufficient reason alone to adopt the standard. When most organizations that adopt it recursively cite “tradition” as the backing reason, that is an example of argumentum ad populum. The friction of going between organizations with standards that differ by ~20 characters is both infrequent and minor enough to be negligible in my opinion.
I think this Stack Overflow answer summarizes the situation best: the primary value of consistent formatting guidelines like line limits is not that one limit is necessarily better, but that they cut down on time-wasting discussion of inconsequential formatting rules. The true evil is not an 80-, 100-, or 120-character limit, but rather making a big deal of it and wasting developer time.
Linus Torvalds said as much publicly nearly 13 years ago. More recently, he also pointed out more than a few concrete reasons why “tradition” and “convention” are not valid reasons to maintain the 80-character limit. This is coming from the guy who still sticks to ANSI C, so I’m inclined to trust that his judgement isn’t premature.
In other words, consistency means picking a reasonable limit and sticking to it for as long as it’s reasonable. It does not, however, say that a particular limit is better than the others.
To add a bit of my own thoughts: soft limits are more effective than hard limits because they enforce the standard while eliminating the need for human intervention in minor exceptional cases. Let me give a concrete example:
This doesn’t mean hard limits shouldn’t exist, but they should:
Otherwise, you have undermined one of the major benefits of adopting consistency in the first place. It should always be the responsibility of tooling to complain about formatting problems when it actually matters.
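To make the soft/hard distinction concrete, here is a minimal sketch of the kind of check tooling could run. The names `check_line_lengths` and `main`, and the limits of 100 and 120, are my own assumptions for illustration and are not taken from any particular linter.

```python
SOFT_LIMIT = 100  # assumed value: exceeding it only produces a warning
HARD_LIMIT = 120  # assumed value: exceeding it fails the check


def check_line_lengths(text, soft=SOFT_LIMIT, hard=HARD_LIMIT):
    """Return (warnings, errors) as lists of (line_number, length) tuples."""
    warnings, errors = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        length = len(line)  # counts code points, not on-screen display width
        if length > hard:
            errors.append((number, length))
        elif length > soft:
            warnings.append((number, length))
    return warnings, errors


def main(paths):
    """Report every over-long line; return 1 only if a hard limit was broken."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            warnings, errors = check_line_lengths(handle.read())
        for number, length in warnings:
            print(f"{path}:{number}: warning: {length} > {SOFT_LIMIT} characters")
        for number, length in errors:
            print(f"{path}:{number}: error: {length} > {HARD_LIMIT} characters")
            failed = True
    return 1 if failed else 0
```

A wrapper script could pass the return value of `main(sys.argv[1:])` to `sys.exit`, so that only hard-limit violations fail a build while soft-limit violations merely show up in the log.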
The most important thing I can note here is that studies have shown again and again that perceived readability at various line lengths (that is, reader preference) is consistently at odds with actual reading speed and comprehension.
The bottom line is that longer lines are read faster, regardless of how the readers feel about it (empirically verified up to 95 CPL).
What does negatively impact readability is the need to scroll horizontally, so the line limit should be low enough to prevent it. In the specific situation of needing to read two columns of code side by side on a 4:3 monitor, 80 characters is the maximum limit. People who read three columns of code at once, use narrow monitors, or use exceptionally large fonts are in the very small minority. In most other situations, 100 characters is sufficiently restrictive to prevent horizontal scrolling.
Setting the line limit too low will also negatively impact readability. This is easy to see if you imagine a CPL limit of 40, but it can also happen at a limit of 80 or even 100. The most common negative outcome of limiting line length is that variable names get shortened and comments get truncated, rendering the purpose of the code less obvious. Every study shows this is detrimental to readability.
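The arithmetic behind "how many columns fit without scrolling" is simple enough to sketch. The function `max_cpl` and the one-cell `gutter` between panes are hypothetical names and values of my own; real editors also spend width on line numbers, minimaps, and scroll bars, which this ignores.

```python
import shutil


def max_cpl(total_columns, panes, gutter=1):
    """Widest line limit that lets `panes` files sit side by side unscrolled.

    total_columns: width of the editor or terminal, in character cells.
    gutter: assumed number of cells lost to each divider between panes.
    """
    if panes < 1:
        raise ValueError("panes must be at least 1")
    usable = total_columns - gutter * (panes - 1)
    return max(usable // panes, 0)


# Width of the current terminal; falls back to 80 columns if it cannot be detected.
columns = shutil.get_terminal_size(fallback=(80, 24)).columns
for panes in (1, 2, 3):
    print(f"{panes} pane(s): up to {max_cpl(columns, panes)} characters per line")
```

Under these assumptions, a window 161 cells wide fits exactly two 80-character panes, which is the kind of setup where an 80-character limit is the binding constraint.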
There are some language-specific elements that affect this argument as well. For example, in Java, named identifiers tend to be longer, and as such, may benefit from a longer line length limit. This is exactly what Google does, which is an admission that excessively wrapped lines harm readability.
In other languages, such as Python, newlines have semantic meaning. PEP 8 states clearly that backslash-escaped newlines should be avoided when implied line continuation (inside parentheses, brackets, or braces) is possible, which is an admission that such additional semantic characters harm readability.
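For readers who have not run into the two continuation styles, here is a small sketch of the difference. The variable names and values are made up for illustration.

```python
# Made-up values, purely for illustration.
unit_price, quantity, shipping_cost, discount = 19.99, 3, 4.50, 5.00

# Backslash continuation: a stray space after the backslash is a syntax error,
# and nothing visually groups the expression.
total_with_backslash = unit_price * quantity \
    + shipping_cost \
    - discount

# Implied continuation: the open parenthesis keeps the expression going,
# so no escape characters are needed.
total_with_parentheses = (
    unit_price * quantity
    + shipping_cost
    - discount
)

assert total_with_backslash == total_with_parentheses
```

Both forms compute the same value; the second one is the style PEP 8 prefers when a line has to be wrapped.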
Finally, to help concretely illustrate how low CPL limits can negatively impact readability, here is an example of a non-contrived (but admittedly selected) section of Python code both before and after limiting to 80 characters.
Set a soft limit of 80/100 characters, create a CI/CD job that auto-wraps lines longer than 100/120 characters, and never, ever have a human discussion about it again. If there is disagreement, send them this article. If they still disagree strongly, then it might be worth a discussion.
This is part of a series where I dig up debated software engineering concepts and try to home in on an answer while sharing my research. You can find the whole series at the Zen of Coding.
Published September 26th, 2022
© Aaron James 2022