> DRY does NOT lead to over-complicating things.
That is not true. I dive into foreign code bases a lot, and DRYness is actually a significant complicating factor in understanding code, because you're constantly jumping around (physically, to different files, or just a few screens away in the same file). This is inherent to every use of it, not just to situations where it's applied in a complicated way.
This sounds dumb, but it simply is much harder to keep context about what's going on when you can't refer back to the code because it isn't on the same screen, or one short scroll above or below where you currently are.
That obviously doesn't mean you should leave copy-pasted versions of the same code all over your code base. But it's important to recognize that refactoring that code into something common that gets called from multiple places is not something you get for free: it's an active trade-off, one you usually make to prevent bugs (changing one code location and not the other) or plain code bloat. In practice this matters most when you suspect something might be repeated in the future but you're not sure. IMO: just don't factor it out; leave it there, in place, in the code.
`make_pizza(["pepperoni"])`
What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.
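To make the complaint concrete, here is one hypothetical body that `make_pizza()` could be hiding; everything in it is invented for illustration, and nothing at the call site distinguishes it from a pure four-line helper:

```python
oven_log = []  # module-level state: a side effect you can't see from the name

def make_pizza(toppings):
    pizza = {"base": "thin", "toppings": list(toppings), "baked": False}
    oven_log.append(pizza)  # hidden side effect behind an innocent-looking call
    pizza["baked"] = True
    return pizza

pizza = make_pizza(["pepperoni"])
```

Whether this mutates global state or not, the call `make_pizza(["pepperoni"])` reads exactly the same, which is the point being made.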
I think the article fails to show particularly problematic examples of DRY. E.g. merging two roughly similar functions and adding a conditional for the non-shared code paths. *shudders*
This is not a problem of DRY. This is a problem of wrong abstraction and naming. If the function is just four lines, it could easily be named `make_and_cook_pizza`. In the alternative scenario where those four lines are copy pasted all over the place, one is never sure if they are exactly the same or have little tweaks in one instance or the other. Therefore, one has to be careful of the details, which is much harder than navigating to function definition, because in this case you cannot navigate to other instances of the code.
The code had test coverage, but the test confirmed that it produced the wrong result. I had to fix the test too.
however... real software doesn't work like this. the abstractions that work that way exist for a select few very well understood problems where a consensus has developed long before you're looking at any code.
math libraries would be a typical example. you really don't need to know how two matrices are multiplied if you know the sort of black box properties of a matrix.
but the functions, classes, and other DRY abstractions that you encounter constantly in everyday code, even when they are functionally well abstracted (meaning they do an isolated job and their inputs and outputs are well defined), and even for simple problems, are typically complex enough that learning their abstract properties can take the same level of difficulty and time investment as learning the implementation itself, on top of practical factors like a lack of documentation.
this is also why DRYness as a complicating factor really doesn't factor in once the abstracted code does something so complex that there is no way you could even attempt to understand it in a reasonable amount of time. like implementing a complex algorithm, or simply just doing something that touches too many lines of code. in this case you are left to study the abstract properties of that function or module anyways.
```
def make_string_filename(s):
    # four lines of regex and replace magic
```
so that we have code like
```
file_src = make_string_filename(object_name)
file_dst = make_string_filename(object_name_2)
```
which is much more understandable than eight lines of regex magic where you don't even know what the regex is doing. The problem of not knowing what it does, or whether it has side effects, is more a problem of naming and documentation than of DRY. Even then, it's still better than repeating the code all over, simply because once you've read and understood the function, you don't need to go back. If the code is repeated all over, you need to read it again each time just to recognize that it's the same piece of code.
additionally, the function should be stateless and have no side effects ;)
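For concreteness, a minimal sketch of what such a helper could look like; the actual sanitization rules here are my assumption, not something from the thread:

```python
import re

def make_string_filename(s):
    """Stateless, side-effect-free filename sanitizer (rules assumed)."""
    s = s.strip().lower()
    s = re.sub(r"[^\w.-]+", "_", s)  # replace any run of unsafe characters with "_"
    s = re.sub(r"_+", "_", s)        # collapse repeated underscores
    return s.strip("_")

file_src = make_string_filename("My Report.txt")  # "my_report.txt"
```

The point stands regardless of the exact rules: the name tells the reader what the regex magic is for, without the reader parsing it.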
```
def make_string_filename(s, style="new"):
    # 2 lines of shared magic
    if style == "old":
        # 2 lines of original magic
    elif style == "new":
        # different 2 lines of magic
```
When you get here, two totally separate `make_string_filenames()`, each private to the area of code they're relevant to, would be better.
I’ve seen this again and again in the field and I wholeheartedly agree with the sentiment in the OP. IMHO different code paths should only share code if there is good reason to believe that the code will be identical forever.
So now you've dumped it down to an interface with a default implementation which calls the create_dough, add_toppings, bake_pizza interfaces in order, each of which are either passed in callbacks or discovered through reflection.
We can even sprinkle in some custom DSL to "abstract away" common step like putting the product into the oven correctly!
Juniors will never understand when, why, and what is effectively executed at runtime. Honestly, at this point I enjoy working with this kind of code. It has such high entertainment value, and I get paid by the hour, so whatever.
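A caricature of that style in code; every name here is invented for illustration:

```python
class PizzaMaker:
    """Over-abstracted on purpose: every step is a pluggable callback."""

    def __init__(self, create_dough=None, add_toppings=None, bake_pizza=None):
        # "Reflection": fall back to looking each step up on the instance by name.
        self.steps = [
            create_dough or getattr(self, "create_dough"),
            add_toppings or getattr(self, "add_toppings"),
            bake_pizza or getattr(self, "bake_pizza"),
        ]

    def create_dough(self, state):
        state["dough"] = "thin"
        return state

    def add_toppings(self, state):
        state.setdefault("toppings", []).append("cheese")
        return state

    def bake_pizza(self, state):
        state["baked"] = True
        return state

    def make(self):
        state = {}
        for step in self.steps:
            state = step(state)  # what actually runs depends on wiring, not on this file
        return state

result = PizzaMaker().make()
```

To know what `make()` does at runtime you have to trace how the instance was wired up, which is exactly the complaint.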
https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
Quote follows:
----
The strength of the reaction made me realize just how widespread and intractable the "wrong abstraction" problem is. I started asking questions and came to see the following pattern:
1. Programmer A sees duplication.
2. Programmer A extracts duplication and gives it a name. This creates a new abstraction. It could be a new method, or perhaps even a new class.
3. Programmer A replaces the duplication with the new abstraction. Ah, the code is perfect. Programmer A trots happily away.
4. Time passes.
5. A new requirement appears for which the current abstraction is almost perfect.
6. Programmer B gets tasked to implement this requirement. Programmer B feels honor-bound to retain the existing abstraction, but since it isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter. What was once a universal abstraction now behaves differently for different cases.
7. Another new requirement arrives. Programmer X. Another additional parameter. Another new conditional. Loop until code becomes incomprehensible.
8. You appear in the story about here, and your life takes a dramatic turn for the worse.
Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tells us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."
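Steps 5 through 7 tend to look like this in code (a hypothetical example, not from the post):

```python
# Step 2: Programmer A extracts the duplication into one clean function.
def format_greeting_v1(name):
    return f"Hello, {name}!"

# Steps 5-7: each "almost perfect" requirement adds a parameter and a branch.
def format_greeting(name, formal=False, locale="en", shout=False):
    if locale == "fr":
        text = f"Bonjour, {name}!"
    elif formal:
        text = f"Dear {name},"
    else:
        text = f"Hello, {name}!"
    if shout:
        text = text.upper()
    return text  # the "universal" abstraction now behaves differently per caller
```

Each individual parameter looked cheaper than breaking the abstraction; the sum is a function whose behavior you can only know by reading every branch.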
A functional style certainly helps. I get the pizza in my hand and don’t have to worry that anyone left the oven on.
You can't, unless it's in a standard library or a core dependency used by millions of people.
That's one of the reasons why functional code is generally easier to read. A lambda defined a few lines above whatever you're reading gives you the implementation details right there while still abstracting away duplicate code. It's the best of both worlds. People whose idea of "functional programming" is to import 30 external functions into a file and compose them into an abstract algorithm somewhere other than where they're defined write code that's just as shitty and unreadable as most Java code.
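A small Python illustration of the point; `taxed` and the numbers are made up, but the shape is the same: the deduplicated logic is visible right where it's used:

```python
orders = [("pizza", 12.0), ("salad", 7.5), ("pizza", 12.0)]

# The shared logic is a lambda defined a few lines above its use;
# no jump to another file to learn what "taxed" means.
taxed = lambda price: round(price * 1.08, 2)

total = sum(taxed(price) for _, price in orders)
```

The repetition is still factored out, but the reader never loses the context of the screen they're on.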
`makePizza :: PizzaType -> [Topping] -> IO Pizza`
Seems to carry all that information by just accepting a PizzaType symbol and a list of toppings, `IO` communicating the side effect.
Not a problem of DRY, but bad code structure.
Just keep the two functions and pull the shared code-path out
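Sticking with the earlier `make_string_filename` example, that structure might look like this; the split between shared and variant steps is my assumption:

```python
import re

def _sanitize(s):
    # the genuinely shared code path, pulled out once
    return re.sub(r"[^\w.-]+", "_", s.strip())

def make_filename_old(s):
    # legacy variant: keeps the original case
    return _sanitize(s)

def make_filename_new(s):
    # new variant: also lowercases
    return _sanitize(s).lower()
```

Each public function stays trivially readable, and the shared helper carries no `style` flag that every caller has to reason about.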
In these cases, factoring the code out may or may not be a good idea.
What makes it easier to understand a system is simplicity. I'd argue that DRY, deployed with a right strategic plan, usually does more to simplify things than does copy-paste.
But DRY is just a tool; to be useful, it requires some skill.
DRY only hits when you indeed repeat something.
If you factor out code for potential reuse that you don't know for certain will happen, it's premature optimization.
Abstractions have non-zero complexity costs.
And
Repeated code has non-zero complexity costs.
Why is this a hard concept?
It doesn't make DRY any less valid.
Generally you can invoke both reasons to do something but the underlying reasoning is always complexity.
You can't just slam the DRYness knob to 11 and expect it to always be better, any more than you can turn a reflow oven up to 900°C and expect it to be better, just because 380°C is better, for the specific PCB in question, than 250°C.
It also doesn't mean you can turn it off entirely, just as if you look at your charred results at 900°C you don't conclude that "heaters considered harmful".
Also, the problem is strongly multivariate and the many variables are not independent so the "right" setting for the DRYness knob is not necessarily the same depending on all sorts of things, technical and not, up to and including "what are we even trying to achieve?"
I couldn't agree more. Also, "code reuse" makes debugging significantly harder when you're trying to reverse engineer a code base. The breakpoints or printfs get triggered by other code paths, and you need to traverse stack frames to get a clue about what is going on.
Extra bonus points for fancy reflection so that you have no clue what is going on.
If you make everything as generic and reusable as possible from the beginning, you'll end up with messy code that has way too many options to set for every simple operation.
Increasing the distance between inputs and outputs increases complexity.
Reusable code isn't all that reusable when nobody understands it, or when things are so fragmented that people can't figure out how to operate the code base.
This isn't a rule. It's a moderation thing.
To note, a common effect of not DRYing functions is an increase in local code length.
In many code bases that have lived long enough, that means screens and screens of functions inside the module/class files. It is still easier to navigate than jumping between many files, but not by that much in practice (back/forward keyboard shortcuts go a long way toward alleviating this kind of pain).
Are you still using a VT100?