Setting thresholds on method/class sizes seems quite arbitrary and potentially harmful. Splitting a method into n different ones, none of which is called more than once, sets the code up for opportunistic reuse and obscures its true function. It's especially wrong if the code that was split mutates state / isn't purely functional.
Agreed, I find it difficult to read code that's been split up into many small pieces because it gets difficult to keep track of the overall flow. Even with an IDE, the jumping around (and maintaining a mental call-stack) can be quite distracting.
My guideline is basically "as big as it needs to be, without having large pieces of duplicated functionality." If an algorithm requires 100 lines, none of which are repeated or very similar to the others, then so be it.
I wonder what the distribution of function lengths is like in the Linux kernel... there will be longer ones and shorter ones, but a 20-line limit seems absurdly short.
You're not supposed to "flow" through multiple levels of abstraction in your head. That is the fundamental problem, and why we so often think that it's okay to have large functions (you know, to 'keep it all in our heads', or whatever reason is most often touted).
Not only that, but as the OP mentioned, it doesn't work if the methods/functions have side effects, so we're building up opinions on a sloppy and fundamentally broken house of cards.
First off, your methods should not have side effects that aren't obvious from the purpose of the method. And secondly, back to my original point about not having to flow through the code: a function should do one thing, and that's it. When it does, the method name clearly states what it does. Put those two together and you simply "assume" the method does what you expect it to, and "flow" through the logic at a higher level.
I'm not going to go down into some silly "read_input_data" method while "flowing" through the high-level code, because I know what it is supposed to do. And by extension, at some point I will review it in its singular/atomic entirety, or will have already done so. Either read_input_data returns the data, or it loads it into an instance variable.
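A minimal sketch of that style of reading, with hypothetical names echoing the read_input_data example above: the top-level function is a sequence of named steps you can trust at a glance, and you only open a step's body when it becomes relevant.

```cpp
#include <numeric>
#include <vector>

// Hypothetical pipeline: each step does one thing, so the top-level
// function reads as a summary of the logic rather than the details.
std::vector<int> read_input_data() {
    return {1, 2, 3, 4};  // stand-in for real I/O
}

int sum_values(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0);
}

int process() {
    // "Flowing" at a high level: trust each name,
    // review each body separately when needed.
    auto data = read_input_data();
    return sum_values(data);
}
```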
>Setting thresholds on method/class sizes seems quite arbitrary and potentially harmful. Splitting a method into n different ones, none of which is called more than once, sets the code up for opportunistic reuse and obscures its true function.
Sometimes, but in general I don't agree.
If the functions are made private and given appropriate names, then it's really a way of self-documenting code, which helps.
Suppose you have:
some_function() {
    // Calculate interest
    // 100 lines of code

    // Apply tax
    // 120 lines of code

    // Store results
    // 30 lines of code
}
Then the need for those "banner comments" already indicates it might be a good idea to decompose the function into:

some_function() {
    calculate_interest();
    apply_tax();
    store_results();
}
It's not just about reuse, but about documenting what each section of code does, and about limiting interactions between arbitrary sets of variables. Functions create boundaries (assuming they don't make heavy use of global variables, or instance variables when part of a class).
This depends on the language capabilities. In the C/Lisp/ML families of languages at least you can create isolated scopes. It's unfortunately not true of Javascript (and Ruby?).
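A minimal sketch of what those isolated scopes look like in the C family (C++ here, names made up): a bare brace pair gives each section its own scope, so its temporaries can't leak into later sections.

```cpp
int scoped_sections() {
    int result = 0;
    {   // Section 1: its locals are invisible outside these braces.
        int subtotal = 2 + 3;
        result += subtotal;
    }
    {   // Section 2: free to reuse the name 'subtotal' independently.
        int subtotal = 10;
        result += subtotal;
    }
    // 'subtotal' is not in scope here, so sections
    // can't interact with each other by accident.
    return result;
}
```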
In Ruby some people achieve this by wrapping actions in their own class, which does the trick if the classes are kept independent from each other (no shared state, except global vars, which nobody uses in the ruby world).
Where I work, we refer to guidelines like this as code smells: cues that refactoring should be considered, not strict rules that would block a commit.
I disagree. Breaking down a function into several one-time functions that all sit at the same level of abstraction makes code much easier to reason about.
A named one-time-use function has a name that describes what it does. This seems obvious, but bears repeating: instead of having to figure out what a chunk of code does, you can guess from the name and have a series of steps naturally described.
You can get the same effect with a block of code by properly naming variables and adding a single comment line if need be. It's really just a personal preference.
You'd have to have a standard vim config with folding to get it exactly the same.
Also, there's more restriction in that the single-use function only has access to the variables passed into it, which makes it easier to parse, since fewer things could be happening. It serves to prune variables from the local scope.
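A small illustration of that pruning, with hypothetical names: the extracted helper can only touch what it is handed, whereas the same code written inline could have read or clobbered anything in the enclosing function's scope.

```cpp
#include <vector>

// The helper sees only its parameters; nothing else in the caller's
// scope can be read or mutated here, which narrows what a reader
// has to consider.
static double apply_discount(double price, double rate) {
    return price * (1.0 - rate);
}

double total_due(const std::vector<double>& prices, double rate) {
    double total = 0.0;
    for (double p : prices) {
        total += apply_discount(p, rate);  // only 'p' and 'rate' can matter
    }
    return total;
}
```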
I agree with you in principle but find that code editors let me down in that regard.
Say I split a function up into a few sub-functions because it's getting big. Now I have the problem that I'm jumping forwards and backwards through the code when I want to explore what that function does:

SomeFunction() {
    SubfunctionA();
    SubfunctionB();
    SubfunctionC();
}

In this case, SubfunctionC() might end up a few pages down in the source code. So now it's a context switch to go there and then go back.
Now this can be somewhat avoided with good function names (so you don't HAVE to jump backwards and forwards) and keyboard shortcuts (to make the process quicker), but it's still a trade-off that I think needs to be kept in mind.
I think you've hit upon a good measure of function quality. If you find you have to jump around a lot when reading, that would be a sign that it's poorly organized and needs to be refactored. On the other hand, if you find you don't have to jump around and can trust the sub functions by their names, then it's been broken up well.
In the best case you should be able to follow the logic without diving into the other functions, only looking at their implementation details as that particular detail becomes relevant.
From a static point of view this is true; however, splitting the functionality into sub-functions might later obscure commonalities that would have been obvious had they still been part of the parent function.
It all depends on what stage of development you're at. If the piece of code we're talking about is mature and rarely changing, then yes, it seems like a reasonable thing to do. If this section is still under development, however, I would opt for another way of dealing with the readability issue.
I think if what you want is to optimise the readability of the code, then a simple comment above each section, combined with block scoping, is a good setup for splitting.
As the true nature of the code emerges one can decide to turn the block into local lambdas then later into functions.
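In C++ that progression might pass through the middle stage sketched below (hypothetical example): the section starts life as a commented block, graduates to a local lambda once its inputs and outputs are clear, and only later becomes a free function if it earns it.

```cpp
#include <vector>

int describe_stage(const std::vector<int>& xs) {
    // Middle stage of the progression: the "sum the values" section is
    // now a local lambda, so its dependencies are visible at a glance,
    // but it hasn't left the enclosing function yet.
    auto sum_values = [](const std::vector<int>& v) {
        int s = 0;
        for (int x : v) s += x;
        return s;
    };
    return sum_values(xs);
}
```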
Again, that's because of the first point I made in this comment. If you are still developing the functionality, splitting early means that subsequent reviews and development may miss potential interactions, and potential refactorings/simplifications, that would have been quite clear if the function were still "messy".
IME refactoring optimises for a local minimum of code entropy. Any time you want to add new functionality, you're going to have to jump back out into 'mess' to implement the feature, then 'fix' it again with further refactoring, with all the extra overhead that entails.
If you can't understand what UsedToBeSomeMassiveFunction() does from the SubFunctionK() names and arguments, they weren't split out correctly.
Often I find the best way to simplify large functions is to tear out sub-blocks and give them names. Loops or large conditional blocks are usually easy to tear out and can usually get very meaningful names. Long stretches of imperative code, however, do not separate well and probably should remain one very large function.
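A sketch of that kind of tear-out (names hypothetical): the loop body has one obvious job, so it extracts cleanly into a function whose name documents it.

```cpp
#include <string>
#include <vector>

// The loop had one clear job, so it tears out with a name that says
// what it does; a long stretch of unrelated imperative steps would
// not extract this neatly.
static int count_nonempty(const std::vector<std::string>& lines) {
    int n = 0;
    for (const auto& line : lines) {
        if (!line.empty()) ++n;
    }
    return n;
}
```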
I think it's a trade-off we could get away from if we improved our tools. We should make tools that reduce our cognitive overhead. I am regularly diving through code at my new job trying to figure out how everything fits together. Reading code split between various functions and figuring out program flow is a necessary skill, even if you do keep monolithic functions.
I would love an IDE that showed function calls inline in some sane manner, letting me recurse arbitrarily deep, something à la code collapsing and expansion. Extra points if you can show a flow chart to the side of function calls, class interactions and general code flow, highlighting where your cursor is. More points for a pane that shows what the variables have to be in order for the branch of code you're working in to be reached. A changeable granularity setting would be great for all of these.
I think we often discuss pros and cons without discussing what they could be if we put some effort into changing the current situation.
When I have to deal with something like this in emacs I will normally split the window in two so I can look at two different parts of the buffer at the same time. I assume most other editors allow the same?
If you name the methods correctly (after what they do), splitting actually helps the readability of the code.
It's quite hard to read a 500-line method and remember what was at the beginning of it.
I.e. it can help you understand the intent and flow of the code, but it can also make debugging harder. If I'm in a situation where I want to follow the exact instructions a large fraction of the code performs, a call stack that jumps all over the place can be pretty frustrating.
It's always a balance between "The Blob" and the "Poltergeist" anti-pattern, when in doubt go more towards borderline "The Blob".
I'd just say, though, that it's very seldom that a huge class/method is justified. Usually there's plenty of self-contained logic, or reusable methods, that can be broken out either into utility libraries or a separate class.
My rule of thumb is that you should be able to look at a method/class and understand what it does, and that diminishes quickly over a certain size.
Sure, some logic might be in other classes, but that's another layer of abstraction.
I agree that arbitrary limits can be a bit restrictive at times, but that is mostly an issue when dealing with automated code checking tools.
However, in this case it is more about something for the reviewer to keep an eye out for: if the function is large and complicated, it may be a good idea to take a closer look at it. Presumably the reviewer won't force the function to be broken into smaller pieces unless that actually improves the readability and maintainability of the code.
It's extremely subjective, though. For the first few years after starting, I thought that large functions were bad and should be split up unless there's a clear reason not to.
Now, I feel that it should be the other way. Most of the time you want to keep the code that does something together, splitting it up only to factor out common reusable pieces (that are actually reused).
I mean, some functions/methods will just be longer than usual. Even if you split one into multiple smaller ones, the end product is much the same, since some of those derivative functions/methods are exclusive children of the caller. Sometimes it's best to leave the big one alone, and sometimes it helps to break it down; either way you end up with basically the same size of codebase, and I don't know that it's more readable. This does shine some light on other issues, though, such as language verbosity. Some languages are just syntax factories: all the logic comes accompanied by a truckload of verbosity.
I agree with your point. It seems silly to be strict about such things. More so if the codebase is written in Java or any other verbose language.
Yeah, thank you for the link! I use this list as a set of rough guidelines, basically, so it may or may not apply in all cases.
Sometimes it can be helpful to split up a method to be called once if it helps readability, for example if it helps split up a loop or abstracts the specific way we get a value.
see http://number-none.com/blow/john_carmack_on_inlined_code.htm...