Code Cop: code smell

4 July 2024

Encapsulation vs Business Rules

Business Men (licensed CC BY-NC-ND by Andreina Schoeberlein)

No Naked Primitives is a Coderetreat constraint which trains our object orientation skills. No primitive values, e.g. booleans, numbers or strings, must be visible at object boundaries, i.e. public methods. Arrays and other containers like lists or hash-tables are primitives, too. I love this constraint, as it pushes people right out of their comfort zone. ;-) (I wrote about No Naked Primitives in combination with other constraints and included it in the expert level Brutal Coding Constraints.)

Value Objects
The usual designs to avoid naked primitives are Value Objects and First Class Collections. Value Objects, by design, expose the values they wrap with a getter because some other objects will want to use these values. What happens if I go extreme and do not allow any primitives at object boundaries? (Of course this is crazy, a clear case of Primitive Obsession Obsession. Still when Coderetreat facilitators get together to practice, things end up like that.) Let us take the Game of Life as an example. (If you do not know the Game of Life, read the description and implement it right away.) In the game, for evolving a generation, I need to count the living neighbours of each cell. The number of living neighbours is an integer and its value object in C# could look like

public class NeighbourCount {
    private int count;
    
    public NeighbourCount(int count) {
        this.count = count;
    }

    // ... code to manage the count

}

Now any code which depends on the data (i.e. the count) will have to be moved into the value object to be able to access the data. Following the rules of the game, if there are two or three living neighbours, a living cell lives on. The method ApplyRulesOnLivingCell implements this rule.

public class NeighbourCount {

    // ...

    public GridSpace ApplyRulesOnLivingCell() {
        if (this.count == 2 || this.count == 3) {
            return new AliveCell();   
        }
        return new EmptySpace();
    }
}

public interface GridSpace {}
public class EmptySpace : GridSpace {}
public class AliveCell : GridSpace {}

Grouping the data (count) together with the logic which uses or modifies it (ApplyRulesOnLivingCell) is a core principle of object orientation. Furthermore, all data is strongly encapsulated.

Polymorphism
The next method ApplyRulesOnEmptySpace is similar. The decision which of the two methods to call depends on the state of the cell, which is either alive or dead/non existing. This boolean state has to be encapsulated inside a class, e.g. class GridSpace. This class behaves differently for the values of the boolean state, which makes the boolean a simple type code. The object oriented way to work with type codes is to use polymorphism:

public interface GridSpace {
    // Interface members are public by default; an explicit modifier is rejected before C# 8.
    GridSpace ApplyRulesWith(NeighbourCount count);
}
public class AliveCell : GridSpace {
    public GridSpace ApplyRulesWith(NeighbourCount count) {
        return count.ApplyRulesOnLivingCell();
    }
}
public class EmptySpace : GridSpace {
    public GridSpace ApplyRulesWith(NeighbourCount count) {
        return count.ApplyRulesOnEmptySpace();
    }
}

The code looks weird and it is not my usual implementation of the game's rules. It has an issue: The rules of the game are distributed among three classes. This is Shotgun Surgery - when a single change is made to multiple classes simultaneously: If I need to change the rules, or even want to read and understand the logic of cell evolution, I have to go to three places.
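To see the three places side by side, here is a minimal Python sketch of the same design. The snake_case names are my own translation of the C# methods, and apply_rules_on_empty_space is an assumption: the method is not shown above, so the sketch takes it to encode the standard rule that an empty space comes alive with exactly three living neighbours.

```python
class NeighbourCount:
    """Value object wrapping the number of living neighbours."""

    def __init__(self, count):
        self._count = count

    def apply_rules_on_living_cell(self):
        # Place 1: a living cell survives with two or three living neighbours.
        if self._count in (2, 3):
            return AliveCell()
        return EmptySpace()

    def apply_rules_on_empty_space(self):
        # Place 1 again (assumed): an empty space comes alive with exactly three.
        if self._count == 3:
            return AliveCell()
        return EmptySpace()


class AliveCell:
    def apply_rules_with(self, count):
        # Place 2: the cell type decides which half of the rules applies.
        return count.apply_rules_on_living_cell()


class EmptySpace:
    def apply_rules_with(self, count):
        # Place 3: the other half of the dispatch.
        return count.apply_rules_on_empty_space()
```

No primitive crosses a public method apart from the constructor argument, but reading the complete rule means visiting NeighbourCount, AliveCell and EmptySpace.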

Shot (licensed CC BY-NC-ND by Bart Maguire)

Business Rules
On the other hand, a basic implementation of the rules using primitives (e.g. in Ruby because polyglot programming is cool),

def alive_in_next_generation(alive, living_neighbours)
  (alive and living_neighbours == 2) or 
  living_neighbours == 3
end

is a single boolean expression and easy to understand. The game's rules - the business rules - are boolean expressions describing certain situations, which "the business" needs to act on. Typical examples of such situations are when an item is out of stock or a client qualifies for a discount. Business related conditions are called policies. (And there are predicates, which are boolean expressions, too. These have their origins in formal logic.) Boolean expressions are functional in nature. So a functional design, i.e. functions operating on primitive data, could be more appropriate. Even in object oriented design there are use cases for objects containing only logic and no (mutable) data.
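One benefit of such a pure function is that the whole policy can be checked by enumeration. The following is a small sketch, a Python port of the Ruby function above. The range of 0 to 8 neighbours is my assumption for the usual square grid.

```python
def alive_in_next_generation(alive, living_neighbours):
    """Pure policy: the complete rule set as one boolean expression."""
    return (alive and living_neighbours == 2) or living_neighbours == 3


# Enumerate every (state, neighbour count) pair that is alive in the next generation.
survivors = [
    (alive, n)
    for alive in (True, False)
    for n in range(9)  # assumption: square grid, so 0 to 8 neighbours
    if alive_in_next_generation(alive, n)
]
print(survivors)  # prints [(True, 2), (True, 3), (False, 3)]
```

The three pairs are exactly the rules of the game: survival with two or three neighbours, birth with three.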

Conclusion
What is the point of my discussion? In the case of Game of Life, there is a tension between keeping data and its logic together versus keeping related logic together. This is particularly true for boolean expressions and code depending on them, as boolean values usually end up in conditionals. I like to keep "decisions" and the logic depending on them close together but I want to keep my business rules in one place even more. I am wondering if this is true for most design situations, besides Game of Life. Boolean logic is interesting because it allows variation in the automation. Code without any booleans is still useful, e.g. pure calculations or uniform transformations in a pipeline style of operations.

Taking it further?
While boolean is a primitive, it is different from other primitives. What happens if I do not allow any primitives besides boolean at object boundaries? The data of class NeighbourCount would still be encapsulated when I add relevant queries (in Python because I love programming languages):

class NeighbourCount:

  def __init__(self, count):
    self._count = count

  # ... code to manage the count

  def isTwoOrThree(self):
    return self._count == 2 or self._count == 3

  def isThree(self):
    return self._count == 3

Using these small methods, I get a concise implementation of the rules,

class Rules:
  def cellInNextGeneration(self, cell, count):
    if (cell.isAlive() and count.isTwoOrThree()) or count.isThree():
      return AliveCell()
    return EmptySpace()

Is this better? I am not sure. At least the (business) rules of the Game of Life are in one place now. They could be replaced with different rules if needed, making the design extensible. At the same time different rules would most likely require different queries in NeighbourCount. For example in Hex Life, I need a weighted sum of first and second tier living neighbours to decide the state of the next generation. This is not possible without adding new queries to NeighbourCount. The Open Closed Principle is not satisfied. (Then maybe Hex Life is too much of a change for any design to "survive".) My rules logic keeps calling into the encapsulated value object repeatedly, which looks much like Feature Envy. I feel like I am going in circles here ;-)
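The extensibility problem already shows with a much smaller change than Hex Life. As a sketch, take HighLife, a variant in which an empty space also comes alive with six living neighbours (B36/S23). I swap it in here only because it is simpler than Hex Life. The class HighLifeRules and the query isSix are hypothetical names of mine, and the isAlive methods are assumed, as they are used but not shown above.

```python
class AliveCell:
    def isAlive(self):  # assumed query, used by the rules but not shown above
        return True


class EmptySpace:
    def isAlive(self):
        return False


class NeighbourCount:
    def __init__(self, count):
        self._count = count

    def isTwoOrThree(self):
        return self._count == 2 or self._count == 3

    def isThree(self):
        return self._count == 3

    def isSix(self):
        # Hypothetical new query: needed only because HighLifeRules exists.
        return self._count == 6


class Rules:
    def cellInNextGeneration(self, cell, count):
        if (cell.isAlive() and count.isTwoOrThree()) or count.isThree():
            return AliveCell()
        return EmptySpace()


class HighLifeRules:
    # Hypothetical replacement rules for HighLife (B36/S23).
    def cellInNextGeneration(self, cell, count):
        survives = cell.isAlive() and count.isTwoOrThree()
        born = not cell.isAlive() and (count.isThree() or count.isSix())
        if survives or born:
            return AliveCell()
        return EmptySpace()
```

The rules object can be swapped without touching its callers, but the swap still forced a change to NeighbourCount, which is the Open Closed violation described above.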

15 October 2017

Introducing Code Smells into Code

Code Smells
Code smells are hints that show you potential problems in your code. They are heuristics: Like in real life, if something smells, look at it, think about it, and change it if necessary. In the classic book Refactoring - Improving the Design of Existing Code, Martin Fowler describes 21 code smells like Long Method, Primitive Obsession, Switch Statements, Feature Envy and other anti-patterns that indicate deeper problems in your code.

The Brutal Refactoring Game
Adrian Bolboaca came up with the Brutal Refactoring Coding Game. He explained the history of the game and the game itself on his blog. I attended his workshop at the XP conference 2013 and experienced the game first hand.

In the game participants are asked to write the cleanest code possible. If the facilitator spots any code smell, participants must stop and immediately remove it. Adding functionality is forbidden until the facilitator agrees that the smell has been removed. In his workshop, Adi gave us a numbered list of smells and gave cards with the appropriate number to pairs where he saw a code smell. And he was mercilessly flagging the smallest problems in our code. ;-)

Code Smells Used in the Game
Adi chose these code and test smells for his game:

Lack of tests
Name not from domain
Name not expressing intent
Unnecessary if
Unnecessary else
Duplication of constant
Method does more than one thing
Primitive obsession
Feature envy
Method too long (has more than six lines)
Too many parameters (more than three parameters)
Test is not unitary
Test setup too complex
Test has an unclear Act
Test has more than one assert
Test has no assert
Test has too many paths

Adi told me that he chose these smells because he saw them most often in his clients' code bases. His list definitely misses duplication, deeply nested conditionals and some more. A more complete list might contain 30 items, making it more difficult and potentially frustrating for participants. (Maybe I will come up with the Moar Brutal Refactoring Game in the future...)

Observations during the Brutal Refactoring Game
This article is not about the Brutal Refactoring Game, but about code smells introduced into code. The game allows observing how and when code smells are introduced (because the whole point is to spot and remove them). As part of my refactoring training I facilitated the game more than ten times. Each session took three to five hours and had six to eight participants. The teams were average teams with several senior developers and an occasional junior developer. People worked in pairs and implemented Tic-Tac-Toe. Most teams used Java, two teams used C.

Discussion of Introduced Code Smells
Here are the code smell statistics:

Code Smells Introduced

The chart shows the number of problems I flagged during the last ten games. The different colours of the bars show the different teams. Obviously not all smells are introduced equally often. The first smells appear 10 to 15 minutes into the exercise. One team using C had difficulties with the setup and was going forward very slowly - they produced little code and very few smells.

The first smell I usually see is 1 - Lack of tests. Even people following the TDD cycle happen to create "more production code than is sufficient to pass the test." This happens in the beginning and also later during the game.

Naming is hard. Not surprisingly the most common smells (number two - Name not from domain - and number three - Name not expressing intent) are naming related. Naming things after the problem domain seems twice as hard as pure technical naming. Any non-trivial method could be named process or execute, but that does not help in understanding the code at all.

Primitive Obsession (number eight) is the most common single code smell I have seen during the game. It is introduced early in development when method signatures are created and APIs are designed. It occurs roughly as often as the naming related smells together. Most Tic-Tac-Toe implementations are (publicly) based on numbers, pairs of numbers, arrays of numbers or the like. Primitive Obsession is very dominant in many (Java) code bases. In my code reviews I am used to method argument lists like String, String, String, String, String, int, long, long etc. Instead of using all these primitive values, they should be wrapped and should not be visible at object boundaries. (I have written more about primitives in the past.) This is an object oriented design smell.
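As an illustration of such wrapping, here is a minimal Python sketch for Tic-Tac-Toe. Mark, Position and Board are hypothetical names of mine, not taken from any participant's code. The sketch still lets integers into the Position constructor and a boolean out of is_taken, so it is less strict than No Naked Primitives.

```python
from dataclasses import dataclass
from enum import Enum


class Mark(Enum):
    """A player's mark, instead of a naked string or number."""
    X = "X"
    O = "O"


@dataclass(frozen=True)
class Position:
    """A board coordinate, instead of a pair of naked integers."""
    row: int
    column: int

    def __post_init__(self):
        # The value object guards its own validity once, for all callers.
        if not (0 <= self.row < 3 and 0 <= self.column < 3):
            raise ValueError("position is outside the 3x3 board")


class Board:
    def __init__(self):
        # Frozen dataclasses are hashable, so positions can be dictionary keys.
        self._marks = {}

    def place(self, mark: Mark, position: Position) -> None:
        if position in self._marks:
            raise ValueError("position is already taken")
        self._marks[position] = mark

    def is_taken(self, position: Position) -> bool:
        return position in self._marks
```

A signature like place(mark, position) cannot be called with swapped row and column arguments, which is one of the mistakes a list of plain int parameters invites.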

The third most often flagged code smell is long methods (number ten). This smell is introduced later, when logic is added to existing methods. I see this smell more in the second part of the game. Even when using TDD, this smell is introduced if the refactoring phase is skipped or taken lightly. Long methods are also very common in legacy code bases and difficult to understand or change. Everyone hates these 1000-line methods, still I find them in every (large) code base I look at.

Code Smells Categories
To conclude this analysis let us have a look at problem categories. I aggregated Adi's 17 code smells into four groups:

Problems in test code
Naming related smells
Missing object orientation
Complexity

It seems that unit testing is the smallest problem - which is definitely not true. Most teams I work with have no automated (unit) tests for their production code. Maybe there were fewer testing issues during the game because the teams had learned about testing smells before. I practice refactoring with my teams after we have worked through all of unit testing.

Initially I was surprised to see missing object orientation high on the list. Now, after writing about it, I think it is also related to my "coaching/learning plan". After refactoring I go for naming and finally object orientation. (Maybe the order of topics is wrong, but unit testing is easily sold to management and refactoring is asked for by developers often, making both topics ideal to start an improvement initiative.) I do not expect fewer naming problems, even after a few sessions on naming, because - as discussed before - naming is hard. I would expect the object orientation of the solutions to improve.

Samir Talwar wrote about his experience with the game. As facilitator he had a different focus, e.g. he was more strict about unnecessary if, treating it more like a No If constraint. He also saw different code smells being introduced. (I recommend reading his summary.) We both agree that naming is hard and causes many problems.

Comparison of Team Performance
While the participants were industry average - maybe even above - they were in need of improving. (Who is not?) The following bar chart shows the number of code smells introduced into the code by each team. (To compare the teams I removed the one with setup problems.) Some teams ran the exercise twice. Some of them improved, some did not.

Code Smell Categories by Team

On average, each team introduced 17 issues into their code base, right from the beginning of a small project, during a few hours of work. I am sure they tried hard because I was watching them, still this result is very disappointing. I am scared of the massive amount of code smells lurking in real world projects.

There is a noticeable difference between individual teams. Some teams created only half as many smells as other ones. Better teams introduced fewer code smells, creating less technical debt.

Conclusion
Adi claims that you can have legacy code after 15 minutes. It is true. In a short time, the teams introduced many code smells into their code. The most common smells were bad names and Primitive Obsession. Different smells were introduced during different development activities. Some teams introduced fewer smells than others.

We need to focus on code smells. Noticing smells in our code is an important skill which can be trained. A good place to start practising are refactoring code katas like Emily Bache's Tennis Game and Yatzy. (Both exercises are available in many programming languages.) "Listening to code smells" improves our design. Finally I want to encourage you to watch out for primitive values on object boundaries as Primitive Obsession seems to be the most common problem in object oriented code.

Final disclaimer: The game is not a scientific experiment and does not represent our industry as a whole. Only a few teams participated and the results are biased. Nevertheless I wanted to share the results.
