Code Cop: refactoring

Showing posts with label refactoring. Show all posts

21 November 2019

Promotion Service Kata

In September I attended a small, club-like unconference. The umbrella topic of the event was katas and their use in teaching and technical coaching. A kata, or code kata, is defined as an exercise in programming which helps hone your skills through practice and repetition. We spent two days creating, practising and reviewing different exercises. I came home with a load of new challenges for my clients.

Kata Factory
One session, run by Bastien David, a software crafter from Grenoble, was named Kata Factory. Bastien guided us to create a short exercise with a very small code base, focused on a single topic. In the first part of the session we created small tasks working in pairs. Then we solved a task from another pair in the second part. A total of four new coding exercises was created, tried and refined. It was awesome.

Promotion Service Kata
I worked with Dmitry Kandalov and we created the Promotion Service Kata. It is a small refactoring exercise, based on Feature Envy, a code smell listed in Martin Fowler's book. (Did you know that there is a second edition of this great book? No, so get it quickly.) The code base contains a single service, the promotion service, which calculates discounts for promoted items. It is a bit crazy because it also reduces the tax. The data is stored in a classic DTO and its fields are not encapsulated. The task is to make it a rich object and encapsulate its fields. There are existing unit tests to make sure things are still working.

After the Kata Factory, I spent some time on porting the kata to different languages. Currently the code is available in C#, Java, Kotlin, PHP and Python. Pull requests porting the code to other languages are very welcome. Check out the code here.

Promotion Service Retrospective

Notes from first run
I already facilitated the exercise with a small team of C# developers. Here is what they said about the kata:

It is a good exercise.
It is a short exercise. It is small, so there is no need for context.
Encapsulate all the things!
I learned to separate concerns.
I learned about string.Format (a C# specific function).
I did not know the goal of the exercise.
Maybe rename the Persist() method to Save().
The Item class should be in its own file.

Conclusion
Bastien's approach shows that it is possible to create a brand new and highly focused coding exercise in a short time. As with most development related things, pair work is superior and it is easy to come up with new code katas when working in pairs. Small exercises - I call them micro exercises - are easy to get started because there is little context to know. Context is part of what makes coding assignments difficult. I am very happy with this new exercise.

Give it a try!

11 November 2018

Widespread Architectural Changes using IDEs

This is the second part of my list of tools to automate Widespread Architectural Changes. Now Widespread Architectural Change is a category of architectural refactoring which do not change the overall structure but need to be applied in many places consistently across a whole code base. For example, consider changing or unifying coding conventions or fixing violations. The main challenge of these changes is the high number of occurrences.

The goal of this article is to introduce you to ways to automate such changes. I have been using some of these techniques because as Code Cop I value code consistency, still this is a raw list. I have only used a few of the mentioned tools and need to explore many in more detail.

In the first part I listed some basic options, e.g. how to support manual changes with fast navigation, search and replace across the whole file system, use scripted search and replace using regular expressions, Macros and finally JetBrains' Structural Search and Replace. This time I focus on options available for modern IDEs. Tools like Eclipse or IntelliJ use an internal representation of the code, the Abstract Syntax Tree (AST). Structural Search and Replace works the AST and therefore belongs into this group as well.

Rescripter (Eclipse)
Modern IDEs support many Refactoring steps - why not use the existing tools to make our job easier? One of such tools is Rescripter, built for making large-scale changes that you can describe easily but are laborious to do by hand. Rescripter is an Eclipse plugin which runs JavaScript in the context of the Eclipse JDT. David Green added some helper objects to make searching, parsing and modifying Java code easier and wrapped a thin JavaScript layer around it. Rescripter scripts are working the AST.

Here are two examples from Rescripter's documentation: Finding a method named getName inside the class Person:

var type = Find.typeByName("Person");
var method = Find.methodByName(type, "getName").getElementName();

and adding a method getJobTitle to that type:

var edit = new SourceChange(type.getCompilationUnit());
edit.addEdit(ChangeType.addMethod(type, "\n\tpublic String getJobTitle() {\n" +
                                        "\t\treturn this.jobTitle;\n" +
                                        "\t}"));
edit.apply();

Rescripter is an old project and has not been updated since 2011. While it has some demo code and documentation, it is a bit raw. I guess it is not compatible with newer versions of Eclipse. Still - like Structural Search and Replace - it is very powerful. If you have a large code base and a repetitive change that can be expressed in terms of the AST, the high effort to get into the tool and create a script will pay off. (A useful plugin to help you working the AST is the AST View. The AST View is part of the JDT but not installed out of the box. It is visualising the AST of a Java file open in the editor.)

Jackpot (NetBeans)
Another, very interesting tool in this area is the Jackpot DSL for NetBeans used in the IDE actions Inspect and Transform aka Inspect and Refactor. Jackpot is a NetBeans IDE module for modelling, querying and transforming Java source files. In the Jackpot context, a program transformation is a script or Java class which queries sources in Java projects for patterns, changes them, and then writes these changes back to the original source files. [...] Jackpot transformations tend to be applied globally, such as at a project level or across several projects. Win! Unfortunately there is very little information about it. It took me hours just to find a little bit of information about it:

Dustin Marx describes how to create NetBeans 7.1 custom hints. He agrees that the most difficult aspect of using NetBeans's custom hints is finding documentation on how to use them. The best sources currently available appear to be the NetBeans 7.1 Release Notes, several Wielenga posts (Custom Declarative Hints in NetBeans IDE 7.1, Oh No Vector!, Oh No @Override!), and Jan Lahoda's jackpot30 Rules Language (covers the rules language syntax used by the custom inspections/hints). The Refactoring with Inspect and Transform in the NetBeans IDE Java Editor tutorial also includes a section on managing custom hints. All listed documentation is from 2011/2012. The most recent one I found is a short Jackpot demo from JavaOne Brazil 2016.

NetBeans hints are warnings that have a quick fix. For example the hint "Can Use Diamond" finds places where the diamond operator of Java 7 can be used instead of explicit type parameters. When the offered action is taken, the code is migrated. In the Inspect and Transform dialogue, the inspections can be managed. Custom hints are stored in .hint files. For example, Dustin Marx' hint to remove extraneous calls to System.currentTimeMillis() from new java.util.Date constructor is written as

<!description="Unnecessary use of System.currentTimeMillis on new Date">

new java.util.Date(java.lang.System.currentTimeMillis())
=> new java.util.Date()
;;

The Java Declarative Hints Format allows matching on variables, modifiers and statements. Fixes can be applied conditionally too.

I do not know if Jackpot hints can be applied across a whole code base at once. As they are inspections, I expect them to be displayed for the whole project - making them fast navigation markers at least. Anyway this is very exciting. It is so exciting that some people wanted to port it to Eclipse (but they never did).

Using Specific Refactoring Plugins
There are a few specific plugins that combine various Refactoring. Eclipse's Source Clean Up Action deserves a honorary mention: It fixes basic warnings on a whole code base, but is very limited. An interesting plugin for Eclipse is Autorefactor which aims to fix language/API usage to deliver smaller, more maintainable and more expressive code bases. Spartan Refactoring is another Eclipse plugin that performs automatic refactoring of Java source code, making it shorter, more idiomatic and more readable.

All these plugins change a predefined, limited set of code patterns, mainly focusing on technical debt. Maybe someone implemented a refactoring plugin for part of the widespread change you need to perform. Search the market places first.

Using Refactoring APIs of IDEs
A possible way is to write your own refactoring plugin. Maybe start with code of a refactoring plugin listed above. Both Eclipse and IntelliJ IDEA offer APIs to manipulating Java code, i.e. the JDT and PSI APIs. I have not done that because it seems too much effort for one time migrations and widespread changes. And reusing the available refactoring tools might raise some problems like waiting for user input which is problematic.

Using (APIs of) Refactoring Browsers
The Refactoring Browser was the first tool that automated Refactoring for Smalltalk. It set the standard for all modern Refactoring tools. Today we have Refactoring Browsers for many languages, e.g. CScout - the C Refactoring Browser, Ruby Refactoring Browser for Emacs or PHP Refactoring Browser which is controlled via the command-line and has plugins for Vim and Emacs. Stand-alone Refactoring tools should be easier to use and script than full blown IDEs, especially if they were designed for different plugins. The idea would be to create code which controls Refactoring Browsers to apply certain changes - again and again.

Python Refactoring Libraries
While searching for Refactoring Browsers I just learnt that there are at least three stand-alone Python Refactoring libraries: Rope comes with Vim and Emacs plugins and is also used in VS Code. The second is Bicycle Repair Man. Further there is Bowler which supposedly is better suited for use from the command-line and encourages scripting. All these libraries are rich in features and used by Vim users. (Yeah, Vim is still going strong.)

Rope (Python)
For example, Rope can be used as a library, which allows custom refactoring steps. Even more, it offers something similar to Jackpot's hints, Restructurings. For example, we split a method f(self, p1, p2) of a class mod.A into f1(self, p1) and f2(self, p2). The following Restructuring updates all call sites:

pattern: ${inst}.f(${p1}, ${p2})
goal:
 ${inst}.f1(${p1})
 ${inst}.f2(${p2})

args:
 inst: type=mod.A

The code to perform the Restructuring using Rope as a library is

from rope.base.project import Project
from rope.refactor import restructure

project = Project('.')

pattern = '${inst}.f(${p1}, ${p2})'
goal = '...'
args = '...'

restructuring = restructure.Restructure(project, pattern, goal, args)

project.do(restructuring.get_changes())

I never used that but it looks cool. The promise of scripted refactoring makes me excited. Another item to add to my to-research list. ;-) And there are more tools on my list, which will have to wait for part 3.

31 October 2018

More on Architectural Refactoring

Last month I showed my categorisation of Architectural Refactoring to some friends in the Software Crafting community. Explaining my ideas and getting feedback helped me to sharpen the definitions further. In my original article I defined four groups of Architectural Refactoring. Here is what we discussed about each one of them:

1. Substitute Architectural Decision
In this category I collected Refactoring techniques which replace whole parts of an architecture, e.g.

Replacing one relational database system with another
Replacing the relational database with a NoSQL one
Replacing UI technologies, e.g. moving from JSF to Vaadin

Adrian Bolboaca further split these categories into groups and used the following three names:

Kinds of Architectural Refactoring (C) SoftDevGang 2018

Kinds of Architectural Refactoring (C) SoftDevGang 2018

1.1. Tool Change
A Tool Change affects the implementation of a "tool", e.g. the database system. The first change in the previous list - replacing one relational database system with another - is a Tool Change. Upgrading libraries or migrating between similar frameworks is a Tool Change too. Initially I had listed them as Widespread Architectural Change. While upgrading or migrating libraries might require similar changes throughout the whole project, they are motivated by the Architectural Decision to use a certain version of a library.

1.2. Technology Change
A Technology Change modifies a used technology. Items two and three of the previous list are Technology Changes. In my experience, most Architectural Refactoring is of this type.

1.3. (Other) NFR Caused Change
Changing a tool or technology is never done for its own sake. We might upgrade a framework for the improved performance of its new version, or switch to another data store because it is more reliable or its license is more suited for the business. NFRs (Non Functional Requirements) are requirements that specify how a system is supposed to be in contrast to functional requirements which define what a system is supposed to do. NFRs usually contain long lists of attributes describing a system like performance, reliability and also security, traceability and globalisation. Examples of such changes are changing aspects like logging or security or applying internationalization. Again I had initially listed them as Widespread Architectural Change.

2. Refactor Architectural Structure
The second category deals with architectural building blocks like components, layers or packages. There are two causes for changing these:

RSAR Package Tangle found at an unnamed client

2.1. Cleaning up Legacy Code
When we start a new project, its architecture is clear. When the system becomes more complex, effects known as Software Architecture Erosion or Architectural Drifts happen. These effects can lead to systems where the realisation of the system diverges from the intended architecture with resulting negative impacts on quality attributes associated with the intended architecture. Uncle Bob uses other words: The architecture starts to rot like a piece of bad meat.

For example look at the package tangle on the right: Several teams were working on a shared code base under an aggressive schedule and produced an architecture where each package was depending on all others, resulting in the legendary Big Ball of Mud architecture. To clean up this mess components with clear interfaces had to be introduced.

2.2. Evolving an Architecture
Today we rather embrace evolving requirements than stick to a big up-front plan. Techniques like Evolutionary Design accommodate the need of changing the design while systems are growing. And also their architectures need to evolve, e.g. we need to extract a new component or aspect which became duplicated during addition of structure to support new features. With the rising popularity of Microservices and their focus on incremental change in the architecture, Evolutionary Architecture has become a reality.

3. Widespread Architectural Change
I am mostly interested in these changes because I want to automate them. I described some basic tools and some more powerful ones. Most examples of changes I used were motivated in the change of Architectural Decisions. Even coding conventions can be seen as part of the NFR Maintainability. There is definitely an overlap, as some architectural Refactoring and also a few structural ones, will be widespread. Adi drew me a Venn diagram to explain that overlap:

Overlap of Categories of Architectural Refactoring, by Adrian Bolboaca

Overlap of Categories of Architectural Refactoring, by Adrian Bolboaca

4. Other Architectural Changes
I am still uncertain if there are other kinds of architectural changes. I miss examples. If you have some, please come forward!

29 August 2018

Widespread Architectural Change Part 1

Last month I introduced Four Categories of Architectural Refactoring:

Substitute Architectural Decision
Refactor Architectural Structure
Widespread Architectural Change
Other Architectural Changes

Today I want to start discussing options to perform Widespread Architectural Changes. This category contains all kind of small changes repeated throughout the whole code base. The number of these changes is high, being hundreds or sometimes even thousands of occurrences that have to be changed in a similar way. Examples of such modifications are:

Migrating or upgrading (similar) APIs, frameworks or libraries
Changing or unifying coding conventions
Fixing compiler warnings and removing technical debt
Consistently applying or changing aspects like logging or security
Applying internationalization
Migrate between languages, e.g. SQL vs. HQL

Ways to do Widespread Architectural Changes (C) by SoftDevGang 2016

All of them do not change the overall structure, they just work on the implementation. Often substituting architectural decisions or altering architectural structure contain similar changes.

Options
I have been using some techniques since 2004 because as Code Cop I value code consistency. Two years ago I had the opportunity to discuss the topic with Alexandru Bolboaca and other experienced developers during a small unconference and we came up with even more options, as shown in the picture on the right. The goal of this and the next articles is to introduce you to these options, the power of each one and the cost of using it. A word of warning: This is a raw list. I have used some but not all of them and I have yet to explore many options in more detail.

Supporting Manual Changes With Fast Navigation
The main challenge of widespread changes is the high number of occurrences. If the change itself is small, finding all occurrences and navigating to them is a major effort. Support for fast navigation would make things easier. The most basic form of this is to modify or delete something and see all resulting compile errors. Of course this only works with static languages. In Eclipse this works very well because Eclipse is compiling the code all the time and you get red markers just in time. Often it is possible to create a list of all lines that need to be changed. In the past have created custom rules of static analysis tools like PMD to find all places I needed to change. Such a list can be used to open the file and jump to the proper line, one source file after another.

As example, here is a Ruby script that converts pmd.xml which is contains the PMD violations from the Apache Maven PMD Plugin to Java stack traces suitable for Eclipse.

#! ruby
require 'rexml/document'

xml_doc = REXML::Document.new(File.new('./target/site/pmd.xml'))
xml_doc.elements.each('pmd/file/violation') do |violation|
  if violation.attributes['class']
    class_name = violation.attributes['package'] + '.' +
                 violation.attributes['class']
    short_name = violation.attributes['class'].sub(/\$.*$/, '')
    line_number = violation.attributes['beginline']
    rule = violation.attributes['rule']

    puts "#{class_name}.m(#{short_name}.java:#{line_number}) #{rule}"
  end
end

The script uses an XML parser to extract class names and line numbers from the violation report. After pasting the output into Eclipse's Stacktrace Console, you can click each line one by one and Eclipse opens each file in the editor with the cursor in the proper line. This is pretty neat. Another way is to script Vim to open each file and navigate to the proper line using vi +<line number> <file name>. Many editors support similar navigation with the pattern <file name>:<line number>.

Enabling fast navigation is a big help. The power this option is high (that is 4 out of 5 on my personal, totally subjective scale) and the effort to find the needed lines and script them might be medium.

Search and Replace (Across File System)
The most straight forward way to change similar code is by Search and Replace. Many tools offer to search and replace across the whole project, allowing to change many files at once. Even converting basic scenarios, which sometimes cover up to 80%, helps a lot. Unfortunately basic search is very limited. I rate its power low and the effort to use it is also low.

Scripted Search and Replace using Regular Expressions
Now Regular Expressions are much more powerful than basic search. Many editors allow Regular Expressions in Search and Replace. I recommend creating a little script. The traditional approach would be Bash with sed and awk but I have used Ruby and Python (or even Perl) to automate lots of different changes. While the script adds extra work to traverse all the source directories and files, the extra flexibility is needed for conditional logic, e.g. adding an import to a new class if it was not imported before. Also in a script Regular Expressions can be nested, i.e. analysing the match of an expression further in a second step. This helps to keep the expressions simple.

For example I used a script to migrate Java's clone() methods from version 1.4 to 5. Java 5 offers covariant return types which can be used for clone(), removing the cast from client code. The following Ruby snippet is called with the name of the class and the full Java source as string:

shortName = shortClassNameFromClassName(className)
if source =~ / Object clone\(\)/
  # use covariant return type for method signature
  source = $` + " #{shortName} clone()" + $'
end
if source =~ /return super\.clone\(\);/
  # add cast to make code compile again
  source = $` + "return (#{shortName}) super.clone();" + $'
end

In the code base where I applied this widespread change, it fixed 90% of all occurrences as clone methods did not do anything else. It also created some broken code which I reverted. I always review automated changes, even large numbers, as jumping from diff to diff is pretty fast with modern tooling. Scripts using Regular Expressions helped me a lot in the past and I rate their power to high. Creating them is some effort, e.g. a medium amount of work.

Macros and Scripts
Many tools like Vim, Emacs, Visual Studio, Notepad++ and IntelliJ IDEA allow creation or recording Keyboard and or mouse macros or Application scripts. With them, frequently used or repetitive sequences of keystrokes and mouse movements can be automated, and that is exactly what we want to do. When using Macros the approach is the opposite as for navigation markers: We find the place of the needed change manually and let the Macro do its magic.

Most people I have talked to know Macros and used them earlier (e.g. 20 years ago) but not recently in modern IDEs. Alex has used Vim Scripts to automate repetitive coding tasks. I have not used it and relating to his experience. Here is a Vim script function which extracts a variable, taken from Gary Bernhardt's dotfiles:

function! ExtractVariable()
  let name = input("Variable name: ")
  if name == ''
    return
  endif

  " Enter visual mode
  normal! gv
  " Replace selected text with the variable name
  exec "normal c" . name
  " Define the variable on the line above
  exec "normal! O" . name . " = "
  " Paste the original selected text to be the variable value
  normal! $p
endfunction

I guess large macros will not be very readable and hard to change but they will be able to do everything what Vim can do, which is everything. ;-) They are powerful and easy to create - as soon as you know Vim script of course.

Structural Search and Replace
IntelliJ IDEA offers Structural Search and Replace which performs search and replace across the whole project, taking advantage of IntelliJ IDEA's awareness of the syntax and code structure of the supported languages.. In other words it is Search and Replace on the Abstract Syntax Tree (AST). Additionally it is possible to apply semantic conditions to the search, for example locate the symbols that are read or written to. It is available for Java and C# (Resharper) and probably in other JetBrains products as well. I have not used it. People who use it tell me that it is useful and easy to use - or maybe not that easy to use. Some people say it is too complicated and they can not make it work.

Online help says that you can apply constraints described as Groovy scripts and make use of IntelliJ IDEA PSI (Program Structure Interface) API for the used programming language. As PSI is the IntelliJ version of the AST, this approach is very powerful but you need to work the PSI/AST which is (by its nature) complicated. This requires a higher effort.

To be continued
And there are many more options to be explored, e.g. scripting code changes inside IDEs or using Refactoring APIs as well as advanced tools outside of IDEs. I will continue my list in the next part. Stay tuned.

13 July 2018

Categories of Architectural Refactoring

At the end of one of my refactoring workshops, we started discussing how to refactor
larger structures, sometimes even changing the architecture of a system. During that discussion we came up with four different categories of larger refactoring which use different techniques and serve different goals:

Replace whole parts
Change the structure of components
Similar change all over the place
Other changes

We struggled to find good names, so let me explain the different things a bit. When I talk about architecture, I mean Software Architecture of a single application, i.e. 1. the different software components and 2. their (inter-) dependencies making up the application. Components are sub-systems and usually contain one or more namespaces or packages. While we can discuss forever what exactly software architecture is and is not, I like the definition of IEEE which in addition to elements (1.) and relationships (2.) adds principles of its design and evolution. I describe these principles as 3. design guidelines which include smaller recommended usage patterns and coding conventions.

Martin Fowler and Neal Ford say that architecture is the stuff that's hard to change later and replacing parts of the architecture involves a lot of work: Imagine (the highly theoretical example of) changing the programming language a system is written in or (a more common example of) moving from a monolith to a Microservice based architecture. These two examples already fall into different categories.

Replacing whole parts of the architecture
Massive changes like changing the UI technology or targeting another platform are possible but a lot of code has to be rewritten. Substitute Algorithm is a classic Fowler refactoring, and so is Substitute Architectural Decision. (This is my third try to name these category of changes.) Architectural Decisions are made to support non-functional requirements like performance, security, connectivity etc and when revisiting them later - with more domain knowledge - better alternatives might come up. Real examples of changing Architectural Decisions include:

Replacing one relational database system with another
Replacing your relational database with a NoSQL one
Replacing UI technologies, e.g. moving from JSF to Vaadin
Changing API technologies, e.g. moving from SOAP to REST
Switching from a fat client to a client-server architecture

Depending on used frameworks and abstractions, some of these changes will be easy, others will not. Even when the necessary change is isolated, it likely touches more than one architectural component. For example moving to another data store might need changing all data access objects in all components which access the data store. The traditional approach is to change everything at once ("Big Leap") but it is possible to refactor the architecture in a safe way, using small steps. In Responsive Design (QCon 2009) Kent Beck describes several strategies to perform such changes:

Leap: Make the new design and use it immediately. (Only if the change is not too big.)
Parallel Change: Make the new design and run both new and old simultaneously.
Stepping Stone: Create a platform to bring the desired new design within reach.
Simplification: Only implement part of the design now and add complexity bit-by-bit later.
Hack it: Just do it and pay price later. (This is maybe not a real strategy ;-)

Changing the structure of architectural components
The next category is similar to Substitute Architectural Decision but is focused on the architectural building blocks - the components. Some examples are:

Introduce or extract new component, module or package
Remove or invert dependencies between components, e.g. remove cyclic dependencies
Introduce layering, add a new layer, clean up layering violations
Weaken potential trouble spots in the architecture, e.g. Butterfly, Breakable, Hub and Tangle

This category of changes is the "classic" Architectural Refactoring because that is how people call it. For example - in accordance to Martin Fowler's definition - Dave Adsit defines Architectural Refactoring as intentionally changing the structure of a system without altering its features. Similarly Sean Barow explains Architectural Refactoring as the deliberate process to remove architectural smells while not changing the scope or functionality of the code. So it is always structural and deals with components (1.), their relationships (2.) and their inner structure. Let's call it Refactor Architectural Structure to distinguish it from other architectural refactoring.

Sean Barow uses the term Code Refactoring for the refactoring defined by Fowler. Code refactoring focuses on software entities like classes, methods, fields and variables. Architectural refactoring involves code refactoring, which leads to confusion between the two. Working with dependencies needs extracting or moving classes, methods or fields. We can work each class or dependency at once, which supports an incremental approach.

Performing similar changes throughout the whole project
This is another category and it feels like Shotgun Surgery: A usually small change has to be repeated all over the place. While Shotgun Surgery is a code smell for feature logic, it is unavoidable if not required for consistent structure and usage defined by the architecture. For example let's assume we want to replace EasyMock with Mockito. Both libraries serve the same purpose and are used in a similar way, but their APIs are different. (Facing such a problem indicates that we are coupled to much to a used library - maybe we should have wrapped it, maybe not. But that discussion is outside the scope of this article.) Converting a single usage of EasyMock to Mockito is straight forward: We change the method invocations and execute the calls earlier or later in the test. The only problem is that we have hundreds of such cases. (Update 31st October 2018: Some people have pointed out that build tools and testing frameworks are not considered part of the architecture. In that case a migration between mocking libraries is misleading and I should have used another example. Still the idea is the same.) More examples of similar "Shotgun" changes are:

Upgrading a major version of a library with breaking API changes
Migrating between similar frameworks or libraries
Changing or unifying coding conventions and fixing violations
Consistently applying or changing aspects like logging or security
Applying internationalization

These changes do not change the overall structure, they just work on the implementation. They are similar to Refactor Architectural Structure, just deal with design guidelines, coding conventions and smaller usage patterns - item 3 from the definition of Software Architecture in the beginning. They are code refactoring applied in many places consistently across a whole code base.

These category is important, because some steps of both Substitute Decision and Refactor Structure need similar changes. The main challenge of these changes is the high number of occurrences. Changing conventions and aspects is a piece by piece work and can be done manually. Nevertheless some automation tool or script would help a lot. Even converting only basic scenarios, which often cover up to 80%, is a big win. Unfortunately some changes, e.g. migrating libraries, need to be done all at once. In such cases a Stepping Stone approach has worked well for me in the past: First wrapping the library piece by piece - incrementally - and then changing the wrapper and all invocations at once. Creating the wrapper is a lot of work. Again, automated migration removes the need for a wrapper. There are some options available, which I will cover in a future article.

I have not found a good name for this category. I used to call it Large Scale Refactoring, but I am not happy with that name. It is a similar or identical code refactoring applied to many places widespread throughout the code. It is a Widespread Architectural Change.

Other Architectural Changes
During our initial discussion, we also considered other refactoring. To be on the safe side, always allow a category of unknown things ;-). At the moment I have no real examples for them. These are smaller changes neither related to architectural decisions or structure nor are they widespread. I believe we find no examples because these are plain code changes and we do not consider them architectural even if they are related to the architecture of the applications in a way.

Summary
Next to the code we also can and should refactor the architecture of our applications. Architectural changes are expensive, still sooner or later things need to change if the application is in service just long enough. There are at least four categories of architectural refactoring:

Substitute Architectural Decision
Refactor Architectural Structure
Widespread Architectural Change
Other Architectural Changes

After discussing the topic with some friends, I wrote a followup with clarifications. I also used some tools to automate Widespread Architectural Changes.

15 October 2017

Introducing Code Smells into Code

Code Smells
Code smells are hints that show you potential problems in your code. They are heuristics: Like in real life, if something smells, look at it, think about it, and change it if necessary. In the classic book Refactoring - Improving the Design of Existing Code, Martin Fowler describes 21 code smells like Long Method, Primitive Obsession, Switch Statements, Feature Envy and other anti-patterns that indicate deeper problems in your code.

The Brutal Refactoring Game
Adrian Bolboaca came up with the Brutal Refactoring Coding Game. He explained the history of the game and the game itself on his blog. I attended his workshop at the XP conference 2013 and experienced the game first hand.

In the game participants are asked to write the cleanest code possible. If the facilitator spots any code smell, participants must stop and immediately remove it. Adding functionality is forbidden until the facilitator agrees that the smell has been removed. In his workshop, Adi gave us a numbered list of smells and gave cards with the appropriate number to pairs where he saw a code smell. And he was mercilessly flagging the smallest problems in our code. ;-)

Code Smells Used in the Game
Adi chose these code and test smells for his game:

Lack of tests
Name not from domain
Name not expressing intent
Unnecessary if
Unnecessary else
Duplication of constant
Method does more than one thing
Primitive obsession
Feature envy
Method too long (has more than six lines)
Too many parameters (more than three parameters)
Test is not unitary
Test setup too complex
Test has an unclear Act
Test has more than one assert
Test has no assert
Test has too many paths

Adi told me that he chose these smells because he saw them most often in his clients' code bases. His list definitely misses duplication, deeply nested conditionals and a some more. A more complete list might contain 30 items, making it more difficult and potentially frustrating for participants. (Maybe I will come up with the Moar Brutal Refactoring Game in the future...)

Observations during the Brutal Refactoring Game
This article is not about the Brutal Refactoring Game, but about code smells introduced into code. The game allows observation how and when code smells are introduced (because the whole point is to spot and remove them). As part of my refactoring training I facilitated the game more than ten times. Each time took 3 to 5 hours and had six to eight participants. The teams were average teams with several senior developers and an occasional junior developer. People worked in pairs and implemented Tic-Tac-Toe. Most teams used Java, two teams used C.

Discussion of Introduced Code Smells
Here is the code smells statistic:

Code Smells Introduced

The chart shows the number of problems I flagged during the last ten games. The different colours of the bars show the different teams. Obviously not all smells are introduced equally often. The first smells appear 10 to 15 minutes into the exercise. One team using C had difficulties with the setup and was going forward very slow - they produced little code and very few smells.

The first smell I usually see is 1 - Lack of tests. Even people following the TDD cycle happen to create "more production code than is sufficient to pass the test." This happens in the beginning and also later during the game.

Naming is hard. Not surprisingly the most common smells (number two - Name not from domain - and number three - Name not expressing intent) are naming related. Naming things after the problem domain seems twice as hard as pure technical naming. Any non trivial method could be named process or execute, but that does not help understanding the code at all.

Primitive Obsession (number eight) is the most common single code smell I have seen during the game. It is introduced early in development when method signatures are created and APIs are designed. It occurs roughly as often as the naming related smells together. Most Tic-Tac-Toe implementations are (publicly) based on numbers, pairs of numbers, arrays of numbers or the like. Primitive Obsession is very dominant in many (Java) code bases. In my code reviews I am used to method argument lists like String, String, String, String, String, int, long, long etc. Instead of using all these primitive values, they should be wrapped and should not be visible at object boundaries. (I have written more about primitives in the past.) This is an object oriented design smell.

The third most often flagged code smell are long methods (number ten). This smell is introduced later, when logic is added to existing methods. I see this smell more in the second part of the game. Even when using TDD, this smell is introduced if the refactoring phase is skipped or taken lightly. Long methods are also very common in legacy code bases and difficult to understand or change. Everyone hates these 1000 lines long methods, still I find them in every (large) code base I look at.

Code Smells Categories
To conclude this analysis let us have a look at problem categories. I aggregated Adi's 17 code smells into four groups:

Problems in test code
Naming related smells
Missing object orientation
Complexity

It seems that unit testing is the least problem - which it definitely not true. Most teams I work with have no automated (unit) tests for their production code. Maybe there were less testing issues during the game because the teams had learned about testing smells before. I practice refactoring with my teams after we have worked through all of unit testing.

Initially I was surprised to see missing object orientation high on the list. Now, after writing about it, I think it is also related to my "coaching/ learning plan". After refactoring I go for naming and finally object orientation. (Maybe the order of topics is wrong, but unit testing is easily sold to management and refactoring is asked for by developers often, making both topics ideal to start an improvement initiative.) I do not expect less naming problems, even after a few sessions on naming, because - as discussed before - naming is hard. I would expect the object orientation of the solutions to improve.

Samir Talwar wrote about his experience with the game. As facilitator he had a different focus, e.g. he was more strict about unnecessary if, treating it more like a No If constraint. He also saw different code smells being introduced. (I recommend reading his summary.) We both agree that naming is hard and causes many problems.

Comparison of Team Performance
While the participants were industry average - maybe even above - they were in need of improving. (Who is not?) The following bar chart shows the number of code smells introduced into the code by each team. (To compare the teams I removed the one with setup problems.) Some teams ran the exercise twice. Some of them improved, some did not.

Code Smell Categories by Team

On average, each team introduced 17 issues into their code base, right from the beginning of a small project, during a few hours of work. I am sure they tried hard because I was watching them, still this result is very disappointing. I am scared of the massive amount of code smells lurking in real world projects.

There is a noticeable difference between individual teams. Some teams created only half as many smells as other ones. Better teams introduced less code smells creating less technical debt.

Conclusion
Adi claims that you can have legacy code after 15 minutes. It is true. In a short time, the teams introduced many code smells into their code. The most common smells were bad names and Primitive Obsession. Different smells were introduced during different development activities. Some teams introduced less smells than others.

We need to focus on code smells. Noticing smells in our code is an important skill which can be trained. A good place to start practising are refactoring code katas like Emily Bache's Tennis Game and Yatzy. (Both exercises are available in many programming languages.) "Listening to code smells" improves our design. Finally I want to encourage you to watch out for primitive values on object boundaries as Primitive Obsession seems to be the most common problem in object oriented code.

Final disclaimer: The game is no scientific experiment throughout our industry. Only a few teams participated and the results are biased. Nevertheless I wanted to share the results.

30 November 2014

TDDaiymi vs. Naked Primitives

For our recent session Brutal Coding Constraints at the Agile Testing Days 2014, Martin Klose and I experimented with combinations of constraints for code katas and coding dojos. A constraint, also known as an activity, is a challenge during a kata, coding dojo or code retreat designed to help participants think about writing code differently than they would otherwise. Every activity has a specific learning goal in mind. Most coding constraints are an exaggeration of a fundamental rule of clean code or object oriented design. We tried to find particularly difficult combinations that would challenge expert developers and masters of object orientation. A brutal mixture of constraints is TDD as if you Meant it, No IFs and No naked primitives.

TDD as if you Meant it
Several years ago Keith Braithwaite noticed that many developers use some kind of pseudo-TDD by thinking of a solution and creating classes and functions that they just know they need to implement it. Then their tests keep failing and much time is spent debugging. But instead they should just follow Kent Beck's original procedure of adding a little test, seeing it fail and making it pass by some small change. To help developers practise these original principles Keith defined TDDaiymi as an exercise to really do TDD. The exercise uses a very strict interpretation of the practice of TDD, avoiding design up front and helping people getting used to emergent design by refactoring. It is a difficult exercise and I still remember the first time I did it in a session run by Keith himself. For almost two hours my pair and I were struggling to create something useful. While TDDaiymi is a bit extreme, some people believe it to be the next step after TDD. German "Think Tank" Ralf Westphal even claims that following TDD as if you Meant it is the only way that TDD can give you good design results.

No naked primitives
No naked primitives is another nice constraint. All primitive values, e.g. booleans, numbers or strings need to be wrapped and must not be visible at object boundaries. Arrays, all kinds of containers like lists or hash-tables and even Object (the root class of the language's class hierarchy) are considered primitive as well. Similar to Keith's TDDaiymi this rule is designed to exercise our object orientation skills. A string representing a name is an under-engineered design because many strings are no valid names. In an object oriented system we would like to represent the concept of a name with a Name class. Usually Value Objects are used for this purpose. Also a list of shopping items is not a shopping basket. A general purposes list implementation offers operations that do not make sense for a shopping basket. So containers need to be encapsulated. While it is exaggerated to wrap all primitives (see Primitive obsession obsession), I have seen too many cases of Primitive Obsessions that I rather see a few additional value objects than another map holding maps holding strings. (To avoid that I created a custom PMD rule PrimitiveObsession that flags all primitives in public signatures of Java classes.)

No IFs
No IFs is a great constraint of the Missing Feature catalogue. All conditionals like if, while, switch and the ternary operator (e.g. _ ? _ : _) are forbidden. Like No naked primitives this is exaggerated but many code bases are full of abysmal-deeply nesting, like the PHP Streetfighter. This has been recognised as a problem and there are campaigns to reduce the usage of ifs. After all, excessive usage of if and switch statements is a code smell and can be replaced by polymorphism and other structures. In order to practice the constraint is taken to the extreme. I have seen many developers struggle with the missing concept in the beginning, but after some thought each and everyone of them found a way to express his or her concepts in another, less nested way.

Combining Constraints
Some constraints like TDDaiymi are already difficult on their own, but obeying several of them at the same time is hard. We stick to our (coding-) habits even if we planned to do otherwise. While working the experiments Martin an I would notice a constraint violation only after several times switching roles, several minutes after introducing the "forbidden" concept. This is especially true for conditionals because they are most fundamental to our (imperative styled) coding. It seems that tool support for constraints would help, but not all constraints can be enforced automatically and definitely not for all programming languages.

Refactoring Towards Constraints
It is difficult to find solutions with constraints - after all that is one of their primary purposes - so we decided to first make it green and then refactor towards the constraint. This seems natural as TDD itself has a similar rule: After creating the next failing test (red phase), try to get it green as fast as possible, creating dirty code and duplication as needed (green phase). Then clean up the mess relying on the safety net of the already existing tests (refactor phase). This results in more refactoring than usual, and we spent more than half of the time refactoring. As TDDaiymi puts much more focus on the refactoring step, this might have been due to the first constraint alone. 50-50 for a code kata with a refactoring focused constraints is not much, given that some people report three refactoring commits per one feature commit in their production code.

Designing Up Front?
When refactoring towards the constraint, it is often hard to change the current ("green") solution because it is rooted on primitive data types or conditionals or whatever violates the constraints. Sometimes we had no clue how to refactor and left the violation in the code for several red-green-refactor cycles until it became clear how to proceed. Being hard is not a problem because this is an exercise after all but allowing violations for extended periods of time made me unhappy as the code was not completely clean regarding our standards after each cycle. Also the code looked weird but some constraints are weird and therefore force odd solutions. In following TDDaiymi we did not think about any design up front, but maybe we should have done so. Thinking about structures (i.e. design) that support the constraint, e.g. the absence of conditionals, greatly helps the code and such "designed" solutions look much better. While we should think about the problem, the technology-agnostic solution and its acceptance criteria (test cases) before starting to write code, classic TDD discourages creating a design up front, and TDDaiymi does even more so. I guess more practise is needed to resolve this issue.

Evolving Structures
Another, more direct contradiction is the combination of TDDaiymi and No naked primitives. While following the former constraint, we want to avoid premature design decisions and sometimes use Object as a place-holder for a concept as long as there is no structure yet. Maybe this can be avoided but I do not know how. (If you know how to TDDaiymi without plain objects, I am more than happy to pair with you on some exercises.) In the end, when the design is finished, the Object has been replaced, but as the final structure evolves late, it may stick around for some time. On the other hand Object is a primitive and therefore forbidden. Maybe it is special and should not count as primitive. I have never seen any under-engineered code base using plain objects, on the contrary they usually contain lots of strings with the occasional boolean or number, but no naked Object. So I believe it is safe to allow Object as primitive because it will be replaced eventually.

Conclusion
When I started to write this article I thought its conclusion would be that some coding constraints are just not compatible. As constraints are exaggerated principles, this would be no surprise. But now, after ~~writing~~ thinking about it, I am not sure it is a problem. On the contrary the discussion helped me understand the constraints, their origins and intent. My real conclusion is that doing katas with constraints sparks interesting discussions and is a great way of learning about one's own coding habits and many aspects of coding in general. I strongly encourage you to do so,

Credits
Thanks to Martin Klose for the countless hours of pairing and discussing these constraints with me.