Code Cop: custom rules

Showing posts with label custom rules. Show all posts

18 February 2018

Checking Python for Compliance with Object Calisthenics

To use one of my Object Orientation workshops for a different client, I needed its whole setup in Python. (To summarise the workshop: I used Jeff Bay's Object Calisthenics as constraint and static code analysis to check the code for compliance. This helped participants follow the rules.)

I found that Pylint already contained some checkers I could use out of the box to check Python code:

checkers.refactoring with setting max-nested-blocks=1 for rule #1 (Use only one level of indentation per method.)
checkers.design_analysis with setting max-branches=1 also for rule #1, but stronger, which I like more.
checkers.design_analysis with setting max-attributes=2 for rule #7 (Don't use any classes with more than two instance variables.)

Diving deeper into Pylint's custom checkers, I created some of my own to enforce rules #2, #4, #6, #7, #8 and #9. For rule #4 (One Dot per Line) I used the approach of PHP_CodeSniffer and looked for nested expressions, allowing only one dot per statement, ignoring self references.

Dynamic Nature
Due to the dynamic nature of Python there is a grey area when working with types. For first-class collections (rule #8), only certain uses of collections are identified at the moment. I was not able to check for primitives at object boundaries at all, so rule #3 is not verified.

Figuring out how to write complex checkers for Pylint was difficult at first, but also a lot of fun. Like PMD and most static analysis tools I know, Pylint works with the Abstract Syntax Tree of the code and checkers are built using the Visitor design pattern. The main problem was figuring out which types of nodes exist, then the implementation of the rules was straight forward.

Python Project Setup
The setup is similar to the Java project. I created a repository for the LCD Numbers under lcd-numbers-object-calisthenics-python-setup. To test your setup there is some code in the project and ./run_pylint will show its violations. Pylint is configured using the objectcalisthenics.pylintrc file in the project directory. The individual checkers are in ./pylint/checkers.

11 January 2018

Compliance with Object Calisthenics

During my work as Code Cop I run many workshops. Sometimes I use constraints to make exercises more focused or more intense. Some constraints, like the Brutal Coding Constraints, are composite or aggregate constraints, which means that they are a combination of several simpler or low level constraints. (Here simple does not mean that they are easy to follow, rather that they focus on a single thing.) Today I want to discuss Object Calisthenics.

Object Calisthenics
Jeff Bay's Object Calisthenics is an aggregate constraint combined of the following nine rules:

Use only one level of indentation per method.
Don't use the else keyword.
Wrap all primitives and strings (in public API).
Use only one dot per line.
Don't abbreviate (long names).
Keep all entities small.
Don't use any classes with more than two instance variables.
Use first-class collections.
Don't use any getters/setters/properties.

If you are not familiar with these rules, I recommend you read Jeff Bay's original essay published in the The ThoughtWorks Anthology in 2008. William Durand's post is an exact copy of that essay, so no need to buy the book for that. Several people have followed up and discussed their interpretation and experience with Object Calisthenics, e.g. Mark Needham, Jeff Pace, Vasiliki Vockin and Juan Antonio.

Object Calisthenics is an exercise in object orientation. How is that? One of the core concepts of OOP is Abstraction: A class should capture one and only one key abstraction. Obviously primitive and built-in types lack abstraction. Wrapping primitives (rule #3) and wrapping collections (rule #8) drive the code towards more abstractions. Small entities (rule #6) help to keep our abstractions focused.

Further objects are defined by what they do, not what they contain. All data must be hidden within its class. This is Encapsulation. Rule #9, No Properties, forces you to stay away from accessing the fields from outside of the class.

Next to Abstraction and Encapsulation, these nine rules help Loose Coupling and High Cohesion. Loose Coupling is achieved by minimizing the number of messages sent between a class and its collaborator. Rule #4, One Dot Per Line, reduces the coupling introduced by a single line. This rule is misleading, because what Jeff really meant was the Law Of Demeter. The law is not about counting dots per line, it is about dependencies: "Only talk to your immediate friends." Sometimes even a single dot in a line will violate the law.

High Cohesion means that related data and behaviour should be in one place. This means that most of the methods defined on a class should use most of the data members most of the time. Rule #5, Don't Abbreviate, addresses this: When a name of a field or method gets long and we wish to shorten it, obviously the enclosing scope does not provide enough context for the name, which means that the element is not cohesive with the other elements of the class. We need another class to provide the missing context. Next to naming, small entities (rule #6) have a higher probability of being cohesive because there are less fields and methods. Limiting the number of instance variables (rule #7) also keeps cohesion high.

The remaining rules #1 and #2, One Level Of Indentation and No else aim to make the code simpler by avoiding nested code constructs. After all who wants to be a PHP Street Fighter. ;-)

Checking Code for Compliance with Object Calisthenics
When facilitating coding exercises with composite constraints, I noticed how easy it is to overlook certain violations. We are used to conditionals or dereferencing pointers that we might not notice them when reading code. Some rules like the Law Of Demeter or a maximum size of classes need a detailed inspection of the code to verify. To check Java code for compliance with Object Calisthenics I use PMD. PMD contains several rules we can use:

Rule java/coupling.xml/LawOfDemeter for rule #4.

Rule #6 can be checked with NcssTypeCount. A NCSS count of 30 is usually around 50 lines of code.

<rule ref="rulesets/java/codesize.xml/NcssTypeCount">
    <properties>
        <property name="minimum" value="30" />
    </properties>
</rule>

And there is TooManyFields for rule #7.

<rule ref="rulesets/java/codesize.xml/TooManyFields">
    <properties>
        <property name="maxfields" value="2" />
    </properties>
</rule>

I work a lot with PMD and have created custom rules in the past. I added rules for Object Calisthenics. At the moment, my Custom PMD Rules contain a rule set file object-calisthenics.xml with these rules:

java/constraints.xml/NoElseKeyword is very simple. All else keywords are flagged by the XPath expression //IfStatement[@Else='true'].
java/codecop.xml/FirstClassCollections looks for fields of known collection types and then checks the number of fields.
java/codecop.xml/OneLevelOfIntention
java/constraints.xml/NoGetterAndSetter needs a more elaborate XPath expression. It is checking MethodDeclarator and its inner Block/ BlockStatement/ Statement/ StatementExpression/ Expression/ PrimaryExpressions.
java/codecop.xml/PrimitiveObsession is implemented in code. It checks PMD's ASTConstructorDeclaration and ASTMethodDeclaration for primitive parameters and return types.

For the nitty-gritty details of all the rules have a look at the rules defined in codecop.xml and constraints.xml.

Interpretation of Rules: Indentation
When I read Jeff Bay's original essay, the rules were clear. At least I thought so. Verifying them automatically showed some areas where different interpretations are possible. Different people see Object Calisthenics in different ways. In comparison, the Object Calisthenics rules for PHP_CodeSniffer implement One Level Of Indentation by allowing a nesting of one. For example there can be conditionals and there can be loops, but no conditional inside of a loop. So the code is either formatted at method level or indented one level deep. My PMD rule is more strict: Either there is no indentation - no conditional, no loop - or everything is indented once: for example, if there is a loop, than the whole method body must be inside this loop. This does not allow more than one conditional or loop per method. My rule follows Jeff's idea that each method does exactly one thing. Of course, I like my strict version, while my friend Aki Salmi said that I went to far as it is more like Zero Level Of Indentation. Probably he is right and I will recreate this rule and keep the Zero Level Of Indentation for the (upcoming) Brutal version of Object Calisthenics. ;-)

Wrap All Primitives
There is no PHP_CodeSniffer rule for that, as Tomas Votruba considers it "too strict, vague or annoying". Indeed, this rule is very annoying if you use primitives all the way and your only data structure is an associative array or hash map. All containers like java.util.List, Set or Map are considered primitive as well. Samir Talwar said that every type that was not written by yourself is primitive because it is not from your domain. This prohibits the direct usage of Files and URLs to name a few, but let's not go there. (Read more about the issue of primitives in one of my older posts.)

My rule allows primitive values in constructors as well as getters to implement the classic Value Object pattern. (The rule's implementation is simplistic and it is possible to cheat by passing primitives to constructors. And the getters will be flagged by rule #9, so no use for them in Object Calisthenics anyway.)

I agree with Tomas that this rule is too strict, because there is no point in wrapping primitive payloads, e.g. strings that are only displayed to the user and not acted on by the system. These will be false positives. There are certain methods with primitives in their signatures like equals and hashCode that are required by Java. Further we might have plain numbers in our domain or we use indexing of some sort, both will be false positives, too.

One Dot Per Line
As I said before, I use PMD's LawOfDemeter to verify rule #4. The law allows sending messages to objects that are

the immediate parts of this or
the arguments of the current method or
objects created inside the current method or
objects in global variables.

I did not look at PMD's source code to check the implementation of this rule - but it complains a lot. For me this is the most difficult rule of all nine rules. (I code according to #1, #3, #5 and #6 and can easily adapt to strictly follow #2, #7, #8 and #9.) Although it complains a lot, I found every violation correct. I learned much about Law Of Demeter by checking my code for violations. For example, calling methods on an element of an array is a violation. The indexed array access is similar to a pointer access. (In Ruby this is obvious because Array defines a method def [](index).) Another interesting fact is that (at least in PMD) the law flags calling methods on enums. The enum instances are not created locally, so we cannot send them messages. On the other hand, an enum is a global variable, so maybe it should be allowed to call methods on it.

The PHP_CodeSniffer rule follows the rule's name and checks that there is only one dot per line. This creates better code, because train wrecks will be split into explaining variables which make debugging easier. Also Tomas is checking for known fluent interfaces. Fluent interfaces - by definition - look like they are violating the Law Of Demeter. As long as the fluent interface returns the same instance, as for example basic builders do, there is no violation. When following a more relaxed version of the law, the Class Version of Law Of Demeter, than different implementations of the same type are still possible. The Java Stream API, where many calls return a new Stream instance of a different class - or the same class with a different generic type - is likely to violate the law. It does not matter. Fluent interfaces are designed to improve readability of code. Law Of Demeter violations in fluent interfaces are false positives.

Don't Abbreviate
I found it difficult to check for abbreviations, so rule #5 is not enforced. I thought of implementing this rule using a dictionary, but that is prone to false positives as the dictionary cannot contain all terms from all domains we create software for. The PHP_CodeSniffer rules check for names shorter than three characters and allow certain exceptions like id. This is a good start but is not catching all abbreviations, especially as the need to abbreviate arises from long names. Another option would be to analyse the name for its camel case patterns, requiring all names to contain lowercase characters between the uppercase ones. This would flag acronyms like ID or URL but no real abbreviations like usr or loc.

Small Entities
Small is relative. Different people use different limits depending on programming language. Jeff Bay's 50 lines work well for Java. Rafael Dohms proposes to use 200 lines for PHP. PHP_CodeSniffer checks function length and number of methods per class, too. Fabian Schwarz-Fritz limits packages to ten classes. All these additional rules follow Jeff Bay's original idea and I will add them to the rule set in the future.

Two Instance Variables
Allowing only two instance variables seems arbitrary - why not have three or five. Some people have changed the rules to allow five fields. I do not see how the choice of language makes a difference. Two is the smallest number that allows composition of object trees.

In PHP_CodeSniffer there is no rule for this because the number depends on the "individual domain of each project". When an entity or value object consists of three or more equal parts, the rule will flag the code but there is no problem. For example, a class BoundingBox might contain four fields top, left, bottom, right. Depending on the values, introducing a new wrapper class Coordinate to reduce these fields to topLeft and bottomRight might make sense.

No Properties
My PMD rule finds methods that return an instance field (a getter) or update it (a setter). PHP_CodeSniffer checks for methods using the typical naming conventions. It further forbids the usage of public fields, which is a great idea. As we wrapped all primitives (rule #3) and we have no getters, we can never check their values. So how do we create state based tests? Mark Needham has discussed "whether we should implement equals and hashCode methods on objects just so that we can test their equality. My general feeling is that this is fine although it has been pointed out to me that doing this is actually adding production code just for a test and should be avoided unless we need to put the object into a HashMap or HashSet."

From what I have seen, most object oriented developers struggle with that constraint. Getters and setters are very ingrained. In fact some people have dropped that constraint from Object Calisthenics. There are several ways to live without accessors. Samir Talwar has written why avoiding Getters, Setters and Properties is such a powerful mind shift.

Java Project Setup
I created a repository containing the starting point for the LCD Numbers Kata:

lcd-numbers-object-calisthenics-java-setup.

Both are Apache Maven projects. The projects are set up to check the code using the Maven PMD Plugin on each test execution. Here is the relevant snippet from the pom.xml:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-pmd-plugin</artifactId>
      <version>3.7</version>
      <configuration>
        <printFailingErrors>true</printFailingErrors>
        <linkXRef>false</linkXRef>
        <typeResolution>true</typeResolution>
        <targetJdk>1.8</targetJdk>
        <sourceEncoding>${encoding}</sourceEncoding>
        <includeTests>true</includeTests>
        <rulesets>
          <ruleset>/rulesets/java/object-calisthenics.xml</ruleset>
        </rulesets>
      </configuration>
      <executions>
        <execution>
          <phase>test</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>org.codecop</groupId>
          <artifactId>pmd-rules</artifactId>
          <version>1.2.3</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>

You can add this snippet to any Maven project and enjoy Object Calisthenics. The Jar file of pmd-rules is available in my personal Maven repository.

To test your setup there is sample code in both projects and mvnw test will show two violations:

[INFO] PMD Failure: SampleClass.java:2 Rule:TooManyFields Priority:3 Too many fields.
[INFO] PMD Failure: SampleClass:9 Rule:NoElseKeyword Priority:3 No else keyword.

It is possible to check the rules alone with mvnw pmd:check. (Using the Maven Shell the time to run the checks is reduced by 50%.) There are two run_pmd scripts, one for Windows (.bat) and one for Linux (.sh).

Object Calisthenics Retrospective

Limitations of Checking Code
Obviously code analysis cannot find everything. On the other hand - as discussed earlier - some violations will be false positives, e.g. when using the Stream API. You can use // NOPMD comments and @SuppressWarnings("PMD") annotations to suppress false positives. I recommend using exact suppressions, e.g. @SuppressWarnings("PMD.TooManyFields") to skip violations because other violations at the same line will still be found. Use your good judgement. The goal of Object Calisthenics is to follow all nine rules, not to suppress them.

Learnings
Object Calisthenics is a great exercise. I used it all of my workshops on Object Oriented Programming and in several exercises I did myself. The verification of the rules helped me and the participants to follow the constraints and made the exercise more strict. (If people were stuck I sometimes recommended to ignore one or another PMD violations, at least for some time.) People liked it and had insights into object orientation: It is definitely a "different" and "challenging way to code". "It is good to have small classes. Now that I have many classes, I see more structure." You should give it a try, too. Jeff Bay even recommends to run an exercise or prototype of at least 1000 lines for at least 20 hours.

The question if Object Calisthenics is applicable to real working systems remains. While it is excellent for exercise, it might be too strict to be used in production. On the other hand, in his final note, Jeff Bay talks about a system of 100,000 lines of code written in this style, where the "people working on it feel that its development is so much less tiresome when embracing these rules".

30 November 2014

TDDaiymi vs. Naked Primitives

For our recent session Brutal Coding Constraints at the Agile Testing Days 2014, Martin Klose and I experimented with combinations of constraints for code katas and coding dojos. A constraint, also known as an activity, is a challenge during a kata, coding dojo or code retreat designed to help participants think about writing code differently than they would otherwise. Every activity has a specific learning goal in mind. Most coding constraints are an exaggeration of a fundamental rule of clean code or object oriented design. We tried to find particularly difficult combinations that would challenge expert developers and masters of object orientation. A brutal mixture of constraints is TDD as if you Meant it, No IFs and No naked primitives.

TDD as if you Meant it
Several years ago Keith Braithwaite noticed that many developers use some kind of pseudo-TDD by thinking of a solution and creating classes and functions that they just know they need to implement it. Then their tests keep failing and much time is spent debugging. But instead they should just follow Kent Beck's original procedure of adding a little test, seeing it fail and making it pass by some small change. To help developers practise these original principles Keith defined TDDaiymi as an exercise to really do TDD. The exercise uses a very strict interpretation of the practice of TDD, avoiding design up front and helping people getting used to emergent design by refactoring. It is a difficult exercise and I still remember the first time I did it in a session run by Keith himself. For almost two hours my pair and I were struggling to create something useful. While TDDaiymi is a bit extreme, some people believe it to be the next step after TDD. German "Think Tank" Ralf Westphal even claims that following TDD as if you Meant it is the only way that TDD can give you good design results.

No naked primitives
No naked primitives is another nice constraint. All primitive values, e.g. booleans, numbers or strings need to be wrapped and must not be visible at object boundaries. Arrays, all kinds of containers like lists or hash-tables and even Object (the root class of the language's class hierarchy) are considered primitive as well. Similar to Keith's TDDaiymi this rule is designed to exercise our object orientation skills. A string representing a name is an under-engineered design because many strings are no valid names. In an object oriented system we would like to represent the concept of a name with a Name class. Usually Value Objects are used for this purpose. Also a list of shopping items is not a shopping basket. A general purposes list implementation offers operations that do not make sense for a shopping basket. So containers need to be encapsulated. While it is exaggerated to wrap all primitives (see Primitive obsession obsession), I have seen too many cases of Primitive Obsessions that I rather see a few additional value objects than another map holding maps holding strings. (To avoid that I created a custom PMD rule PrimitiveObsession that flags all primitives in public signatures of Java classes.)

No IFs
No IFs is a great constraint of the Missing Feature catalogue. All conditionals like if, while, switch and the ternary operator (e.g. _ ? _ : _) are forbidden. Like No naked primitives this is exaggerated but many code bases are full of abysmal-deeply nesting, like the PHP Streetfighter. This has been recognised as a problem and there are campaigns to reduce the usage of ifs. After all, excessive usage of if and switch statements is a code smell and can be replaced by polymorphism and other structures. In order to practice the constraint is taken to the extreme. I have seen many developers struggle with the missing concept in the beginning, but after some thought each and everyone of them found a way to express his or her concepts in another, less nested way.

Combining Constraints
Some constraints like TDDaiymi are already difficult on their own, but obeying several of them at the same time is hard. We stick to our (coding-) habits even if we planned to do otherwise. While working the experiments Martin an I would notice a constraint violation only after several times switching roles, several minutes after introducing the "forbidden" concept. This is especially true for conditionals because they are most fundamental to our (imperative styled) coding. It seems that tool support for constraints would help, but not all constraints can be enforced automatically and definitely not for all programming languages.

Refactoring Towards Constraints
It is difficult to find solutions with constraints - after all that is one of their primary purposes - so we decided to first make it green and then refactor towards the constraint. This seems natural as TDD itself has a similar rule: After creating the next failing test (red phase), try to get it green as fast as possible, creating dirty code and duplication as needed (green phase). Then clean up the mess relying on the safety net of the already existing tests (refactor phase). This results in more refactoring than usual, and we spent more than half of the time refactoring. As TDDaiymi puts much more focus on the refactoring step, this might have been due to the first constraint alone. 50-50 for a code kata with a refactoring focused constraints is not much, given that some people report three refactoring commits per one feature commit in their production code.

Designing Up Front?
When refactoring towards the constraint, it is often hard to change the current ("green") solution because it is rooted on primitive data types or conditionals or whatever violates the constraints. Sometimes we had no clue how to refactor and left the violation in the code for several red-green-refactor cycles until it became clear how to proceed. Being hard is not a problem because this is an exercise after all but allowing violations for extended periods of time made me unhappy as the code was not completely clean regarding our standards after each cycle. Also the code looked weird but some constraints are weird and therefore force odd solutions. In following TDDaiymi we did not think about any design up front, but maybe we should have done so. Thinking about structures (i.e. design) that support the constraint, e.g. the absence of conditionals, greatly helps the code and such "designed" solutions look much better. While we should think about the problem, the technology-agnostic solution and its acceptance criteria (test cases) before starting to write code, classic TDD discourages creating a design up front, and TDDaiymi does even more so. I guess more practise is needed to resolve this issue.

Evolving Structures
Another, more direct contradiction is the combination of TDDaiymi and No naked primitives. While following the former constraint, we want to avoid premature design decisions and sometimes use Object as a place-holder for a concept as long as there is no structure yet. Maybe this can be avoided but I do not know how. (If you know how to TDDaiymi without plain objects, I am more than happy to pair with you on some exercises.) In the end, when the design is finished, the Object has been replaced, but as the final structure evolves late, it may stick around for some time. On the other hand Object is a primitive and therefore forbidden. Maybe it is special and should not count as primitive. I have never seen any under-engineered code base using plain objects, on the contrary they usually contain lots of strings with the occasional boolean or number, but no naked Object. So I believe it is safe to allow Object as primitive because it will be replaced eventually.

Conclusion
When I started to write this article I thought its conclusion would be that some coding constraints are just not compatible. As constraints are exaggerated principles, this would be no surprise. But now, after ~~writing~~ thinking about it, I am not sure it is a problem. On the contrary the discussion helped me understand the constraints, their origins and intent. My real conclusion is that doing katas with constraints sparks interesting discussions and is a great way of learning about one's own coding habits and many aspects of coding in general. I strongly encourage you to do so,

Credits
Thanks to Martin Klose for the countless hours of pairing and discussing these constraints with me.

7 May 2010

Custom PMD Rules

Last month I submitted my fourth article from the 'Code Cop' series. After Daily Build, Daily Code Analysis and Automated Testing I wrote about checking architecture rules. One approach described there used PMD. In case you don't know what PMD is, it's a static code analysis tool for Java. It was first developed in 2002 and is updated regularly. It checks the abstract syntax tree (AST) of Java code. (PMD is a great tool. You should definitely use it!)

PMD vs. AST (I just love TLAs)
PMD comes with many predefined rules. Some of them are real life-savers, e.g. EmptyCatchBlock. It's possible to define your own rules. The easiest way to do this is to use XPath expressions to match patterns in the AST. For example, the following trivial class

package org.codecop.myapp.db;

import java.sql.Connection;

public class DBSearch {
   ...
}

is represented inside PMD by the following AST

+ PackageDeclaration
|    + Name:org.codecop.myapp.db
+ ImportDeclaration
|    + Name:java.sql.Connection
+ TypeDeclaration
     + ClassOrInterfaceDeclaration:DBSearch
          + ClassOrInterfaceBody
               + ...

An XPath expression to find imports from the java.sql package looks like

//ImportDeclaration[starts-with(Name/@Image, 'java.sql.')]

It is quite simple (at least when you are familiar with XPath ;-) For further details read the PMD XPath Rule Tutorial.

I've always been enthusiastic about static code analysis and PMD in particular and have been using it successfully since 2004. (For example see my very first presentation on static code analysis with PMD.) I love it; It's a great tool. I have created several custom rules to enforce consistent coding conventions and to prevent bad coding practices. Today I am going to share some of these rules with you in this post.

First Rule
One common bug is public boolean equals(MyClass o) instead of public boolean equals(Object o) which shows that the developer is trying (and failing) to override the equals(Object) method of Object in MyClass. It will work correctly when invoked directly with a MyClass instance but will fail otherwise, e.g. when used inside collections. Such suspiciously close (but still different) equals methods are matched by

//ClassDeclaration//MethodDeclarator
  [@Image = 'equals']
  [count(FormalParameters/*)=1]
  [not ( FormalParameters//Type/Name[
           @Image='Object' or @Image='java.lang.Object' ] ) ]

This rule explained in plain English matches all the method declarations that are named equals, and have one parameter which type is neither Object nor java.lang.Object. (Object is mentioned twice because PMD analyses the source and therefore can't know about simple and full qualified class names.) Later, Tom included this SuspiciousEqualsMethodName rule into PMD's default set of rules.

Copy and Paste
The easiest way to define your own rule is to take an existing one and tweak it. Like SuspiciousEqualsMethodName was derived from SuspiciousHashcodeMethodName, the next JumbledIterator is quite similar to JumbledIncrementer. (JumbledIterator was created by Richard Beitelmair, one of my colleagues who appointed me "Code Cop" and presented me with my first Code Cop T-shirt.) So what's wrong with the following line?

for (Iterator it1 = iterator(); it2.hasNext(); ) { ... }

Most likely it2 should be it1, shouldn't it. Richard created the following rule to pick up on these kinds of errors:

//ForStatement[
  ( ForInit//ClassOrInterfaceType/@Image='Iterator' or
    ForInit//ClassOrInterfaceType/@Image='Enumeration'
  ) and
  ( ends-with(Expression//Name/@Image, '.hasNext') or
    ends-with(Expression//Name/@Image, '.hasMoreElements')
  ) and not (
    starts-with(Expression//Name/@Image,
      concat(ForInit//VariableDeclaratorId/@Image, '.'))
  )
]

A Real Environment
After the last two sections we are warmed up and finally ready for some real stuff. On several occasions I have met developers who wondered why their code Long.getLong(stringContainingANumber) would not work. Well it worked, but it did not parse the String as they expected. This is because the Long.getLong() is a shortcut to access System.getProperty(). What they really wanted was Long.parseLong(). Here is the UnintendedEnvUsage rule:

//PrimaryExpression/PrimaryPrefix/Name[
  @Image='Boolean.getBoolean' or
  @Image='Integer.getInteger' or
  @Image='Long.getLong'
]

Care for Your Tests
PMD provides several rules to check JUnit tests. One of my favourite rules is JUnitTestsShouldIncludeAssert which avoids (the common) tests that do not assert anything. (Such tests just make sure that no Exception is thrown during their execution. This is fair enough but why bother to write them and not add some assert statements to make sure the code is behaving correctly.) Unfortunately, one "quick fix" for that problem is to add assertTrue(true). Rule UnnecessaryBooleanAssertion will protect your tests from such abominations.

A mistake I keep finding in JUnit 3.x tests is not calling super in test fixtures. The framework methods setUp() and tearDown() of class TestCase must always call super.setUp() and super.tearDown(). This is similar to constructor chaining to enable the proper preparation and cleaning up of resources. Whereas Findbugs defines a rule for that, PMD does not. So here is a simple XPath for JunitSetupDoesNotCallSuper rule:

//MethodDeclarator[
  ( @Image='setUp' and count(FormalParameters/*)=0 and
    count(../Block//PrimaryPrefix[@Image='setUp'])=0
  ) or
  ( @Image='tearDown' and count(FormalParameters/*)=0 and
    count(../Block//PrimaryPrefix[@Image='tearDown'])=0
  )
]

Obviously this expression catches more than is necessary: If you have a setUp() method outside of test cases this will also be flagged. Also it does not check the order of invocations, i.e. super.setUp() must be the first and super.tearDown() the last statement. For Spring's AbstractSingleSpringContextTests the methods onSetUp() and onTearDown() would have to be checked instead. So it's far from perfect, but it has still found a lot of bugs for me.

That's all for now. There are more rules that I could share with you, but this blog entry is already far too long. I've set up a repository pmd-rules containing the source code of the rules described here (and many more).