Code Cop: PMD

Showing posts with label PMD. Show all posts

19 October 2018

NATstyle Custom Reports

This summer I wrote about NATstyle, a utility to define and check the coding standard in your NATURAL program. Now NATURAL is not a language you will encounter frequently (hopefully not ;-) but it is an excellent show case for neither being intimidated by strange environments nor accepting idiosyncratic vendor views. First I showed how NATstyle can be used outside of the NATURAL IDE - a prerequisite for setting up Continuous Integration, then I experimented with ways to define my own custom analysis rules. Now is the time to talk about reporting.

Custom Stylesheets
NATstyle creates an XML report. Its structure is

<NATstyle>
  <file type='...' location='...'>
    <error severity='...' message='...' />
    ...
  </file>
  ...
</NATstyle>

NATstyle comes with a stylesheet NATstyleSimple.xsl. The XSLT transformation is not part of the NATstyle execution. Using custom stylesheets allows you to convert the XML into arbitrary reports.

Suitable Target Structure
When working on weird environments, I try to use standard tools as much as possible. Emulating common output formats is the way to go. For example in the case of test results, the JUnit XML format is used in similar scenarios, e.g. Erlang, Cobol and others.

What would be a good structure for reporting static code analysis findings? PMD is a common tool in the Java space and it creates XML reports which are understood by tools like Jenkins, making integration much easier. In the report, violations are grouped by files:

<pmd version="..." timestamp="...">
  <file name="SampleClass.java">
    <violation beginline="2" endline="16"
               begincolumn="8" endcolumn="1"
               rule="TooManyFields" ruleset="Code Size"
               externalInfoUrl="..." priority="3">
      Too many fields
    </violation>
  </file>
</pmd>

XML Parser
Changing the NATstyle result to PMD's format is possible using XSLT alone, but I favour XML parsers for more complex transformations. I established the following process to convert the results: As I used Ant to execute NATstyle on Jenkins, I created an Ant task NatStyleResultToPmdTask.java. (Ant declaration for that task is in ant/ant_NatStyleResultToPmd.xml.) The Ant task sets the input and output XML file names and invokes the NatStyleToPmd.java conversion which uses a SAX parser to parse the NATstyle result and feeds the violations into net.sourceforge.pmd.Report#addRuleViolation. Then core PMD classes create the final XML report. (The project containing the Ant task and complete, working setup is here.)

Conclusion
In legacy environments it is often possible to "bridge" to standard tools and make use of the rich tooling we have. We can apply modern practises like automated code reviews or Continuous Integration. Another example is bringing TDD and automated testing to the mainframe platform. It is 2018, we have all these great tools, let's make use of them!

11 January 2018

Compliance with Object Calisthenics

During my work as Code Cop I run many workshops. Sometimes I use constraints to make exercises more focused or more intense. Some constraints, like the Brutal Coding Constraints, are composite or aggregate constraints, which means that they are a combination of several simpler or low level constraints. (Here simple does not mean that they are easy to follow, rather that they focus on a single thing.) Today I want to discuss Object Calisthenics.

Object Calisthenics
Jeff Bay's Object Calisthenics is an aggregate constraint combined of the following nine rules:

Use only one level of indentation per method.
Don't use the else keyword.
Wrap all primitives and strings (in public API).
Use only one dot per line.
Don't abbreviate (long names).
Keep all entities small.
Don't use any classes with more than two instance variables.
Use first-class collections.
Don't use any getters/setters/properties.

If you are not familiar with these rules, I recommend you read Jeff Bay's original essay published in the The ThoughtWorks Anthology in 2008. William Durand's post is an exact copy of that essay, so no need to buy the book for that. Several people have followed up and discussed their interpretation and experience with Object Calisthenics, e.g. Mark Needham, Jeff Pace, Vasiliki Vockin and Juan Antonio.

Object Calisthenics is an exercise in object orientation. How is that? One of the core concepts of OOP is Abstraction: A class should capture one and only one key abstraction. Obviously primitive and built-in types lack abstraction. Wrapping primitives (rule #3) and wrapping collections (rule #8) drive the code towards more abstractions. Small entities (rule #6) help to keep our abstractions focused.

Further objects are defined by what they do, not what they contain. All data must be hidden within its class. This is Encapsulation. Rule #9, No Properties, forces you to stay away from accessing the fields from outside of the class.

Next to Abstraction and Encapsulation, these nine rules help Loose Coupling and High Cohesion. Loose Coupling is achieved by minimizing the number of messages sent between a class and its collaborator. Rule #4, One Dot Per Line, reduces the coupling introduced by a single line. This rule is misleading, because what Jeff really meant was the Law Of Demeter. The law is not about counting dots per line, it is about dependencies: "Only talk to your immediate friends." Sometimes even a single dot in a line will violate the law.

High Cohesion means that related data and behaviour should be in one place. This means that most of the methods defined on a class should use most of the data members most of the time. Rule #5, Don't Abbreviate, addresses this: When a name of a field or method gets long and we wish to shorten it, obviously the enclosing scope does not provide enough context for the name, which means that the element is not cohesive with the other elements of the class. We need another class to provide the missing context. Next to naming, small entities (rule #6) have a higher probability of being cohesive because there are less fields and methods. Limiting the number of instance variables (rule #7) also keeps cohesion high.

The remaining rules #1 and #2, One Level Of Indentation and No else aim to make the code simpler by avoiding nested code constructs. After all who wants to be a PHP Street Fighter. ;-)

Checking Code for Compliance with Object Calisthenics
When facilitating coding exercises with composite constraints, I noticed how easy it is to overlook certain violations. We are used to conditionals or dereferencing pointers that we might not notice them when reading code. Some rules like the Law Of Demeter or a maximum size of classes need a detailed inspection of the code to verify. To check Java code for compliance with Object Calisthenics I use PMD. PMD contains several rules we can use:

Rule java/coupling.xml/LawOfDemeter for rule #4.

Rule #6 can be checked with NcssTypeCount. A NCSS count of 30 is usually around 50 lines of code.

<rule ref="rulesets/java/codesize.xml/NcssTypeCount">
    <properties>
        <property name="minimum" value="30" />
    </properties>
</rule>

And there is TooManyFields for rule #7.

<rule ref="rulesets/java/codesize.xml/TooManyFields">
    <properties>
        <property name="maxfields" value="2" />
    </properties>
</rule>

I work a lot with PMD and have created custom rules in the past. I added rules for Object Calisthenics. At the moment, my Custom PMD Rules contain a rule set file object-calisthenics.xml with these rules:

java/constraints.xml/NoElseKeyword is very simple. All else keywords are flagged by the XPath expression //IfStatement[@Else='true'].
java/codecop.xml/FirstClassCollections looks for fields of known collection types and then checks the number of fields.
java/codecop.xml/OneLevelOfIntention
java/constraints.xml/NoGetterAndSetter needs a more elaborate XPath expression. It is checking MethodDeclarator and its inner Block/ BlockStatement/ Statement/ StatementExpression/ Expression/ PrimaryExpressions.
java/codecop.xml/PrimitiveObsession is implemented in code. It checks PMD's ASTConstructorDeclaration and ASTMethodDeclaration for primitive parameters and return types.

For the nitty-gritty details of all the rules have a look at the rules defined in codecop.xml and constraints.xml.

Interpretation of Rules: Indentation
When I read Jeff Bay's original essay, the rules were clear. At least I thought so. Verifying them automatically showed some areas where different interpretations are possible. Different people see Object Calisthenics in different ways. In comparison, the Object Calisthenics rules for PHP_CodeSniffer implement One Level Of Indentation by allowing a nesting of one. For example there can be conditionals and there can be loops, but no conditional inside of a loop. So the code is either formatted at method level or indented one level deep. My PMD rule is more strict: Either there is no indentation - no conditional, no loop - or everything is indented once: for example, if there is a loop, than the whole method body must be inside this loop. This does not allow more than one conditional or loop per method. My rule follows Jeff's idea that each method does exactly one thing. Of course, I like my strict version, while my friend Aki Salmi said that I went to far as it is more like Zero Level Of Indentation. Probably he is right and I will recreate this rule and keep the Zero Level Of Indentation for the (upcoming) Brutal version of Object Calisthenics. ;-)

Wrap All Primitives
There is no PHP_CodeSniffer rule for that, as Tomas Votruba considers it "too strict, vague or annoying". Indeed, this rule is very annoying if you use primitives all the way and your only data structure is an associative array or hash map. All containers like java.util.List, Set or Map are considered primitive as well. Samir Talwar said that every type that was not written by yourself is primitive because it is not from your domain. This prohibits the direct usage of Files and URLs to name a few, but let's not go there. (Read more about the issue of primitives in one of my older posts.)

My rule allows primitive values in constructors as well as getters to implement the classic Value Object pattern. (The rule's implementation is simplistic and it is possible to cheat by passing primitives to constructors. And the getters will be flagged by rule #9, so no use for them in Object Calisthenics anyway.)

I agree with Tomas that this rule is too strict, because there is no point in wrapping primitive payloads, e.g. strings that are only displayed to the user and not acted on by the system. These will be false positives. There are certain methods with primitives in their signatures like equals and hashCode that are required by Java. Further we might have plain numbers in our domain or we use indexing of some sort, both will be false positives, too.

One Dot Per Line
As I said before, I use PMD's LawOfDemeter to verify rule #4. The law allows sending messages to objects that are

the immediate parts of this or
the arguments of the current method or
objects created inside the current method or
objects in global variables.

I did not look at PMD's source code to check the implementation of this rule - but it complains a lot. For me this is the most difficult rule of all nine rules. (I code according to #1, #3, #5 and #6 and can easily adapt to strictly follow #2, #7, #8 and #9.) Although it complains a lot, I found every violation correct. I learned much about Law Of Demeter by checking my code for violations. For example, calling methods on an element of an array is a violation. The indexed array access is similar to a pointer access. (In Ruby this is obvious because Array defines a method def [](index).) Another interesting fact is that (at least in PMD) the law flags calling methods on enums. The enum instances are not created locally, so we cannot send them messages. On the other hand, an enum is a global variable, so maybe it should be allowed to call methods on it.

The PHP_CodeSniffer rule follows the rule's name and checks that there is only one dot per line. This creates better code, because train wrecks will be split into explaining variables which make debugging easier. Also Tomas is checking for known fluent interfaces. Fluent interfaces - by definition - look like they are violating the Law Of Demeter. As long as the fluent interface returns the same instance, as for example basic builders do, there is no violation. When following a more relaxed version of the law, the Class Version of Law Of Demeter, than different implementations of the same type are still possible. The Java Stream API, where many calls return a new Stream instance of a different class - or the same class with a different generic type - is likely to violate the law. It does not matter. Fluent interfaces are designed to improve readability of code. Law Of Demeter violations in fluent interfaces are false positives.

Don't Abbreviate
I found it difficult to check for abbreviations, so rule #5 is not enforced. I thought of implementing this rule using a dictionary, but that is prone to false positives as the dictionary cannot contain all terms from all domains we create software for. The PHP_CodeSniffer rules check for names shorter than three characters and allow certain exceptions like id. This is a good start but is not catching all abbreviations, especially as the need to abbreviate arises from long names. Another option would be to analyse the name for its camel case patterns, requiring all names to contain lowercase characters between the uppercase ones. This would flag acronyms like ID or URL but no real abbreviations like usr or loc.

Small Entities
Small is relative. Different people use different limits depending on programming language. Jeff Bay's 50 lines work well for Java. Rafael Dohms proposes to use 200 lines for PHP. PHP_CodeSniffer checks function length and number of methods per class, too. Fabian Schwarz-Fritz limits packages to ten classes. All these additional rules follow Jeff Bay's original idea and I will add them to the rule set in the future.

Two Instance Variables
Allowing only two instance variables seems arbitrary - why not have three or five. Some people have changed the rules to allow five fields. I do not see how the choice of language makes a difference. Two is the smallest number that allows composition of object trees.

In PHP_CodeSniffer there is no rule for this because the number depends on the "individual domain of each project". When an entity or value object consists of three or more equal parts, the rule will flag the code but there is no problem. For example, a class BoundingBox might contain four fields top, left, bottom, right. Depending on the values, introducing a new wrapper class Coordinate to reduce these fields to topLeft and bottomRight might make sense.

No Properties
My PMD rule finds methods that return an instance field (a getter) or update it (a setter). PHP_CodeSniffer checks for methods using the typical naming conventions. It further forbids the usage of public fields, which is a great idea. As we wrapped all primitives (rule #3) and we have no getters, we can never check their values. So how do we create state based tests? Mark Needham has discussed "whether we should implement equals and hashCode methods on objects just so that we can test their equality. My general feeling is that this is fine although it has been pointed out to me that doing this is actually adding production code just for a test and should be avoided unless we need to put the object into a HashMap or HashSet."

From what I have seen, most object oriented developers struggle with that constraint. Getters and setters are very ingrained. In fact some people have dropped that constraint from Object Calisthenics. There are several ways to live without accessors. Samir Talwar has written why avoiding Getters, Setters and Properties is such a powerful mind shift.

Java Project Setup
I created a repository containing the starting point for the LCD Numbers Kata:

lcd-numbers-object-calisthenics-java-setup.

Both are Apache Maven projects. The projects are set up to check the code using the Maven PMD Plugin on each test execution. Here is the relevant snippet from the pom.xml:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-pmd-plugin</artifactId>
      <version>3.7</version>
      <configuration>
        <printFailingErrors>true</printFailingErrors>
        <linkXRef>false</linkXRef>
        <typeResolution>true</typeResolution>
        <targetJdk>1.8</targetJdk>
        <sourceEncoding>${encoding}</sourceEncoding>
        <includeTests>true</includeTests>
        <rulesets>
          <ruleset>/rulesets/java/object-calisthenics.xml</ruleset>
        </rulesets>
      </configuration>
      <executions>
        <execution>
          <phase>test</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>org.codecop</groupId>
          <artifactId>pmd-rules</artifactId>
          <version>1.2.3</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>

You can add this snippet to any Maven project and enjoy Object Calisthenics. The Jar file of pmd-rules is available in my personal Maven repository.

To test your setup there is sample code in both projects and mvnw test will show two violations:

[INFO] PMD Failure: SampleClass.java:2 Rule:TooManyFields Priority:3 Too many fields.
[INFO] PMD Failure: SampleClass:9 Rule:NoElseKeyword Priority:3 No else keyword.

It is possible to check the rules alone with mvnw pmd:check. (Using the Maven Shell the time to run the checks is reduced by 50%.) There are two run_pmd scripts, one for Windows (.bat) and one for Linux (.sh).

Object Calisthenics Retrospective

Limitations of Checking Code
Obviously code analysis cannot find everything. On the other hand - as discussed earlier - some violations will be false positives, e.g. when using the Stream API. You can use // NOPMD comments and @SuppressWarnings("PMD") annotations to suppress false positives. I recommend using exact suppressions, e.g. @SuppressWarnings("PMD.TooManyFields") to skip violations because other violations at the same line will still be found. Use your good judgement. The goal of Object Calisthenics is to follow all nine rules, not to suppress them.

Learnings
Object Calisthenics is a great exercise. I used it all of my workshops on Object Oriented Programming and in several exercises I did myself. The verification of the rules helped me and the participants to follow the constraints and made the exercise more strict. (If people were stuck I sometimes recommended to ignore one or another PMD violations, at least for some time.) People liked it and had insights into object orientation: It is definitely a "different" and "challenging way to code". "It is good to have small classes. Now that I have many classes, I see more structure." You should give it a try, too. Jeff Bay even recommends to run an exercise or prototype of at least 1000 lines for at least 20 hours.

The question if Object Calisthenics is applicable to real working systems remains. While it is excellent for exercise, it might be too strict to be used in production. On the other hand, in his final note, Jeff Bay talks about a system of 100,000 lines of code written in this style, where the "people working on it feel that its development is so much less tiresome when embracing these rules".

3 December 2017

PMD Check and Report in same build

I am working together with senior developer and (coding) architect Elisabeth Blümelhuber to set up a full featured continuous delivery process for the team. The team's projects use Java and are built with Maven.

Using PMD for Static Code Analysis
After using Jenkins for some time to run the tests, package and deploy the products, it was time to make it even more useful: Add static code analysis. As a first step Elisabeth added a PMD report of a small set of important rules to the Maven parent of all projects. PMD creates a pmd.xml in the target folder which is picked up by Jenkins' PMD Plugin. Jenkins displays the found violations and tracks changes over time, showing a basic trend graph. (While SonarQube would be more powerful, we decided to stay with Jenkins because the team was already "listening" to it.)

Breaking the Build on Critical Violations
I like breaking the build on critical violations to ensure the developers' attention. It is vital, though, to achieve the acceptance of the team members when changing their development process. We thus started with a custom, minimal set of rules (in src/config/pmd_mandatory.xml) that would break the build. The smaller the initial rule set is the better. In the beginning of adding static code analysis to the build process, it is not about the code but getting the team aboard - we can always add more rules later. The first rule set might even contain a single rule, e.g. EmptyCatchBlock. Empty Catch blocks are a well known problem when analysing defects and usually developers agree with the severity of having them in the code and accept breaking the build for that. On the other hand, breaking the build on minor or formatting issues is not recommended in the beginning.

Here is the snippet of our pom.xml that breaks the build:

<build>
  ...
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-pmd-plugin</artifactId>
      <configuration>
        <failOnViolation>true</failOnViolation>
        <printFailingErrors>true</printFailingErrors>
        <rulesets>
          <ruleset>.../pmd_mandatory.xml</ruleset>
        </rulesets>
        ... other settings
      </configuration>
      <executions>
        <execution>
          <id>pmd-break</id>
          <phase>prepare-package</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

This is more or less taken directly from the PMD Plugin documentation. After running the tests, PMD checks the code.

Keeping a Report of Major Violations
We wanted to keep the report Elisabeth had established previously. We tried to add another <execution> element for that. As executions can have their own <configuration> we thought that this would work, but it did not. PMD just ignored the second configuration. (Maybe this is a general Maven issue. For example the Maven Failsafe Plugin is a copy of the Surefire plugin to allow both plugins to have different configurations.)

The PMD plugin offers a report for the Maven site which is configured independently. As a workaround for the above problem, we used the site report to check the rules listed in src/config/pmd_report.xml. The PMD report invocation created the needed target/pmd.xml as well as a readable target/site/pmd.html.

<reporting>
  <plugins>
    ... other plugins
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-pmd-plugin</artifactId>
      <configuration>
        <rulesets>
          <ruleset>.../pmd_report.xml</ruleset>
        </rulesets>
        ... other settings
      </configuration>
    </plugin>
  </plugins>
</reporting>

Skipping Maven Standard Reports
Unfortunately mvn site also created other reports which we did not need and which slowed down the build. Maven standard reports can be selected using the Maven Project Info Reports Plugin. It is possible to set its <reportSet> empty, not creating any reports:

<reporting>
  <plugins>
    ... other plugins
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-project-info-reports-plugin</artifactId>
      <version>2.9</version>
      <reportSets>
        <reportSet>
          <reports>
            <!-- empty - no reports -->
          </reports>
        </reportSet>
      </reportSets>
    </plugin>
  </plugins>
</reporting>

Now it did not create the standard reports. It only generated target/site/project-reports.html with a link to the pmd.html and no other HTML reports. Win.

Skipping CPD Report
By default, the PMD plugin invokes PMD and CPD. CPD is checking for duplicate code - and is very useful - but we did not want to use it right now. As I said before, we wanted to start small. All plugins have goals which are explained in the documentation. Obviously the Maven report invokes PMD plugin's goals pmd:pmd and pmd:cpd. How do we tell a report which goals to invoke? That was the hardest problem to solve because we could not find any documentation on that. It turned out that each reporting plugin can be configured with <reportSets> similar to the Maven Project Info Reports Plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-pmd-plugin</artifactId>
  <configuration>
    ... same as above
  </configuration>
  <reportSets>
    <reportSet>
      <reports>
        <report>pmd</report>
      </reports>
    </reportSet>
  </reportSets>
</plugin>

Putting Everything Together
We execute the build with

mvn clean verify site

If there is a violation of the mandatory rules, the build breaks and Maven stops. Otherwise site generates the PMD report. If there are no violations at all, Maven does not create a pmd.html. There is always a pmd.xml, so Jenkins is always happy.

(The complete project (compressed as zip) is here.)

7 May 2010

Custom PMD Rules

Last month I submitted my fourth article from the 'Code Cop' series. After Daily Build, Daily Code Analysis and Automated Testing I wrote about checking architecture rules. One approach described there used PMD. In case you don't know what PMD is, it's a static code analysis tool for Java. It was first developed in 2002 and is updated regularly. It checks the abstract syntax tree (AST) of Java code. (PMD is a great tool. You should definitely use it!)

PMD vs. AST (I just love TLAs)
PMD comes with many predefined rules. Some of them are real life-savers, e.g. EmptyCatchBlock. It's possible to define your own rules. The easiest way to do this is to use XPath expressions to match patterns in the AST. For example, the following trivial class

package org.codecop.myapp.db;

import java.sql.Connection;

public class DBSearch {
   ...
}

is represented inside PMD by the following AST

+ PackageDeclaration
|    + Name:org.codecop.myapp.db
+ ImportDeclaration
|    + Name:java.sql.Connection
+ TypeDeclaration
     + ClassOrInterfaceDeclaration:DBSearch
          + ClassOrInterfaceBody
               + ...

An XPath expression to find imports from the java.sql package looks like

//ImportDeclaration[starts-with(Name/@Image, 'java.sql.')]

It is quite simple (at least when you are familiar with XPath ;-) For further details read the PMD XPath Rule Tutorial.

I've always been enthusiastic about static code analysis and PMD in particular and have been using it successfully since 2004. (For example see my very first presentation on static code analysis with PMD.) I love it; It's a great tool. I have created several custom rules to enforce consistent coding conventions and to prevent bad coding practices. Today I am going to share some of these rules with you in this post.

First Rule
One common bug is public boolean equals(MyClass o) instead of public boolean equals(Object o) which shows that the developer is trying (and failing) to override the equals(Object) method of Object in MyClass. It will work correctly when invoked directly with a MyClass instance but will fail otherwise, e.g. when used inside collections. Such suspiciously close (but still different) equals methods are matched by

//ClassDeclaration//MethodDeclarator
  [@Image = 'equals']
  [count(FormalParameters/*)=1]
  [not ( FormalParameters//Type/Name[
           @Image='Object' or @Image='java.lang.Object' ] ) ]

This rule explained in plain English matches all the method declarations that are named equals, and have one parameter which type is neither Object nor java.lang.Object. (Object is mentioned twice because PMD analyses the source and therefore can't know about simple and full qualified class names.) Later, Tom included this SuspiciousEqualsMethodName rule into PMD's default set of rules.

Copy and Paste
The easiest way to define your own rule is to take an existing one and tweak it. Like SuspiciousEqualsMethodName was derived from SuspiciousHashcodeMethodName, the next JumbledIterator is quite similar to JumbledIncrementer. (JumbledIterator was created by Richard Beitelmair, one of my colleagues who appointed me "Code Cop" and presented me with my first Code Cop T-shirt.) So what's wrong with the following line?

for (Iterator it1 = iterator(); it2.hasNext(); ) { ... }

Most likely it2 should be it1, shouldn't it. Richard created the following rule to pick up on these kinds of errors:

//ForStatement[
  ( ForInit//ClassOrInterfaceType/@Image='Iterator' or
    ForInit//ClassOrInterfaceType/@Image='Enumeration'
  ) and
  ( ends-with(Expression//Name/@Image, '.hasNext') or
    ends-with(Expression//Name/@Image, '.hasMoreElements')
  ) and not (
    starts-with(Expression//Name/@Image,
      concat(ForInit//VariableDeclaratorId/@Image, '.'))
  )
]

A Real Environment
After the last two sections we are warmed up and finally ready for some real stuff. On several occasions I have met developers who wondered why their code Long.getLong(stringContainingANumber) would not work. Well it worked, but it did not parse the String as they expected. This is because the Long.getLong() is a shortcut to access System.getProperty(). What they really wanted was Long.parseLong(). Here is the UnintendedEnvUsage rule:

//PrimaryExpression/PrimaryPrefix/Name[
  @Image='Boolean.getBoolean' or
  @Image='Integer.getInteger' or
  @Image='Long.getLong'
]

Care for Your Tests
PMD provides several rules to check JUnit tests. One of my favourite rules is JUnitTestsShouldIncludeAssert which avoids (the common) tests that do not assert anything. (Such tests just make sure that no Exception is thrown during their execution. This is fair enough but why bother to write them and not add some assert statements to make sure the code is behaving correctly.) Unfortunately, one "quick fix" for that problem is to add assertTrue(true). Rule UnnecessaryBooleanAssertion will protect your tests from such abominations.

A mistake I keep finding in JUnit 3.x tests is not calling super in test fixtures. The framework methods setUp() and tearDown() of class TestCase must always call super.setUp() and super.tearDown(). This is similar to constructor chaining to enable the proper preparation and cleaning up of resources. Whereas Findbugs defines a rule for that, PMD does not. So here is a simple XPath for JunitSetupDoesNotCallSuper rule:

//MethodDeclarator[
  ( @Image='setUp' and count(FormalParameters/*)=0 and
    count(../Block//PrimaryPrefix[@Image='setUp'])=0
  ) or
  ( @Image='tearDown' and count(FormalParameters/*)=0 and
    count(../Block//PrimaryPrefix[@Image='tearDown'])=0
  )
]

Obviously this expression catches more than is necessary: If you have a setUp() method outside of test cases this will also be flagged. Also it does not check the order of invocations, i.e. super.setUp() must be the first and super.tearDown() the last statement. For Spring's AbstractSingleSpringContextTests the methods onSetUp() and onTearDown() would have to be checked instead. So it's far from perfect, but it has still found a lot of bugs for me.

That's all for now. There are more rules that I could share with you, but this blog entry is already far too long. I've set up a repository pmd-rules containing the source code of the rules described here (and many more).

19 January 2008

Daily Build Articles

In 2007 I started writing articles about Daily Builds for Java applications, which I called the 'Code Cop' Series. Unfortunately I just managed to finish two articles so far, shown below. I have a lot of material for further articles about adding automated testing and enforcing architecture to our daily build, just have to squeeze in the time to do it ;-)

Part 1 - Daily Build with Anthill OS
This article describes my experiences when introducing a daily build in 2004 when I used the Anthill tool. The first steps were to create JavaDoc pages daily and to compile the Java sources. It turned out that the initial set-up of these build routines did not cost much and were supported by the team. Obviously this is only the start of better quality code. Read more in the full article Täglicher Build mit Anthill OS published in JavaSPEKTRUM, 5/2007.

References

S. McConnell, Daily Build and Smoke Test, in IEEE Software, Vol. 13, No. 4, 1996
J. Spolsky, Daily Builds Are Your Friend, Joel on Software, 2001
J. Spolsky, Getting Things Done When You're Only a Grunt, Joel on Software, 2001
Urbancode, Inc, Anthill OS

Other build tools are Anthill Pro, Continuum, Cruise Control, Maven, Luntbuild (incomplete list).

Part 2 - Daily Code Analysis with PMD
This article introduces static code analysis with PMD. The existing daily build was extended easily. A daily report of the code quality metrics awakened the management and was used as a base to check for a small set of errors. The most serious of them were fixed and part of the coding conventions have been checked automatically since then. Read more in the full article Tägliche Code-Analyse mit PMD published in JavaSPEKTRUM, 1/2008.

References

P. Duvall, Automation for the people: Continuous Inspection, IBM developerWorks, 2006
E. R. Harold, Zap bugs with PMD, IBM developerWorks, 2005
SourceForge.net, PMD

Other code analysis tools are Checkstyle, Enerjy CQ2, Findbugs, JavaNCSS, JLint, Jtest, Optimal Advisor, Sotograph (incomplete list).

(Download source code of Ant-PMD integration and BuildStatistics.java.)

FAQ
Q: Why did you favour Anthill OS instead of all other build tools listed? A: There was no evaluation or decision process. Anthill OS was just the first tool I got my hands on. Everything worked fine, so I did not look further. For a new project I would use Cruise Control because it is actively maintained and has a strong community. Paul Duvall has written a nice comparison of open source CI servers. If your build process is "heavy", you might want to have a look at commercial build servers. Some of them offer build farms. e.g. Anthill Pro, Parabuild or Team City.

Q: You divided all rules of PMD into four groups: error, warning, information and formatting. How can I get this categorisation? A: PMD comes with build in severity. Each rule definition contains an <priority> element. There are also some commercial tools that use PMD under the hood and have their own severity levels. Some of them even have references for each rule, why it's bad.

Q: You use a program called BuildRuleDoc to document used rules. Is it free or home-grown? A: I wrote it myself, but you can use it if you want to. The BuildRuleDoc.zip contains the code, the template to create the rule document, an Ant fragment and some test scripts. You will have to adapt the scripts in order to run them. Finally you need the XML rule set file of active PMD rules to generate the report.

(List of all my publications with abstracts.)