13 September 2011

Getting Started with Hackergarten

Hackergarten is a computer programming contributor group. Read more about Hackergarten here. To join the group, either find an existing Hackergarten near you or start your own. This is a step-by-step list of what I did when I started Hackergarten Vienna.

The Simple Way to Start a Hackergarten
  1. Contact a few people that might be interested.
  2. Find some Open Source committers and the smallest tickets/ideas to implement from their projects. Usually committers are short on personal time anyway and will love to find someone who is willing to help.
  3. Ask them to create a list of possible things to do in advance and publish it. Maybe have a theme for a night.
  4. Determine the preconditions (what is needed to code...) and publish them, so people are able to install everything beforehand.
  5. Find a place with wireless network. Most likely this is the office of a small company, e.g. Canoo or Sphinx. You might also find some pub with a separate room/wireless for no charge.
  6. Negotiate a proper date and time, at least for the first meeting. After that, a fixed time, e.g. the first Tuesday of each month, is probably best.
  7. Meet.
  8. Discuss the agenda for max. 15 minutes, do not discuss too long.
  9. Have food and drink ready so people do not have to leave for it.
Read the Hackergarten FAQ for further details and what to avoid.

What to Do
All kinds of contributions have been made during a Hackergarten. Here are some general ideas for what to do:
  • Fixing an outstanding ticket (bug) in project X, submitting a patch.
  • Writing Javadocs/doc pages for project Y.
  • Building a plugin for project Z.
  • Making screen casts showing how you can integrate W with Q.
  • Writing a new feature for project P.
  • Creating a kata cast.
Everything from Ways to Contribute to Open Source without Being a Programming Genius or a Rock Star applies as well.

12 August 2011

Word Wrap Kata Variants

It's time for some exercise - time for a code kata. I like the simple ones because they don't take much time and still provide a certain amount of training. After having done the Prime Factors Kata more than 20 times using Java, Ruby, C#, Turbo Pascal, BASIC and even Forth, I feel like trying a new one for a change. I chose the Word Wrap Kata, also by Uncle Bob. Word Wrap seems to be less popular than Prime Factors; only a few experiences with it have been published, e.g. Word Wrap using Python.

Recursive
The kata's task is to write a function that, like a word processor, breaks the line by replacing the last space in a line with a newline. The most straightforward solution to this is IMHO the recursive one. I only need to find the first break, which is either the last blank within the line limit or a forced break at the limit, and call myself with the remaining text.
public String wrap(String line, int maxLineLen) {
   if (line.length() <= maxLineLen) {
      return line;
   }
   int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
   int split;
   int offset;
   if (indexOfBlank > -1) {
      split = indexOfBlank;
      offset = 1;
   } else {
      split = maxLineLen;
      offset = 0;
   }
   return line.substring(0, split) + NEWLINE +
      wrap(line.substring(split + offset), maxLineLen);
}
Usually a kata is not about the final solution, but about the process to get there. Still I want to compare different solutions, so let's analyse this one. If the line needs to be split n times (into n+1 shorter lines) then this solution creates 3*n String objects and an additional n StringBuilders for string concatenation. ... 4*n objects are created.
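The listings use two constants, BLANK and NEWLINE, whose definitions are not shown. Here is a minimal, self-contained sketch of the recursive variant, assuming both are plain chars (the class name and the sample inputs are mine), to show what the function returns for a soft break and for a forced break:

```java
class RecursiveWrapDemo {
   // Assumed definitions; the post does not show these constants.
   static final char BLANK = ' ';
   static final char NEWLINE = '\n';

   static String wrap(String line, int maxLineLen) {
      if (line.length() <= maxLineLen) {
         return line;
      }
      // Last blank at or before the limit, or -1 if there is none.
      int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
      int split;
      int offset;
      if (indexOfBlank > -1) {
         split = indexOfBlank; // break at the blank ...
         offset = 1;           // ... and skip it
      } else {
         split = maxLineLen;   // forced break inside a long word
         offset = 0;
      }
      return line.substring(0, split) + NEWLINE +
         wrap(line.substring(split + offset), maxLineLen);
   }

   public static void main(String[] args) {
      // The blank is replaced by the newline.
      System.out.println(wrap("word word", 6));
      // No blank available, so the text is cut every 3 characters.
      System.out.println(wrap("abcdefgh", 3));
   }
}
```

A blank sitting exactly at index maxLineLen still counts as a valid split point, because the text before it has exactly maxLineLen characters.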

Tail Recursive
To be tail recursive a function's last statement must be the recursive call.
public String wrap(String line, int maxLineLen) {
   StringBuilder accumulator = new StringBuilder();
   wrap(line, maxLineLen, accumulator);
   return accumulator.toString();
}
private void wrap(String remainingLine, int maxLineLen, StringBuilder accumulator) {
   if (remainingLine.length() <= maxLineLen) {
      accumulator.append(remainingLine);
      return;
   }
   int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
   int split;
   int offset;
   if (indexOfBlank > -1) {
      split = indexOfBlank;
      offset = 1;
   } else {
      split = maxLineLen;
      offset = 0;
   }
   accumulator.append(remainingLine.substring(0, split));
   accumulator.append(NEWLINE);
   wrap(remainingLine.substring(split + offset), maxLineLen, accumulator);
}
This solution creates the new String in the very end. It needs 2 Strings per new line and only one StringBuilder and a final String to return it. ... 2*n+2 objects are created.

Loop
A tail recursive function can be rewritten to reuse the stack frame, which transforms it into a plain loop. As Java does not support that optimisation, I do it by hand.
public String wrap(String line, int maxLineLen) {
   StringBuilder accumulator = new StringBuilder();
   String remainingLine = line;
   while (remainingLine.length() > maxLineLen) {
      int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
      int split;
      int offset;
      if (indexOfBlank > -1) {
         split = indexOfBlank;
         offset = 1;
      } else {
         split = maxLineLen;
         offset = 0;
      }
      accumulator.append(remainingLine.substring(0, split));
      accumulator.append(NEWLINE);
      remainingLine = remainingLine.substring(split + offset);
   }
   accumulator.append(remainingLine);
   return accumulator.toString();
}
The loopy :-) solution creates the same number of objects as the tail recursive one, but has reduced call overhead. ... Still 2*n+2 objects are created.

Optimised Loop
Let's optimise away the splitting of the remaining line. It gets split again in the next iteration, so all these Strings are only used temporarily.
public String wrap(String line, int maxLineLen) {
   StringBuilder accumulator = new StringBuilder();
   int pos = 0;
   while (pos + maxLineLen < line.length()) {
      int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
      int split;
      int offset;
      if (indexOfBlank > pos - 1) {
         split = indexOfBlank;
         offset = 1;
      } else {
         split = pos + maxLineLen;
         offset = 0;
      }
      accumulator.append(line.substring(pos, split));
      accumulator.append(NEWLINE);
      pos = split + offset;
   }
   accumulator.append(line.substring(pos));
   return accumulator.toString();
}
Now only one String is created per new line, one for the last remaining part and one after the final concatenation. ... n+3 objects are created.

Using a Buffer
If I could get rid of half of the String splitting, why not drop the other half too?
public String wrap(String line, int maxLineLen) {
   StringBuilder accumulator = new StringBuilder();
   accumulator.append(line);
   int pos = 0;
   int inserted = 0;
   while (pos + maxLineLen < line.length()) {
      int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
      if (indexOfBlank > pos - 1) {
         accumulator.setCharAt(inserted + indexOfBlank, NEWLINE);
         pos = indexOfBlank + 1;
      } else {
         accumulator.insert(inserted + pos + maxLineLen, NEWLINE);
         pos = pos + maxLineLen;
         inserted++;
      }
   }
   return accumulator.toString();
}
Only one StringBuilder and one String are created. ... 2 objects are created. This is definitely the most garbage collector friendly solution because it creates only one temporary object, the StringBuilder.
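The index arithmetic with inserted is easy to get wrong, so a quick cross-check against the recursive version seems worthwhile. The following is only a sketch: the class name, the helper sameResult and the sample inputs are made up for illustration, and BLANK and NEWLINE are assumed to be chars (setCharAt() requires a char).

```java
class WrapVariantsCheck {
   // Assumed definitions; the post does not show these constants.
   static final char BLANK = ' ';
   static final char NEWLINE = '\n';

   // Reference implementation: the recursive variant.
   static String wrapRecursive(String line, int maxLineLen) {
      if (line.length() <= maxLineLen) {
         return line;
      }
      int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
      int split = indexOfBlank > -1 ? indexOfBlank : maxLineLen;
      int offset = indexOfBlank > -1 ? 1 : 0;
      return line.substring(0, split) + NEWLINE +
         wrapRecursive(line.substring(split + offset), maxLineLen);
   }

   // The buffer variant: edits a single StringBuilder in place.
   static String wrapBuffer(String line, int maxLineLen) {
      StringBuilder accumulator = new StringBuilder();
      accumulator.append(line);
      int pos = 0;      // position in the original line
      int inserted = 0; // newlines inserted so far (shifts buffer indices)
      while (pos + maxLineLen < line.length()) {
         int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
         if (indexOfBlank > pos - 1) {
            accumulator.setCharAt(inserted + indexOfBlank, NEWLINE);
            pos = indexOfBlank + 1;
         } else {
            accumulator.insert(inserted + pos + maxLineLen, NEWLINE);
            pos = pos + maxLineLen;
            inserted++;
         }
      }
      return accumulator.toString();
   }

   // Hypothetical helper: do both variants agree for this input?
   static boolean sameResult(String line, int maxLineLen) {
      return wrapRecursive(line, maxLineLen).equals(wrapBuffer(line, maxLineLen));
   }

   public static void main(String[] args) {
      String[] samples = { "", "word", "word word word", "abcdefgh", "abcdefgh ij" };
      for (String sample : samples) {
         for (int width = 1; width <= 10; width++) {
            if (!sameResult(sample, width)) {
               System.out.println("mismatch for '" + sample + "' at width " + width);
            }
         }
      }
   }
}
```

The sample "abcdefgh ij" is the interesting one, because it mixes a forced break (which shifts all later buffer indices by one) with a replaced blank.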

Copying Characters
If there are blanks in line then all characters get copied once in append() and all solutions behave similarly. (The method substring() does not copy characters, it just creates a new String with different pointers into the character array of the original String. This holds for Java 6 and early Java 7; from Java 7 update 6 on, substring() copies the characters.) But if there are no blanks in line then the last solution copies all characters after the split point around one or more times. Copying large numbers of characters might be slower than allocating an object, especially as the JVM/Hotspot is optimised for short lived, small objects.

Resizing the StringBuilder
For all solutions using an explicit StringBuilder another optimisation is to avoid automatic resizing. The capacity of accumulator must be large enough to contain all additional newlines. If a line has a blank then it's replaced, so no new character is added. Only when a line contains no blank is a newline inserted. This can happen up to lineLen / maxLineLen times. Flooring (rounding down in integer division) is the right thing because after the last part, e.g. remainingLine, nothing is added.
...
   StringBuilder accumulator =
      new StringBuilder(calcMaxSize(line.length(), maxLineLen));
...
private int calcMaxSize(int lineLen, int maxLineLen) {
   int maxCharsAdded = lineLen / maxLineLen;
   return lineLen + maxCharsAdded;
}
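As a worked example of that bound, here is a small sketch that plugs calcMaxSize into the optimised loop and compares the predicted capacity with the actual length of the result. The class name and the sample text are mine, and BLANK and NEWLINE are again assumed to be chars.

```java
class PresizedWrapDemo {
   // Assumed definitions; the post does not show these constants.
   static final char BLANK = ' ';
   static final char NEWLINE = '\n';

   static int calcMaxSize(int lineLen, int maxLineLen) {
      int maxCharsAdded = lineLen / maxLineLen; // integer division floors
      return lineLen + maxCharsAdded;
   }

   // The optimised loop with a pre-sized StringBuilder.
   static String wrap(String line, int maxLineLen) {
      StringBuilder accumulator =
         new StringBuilder(calcMaxSize(line.length(), maxLineLen));
      int pos = 0;
      while (pos + maxLineLen < line.length()) {
         int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
         int split;
         int offset;
         if (indexOfBlank > pos - 1) {
            split = indexOfBlank;
            offset = 1;
         } else {
            split = pos + maxLineLen;
            offset = 0;
         }
         accumulator.append(line.substring(pos, split));
         accumulator.append(NEWLINE);
         pos = split + offset;
      }
      accumulator.append(line.substring(pos));
      return accumulator.toString();
   }

   public static void main(String[] args) {
      String worstCase = "abcdefghij"; // 10 characters, no blank
      String wrapped = wrap(worstCase, 3);
      // 10 / 3 = 3 newlines are added, so both numbers are 13.
      System.out.println(wrapped.length() + " of " + calcMaxSize(worstCase.length(), 3));
   }
}
```

For a text whose length is an exact multiple of maxLineLen, e.g. "abcdef" with 3, the bound is one character too generous (8 instead of 7), which is harmless.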
Note that all these optimisations are highly theoretical. In a typical web or database application it doesn't matter at all if a few temporary objects are created or not. Remember that "Premature optimisation is the root of all evil" (Donald Knuth). I would stick with the first solution because it's the shortest and easy to understand.

Bonus Round
public String wrap(String line, int maxLineLen) {
   return line.replaceAll("([^ ]{" + maxLineLen + "})" + // 1
      "(?=[^ ])" +                                       // 2
      "|" +                                              // 3
      "(.{1," + maxLineLen + "})" +                      // 4
      " ",                                               // 5
      "$1$2" + NEWLINE);
}
This solution is even shorter, just one statement using a Regular Expression. It replaces the split points with a newline or adds one. The pattern matches areas of line which (1) contain exactly maxLineLen characters that are not blanks and (2) which must be followed by a character that's not a blank (and which is not consumed), (3) or it matches areas of (4) one up to maxLineLen characters (5) which are followed by a blank. The match is replaced with the first (1) and second (4) groups together with a newline. Only one of the two groups takes part in any given match, the other one is empty. The single blank (5) is not a member of any group and is dropped.
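To see both alternatives of the pattern at work, here is a small self-contained sketch. The class name and the sample inputs are mine, and NEWLINE is assumed to be a char; the pattern itself is the one from the listing, written on fewer lines.

```java
class RegexWrapDemo {
   // Assumed definition; the post does not show this constant.
   static final char NEWLINE = '\n';

   static String wrap(String line, int maxLineLen) {
      return line.replaceAll(
         "([^ ]{" + maxLineLen + "})(?=[^ ])" + // full run of non-blanks, more follow
         "|" +
         "(.{1," + maxLineLen + "}) ",          // up to maxLineLen chars, then a blank
         "$1$2" + NEWLINE);
   }

   public static void main(String[] args) {
      // First alternative: forced breaks inside a long word.
      System.out.println(wrap("abcdefgh", 3));
      // Second alternative: the blank after "word word" is dropped.
      System.out.println(wrap("word word word", 9));
   }
}
```

In Java's replaceAll() a group that did not take part in the match contributes an empty string to the replacement, which is why "$1$2" works for both alternatives.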

(Get the source.)

8 August 2011

Maven Plugin Harness Woes

Last year I figured out how to use the Maven Plugin Harness and started using it. I added it to my Maven plugin projects. Recently I started using DEV@cloud. DEV@cloud is a new service which contains Jenkins and Maven source repositories. CloudBees, the company behind DEV@cloud, offers a free subscription with reduced capabilities which are more than enough for small projects. I set up all my projects there in no time, but had serious problems with the integration tests.

Using a local Maven repository other than ~/.m2/repository
You don't have to use the default repository location. It's possible to define your own in the user's settings.xml or even in the global settings. But I guess most people just use the default. On the other hand, in an environment like DEV@cloud, all the different builds from different users must be separated. So CloudBees decided that each Jenkins job has its own Maven repository inside the job's workspace. That is good because the repository is deleted together with the project.

Problem
The Testing Harness embeds Maven, i.e. forks a new Maven instance. It fails to relay the modified settings to this new process. During the execution of the integration test a new local repository is created and the original local one is used as a remote one (called "local-as-remote"). But without any hints, Maven uses ~/.m2/repository. So the true local repository is not found and all needed artefacts are downloaded again. This takes a lot of time (and wastes bandwidth). Dependencies that exist only in the local repository, e.g. snapshots of dependent projects, are not found and the integration test fails.

Solution
RepositoryTool.findLocalRepositoryDirectoy() uses an instance of MavenSettingsBuilder to get the settings. Its only implementing class is DefaultMavenSettingsBuilder and it tries to determine the repository location from the value of the system property maven.repo.local. Then it reads the user settings and in the end it uses ~/.m2/repository. The solution is to set the maven.repo.local system property whenever the local repository is not under ~/.m2/repository. Add -Dmaven.repo.local=.repository into the field for Goals and Options of the Jenkins job configuration.
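The lookup order described above can be sketched in a few lines. This is not the code of DefaultMavenSettingsBuilder, just an illustration of the order of precedence; the class LocalRepoLookupSketch and the method resolveLocalRepository are hypothetical, and reading the value from settings.xml is left out (it is passed in as a plain String).

```java
import java.io.File;

class LocalRepoLookupSketch {
   // Hypothetical helper, not part of Maven: mirrors the described precedence.
   //   1. system property maven.repo.local
   //   2. local repository configured in the user settings
   //   3. default ~/.m2/repository
   static File resolveLocalRepository(String repoLocalProperty,
                                      String repoFromUserSettings,
                                      String userHome) {
      if (repoLocalProperty != null && !repoLocalProperty.isEmpty()) {
         return new File(repoLocalProperty);
      }
      if (repoFromUserSettings != null && !repoFromUserSettings.isEmpty()) {
         return new File(repoFromUserSettings);
      }
      return new File(new File(userHome, ".m2"), "repository");
   }

   public static void main(String[] args) {
      // Run with -Dmaven.repo.local=.repository to see the first branch win.
      File repo = resolveLocalRepository(
         System.getProperty("maven.repo.local"),
         null, // assumption: no local repository set in settings.xml
         System.getProperty("user.home"));
      System.out.println(repo.getPath());
   }
}
```

Because the system property is checked first, passing -Dmaven.repo.local on the command line overrides whatever the forked process would otherwise find.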

Using additional dependencies while building integration test projects
After the plugin under test is built and installed into the new local repository, the Maven Plugin Harness runs Maven against the integration test projects inside the src/test/resources/it folder. The approach which I described last year forks a Maven process with the POM, properties and goals defined by the JUnit test method.

Problem
The integration tests know the location of the new local repository (because it is set explicitly) and are able to access the plugin under test. But they know nothing about the local-as-remote repository. They can only access the artefacts which have been "downloaded" from the local-as-remote repository during the build of the plugin under test. So the problem is similar to the previous one but occurs only when an integration test project needs additional artefacts. For example a global ruleset Maven module might consist of XML ruleset configuration files. The test module depends on the Checkstyle plugin and executes it using the newly built rulesets. So the object under test (the rules XML) is tested indirectly through the invocation of Checkstyle, but the ruleset module itself does not depend on Checkstyle.

Solution
All POMs used during integration test have to be "mangled", not just the POM of the plugin under test. The method manglePomForTestModule(pom) is defined in the ProjectTool but it's protected and not accessible. So I copied it to AbstractPluginITCase and applied it to the integration test POMs.

Using settings other than ~/.m2/settings.xml
If you need artefacts from repositories other than Maven Central you usually add them to your settings.xml. Then you refer to them in the Jenkins job configuration. Behind the scenes Jenkins calls Maven with the parameter -s custom_settings.xml.

Problem
Similar to the repository location, the custom settings' path is not propagated to the embedded Maven and it uses the default settings. This causes no problems if all needed artefacts are either in the local-as-remote repository or can be downloaded from Maven Central. For example a Global Ruleset might contain some Macker Architecture Rules. The snapshot of the Macker Maven Plugin is deployed by another build job into the CloudBees snapshot repository. The test module depends on this Macker plugin and runs it using the newly built rulesets.

Solution
AbstractPluginITCase calls BuildTool's createBasicInvocationRequest() to get an InvocationRequest and subsequently executes this request. The InvocationRequest can be customised using any system property:
if (System.getProperty(ALT_USER_SETTINGS_XML_LOCATION) != null) {
   File settings =
      new File(System.getProperty(ALT_USER_SETTINGS_XML_LOCATION));
   if (settings.exists()) {
      request.setUserSettingsFile(settings);
   }
}
Then the value of the used system property is added into the field for Jenkins' Goals and Options: -s custom_settings.xml -Dorg.apache.maven.user-settings=custom_settings.xml.

Alas, I'm not a Maven expert and it took me quite some time to solve these problems. They are not specific to CloudBees but result from using non-default settings. Other plugins that fork Maven have similar problems.