
25 August 2022

Practice Speed

Several years ago I co-facilitated the Global Day of Coderetreat with Houssam Fakih. It was a nice Coderetreat. During the introduction Houssam presented the concept of practice and three levels of competence. The first level is that you are able to do a certain thing from time to time. The second level is that you are able to do it all the time. The third level is, and that was new to me, that you perform the thing consistently well and fast at the same time. In this short video three guys are throwing balls at a booth at a fair. The guy on the right is on level one. He is able to throw the ball into the basket most of the time. The guy in the middle is competent and always hits the target. But the guy on the left is on another level. Besides hitting the basket consistently, he is alternating both hands and is two to three times faster.

The Walk to Save Great Danes (licensed CC BY-NC-ND by Warchild)

Deliberate Practice
In Coding Dojos and Coderetreats we practice all kinds of coding techniques like Test Driven Development, software design and pair programming, but we rarely practice speed. The Coding Dojo mindset is explicit about going slow. There are a few exercises which have a timing aspect, e.g. Elephant Carpaccio or Baby Steps. Some people try to master the Baby Steps constraint by typing faster and using more shortcuts. While these are good things to practice by themselves, they bypass the exercise: both exercises focus on taking smaller steps, and the speed of typing is not the bottleneck.

Side Note: Speed of Typing
If the speed of typing is not the bottleneck in writing software, why do we focus on shortcuts and touch typing? The idea is to stay in the flow while coding and to work on a higher level of abstraction. For example, when I want to move a method and I can do it with a few keystrokes, I keep my current thoughts in mind. But if I have to navigate, modify blocks of text, change method invocations, and so on, I think in lower-level concepts like text editing or file navigation, and this breaks my focus.

Hackathon
One of my recent Coding Dojos became a Hackathon. This was caused by a misunderstanding with the sponsor, who provided the food for the event. The sponsor wanted to run a challenge to make people think outside of the box and try something new and innovative, and had prepared an assignment in Hackathon style. A Hackathon is an event that brings together experts and creates a collaborative environment for solving a certain problem. I avoid Hackathons because the format is competitive and people try to go as fast as possible to create some prototype, prove some idea or deliver some working code. Going fast above all else, e.g. skipping preliminary design or tests due to time pressure, is the startup trap. Obviously I dislike this way of working. (I do see value in Hackathons as a tool for innovation and maybe team building.)

At first I was unsure how to proceed. On one hand I wanted to please the sponsor: paying real money for the crafters community is a real contribution and separates the "talking" companies from the "doing" ones. On the other hand I wanted to stay true to the Coding Dojo experience. I thought of Houssam and remembered that going fast is a skill, one we need and rarely practice. In fact it is a skill we developers are often asked to apply, e.g. when the deadline is coming up or marketing has some new urgent idea. Like the guy on the left in Houssam's video, working fast and clean at the same time is hard. So I made it a mix: I ran a Coding Dojo where the challenge was to finish some basic code in a given time frame. All Coding Dojo rules applied, and the goal of the evening was to learn, to collaborate and to be nice. Even under pressure I reminded the participants to write tests because "your best code does not help you if it is broken". I encouraged them to work in small batches as "the most clever algorithm loses if it is unfinished". As facilitator I focused on helping people create small, working increments. It worked well: after 90 minutes most teams had a working solution and three of them won a prize.

Speed (licensed CC BY by Gabriel)

How to Practice Speed
Considering my own, personal practice, I go in the same direction. I am practising a lot on old and new exercises, in various programming languages, some of which I have much experience with and some I have just learned. Working through these exercises, even when adding hard or brutal constraints, gets boring. I even tried contradicting constraints like Tell Don't Ask and Immutability. To try something new I started working against the clock. Maybe I can be the guy on the left.

Roman Numerals
My colleagues tell me that I am very fast at perceiving, reading and understanding code, as well as typing with shortcuts. When working against the clock my first exercise was Roman Numerals. I chose a small exercise because I expected to retry it a lot. I set a timer to five minutes and worked on the kata. When the timer rang I stopped and analysed my progress. Of course I failed to finish at first, but I kept trying. Some rarely used editing shortcuts would have helped, and I learned some more of them. I did use TDD, and three to four tests were the most I was able to get in, which was enough: the test for Roman I created the method arabicToRoman, II added a recursive call, VI introduced a lookup of the literal by its value and 388 introduced all the remaining literals. When using Ruby for the same exercise I thought I would be able to beat my Java time, because Ruby is more expressive and there is less code to write. Unfortunately I am not as familiar with Ruby as I am with Java and the missing code completion slowed me down. Currently I need five minutes for a fully working Arabic to Roman Numerals conversion including subtractive forms. I am able to create good names and small methods in the given time, but I am unable to create more elaborate structures for holding the Roman literals and their values. Java is just too clunky. In short: the code is good, but the design is bad.
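For illustration, here is a minimal sketch of where these TDD steps typically converge: a recursive arabicToRoman with a value-ordered lookup of the literals, including subtractive forms. The names are mine and the code is a reconstruction, not the exact code written during the timed session.

public class RomanNumerals {

    // literals ordered by value, subtractive forms included
    private static final int[] VALUES =
        { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    private static final String[] LITERALS =
        { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };

    public static String arabicToRoman(int arabic) {
        if (arabic == 0) {
            return "";
        }
        // take the largest literal not exceeding the value and recurse
        for (int i = 0; i < VALUES.length; i++) {
            if (arabic >= VALUES[i]) {
                return LITERALS[i] + arabicToRoman(arabic - VALUES[i]);
            }
        }
        throw new IllegalArgumentException("no literal for " + arabic);
    }
}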

What Next
Then I tried the Coin Changer kata and after several tries I was able to beat four minutes. Coin Changer and Roman Numerals are much alike, and Coin Changer is simpler as it has no literals: the coin value is also the returned value. I knew both katas well, so the exercise was in fact only about my typing, supported by tests to avoid mistakes. Maybe I should try a larger assignment, although running through many repetitions would be tedious. Maybe I should try unknown exercises. At least they would be unknown for the first run. I need a large number of little TDD exercises of similar size and complexity. Time to do some searching.
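To show how alike the two katas are, here is a hypothetical Coin Changer sketch in the same style, assuming the usual US denominations: the same greedy descent over a value-ordered table, except that the coin value itself is the result.

import java.util.ArrayList;
import java.util.List;

public class CoinChanger {

    // denominations ordered by value, largest first
    private static final int[] COINS = { 25, 10, 5, 1 };

    public static List<Integer> change(int cents) {
        List<Integer> coins = new ArrayList<Integer>();
        for (int coin : COINS) {
            // give out the largest coin as long as it fits
            while (cents >= coin) {
                coins.add(coin);
                cents -= coin;
            }
        }
        return coins;
    }
}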

4 September 2009

Running JUnit in Parallel

Back in 2008 I had to speed up our daily build. (I should have posted about it long ago, but I just didn't get around to it. Recently, when I saw a post on a similar topic, my bad conscience overwhelmed me.) The first thing was to get a faster machine, something with four 3 GHz cores. It worked excellently! All file based operations like compilation performed three times faster out of the box, thanks to the included RAID 0+1 disk array. As our automated tests took half of the total build time, I dealt with them first: I applied the usual optimisations as described in the tuning section of my talk about practical JUnit testing. So I managed to halve JUnit execution time.

Good, but still not fast enough. The problem was how to utilise all the shiny new cores during one build to speed it up as much as possible. Test execution needed to run in parallel. Some commercial build servers promised to be able to spread build targets over several agents. Unfortunately I had no opportunity to check them out, as they cost far more than my budget allowed. The only free distributed JUnit runner I found used ComputeFarm JINI in a research project which did not look mature enough for production usage. Worth mentioning is GridGain's JunitSuiteAdapter, which is able to distribute JUnit tests across a cluster of nodes. GridGain is a free cloud implementation and really hot stuff, but it is not a build solution, so integrating it into the existing build would have been difficult.

As I did not find anything useful, I had to come up with a minimalist home-grown solution. I started with a plain JUnit target junitSequential which ran all tests in sequence:
<target name="junitSequential">

<junit fork="yes" failureproperty="failed"
haltonfailure="false" forkmode="perBatch">

<classpath>
<fileset dir="${lib.dir}" includes="*.jar" />
<pathelement location="${classes.dir}" />
</classpath>

<batchtest>
<fileset dir="${classes.dir}"
includes="**/*Test.class" />
</batchtest>
</junit>

<fail message="JUnit test FAILED" if="failed" />

</target>
I used haltonfailure="false" to execute all tests regardless of whether some of them failed. Otherwise <batchtest> would stop after the first broken test. With failureproperty="failed" and <fail if="failed" /> the build still failed if necessary. There is nothing special here.

Ant is able to run tasks in parallel using the <parallel> tag. (See my related post about forking several Ant calls in parallel.) A parallel running target would look like this:
<target name="junitParallelIdea">
<parallel>
<antcall target="testSomeJUnit" />
<antcall target="testOtherJUnit" />
</parallel>
</target>
Good, but how to split the set of tests into Some and Other? My first idea was to separate them by their names, i.e. by the first letter of the test's class name, using the inclusion pattern **/${junit.letter}*Test.class in the <batchtest>'s fileset. This gave me 26 groups of tests running in parallel.
<target name="junitParallelNamedGroups">
<parallel>

<antcall target="-junitForLetter">
<param name="junit.letter" value="A" />
</antcall>
<antcall target="-junitForLetter">
<param name="junit.letter" value="B" />
</antcall>
<antcall target="-junitForLetter">
<param name="junit.letter" value="C" />
</antcall>
<!-- continue with D until Z -->

</parallel>
</target>

<target name="-junitForLetter">

<junit fork="yes" forkmode="perBatch">

<!-- classpath as above -->

<batchtest>
<fileset dir="${classes.dir}"
includes="**/${junit.letter}*Test.class" />
</batchtest>
</junit>

</target>
forkmode="perBatch" created a new JVM for each group. Without forking each test class would get it's own class loader, filling up the perm space. Setting reloading="false" made things even worse. All those singletons started clashing even without considering race conditions. So I took the overhead of creating additional Java processes.

Unfortunately the grouping-by-letter approach had some problems. First, the number of threads needed to be specified with <parallel>'s threadsperprocessor or threadcount attribute, otherwise there would be 26 parallel processes competing for four cores. My experiments showed that two threads per processor performed best for the given set of JUnit tests. (Those JUnit tests were not "strictly unit"; some tests called the database or web services, freeing the CPU while blocking. For tests with very little IO the numbers might have looked different.)

Also, my haltonfailure approach did not work, because <antcall> does not return any properties set inside the called -junitForLetter target, and there is no built-in Ant task that does. But AntCallBack from the Antelope Ant extensions was able to do the trick: after registering the custom task with name="antcallback", I replaced the plain <antcall>s with <antcallback target="..." return="failed">.

Separating JUnit test cases by their names produced unbalanced and therefore unpredictable results regarding overall execution time: depending on naming conventions, some groups would run much longer than others. Ant's Custom Selectors are a much better way to split a fileset into a given number of parts, producing balanced filesets with roughly the same number of test classes:
import java.io.File;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.types.Parameter;
import org.apache.tools.ant.types.selectors.BaseExtendSelector;

public class DividingSelector extends BaseExtendSelector {

    private int counter;
    /** Total number of parts to split into. */
    private int divisor;
    /** Current part to accept. */
    private int part;

    public void setParameters(Parameter[] pParameters) {
        super.setParameters(pParameters);
        for (int j = 0; j < pParameters.length; j++) {
            Parameter p = pParameters[j];
            if ("divisor".equalsIgnoreCase(p.getName())) {
                divisor = Integer.parseInt(p.getValue());
            }
            else if ("part".equalsIgnoreCase(p.getName())) {
                part = Integer.parseInt(p.getValue());
            }
            else {
                throw new BuildException("unknown " + p.getName());
            }
        }
    }

    public void verifySettings() {
        super.verifySettings();
        if (divisor <= 0 || part <= 0) {
            throw new BuildException("part or divisor not set");
        }
        if (part > divisor) {
            throw new BuildException("part must be <= divisor");
        }
    }

    public boolean isSelected(File dir, String name, File path) {
        // cycle the counter from 1 to divisor, accepting every divisor-th file
        counter = counter % divisor + 1;
        return counter == part;
    }
}
One of the four available cores was used for static code analysis, which was very CPU intensive, and one was used for integration testing. The remaining two cores were dedicated to unit tests. Using four balanced groups of tests executing in parallel, the time spent on JUnit tests was halved again. Yippee!
<target name="junitParallel4Groups">
<parallel threadcount="4">
<antcallback target="-junitForDivided" return="failed">
<param name="junit.division.total" value="4" />
<param name="junit.division.num" value="1" />
</antcallback>
<antcallback target="-junitForDivided" return="failed">
<param name="junit.division.total" value="4" />
<param name="junit.division.num" value="2" />
</antcallback>
<antcallback target="-junitForDivided" return="failed">
<param name="junit.division.total" value="4" />
<param name="junit.division.num" value="3" />
</antcallback>
<antcallback target="-junitForDivided" return="failed">
<param name="junit.division.total" value="4" />
<param name="junit.division.num" value="4" />
</antcallback>
</parallel>
<fail message="JUnit test FAILED" if="failed" />
</target>

<target name="-junitForDivided">

<junit fork="true" failureproperty="failed"
haltonfailure="false" forkmode="perBatch">

<!-- classpath as above -->

<batchtest>
<fileset dir="${classes.dir}">
<include name="**/*Test.class" />
<custom classname="DividingSelector" classpath="classes">
<param name="divisor" value="${junit.division.total}" />
<param name="part" value="${junit.division.num}" />
</custom>
</fileset>
</batchtest>

</junit>

</target>
(Download source code of DividingSelector.)

Epilogue
Using this approach I kept the option to execute the tests one after another, with num=1 of total=1, providing an easy way to switch between normal and parallel execution. This was useful when debugging the build script...
<target name="junitSequential">
<antcallback target="-junitForDivided" return="failed">
<param name="junit.division.total" value="1" />
<param name="junit.division.num" value="1" />
</antcallback>
<fail message="JUnit test FAILED" if="failed" />
</target>

7 November 2008

Forking parallel subants from Ant

Ever wanted to do something in parallel in your Ant build file? Well, there is the <parallel> task, which should do the trick, shouldn't it? Unfortunately the Ant Manual says that

The primary use case for <parallel> is to run external programs such as an application server, and the JUnit or TestNG test suites at the same time.

And that's what all the examples are about: starting Tomcat in a separate thread. In my case I needed to fork several <subant> calls, i.e. to fork several Java virtual machines with the same properties as the original one and wait until all of them finished.

To start an independent Ant run, which means to fork a sibling Java process with the same Ant properties, you have to use the <java> task, which allows forking with the fork attribute:
<parallel>
    <java classname="org.apache.tools.ant.Main"
          fork="true" clonevm="true" dir="${basedir}"
          resultproperty="sub_res_1" output="sub_1.log">
        <arg value="-buildfile" />
        <arg value="${basedir}\sub_1.xml" />
        <arg value="target" />
        <!-- add jvmarg sysproperty as needed -->
    </java>
    <java classname="org.apache.tools.ant.Main"
          fork="true" clonevm="true" dir="${basedir}"
          resultproperty="sub_res_2" output="sub_2.log">
        <arg value="-buildfile" />
        <arg value="${basedir}\sub_2.xml" />
    </java>
    <java classname="org.apache.tools.ant.Main"
          fork="true" clonevm="true" dir="${basedir}"
          resultproperty="sub_res_3" output="sub_3.log">
        <arg value="-buildfile" />
        <arg value="${basedir}\sub_3.xml" />
    </java>
</parallel>
The output of these parallel Ant calls is collected and printed to the screen:
<concat>
    <fileset dir="." id="spawn.logs">
        <include name="sub_1.log" />
        <include name="sub_2.log" />
        <include name="sub_3.log" />
    </fileset>
</concat>
<delete>
    <fileset refid="spawn.logs" />
</delete>
Finally we have to determine if one of the <subant>s failed:
<condition property="spawn.error">
<isfailure code="${sub_res_1}" />
</condition>
<condition property="spawn.error">
<isfailure code="${sub_res_2}" />
</condition>
<condition property="spawn.error">
<isfailure code="${sub_res_3}" />
</condition>
<fail if="spawn.error" message="sub failed" />
That's it. Note that Ant versions below 1.7 do not support the <isfailure/> condition, so there you have to use:
<not>
    <equals arg1="0" arg2="${sub_res_1}" />
</not>
(Download Ant scripts.)

16 August 2008

JRuby Performance fails in Batch

You JRuby advocates will say it's obvious: I did not use Java 6, did not compile to byte code etc. You JRuby objectors will say it's obvious: JRuby is so sloooow. Probably the truth is somewhere in between and depends on the circumstances. So here is my little story about trying out JRuby in the enterprise...

We have some kind of ETL loader that collects data into objects before inserting them into the warehouse. Depending on the state of each particular instance, some value is precalculated and stored to speed up later queries. (DWH data is usually not normalised.) Until now these calculations were spread across the whole import and transform process, which was a real pain whenever something had to be changed. Recently we had to add some new calculation rules and we decided to do it properly: 1) put all calculation rules in one place and 2) put them all in some kind of configuration file.

As the checks of the state vary considerably, we needed a "powerful" configuration language. How about a dynamic language running on top of the JVM? (Which looks like the mainstream way to solve many problems nowadays ;-) In an XML file (boring mainstream again) we have a list of simple JRuby expressions:
<ACTION_REQUESTOR_EVAL>
    <requestor active="true" type="SUPPRESSED">
        <code>$action.suppressed</code>
    </requestor>
    <requestor active="true" type="SPECIAL_CASE">
        <code>['GPDA', 'LIA'].include? $action.name</code>
    </requestor>
    <requestor active="true" type="OVERLOAD">
        <code>$action.overload</code>
    </requestor>
    <!-- all others are normal -->
    <requestor active="true" type="NORMAL">
        <code>true</code>
    </requestor>
</ACTION_REQUESTOR_EVAL>
A centralised component evaluates them in a Chain of Responsibility style:
BSFManager manager = pContext.getManager();
int line = 1;
// expose the current action to the Ruby expressions as $action
manager.declareBean("action", pActionData, ActionData.class);
for (ActionRequestorEvaluation eval : getRequestors()) {
    // evaluate the rules in order, the first matching rule wins
    boolean isOfType = ((Boolean) manager.eval("ruby",
            "action eval command " + eval.getType(), line++,
            1, eval.getCode())).booleanValue();
    if (isOfType) {
        pActionData.setRequestorType(eval.getType());
        break;
    }
}
manager.undeclareBean("action");
After adding some unit tests we deployed the loader to the test system. Unfortunately this solution ran three times slower than before, taking 25 instead of 8 minutes for 100,000 entries, i.e. adding approx. 10 milliseconds per entry. Considering that in most cases 10 expressions are evaluated, that is only 1 millisecond per JRuby call, so it's not that bad. (The numbers were obtained using Java 5 with BSF 2.4.0 to access JRuby 1.1.3.)

The conclusion is that the straightforward use of JRuby in batch processing (i.e. when called millions of times) does not work. So we had to undo the changes and hard-code the rules :-( Unfortunately we only had time to try the "plain" approach. Obviously there is room for improvement, and I would like to hear about it.
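One direction we did not get to try: compiling each expression once and reusing it across the batch, instead of re-evaluating it from source for every entry. A hypothetical sketch using the JSR-223 scripting API, assuming a JRuby engine is registered and that, as with BSF, a binding named action shows up as the global variable $action:

import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class CompiledRuleDemo {

    public static void main(String[] args) throws Exception {
        // requires the JRuby engine on the classpath (name is an assumption)
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("jruby");
        // compile the rule once, outside the batch loop
        CompiledScript rule = ((Compilable) engine).compile("$action > 100");

        Bindings bindings = engine.createBindings();
        for (int i = 0; i < 1000000; i++) {
            bindings.put("action", i);
            // reuse the compiled script for every entry
            Boolean matches = (Boolean) rule.eval(bindings);
        }
    }
}

Whether this would close the gap for millions of calls I cannot say without measuring it.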