5 June 2015

Choose Your Development Services Wisely

URIs Should Not Change
Modern software development relies a great deal on the web. Our source code is hosted on GitHub, and we download the libraries we need from Maven Central, RubyGems or npm. We communicate and discuss using issue trackers and forums, and our software is built on hosted Continuous Integration servers. We rely a lot on SaaS infrastructure. But the Internet is constantly in motion. Pages appear and vanish. New services get created, moved around and shut down again. When Tim Berners-Lee created the web, he wished for URIs that would not change, but unfortunately the reality is different.

I hate dead links. Usually dead links are not a big deal - you just use an Internet search to find their new homes - but I still find them annoying. It is impossible to update all dead links on my blog and in my notes, but at least I want the cross references of my own stuff to be correct. This means that I have to migrate and update something at least once a year. The recent shutdown of Google Code affected me and caused a lot of extra work. This made me think.

Learning: Host as much of your content as possible under your own control, e.g. on your personal web space.

To deny the usage of today's powerful and often free services like GitHub would be foolish, but I still believe we developers should consider them dependencies, and as dependencies they are also liabilities. We should try to reduce coupling and be aware of potential migration costs.

Learning: When choosing a SaaS service, think about its benefits versus the impact of it not being available any more.
Learning: When you pay for it, it is more stable.

Personal Development Process
The impact can be internal, which means it affects only you and your work, or it can be external, which means it affects others - your users or people using your code or documentation. For example, let us consider CloudBees. I started using it in the early beta and it was great. Over time I moved all my private, kata and open source projects there and had them built on each commit. It was awesome. But last year CloudBees removed their free plan and I did not want to pay, so I stopped using it. (This is no criticism. CloudBees is a company and needs to make money and reduce cost.) The external impact was zero, as the Jenkins instance was private. The internal impact seemed huge: I had lost my CI. I looked for alternatives like Travis, but was too lazy to configure it for all my projects. Then I used the Jenkins Job Import Plugin to copy my jobs to a local instance and got it sort of running. (I had to patch the plugin, wasting hours...) Still I needed to touch every project configuration, and in the end I abandoned CI. In reality the internal impact was also low, as I am not actively developing any open source right now and I am not working on real commercial software where CI is a must. Now I just run my local builds more often. It was cool to use CloudBees, but I can live without it.

Learning: Feel free to use third-party SaaS for convenience, i.e. for anything that you can live without easily.

Information Sharing
Another example is about written information, the Hackergarten wiki. When I started Hackergarten Vienna in 2011, I collected material on how to run it and put it into the stub wiki someone had created for Hackergarten. I did not think about it and just used the existing wiki at Wikispaces. It seemed the right place. There were a few changes by other users, but not many at all. Two years later Wikispaces removed their free plan. Do you see a pattern? The internal impact was zero, but the external impact was high, as I wanted to keep the information about running a Hackergarten available to other hackers. Still I did not want to spend $50 to keep my three pages alive. Fortunately Wikispaces offered a raw download of all wiki pages. I used this accessible copy of my work and converted the wiki pages into blog pages in no time. As I rarely changed the pages, the extra overhead of working in HTML instead of Creole was acceptable. Of course I had to update several links, a few blog posts and two slide-decks, sigh. (And it increased my dependency on Google Blogger.)

Learning: When choosing a SaaS service, check its migration options. Avoid lock-in.
Learning: Use static pages for data that rarely changes.

Code Repositories
Moving code repositories is always a pain. My JavaClass Ruby gem started out on RubyForge; later I moved it to Google Code. With Google Code shutting down I had to migrate it again, together with seven other projects. The raw code and history were no problem, hg convert dealt with that. But there were a lot of small things to take care of. For example, different version control systems use different ignore syntaxes. The Google Code project description was proprietary and needed to be copied manually. The same was true for wiki pages, issues and downloads.
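For example, moving one project's history from Google Code to Bitbucket boiled down to Mercurial's convert extension. This is a minimal sketch, assuming a hypothetical project-x that was hosted in Subversion and the convert extension enabled in your hgrc:

# convert the Subversion history into a new Mercurial repository
hg convert http://project-x.googlecode.com/svn/trunk/ project-x-hg

# then push the converted history to its new home
cd project-x-hg
hg push https://bitbucket.org/pkofler/project-x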

I had to change many incoming links. The first URLs to change were the source repository locations in all migrated projects' descriptors, e.g. Maven's pom.xml, Ruby's gemspec, Node's package.json and so on. Next were the links to and from project wiki pages, and finally I updated many blog posts and several slide-decks. And all the project documentation, e.g. Maven sites or RDoc API pages, needed to be regenerated to reflect the new locations. While this would have been no big deal for a single project, it was a lot of work for all of them. I full-text-searched my hard-disc for obsolete URLs and kept finding them again and again.

Maybe I should not cross link my stuff that much, and I am not even sure I link that much at all. But instead of putting the GitHub URL of the code kata we will be working on in a Coding Dojo directly into the slides, I could just write the URL on a flip-chart at the beginning of the dojo. The information about the kata seems to be more stable than the location of its source code. Also I might use the same slides when working on code in different languages, which might be stored in different repositories. But on the other hand, if I bother to write documentation and I reference something related, I expect it to be linked for fast navigation. That is the essence of hyper-text, isn't it?

Learning: (maybe) Do not cross link too much.
Learning: (maybe) Do not link from stable resources to less stable ones.

Generated Artefacts
Next to source code I had to migrate generated artefacts like my Maven repository. I had used a Google Code feature that made a repository accessible in raw mode: I would just push to my Maven repository's repository (recursion, yeah ;-) and the newly released artefacts would show up. That was very convenient. Unfortunately Bitbucket could not do that. I had a look into Bitbucket Pages, but really did not feel like changing the layout of the repository. I was getting tired of all this. In the end I just uploaded the whole thing to my public web space. Static web pages and binary files, e.g. compressed archives, can be hosted on any web server, and I should have put them there in the very beginning. Again I had to update site locations, repository URLs and incoming links in several projects and blog posts. As I updated my parent POM, I had to release new versions of several projects. I started to hate hyper-links.

Learning: Host static data on regular (personal) web spaces.
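A Maven repository is just files served over HTTP, so consuming projects only need the new location in their POM. A minimal sketch, assuming a hypothetical URL on my web space:

        <repositories>
            <repository>
                <!-- hypothetical location on my personal web space -->
                <id>code-cop-releases</id>
                <url>http://www.code-cop.org/maven/</url>
            </repository>
        </repositories>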

You might argue that Maven Central would be a better place for Maven artefacts, and I totally agree. I consider Maven Central much more stable than my personal web space, but I did not bother to go through the process of getting access to a service that would mirror my releases to Central. Anyway, such a mirroring service, like Sonatype's, feels less stable than Central itself.

Learning: Host your stuff on the most stable option available.

Now all my repositories are hosted on Bitbucket. If its services stop working some day - and surely they will, sometime in the future - I will stop using hosted repositories for my projects. I will not migrate everything again. I am done. Update January 2020: Bitbucket stops supporting Mercurial this year. I am done. :-(

Learning: (maybe) Do not bother with dead links or losing your stuff. Who cares?

CNAMEs
For some time Bitbucket offered a CNAME feature that allowed you to associate a domain or sub-domain with an account. That was really nice: instead of bitbucket.org/pkofler/project-x I used hg.code-cop.org/project-x. I liked it, or so I thought. Of course Bitbucket decided to disable this feature this July, and I ended up - again - updating URLs everywhere. While the change was minimal - path and parameters of URLs stayed the same - I had to touch almost every source repository to change its public repository location, fix 20+ blog posts and update several Coding Dojo slide-decks, which in turn needed to be uploaded to SlideShare again.
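For the record, the feature itself was little more than a DNS entry pointing the sub-domain at Bitbucket, something like this sketched zone file line:

hg.code-cop.org.    IN    CNAME    bitbucket.org.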

Learning: Only use the minimal, strictly necessary features of a SaaS.
Learning: Even when offered, turn off all features you do not need, e.g. wiki or issues.
Learning: Do not use anything just because it is handy or "cool". It increases your coupling.

URLs Change All The Time
Maybe I should not link too much in this post as eventually I will have to change all the links again, grrrr. I am considering using a URL service like bitly. Maybe not exactly like bitly, because its purpose is the shortening and marketing aspect of links, and I do not see a way to change the actual link once it has been created. And I would depend on yet another service, which eventually would go away. So I would need to host the service myself, like J. B. Rainsberger does. I like the idea of updating all my links with a single update statement in the link database. I wish I had used such a thing. It would increase the work of creating links, but reduce the work of changing them. Like with code, it seems that my links are changed far more often than they are created, at least some of them.
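Hosting it myself could be tiny. Here is a minimal sketch in Ruby using WEBrick - the link names and target URLs are made up, and a real version would keep the links in a database so that a single UPDATE statement fixes all moved links at once:

require 'webrick'

# hypothetical link database - in reality a database table,
# so one UPDATE statement can fix all moved links at once
LINKS = {
  'javaclass'    => 'https://bitbucket.org/pkofler/javaclass-rb',
  'hackergarten' => 'http://blog.code-cop.org/p/hackergarten.html',
}

server = WEBrick::HTTPServer.new(Port: 8080)
server.mount_proc('/') do |request, response|
  target = LINKS[request.path.sub(%r{^/}, '')]
  if target
    # permanent redirect to the current location of the link
    response.set_redirect(WEBrick::HTTPStatus::MovedPermanently, target)
  else
    response.status = 404
  end
end
trap('INT') { server.shutdown }
server.start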

Indeed I could not find any service, free or otherwise, providing this functionality, so I would have to create something like the sketch above myself. Also I do not know the impact of the extra redirect on search engines and other automatic consumers of my content. And still some manual changes would be inevitable: if the repository location moves, the SCM settings need to be changed. So I will just wait until the next feature or whole service is discontinued and I have to start over.

Thanks to Thomas Sundberg for proof-reading this article.

29 May 2015

How to Organise Your Code Katas

If you read my blog you know that I like code katas. I did my first one back in 2004 after reading Kent Beck's Test Driven Development book. I was learning Ruby, and Kent had recommended creating an xUnit implementation as an exercise to get to know a new language. I did not know it was a kata; I just developed my personal RUnit following the TDD principles.

Over the years I did similar exercises and started to perform formal katas sometime in 2009/2010. I used them for personal practice and for demos, e.g. to show students what TDD looks like as part of my QA guest lecture. At my first Code Retreat I noticed that pairing on a code kata gave me more insights, so I looked for people who would spend some time practising with me, usually remotely.

Tip: Keep the source of your katas (or at least a record)
I do keep the code of my katas. Even at Code Retreats, where you are supposed to delete your code, I manage to recover the source from local history or version control. When I use online tools like Cyber-Dojo, I write down the session ID and come back later to recover the code. (I know, I am a bad person. ;-)

At the time of writing, I have worked on more than 230 code katas, resulting in almost 300 sessions of personal or paired practice. (In remote katas we sometimes tackle larger problems and then work on them for more than one session.) I have a lot of kata sources and they are all over my hard-disc. I managed to collect some of them in dedicated repositories, one for each programming language, but many are in their own projects, or even mixed up with other code in early "learning" repositories. It is a mess. (Probably it does not matter, but it is a mess nevertheless.)

Tip: Collect all your katas in one place
In her article about code katas, Iris Classon gave some practical advice, e.g. to collect all your katas in one place to be able to compare solutions. I really liked the idea, and recently I found some time to collect my katas in one place - or so I thought.

Tip: Name your code katas consistently, even across languages
I faced some problems. First, not all my katas followed the same naming conventions, e.g. prime_factors and primefactors were good enough for me, but not for a unified collection. Most of my katas were in version control and I did not want to rename or move large numbers of files, because I would lose history. (Again, history did not matter for katas, especially as I never looked at it - but I was not able to drop practices I followed every day.)

Tip: Put your katas in kata-only repositories
I managed to extract katas out of mixed repositories using hg convert --filemap with a filemap of simple inclusions, preserving the whole history. I also fixed some inconsistent kata folders with a filemap of renames. But hg convert created new, unrelated repositories and I had to drop and recreate some of my repos.
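A sketch of such an extraction, assuming a hypothetical mixed repository learning-ruby and a kata in the folder prime_factors:

hg convert --filemap filemap.txt learning-ruby katas-ruby

where filemap.txt contains the simple inclusions and renames, e.g.

# keep only the kata and fix its folder name on the way
include prime_factors
rename prime_factors primefactors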

My vision was to place the same katas near one another regardless of language, so I would be able to count and compare them. But I did not find a way to do that for all the different languages while keeping the code working. How would I combine Java, NodeJS, Ruby and Scala sources in a consistent way, other than separating them by source folder, which they already were? I failed to merge all my katas and looked for alternatives.

In the end I created a little script - a sketch follows below the listing - that would search my hard-disc for katas, normalise their names and collect them in a single place. I grouped the sources first by kata name and then by date. The programming language did not seem that important for a combined collection and ended up after the date, resulting in the name pattern [name of kata]/[ISO date] [- optional comment] ([programming language]). The whole collection looked like this:
All Katas
|-- bankocr
|-- bowlinggame
    |-- 20120924 (Java)
        `-- BowlingTest.java
    |-- 20130307 (Scala)
        `-- BowlingGameSuite.scala
    |-- 20130414 (JavaScript)
        `-- BowlingGameSpec.js
    `-- ...
|-- fizzbuzz
`-- ...
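A minimal sketch of such a script in Ruby - the source roots, the list of known katas, the name normalisation and the language detection are all assumptions, not my actual script:

require 'fileutils'
require 'find'

SOURCE_ROOTS = ['/projects', '/katas'] # hypothetical roots to scan
TARGET = '/all_katas'                  # the combined collection

# normalise spelling variants to one canonical kata name (assumed list)
CANONICAL = Hash.new { |_, name| name }
CANONICAL['prime_factors'] = 'primefactors'
CANONICAL['bowling_game'] = 'bowlinggame'

KNOWN_KATAS = %w(primefactors bowlinggame fizzbuzz bankocr)

# guess the language from the file extensions found in the folder
LANGUAGES = { '.java' => 'Java', '.rb' => 'Ruby',
              '.js' => 'JavaScript', '.scala' => 'Scala' }

SOURCE_ROOTS.each do |root|
  Find.find(root) do |path|
    next unless File.directory?(path)
    name = CANONICAL[File.basename(path).downcase]
    next unless KNOWN_KATAS.include?(name)
    date = File.mtime(path).strftime('%Y%m%d')
    extension = Dir.glob(File.join(path, '**', '*')).
                map { |file| File.extname(file) }.
                find { |ext| LANGUAGES.key?(ext) }
    language = LANGUAGES[extension] || 'Unknown'
    # results in the pattern [name of kata]/[date] ([programming language])
    target = File.join(TARGET, name, "#{date} (#{language})")
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp_r(path, target)
  end
end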
Tip: Compare your katas
In the beginning I believed that the code of the kata, the final product, did not matter, that only the process of getting there, the practice, was important. That was the reason people published kata-casts: the final code did not represent much. Later I discovered that looking at the code of a problem I knew well, and reading comments regarding that code, also had value. So I searched the web for source code and recordings of the katas I had done. I planned to look at them, compare them with my solutions and learn from them even more. Of course I never had the time to do this.

But now I can see all the exercises I ever did at a glance. Now I have the opportunity to compare my solutions across time and languages. This is the perfect time to include my bookmarks of katas as well. This is going to be interesting, because I never had a second look at my katas before.

20 April 2015

Maven Integration Tests in Extra Source Folder

On one of my current projects we want to separate the fast unit tests from the slow-running integration and acceptance tests. Using Maven this is not possible out of the box, because Maven only supports two source folders, main and test. How can we add another source and resource folder, e.g. it (for integration test)? Let's assume a project layout like this:
project
|-- pom.xml
`-- src
    |-- main
        `-- java
    |-- test
        `-- java
            `-- UnitTest.java
    `-- it
        `-- java
            `-- IntegrationIT.java
We use the regular Maven Surefire to execute our unit tests, i.e. all the tests in the src/test/java folder. The plugin definition in the pom.xml is as expected.
        <plugin>
            <!-- run the regular tests -->
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.18</version>
        </plugin>
And we use Maven Failsafe to execute the integration tests. If you do not know Failsafe: it is much like the Surefire plugin, but with different defaults, and it usually runs during the integration-test phase.
        <plugin>
            <!-- run the integration tests -->
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-failsafe-plugin</artifactId>
            <version>2.18.1</version>
            <executions>
                <execution>
                    <goals>
                        <goal>integration-test</goal>
                        <goal>verify</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
By default the Failsafe plugin does not look for class names ending with Test, but for names ending with IT, making it possible to mix both kinds of tests in the same src/test/java folder - but we do not want that. The "trick" to get another source folder for integration tests is to use the Build Helper Maven Plugin and add the folder as an additional test source.
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>1.9.1</version>
            <executions>
                <execution>
                    <id>add-integration-test-source-as-test-sources</id>
                    <phase>generate-test-sources</phase>
                    <goals>
                        <goal>add-test-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>src/it/java</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
Now src/it/java is added as test source as well, as seen during Maven execution. After the compile phase Maven logs
[INFO] [build-helper:add-test-source {execution: add-integration-test-source-as-test-sources}]
[INFO] Test Source directory: .\src\it\java added.
There is still only one set of test sources for Maven, but at least we have two folders in the file system. All test sources get merged, and during the test compile phase we see
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 2 source files to .\target\test-classes
showing that all classes in both src/test/java and src/it/java are compiled at the same time. This is important to know, because class names must not clash and the tests still need different naming conventions like *Test and *IT. Now mvn test will only execute the fast unit tests and can be rerun many times during development.
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: .\target\surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running UnitTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec - in UnitTest

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Only when we want to, by running mvn verify, are the integration tests executed.
[INFO] [failsafe:integration-test {execution: default}]
[INFO] Failsafe report directory: .\target\failsafe-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running IntegrationIT
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in IntegrationIT

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] [failsafe:verify {execution: default}]
[INFO] Failsafe report directory: .\target\failsafe-reports
Now we can run the slow tests a few times during the day, e.g. before we check in the code or after major changes.