Showing posts with label Functional Programming. Show all posts
Showing posts with label Functional Programming. Show all posts

Monday, 14 June 2010

Project Euler 59: Cracking an XOR code

Code breaking is about as glamorous as it gets for a Geek. So when it comes to Project Euler Problem 59, think 24, with Jack Bauer unscathed amidst a hail of bullets, memory stick in one hand, terrorist’s neck throttled in the other, threatening terrible retribution to the bad guy’s family if the secret goes to the grave with him. Terrorist rasps out,

“There’s a file … cipher1.txt … ASCII format … urrrh … XOR … 3 … letter … password … lower case … all English”

What would Chloe O’Brian make of that when she received the file over a secure socket from Jack’s Smart Phone? Being the bright girl she is, I’ve no doubt that she’d fire up MonoDevelop (only the dumb FBI would use Visual Studio and Windoze) and write the following:

var encryptedBytes = File.ReadAllText(dataPath)
                        .Split(',')
                        .Select(byte.Parse)
                        .ToArray();

That would get her the encrypted data from the file as a sequence of bytes. Now to generate all possible passwords:

const byte FirstChar = (byte)'a';
const byte LastChar = (byte)'z';

var allPossiblePasswords = from a in FirstChar.To(LastChar)
                           from b in FirstChar.To(LastChar)
                           from c in FirstChar.To(LastChar)
                           select new[] { a, b, c };

XOR encryption requires that the password be repeated cyclically, abcabcabc.... So given a sequence containing the characters of password, how about a Repeat extension method that repeats the sequence ad infinitum? (if you look at the following code and hear the Infinite Loop alarms go off, remember that iterators are lazily evaluated):

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> sequence)
{
    while (true)
    {
        foreach (var item in sequence)
        {
            yield return item;
        }
    }
}

While she’s at it, she’ll need to Xor two sequences together:

public static IEnumerable<byte> Xor(this IEnumerable<byte> sequenceA, IEnumerable<byte> sequenceB)
{
    return sequenceA.Zip(sequenceB, (left, right) => (byte)(left ^ right));
}

This uses the Zip method, new in .Net 4.0, which merges two sequences together, allowing elements from left and right sequences to be processed in pairs (if you’re surprised, as I was, that the (byte) cast is necessary, see this question on Stack Overflow).

Now she can try out all the passwords, seeing what they give her if she uses them to decrypt the text. Note that Chloe has already adopted .Net 4.0 and the PLINQ extensions, so she’s using AsParallel() to max out her 128 Core workstation. (ConvertAsciiBytesToString is another extension method that simply delegates to ASCIIEncoding.GetString).

var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                          let repeatedPassword = trialPassword.Repeat()
                          select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

But she’s hardly going to spend the next 24 hours trawling through the possible decryptions to pick out the one in recognisable English from the 17575 in gobledygook. So she needs some means of assessing them automatically. This is where Chloe is forever in my debt. She read last weeks episode of Functional Fun, so she knows about my LanguageAnalyser that uses frequency analysis to determine how likely a sample of text is to be written in a particular language.

Rather than using the analyser to determine which of several languages the sample is written in, Chloe can use her prior knowledge that the coded message is written in English, and determine which of the possible decryptions has letter frequencies that most closely match those in English.

var languageAnalyser = new LanguageAnalyser();

var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                       select new
                                                  {
                                                      DecryptedString = decryptedString,
                                                      Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                  };

var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

And that's it. Thanks to Linguistic Geometry and LINQ, Chloe can report the decrypted (and surprisingly biblical!) message to Jack, so that he can get on with saving the world.

Ahem. Before we got carried away we were trying to solve Project Euler Problem 59, and that asks for the sum of the Ascii values of the decrypted message. So we need one more step:

var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                        .ConvertStringToAsciiBytes()
                        .Aggregate(0, (runningTotal, next) => runningTotal += next);

The full code is shown below, and is also available on MSDN Code Gallery.

namespace ProjectEuler
{
    [EulerProblem(59, Title="Using a brute force attack, can you decrypt the cipher using XOR encryption?")]
    public class Problem59
    {
        public long Solve()
        {
            const byte FirstChar = 97;
            const byte LastChar = 122;

            var dataPath = @"ProblemData\Problem59.txt";

            var encryptedBytes = File.ReadAllText(dataPath)
                                    .Split(',')
                                    .Select(byte.Parse)
                                    .ToArray();

            var allPossiblePasswords = from a in FirstChar.To(LastChar)
                                       from b in FirstChar.To(LastChar)
                                       from c in FirstChar.To(LastChar)
                                       select new[] { a, b, c };

            var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                                      let repeatedPassword = trialPassword.Repeat()
                                      select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

            var languageAnalyser = new LanguageAnalyser();

            var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                                   select new
                                                              {
                                                                  DecryptedString = decryptedString,
                                                                  Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                              };

            var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

            var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                .ConvertStringToAsciiBytes()
                .Aggregate(0, (runningTotal, next) => runningTotal += next);

            return sumOfAsciiValues;
        }
    }
}

Thursday, 10 June 2010

Practical Linq #5: Guessing the language of a piece of text

Google Chrome is magic: getting ready for my summer holiday to Brugge, I hit the city’s homepage. A little bar popped up at the top of the page: “This page is in Dutch. Would you like to translate it?”

image

How does Chrome know that? I suppose the domain name “.be” is one clue, but they speak French as well as Dutch in Belgium, so that by itself is not enough. Chrome does the same trick for pages written in a whole raft of languages, from Afrikaans to Yiddish.

I don’t know what Google’s secret sauce is, but I can show you one simple and surprisingly effective technique with an elegant implementation using LINQ in C#.

Studies in Linguistic Geometry

It is built around the observation that in each sufficiently large sample of text in a given language, letters of the alphabet will all occur with roughly the same frequencies. In English for example, the letter ‘e’ tops the popularity charts, making up about 13% of the average body of text. ‘t’ and ‘a’ are next, accounting for 9% and 8%. By contrast, in Portuguese, ‘a’ is the most frequent, at 15%, with ‘e’ and ‘o’ close behind.

So letter frequencies can be used as a kind of signature, or DNA profile, for a language. But given a sample of text, how do we decide which language signature it matches most closely? This is where we switch from linguistics to geometry.

In 2-dimensional space, we calculate the distance between two points using Pythagoras’ theorem. image The same thing works for two points in the real, 3-d, world. The distance from (x1, y1, z1) to (x2, y2, z2) is

image

But why stop there? What about the distance between two points in 26-dimensional space? You guessed it!

image

26-d space? Why bring that into the equation? Well think about those language signatures, giving the frequency of occurrence of each letter. You can reinterpret those signatures as points in 26-d space: the point representing the signature for English would be at 0.13 along the ‘e’ axis, 0.09 along the ‘t’ axis, 0.08 along the ‘a’ axis, and so on.

This then gives us our straightforward method of determining which language our sample text belongs to. We do a frequency analysis on its letters, then imagine the result of that analysis as a point in 26-d space, and work out which language signature it lies closest to using Pythagoras. Easy!

So how does it look in LINQ?

Linq to Linguistics

We start off with the frequency analysis (taking care to normalise the text to lower-case):

IEnumerable<CharacterFrequency> CalculateCharacterFrequencies(string sample)
{
    var characterFrequencies = sample
        .Select(char.ToLower)
        .GroupBy(c => c)
        .Select(group => new CharacterFrequency
        {
            Character = group.Key,
            Frequency = (double)group.Count() / sample.Length
        });

    return characterFrequencies;
}

Next we introduce a LanguageSignature class to hold the signature of each language:

[DebuggerDisplay("Language = { Language }")]
public class LanguageSignature
{
    private IDictionary<char, double> _frequencies;

    public LanguageSignature(string language, IDictionary<char, double> characterFrequencies)
    {
        Language = language;
        _frequencies = characterFrequencies;
    }

    public string Language { get; protected set; }

    public double GetFrequency(char character)
    {
        double frequency;
        return _frequencies.TryGetValue(character, out frequency) ? frequency : 0;
    }
}

Here are some snippets of sample signatures, data courtesy of Wikipedia

public static class LanguageSignatures
{
    public static LanguageSignature English =
        new LanguageSignature(
            "English",
            new Dictionary<char, double>
                {
                    {'a', 0.08167},
                    {'b', 0.01492},
                    {'c', 0.02782},
                    /// etc.
                });

    public static LanguageSignature French =
        new LanguageSignature(
            "French",
            new Dictionary<char, double>
                {
                    {'a', 0.07636},
                    {'b', 0.00901},
                    // etc.
                });

    
}

Now we do our sums in 26-d space, ignoring any letters that we don’t have frequency data for (note that Sqrt is a little extension method I created on the Double class that simply calls Math.Sqrt):

double CalculateDistanceFromSignature(LanguageSignature signature, IEnumerable<CharacterFrequency> characterFrequencies)
{
    var distance = characterFrequencies
        .Where(characterFrequency => signature.GetFrequency(characterFrequency.Character) > 0)
        .Select(characterFrequency
            => Math.Pow(characterFrequency.Frequency - signature.GetFrequency(characterFrequency.Character), 2))
        .Sum()
        .Sqrt();
    return distance;
}

Now we’re ready to determine which language our sample is closest to:

public LanguageSignature DetermineMostLikelyLanguage(string sample, IEnumerable<LanguageSignature> signatures)
{
    var characterFrequencies = CalculateCharacterFrequencies(sample);

    var closestLanguage = signatures.Select(signature => new
    {
        Language = signature,
        Distance = CalculateDistanceFromSignature(signature, characterFrequencies)
    })
    .MinItem(languageDistance => languageDistance.Distance);

    return closestLanguage.Language;
}

MinItem is an extension method I knocked up that returns the item from a sequence that yields the smallest value from the expression you supply (in this case, the smallest distance):

public static T MinItem<T, TCompare>(this IEnumerable<T> sequence, Func<T, TCompare> comparatorSelector) where TCompare : IComparable<TCompare>
{
    var minItem = sequence.Aggregate(
        sequence.First(), 
        (current, min) => comparatorSelector(current).CompareTo(comparatorSelector(min)) < 0 ? current : min);

    return minItem;
}

And we’re done.

Testing, Testing, Un, deux, trois

To prove it, I created a little test that analysed samples of texts in several languages taken from Project Gutenberg (the test uses MbUnit’s very handy XmlData feature to read the different test cases from my xml file).

public class LanguageAnalyserTests
{
    [Test]
    [XmlData("//Sample", FilePath = "ProblemData\\LanguageSamples.xml")]
    public void TestAnalysis([Bind("Text")]string text, [Bind("@Language")]string expectedLanguage)
    {
        var analyser = new LanguageAnalyser();
        var determinedLanguage = analyser.DetermineMostLikelyLanguage(
            text
            new[] { LanguageSignatures.English, LanguageSignatures.French, LanguageSignatures.German, LanguageSignatures.Dutch, LanguageSignatures.Spanish, LanguageSignatures.Italian });

        Assert.AreEqual(expectedLanguage, determinedLanguage.Language);
    }
}

In my test I’ve got 9 samples in German, English, French, Dutch, Spanish and Italian, extracted at random from the Gutenberg archives and each sample is correctly identified by my analyser. I also did a little experiment to see how much text it needed to correctly distinguish the samples. Limiting the samples to 600 characters caused some confusion between Spanish and English, but 700 characters was sufficient in my tests to enable the analyser to pick the right language for all samples.

A Challenge

I actually developed this LanguageAnalyser to help me solve a problem on Project Euler. Which one was it? Answers in the comments please, or by email, and bonus marks to anybody who can guess what my solution looks like. Check back Monday for the answer, and for full source code to today’s post.

Update: The answer's here, and the source code is now on MSDN Code Gallery

Saturday, 23 January 2010

Simulating Snakes and Ladders with a PLINQ Turbo boost

“Daddy, will you play a game with me?”

I’d been left in charge, whilst my wife had some well-earned time to herself. Baby brother had been bounced to sleep, and now big sister was wanting some attention.

“Sure! What shall we play?”

My daughter fished around in the cupboard, then thumped a box on the table. My heart sank. “Snakes and Ladders”, she announced.

Snakes and Ladders

For the benefit of any reader who has the good fortune never to have thus occupied their children, Snakes and Ladders is a game played on 10 x 10 board wherein players take it in turns to move across the board boustrophedonically (that is, left to right then right to left – I’ve been itching to work that word into a blog post), on each go moving the number of spaces indicated by the throw of a die. The winner is the first person to reach the last space on the board. The only modicum of excitement to be found in the game is provided by the eponymous snakes and ladders, which connect certain squares on the board. Land on a square at the foot of a ladder, and you zoom ahead to the square at its top. But beware the squares where snakes lie in wait: land here, and you must slide back down the board to the snake’s tail.

You see then, why I wasn’t thrilled at my daughter’s choice: I don’t expect much from a children’s game, but it is nice to be sure that it will finish before one’s brain goes into power-save mode. And no mathematician would be prepared to give such a guarantee about this game: it is perfectly possible to find yourself within a throw of winning, but the next moment unceremoniously snaking your way back to the beginning, to start all over again.

So as I fixed my grin in place and started rolling the die, I got to wondering: how long till we can play something else? How many turns does a game of Snakes and Ladders last, on average? By the time I’d slid down my first snake, I knew how I could find out: I would set my computer playing snakes and ladders, over and over, and instruct it to count how many turns each game took. Apply a little high-school Maths and Stats to the results, and I’d have my answer.

I will confess the game board received little of my attention from then on, as code began to assemble in the editor in my mind (which has syntax highlighting, but is lacking Intellisense - CommonSense too, my wife would tell you). When I realised that I could use PLINQ to parallelise the code and speed up the simulation, my fingers began itching to get at the keyboard.

Here’s how the code turned out, once I fired up Visual Studio.

Modelling the Game

First, a fluent interface for defining the game board:

var board = new BoardBuilder()
            .Length(100)
            .Ladder().From(1).To(38)
            .Ladder().From(4).To(14)
            .Ladder().From(9).To(31)
            .Snake().From(16).To(6)
            .Ladder().From(21).To(42)
            .Ladder().From(28).To(84)
            .Ladder().From(36).To(44)
            .Snake().From(47).To(26)
            .Snake().From(49).To(11)
   //...
   .Build()

Then the game logic. The GameBoard built by the BoardBuilder consists of a number of Places. Each Place has a reference to the Place following it on the board, and if it is at the foot of a ladder or the head of a snake then it has a reference to the Place at the other end of the link. Most of the logic of the game simulation is contained in the MoveOn method. This method is defined recursively. It is passed the numberOfPlaces that the player should move, and it returns a reference to the Place they end up at, taking into account any snakes or ladders leaving the final square. For simplicity, I’ve decided that players cannot move beyond the last space on the board – they simply get stuck there, whatever they throw.

    class Place
    {
        private Place _ladderTo;
        private Place _chuteTo;
        private Place _nextPlace;

  // ...

        public Place MoveOn(int numberOfPlaces)
        {
            Contract.Requires(numberOfPlaces >= 0, "numberOfPlaces must be positive");

            if (numberOfPlaces > 0)
            {
                if (IsLastSpace)
                {
                    return this;
                }
                else
                {
                    return _nextPlace.MoveOn(numberOfPlaces - 1);
                }
            }

            if (_ladderTo != null)
            {
                return _ladderTo;
            }
            else if (_chuteTo != null)
            {
                return _chuteTo;
            }
            else
            {
                return this;
            }
        }

        // ...
    }
}

Running the Simulation

At last, the simulation:

var gameLengths
    = 1.To(numberOfTrials)
    .Select(trial =>
        {
            var die = new Die();
            return board.
           FirstPlace
           .Unfold(place => place.MoveOn(die.Roll()))
           .TakeWhile(place => !place.IsLastPlace)
           .Count();
        })
    .ToList();

Not much to look at, is it? I start by generating a simple sequence, 1.To(numberOfTrials), and for each item in that sequence I instruct the computer to play a LINQified game of Snakes and ladders, and to count how many moves it takes to get to the end of the board. Each game starts by creating a new Die (would be handy if we could do that in real life – we’re always losing ours). Then, using the Unfold method, seeded with the first place on the board, we generate a sequence of all the spaces landed on during that game. We pull items out of this sequence until we hit the last place on the board, at which point we count how many moves the game took.

Analysing the Results

Of course, we want to do some analysis on the results. LINQ makes it trivial to find the average length of a game:

var averageLength = games.Average();

This turns out to be 35.8 moves.

More interesting is the frequency distribution of game lengths which tells us how often we can expect to play games of a given length.

var frequencyDistribution =
    gameLengths
   .GroupBy(gameLength => gameLength)
   .OrderBy(gameLengthGroup => gameLengthGroup.Key)
   .Select(gameLengthGroup => gameLengthGroup.Key + "," + gameLengthGroup.Count() + "," + gameLengthGroup.Count() / (double)numberOfTrials)
   .ToList();
System.IO.File.WriteAllLines("SnakesAndLadderResult.csv", frequencyDistribution);

gameLengths is the sequence containing the length of each simulated game. This query is counting how many games of each length were encountered in the simulation by grouping together all games of the same length. After ordering the groups, shortest game length first, it creates a string for each group, listing the length, the absolute number of games of that length and the frequency of that length of game. File.WriteAllLines is a neat method that (from .Net 4.0 onwards) takes an IEnumerable<string> and writes each string as a new line to a file, giving us an instant CSV file from which Excel can draw a pretty chart: SnakesAndLadders_FrequenciesOfGameLengths

As you can see at a glance, the shortest game I can hope for takes 7 moves, but that’s pretty unlikely to happen – about once in a thousand games. The most common length of game encountered in the simulation was 19. Now let’s transform this into a cumulative frequency chart (which shows how often games take a given number of moves or fewer): imageThis shows that in at least half of all games I’ll need to throw the dice 28 times or more before getting to the end. Worse, there’s a 10% chance that it’ll be 68 moves or more before I beat those slippery snakes.

The PLINQ Part

Where does PLINQ come into all this? And what is PLINQ anyway? PLINQ is an extension to LINQ, part of the Parallel Tasks library that is shipping as part of .Net 4.0. You know how you splashed out for that quad-core, hyper-threading enabled processor and geeked out over all those little processor utilisation charts in Task Manager – then noticed that your CPU rarely hits more than 12.5% total utilisation? Well PLINQ helps you put those lazy cores to work. MultiCoreTaskManagerWatch closely:

var gamesLengths
    = 1.To(numberOfTrials)
    .AsParallel()
    .Select(trial =>
        {
            var die = new Die();
            return board.
        FirstPlace
        .Unfold(place => place.MoveOn(die.Roll()))
        .TakeWhile(place => !place.IsLastPlace)
        .Count();
        })
    .ToList();

Spot it? Check out line 3. AsParallel(), that’s all you need to add. It takes a standard Linq-to-Objects query, and splits the work across multiple threads. On my bog-standard dual-core home laptop, that change took the time to run 1 million iterations down from 58 seconds to 36. That’s a 1.5x speed-up – by changing one line of code! And the best thing is, the speed-up increases as the number of cores in your machine goes up. On my Core i7 Quad core box at work, the simulation took 7 seconds in its parallelised form – and 9 seconds without the AsParallel magic! Still a 1.3x speed up – but overshadowed by the sheer awesomeness of the Nehalem architecture!

I should point out one subtlety here. You might have been wondering why it was necessary to create a new Die for every game? Why not share a die between games, like you do in the real world? The reason is the multiple threads introduced by the AsParallel() call. The Roll() method on Die calls through to Random.Next() under the covers, and the Next method isn’t thread-safe.

The easiest fix then, is to give each game its own Die. But there’s another thing: with all those threads beavering away simultaneously there’s a good chance that several Die will be created at the same time, each initializing their own instance of Random at the same time. Since Random by default seeds itself from the system clock, this would result in identical games being played – not exactly representative of the real world.

So what I do is have a static instance of Random which I use to seed all the others, taking care to take a lock on it of course:

class Die
{
    private Random _random;
    private static Random _seedGenerator = new Random();

    public Die()
    {
        lock (_seedGenerator)
        {
            _random = new Random(_seedGenerator.Value.Next());
        }
    }

    public int Roll()
    {
        return _random.Next(1, 7);
    }
}

Try it for yourself

As always the complete source code is available for download from MSDN Code Gallery. Try it out, and let me know what you think.

Saturday, 10 October 2009

Practical LINQ #4: Finding a descendant in a tree

Trees are everywhere. I don’t mean the green, woody variety. I mean the peculiar ones invented by programmers that have their root somewhere in the clouds and branch downwards. As inevitably as apples fall from trees on the heads of meditating physicists, so coders find themselves writing code to traverse tree structures.

In an earlier post we explored the tree created by the UI Automation API representing all the widgets on the Windows desktop. One very common tree-traversal operation is finding a particular child several nodes down from a root starting point. I gave the example of locating the Paint.Net drawing canvas within the main window of the application:

// find the Paint.Net drawing Canvas
 var canvas = mainWindow.FindDescendentByIdPath(new[] {
     "appWorkspace",
       "workspacePanel",
         "DocumentView",
           "panel",
             "surfaceBox" });

What we have here is an extension method, FindDescendentByIdPath, that starts from a parent element and works its way down through the tree, at each level picking out the child with the given Automation Id. In my last post, I skipped over my implementation of this method, but it deserves a closer look because of the way it uses Functional techniques and LINQ to traverse the hierarchy.

So here it is:

public static AutomationElement FindDescendentByIdPath(this AutomationElement element, IEnumerable<string> idPath)
{
    var conditionPath = CreateConditionPathForPropertyValues(AutomationElement.AutomationIdProperty, idPath.Cast<object>());

    return FindDescendentByConditionPath(element, conditionPath);
}

public static IEnumerable<Condition> CreateConditionPathForPropertyValues(AutomationProperty property, IEnumerable<object> values)
{
    var conditions = values.Select(value => new PropertyCondition(property, value));

    return conditions.Cast<Condition>();
}

public static AutomationElement FindDescendentByConditionPath(this AutomationElement element, IEnumerable<Condition> conditionPath)
{
    if (!conditionPath.Any())
    {
        return element;
    }

    var result = conditionPath.Aggregate(
        element,
        (parentElement, nextCondition) => parentElement == null
                                              ? null
                                              : parentElement.FindChildByCondition(nextCondition));

    return result;
}

The first thing we do is convert the sequence of Automation Id strings into a sequence of Conditions (actually PropertyConditions) that the UI Automation API can use to pick out children of parent elements. This is handled for us by the CreateConditionPathForPropertyValues method in line 8.

The actual method of interest is FindDescendentByConditionPath, starting in line 15. Here we put the Enumerable.Aggregate method to a slightly unconventional use. Considered in the abstract, the Aggregate method takes elements from a sequence one by one and performs an function (of your choice) to combine the element with the previous result; the very first element is combined with a seed value.

In this case the seed value is the parent element at the top of tree, the sequence of elements is the list of conditions that we use to pick out the child at each level of the tree. And we provide a lambda function that, each time it is called, takes the element it found in the previous iteration, together with the next condition from the list and uses an extension method that I demonstrated in the earlier blog post, FindChildByCondition, to find the appropriate child.

I’ve found this method a great help when monkeying around in UI Automation trees. If you think you might, look for it in the the source code from last time.

Friday, 1 May 2009

Practical LINQ #3: Compacting ranges of numbers

Here’s a problem that has popped up several times in one guise or another during the six years of my career:

Given a list of numbers, present it to the user in its most compact form.

So if you have 1, 3, 4, 5, 9, 11, 12, 13 it should be presented as 1, 3 – 5, 9, 11 – 13. It’s what your brain does automatically when you fill out the “Selected Pages” box in the Print dialog of Microsoft Word.

I’m sure you can imagine for yourself the kind of for-loop I crafted the last time I solved this problem. This time I wanted to make life more interesting.

The first thing is to make sure the numbers are marshalled in ascending order, with no duplicates in the ranks:

var preparedList = numbers.Distinct().OrderBy(x => x);

For the next step, I imagined into being a little Scanner class, with a nice fluent interface. I wrote down the following code, which was how I wanted the Scanner to work; then, with Resharper pointing out my omissions in glaring red text, I implemented the concepts to make it work:

public IEnumerable<Range> Compact(IEnumerable<int> numbers)
{
    var preparedList = numbers.Distinct().OrderBy(x => x);

    var scanner = new Scanner<int, Range>();

    scanner.ConfigurePattern()
        .StartMatchingWhenAny()
        .ContinueMatchingWhile((nextItem, matchedItems) => nextItem == matchedItems.Last() + 1)
        .Output(details => new Range(
                                    details.MatchedItems.First(),
                                    details.MatchedItems.Last()));

    return scanner.Scan(preparedList);
}

The idea is for the Scanner to walk a sequence of objects (ints in this case), producing from it a sequence of tokens (here, Range objects – simple structs with First and Last properties). The Scanner produces its output by matching patterns of contiguous input elements. The patterns are configured using a fluent interface:

  • the condition that must be met in order to start matching this pattern (any integer is accepted at the start of this simple pattern)
  • which items are considered part of the pattern: here an integer is accepted if it is only one bigger than the previous matched integer
  • how to generate a token from the matched part of the sequence

The internal workings of the scanner aren’t terribly interesting. See the download link below if you’re desperate for a look.

Now don’t laugh: String.Join is my discovery of the week. It concatenates an array of strings, putting a separator of your choice between them; leaving you free to fry bigger fish than how to put a comma after every item except the last.

So my final answer looks like this:

var numbers = new[] {1, 3, 4, 5, 9, 11, 12, 13};

var compacter = new RangeCompacter();
var ranges = compacter.Compact(numbers);

// create a pretty string, each range or number separated by a comma
var display = string.Join(
    ", ", 
    ranges
    .Select(range => range.ToString())
    .ToArray());

Assert.AreEqual("1, 3 - 5, 9, 11 - 13", display);

I’ve put all the code, including my Scanner class, on Code Gallery; there’s even a sprinkling of unit tests! If you find any other uses for it, do let me know.

Friday, 27 March 2009

Project Euler 21: Getting friendly with the Amicable numbers

I’ve remarked before that to Mathematicians, numbers aren’t, … well, they’re not just numbers. Numbers have personalities. Whilst some are deficient, there are others which are aspiring, and a few who are actually perfect;  half of them are rather odd, some downright weird. Of more concern are the odious and evil numbers, especially those that consider themselves untouchable1.

But today we’ll look at a more friendly subject: the amicable numbers. What is an amicable number? Let’s get mathematical and define a function d(n) to be the sum of all proper divisors of n (a proper divisor of n is any divisor of n less than n). Then an amicable number a is one which has a friend b where d(a) = b and d(b) = a (note that we don’t allow a to be narcissistic and count itself as a friend). If you do the math (or these days, more realistically, the google search) you’ll find that the first pair of amicable numbers is 220 and 284.

Amicable numbers have been studied since Greek times: the Pythagoreans in particular credited these special numbers with mystical powers. They’ve even by tried out as a kind of mathematical aphrodisiac:

El Madshriti, an Arab of the 11th century, experimented with the erotic effects of amicable numbers by giving a beautiful woman the smaller number 220 to eat in the form of a cookie, and himself eating the larger 284!

My sudden interest in amicable numbers was driven by strictly mathematical desire  - to solve Project Euler 212. The problem simply asks us to find the sum of all amicable numbers less than 10000. And in fact, the solution to the problem pretty much falls out of its definition.

First, I’ll borrow a function from a few problems back that lists all the divisors of a number:

public static class NumericExtensions
{
    public static IEnumerable<int> Divisors(this int number)
    {
        return (from factor in 1.To((int)Math.Sqrt(number))
                where number.IsDivisibleBy(factor)
                select new [] { factor, number / factor }).Concat();
    }

}

With that, the answer is straightforward:

[EulerProblem(21, Title = "Evaluate the sum of all amicable pairs under 10000.")]
public class Problem21
{
    public long Solve()
    {
        Func<int, int> d = n => n.Divisors().Sum() - n;

        return 	(
				from a in 1.To(10000)
                let b = d(a)
                where a != b && d(b) == a
                select a
				)
                .Sum();
    }
}

As always, full code is available on the Project Euler Code Gallery page.

Footnotes
  1. If you want to learn more about number personalities, see this catalogue.
  2. Thanks to Bela Istok who sent me a nice functional solution to the problem, which prompted me to think about how I would solve it myself.

Wednesday, 4 March 2009

Practical LINQ #2: Reporting duplicate names

So my Widget editor can helpfully suggest default names, but it also lets users change the names of widgets. Ohh! Danger ahead: what if they give two (or more) widgets the same name? We can’t have that! The poor dears might confuse themselves. So we better make sure we warn them; that means we have to write code to detect duplicate names. And that gives me another opportunity to show some elegant LINQy C# for solving this real-life problem.

My requirements are simple: I want to generate an error for all widgets with a shared name, except for the first one with that name – no point in overburdening a user with error messages. I also want to ignore Widgets without a name – I’ve got another rule that deals with them.

The code is straightforward: group the widgets by name, and ignore any groups containing just one widget. Then for each group generate error messages for all items except the first. Translating that into proper English:

private static IEnumerable<TMessage> DetectDuplicates<T,TMessage>(IEnumerable<T> instances, Func<T, string> nameSelector, Func<T, TMessage> messageGenerator)
{
   var groupedDuplicates = instances
       .GroupBy(nameSelector)
       .Where(group => group.Count() > 1 && !string.IsNullOrEmpty(group.Key));

   var errors = groupedDuplicates
       .SelectMany(group => group.
                                Skip(1)
                                .Select(messageGenerator));

   return errors;
}

All the angle brackets in the method signature can mean only one thing: I’ve made the method generic, so it doesn’t just work with my Widgets; it will work with anything that has a property vaguely resembling a name. Along with your list of instances, you pass in a function that can extract a name from each of your objects, and a function that will generate the appropriate message for each duplicate. In fact it doesn’t just have to a message: you could return a rich error object if you wanted to be really sophisticated.

Here’s an example1:

static void Main()
{
   var widgets = new[]
                     {
                         new Widget {Name = "Super"},
                         new Widget {Name = "Competent"},
                         new Widget {Name = "Competent"}
                     };

   var errors = DetectDuplicates(
       widgets,
       w => w.Name,
       w => "There's already a widget called " + w.Name + ". Be more original!");

   foreach (var error in errors)
   {
       Console.WriteLine(error);
   }

   Console.ReadLine();
}

Footnotes

  1. The name of one of the widgets here was inspired by a brand of AEG oven that I spotted the other day: they named them “Competence”. I thought brand names were supposed to be aspirational!

Thursday, 19 February 2009

Practical LINQ: Generating the next default name

Pleasurable as it is, constructing LINQ queries to solve abstruse Mathematical problems, a developer’s real purpose in life, as his boss reminds him often enough, is creating business value. Happily I’ve found that that can be done using LINQ too.

Here’s an example that came up recently.

I was writing a Widget editor dialog box with functionality to create new Widgets. Being the helpful soul I am, I wanted to give the new widgets default names, “Widget1”, “Widget2” and so on (I can be helpful or imaginative, but not both!). But the names also needed to be unique: if “Widget1” and “Widget2” were already in the list, the next new widget should be called “Widget3”.

This is the method I came up with:

public static class NamingExtensions
{
    public static string GetNextDefaultName(this IEnumerable<string> existingNames, string defaultNamePrefix)
    {
        if (existingNames == null)
        {
            throw new ArgumentNullException("existingNames");
        }

        defaultNamePrefix = defaultNamePrefix ?? string.Empty;
        string matchNumberInDefaultNamePattern = "^" + defaultNamePrefix + @"(\d+)$";

        const RegexOptions regexOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        int nextNumber = existingNames.Select(name => Regex.Match(name, matchNumberInDefaultNamePattern, regexOptions) )
            .Where(match => match.Success)
            .Select(match => match.Groups[1].Value)
            .Select(matchValue => int.Parse(matchValue, CultureInfo.InvariantCulture))
            .Aggregate(0, Math.Max, max => max + 1);

        return defaultNamePrefix + nextNumber.ToString(CultureInfo.InvariantCulture);
    }
}

Since I wrote it as an extension method, I can use it like this:

var existingNames = new[] { "Widget1", "Widget2" };
var nextName = existingNames.GetNextDefaultName("Widget");

You might wonder why, in line 19, I use Aggregate instead of a straightforward Max()? That’s because Max() throws an exception if you try using it on an empty sequence, which might happen if none of the existing names match the “Widget[n]” pattern. Also, Aggregate has the nice overload that allows you to apply a result selector function to the final aggregate: as you see, I take advantage of this to return the next number in the sequence, rather than the max itself.

Isn’t LINQ lovely?

Friday, 6 February 2009

functional => prizes #1: We have a winner

TrophyThe closing date of the the inaugural Functional Fun programming competition has passed, and I’m relieved to say I actually had some judging to do. Further to my post on Monday, two bits are now required to hold the number of submissions I have to choose between - though not used to full capacity.

The Winner

Our winner is Jamie, a High School Student from Massachusetts, whose code surpasses in elegance my attempt at solving my own problem. Jamie gets the prize of £25 in Amazon gift vouchers. Honourable mention goes to Dan Dumitru who was the first to submit a solution.

Congratulations to Jamie, and thanks to all 10 of you who entered!

The Solution

The puzzle that Jamie and Dan solved was as follows;

Which Mathematicians have found themselves prime positions in the grid below?

HiddenMathematiciansImage

So who were the coy Mathematicians, and how did they conceal themselves? Well, given the maths connection, “prime positions” must surely have something to do with prime numbers? So what happens if we count along the cells of the grid (wrapping over from the end of one row to the beginning of the following row) and note down the characters we find at indices which are prime-numbered?

Interesting! We get the string

tinyurl#teyuir#gqrtyh#hyirtq##

Teyuir? What theorem did he prove? must be Slovakian with a name like that. Now tinyurl – that rings a bell. Hey! I wonder what tinyurl.com/ teyuir looks like?

WikipediaRestApiResult

Huh! XML?

But wait: Leonhard Euler. There’s a name I recognise. Didn’t he dabble in mathematics, and prove a lemma or two?

And having reasoned thus, all that remains is to craft a handful of LINQ queries to winkle all three mathematicians out of the hiding places: Euler, Euclid and Cantor.

Here’s Jamie’s solution in its entirety. The code speaks for itself.

// Gets all the nonwhitespace characters
var rawChars = from ch in File.ReadAllText("HiddenMathematicians.txt") 
               where !char.IsWhiteSpace(ch)
               select ch;

// Gets only the chars at prime indices (char[0] is at index 1)
var primeChars = from num in Enumerable.Range(2, rawChars.Count() - 1) // creates all indices from 1 to numRawChars
                 let divisors = Enumerable.Range(2, num - 2) // all possible divisors, 2 to num - 1
                 let isPrime = !divisors.Any(div => num % div == 0)
                 where isPrime
                 select rawChars.ElementAt(num - 1);

// All the words, including the beginning hint
var words = from str in primeChars.Aggregate(string.Empty, (str, ch) => str += ch).Split('#')
            where !string.IsNullOrEmpty(str)
            select str;

var names = from word in words.Skip(1)
            let url = "http://" + words.First() + ".com/" + word
            let xmlData = XDocument.Load(url)
            let element = xmlData.XPathSelectElement("./api/query/pages/page")
            let attribute = element.Attribute("title")
            select attribute.Value;

foreach (var name in names)
    Console.WriteLine(name);

Console.ReadLine();

Tuesday, 20 January 2009

functional => prizes #1

Ladies and Gentlemen, roll up, roll up for the first ever Functional Fun programming competition, where Functional => Prizes.

Many moons ago, at the Microsoft PDC, I won a copy of Windows Vista Ultimate; said software being superfluous to my requirements, I held an auction, and I now have a pot of dosh which I intend to distribute via this blog. And what better way to do that than to hold competitions? Hopefully the stash will stretch to cover a couple throughout the course of the year, so stay subscribed.

Almost a year ago, I launched this blog with a series of posts on solving Project Euler. This competition gives me a chance to turn the tables and try my hand at problem setting.

Thus without further ado:

The Puzzle

Which Mathematicians have found themselves prime positions in the grid below?

HiddenMathematiciansImage

You’ll probably want to download this grid in text form.

The Rules

  1. All solutions must be emailed to (samuel [dot] jack [at] gmail [dot] com) before Midday (GMT) February 4th 2009.
  2. Don’t just give the answer (I’ve already got a pretty good idea what it might be!): what I want to see is code - especially code that goes all the way from input to final answer.
  3. The prize will be awarded, not to the first correct answer, but to the solution which, in the opinion of the judge, is the most elegant.
  4. Competitors should be aware that C#, particularly in its Version 3 and 4 incarnations, tops the judge’s elegance rankings; this is not to dissuade them from making entries in other languages – merely to let them know that if they choose to enter using an inferior alternative language they’ll have to work harder to make the elegance apparent.
  5. The prize will be £25 in Amazon vouchers – but if you are lucky enough to win, and unfortunate enough to live in one of those foreign places (you know, the kind that get rid of their Head of State every few years, and have to choose a new one) then I’ll do my best to accommodate you with a prize in your local currency.
  6. If you submit a solution, but think you can better it, don’t hesitate to send another.

Your Choice

Now what are you going to do?

A. Keep quiet about the competition in the hope of increasing your chances of winning?

B. Tell as many of your closest friends as possible in order to increase the publicity and your standing in the community when you win the Inaugural Functional => Prizes challenge?

Tuesday, 6 January 2009

Project Euler 89: Converting to and from Roman Numerals

It would not surprise me at all to learn that the Roman Numeral system was invented by a Stone Mason with one eye on his retirement fund. As he charged per character chiseled he could expect a substantial fee whenever called upon to inscribe numbers; dates on tombstones were even better, because on the whole he could expect the length of the inscription and thus the fee to increase as the years progressed.

Everybody knows how the Roman Numeral system works, though the only time we usually get to show off our skills is when the end-credits roll on the the big or small screen and the date of film or programme production flashes up. It consists of 7 symbols, each standing for a value:

Numeral Value
I 1
V 5
X 10
L 50
C 100
D 500
M 1000

A number in Roman Numerals is just a string of these numerals written in descending order (e.g. M's first, followed by D's, etc.). To convert from Roman Numerals to decimal we just work through the string, adding up the values of the numerals. So MMVIIII represents 2 x 1000 + 1 x 5 + 4 x 1 = 2009.

The only complication is the subtractive rule. This rule makes it possible for some numbers to be written in a more compact form (I'm sure archaeologists will one day unearth a document recording the protests made by the Mason's Union on its introduction!). It allows a numeral of a lower value to be placed before one of a higher value to indicate that the lesser should be subtracted from the greater. Thus IV means 5 - 1 = 4, and XC = 100 - 10 = 90. Unfortunately, it's not quite that simple. According to PPI MMDCL1, only I, X and C can be used subtractively, and then only in front of numerals up to ten times as big. So I can only be used in front of V and X, X in front of L and C; and C in front of D and M.

Converting Numerals to Decimal Programmatically

It's the subtractive rule that stumped me at first when I tried to figure out an algorithm to parse roman numerals. It certainly makes things complicated if you think about parsing the string from left to right. But as soon as you start thinking backwards, that is working from right to left, the difficulty evaporates:

  1. Split the string into characters
  2. Convert each character into the value it represents
  3. Start from the right - the lowest value numeral
  4. Keep a running total, and a record of the maximum numeral encountered so far
  5. Take characters one by one:
    1. If character is greater than the maximum, add it to the running total and update the maximum
    2. If character is less than the maximum, subtract it from the running total

This converts straight-forwardly into LINQ:

public static int ConvertToDecimal(this string numerals)
{
    var result = numerals
        .ToCharArray()
        .Reverse()
        .Select(c => _numeralToNumberLookup[c.ToString()])
        .Aggregate(
            new { MaxValue = 0, RunningTotal = 0 },
            (state, item) => new
            {
                MaxValue = Math.Max(state.MaxValue, item),
                RunningTotal = item >= state.MaxValue ? state.RunningTotal + item 
                                                               : state.RunningTotal - item
            }, 
            aggregate => aggregate.RunningTotal);

    return result;
}

You'll note that I'm using an anonymous type in my call to Aggregate to allow me to record both pieces of state. Notice that I have to seed the call to Aggregate with an empty instance of the anonymous type (line 8). The C# compiler makes this possible by ensuring that any anonymous type that has properties with the same name and type and in the same order as another anonymous type actually become the same type. This overload of Aggregate also requires a lambda expression that extracts the overall result from the final aggregate object.

Converting Numbers to Numerals (with added recurserators!)

So that's Parsing taken care of. What about turning a number into numerals?

The algorithm isn't much more difficult, though the implementation I've chosen is a little unusual. The algorithm is recursive. We start off with the number we want to convert, and a stack containing the values of all the numerals that are available to us, greatest on the top. This stack includes the values of "composite numerals" - pairs of numerals that follow the subtractive rule, for example, 4 (IV), 9 (IX) and 40 (XL). In each iteration we peek at the numeral value on the top of the stack and test to see whether the number is divisible by it. If it is we've got us some numerals - we produce part of the final numeral string by looking up the appropriate numeral(s) and repeating them as many times as the number is divisible by the numeral value. If not, we recurse anyway. To recurse, we pop the top numeral off the stack, and pass the reduced list of numerals through to the next iteration, along with the remainder of the number after dividing by the numeral value. We terminate the recursion when we are asked to convert number 0 - which can't be done in roman numerals. If you understood all that, you have permission to skip the next paragraph; otherwise read on for a worked run-through of the algorithm.

For example, suppose we want to convert 2051. We start with the complete list of numerals: { 1000, 900, 500, 400, ... 50, 40, 10, 9, 1}. Peek at the first numeral value, and discover that it's 1000. 2051 is 2 x 1000, with remainder 51. Looking up 1000 in our numeral dictionary yields "M", so the first part of the numeral string is "MM". We're done with that iteration, so we remove 1000 from our list of numerals, and pass the smaller list, and the remainder 51 to the next iteration. The next few numeral values {900, 500, etc.} don't divide into 51, so we keep popping numerals, passing remainders and recursing till we get to a numeral list starting {50, 40, ...}.  51 is divisible by 50 (so we deduce that the next part of the numeral string is "L"). We then pop, pass and recurse until the number 1 is given for conversion, and the numeral list just contains {1}. At this point we add numeral "I" to the result, recursing hits the terminating condition, so we stop.

Now for the implementation. I could have used a straight-forward recursive function: I did this with my solution to Problem 31 (Counting Coin Combinations) which had a similar algorithm. But my Lazy Trampoline hasn't seen much action recently, so I decided to give it a bounce. The idea of the Lazy Trampoline is to create a recursive function that is able to yield a result after each iteration - a recurserator, you could call it.

It looks like this2:

public static string ConvertToNumerals(this int number)
{
    Func<int, IStack<int>, IEnumerable<string>> numeralsEnumerator = Trampoline.MakeLazyTrampoline((int numberToConvert, IStack<int> availableNumerals) =>
    {
        if (numberToConvert == 0)
        {
            return Trampoline.YieldBreak<int, IStack<int>, string>();
        }

        int currentNumeral = availableNumerals.Peek();

        int quotient = numberToConvert / currentNumeral;
        int remainder = numberToConvert % currentNumeral;

        string numberAsNumerals = _numberToNumeralLookup[currentNumeral].Repeat(quotient);

        return Trampoline.YieldAndRecurse<int, IStack<int>, string>(
            numberAsNumerals, 
            remainder, availableNumerals.Pop());
    }
    );

    var result = numeralsEnumerator(number, _availableNumerals)
             .Aggregate(
                new StringBuilder(),
                (sb, numerals) => sb.Append(numerals),
                sb => sb.ToString());

    return result;
}

The call to MakeLazyTrampoline (line 3) generates an iterator (numeralsEnumerator). Each item in the sequence generated by the iterator is a string consisting of one type of numeral. These are all combined by the query starting line 23 to create the complete numeral.

Within the recurserator, note that I return special values generated by Trampoline.YieldBreak and Trampoline.YieldAndRecurse. These control the behaviour of the recurserator. YieldBreak indicates that the recursion should stop now; YieldAndRecurse indicates (as its first parameter) which value should be returned in the sequence, and (in the second and third parameters) which values should be passed through to the next iteration.

It may have slipped under you radar, but in Line 15 I'm making a call to the Repeat method which doesn't exist on the String class. Here's the missing extension method definition:

public static string Repeat(this string value, int count)
{
    if (count == 0)
    {
        return string.Empty;
    }
    
    if (count == 1)
    {
        return value;
    }

    return Enumerable.Repeat(value, count)
        .Aggregate(new StringBuilder(), 
        (aggregate, item) => aggregate.Append(item), 
        aggregate => aggregate.ToString());
}

Solving Project Euler Problem 89

Project Euler Problem 89 gives us a file containing a load of numbers in Roman Numeral form, and asks us to calculate how many characters can be saved if the numbers are compacted using the subtractive rule. Given the two extension methods we've just implemented, the solution is a simple pair of LINQ queries:

[EulerProblem(89, Title = "Develop a method to express Roman numerals in minimal form.")]
class Problem89
{
    public long Solve()
    {
        var lines = File.ReadAllLines("ProblemData\\Problem89Data.txt");

        var numberOfCharacters = lines.Select(line => line.Length).Sum();

        var minimisedNumberOfCharacters = lines
            .Select(line => line.ConvertToDecimal())
            .Select(number => number.ConvertToNumerals())
            .Select(numerals => numerals.Length)
            .Sum();

        return numberOfCharacters - minimisedNumberOfCharacters;
    }
}

The full code is available, as always, on the MSDN Code Gallery page. Hope you like it Saevar.

Footnotes
  1. PPI - Prex pro Ineo. ie Request For Comment! Thanks, InterTran!
  2. Thanks again to Eric Lippert for his Immutable Stack code.

Friday, 21 November 2008

Null Reference checking, functional style

I love Extension methods. I only wish I could pull one out of my computer, so that I could give it a great big hug. Anyway ... That little outburst was prompted by the way that extension methods helped me out with an elegant solution when I needed to do a null reference check on a variable.

We've all written code like

variable = SomeMethodThatMightReturnNull();

if (variable == null) 
{
	throw new ExceptionIndicatingTheNullness();
}

which is alright, but don't you think there are too many curly braces there? You don't? Well, I suppose there are only two, but wouldn't it be more elegant if there were none?

I've written before about a nice feature of extension methods: because they are effectively static methods you can call one on a null reference, and it doesn't blow up. Instead it just passes the null value through into the method.

So we can create this:

public static class FluentMethodExtensions
{
    public static T EnsureNotNull<T>(this T instance, Func<Exception> exceptionBuilder) where T : class
    {
        if (instance == null)
        {
            throw exceptionBuilder();
        }
        return instance;
    }
}

(Note the where constraint on the method (line 3): we need that so that we can do the null comparison in line 5.)

With that in place we can zap those curly braces:

variable = SomeMethodThatMightReturnNull().EnsureNotNull(() => new ExceptionIndicatingTheNullness());

How about that then?

Of course, the usual reason for throwing your own custom exception is so that you can include helpful details about the problem. In that case you might want to create some overloads to EnsureNotNull. How about this one, intended to allow you to pass through parameters so that you can build them into a message using string.Format:

public static T EnsureNotNull<T>(this T instance, Func<object[], Exception> exceptionBuilder, params object[] exceptionMessageParameters) where T : class
{
    if (instance == null)
    {
        throw exceptionBuilder(exceptionMessageParameters);
    }

    return instance;
}

You use it like so:

// ...
Range range = GetRange(workbook, rangeName)
                .EnsureNotNull(BuildInvalidRangeNameException, rangeName);
// ...

private ExcelException BuildInvalidRangeNameException(object[] exceptionParameters)
{
    return new ExcelException(string.Format("Named range {0} does not exist.", exceptionParameters));
}

You can create your own. Go on. Try it. It's fun.

Thursday, 16 October 2008

Project Euler Code Clearance

Roll up! Roll up! Get them while they're hot, don't leave them 'till they're not. Get your genuine, authentic Functional Fun Project Euler solutions here. Bargain prices, can't beat 'em. Visual Studio solution thrown in free.

I've been pretty busy over the last week or so, and it looks like I'll continue to be busy until I depart for LA at the end of next week. And you know what that means: .Net 4, C# 4, Oslo, Cloud Computing, Windows 7. That gang of acronyms and code names should keep me in blog posts for a good long while. Meanwhile, I have several Project Euler solutions languishing in the basement of my hard-disk (fourth platter, third sector, last block on the right, I think), and I can't see myself getting the time to do them justice in individual blog posts.

So today I'm going to do them an injustice and launch them into the world with barely a scant mention each. As usual, you can get the code from MSDN Code Gallery.

Problem 20: Summing the digits of a Factorial

Problem 20 gave me a chance to reuse my code for summing large numbers again. That's because calculating a factorial can be done iteratively. Imagine the sequence 1!, 2!, 3!, etc. You get the (n + 1)th term by multiplying the nth term by itself n + 1 times (that's just the definition of factorial). And the multiplication boils down to adding up n + 1 copies of the nth term. Wrap that simple idea inside an application of Unfold, and you have the solution.

Problem 22: Number crunching names

Problem 22 is just a file processing problem. Nothing more to it than that. It has a nice, one line solution with LINQ though:

return File.ReadAllText(problemDataFileName)
           .Split(new[] {','})
           .Select(name => name.Trim(new[] {'\"'}))
           .OrderBy(name => name)
           .Select((name, index) => (index + 1) * name.ToCharArray()
													  .Select(letter => (letter - 'A') +1)
                                                      .Sum())
           .Sum();

Problem 25: Big Fibonacci numbers

In Problem 25, Euler wants to know the first term in the Fibonacci sequence that has 1000 digits, which is of course his way of getting us to find a way of computing with big numbers. No problem to us though: we've got the Unfold method to calculate the Fibonacci sequence - and did I mention the code I wrote before that we can use to sum the big numbers we'll find?

Problem 31: Counting Coin Combinations

I'm quite pleased with my solution to Problem 31, wherein Euler asks us to count the number of ways in which £2 can be given using the British coinage system. I came up with a recursive algorithm that starts with the value of change you wish to give and a list of available coin types. Then it removes the biggest coin from the list, works out how many times that coin could be used, and for each possibility calculates the number of combinations for the reduced amount, and the reduced coin set by calling itself recursively.

For example, if we needed to make £1 using 10p and 2p, and 1p coins, we'd start by seeing that we could use between zero and ten 10p coins, so there are eleven possibilities we need to investigate. For each possibility we calculate how much remains once we've put down the appropriate number of 10 pences, then use the same algorithm again, but considering just 2p and 1p coins.

Problem 36: Binary and Decimal Palindromes

We last encountered numeric palindromes in Problem 4: they're numbers that read the same in both directions. In Problem 4 we were only interested in numbers that are palindromic when written in base 10. Problem 36 asks us to count the numbers that are palindromic when written in binary and in decimal. The most interesting part about this problem was finding a number's representation in binary. I could probably have used the BitArray class to do this, but instead chose to use this LINQ query:

private static IEnumerable<bool> GetNumberBits(uint number)
{
    if (number == 0)
    {
        return new[] {false};
    }

    // iterate through the bit positions, checking whether that
    // bit of the the number is set;
    // do it in reverse, so that we can ignore leading zeros
    return 31.To(0)
        .Select(bit => (number & (1 << bit)) == (1 << bit))
        .SkipWhile(bit => !bit);
}

Problem 112: Finding the density of Bouncy numbers

Mathematicians don't try to hide their obsession with numbers, do they? They make it plain as day that they count numbers amongst their closest friends, by giving them all names. It's the Bouncy numbers that are the heroes of Problem 112. These are numbers that, considering their digits as a sequence have a neither increasing nor decreasing sequence. My solution to this problem puts the Aggregate and LazilyAggregate methods to a rather unconventional use.

Friday, 5 September 2008

Project Euler Problem 19: A Century of Sundays

There's a hard way and an easy way to solve Problem 19 in our ongoing struggle with Project Euler, where we're asked to count how many Sundays fell on the first of the month during the twentieth century - being software developers and thus virtuously lazy by nature we wouldn't consider the tedious way, getting out all those old diaries (wherein entries invariably finish around January 5th) and actually counting whilst flicking over the pages.

The hard way is to use the information given in the problem definition and devise an algorithm for calculating days of the week that each date falls on. The easy way uses with gratitude the brainpower the .Net Framework developers invested in their work. It's Friday: I'll take the easy way.

Thanks to C#3.0 and LINQ, the actual code to the solution is a one-expressioner:

[EulerProblem(19, Title = "How many Sundays fell on the first of the month during the twentieth century?")]
public class Problem19
{
    public long Solve()
    {
        DateTime endDate = new DateTime(2000, 12, 31);
        DateTime startDate = new DateTime(1901, 1, 1);

        return startDate
            .Unfold(date => date.AddMonths(1))
            .TakeWhile(date => date <= endDate)
            .Count(date => date.DayOfWeek == DayOfWeek.Sunday);
    }
}

If it's Friday for you as well then I'll let you read the following explanatory notes: otherwise, work it out for yourself!

Remember my Unfold extension method? The one that takes a seed value (startDate in this case) and generates a sequence by repeatedly applying an expression to previous value in the sequence to create the next. That combined with the TakeWhile method represent the functional-programming equivalent of the for loop. The Count overload that I'm using here counts every item in a sequence matching a particular criterion. So the code I have above is equivalent to this:

DateTime endDate = new DateTime(2000, 12, 31);
DateTime startDate = new DateTime(1901, 1, 1);

int count = 0;
for (DateTime date = startDate; date < endDate; date = date.AddMonths(1))
{
    if (date.DayOfWeek == DayOfWeek.Sunday)
    {
        count++;
    } 
}

return count;

Thursday, 28 August 2008

Project Euler Problems 18 and 67: Finding the Maximal Route through a Triangle

If this doesn't pique the interest of my fellow developers then I don't know what will: a problem that will take twenty billion years to solve (at an extremely optimistic estimate), unless we find an efficient algorithm. The problem in question is Project Euler Problem 67. We're given a triangle made up of one hundred rows of numbers, and asked to consider all routes from the top to the bottom, made by moving from one row to the next via adjacent cells. Over all such routes, we have to find the maximum sum. Can't picture it? Let me help:

TriangleRoute

For the route that I've drawn, the sum is 5 + 15 + 29 + 45 = [train your brain and work it out yourself!].

If we only wanted to tackle the baby brother of this problem (Problem 18 - the same problem for a triangle with just 15 rows), we could get away with checking each of the routes. But it's not feasible to do that here because there are 299 possibilities: hence the need for an efficient algorithm. Any ideas?

You could take a bath and wait for the Eureka moment. Or if you want to avoid the ensuing streaking, read on.

An algorithm

Suppose that for each cell in a row we know the maximum sum over all routes ending at that cell. From that it's easy to work out the same for each of the cells in the next row. Here's how you do it. If you've got a cell with value c, somewhere in the middle of the row, then you can only reach it from either of the two cells adjacent to it in the row above. 

CalculatingCellMaxima

We're assuming that we already know the maximum sum for routes ending at these cells: suppose it is a and b respectively. So the biggest possible sum we can get for routes ending at c is Max(c + a, c + b). For the two cells at either end of the row the calculation is even simpler: they can only be reached from the cell at the appropriate end of the row above them, so we just sum the cell value with the known maximum sum to the cell above. In this way, we can crunch through the rows, 'flattening' the triangle until we reach the last row, at which point we will know the maximum possible sum over all routes ending at each of the cells in that row.

That's the middle of an algorithm. To solve the overall problem we just need to add a beginning and an end. The middle part starts by assuming that we already know the maximum sum for all routes ending at a particular row. So to get the algorithm rolling we need a row for which we do have that information, and the first row of the triangle is the obvious candidate: all routes start there, so the maximum up to that point is just the value of the cell in that row. How about the end of the algorithm? To solve the problem, we need to know the maximum sum from top to bottom through the triangle; we've calculated the maximum sum ending at each of the cells in the last row, so we just need to take the maximum over all these cells.

Some code

To code this, we'll reach once again into our Functional Programming toolbox. I think the Aggregate method is just the hammer to crack this particular nut. Do you see how?

Ignore the input part for now, and assume we've got a sequence of List<int>, one List for each row of the triangle. Like this:

{75}
{95, 64}
{17, 47, 82}
{18, 35, 87, 10}
...
In this layout, it's a little tricky to visualise which cells are adjacent to which: you need to look at the cell directly above, and the one above but to the left. For example, the 35 in row four is adjacent to 47 and 17 in the row above.

We'll use Aggregate to 'flatten' the triangle represented by this sequence of Lists using the algorithm I described above: the aggregate that we'll be calculating is a List containing the maximum sum over all routes to each cell up to the last row we've processed. Remember that Aggregate needs a function that combines the aggregate so far with the next item in the series. We'll supply a function that takes the maximums so far and combines them with the cell values for the next row. The code looks like this:

return rows.Aggregate(
    new List<int> {0},
    (currentMaxima, nextRow) =>
        {
            var rowLength = nextRow.Count();
            return nextRow
                .Select((cell, index) =>
                        index == 0 ? cell + currentMaxima[index]
                        : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                        : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                .ToList();
        }
    )
    .Max();

Notice that at line 7 we're processing the sequence nextRow (containing the cell values) using an overload of the Select method that gives us the index of each cell value in the sequence; we use this index to find the appropriate entries in the list currentMaxima containing the maximum sums to the cells in the row above.

And with that, I believe we've solved the problem. It takes 2 milliseconds to solve the 100-row triangle problem on my computer - a satisfying improvement over 20 billion years.

Here's the complete code. Note that in the source code project I have included two files containing the data for these problems.

namespace ProjectEuler
{
    [EulerProblem(18, Title = "Find the maximum sum travelling from the top of the triangle to the base.")]
    public class Problem18
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem18Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    [EulerProblem(67, Title = "Using an efficient algorithm find the maximal sum in the triangle?")]
    public class Problem67
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem67Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    public static class MaximumSumThroughTriangleSolver
    {
        public static int SolveFromFile(string dataFilePath)
        {
            string problemData = File.ReadAllText(dataFilePath);

            return Solve(problemData);
        }

        public static int Solve(string problemData)
        {
            var rows = problemData
                .Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
                .Select(line =>
                        line.Split(' ')
                            .Select(token => int.Parse(token))
                            .ToList()
                )
                .ToList();

            return rows.Aggregate(
                new List<int> {0},
                (currentMaxima, nextRow) =>
                    {
                        var rowLength = nextRow.Count();
                        return nextRow
                            .Select((cell, index) =>
                                    index == 0 ? cell + currentMaxima[index]
                                    : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                                    : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                            .ToList();
                    }
                )
                .Max();
        }
    }
}