My solution to Citerus/JFokus programming contest

It’s JFokus – the number one annual Java echo system event in Sweden – right now. I’ll present TDD with Regex in a session tomorrow Wednesday. However, Uppsala based consultancy company Citerus kicked off a programming contest during this conference.

The problem: You get one source string and a lot of short acronym strings. Delete acronyms from the source string until you have the shortest possible string. Use whatever programming language you like as long as it is executable on the JVM.

The example says delete the following acronyms:

  • TDD, DDD, DI, DO, OO, UI, ANT, CV, IOC, LOC, SU, VO

…from the source string:

  • VOCDIITEIOCRUDOIANTOCSLOIOCVESTAIOCVOLIOCENTSU

The first idea you probably get is to delete them in a double loop like this:

import java.util.*;
public class Stripper {
  final String [] acronyms;
  public Stripper(final String... acronyms) {
    this.acronyms = acronyms;
  }
  public String strip(final String source) {
    String result = source;
    for (int i = 0; i < source.length(); i++) {
      final String sufix = source.substring(i);
      for (final String acronym : acronyms) {
        if (sufix.startsWith(acronym)) {
          result = shortest(result, strip(source.substring(0, i)
              + sufix.substring(acronym.length())));
        }
      }
    }
    return result;
  }
  private static String shortest(final String s1, final String s2) {
    return s1.length() < s2.length() ? s1 : s2;
  }
  public static void main(String args[]) {
    final long start = System.currentTimeMillis();
    System.out.printf("%s (%d ms)\n", new Stripper("TDD", "DDD", "DI",
        "DO", "OO", "UI", "ANT", "CV", "IOC", "LOC", "SU",
        "VO").strip("VOCDIITEIOCRUDOIANTOCSLOIOCVESTAIOCVOLIOCENTSU"),
            System.currentTimeMillis() - start);
  }
}

Unfortunately, it’s slow. It took my Linux box 20 minutes to run the example. But, you may have noticed that the recursive call invokes the same query over and over again. Let’s add a cache:

import java.util.*;

public class Stripper {
  final private Map<String, String> stripCache = new HashMap<String, String>();
final String [] acronyms;

public Stripper(final String... acronyms) {
  this.acronyms = acronyms;
}
public String strip(final String source) {
  if (stripCache.containsKey(source)) return stripCache.get(source);
  String result = source;
  for (int i = 0; i < source.length(); i++) {
    final String sufix = source.substring(i);
    for (final String acronym : acronyms) {
      if (sufix.startsWith(acronym)) {
        result = shortest(result, strip(source.substring(0, i)
            + sufix.substring(acronym.length())));
      }
    }
  }
  stripCache.put(source, result);
  return result;
}
private static String shortest(final String s1, final String s2) {
  return s1.length() < s2.length() ? s1 : s2;
}
public static void main(String args[]) {
  final long start = System.currentTimeMillis();
  System.out.printf("%s (%d ms)\n", new Stripper("TDD", "DDD", "DI",
      "DO", "OO", "UI", "ANT", "CV", "IOC", "LOC", "SU", 
      "VO").strip("VOCDIITEIOCRUDOIANTOCSLOIOCVESTAIOCVOLIOCENTSU"),
          System.currentTimeMillis() - start);
  }
}

…and suddenly the example is solved in less than 50 ms. An improvement by 30.000 times!

But, it didn’t stop there. If you swap the loops, you can jump over some hopeless cases:

import java.util.*;
public class Stripper {
  final private Map<String, String> stripCache = new HashMap<String, String>();
  final String [] acronyms;
  public Stripper(final String... acronyms) {
    this.acronyms = acronyms;
  }
  public String strip(final String source) {
    if (stripCache.containsKey(source)) return stripCache.get(source);
    String result = source;
    for (final String acronym : acronyms) {
      for (int i = source.indexOf(acronym); 0 <= i && source.length() > i;
            i = source.indexOf(acronym, i + 1)) {
        result = shortest(result, strip(source.substring(0, i) 
              + source.substring(i + acronym.length())));
      }
    }
    stripCache.put(source, result);
    return result;
  }
  private static String shortest(final String s1, final String s2) {
    return s1.length() < s2.length() ? s1 : s2;
  }
  public static void main(String args[]) {
    final long start = System.currentTimeMillis();
  System.out.printf("%s (%d ms)\n", new Stripper("TDD", "DDD", "DI",
      "DO", "OO", "UI", "ANT", "CV", "IOC", "LOC", "SU", 
      "VO").strip("VOCDIITEIOCRUDOIANTOCSLOIOCVESTAIOCVOLIOCENTSU"),
          System.currentTimeMillis() - start);
  }
}

And that made it twice as fast. Now it takes less than 25 ms to solve it.

Advertisements

3 Responses to “My solution to Citerus/JFokus programming contest”


  1. 1 Magnus Ljadas 2011-02-23 at 13.25

    While waiting for a Hadoop job to finish, I decided to port your Java solution to Ruby:

    http://deltaprojects.blogspot.com/2011/02/while-waiting-for-hadoop.html

    The results kind of surprised me ;-)

  2. 3 Sten-Åke Strid 2011-02-28 at 23.03

    Nice way to solve a problem using the technique ‘Dynamic programming’. For those reading this comment and wonder what i talk about, just check it up on wikipedia: http://en.wikipedia.org/wiki/Dynamic_programming


Comments are currently closed.




%d bloggers like this: