
22 October 2011

Freeze Custom Ruby Strings When Used as Keys in Hash

Last week I spent quite some time chasing a single issue in my JavaClass Ruby gem. It really annoyed me, and even Google turned up nothing useful, so I had to dig deep. Here is what happened: I began with some kind of rich string, quite similar to the following class:
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end
  def data
    @data
  end
end

word = RichString.new('word')
puts word               # => word
puts word.data          # => w
That was not special and worked as expected.

Then I happened to use instances of RichString as keys in a hash. Why shouldn't I? They were still normal Strings and their data should be ignored when used in the hash.
map = {}
map[word] = :anything

word_key = map.keys[0]
puts word_key           # => word
puts word_key.data      # => nil
The last line warned me "instance variable @data not initialized". Oops, my little @data had gone missing, as the nil in the last line shows. At first I did not know what was causing the problem. I was baffled, as all tests were green and coverage was good. I spent some time digging and rewriting a lot of functionality until I found that the trouble appeared when I used my RichStrings as hash keys and read them back with Hash#keys().
puts word == word_key   # => true
puts word.object_id == word_key.object_id  # => false
Aha, Hash had replaced the key with a copy. It is reasonable to prevent keys from changing after insertion, so a String passed as a key is duplicated and frozen. (RTFM always helps ;-) But how did it do that? It did not call dup() on the RichString. As Hash is implemented natively, I ended up in the C source, hash.c.
/*
*  call-seq:
*     hsh[key] = value        => value
*     hsh.store(key, value)   => value
*/

VALUE
rb_hash_aset(hash, key, val)
  VALUE hash, key, val;
{
  rb_hash_modify(hash);
  if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
    st_insert(RHASH(hash)->tbl, key, val);
  }
  else {
    st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
  }
  return val;
}
So when the key is a String and not already included in the hash, then rb_str_new4 is called. (I just love descriptive names ;-) Furthermore string.c revealed some fiddling with the original key.
VALUE
rb_str_new4(orig)
  VALUE orig;
{
  VALUE klass, str;

  if (OBJ_FROZEN(orig)) return orig;
  klass = rb_obj_class(orig);
  if (FL_TEST(orig, ELTS_SHARED) &&
      (str = RSTRING(orig)->aux.shared) &&
      klass == RBASIC(str)->klass) {
    long ofs;
    ofs = RSTRING(str)->len - RSTRING(orig)->len;
    if ((ofs > 0) || (!OBJ_TAINTED(str) && OBJ_TAINTED(orig))) {
      str = str_new3(klass, str);
      RSTRING(str)->ptr += ofs;
      RSTRING(str)->len -= ofs;
    }
  }
  else if (FL_TEST(orig, STR_ASSOC)) {
    str = str_new(klass, RSTRING(orig)->ptr, RSTRING(orig)->len);
  }
  else {
    str = str_new4(klass, orig);
  }
  OBJ_INFECT(str, orig);
  OBJ_FREEZE(str);
  return str;
}
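The visible effect of these two functions can be checked from Ruby with an ordinary String. This is only a small sketch; the variable names are illustrative and not part of JavaClass. An unfrozen String key is stored as a frozen copy, while the original stays mutable.

```ruby
# String.new returns an unfrozen String, even if string literals are frozen.
original = String.new('word')

map = {}
map[original] = :anything
stored = map.keys[0]

puts stored.frozen?           # => true
puts original.frozen?         # => false
puts stored.equal?(original)  # => false

# Changing the original afterwards does not touch the stored key.
original << 's'
puts map.key?('word')         # => true
puts map.key?('words')        # => false
```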
I didn't quite understand everything that was going on in rb_str_new4(), but it was sufficient to read the first few lines: if the original string is frozen, it is used directly. I verified that.
map = {}
map[word.freeze] = :anything

word_key = map.keys[0]
puts word_key           # => word
puts word_key.data      # => w
Excellent, finally my @data showed up as expected. Fixing the problem added some complexity dealing with frozen values, but it worked.
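One way to keep that complexity in one place is to let the class freeze itself at the end of initialize, so callers cannot forget to do it. The following is only a sketch under that assumption: FrozenRichString is a hypothetical name, and the approach only fits strings that never need to change after construction. The hash.c shown above has the K&R-style definitions of the Ruby 1.8 sources, and later interpreters may treat String subclasses differently, so the comments state only what holds for a key that is already frozen.

```ruby
# Hypothetical variant of RichString that freezes itself after construction.
class FrozenRichString < String
  attr_reader :data

  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
    freeze               # a frozen String key is stored as-is, not copied
  end
end

word = FrozenRichString.new('word')
map = {}
map[word] = :anything

word_key = map.keys[0]
puts word_key.equal?(word)  # => true
puts word_key.data          # => w
```

Any later attempt to modify such a string in place raises an error, which is the price for being a safe hash key.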

Freeze your custom Ruby strings when you use them as keys in a hash (and want to retrieve them with Hash#keys()).