Showing posts with label key. Show all posts

22 October 2011

Freeze Custom Ruby Strings When Used as Keys in Hash

Last week I spent quite some time chasing a single issue in my JavaClass Ruby gem. It really annoyed me, and I could not find anything useful even with Google, so I had to dig deep. Here is what happened. I began with some kind of rich string, quite similar to the following class:
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end
  def data
    @data
  end
end

word = RichString.new('word')
puts word               # => word
puts word.data          # => w
That was not special and worked as expected.

Then I happened to use instances of RichString as keys in a hash. Why shouldn't I? They were still normal Strings and their data should be ignored when used in the hash.
map = {}
map[word] = :anything

word_key = map.keys[0]
puts word_key           # => word
puts word_key.data      # => nil
The last line warned me "instance variable @data not initialized". Oops, my little @data had gone missing, as the nil in the last line shows. At first I did not know what was causing the problem. I was baffled, as all tests were green and coverage was good. I spent some time digging and rewriting a lot of functionality until I found that the trouble showed up in the keys returned by Hash#keys() when my RichStrings were used as hash keys.
puts word == word_key   # => true
puts word.object_id == word_key.object_id  # => false
Aha, Hash changed the keys. It's reasonable to prohibit key changes, so a String passed as a key will be duplicated and frozen. (RTFM always helps ;-) But how did it do that? It did not call dup() on the RichString. As Hash is natively implemented, I ended up in the C source hash.c.
/*
*  call-seq:
*     hsh[key] = value        => value
*     hsh.store(key, value)   => value
*/

VALUE
rb_hash_aset(hash, key, val)
  VALUE hash, key, val;
{
  rb_hash_modify(hash);
  if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
    st_insert(RHASH(hash)->tbl, key, val);
  }
  else {
    st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
  }
  return val;
}
So when the key is a String and not already included in the hash, then rb_str_new4 is called. (I just love descriptive names ;-) Furthermore string.c revealed some fiddling with the original key.
VALUE
rb_str_new4(orig)
  VALUE orig;
{
  VALUE klass, str;

  if (OBJ_FROZEN(orig)) return orig;
  klass = rb_obj_class(orig);
  if (FL_TEST(orig, ELTS_SHARED) &&
      (str = RSTRING(orig)->aux.shared) &&
      klass == RBASIC(str)->klass) {
    long ofs;
    ofs = RSTRING(str)->len - RSTRING(orig)->len;
    if ((ofs > 0) || (!OBJ_TAINTED(str) && OBJ_TAINTED(orig))) {
      str = str_new3(klass, str);
      RSTRING(str)->ptr += ofs;
      RSTRING(str)->len -= ofs;
    }
  }
  else if (FL_TEST(orig, STR_ASSOC)) {
    str = str_new(klass, RSTRING(orig)->ptr, RSTRING(orig)->len);
  }
  else {
    str = str_new4(klass, orig);
  }
  OBJ_INFECT(str, orig);
  OBJ_FREEZE(str);
  return str;
}
I didn't quite understand what was going on in rb_str_new4(), but it was sufficient to read a few lines: If the original string was frozen, then it was used directly. I verified that.
map = {}
map[word.freeze] = :anything

word_key = map.keys[0]
puts word_key           # => word
puts word_key.data      # => w
Excellent, finally my @data showed up as expected. Fixing the problem added some complexity dealing with frozen values, but it worked.
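The same rule can be observed with plain Strings, without any custom class. The following is only a minimal sketch with made-up variable names; it sticks to plain String on purpose, because the C source quoted above comes from the Ruby version I used, and other Ruby versions may treat String subclasses differently.

```ruby
# Hypothetical names, for illustration only.
plain  = String.new('word')          # an ordinary, unfrozen String
frozen = String.new('other').freeze  # frozen before it is used as a key

map = {}
map[plain]  = 1
map[frozen] = 2

plain_key  = map.key(1)  # the key object the hash actually stores
frozen_key = map.key(2)

puts plain_key.frozen?           # => true
puts plain_key.equal?(plain)     # => false
puts plain.frozen?               # => false
puts frozen_key.equal?(frozen)   # => true
```

The unfrozen String is replaced by a frozen duplicate, while the already frozen one is stored as is.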

Freeze your custom Ruby strings when you use them as keys in a hash (and want to retrieve them with Hash#keys())
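One way to follow this advice without remembering it at every call site is to let the class freeze itself. This is just a sketch under that assumption, not what JavaClass does; SelfFreezingString is a made-up name, and the class only works if the string never needs to change after creation.

```ruby
# Hypothetical variant of RichString that freezes itself on creation.
class SelfFreezingString < String
  attr_reader :data

  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
    freeze               # from now on the hash can store this very object
  end
end

word = SelfFreezingString.new('word')
map = {}
map[word] = :anything

word_key = map.keys[0]
puts word_key.equal?(word)  # => true
puts word_key.data          # => w
```

The price is that any later attempt to modify such a string raises an error, which is the added complexity of dealing with frozen values mentioned above.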

29 August 2010

Productivity Tip: Folder Names

I want to be productive. I like the feeling of GTD. I believe that even small things make a difference in productivity. For example assigning a keyboard shortcut to the calculator application. I don't use the calculator very often, but when I do, I have this warm and cosy feeling that I saved one or two seconds to open it. I'm always assigning keyboard shortcuts. I have been doing it since the early days of Windows 3.1.

One trick I found recently is to name folders so that each begins with a different letter. For example, some time ago my main work folder contained the subfolders article, code, community, posts, presentation and resource. To speed up folder switching I renamed them to article, blog, community, develop, presentation and resource. Now all folders start with a different letter, so each one is uniquely accessible by pressing a single key in Explorer or any other navigator. The same is true for drives.
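Checking a list of names for clashing first letters is easy to script. The following is only a small sketch; first_letter_clashes is a made-up helper, and the names are the ones from the example above.

```ruby
# Hypothetical helper: groups names by their first letter and keeps
# only the letters that are shared by more than one name.
def first_letter_clashes(names)
  names.group_by { |name| name[0, 1].downcase }
       .select { |_letter, group| group.size > 1 }
end

before = %w[article code community posts presentation resource]
after  = %w[article blog community develop presentation resource]

p first_letter_clashes(before).keys  # => ["c", "p"]
p first_letter_clashes(after).keys   # => []
```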

Finding names can be difficult. They should describe the files inside them. If I don't find a proper synonym or word in a different language, I don't change it.