| 1 | # Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $
 | 
|---|
| 2 | 
 | 
|---|
| 3 | =head1 TO DO
 | 
|---|
| 4 | 
 | 
|---|
| 5 | =over 4
 | 
|---|
| 6 | 
 | 
|---|
| 7 | =item * 
 | 
|---|
| 8 | 
 | 
|---|
| 9 | LIST_CACHE doesn't work with ties to most DBM implementations, because
 | 
|---|
| 10 | Memouze tries to save a listref, and DB_File etc. can only store
 | 
|---|
| 11 | strings.  This should at least be documented.  Maybe Memoize could
 | 
|---|
| 12 | detect the problem at TIE time and throw a fatal error.
 | 
|---|
| 13 | 
 | 
|---|
| 14 | 20010623 This was added sometime prior to 20001025.
 | 
|---|
| 15 | 
 | 
|---|
| 16 | Try out MLDBM here and document it if it works.
 | 
|---|
| 17 | 
 | 
|---|
| 18 | =item *
 | 
|---|
| 19 | 
 | 
|---|
| 20 | We should extend the benchmarking module to allow
 | 
|---|
| 21 | 
 | 
|---|
| 22 |         timethis(main, { MEMOIZED => [ suba, subb ] })
 | 
|---|
| 23 | 
 | 
|---|
| 24 | What would this do?  It would time C<main> three times, once with
 | 
|---|
| 25 | C<suba> and C<subb> unmemoized, twice with them memoized.
 | 
|---|
| 26 | 
 | 
|---|
| 27 | Why would you want to do this?  By the third set of runs, the memo
 | 
|---|
| 28 | tables would be fully populated, so all calls by C<main> to C<suba>
 | 
|---|
| 29 | and C<subb> would return immediately.  You would be able to see how
 | 
|---|
| 30 | much of C<main>'s running time was due to time spent computing in
 | 
|---|
| 31 | C<suba> and C<subb>.  If that was just a little time, you would know
 | 
|---|
| 32 | that optimizing or improving C<suba> and C<subb> would not have a
 | 
|---|
| 33 | large effect on the performance of C<main>.  But if there was a big
 | 
|---|
| 34 | difference, you would know that C<suba> or C<subb> was a good
 | 
|---|
| 35 | candidate for optimization if you needed to make C<main> go faster.
 | 
|---|
| 36 | 
 | 
|---|
| 37 | Done.
 | 
|---|
| 38 | 
 | 
|---|
| 39 | =item * 
 | 
|---|
| 40 | 
 | 
|---|
| 41 | Perhaps C<memoize> should return a reference to the original function
 | 
|---|
| 42 | as well as one to the memoized version?  But the programmer could
 | 
|---|
| 43 | always construct such a reference themselves, so perhaps it's not
 | 
|---|
| 44 | necessary.  We save such a reference anyway, so a new package method
 | 
|---|
| 45 | could return it on demand even if it wasn't provided by C<memoize>.
 | 
|---|
| 46 | We could even bless the new function reference so that it could have
 | 
|---|
| 47 | accessor methods for getting to the original function, the options,
 | 
|---|
| 48 | the memo table, etc.
 | 
|---|
| 49 | 
 | 
|---|
| 50 | Naah.
 | 
|---|
| 51 | 
 | 
|---|
| 52 | =item *
 | 
|---|
| 53 | 
 | 
|---|
| 54 | The TODISK feature is not ready yet.  It will have to be rather
 | 
|---|
| 55 | complicated, providing options for which disk method to use (GDBM?
 | 
|---|
| 56 | DB_File?  Flat file?  Storable?  User-supplied?) and which stringizing
 | 
|---|
| 57 | method to use (FreezeThaw?  Marshal?  User-supplied?)
 | 
|---|
| 58 | 
 | 
|---|
| 59 | Done!
 | 
|---|
| 60 | 
 | 
|---|
| 61 | =item *
 | 
|---|
| 62 | 
 | 
|---|
| 63 | Maybe an option for automatic expiration of cache values?  (`After one
 | 
|---|
| 64 | day,' `After five uses,' etc.)  Also possibly an option to limit the
 | 
|---|
| 65 | number of active entries with automatic LRU expiration.
 | 
|---|
| 66 | 
 | 
|---|
| 67 | You have a long note to Mike Cariaso that outlines a good approach
 | 
|---|
| 68 | that you sent on 9 April 1999.
 | 
|---|
| 69 | 
 | 
|---|
| 70 | What's the timeout stuff going to look like?
 | 
|---|
| 71 | 
 | 
|---|
| 72 |         EXPIRE_TIME => time_in_sec
 | 
|---|
| 73 |         EXPIRE_USES => num_uses
 | 
|---|
| 74 |         MAXENTRIES => n
 | 
|---|
| 75 | 
 | 
|---|
| 76 | perhaps?  Is EXPIRE_USES actually useful?
 | 
|---|
| 77 | 
 | 
|---|
| 78 | 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
 | 
|---|
| 79 | MAXENTRIES can come later as a separate module.
 | 
|---|
| 80 | 
 | 
|---|
| 81 | =item *
 | 
|---|
| 82 | 
 | 
|---|
| 83 | Put in a better example than C<fibo>.  Show an example of a
 | 
|---|
| 84 | nonrecursive function that simply takes a long time to run.
 | 
|---|
| 85 | C<getpwuid> for example?  But this exposes the bug that you can't say
 | 
|---|
| 86 | C<memoize('getpwuid')>, so perhaps it's not a very good example.
 | 
|---|
| 87 | 
 | 
|---|
| 88 | Well, I did add the ColorToRGB example, but it's still not so good.
 | 
|---|
| 89 | These examples need a lot of work.  C<factorial> might be a better
 | 
|---|
| 90 | example than C<fibo>.  
 | 
|---|
| 91 | 
 | 
|---|
| 92 | =item *
 | 
|---|
| 93 | 
 | 
|---|
| 94 | Add more regression tests for normalizers.
 | 
|---|
| 95 | 
 | 
|---|
| 96 | =item *
 | 
|---|
| 97 | 
 | 
|---|
| 98 | Maybe resolve normalizer function to code-ref at memoize time instead
 | 
|---|
| 99 | of at function call time for efficiency?  I think there was some
 | 
|---|
| 100 | reason not to do this, but I can't remember what it was.
 | 
|---|
| 101 | 
 | 
|---|
| 102 | =item *
 | 
|---|
| 103 | 
 | 
|---|
| 104 | Add more array value tests to the test suite.
 | 
|---|
| 105 | 
 | 
|---|
| 106 | Does it need more now?
 | 
|---|
| 107 | 
 | 
|---|
| 108 | =item *
 | 
|---|
| 109 | 
 | 
|---|
| 110 | Fix that `Subroutine u redefined ... line 484' message.
 | 
|---|
| 111 | 
 | 
|---|
| 112 | Fixed, I think.
 | 
|---|
| 113 | 
 | 
|---|
| 114 | =item *
 | 
|---|
| 115 | 
 | 
|---|
| 116 | Get rid of any remaining *{$ref}{CODE} or similar magic hashes. 
 | 
|---|
| 117 | 
 | 
|---|
| 118 | =item *
 | 
|---|
| 119 | 
 | 
|---|
| 120 | There should be an option to dump out the memoized values or to
 | 
|---|
| 121 | otherwise traverse them.
 | 
|---|
| 122 | 
 | 
|---|
| 123 | What for?
 | 
|---|
| 124 | 
 | 
|---|
| 125 | Maybe the tied hash interface taskes care of this anyway?
 | 
|---|
| 126 | 
 | 
|---|
| 127 | =item *
 | 
|---|
| 128 | 
 | 
|---|
| 129 | Include an example that caches DNS lookups.
 | 
|---|
| 130 | 
 | 
|---|
| 131 | =item *
 | 
|---|
| 132 | 
 | 
|---|
| 133 | Make tie for Storable (Memoize::Storable)
 | 
|---|
| 134 | 
 | 
|---|
| 135 | A prototype of Memoize::Storable is finished.  Test it and add to the
 | 
|---|
| 136 | test suite.
 | 
|---|
| 137 | 
 | 
|---|
| 138 | Done.
 | 
|---|
| 139 | 
 | 
|---|
| 140 | =item *
 | 
|---|
| 141 | 
 | 
|---|
| 142 | Make tie for DBI  (Memoize::DBI)
 | 
|---|
| 143 | 
 | 
|---|
| 144 | =item *
 | 
|---|
| 145 | 
 | 
|---|
| 146 | I think there's a bug.  See `###BUG'.
 | 
|---|
| 147 | 
 | 
|---|
| 148 | =item * 
 | 
|---|
| 149 | 
 | 
|---|
| 150 | Storable probably can't be done, because it doesn't allow updating.
 | 
|---|
| 151 | Maybe a different interface that supports readonly caches fronted by a
 | 
|---|
| 152 | writable in-memory cache?  A generic tied hash maybe?
 | 
|---|
| 153 | 
 | 
|---|
| 154 |         FETCH {
 | 
|---|
| 155 |           if (it's in the memory hash) {
 | 
|---|
| 156 |             return it
 | 
|---|
| 157 |           } elsif (it's in the readonly disk hash) {
 | 
|---|
| 158 |             return it
 | 
|---|
| 159 |           } else { 
 | 
|---|
| 160 |             not-there
 | 
|---|
| 161 |           }
 | 
|---|
| 162 |         }
 | 
|---|
| 163 | 
 | 
|---|
| 164 |         STORE {
 | 
|---|
| 165 |           put it into the in-memory hash
 | 
|---|
| 166 |         }
 | 
|---|
| 167 | 
 | 
|---|
| 168 | Maybe `save' and `restore' methods?
 | 
|---|
| 169 | 
 | 
|---|
| 170 | It isn't working right because the destructor doesn't get called at
 | 
|---|
| 171 | the right time.
 | 
|---|
| 172 | 
 | 
|---|
| 173 | This is fixed.  `use strict vars' would have caught it immediately.  Duh.
 | 
|---|
| 174 | 
 | 
|---|
| 175 | =item *
 | 
|---|
| 176 | 
 | 
|---|
| 177 | Don't forget about generic interface to Storable-like packages
 | 
|---|
| 178 | 
 | 
|---|
| 179 | 20010627 It would appear that you put this into 0.51.
 | 
|---|
| 180 | 
 | 
|---|
| 181 | =item * 
 | 
|---|
| 182 | 
 | 
|---|
| 183 | Maybe add in TODISK after all, with TODISK => 'filename' equivalent to
 | 
|---|
| 184 | 
 | 
|---|
| 185 |         SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
 | 
|---|
| 186 |         LIST_CACHE => MERGE
 | 
|---|
| 187 | 
 | 
|---|
| 188 | =item *
 | 
|---|
| 189 | 
 | 
|---|
| 190 | Maybe the default for LIST_CACHE should be MERGE anyway.
 | 
|---|
| 191 | 
 | 
|---|
| 192 | =item *
 | 
|---|
| 193 | 
 | 
|---|
| 194 | There's some terrible bug probably related to use under threaded perl,
 | 
|---|
| 195 | possibly connected with line 56:
 | 
|---|
| 196 | 
 | 
|---|
| 197 |     my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";
 | 
|---|
| 198 | 
 | 
|---|
| 199 | I think becayse C<@_> is lexically scoped in threadperl, the effect of
 | 
|---|
| 200 | C<unshift> never makes it into C<_memoizer>.  That's probably a bug in
 | 
|---|
| 201 | Perl, but maybe I should work around it.   Can anyone provide more
 | 
|---|
| 202 | information here, or lend me a machine with threaded Perl where I can
 | 
|---|
| 203 | test this theory?  Line 59, currently commented out, may fix the
 | 
|---|
| 204 | problem.  
 | 
|---|
| 205 | 
 | 
|---|
| 206 | 20010623 Working around this in 0.65, but it still blows.
 | 
|---|
| 207 | 
 | 
|---|
| 208 | =item * 
 | 
|---|
| 209 | 
 | 
|---|
| 210 | Maybe if the original function has a prototype, the module can use
 | 
|---|
| 211 | that to select the most appropriate default normalizer.  For example,
 | 
|---|
| 212 | if the prototype was C<($)>, there's no reason to use `join'.  If it's
 | 
|---|
| 213 | C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.
 | 
|---|
| 214 | 
 | 
|---|
| 215 | =item *
 | 
|---|
| 216 | 
 | 
|---|
| 217 | Ariel Scolnikov suggests using the change counting problem as an
 | 
|---|
| 218 | example.  (How many ways to make change of a dollar?)
 | 
|---|
| 219 | 
 | 
|---|
| 220 | =item * 
 | 
|---|
| 221 | 
 | 
|---|
| 222 | Jonathan Roy found a use for `unmemoize'.  If you're using the
 | 
|---|
| 223 | Storable glue, and your program gets SIGINT, you find that the cache
 | 
|---|
| 224 | data is not in the cache, because Perl normally writes it all out at
 | 
|---|
| 225 | once from a DESTROY method, and signals skip DESTROY processing.  So
 | 
|---|
| 226 | you could add
 | 
|---|
| 227 | 
 | 
|---|
| 228 |         $sig{INT} = sub { unmemoize ... };
 | 
|---|
| 229 | 
 | 
|---|
| 230 | 
 | 
|---|
| 231 | =item *
 | 
|---|
| 232 | 
 | 
|---|
| 233 | This means it would be useful to have a method to return references to
 | 
|---|
| 234 | all the currently-memoized functions so that you could say
 | 
|---|
| 235 | 
 | 
|---|
| 236 |         $sig{INT} = sub { for $f (Memoize->all_memoized) {
 | 
|---|
| 237 |                             unmemoize $f;
 | 
|---|
| 238 |                           }
 | 
|---|
| 239 |                         }
 | 
|---|
| 240 | 
 | 
|---|
| 241 | 
 | 
|---|
| 242 | =item *
 | 
|---|
| 243 | 
 | 
|---|
| 244 | 19990917 There should be a call you can make to get back the cache
 | 
|---|
| 245 | itself.  If there were, then you could delete stuff from it to
 | 
|---|
| 246 | manually expire data items.
 | 
|---|
| 247 | 
 | 
|---|
| 248 | =item *
 | 
|---|
| 249 | 
 | 
|---|
| 250 | 19990925 Randal says that the docs for Memoize;:Expire should make it
 | 
|---|
| 251 | clear that the expired entries are never flushed all at once.  He
 | 
|---|
| 252 | asked if you would need to do that manually.  I said:
 | 
|---|
| 253 | 
 | 
|---|
| 254 |   Right, if that's what you want.  If you have EXISTS return false,
 | 
|---|
| 255 |   it'll throw away the old cached item and replace it in the cache
 | 
|---|
| 256 |   with a new item.  But if you want the cache to actually get smaller,
 | 
|---|
| 257 |   you have to do that yourself.
 | 
|---|
| 258 | 
 | 
|---|
| 259 |   I was planning to build an Expire module that implemented an LRU
 | 
|---|
| 260 |   queue and kept the cache at a constant fixed size, but I didn't get
 | 
|---|
| 261 |   to it yet.  It's not clear to me that the automatic exptynig-out
 | 
|---|
| 262 |   behavior is very useful anyway.  The whole point of a cache is to
 | 
|---|
| 263 |   trade space for time, so why bother going through the cache to throw
 | 
|---|
| 264 |   away old items before you need to?
 | 
|---|
| 265 | 
 | 
|---|
| 266 | Randal then pointed out that it could discard expired items at DESTRoY
 | 
|---|
| 267 | or TIEHASH time, which seemed like a good idea, because if the cache
 | 
|---|
| 268 | is on disk you might like to keep it as small as possible.
 | 
|---|
| 269 | 
 | 
|---|
| 270 | =item *
 | 
|---|
| 271 | 
 | 
|---|
| 272 | 19991219 Philip Gwyn suggests this technique:  You have a load_file
 | 
|---|
| 273 | function that memoizes the file contexts.  But then if the file
 | 
|---|
| 274 | changes you get the old contents.  So add a normalizer that does
 | 
|---|
| 275 | 
 | 
|---|
| 276 |         return join $;, (stat($_[0])[9]), $_[0];
 | 
|---|
| 277 | 
 | 
|---|
| 278 | Now when the modification date changes, the true key returned by the
 | 
|---|
| 279 | normalizer is different, so you get a cache miss and it loads the new
 | 
|---|
| 280 | contents.   Disadvantage:  The old contents are still in the cache.  I
 | 
|---|
| 281 | think it makes more sense to have a special expiration manager for
 | 
|---|
| 282 | this.  Make one up and bundle it.
 | 
|---|
| 283 | 
 | 
|---|
| 284 | 19991220 I have one written: Memoize::ExpireFile.  But how can you
 | 
|---|
| 285 | make this work when the function might have several arguments, of
 | 
|---|
| 286 | which some are filenames and some aren't?
 | 
|---|
| 287 | 
 | 
|---|
| 288 | =item *
 | 
|---|
| 289 | 
 | 
|---|
| 290 | 19991219 There should be an inheritable TIEHASH method that does the
 | 
|---|
| 291 | argument processing properly.
 | 
|---|
| 292 | 
 | 
|---|
| 293 | 19991220 Philip Gwyn contributed a patch for this.
 | 
|---|
| 294 | 
 | 
|---|
| 295 | 20001231 You should really put this in.  Jonathan Roy uncovered a
 | 
|---|
| 296 | problem that it will be needed to solve.  Here's the problem:  He has:
 | 
|---|
| 297 | 
 | 
|---|
| 298 |         memoize "get_items",
 | 
|---|
| 299 |         LIST_CACHE => ["TIE", "Memoize::Expire",
 | 
|---|
| 300 |                 LIFETIME => 86400,
 | 
|---|
| 301 |                 TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
 | 
|---|
| 302 |         ];
 | 
|---|
| 303 | 
 | 
|---|
| 304 | This won't work, because memoize is trying to store listrefs in a
 | 
|---|
| 305 | DB_File.    He owuld have gotten a fatal error if he had done this:
 | 
|---|
| 306 | 
 | 
|---|
| 307 |         memoize "get_items",
 | 
|---|
| 308 |           LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]'
 | 
|---|
| 309 | 
 | 
|---|
| 310 | 
 | 
|---|
| 311 | But in this case, he tied the cache to Memoize::Expire, which is *not*
 | 
|---|
| 312 | scalar-only, and the check for scalar-only ties is missing from
 | 
|---|
| 313 | Memoize::Expire.  The inheritable method can take care of this.
 | 
|---|
| 314 | 
 | 
|---|
| 315 | 20010623 I decided not to put it in.  Instead, we avoid the problem by
 | 
|---|
| 316 | getting rid of TIE.  The HASH option does the same thing, and HASH is
 | 
|---|
| 317 | so simple to support that a module is superfluous.
 | 
|---|
| 318 | 
 | 
|---|
| 319 | =item *
 | 
|---|
| 320 | 
 | 
|---|
| 321 | 20001130 Custom cache manager that checks to make sure the function
 | 
|---|
| 322 | return values actually match the memoized values.
 | 
|---|
| 323 | 
 | 
|---|
| 324 | =item *
 | 
|---|
| 325 | 
 | 
|---|
| 326 | 20001231 Expiration manager that watches cache performance and
 | 
|---|
| 327 | accumulates statistics.  Variation:  Have it automatically unmemoize
 | 
|---|
| 328 | the function if performance is bad.
 | 
|---|
| 329 | 
 | 
|---|
| 330 | =item *
 | 
|---|
| 331 | 
 | 
|---|
| 332 | 20010517 Option to have normalizer *modify* @_ for use by memoized
 | 
|---|
| 333 | function.  This would save code and time in cases like the one in the
 | 
|---|
| 334 | manual under 'NORMALIZER', where both f() and normalize_f() do the
 | 
|---|
| 335 | same analysis and make the same adjustments to the hash.  If the
 | 
|---|
| 336 | normalizer could make the adjustments and save the changes in @_, you
 | 
|---|
| 337 | wouldn't have to do it twice. 
 | 
|---|
| 338 | 
 | 
|---|
| 339 | =item *
 | 
|---|
| 340 | 20010623 Add CLEAR methods to tied hash modules.
 | 
|---|
| 341 | 
 | 
|---|
| 342 | =item *
 | 
|---|
| 343 | 20010623 You get a warning if you try to use DB_File as LIST_CACHE,
 | 
|---|
| 344 | because it won't store lists.  But if you use it as the underlying
 | 
|---|
| 345 | cache with an expiration manager in the middle, no warning---the
 | 
|---|
| 346 | expiration manager doesn't know it's managing a list cache, and
 | 
|---|
| 347 | memoize doesn't know that DB_File is underlying.  Is this fixable?
 | 
|---|
| 348 | Probably not, but think about it.
 | 
|---|
| 349 | 
 | 
|---|
| 350 | =item *
 | 
|---|
| 351 | There was probably some other stuff that I forgot.
 | 
|---|
| 352 | 
 | 
|---|
| 353 | 
 | 
|---|
| 354 | 
 | 
|---|
| 355 | =back
 | 
|---|