mindstalk | MLP:RiM Episode 3, and Python timing (Reply)

One drawback to automatic memoization is that it will cheerfully memoize *all* the parameters, even ones that don't matter, like a counter you were threading through in order to have debugging statements printed with the right number of spaces to indicate nested calls. After getting rid of that, my Python find-edit-path-with-lists code runs much more reasonably... though still by far the slowest of my (memoized) versions, and increasingly so with N. But it's 6x slower at N=800, rather than 100x slower at N=100.

I'm still in the position that the first thing I did was best, and attempts to optimize the code (in legibility or speed) have made things worse for reasons I don't understand. Well, I can guess that strings are represented more compactly than a list of chars, whereas a C string *is* an array of char.

...confirmed! A simple program with s=' '*10,000,000 takes 17 megabytes. s=[' ']*10,000,000 takes 83.5 megabytes. Wow. (Not that you can use such commas in Python proper, but I use them here for legibility.) A tuple takes as much space as a list.

Timing experiment indicates that incrementing a number variable is faster than incrementing a number as a list component (s+=1 vs. a[0]+=1), taking about 80% of the time. That might explain my 'concise' code slowdown, which tried using max() over small lists rather than chaining if statements.

Decrementing a counter in a while loop takes twice the time of using range (Python 3) in a for loop. I don't think that matters for me but good to know.

So I guess I'm starting to understand: lists are convenient to code but terrible for performance, compared to string and number types. Best to avoid repeated list accesses, and don't use a list in lieu of a handful of variables. Increment a counter then store it in a list when you're done, don't use a list location as a counter.

Of course if you *really* cared about performance you wouldn't use Python, but presumably there's a middle ground of wanting easy coding but decent performance.

Though I have dim memories that Common Lisp implementations are just as safe and approximately as convenient -- for raw code, inferior libraries for modern needs -- while being at least 3x faster than Perl or Python. *Shakes fist*.

(I'm also in the position of having many key insights as I try to go to sleep.)

***

And it mostly all makes sense, with a bit more thought. A string really can be an array of chars (or wchars?) internally, while a list can store anything anywhere, and needs the type and pointer information to do that. And list access is bounds-checked, so you've got that overhead calculation on top of the dereference. I just didn't appreciate the magnitude of the overhead when you've only got a few operations to begin with.

Now I wonder about the actual numbers. 17M for 10m chars suggests two bytes per character. I thought a basic ASCII char would take 1 byte but I haven't studied Unicode formats. 83.5M is almost 5x as much, and a pointer in addition to a char would work, but that seems too simple.