mindstalk: (science)
One drawback to automatic memoization is that it will cheerfully memoize *all* the parameters, even ones that don't matter, like a counter you were threading through in order to have debugging statements printed with the right number of spaces to indicate nested calls. After getting rid of that, my Python find-edit-path-with-lists code runs much more reasonably... though still by far the slowest of my (memoized) versions, and increasingly so with N. But it's 6x slower at N=800, rather than 100x slower at N=100.

I'm still in the position that the first thing I did was best, and attempts to optimize the code (in legibility or speed) have made things worse for reasons I don't understand. Well, I can guess that strings are represented more compactly than a list of chars, whereas a C string *is* an array of char.

...confirmed! A simple program with s=' '*10,000,000 takes 17 megabytes. s=[' ']*10,000,000 takes 83.5 megabytes. Wow. (Not that you can use such commas in Python proper, but I use them here for legibility.) A tuple takes as much space as a list.

Timing experiment indicates that incrementing a number variable is faster than incrementing a number as a list component (s+=1 vs. a[0]+=1), taking about 80% of the time. That might explain my 'concise' code slowdown, which tried using max() over small lists rather than chaining if statements.

Decrementing a counter in a while loop takes twice the time of using range (Python 3) in a for loop. I don't think that matters for me but good to know.

So I guess I'm starting to understand: lists are convenient to code but terrible for performance, compared to string and number types. Best to avoid repeated list accesses, and don't use a list in lieu of a handful of variables. Increment a counter then store it in a list when you're done, don't use a list location as a counter.

Of course if you *really* cared about performance you wouldn't use Python, but presumably there's a middle ground of wanting easy coding but decent performance.

Though I have dim memories that Common Lisp implementations are just as safe and approximately as convenient -- for raw code, inferior libraries for modern needs -- while being at least 3x faster than Perl or Python. *Shakes fist*.

(I'm also in the position of having many key insights as I try to go to sleep.)

***

And it mostly all makes sense, with a bit more thought. A string really can be an array of chars (or wchars?) internally, while a list can store anything anywhere, and needs the type and pointer information to do that. And list access is bounds-checked, so you've got that overhead calculation on top of the dereference. I just didn't appreciate the magnitude of the overhead when you've only got a few operations to begin with.

Now I wonder about the actual numbers. 17M for 10m chars suggests two bytes per character. I thought a basic ASCII char would take 1 byte but I haven't studied Unicode formats. 83.5M is almost 5x as much, and a pointer in addition to a char would work, but that seems too simple.
This account has disabled anonymous posting.
(will be screened if not validated)
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

If you are unable to use this captcha for any reason, please contact us by email at support@dreamwidth.org

Profile

mindstalk: (Default)
mindstalk

May 2026

S M T W T F S
     12
3456789
10111213141516
17 18 1920212223
24252627282930
31      

Most Popular Tags

Expand Cut Tags

No cut tags

Style Credit

Page generated 2026-05-19 23:17
Powered by Dreamwidth Studios