mindstalk: (thoughtful)
2016-05-20 12:08 pm

sorting invariants

Of the standard fast sorting algorithms, heap and quick are in-place but unstable, while merge is stable (or can be) but takes O(n) extra space. You can decorate the input with its original sequencing in order to break ties and make a stable sort, but now that takes O(n) extra space. Insertion sort is stable and in-place, but takes O(n^2) time; if you insert into a tree instead it becomes faster, but takes space again (as you're not just copying the data, but creating two pointers per item, too).
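A minimal sketch of that decorate-to-stabilize trick (purely illustrative -- Python's built-in sort is already stable -- but it shows where the O(n) extra space goes):

    # Tag each item with its original position so ties sort in input order;
    # the tags are exactly the O(n) extra space mentioned above.
    def stabilized_sort(items, key=lambda x: x):
        decorated = [(key(x), i, x) for i, x in enumerate(items)]
        decorated.sort()                  # ties on key fall back to the index i
        return [x for _, _, x in decorated]

    print(stabilized_sort(['bb', 'a', 'cc', 'dd'], key=len))
    # ['a', 'bb', 'cc', 'dd'] -- the equal-length strings keep their input order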

If I understand radix sort right, it has worst-case O(n*log(n)) performance (if there are n distinct entries, they take log(n) bits to represent, and radix is O(n*numbits)). The LSD version is stable, but I think it takes extra space; MSD can be made stable, but takes extra space for that.
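A hedged sketch of the LSD idea for non-negative integers (the bucket scheme and base here are just my illustration):

    # One stable bucket pass per digit, least significant first.
    # The per-pass buckets are the extra space mentioned above.
    def lsd_radix_sort(nums, base=256):
        if not nums:
            return nums
        biggest = max(nums)
        place = 1
        while place <= biggest:
            buckets = [[] for _ in range(base)]
            for x in nums:                      # stable: input order kept within a bucket
                buckets[(x // place) % base].append(x)
            nums = [x for b in buckets for x in b]
            place *= base
        return nums

    print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]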

So, it seems like a pattern! Except Wikipedia hints at in-place stable mergesorts in O(n*log n * log n) -- okay, that still takes extra time -- or in just O(n log n) but with high constant factors. And there's blocksort, allegedly an in-place, stable, worst-case O(n log n), best-case O(n) algorithm. Which sounds like the Ultimate Sorting Algorithm, doing everything you'd want. Why haven't I heard more about it? Apart from its seeming pretty complicated.
mindstalk: (robot)
2016-03-11 12:04 pm

So high, so low, so many things to know

Still job hunting.
Realized that Scala and Clojure are functional languages that at least some people pay people to use, so I should go learn them! (I have a functional bias.)

As my algorithms studies continue, it's scary to look back and realize how much CS is out there that I didn't even know I didn't know, back when I was working. Both the stuff I learned in grad school (computer theory, OS concepts, graphics, programming language implementation) and the stuff I've learned since (graph algorithms, non-trivial dynamic programming, quickselect, heaps/priority queues, AVL trees...) What did I actually get hired on? Structured programming, lists, trees, recursion, Big-O analysis, hashes (thanks to Perl on a previous job, not any class I took.) Well, you really can do a lot with that. But man.
mindstalk: (science)
2016-02-11 12:34 pm

MLP:RiM Episode 3, and Python timing

One drawback to automatic memoization is that it will cheerfully memoize *all* the parameters, even ones that don't matter, like a counter you were threading through in order to have debugging statements printed with the right number of spaces to indicate nested calls. After getting rid of that, my Python find-edit-path-with-lists code runs much more reasonably... though still by far the slowest of my (memoized) versions, and increasingly so with N. But it's 6x slower at N=800, rather than 100x slower at N=100.
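A toy illustration of the trap, on Fibonacci rather than my edit code (hypothetical names; the point is just that lru_cache keys on every argument):

    from functools import lru_cache

    # Bad-ish: 'depth' exists only to indent the debug output, but it's part
    # of the cache key, so the same n at different depths is computed afresh
    # and much of the sharing that memoization should buy is lost.
    @lru_cache(maxsize=None)
    def fib_traced(n, depth=0):
        print(' ' * depth, 'fib', n)
        if n < 2:
            return n
        return fib_traced(n - 1, depth + 1) + fib_traced(n - 2, depth + 1)

    # Better: memoize a core that takes only the arguments that matter,
    # and do any tracing in an un-memoized wrapper around it.
    @lru_cache(maxsize=None)
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)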

I'm still in the position that the first thing I did was best, and attempts to optimize the code (in legibility or speed) have made things worse for reasons I don't understand. Well, I can guess that strings are represented more compactly than a list of chars, whereas a C string *is* an array of char.

...confirmed! A simple program with s=' '*10,000,000 takes 17 megabytes. s=[' ']*10,000,000 takes 83.5 megabytes. Wow. (Not that you can use such commas in Python proper, but I use them here for legibility.) A tuple takes as much space as a list.
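A rough way to check the same thing with sys.getsizeof (the numbers in the comments assume a typical 64-bit CPython 3; watching the whole process, as above, also counts the interpreter's own baseline):

    import sys

    s = ' ' * 10**7
    l = [' '] * 10**7
    t = tuple(l)

    print(sys.getsizeof(s))   # roughly 10 MB: one byte per ASCII char plus a small header
    print(sys.getsizeof(l))   # roughly 80 MB: one 8-byte pointer per slot,
                              # all pointing at the same single ' ' object
    print(sys.getsizeof(t))   # about the same as the list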

A timing experiment indicates that incrementing a plain number variable is faster than incrementing a number stored as a list element (n += 1 vs. a[0] += 1), the plain increment taking about 80% of the time of the list one. That might explain my 'concise' code slowdown, which tried using max() over small lists rather than chaining if statements.

Decrementing a counter in a while loop takes twice the time of using range (Python 3) in a for loop. I don't think that matters for me but good to know.
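A sketch of checking both of those timings with the standard timeit module (exact numbers will vary by machine and Python version):

    import timeit

    # incrementing a plain name vs. incrementing element 0 of a list
    print(timeit.timeit('n += 1',    setup='n = 0',   number=10**7))
    print(timeit.timeit('a[0] += 1', setup='a = [0]', number=10**7))

    # counting down in a while loop vs. iterating a range() in a for loop
    print(timeit.timeit('i = 10**6\nwhile i:\n    i -= 1', number=10))
    print(timeit.timeit('for i in range(10**6):\n    pass', number=10))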

So I guess I'm starting to understand: lists are convenient to code but terrible for performance, compared to string and number types. Best to avoid repeated list accesses, and don't use a list in lieu of a handful of variables. Increment a counter then store it in a list when you're done, don't use a list location as a counter.

Of course if you *really* cared about performance you wouldn't use Python, but presumably there's a middle ground of wanting easy coding but decent performance.

Though I have dim memories that Common Lisp implementations are just as safe and approximately as convenient -- for raw code, inferior libraries for modern needs -- while being at least 3x faster than Perl or Python. *Shakes fist*.

(I'm also in the position of having many key insights as I try to go to sleep.)

***

And it mostly all makes sense, with a bit more thought. A string really can be an array of chars (or wchars?) internally, while a list can store anything anywhere, and needs type and pointer information to do that. And list access is bounds-checked, so you've got that check on top of the dereference. I just didn't appreciate the magnitude of the overhead when you've only got a few operations to begin with.

Now I wonder about the actual numbers. 17 MB for 10 million chars suggests about two bytes per character. I thought a basic ASCII char would take 1 byte, but I haven't studied Unicode formats. 83.5 MB is almost 5x as much, and a pointer in addition to a char would work, but that seems too simple.
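My hedged guess for the list figure, assuming a 64-bit build: each list slot is an 8-byte pointer, and every slot can point at the same single one-character string object, so

    10,000,000 slots * 8 bytes/pointer = 80 MB, plus headers

which is most of the 83.5 MB, with no per-item character storage needed at all.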
mindstalk: (robot)
2016-02-10 07:59 am

MLP:RiM Episode 2

Protip: when modifying values returned by a memoized function, make sure you're appending copies of the table/cache values, and not references or pointers into the table. I managed to get burned by this in two different languages. Notably, having your core function return a copy of what it's been working on won't help, if its memoized cousin is returning references that you call ++ or .append() on.
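A toy version of the burn, with a hypothetical path_to function standing in for the real code:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def path_to(n):
        # returns the list [0, 1, ..., n]; the '+' builds a new list each call
        if n == 0:
            return [0]
        return path_to(n - 1) + [n]

    p = path_to(3)
    p.append(99)                # mutates the object living *inside* the cache
    print(path_to(3))           # [0, 1, 2, 3, 99] -- the cache is now corrupted

    q = list(path_to(2))        # copy at the call site (or return tuples) instead
    q.append(99)                # the cached [0, 1, 2] is untouched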

Relatedly, the list version of my Python edit-finding program works now. It's 100x slower than the string version, though. This despite the fact that the string version is actually repeatedly copy-modifying strings it might not need to, while the list version defers any copying until it's made its final choice. I'm guessing slicing lists sucks. It was also using tons of memory earlier than the other versions.

I'd wondered if the Python optimization problems in the previous RiM post were actually algorithm problems. Happily I had a C version right there to check. Or after a bit of work, multiple C versions.

Without memoization, C starts off at 3^n, as analysis expects, then seems to slide to the 5^n of the unmemoized Python version. With memo, simple distance finding goes up 3x for every 2x in input, which is better than the 4x expected... up to a point, when it slides to 5x. I suspect CPU cache overflow, but that's a cargo-cult answer; I can't visualize what's going on. Python distance finding is 14x slower than C for N=200, 150x slower for N=1600.

Getting the C version to generate paths led to some exciting hours of debugging. This problem has weird boundary conditions (that's not just me being lazy: Skiena's version assumes you've padded your input strings with a beginning space) so there's a lot of -1 in my code. Which was fine for simple distance finding, where I could just return a number when the input was -1, but when I ended up trying to use -1 as an index into a C array... not so good. Sadly, it *does* work often enough to be hard to pin down.

And then: I actually had been copying my return values (see the beginning), but I changed that during my flailing attempts to find the mysterious segfaults, which, combined with what had been concise and elegant code, led to my incrementing the table values themselves.

But all that was eventually fixed, and it scales roughly quadratically. The last doubling takes 6.5x longer than the previous one, which is still better than O(n^3). It also takes 3.6 GB of RAM, which is why it's the last step; *memory* use certainly is cubic in this implementation, with N^2 entries of char[N] arrays. At modest N it's 5x slower than not finding strings, and 5x faster than the fastest Python version that does.

Of course, I've just been trusting that lru_cache is actually smart. Might be worth trying a manually memoized Python version, using a list of lists. A *pre-allocated* list of lists, probably.
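To sketch what I mean (a hand-rolled memo table, not my actual code, and with Skiena's padding conventions glossed over):

    # Manual memoization: a pre-allocated (len(a)+1) x (len(b)+1) table,
    # filled on demand, with no argument hashing involved.
    def edit_distance(a, b):
        memo = [[None] * (len(b) + 1) for _ in range(len(a) + 1)]

        def d(i, j):
            # edit distance between a[:i] and b[:j]
            if memo[i][j] is not None:
                return memo[i][j]
            if i == 0:
                result = j                      # insert the first j chars of b
            elif j == 0:
                result = i                      # delete the first i chars of a
            else:
                result = min(d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),   # match/substitute
                             d(i - 1, j) + 1,                            # delete
                             d(i, j - 1) + 1)                            # insert
            memo[i][j] = result
            return result

        return d(len(a), len(b))

    print(edit_distance('kitten', 'sitting'))   # 3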

So:
* Memoization is definitely an easy way of turning a slow algorithm into a fast one
* ...not so easy in pure C, though my final code is still pretty simple (but not re-entrant, if I tried making a library out of it.)
* It's definitely not a sure path to being optimal, or even a constant factor of optimal
* ...especially if using a language that makes it easy, at the cost of who knows what mechanisms chewing up time or memory.

I still think it'd be useful at least as a stepping stone to testing more optimal versions, being able to run on much bigger input than naive code while still being high-probability correct. Though in this problem, there can be multiple optimal-length edits, so the path returned by two correct programs needn't be the same. Ah well.
mindstalk: (escher)
2016-02-06 02:18 pm

My Little Program: Recursion Is Magic

One of the DP problems I alluded to in my last post is minimum edit distance: how many one-letter changes does it take to turn text string A into string B? Skiena gives a simple recursive solution, which is described incorrectly in the text but correctly as code. I'd coded my own version in C, and it worked, slowly. Then I memoized it by hand and it worked, much faster. More recently, as I practice Python by translating my C and Perl programs into it, I wrote a naive Python version, which worked, slowly. Then I memoized it with the lru_cache decorator, and it worked much faster. All is well.

But often you'd want to know an actual path of edits to make, not just how long the distance is. Skiena gives a full DP solution, filling one table of 'cached' lengths and another one of 'last moves', which you can then use to reconstruct the path. Skiena himself says it can be a bit tricky to get the boundaries and such right. I decided to try to get paths myself, but with the recursive version.

At first I was thinking in terms of last-move tables too, and worrying that I might need the memoization table hidden by the lru_cache. But then I thought, "no, this is all wrong, I should be thinking about a pure recursive solution first." And in a flash it came to me: the best path is the previous best path, plus whatever the current move is. Plus a base case, of course. That's it! Like all good recursive solutions it feels magical or cheating, like you're punting the work somewhere else indefinitely, but it does work, with pretty transparent code. Yay! Magic!
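In code terms, a hedged sketch (using the lru_cache memoization described above, with single-letter Match/Sub/Del/Ins codes that are my own shorthand, not Skiena's):

    from functools import lru_cache

    A, B = 'kitten', 'sitting'

    @lru_cache(maxsize=None)
    def best(i, j):
        # returns (cost, moves) for turning A[:i+1] into B[:j+1]
        if i < 0:
            return (j + 1, 'I' * (j + 1))       # base case: insert the rest of B
        if j < 0:
            return (i + 1, 'D' * (i + 1))       # base case: delete the rest of A
        sub_cost, sub_path = best(i - 1, j - 1)
        del_cost, del_path = best(i - 1, j)
        ins_cost, ins_path = best(i, j - 1)
        # the best path is the previous best path plus the current move:
        return min((sub_cost + (A[i] != B[j]), sub_path + ('M' if A[i] == B[j] else 'S')),
                   (del_cost + 1, del_path + 'D'),
                   (ins_cost + 1, ins_path + 'I'))

    print(best(len(A) - 1, len(B) - 1))         # cost 3, plus one optimal move sequence

Note that each step builds its path by string concatenation, i.e. copying -- which turns out to matter below.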

But magic has its price. I tried timing the code as inputs scaled, especially making two strings of length N and doubling N. "Oh no, it's exponential! Nx2, time x8! No wait, that's cubic, whew." But the algorithm is supposed to be quadratic, O(N^2) not O(N^3).

Well, I can guess why. Recursive programming makes clear and correct code easy, at the cost of often slinging copies of lists or strings around. If there are N^2 steps but each one is taking N steps of its own to copy or hash a string, that's cubic. And it looked like my code was copying path strings a lot; also, memoization works by hashing arguments, I think.

So when I was lying awake in bed thinking on the problem, I imagined this post was going to be a happy story of how I made a couple of tweaks and got my running time down. Sadly, that has not been the case. The string arguments never change, so I lifted them out of the function that gets memoized, but that didn't make much difference. Maybe it was already looking at their addresses and not their contents? I also tried building my paths up in lists rather than strings, but so far that's not correct. I suspect some sort of reference collision but haven't solved it yet, even by returning whole-list slices. And as a side project, I tried making a more concise version of the string-path code, but it took twice as long. I've clawed some of that back but it still takes 30-40% more time than the original.

I was surprised to find that avoiding the min() function, even on an array of 3 items, in favor of chained if..then, made a big difference. I'm guessing any function call is expensive.
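The sort of comparison I mean, in timeit form (the chained-comparison expression here is just an illustration, not my actual code):

    import timeit

    setup = 'a, b, c = 3, 1, 2'
    print(timeit.timeit('min(a, b, c)', setup=setup))
    print(timeit.timeit('a if a <= b and a <= c else (b if b <= c else c)', setup=setup))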

I'm having bad flashbacks to my one 'big' Haskell class project (in an algorithms class, even), which was clearly non-optimal but also really hard to improve from its original form.

Well, today's frustration about code optimization aside, I can still be happy that the recursive path-finding magic works. The next related project is whether I can find similarly simple solutions to the related problems Skiena adapted his DP solution to solve, like approximate substring matching.

Also good to know: for deep recursion, you can, and in fact have to, raise Python's stack size and recursion limit in your code; there's no apparent need to mess with ulimit from the shell. Right now I'm setting the recursion limit as a function of N...
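The knobs in question, sketched (the 3*N fudge factor is just my illustration):

    import sys
    import threading

    N = 10000
    sys.setrecursionlimit(3 * N + 100)          # interpreter-level limit, scaled with input

    # for really deep recursion the OS thread stack can overflow before that limit
    # is hit, so one workaround is to run the work in a thread with a bigger stack
    threading.stack_size(64 * 1024 * 1024)      # must be set before the thread is started
    worker = threading.Thread(target=lambda: None)   # the real recursive call goes here
    worker.start()
    worker.join()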
mindstalk: (thoughtful)
2016-02-05 03:00 pm

learning stuff and grousing about it

So, I'm finally looking for software jobs again. As part of this I've been studying Skiena's algorithms book, figuring fundamentals were more important than details of some language. OTOH, between intensive study, the stress of a phone coding interview, and lack of sleep, the last few days I've decided to do something that doesn't take thought, like learn Python.
