mindstalk: (Default)
The work system has two kinds of debug/logging statements.

One is a class with the standard methods (trace through critical) which are just print statements that add "DEBUG" or "INFO" as appropriate. There's not even a hint of output throttling. But it is a class, and so I can rip its guts out and replace them with calls to Python's logging module, and it works.
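Ripping out the guts looks roughly like this (a hypothetical sketch -- the class and method names here are stand-ins for the real ones):

```python
import logging

class Debug(object):
    """Stand-in for the homegrown logger: keep the trace-through-critical
    interface, but delegate the guts to the standard logging module."""
    def __init__(self, name):
        self._log = logging.getLogger(name)

    # logging has no TRACE level, so map trace onto DEBUG
    def trace(self, msg):
        self._log.debug(msg)

    def debug(self, msg):
        self._log.debug(msg)

    def info(self, msg):
        self._log.info(msg)

    def warning(self, msg):
        self._log.warning(msg)

    def error(self, msg):
        self._log.error(msg)

    def critical(self, msg):
        self._log.critical(msg)
```

Callers keep calling d.info(...) unchanged, and logging's handlers and level settings supply the throttling the original never had.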

Then there's the 500+ naked print statements, with "ERROR" or such in their constructed output strings. I can search for them easily -- though I can just search for 'print', I think these are the only uses -- but I don't see any programmatic way of converting them, especially as the logging output formatting needs types (%s, %d) which are totally absent from the existing statements. (And it's python 2, so they are statements.)

I see a day of boring editing in my future.
mindstalk: (Default)
(Definition: solving a problem or writing code in an interview, in front of people, under time pressure. Video/online counts as well.)

I'm not sure I've gotten much better at this in the past year. It's one thing if the problem is one I already know and I just have to write code; I think I regurgitate under pressure fairly well. But if I have to really think about the problem then it feels like my IQ drops 20+ points under stress and being stared at. And when I come up with one idea for a solution, it's hard to try to think of others that might be better in some sense -- after all, the clock is ticking, and I have to start writing code! Not to mention the fun of having to write correct code without backup documentation or a compiler -- my memory prioritizes the stuff that's hard to look up, like What You Shouldn't Do, or Where Information Is, over the stuff that's trivial to look up at need.

As for actually being creative, that goes a lot better when I have time to relax, or step away from the problem and not consciously think about it. A lot of my best solutions just come to me when doing something else, or musing in bed or the shower, or walking.

Post prompted by Thursday's experience, where I was asked to construct a relative path between two directories, and I saw it as a tree problem and hared down making C++ trees with nodes and everything. At the end I asked how I'd done, and was told "well, this works, or would work with a bit more polishing of the code, but there's a simpler way with just lists." One minute after leaving the building I saw what that would be, and at home it took me 18 minutes to code and debug, which I e-mailed in, but apparently got no credit for that.

I did better Monday, with some basic linked list questions; that rejection was "you did well and seem a fine technologist, but not commercially experienced enough". Which is back to the Catch-22 of not being able to get experience because I'm not experienced enough.

On the flip side, Wednesday had a video interview where I had no idea how I did, but they want me to go to NYC for an onsite next week. So yay, progress... of course, that'll probably be more whiteboarding.
mindstalk: (angry sky)
I've started studying JavaScript a bit, as you might guess from recent posts. It's struck me as a mix of Perl, Python, and crack. It's got some neat things, especially in comparison to one language or the other. And it's got lots of... wtf things.

+: exponentiation operator; nested arrays and objects (dictionaries) (without Perl's references); first class functions and lambdas and closures, including nested functions (unlike Perl); Perl-like push/pop/shift/unshift array operators (but what's the performance?); consistent 'valueOf' and 'toString' methods; JSON; multiple kinds of for loops; Perl style labeled break and continue; some convenient conversions (but see below); nice Date methods.

-: oh boy.

* JSON stringifies nested structures nicely, but simple output doesn't: [1, [2,3]] outputs as [1, 2, 3].
* (object1 == object2) is false for any two distinct objects, no matter the underlying values. This holds for arrays and Date objects too. Nothing like Python's structural equality, or even that of STL containers.
** But you can do *inequality* comparison: ([1,2,3] < [2,2,3]) == true.
* strings take negative indices a la Python, but arrays don't.
* there's a typeof operator, but it just says 'object' for arrays.
* "5"+2 == "52" (convert 2 to "2", concatenate), but "5"-2 == 3 (convert "5" to 5, subtract.) And no Python string multiplier like "a"*2 == "aa".
** As Avi noted, it gets even weirder given that the values could be hidden in a variable. a+2=="52", a-2==3
* [1,2]+[3,4] doesn't concatenate arrays, doesn't add by element, doesn't give a type error, but gives... "1,23,4" (turn arrays into strings, concatenate without delimiter.)

My friend Mark linked me to https://www.destroyallsoftware.com/talks/wat which gives some more:
* []+{} == [object Object]
Ok, addition is commutative, right?
{}+[] == 0
And for luck: {}+{} == NaN
As above, []+[] == ""
** Actually, on playing with typeof, I think those are actually all strings. "[object Object]", "0", "NaN". OTOH, {}+[4]+5 == 9 (but typeof string)
>>> 5+{}+[4]
5[object Object]4 // because of course it does

* All numbers are 64-bit floats; you can still bitshift them, but as integers, so 5.2 << 2 == 20. This made more sense once I remembered that floats are weird inside (not integer+fraction fields), so a simple bitshift of the representation wouldn't make sense.

Hoisting Shadows

2017-Feb-05, Sunday 11:35
mindstalk: (Default)
A bit after writing the previous post on shadowing variables in JavaScript, I came across this page on hoisting. JavaScript rewrites your code, moving declarations (but not initializations) to the top of the current scope, meaning the script or function[1]. So

var a = 5;

turns into

var a;
a = 5;

with the declaration hoisted and the assignment left in place. Put that way, it's clear why the problem happens, if not why the language was designed this way.

Does the same thing happen in Python? Searching did not get clear results. I saw pages contrasting JavaScript with Python, which doesn't even *have* declarations. OTOH, the same shadowing behavior occurs, just with an UnboundLocalError instead of 'undefined'. So... I dunno.

[1] JavaScript does not have block scoping the way C and Perl[2] do; function bodies are delimited by curly braces, but random curly braces do nothing to isolate variables.

{ a=5; }

will work just fine. :(

[2] As I was verifying this in Perl, I ran into behavior I'd forgotten. I'd thrown in "use strict;" but wasn't getting the errors I expected. Eventually I recalled that $a and $b have special meaning in Perl (I think for comparison functions), and I guess are pre-declared, and I was using $a a la the above code, so strict didn't complain about trying to access $a before assigning to it. Sigh.
mindstalk: (Default)
Months ago, Robbie had found this scoping problem in Python, which I reduced to essentials.

I've started finally learning JavaScript, and it has nicer lambdas than Python, and proper hiding of nested functions unlike Perl. But it has the same scope problem:

g1 = 12;
function func() {
  document.getElementById("demo").innerHTML = g1;

  var g1 = 5;
}

(I'm not including the HTML framework because DW/LJ would yell at me if I did.)

Output is 'undefined', rather than 12. As in Python, the local variable further down shadows the outer scope variable (doesn't matter if the "g1=12" has a 'var' before it) even for lines before the local variable.

As mentioned before, Perl has proper lexical scoping here (though not for nested functions.) I don't think I can even create similar code in Scheme/Lisp, where the scoping is explicit with parentheses. (There's 'define' but I think that makes a new global, and it didn't work.) In Ocaml I have

let g1="10";;

let func () =
  print_endline g1;
  let g1="cat" in
  print_endline g1;;

Which I suspect is as explicit as Lisp parentheses, in its own way; the print line is obviously outside the following "let ... in...".

fast array rotation

2017-Jan-28, Saturday 13:17
mindstalk: (Default)
A simple problem I'd never had occasion to think much about, until I saw it as a sample coding problem.

How to rotate the elements of an N-element array by k spaces? An obvious way is to shuffle it one space, k times, but that's slow, O(N*k). Faster, which I saw about as soon as I thought about performance, is to use a second array B, where B[(i+k)%N] = A[i]. But that takes O(N) extra space. Can you do better?

Yes, as I realized at 5am. For each element A[i], move it to A[(i+k)%N]. O(N) time, O(1) extra space. Can't beat that!

Except some mildly tricky math intervenes: the natural approach only works if N and k are relatively prime. A more general solution is

let g = gcd(N, k)
for i in [0, g)
  for j in [0, N/g)
    move the current element k spaces along its cycle

Despite looking like a double loop, it's still O(N): g * (N/g) = N moves in all.
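In Python, the same idea with the inner loop written as following one cycle (a sketch, not the original interview code):

```python
from math import gcd  # Python 3.5+; older versions have fractions.gcd

def rotate(a, k):
    """Rotate list a in place by k spaces (a[i] moves to a[(i+k) % n]),
    in O(n) time and O(1) extra space, by chasing gcd(n, k) cycles."""
    n = len(a)
    if n == 0:
        return
    k %= n
    for start in range(gcd(n, k)):
        prev = a[start]
        i = (start + k) % n
        while i != start:
            a[i], prev = prev, a[i]  # drop off a value, pick up the next
            i = (i + k) % n
        a[start] = prev
```

rotate(a, 2) on [0, 1, 2, 3, 4] gives [3, 4, 0, 1, 2].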

I've also learned that C++ has a gcd function now. std::experimental::gcd, in the header experimental/numeric. C++17 moves it out of experimental but even g++ isn't doing that by default.

The really annoying thing is that this is the sort of solution that comes naturally to me lying in bed, with little conscious effort, but that I'd likely fail to get in a whiteboard session or timed coding test, due to stress dropping my working IQ.

Applying A*

2017-Jan-17, Tuesday 12:55
mindstalk: (Default)
I've played a lot of freeciv and freecol over recent years. Both games let you order a unit to go to some square, and hopefully it takes the fastest route there. The 'world' is a square grid, of various terrain types and associated movement costs -- e.g. plains or desert take 1 move, but mountains take 3; roads and rivers take 1/3 no matter what terrain type they're laid over.

A* is this sweet magic algorithm for finding shortest paths in some graphs efficiently, vs. doing breadth-first search in all directions, but I was having trouble applying it mentally. I was using the common Manhattan distance heuristic, h((0,0),(x,y)) = x+y, and I wasn't getting good results: the algorithm would cheerfully march down a straight plains path to the goal, while ignoring a path that might step away and into mountains, but then ride a river to the goal much faster.

So I backed off, and thought about BFS. I realized that would work better if, instead of naive BFS enqueuing grid squares as you found them, you ranked them by total travel time so far -- Dijkstra's algorithm, basically, which is A* without a heuristic. Instead of exploring all paths N squares away, you'd explore all paths N moves away; it would still be radiating in all sorts of directions, but at least you'd find the shortest path to the goal.

Then I realized I'd been using the wrong heuristic; the right one should be the shortest possible journey. Or as WP says, "it never overestimates the actual cost to get to the nearest goal node." So the heuristic in this application has to consider rivers and roads, such that h() = (x+y)/3, not (x+y). This works much better: the plains march looks less attractive as it advances, converting cheap heuristic moves into actual plains moves, and the "mountain and away" move gets a chance to be considered.

Actually, units and roads can go diagonally, though rivers are rectilinear, so the proper heuristic is h((0,0),(x,y)) = max(x,y)/3.

Actually actually, infantry units only have one move, but are still guaranteed one square of movement per turn, so can march across mountains as easily as across plains; it's mounted units, with e.g. 4 move in freecol, that really care about base terrain type. Also fractional movement can be useful, e.g. I think a unit with 2/3 move left (after moving on river) can still move onto an adjacent plains.
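A minimal sketch of that cost-aware search in Python -- rectilinear moves only, cost paid on entering a square, and the heuristic scaled by the cheapest per-square cost so it never overestimates (the names and details are mine, not freeciv's):

```python
import heapq

def astar(cost, start, goal, cheapest=1.0 / 3):
    """cost: dict mapping (x, y) -> cost of entering that square.
    Heuristic: Manhattan distance times the cheapest possible per-square
    cost, so it never overestimates the true remaining cost."""
    def h(p):
        return (abs(p[0] - goal[0]) + abs(p[1] - goal[1])) * cheapest

    best = {start: 0}
    frontier = [(h(start), 0, start)]
    while frontier:
        f, g, (x, y) = heapq.heappop(frontier)
        if (x, y) == goal:
            return g
        if g > best.get((x, y), float("inf")):
            continue  # stale queue entry; a cheaper route was found already
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cost:
                ng = g + cost[nb]
                if ng < best.get(nb, float("inf")):
                    best[nb] = ng
                    heapq.heappush(frontier, (ng + h(nb), ng, nb))
    return None
```

With cheapest = 1/3 (rivers and roads), the straight plains march loses its head start exactly as described: its heuristic moves convert into full-price plains moves while the river path's stay cheap.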

booyeah, I'm good!

2016-Dec-31, Saturday 15:01
mindstalk: (squeee)
So some months ago I worked on an AVL tree in pure C. My main goal was to get self-balancing happening at all, to demystify that process, and I did that; getting the code to be robust or even asymptotically appropriate (I'm still calculating heights every time I need to, not caching and updating them) was far down the priority list. So, while I tried to be correct in handling memory, it would not have surprised me if subtle bugs still existed.

My past work experience didn't emphasize use of static analysis tools. I think I played with lint a bit on my own, but relying on it or valgrind wasn't a thing. I finally investigated them today, ending up with cppcheck for static source checking, and valgrind for runtime memory checking. And ran them on my directory of C/C++ exercises, which, as small exercises, could easily have had lots of problems.

...not so much! cppcheck found a few problems, most notably a missing return value I wasn't testing properly in linked list, and a realloc bug[1] in my memoized Fibonacci table, and an unset ID variable in a threading program, but nothing else, including in the AVL tree or the new C++ hash table.

valgrind found some memory leaks in linked list and AVL tree... only because the testing main() for each was creating structures and then not freeing them before exit, because who cares at that point? With a manual destroy added, both check out clean. Which the list really should, it's pretty simple, but I'm stoked about the AVL tree doing so, since it's complex and memory correctness wasn't a top priority. Seems like I do have good habits.

There's also clang's static analyzer, supposedly deeper than cppcheck, though much slower to use. It's happy with my codes too.

Of course, valgrind can only test what's executed, so more use might uncover more bugs, but still, I feel good.

[1] ptr = realloc(ptr, ...) sets ptr to NULL if the realloc fails, while the original block stays allocated -- a memory leak if nothing else is pointing to what ptr had been pointing to. The fix is to assign to a temporary and check it before overwriting ptr. Not having used realloc much, plus also not worrying about errors in a tiny program, this didn't stand out to me.
mindstalk: (Default)
So, I've been working today on cloning my Ocaml chat program in C++. It's going slowly as I work on infrastructure, like exception-throwing wrappers to libc socket functions, and a RAII class for file descriptors.

In the process, I found something. I'd been using inet_pton() on the user-provided server name, forgetting that I have to look it up first. I was getting away with it because testing has been local, the client connecting to localhost. My first code wasn't error checking inet_pton, but I got around to making the Inet_pton wrapper, which started throwing exceptions. inet_pton is supposed to return 1 on success (with the IP address written into a parameter), and 0 on invalid input. Turns out, given the string "localhost", it returns 0 but *also* writes in the local IP address.

That, or some weirder default was happening... yeah, on going the other way, I find the address (written, or more likely just left over from the usual memset/bzero initialization) is, which connects to localhost, just like the well-known Huh. Guess I should look up, which looks messy and multi-purpose, but yeah, this seems one potential behavior of it, especially on a DHCP machine. OTOH, neither of the remote servers I have access to will connect to themselves via that address, so it's not reliable.
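The root confusion is easy to reproduce from Python, whose socket module wraps the same calls (a sketch; Python's inet_pton signals invalid input by raising instead of returning 0):

```python
import socket

# inet_pton parses numeric addresses only; a hostname is invalid input.
try:
    socket.inet_pton(socket.AF_INET, "localhost")
    parsed = True
except (OSError, socket.error):
    parsed = False

# A name needs an actual lookup first, e.g. via getaddrinfo:
addr = socket.getaddrinfo("localhost", None, socket.AF_INET)[0][4][0]
```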
mindstalk: (Default)
One thing easy to miss about Python is how it can easily print anything, including lists of dictionaries of lists of objects.  C++, not so much.  So I wrote a simple template function to print a vector of printable types, and then one for deques.

template<class T>
void pr_vector(const vector<T> &v) {
    for (auto i : v)
        cout << i;
    cout << "\n";
}

template<class T>
void pr_deque(const deque<T> &v) {
    for (auto i : v)
        cout << i;
    cout << "\n";
}

Well, lots of duplication there.  I tried templating on the container type too, template <class C, class T>, but the resulting C<T> didn't work; apparently that wants a template template parameter, a syntax I don't know yet.

But I figured I'd try auto:

void pr_cont(const auto &v) {
    for (auto i : v)
        cout << i;
    cout << "\n";
}

...that subsumes both templates.  And lots of others.  vector, deque, list...  Of course, the values themselves have to be printable, it can't handle a vector<pair<int, string>> because there's no printing on pairs.  But still, for C, this is amazing!
mindstalk: (Default)
At my last onsite interview I got asked -- out of curiosity, not as interview question -- what was so great about dynamically typed languages. I struggled to explain -- everyone knows why they're cool, right? Code is fast and compact to write! It is known!

But I've been thinking about the question some more. Right now I'm thinking there's one thing they're genuinely better at that doesn't come up that often; for the rest, it's more a historical accident of feature packaging, i.e. what else has traditionally come along with dynamic typing that made coding convenient.

The one thing: dealing with some variable length list of arbitrary values that you deal with at run time. Stuff like printf(), or message-passing into a closure-based object. I think I mentioned building a closure object in ocaml, and you can do it, but you need a sum type for various method inputs, and another for output, and it just seems like more work. Yeah, you get type safety for the work, but it's still more work. Likewise C has long had an ability to make things like printf(), but it feels clunky.

Also, based on language features, it seems likely that Lisp-style macros work better with dynamic typing, but I can't defend the claim. eval() seems to go with dynamic typing too, not that it's used that much IME.
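For contrast, the dynamic version of that message-passing closure object is nearly free in Python (an illustrative sketch; the names are mine):

```python
def make_account(balance):
    """Closure-based 'object': one dispatch function accepts messages
    with arbitrary numbers and types of arguments, no sum type needed."""
    state = {"balance": balance}

    def dispatch(msg, *args):
        if msg == "deposit":
            state["balance"] += args[0]
        elif msg == "balance":
            return state["balance"]
        elif msg == "report":
            # printf-style formatting: types checked only at run time
            return "%s has $%d" % (args[0], state["balance"])
        else:
            raise ValueError("unknown message: %r" % (msg,))

    return dispatch
```

acct = make_account(10); acct("deposit", 5); acct("balance") returns 15 -- the static-typing tax (one sum type for inputs, another for outputs) just isn't there.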

The other stuff that's tended to come with dynamic typing:

* direct running of code, without a manual compile step. So faster development.
* (Lisp and Python, and Ruby's irb, but not stock Perl): a REPL so you don't even have to write code into a file to run it, but can play at a prompt.
* easily making product types (tuples, lists of arbitrary types.) In C you have to declare and define a struct.
* garbage collection or at least reference counting, so little need to worry about memory management. Corollary: if you stick with small or at least short-lived "scripts", then resources like file descriptors or file handles don't have to be worried about much either; that's a domain thing rather than a language thing.
* And what might be the biggest one, a convenient environment:

In C you need to define a main function and include at least one header (like stdio.h) to do anything. Java's entry point is even more verbose. By contrast, a one line print statement is a valid scripting language or Lisp program. Put another way, in C++ terms, simply starting Perl gets you roughly the equivalent of

#include <cstdio>
#include <string>
#include <regex>
#include <deque>
#include <unordered_map>
#include <unistd.h>
#include <sys/socket.h>
#include <cstdlib>

...and probably more.

plus any statements are assumed to be the contents of main() and thus executed. Python makes you import a few of the above as modules, though it adds bigints and complex numbers.

Now, none of this depends on dynamic typing! ocaml and ghci provide REPLs and ocaml can directly run source files. Type inference and compact notation make creating typed tuples on the fly trivial (on-the-fly coding, at least; not at runtime). Java's GC feels like half the reason programmers originally ooh-ed over it. And while Perl's default environment is unmatched AFAIK, having strings and some sort of container types and IO (more than C/C++) without import or use statements is common to Java and the ML-ish languages.

But if your experience of languages is something like C++, Java, and Perl/Python, as was not uncommon, then there's a clear bias in code convenience and compactness. And even with the ML-ish languages, with type inference and REPL, there's still nothing that matches Perl for simply doing useful system stuff right away without imports. I see no reason one couldn't exist, it just doesn't. So it's easy to associate dynamic typing with "lets me get stuff done quickly and concisely".

mysteries of TCO

2016-Dec-11, Sunday 14:44
mindstalk: (Default)
So I've learned that gcc/g++ do tail call optimization with the -O2 option; supposedly MS Visual Studio and clang do it when they can as well. I have tested this:

this got long )

ocaml threads grrr

2016-Dec-07, Wednesday 22:19
mindstalk: (Default)
So for a coding test I had to make a chat client-server pair in ocaml. Done, based on a select loop. Well, mostly done, I didn't address blocking writes. I decided to redo the thing with threads, for the practice. Getting the basic functionality wasn't hard: synchronized queue for communication, simpler buffered socket readers than the first version had, and such.

Getting the threads to quit gracefully, though, that took a lot longer. I think part is due to ocaml's POSIX threads implementation: I'm told that "no one" uses it, they all use LWT (lightweight threads) instead. I can confirm some deficiencies: no Thread.detach functionality, for example, which was a big roadblock. And Thread.kill is in the API documentation, but when I tried to use it, got

Thread 2 killed on uncaught exception Invalid_argument("Thread.kill: not

Throwing an exception from the worker threads to notify the main program to join() them didn't work either, though I'm told I shouldn't expect that to work anyway. I think it works now, with a fair bit of polling -- not busy wait spinning, but reads timing out every 0.1 seconds to check for changes of state -- but that feels messy. As for whether it's simpler... well, it is fewer lines than the select() version, and should handle large (potentially blocking) messages better, so I guess so. Hmm, well, the core chat module is slightly shorter, but main is longer, and there's an extra SQueue module. Still, being more correct is good.

C++ lambdas

2016-Nov-30, Wednesday 12:42
mindstalk: (Default)
I was catching up on C++11 last night. Learned about lambdas, and started playing around with them. A neat addition. But did they provide full closure capability, like creating a snapshot of state that the function can privately modify? Looked like no, and I was all set to be disappointed.

But at a job interview yesterday, I learned about the mutable keyword, which was described as letting you modify const things, like deep in a const object hierarchy. "Why would you even do that?" was my first reaction, though I can vaguely imagine why, "for the same reason that Haskell's purity annoyed me."

So I remembered that, and figured I would try adding mutable to the lambda. Ka-ching!

#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

int main() {
    vector<int> v;
    for (int i=1; i<=10; i++) {
        v.push_back(i);
    }
    int sum=0;
    for_each(begin(v), end(v), [sum](int n) mutable {sum+=n; cout << sum << endl;});
    cout << sum << endl;
}



Not that you need the external sum here, you could drop it and have [sum=0] for the lambda, but it illustrates the idea. Which isn't obscure, I saw it in the docs I was reading shortly after returning to them, but still, I found this on my own.

I've been applying to a bunch of C++ jobs more because of experience than because of any deep love for the language, but features like this and auto (type inference) and others from C++11/14 make it a lot more appealing.
mindstalk: (Default)
Having written a bunch of Python and Ocaml recently... I find I miss the {} blocks of C and Perl; they make it a lot easier (given vi's % key) to jump between the beginning and end of a code block, and to be sure that a code block *is* a code block. To be fair, Ocaml doesn't mind if you drop extra parentheses around things, and views them as equivalent to 'begin...end' which is apparently more idiomatic for when you need block markers... but my editor doesn't bounce between begin and end.

I guess the other idiom is "write short functions that fit on a page", but that doesn't always work, especially if writing a bunch of specialized utility functions nested within a bigger function.

Edit: new discovery! vim highlights unmatched 'end' keywords in Ocaml files! That's neat. Still can't bounce between begin and end, though.

Also: ocaml's "Syntax error" is profoundly unhelpful.
mindstalk: (escher)
As mentioned in the previous post, I found Perl has problems here:

[Zefiris:0] cat unbound.pl
sub outer {
    sub inner {
        print "inside inner\n";
    }
    print "inside outer\n";
}

outer();
inner();

[Zefiris:0] perl unbound.pl
inside outer
inside inner

Perl's inner() enters the global scope, even from within outer()


[Zefiris:0] cat unbound.py
def outer():
    def inner():
        print("inside inner")

    print("inside outer")

outer()
inner()

[Zefiris:0] python unbound.py
inside outer
Traceback (most recent call last):
  File "unbound.py", line 8, in <module>
NameError: name 'inner' is not defined

Python does the scoping I expect, though not any compile time checking for undefined functions. Though Perl doesn't here, either, even with 'use strict;'. Well, I suppose it's possible that in these languages, outer() could have done something to define inner() before it was called.
mindstalk: (Default)
I've started learning Scala. Don't know how far I'll get unless someone's willing to hire me to learn it (not impossible, I see a few jobs saying just that.) So far it at least seems like a nice blend of functional programming (ML type inference branch) and practical utility (JVM base, willingness to let you get dirty and imperative.) Though I'm also getting whiffs of C++ complexity... possibly unavoidable if going for that amount of flexibility in both coding style and performance.

It reminded me that I have trouble respecting a language that doesn't have first class functions, which got me wondering about the languages I know well.

C: no.

C++: no, directly, though you can make function objects with classes and operator(). Bit verbose, though. C++11 made improvements.

Java: I don't know, actually. Searching... looks like not really, though Java 8 made improvements.

Perl: Wikipedia says yes, though you hand around references to subroutines, and I found recently that nested functions aren't actually bound to their scope: function 1 defined in function 2 is still callable outside function 2. Wikipedia does say that nested named functions are in Perl 6, vs. the Perl 5 that AFAIK everyone still uses, if they use Perl at all.

Python: pretty much... but as Robbie and I found recently, Python doesn't do proper lexical scoping. And WP says it doesn't really do anonymous nested functions; Guido seems to have a reluctance to embrace FP. (Also see reduce being exiled to a functools library, not that Perl is better, to my surprise.)

So, wow, none of them.

Ruby I know basically nothing of, but WP says no. JavaScript I don't know enough of, though WP says yes. It also says mostly yes for Rust and Go. I'm sad D isn't on the table.

[Feb 2017 edit: JavaScript has better lambdas than Python, but has the same weird scoping problem.]
mindstalk: (juggleface)
I spent longer than I expected figuring out the problem described in Robbie's Shadows of Python post. I kept revising my e-mail to him: "You're wrong because of X. No, I mean Y. Um, Z?" Eventually I got it, I think, and distilled it to a simpler piece of contrasting code:


x = 1

def f():
    print(x)

def g():
    print(x)
    x = 2

>>> f()
1
>>> g()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "r.py", line 8, in g
UnboundLocalError: local variable 'x' referenced before assignment

Simply because we will (or even might: the same thing happens if the assignment sits in an if clause) assign to x in the future, x becomes a local variable for the scope of the whole function, blocking access to the outer scope definition. This compile-time annoyance comes from the same language that has no equivalent to Perl's 'use strict;' to stop you from using misspelled variables. Thanks, Python.

Checking against the other big scripting language, Perl:


$x = 1;

sub f {
    print $x, "\n";
}

sub g {
    print $x, "\n";
    $x = 2;
}

f();
g();

[Zefiris:0] perl r.pl
1
1
Happy access to a global variable. If you use 'my $x=2;' in g(), then the change to the global-level variable is avoided, of course.

Python would let you use 'global x' in g(), or 'nonlocal x' for intermediate scopes, but that gives you full access to x. There's no "I'm going to create a new local variable from this point on, while having used an outer variable earlier in the function." And this doesn't work:

def h():

You can do that in Perl, with or without 'my', getting expected behavior in each case.
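For the record, a minimal sketch of the escape hatch Python does offer (full access, as said):

```python
x = 1

def g():
    global x   # declares, for the whole function, that x means the module-level x
    print(x)   # reads the outer x: no UnboundLocalError
    x = 2      # rebinds the outer x, not a fresh local

g()
# afterwards the module-level x is 2
```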

This is part of the reason I'm willing to work in C++ again; yes, it's annoying, but I've seen no language that doesn't have mindbendingly stupid shit in it somewhere.

Edit: someday I'll remember I want the pre tag, not the code tag, for preserving indentation.
mindstalk: (Default)
So there are the left and right Riemann sums, and the much better midpoint Riemann sum. Recently I wondered about sums that take the average of the endpoints of each strip: (f(x)+f(x+step))/2. The thing is that most of the halves combine, so you can just add up f(x) for each whole step, plus f(start)/2 and f(end)/2. How's that compare?

Much better than the left and right sums, but not quite as good as the standard midpoint one. E.g. the integrals of sin(x) over 0 to pi/2 are

left_: 0.9992143962198378
right: 1.0007851925466327
mid__: 1.0000001028083885
mine_: 0.9999997943832352

All the other integrals I tried show a similar pattern: x, x^2, x^3, 1/x, e^x... the two are close, but midpoint is just a bit closer to the correct answer. Or looked at another way, it has close to 1/2 the error... hmm, that factor is consistent too. I should look into that.

Or: if I just recalled my terminology correctly, midpoint Riemann sums have half the error of trapezoidal Riemann sums. Which is not what I would have expected.
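A sketch of the comparison (1000 strips is my assumption; it reproduces numbers like those above):

```python
from math import sin, pi

def riemann_sums(f, a, b, n=1000):
    """Left, right, midpoint, and endpoint-average (trapezoid) sums."""
    h = (b - a) / float(n)
    xs = [a + i * h for i in range(n)]
    left = h * sum(f(x) for x in xs)
    right = h * sum(f(x + h) for x in xs)
    mid = h * sum(f(x + h / 2) for x in xs)
    # the averaged endpoints mostly combine: the left sum,
    # minus half an f(start), plus half an f(end)
    trap = left + h * (f(b) - f(a)) / 2
    return left, right, mid, trap
```

For sin on [0, pi/2] the trapezoid error comes out almost exactly twice the midpoint error, with opposite sign.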

AVL tree

2016-Jul-10, Sunday 20:44
mindstalk: (YoukoYouma)
For a long time, self-balancing trees had seemed like magic to me. Earlier this year I put my mind to figuring them out, and on my own came up with the idea of a weight-balanced tree. With a bit of peeking, I then moved on to a height-balanced tree, pretty much an AVL tree. Then I started coding one in pure C, for Real Programmer (TM) cred.

Stage one, achieved some months ago, fulfilled the basic criterion: you could feed it an increasing sequence of keys, and get a beautifully balanced tree back out. Woo! It had problems, though. Most obviously, I concentrated on the balancing part first, so got heights on the fly via a function rather than from cached values. Not exactly computationally efficient. Less obviously, I had a clever-seeming tree-rotation function I'd come up with: "to rotate right: insert root value to the right, copy rightmost left value into root, delete same value from the left tree." Elegant, but O(log(N)), when there are actually O(1) rotations available.

I went back to the project this evening, and got the faster rotations working. It took longer than I expected, because when you do rotations that way, there's complexity: 1->2->3 can be rotated left, but 1->3->2 needs a rotation right (on 3->2) then left, to come out balanced. Turns out that my slower rotation did the correct thing in both cases, so simply replacing my rotate functions wasn't working until I added more logic. Grumble grumble.
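The O(1) rotations themselves are just a couple of pointer swaps; here's a sketch in Python rather than the C original (no parent pointers or cached heights here):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(root):
    """O(1): the right child becomes the new root of this subtree."""
    pivot = root.right
    root.right = pivot.left   # pivot's left subtree changes sides
    pivot.left = root
    return pivot

def rotate_right(root):
    """Mirror image: the left child becomes the new root."""
    pivot = root.left
    root.left = pivot.right
    pivot.right = root
    return pivot
```

The 1->2->3 chain is one rotate_left at the root; the 1->3->2 case is root.right = rotate_right(root.right) followed by root = rotate_left(root).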

I still don't have heights cached, and the whole thing feels like an unholy mess of pointer manipulations (especially with parent pointers in the tree.) But, progress!
