Rich and Strange Aeons

bugbear debugging

2022-06-22T01:01:46Z

It is commonly believed that when dealing with an ornery bug, just explaining the problem out loud to someone else often grants insight. "--oh, I see it now, sorry to bother you." So much so that, supposedly, one senior engineer required juniors to explain their problem first to a teddy bear, the 'bug bear', and only if that failed could they bother him.

I haven't actually used this much. Mostly been on my own, and my default approach to bugs is more logging. But I was playing with TypeScript and Koa today, and had a problem that logging didn't work for: the main code worked fine, but if authentication failed, the server seemed to never respond to the caller.

I was about to find a senior active on Slack, to ask for help, when I thought of the bugbear. I did not have a teddy bear, and I don't think I even said anything out loud, but I did go line by line, trying to mentally verbalize what was happening and what the point was. And yes, found something suspicious: "ctx.response = 401", which looks like it's trying to set a status code but is clobbering the response object. And yes, "ctx.response.status = 401" fixed it. Voila! No seniors disturbed.

python JSON datetime at last

2021-08-26T16:47:56Z

content warning: geeky computer stuff

At work I have a common flow of Python code querying MySQL and dumping results into JSON. This works great except our tables tend to have columns for create and modify dates. And Python's json library won't convert datetime types to string, so dumping the result of "SELECT * FROM" errors. And I've had bits of ad hoc code to remove datetime values, then a little function for it, but it was still a pain. Especially today, when for stupid VPC reasons I'm writing a simple generic SQL service.

In annoyance I tried searching again and found, buried in the middle of a stackoverflow with answers about subclassing and shit:

json.dumps(thing, default=str)

Bam, done. The function passed to 'default' gets invoked on any object the encoder can't serialize on its own. And in Python 'str' works on pretty much anything.

If I cared about the exact format I could write a function that inspected type and converted dates (or just assume datetime is the only cause of errors so just invoke a method when triggered), but str is fine for now.
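A minimal sketch of that stricter option, assuming ISO 8601 strings are the format wanted. The function name and the sample row are made up for illustration; only json.dumps and its 'default' parameter come from the post.

```python
import json
from datetime import date, datetime


def encode_extra(value):
    """Called by json.dumps only for objects it cannot serialize itself."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    # Anything else is unexpected, so fail loudly instead of stringifying it.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical row, standing in for one result of a "SELECT * FROM" query.
row = {"id": 7, "created": datetime(2021, 8, 26, 16, 47, 56)}

strict_text = json.dumps(row, default=encode_extra)
lazy_text = json.dumps(row, default=str)
```

The difference from default=str is that a type nobody planned for raises an error here instead of silently becoming whatever str() makes of it.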

Soooo simple. Thanks, python, and stackoverflow person who gave the simplest answer.

more Javascript

2020-10-22T21:34:12Z

New Job has me learning JS and Node.js. I have learned that some of my earlier complaints (see tag) have been improved by later developments (or ones extant at the time that I hadn't found.) Like 'var' variables have function scope (useful concept; I think it applies to Python), but you can get block scope (One True Way) by using 'let' instead. I don't think I knew that simply assigning to an undeclared variable creates a global (dear gods) but JS borrowed 'use strict'; from Perl, which shuts that off.

JS strikes me as a shitty half-assed core language which is trying to grow its way into respectability. Unfortunately it has 25 years of shitty web code to have to be backwardly compatible with.

No wonder there are multiple languages to use instead, that compile down into JS.

Today I accidentally discovered another WTF.

> l=[1,2,3]
[ 1, 2, 3 ]
> for (v in l) console.log(typeof(v));
string
string
string
undefined
> for (v in l) console.log(v);
0
1
2

So in a way this makes sense. "for in" iterates over the properties/keys of an object, which for an Object (dictionary-ish) are names. The indices of an Array look like numbers, but they're property keys too, and property keys are strings, so that's what 'for in' hands you.

OTOH, stripped of rationalization: the indices of an array are numeric, and yet they're strings in this iteration case. ffffuuuuu.

Of course, JS coerces between numbers and strings at least as fluidly as Perl does.

I found this behavior by accident: I was printing 'v[0]' thinking I was using 'for of' (which iterates over the values of an Array) when I wasn't, and I got numbers rather than letters or 'undefined', which is what you get if you try

> n=4
4
> n[0]
undefined

watermarking ebooks

2020-08-01T08:23:31Z

I've bought RPG PDFs. They're never DRMed, but from DriveThruRPG they're allegedly watermarked.

I've bought DRM-free ebooks. Watermarking isn't mentioned, but I wonder if you could. I assume you could stow an extra blob in an EPUB file. What would you stow?

{  
  'customer': "Jane Doe",
  'title': "This book is awesome",
  'salt': "DEIFEFIGDBYDEADBEEF"
}

Encrypt that with a publisher secret key, stick it in the EPUB.

What does that get you? If someone is lazy and uploads their book to public places, you know where it came from. (You don't know for sure they did it -- maybe a hacker plundered their hard drive.) The encryption means someone can't forge an upload, pretending it came from someone else. The salt complicates known-plaintext attacks trying to recover the secret key.
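The post talks about encrypting the blob. For the "nobody can forge an upload" property specifically, a keyed hash (HMAC) is the usual standard-library tool, so this sketch swaps that in. Note that it does not hide the customer name the way the encrypted version would. The key, the field values, and the function names are all placeholders.

```python
import hashlib
import hmac
import json
import secrets

# Placeholder key; a real publisher secret would be generated and kept private.
PUBLISHER_KEY = b"hypothetical-publisher-secret"


def make_watermark(customer, title):
    """Build the blob from the post and attach an HMAC-SHA256 tag."""
    payload = json.dumps(
        {"customer": customer, "title": title, "salt": secrets.token_hex(16)},
        sort_keys=True,
    )
    tag = hmac.new(PUBLISHER_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_watermark(mark):
    """Return True only if the tag was produced with the publisher key."""
    expected = hmac.new(
        PUBLISHER_KEY, mark["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, mark["tag"])


mark = make_watermark("Jane Doe", "This book is awesome")
forged = dict(mark, payload=mark["payload"].replace("Jane", "John"))
```

Someone without the key can copy a watermark but can't produce a valid one naming a different customer, which is the forgery case the post cares about.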

Most ebooks have at least one image, the cover. You could use steganography to stash the blob there; it's encrypted text so should fit. Given that it's an encrypted blob, people may have trouble knowing whether you did anything at all, until you reveal your process.

After which, non-lazy pirates could delete the blob, or scramble where you're hiding the cover image. The watermark is hardly foolproof. But it's cheap.

JS surprise

2020-07-15T02:39:41Z

I was looking at my 'programming' tag posts and found a bunch from 3 years ago, complaining about odd features of JavaScript. I kind of remember them now, but if you'd asked me yesterday if I'd ever taught myself JavaScript I would have said 'no', not remembering doing so.

I was prompted by an interviewer having looked at this blog. I wonder what he thought about my saying I hadn't taught myself JS...

covid new case rates by country

2020-06-27T05:55:56Z

I've started following https://www.worldometers.info/coronavirus/#countries

which gives daily new cases per country, and total cases per population, but not new cases per population. I wanted to find that out, and more explicitly than doing math in my head. So I wrote a web scraper. I have concluded that writing a scraper should be a last resort; it was a shitty experience. But I got what I wanted: https://mindstalk.net/covid_scrape/new.html

This is a one-day snapshot, not regularly updated. A few countries:

US: 143 new cases per million people.
Sweden: 30
Iran: 30
UK: 20
Austria: 5
Australia: 1.45
Japan: 0.7
Hong Kong: 0.4
New Zealand: 0.2

Granted those last two have single digit new cases, so could jump around from an extra person getting sick. Japan had 87 new cases, that's a significant number. The US, well, had almost 3x as many new cases in one day as Japan has had total cases...
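The number the site doesn't show is just new cases divided by population, scaled to a million. A tiny sketch, using the 87 new cases quoted above for Japan and a population of roughly 126 million, which is a round figure assumed here and not something taken from the scraper:

```python
def per_million(new_cases, population):
    """New cases per million residents."""
    return new_cases / population * 1_000_000


# 87 is from the post; the population is an approximate, assumed figure.
japan_rate = per_million(87, 126_000_000)
```

That lands close to the 0.7 in the list above, and it also shows why the single-digit countries jump around: one extra case moves the rate a lot when the numerator is tiny.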

I'd like to leave the country but I don't think there's anywhere that would let me in. Trapped in the Plague Zone.

Edit: a friend found https://91-divoc.com/pages/covid-visualization/
lower half has a graph of this stuff, though it seems to leave out the really low countries. I haven't played with it much yet.

See, this is why I don't have a portfolio of big programming projects; almost anything I think of has already been done.

Edit 2: yeah, the presentation isn't exactly what I want -- I don't see an option for table data. OTOH it shows a 1-week average, which is more robust than daily results, though can also be slow to show exponential takeoff.

Last 7 days average 102 for the US, with 136 as last point; Japan average 0.6. NZ 0.4. I can't even get it to show me data points for Hong Kong. Sweden 98! -- I must have snapped a really low data point, or a partial report.

my first ever Python bug

2020-06-14T04:01:28Z

Years and years ago, when Perl was still the dominant scripting language, with Python and Ruby nipping at its heels, and CPAN was where you would look for cool libraries, I wrote my first Python program. I don't recall what it did, whether it was meant to do something useful or just do something like "10 9 1 blastoff" as a test, but I do know that it looped based on a command line parameter. And that it didn't work, running nigh-forever when it had a parameter of '20' and should stop after 20 times. It took me like 20 minutes to figure out what was wrong.

I can replicate the basic problem:

import sys
count = 0
while count < sys.argv[1]:
    print(count)
    count += 1
else:
    print("Blastoff!")

And the behavior still replicates -- if you run it in Python 2. In Python 3, you'll get

TypeError: '<' not supported between instances of 'int' and 'str'

Because sys.argv is a list of strings. Thing is, Perl is a very accommodating language; the equivalent code in Perl will just work, as I well knew. Or rather, Perl has separate comparison operators for numeric and string values, so this code, using '<', would quietly try to convert sys.argv[1] into a number for comparison with 'count'. (If you wanted lexicographic comparison, you would use 'count lt sys.argv[1]', except of course it would be more like '$count lt $ARGV[0]', since Perl's @ARGV doesn't include the script name; anyway, there the value of $count would be converted to a string for comparison.)
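For completeness, the fix is a single explicit conversion. A sketch that keeps the shape of the loop above but takes the argument list as a parameter (so it can be called without a real command line) and collects lines instead of printing them; the function name is made up:

```python
def countdown(argv):
    """Same loop as above, with the command-line string converted first."""
    limit = int(argv[1])  # sys.argv entries are always strings
    count = 0
    lines = []
    while count < limit:
        lines.append(str(count))
        count += 1
    else:
        lines.append("Blastoff!")
    return lines


demo = countdown(["blastoff.py", "3"])
```

In a real script you would call countdown(sys.argv) and print the result; int() raises ValueError on something like 'twenty', which is at least a loud failure.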

I remember being very put out. Though I did go write a 'real' script in Python, and showed it to my boss, who was able to read and understand it with no Python knowledge. Go go executable pseudocode.

These days, on the rare occasions I poke at Perl, I'm more likely to be annoyed by the lack of a REPL, or easy ways to print nested data structures, or the quirky way function arguments are handled. Perl still does some things better ('use strict;') but I don't miss it.

But dang, that bug.

wtf Python

2020-04-13T03:24:25Z

I was writing some code, in a file called 'code.py'. I tried adding doctests and running them; this would hang, or complain about circular imports. It seems that 'import doctest' will run a file in the directory called 'code.py' -- e.g. make a 'code.py' file like

while True:
    pass

start python, import doctest, and watch it hang.

This seems poor, and undocumented. Probably I should file a bug later.
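As far as can be told, the mechanism is ordinary module shadowing: doctest imports pdb, pdb imports the standard library module named 'code', and the current directory comes first on the import path, so a local code.py wins. The sketch below checks that in a throwaway directory, with a harmless code.py that prints a marker instead of looping forever. The helper name and marker text are made up, and this is hedged on doctest still importing pdb in whatever Python version runs it.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

MARKER = "local code.py ran"


def local_code_py_runs(module="doctest"):
    """Import `module` in a child Python started in a directory with its own code.py."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "code.py").write_text(f'print("{MARKER}")\n')
        # Drop PYTHONSAFEPATH so the child keeps the current directory on sys.path.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        child = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            cwd=tmp, env=env, capture_output=True, text=True, timeout=60,
        )
        # The child may also fail afterwards; only the marker matters here.
        return MARKER in child.stdout


shadowed = local_code_py_runs()
```

The same thing happens with any file named after a standard library module, which is why 'code.py', 'random.py', and 'email.py' are all bad names for scratch files.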

I guess I have a new candidate for interview questions about problem solving.

Noldor Monte Carlo: CORRECTION

2020-04-10T01:01:58Z

If you dive deep into Tolkien fandom, a recurring question is "How many elves were there at any time?" Now, Tolkien cared a lot about languages and moon phases, but his attitude toward demographics or non-human food production would be an insult to good handwaving, so this is hard to answer well. The one hard number is that Turgon brought 10,000 troops to the Battle of Unnumbered Tears. We also have a couple of proportions, and then a whole mass of "the greater part".

Also, in the older Fall of Gondolin, 12 named companies muster to the defense of Gondolin; a company of around 800 people would give 9600 defenders. Under the circumstances you'd think *everyone* who could fight would be...

In the past I've just made a range of estimates of how 10,000 relates to the population of Gondolin and applied averages to the rest, but I thought I would try estimating all the ranges. At which point a Monte Carlo simulation is more useful than just multiplying minima and maxima. Since I wanted the answer ASAP I did it in straight Python, not R or Octave or some library. Being lazy, I used uniform distribution for the ranges.

Kind of my first non-class Monte Carlo? Apart from some old C programs that were simply simulating dice outcomes like 3d6 and "4d6, top 3" and such.

Edit: whoops! I found a bad error in my original code. If I'm trying to go from 10,000 Noldor+Sindar soldiers to a "Noldor in Valinor" population, I need to *multiply* by the fraction of Noldor in Gondolin, not divide!

Instead of pasting code I'll just link: https://mindstalk.net/noldor.py

The first thing I learned is that when you're doing 9 divisions, the small-divisor outliers meant that I needed a lot more bins than I thought at first.

95% likely over 50,000, 95% likely under 1 million; 90% likely over 70,000, 90% likely under 680,000. 90% confidence interval is 50,000-1 million, 80% confidence is 70,000-680,000.

Possible range is 8000 -- definitely too small -- to almost 9 million.
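A rough sketch of the approach, not the linked script. The three ranges below are placeholders invented for illustration (noldor.py has its own, and more of them); the only number taken from the post is the 10,000 soldiers. Each trial draws every factor from a uniform range, then divides or multiplies as appropriate.

```python
import random

SOLDIERS = 10_000  # the one hard number from the post

# Placeholder ranges, NOT the ones in noldor.py: (low, high, operation).
STEPS = [
    (0.10, 0.30, "divide"),    # soldiers as a fraction of a city's population
    (0.05, 0.25, "divide"),    # that city as a fraction of a larger group
    (0.30, 0.70, "multiply"),  # fraction of that group that counts toward the target
]


def one_trial(rng):
    """Draw each factor uniformly and chain the divisions and multiplications."""
    estimate = SOLDIERS
    for low, high, operation in STEPS:
        factor = rng.uniform(low, high)
        estimate = estimate / factor if operation == "divide" else estimate * factor
    return estimate


def simulate(trials, seed=0):
    rng = random.Random(seed)
    return sorted(one_trial(rng) for _ in range(trials))


def percentile(sorted_values, fraction):
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


results = simulate(10_000)
low_5 = percentile(results, 0.05)
high_95 = percentile(results, 0.95)
```

Reading percentiles off the sorted list sidesteps the binning problem mentioned above, since the long tail from small divisors no longer has to fit in a histogram.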

Zoom bombing

2020-04-06T21:11:44Z

One downside of the convention was some assholes 'bombing' the performance. I wondered how it was so easy to find us; web scraping or something? Probably not!

https://krebsonsecurity.com/2020/04/war-dialing-tool-exposes-zooms-password-problems/

The unique part of a Zoom ID is just a 9 to 11 digit number, 9 in my experience, so you can just try random URLs until you get a hit. This isn't like cracking a specific password, this is like the birthday paradox, where you keep trying until you hit *something*. And with 200 million daily users now, the odds of hitting a live meeting somewhere in a billion numbers are looking pretty good.

So there are fixes like requiring a password, or Waiting Rooms. Why doesn't Zoom just increase the ID size? Someone said that they want URLs that can be easily read out over the phone (this actually happened to me on one job interview.) I don't really see how 16 digits would be much worse than 9 or 11. But I note that if you add in lower case letters (so you don't need to specify case), and even drop a few characters as too similar (1 and l), you can get 1e15 possibilities in 10 characters. That's 1 million times sparser than 9 digits. And the war dialers can't just throw GPUs at the task like password testing, they need to make network connections to zoom.us to try URLs.

Relatedly, the odds of guessing a US Social Security number in use by someone are pretty good. With a few more digits, they would not be. (Your odds of guessing an active credit card number, 16 digits, are probably not good, though the space is limited by internal structure or checksums in ways I don't know.)

Also relatedly, when generating strings of random letters for public exposure, there's the chance of creating funny or offensive strings, like JkmCatp0op. One idea I saw was to not use vowels, though that still leaves the 'vowels' of 0, 4, and 1 (O, A, I). If you limit yourself to lower case consonants plus 7 numbers, that's 28 characters, and 10 of them have 3e14 possibilities. Adding one character puts you back at 8e15, 12 at 2e17.
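The arithmetic is easy to check, and generating such IDs is a few lines. A sketch, assuming the 28-character alphabet described above (21 consonants counting 'y', plus the 7 digits left after dropping 0, 4, and 1); the names are made up for illustration:

```python
import secrets

CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # 21 lower case letters, no vowels
DIGITS = "2356789"                    # 0, 4 and 1 dropped as vowel look-alikes
ALPHABET = CONSONANTS + DIGITS        # 28 characters


def space_size(alphabet_size, length):
    """Number of distinct IDs of the given length."""
    return alphabet_size ** length


def random_id(length=10):
    """Pick each character independently with a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


sizes = {length: space_size(len(ALPHABET), length) for length in (10, 11, 12)}
nine_digits = space_size(10, 9)
sparser_by = sizes[10] / nine_digits
sample = random_id()
```

Ten characters from this alphabet is a few hundred thousand times sparser than nine digits, so a war dialer making network requests would have a much worse time.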

computer detective work

2020-03-25T20:52:03Z

My phone was slow, so I rebooted it. And logged in again to my shell server. And got an error message about not finding 'zsh-syntax-highlighting/' I hadn't seen before. And my shell didn't have highlighting. I connected to the server's screen session and confirmed that yes, old shells have highlighting. Something's definitely wrong!

'ls', and I find 'zsh-syntnax/' [sic]. WTF? Why is that? Did something go horribly wrong with my last OS upgrade? Have I been hacked and pranked?

Fortunately I keep a really deep shell history, and found

'mv zsh-syntax-highlighting zsh-syntnax' from 9 days ago.

That answers how, but not who or why; I don't remember doing that!

But the history provides context, several other 'mv' commands renaming long filenames to shorter ones. Aha, I do remember doing that! I wanted 'ls' output to fit into more columns. So I did that to this directory without considering that some part of my shell configuration was actually using it. Whoops! And somewhat excusable, once my configuration is set up I don't have to twiddle with it much, I have aspects that are 25+ years old yet contributing positively to my experience.

One quick rename and edit of .zshrc to use the new name, and all is well again.

self-driving car skepticism gloat

2019-12-31T22:29:46Z

When self-driving cars first started getting talked about, like 10 years ago, many people were enthusiastic and anticipatory. I was skeptical, because as someone who walks around dense cities, driving safely and effectively in such felt like a human-complete AI problem, needing theory of mind, social interaction, and a large amount of adaptation to unforeseen circumstances.

Also because while in some things like chess or Go, rather dumb computers beat humans through powerful search, a more common AI pattern is that a fairly simple system can get 60-90% of human performance, but then stalls despite a lot of effort. Which is fine when you're making models for targeting direct mailing, and poorer performance can be balanced by much faster turnaround time and it's just moderate amounts of money at stake anyway. Less fine when even a missing 1% of performance may mean people die, or alternatively that traffic is frozen as cars can't figure out how to safely push through busy streets.

(The direct mailing example is from my first full-time job; we could build a decision tree, to predict response rates to a direct mailing, that was said to be 60% of a hand-crafted model but took a few hours instead of a few months to create. A machine translation course in grad school included various systems that could do 60-95% as well as humans, on fairly narrow word tests, but improving that was Hard. Statistical translation, rule-based, hybrid, all stalled.)

Basically an application of the Pareto principle: 20% of the work can get you 80% of the performance. Except it might be more like 1% of the work gets you 80% of the performance; since we don't *have* human-equivalent AI in most of these domains, we can't even say how much work it actually takes.

Early articles were along the lines of "we're making lots of progress! (but can't drive in the rain or snow and are tested mostly in low-density sunlight)", which for some people sounded like "we're almost there but for a bit more work" but to me sounded like "we're already spending years on the *easy* stuff, imagine what the hard stuff will be like."

More recent articles have been more like "wow, this is harder than we thought", with even the executives in charge of developing and selling this stuff saying things like "thirty years away" or "never" or "far in the future", or "decades away".

Singapore reportedly has deployed them, as someone on Facebook likes to keep saying, but a friend there observed various caveats: 10 MPH, a bounded area, not mixed with other cars, safety driver, and attendants trying to shoo pedestrians out of the way. Also see. And this is the state of the art!

So, "ha ha!"

I'll also include a FB thread I made two years ago about predictions, and include just one example of receding predictions:

2014: Volvo promises fully self-driving cars by 2017, 3 years later.
2017: Volvo promises partial self-driving cars by 2021, 4 years later.

Python is annoying

2019-02-07T14:53:14Z

Our code works with binary data (hashes/digest) and hexstring representations of such data, a lot. It was written in Python 2, when everything was a string, but some strings were "beef" and some were "'\xbe\xef'"

Then we converted to Python 3, which introduced the 'bytes' type for binary data, and Unicode strings everywhere, which led to some type problems I had figured out, but a recent debugging session revealed I had to think about it some more. Basically we can now have a hexstring "beef", the bytes object b'\xbe\xef' described by that hexstring... and the bytes b"beef" which is the UTF-8 encoding of the string.

In particular, the function binascii.hexlify (aka binascii.b2a_hex) which we used a lot, changed what it returned.

Python 2:
>>> binascii.a2b_hex("beef")
'\xbe\xef'
>>> binascii.hexlify(_)
'beef'

Python 3:
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> binascii.hexlify(_)
b'beef'

vs.
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> _.hex()
'beef'

I found it easy to assume that if one of our functions was returning b"beef" and the other "beef" that they were on the same page, when really, not.
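The three look-alikes side by side, with the conversions between them. This is a sketch using only bytes.fromhex, bytes.hex, binascii.hexlify, and decode, all standard Python 3; the variable names are made up.

```python
import binascii

hexstring = "beef"                    # str: text describing the bytes
raw = bytes.fromhex(hexstring)        # bytes: the binary data itself, b'\xbe\xef'
ascii_hex = binascii.hexlify(raw)     # bytes: ASCII encoding of the hexstring, b'beef'

# Two ways back to the str form, from either starting point.
from_raw = raw.hex()
from_ascii_hex = ascii_hex.decode("ascii")

# The trap from the post: these look alike when printed but never compare equal.
looks_same_but_is_not = (ascii_hex == hexstring)
```

Picking one representation at function boundaries (say, str hexstrings everywhere outside the hashing code) and converting at the edges is one way to stop the two from quietly mixing.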

Bunch of examples in the cut.

( Grah Python )

logging thoughts

2018-05-16T20:33:09Z

One of the first things I did for work was implementing a better logging system, based around the Python logging library, which introduced me to the world of DEBUG, WARNING, INFO, ERROR. Useful, but something felt off from my expectations. I think I've finally pinned that down.

At my old job, which also involved multiple processes talking to each other over the network, I created my own logging system, though I'm not sure I called it that. Really it was an evolution of printf debugging, with a way to control how much output there was. I had DEBUG macros, controlled by a buglevel variable, which more precisely was a verbosity level. 0 meant nothing, 1 meant a small amount of high level output like entering functions, 4 printed from inside nested loops. I don't remember what 2 and 3 were, and it was more ad hoc than formally designed, but it seems reasonable that 2 would be top level statements after the initial entry statement, while 3 would be from within loops. E.g.

int some_function(type param) {
  DEBUG(1, "Entering some_function");
  do something
  DEBUG(2, "I got something %s", something);
  while (condition) {
    DEBUG(3, "loop variables");
    for (some other condition) {
      DEBUG(4, "really verbose");
      ...
    }
  }
}

What about INFO, ERROR, etc. levels? I'm guessing those were just printfs -- I had no concept that you'd ever not want to see such output, nor did I have anything like the Python logging flexibility where you can send logging to multiple outputs. So my homegrown macros were entirely for levels of DEBUG output, and really provided a trace record with varying levels of resolution (and performance hit[1].)

And I feel weird because what's become common, in Python and Apache and such, is just a single level of DEBUG.

I've already tried to address that; our logging module contains some custom methods and levels. HITRACE for my old DEBUG 1, function entry (and sometimes exit). And TRACE for "even more verbose than DEBUG", but I hadn't pinned down what that meant; now it seems like I'd have to add one more level to really match up to my old system. And then teach other people how to use it. Feh...
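Roughly what custom levels look like with the standard logging module. The numeric values are guesses for illustration (the post doesn't say what the work module uses): HITRACE is placed between DEBUG (10) and INFO (20) since it is less chatty than DEBUG, and TRACE below DEBUG since it is chattier.

```python
import logging

# Assumed values, chosen only to slot around the built-in DEBUG=10 and INFO=20.
HITRACE = 15  # function entry/exit, like the old DEBUG(1, ...)
TRACE = 5     # "even more verbose than DEBUG"

logging.addLevelName(HITRACE, "HITRACE")
logging.addLevelName(TRACE, "TRACE")

log = logging.getLogger("tracing_demo")  # hypothetical logger name
log.setLevel(HITRACE)                    # plays the role of the old buglevel


def some_function(value):
    log.log(HITRACE, "Entering some_function")
    for item in range(value):
        log.log(TRACE, "loop variable %s", item)  # filtered out at this level
    return value
```

Lowering the logger's level to TRACE turns the inner-loop messages back on, the same way raising buglevel did in the old macros.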

I've also wondered if there's a useful distinction between "statements for tracing in general" and "statements for debugging this problem in particular."

Then of course there's exactly how to split WARNING, ERROR, and CRITICAL, and how to report user errors. Currently I mostly use ERROR for "there's a problem with our code" and CRITICAL for "there's a problem with the environment" like disk full or network connection failure. ERROR gets programmers out of bed, CRITICAL gets Ops out of bed. FATAL is CRITICAL plus a built-in "we die right now." WARNING is "I'm not sure if this is a problem" or "a human should look at this soon". Errors because of bad *user* input, I'm not sure, maybe just DEBUG? Don't want bad users filling up production logs.

[1] Once Ops came to me and said "the new release is really slow!" It turned out they'd pushed the code to production with a configured buglevel=4, not 1 or 0 as I instructed. (The macro was such that all that code could have been compiled away, too, but we never bothered; I think we also liked the idea that we could get more output if we needed to.)

refactoring $#&# Python

2018-03-23T02:04:09Z

Once again, into the Pyth circle of hell. This time trying to convert our code from Python 2 to 3. 2to3 does much of the grunt work of print() and 'from . import', though it didn't always get the latter right. But instead of string and unicode, we now have string and bytes value types, and a strong barrier between them. And of course no static compiler to find up front when types might be mixed up. And yes, we're weak on unit tests, especially tests that exercise all possible code paths. Things seem to mostly work now, but will they under all conditions? Who knows?

joining the Rust cult

2017-10-12T20:06:19Z

So, programming languages. Years ago I'd run across D, aiming to be a better C++, or even a better Python -- enabling high level coding without giving up type safety or speed. It had lots of features, including contracts from Eiffel, and being functional friendly; overall it seemed like a language I'd design. It was cool, but I didn't code actively enough to bother learning it.

Last weekend I did start, but also started on the new cult kid, Rust. I went in parallel a bit, but Rust has pulled ahead.

D is not very exciting or revolutionary; it's like a better done kitchen sink. There's a lot of value in that, and in what it's trying to be, and AFAICT the language itself is pretty good. I've seen people criticize the toolchain, and uptake has been rather modest -- though the forums do see daily activity, at least.

Rust, OTOH, is trying to be revolutionary: compiler-enforced memory safety, meaning not just no memory leaks but "fearless concurrency", where the compiler would enforce no data races in multithread code unless you did something specifically unsafe. "No leaks" doesn't sound exciting unless you're an engineer who's had to worry about them; "safe concurrency" is potentially sexy to lots of people.

On diving in, I noticed something else sexy to me: it's like the unholy love child of C and ML and other functional languages; one blog post even called it a functional language in C clothing. Enums/sum types/algebraic data types/tagged unions, which I quickly fell in love with while playing with OCaml; 'traits' or type classes a la Haskell, which serve for generics, dynamic dispatch, and overloading, all with one coherent mechanism; hygienic macros a la Scheme, something I thought I'd never get to play with seriously unless I got into Clojure. Also, supposedly, an easy and powerful package system, and a minor taste of mine, nested comments.

aliasing fi

2017-05-13T11:21:04Z

I think I mentioned not long ago that I found I'd been aliasing fi=finger which breaks if loops in my shell, and marveled that it took so long to find that. It makes more sense to me now.

1) Yeah, I didn't script much.
2) When I did do an ad hoc script at the prompt, it was a for loop.
3) Scripts you get are mostly bash scripts.
4) Even an explicitly written zsh script wouldn't have a problem: my aliases are loaded by .zshrc, which is loaded by interactive shells, i.e. not script shells[1].
5) Only when I tried pasting an if loop into a *function*, also loaded by .zshrc after my aliases, did a problem occur. Possibly it had occurred before and I simply gave up on some unnecessary function that mysteriously didn't work.

[1] This also sheds light on past failures to ssh in somewhere and invoke a function directly: not an interactive shell, so no functions loaded. When I try 'ssh ... "zsh -i script_invoking_function"', it works. So if I want remote function invocation, I'll need to use -i or to load functions outside of .zshrc.

why zsh?

2017-05-12T01:45:46Z

When I got to Caltech and discovered Unix, the default shell on the cluster was csh, with more user features than the sh at the time, but not a lot. If you got the lowdown, you could switch to the far more useful tcsh, but the sysadmin refused to make that the default for resource reasons. There was also ksh, but I never heard people talking about it.

A few years later zsh came along, and the more techie undergraduate cluster largely switched to it en masse. It was even made the default shell there.

Out in the greater world, and in the era of Linux, bash seems the default shell, pretty much incorporating much of what was good about tcsh and ksh, and also displacing any more primitive sh. zsh still is an exotic thing even Linux people may not have heard of... which is a shame, because it's so much better.

Granted, it's also way more complicated, and a lot of its cooler features have to be turned on. If you want a shell that's full-featured out of the box, there's the even more obscure 'fish'.

And bash can approach, though not catch up to zsh, with the "bash-completion" package.

But what's so cool? Well, tab-completion can be far more powerful, working not just on filenames, but environment or shell variables, command options, man pages, process numbers, and git branches. It can also go to a menu mode, for scrolling around lots of options.

(But fish will do the magic of parsing man pages on the fly to display command options. :O )

It's easy to have your prompt display the exit code of the last command, something I find pretty useful; doing that in bash requires writing your own functions.

Likewise, you can easily have sophisticated right-hand prompts.

**/ recursive directory listing, though that is something you can turn on in bash. (shopt -s globstar)

Even more extended globbing, including excluding patterns, or selecting files based on modification time within a window and other criteria.

Redirection tricks, some of which reduce the need for tee. |& pipes stdout and stderr to a program such as less. >! can clobber files even when you have noclobber on.

I'd anticipated sticking to bash for scripting, for better standards compliance/portability, but I realized that I'm not writing a package script, just in-house tools. And zsh scripting has a lot going for it. Arrays just work, while bash arrays were described Sunday as the worst of any language. I'm using the mod time glob mentioned above.

zsh can share history between shells. I find this useful and annoying -- useful now for storing and reusing commands, but also destroys the individual history of a particular window. Oh well. An impressive application was when I found myself reusing history across *machines*, where my home dir was NFS mounted.

"Named directories" mean I can collapse long pathnames in my prompt, e.g. Main/wsgi-scripts becomes just ~WS

Probably a lot more, but those come to mind.

That said, there is one odd lacuna in zsh. bash has --rcfile, to tell it to read a custom rc file in place of ~/.bashrc. zsh... doesn't. And sometimes I would like to start a shell with a custom additional environment, e.g. from ssh.

Sometimes doing things right does help

2017-04-02T01:52:13Z

The work system has two kinds of debug/logging statements.

One is a class with the standard methods (trace through critical) which are just print statements that add "DEBUG" or "INFO" as appropriate. There's not even a hint of output throttling. But it is a class, and so I can rip its guts out and replace them with calls to Python's logging module, and it works.
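A sketch of what that gut replacement might look like. The class name, logger name, and method set are assumptions, since the post doesn't show the real class; 'trace' is mapped onto DEBUG here because the logging module has no built-in TRACE level.

```python
import logging


class DebugLog:
    """Hypothetical stand-in for the old print-based class, same method names."""

    def __init__(self, name="worksystem"):
        self._logger = logging.getLogger(name)

    def trace(self, message):
        self._logger.debug(message)  # assumption: no TRACE level, fold into DEBUG

    def debug(self, message):
        self._logger.debug(message)

    def info(self, message):
        self._logger.info(message)

    def warning(self, message):
        self._logger.warning(message)

    def error(self, message):
        self._logger.error(message)

    def critical(self, message):
        self._logger.critical(message)
```

Callers keep doing log.info("...") as before, and throttling and output destinations become a matter of configuring the underlying logger.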

Then there's the 500+ naked print statements, with "ERROR" or such in their constructed output strings. I can search for them easily -- though I can just search for 'print', I think these are the only uses -- but I don't see any programmatic way of converting them, especially as the logging output formatting needs types (%s, %d) which are totally absent from the existing statements. (And it's python 2, so they are statements.)

I see a day of boring editing in my future.

I hate whiteboarding

2017-02-11T12:29:25Z

(Definition: solving a problem or writing code in an interview, in front of people, under time pressure. Video/online counts as well.)

I'm not sure I've gotten much better at this in the past year. It's one thing if the problem is one I already know and I just have to write code; I think I regurgitate under pressure fairly well. But if I have to really think about the problem then it feels like my IQ drops 20+ points under stress and being stared at. And when I come up with one idea for a solution, it's hard to try to think of others that might be better in some sense -- after all, the clock is ticking, and I have to start writing code! Not to mention the fun of having to write correct code without backup documentation or a compiler -- my memory prioritizes the stuff that's hard to look up, like What You Shouldn't Do, or Where Information Is, over the stuff that's trivial to look up at need.

As for actually being creative, that goes a lot better when I have time to relax, or step away from the problem and not consciously think about it. A lot of my best solutions just come to me when doing something else, or musing in bed or the shower, or walking.

Post prompted by Thursday's experience, where I was asked to construct a relative path between two directories, and I saw it as a tree problem and hared down making C++ trees with nodes and everything. At the end I asked how I'd done, and was told "well, this works, or would work with a bit more polishing of the code, but there's a simpler way with just lists." One minute after leaving the building I saw what that would be, and at home it took me 18 minutes to code and debug, which I e-mailed in, but apparently got no credit for that.
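For the record, a sketch of the list version, in Python rather than the C++ I used on the day. It assumes both inputs are absolute, '/'-separated paths with no '.' or '..' components in them; the function name is my own.

```python
def relative_path(src, dst):
    """Relative path from directory src to directory dst.

    Assumes absolute, '/'-separated paths without '.' or '..' parts.
    """
    src_parts = [p for p in src.split("/") if p]
    dst_parts = [p for p in dst.split("/") if p]

    # Count how many leading components the two paths share.
    common = 0
    for a, b in zip(src_parts, dst_parts):
        if a != b:
            break
        common += 1

    # Climb out of what is left of src, then descend into the rest of dst.
    ups = [".."] * (len(src_parts) - common)
    downs = dst_parts[common:]
    return "/".join(ups + downs) or "."


print(relative_path("/a/b/c", "/a/d/e"))  # ../../d/e
```

No trees, no nodes: split, skip the common prefix, and join.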

I did better Monday, with some basic linked list questions; that rejection was "you did well and seem a fine technologist, but not commercially experienced enough". Which is back to the Catch-22 of not being able to get experience because I'm not experienced enough.

On the flip side, Wednesday had a video interview where I had no idea how I did, but they want me to go to NYC for an onsite next week. So yay, progress... of course, that'll probably be more whiteboarding.

comments

JavaScript quirks

2017-02-05T17:35:25Z

I've started studying JavaScript a bit, as you might guess from recent posts. It's struck me as a mix of Perl, Python, and crack. It's got some neat things, especially in comparison to one language or the other. And it's got lots of... wtf things.

+: exponentiation operator; nested arrays and objects (dictionaries) (without Perl's references); first class functions and lambdas and closures, including nested functions (unlike Perl); Perl-like push/pop/shift/unshift array operators (but what's the performance?); consistent 'valueOf' and 'toString' methods; JSON; multiple kinds of for loops; Perl style labeled break and continue; some convenient conversions (but see below); nice Date methods.

-: oh boy.

* JSON stringifies nested structures nicely, but simple output doesn't: [1, [2,3]] outputs as [1, 2, 3].
* (object1 == object2) is always false, no matter the underlying values. This holds for arrays and Date objects too. Nothing like Python's structural equality, or even that of STL containers.
** But you can do *inequality* comparison: ([1,2,3] < [2,2,3]) == true.
* strings take negative indices a la Python, but arrays don't. [2020 edit: what was I thinking? string.slice can take negative indices, but character access via [] doesn't. And array.slice also takes negative.]
* there's a typeof operator, but it just says 'object' for arrays.
* "5"+2 == "52" (convert 2 to "2", concatenate), but "5"-2 == 3 (convert "5" to 5, subtract.) And no Python string multiplier like "a"*2 == "aa".
** As Avi noted, it gets even weirder given that the values could be hidden in a variable. a+2=="52", a-2==3
* [1,2]+[3,4] doesn't concatenate arrays, doesn't add by element, doesn't give a type error, but gives... "1,23,4" (turn arrays into strings, concatenate without delimiter.)
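
For contrast, a small sketch of what Python does with the same expressions. The helper name try_add is made up for this example; it just reports the exception type instead of crashing.

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises."""
    try:
        return a + b
    except TypeError as err:
        return type(err).__name__


# Structural equality: nested lists compare by value.
print([1, [2, 3]] == [1, [2, 3]])   # True

# Printing keeps the nesting.
print([1, [2, 3]])                  # [1, [2, 3]]

# Mixing strings and numbers is an error rather than a silent conversion.
print(try_add("5", 2))              # TypeError

# Adding lists concatenates them.
print(try_add([1, 2], [3, 4]))      # [1, 2, 3, 4]

# The string multiplier.
print("a" * 2)                      # aa
```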

My friend Mark linked me to https://www.destroyallsoftware.com/talks/wat which gives some more:
* []+{} == "[object Object]"
Ok, addition is commutative, right?
{}+[] == 0
And for luck: {}+{} == NaN
As above, []+[] == ""
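** Actually, on playing with typeof, those all report 'string'. I think the difference is parsing: at the start of a line {} is read as an empty block, so {}+[] is really +[] == 0 and {}+{} is +{} == NaN, both numbers. Inside typeof(...) the {} is an object literal, and you get string concatenation starting with "[object Object]". That also explains why {}+[4]+5 == 9 at the console but typeof says string.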
**
>>> 5+{}+[4]
5[object Object]4 // because of course it does
* Automatic semicolon insertion bites if you break a line after return:

return
  x;

turns into

 return;
 x;

* All numbers are 64-bit floats; you can still bitshift them, but they get converted to integers first, so 5.2 << 2 == 20. This made more sense once I remembered that floats are weird, not integers+fractions, so a simple bitshift of the fraction wouldn't make sense.

comments

Hoisting Shadows

2017-02-05T16:49:15Z

A bit after writing the previous post on shadowing variables in JavaScript, I came across this page on hoisting. JavaScript effectively re-writes your code so that declarations, but not initializations, are moved to the top of the current scope, meaning the script or function[1]. So

console.log(a);
var a=5;

turns into

var a;
console.log(a);
a=5;

Put that way, it's clear why the problem happens, if not why the language was designed this way.

Does the same thing happen in Python? Searching did not get clear results. I saw pages contrasting JavaScript with Python, which doesn't even *have* declarations. OTOH, the same behavior occurs. So... I dunno.
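
A small experiment that pokes at the question, as a sketch only. The function names are mine. As far as I can tell Python doesn't move anything, but it decides at compile time which names are local to a function, and any assignment in the body makes the name local for the whole body. You can see this in the function's code object.

```python
a = 5


def reader():
    return a        # no assignment in the body, so 'a' is looked up outside


def shadower():
    value = a       # fails when called: 'a' is local here but not yet bound
    a = 6           # this assignment is what makes 'a' local everywhere above
    return value


# The compiler has already recorded 'a' as a local of shadower, not of reader.
print("a" in reader.__code__.co_varnames)     # False
print("a" in shadower.__code__.co_varnames)   # True

try:
    shadower()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

So the effect is similar, but where JavaScript quietly hands you undefined, Python raises an error.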

[1] JavaScript does not have block scoping the way C and Perl[2] do; scopes are delimited by curly braces, but random curly braces do nothing to isolate variables.

{ a=5; }
console.log(a);

will work just fine. :(

[2] As I was verifying this in Perl, I ran into behavior I'd forgotten. I'd thrown in "use strict;" but wasn't getting the errors I expected. Eventually I recalled that $a and $b have special meaning in Perl (I think for comparison functions), and I guess are pre-declared, and I was using $a a la the above code, so strict didn't complain about trying to access $a before assigning to it. Sigh.

comments

Shadows of JavaScript

2017-02-02T18:43:58Z

Months ago, Robbie had found this scoping problem in Python, which I reduced to essentials.

I've started finally learning JavaScript, and it has nicer lambdas than Python, and proper hiding of nested functions unlike Perl. But it has the same scope problem:

g1 = 12;
function func() {
  document.getElementById("demo").innerHTML = g1;

  var g1 = 5;

}
func();

(I'm not including the HTML framework because DW/LJ would yell at me if I did.)

Output is 'undefined', rather than 12. As in Python, the local variable further down shadows the outer scope variable (doesn't matter if the "g1=12" has a 'var' before it) even for lines before the local variable.

As mentioned before, Perl has proper lexical scoping here (though not for nested functions.) I don't think I can even create similar code in Scheme/Lisp, where the scoping is explicit with parentheses. (There's 'define' but I think that makes a new global, and it didn't work.) In OCaml I have

let g1="10";;

let func () =
  print_endline g1;
  let g1="cat" in
    g1
  ;;

func();;

Which I suspect is as explicit as Lisp parentheses, in its own way; the print line is obviously outside the following "let ... in...".

comments

fast array rotation

2017-01-28T18:27:15Z

A simple problem I'd never had occasion to think much about before I saw a sample coding problem.

How to rotate the elements of an N-element array by k spaces? An obvious way is to shuffle it one space, k times, but that's slow, O(N*k). Faster, which I saw about as soon as I thought about performance, is to use a second array B, where B[(i+k)%N] = A[i]. But that takes O(N) extra space. Can you do better?
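
As a baseline, here is a sketch of those two approaches in Python. The function names are mine, and both rotate to the right, so the element at index i ends up at index (i+k)%N.

```python
def rotate_by_steps(ar, k):
    """Shuffle one space to the right, k times, in place. O(N*k) time."""
    n = len(ar)
    steps = k % n if n else 0
    for _ in range(steps):
        last = ar[n - 1]
        for i in range(n - 1, 0, -1):
            ar[i] = ar[i - 1]
        ar[0] = last


def rotate_with_copy(ar, k):
    """Return a rotated copy. O(N) time, O(N) extra space."""
    n = len(ar)
    out = [None] * n
    for i, x in enumerate(ar):
        out[(i + k) % n] = x
    return out


data = [0, 1, 2, 3, 4, 5]
print(rotate_with_copy(data, 2))   # [4, 5, 0, 1, 2, 3]
rotate_by_steps(data, 2)
print(data)                        # [4, 5, 0, 1, 2, 3]
```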

Yes, as I realized at 5am. For each element A[i], move it to A[(i+k)%N]. O(N) time, O(1) extra space. Can't beat that!

Except some mildly tricky math intervenes: the natural approach only works if N and k are relatively prime. A more general solution is

let g = gcd(N,k)
for i in [0,g)
  for j in [0, N/g)
    shuffle element k spaces

Despite looking like a double loop, it's still O(N), with g*N/g iterations.

[2020 Edit: Looking back... I find this less than clear. I went to the code to make sure I got it right.

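from math import gcd

def rotate(ar, k):
    # rotate ar right by k places, in place
    N = len(ar)
    if N == 0:
        return
    g = gcd(N, k)
    for i in range(g):                # one pass per cycle
        cur = ar[i]
        for j in range(1, N//g+1):    # N/g elements per cycle
            temp = ar[(i + j*k) % N]
            ar[(i + j*k) % N] = cur
            cur = temp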

]

I've also learned that C++ has a gcd function now: std::experimental::gcd, in the header experimental/numeric. C++17 moves it out of experimental (as std::gcd in the header numeric) but even g++ isn't doing that by default.

The really annoying thing is that this is the sort of solution that comes naturally to me lying in bed, with little conscious effort, but that I'd likely fail to get in a whiteboard session or timed coding test, due to stress dropping my working IQ.

comments

Applying A*

2017-01-17T18:08:38Z

I've played a lot of freeciv and freecol over recent years. Both games let you order a unit to go to some square, and hopefully it takes the fastest route there. The 'world' is a square grid, of various terrain types and associated movement costs -- e.g. plains or desert take 1 move, but mountains take 3; roads and rivers take 1/3 no matter what terrain type they're laid over.

A* is this sweet magic algorithm for finding shortest paths in some graphs efficiently, vs. doing breadth-first search in all directions, but I was having trouble applying it mentally. I was using the common Manhattan distance heuristic, h((0,0),(x,y)) = x+y, and I wasn't getting good results: the algorithm would cheerfully march down a straight plains path to the goal, while ignoring a path that might step away and into mountains, but then ride a river to the goal much faster.

So I backed off, and thought about BFS. I realized that would work better if, instead of naively enqueuing grid squares as you found them, you ranked them by total travel time so far. This is basically A* without a heuristic (a.k.a. Dijkstra's algorithm). Instead of exploring all paths N squares away, you'd explore all paths N moves away; it would still be radiating in all sorts of directions, but at least you'd find the shortest path to the goal.

Then I realized I'd been using the wrong heuristic; the right one should be the shortest possible journey. Or as WP says, "it never overestimates the actual cost to get to the nearest goal node." So the heuristic in this application has to consider rivers and roads, such that h() = (x+y)/3, not (x+y). This works much better: the plains march looks less attractive as it advances, converting cheap heuristic moves into actual plains moves, and the "mountain and away" move gets a chance to be considered.

Actually, units and roads can go diagonally, though rivers are rectilinear, so the proper heuristic is h((0,0),(x,y)) = max(x,y)/3.
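
Here is a sketch of how that might look in Python. The movement model is a simplification I made up for the example, not the actual freeciv or freecol rules: entering a square costs that square's price, rivers can be followed diagonally too, and costs are counted in thirds of a move so everything stays an integer. The grid letters and function names are also mine.

```python
import heapq

# Cost to enter a square, in thirds of a move (assumed model, see above).
COST = {"P": 3, "M": 9, "R": 1}   # plains, mountains, river/road


def heuristic(a, b):
    # Never overestimates: the best case is river/road all the way
    # (1 third per square) with diagonal steps allowed, i.e. max(x,y)/3 moves.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def a_star(grid, start, goal):
    """Return the cheapest cost from start to goal in thirds of a move."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}
    frontier = [(heuristic(start, goal), 0, start)]
    while frontier:
        _, cost, pos = heapq.heappop(frontier)
        if pos == goal:
            return cost
        if cost > best[pos]:
            continue                      # stale queue entry
        r, c = pos
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt == pos:
                    continue
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                new_cost = cost + COST[grid[nxt[0]][nxt[1]]]
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    priority = new_cost + heuristic(nxt, goal)
                    heapq.heappush(frontier, (priority, new_cost, nxt))
    return None


grid = [
    "PPPPPPP",
    "MMMMMMM",
    "RRRRRRR",
]
print(a_star(grid, (0, 0), (2, 6)))   # 14
```

In this toy grid the cheapest route steps into the mountains right away (9 thirds) and then rides the river (5 thirds), for 14 thirds of a move; marching along the plains first and crossing the mountains at the end would cost 22.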

Actually actually, infantry units only have one move, but are still guaranteed one square of movement per turn, so can march across mountains as easily as across plains; it's mounted units, with e.g. 4 move in freecol, that really care about base terrain type. Also fractional movement can be useful, e.g. I think a unit with 2/3 move left (after moving on river) can still move onto an adjacent plains.

comments