
I've heard of calibration testing for years, but How To Measure Anything goes into it in some depth, and provides some sample tests here (MS Word doc).

There are two kinds of questions. The first kind is like "when was X born" or "how big is Y", where you give a range of numbers such that you think the answer is 90% likely to fall between them. If you're calibrated, then out of 20 questions you should get about 18 right, and out of 100 questions about 90. Note this isn't a straight test of knowledge: if you're less certain, you can just give a wider range. But most people get only around 60% right, so they're overconfident in their assessments.
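
Here's a minimal sketch in Python of scoring this kind of test; the questions and bounds are made up for illustration:

```python
# Each entry: (your lower bound, your upper bound, the true answer).
answers = [
    (1700, 1800, 1756),   # "when was Mozart born?" -- 1756, in range
    (2000, 5000, 6650),   # "how long is the Nile, in km?" -- out of range
    (50, 90, 88),         # "how old was so-and-so when they died?" -- in range
]

hits = sum(lo <= truth <= hi for lo, hi, truth in answers)
print(f"{hits}/{len(answers)} in range ({hits / len(answers):.0%}); calibrated would be ~90%")
```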

The other kind of question presents a statement, and you pick whether it's true or false, plus your confidence from 50% to 100%. (If you're less than 50% confident, you should have picked the opposite answer...) You add up your expected score -- if Q1 is 100% and Q2 is 70%, you expect 1.7, etc. -- then grade the test. If you score a lot lower than expected, you're overconfident!
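
Grading that is a one-liner each way; a sketch assuming a list of (stated confidence, got it right) pairs:

```python
# Each entry: (stated confidence, whether the answer was correct); made-up results.
results = [(1.0, True), (0.7, True), (0.6, False), (0.9, True), (0.5, False)]

expected = sum(conf for conf, _ in results)   # 1.0 + 0.7 = 1.7 after two questions, etc.
actual = sum(right for _, right in results)   # True counts as 1
print(f"expected score {expected:.1f}, actual score {actual}")
# Well below expected suggests overconfidence; well above, underconfidence.
```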

The book has a short test in chapter 5 (10 questions of each type), then two longer tests (20 questions each) in the back. On the first type I've scored 80%, 75%, and 95%; on the second I've always scored higher than expected, only missing the questions where I gave 50% ("I have no clue") confidence. So I'm underconfident there, which doesn't surprise me. On the first type, I think I added some error by believing I was underconfident and trying to correct for that.

There's a subtlety with the second type: Hubbard says that if you get even a single 100% answer wrong, that's a sign of overconfidence, but his test just has you circle probabilities in 10% steps -- 50, 60, ... 90, 100%. So what should you do if you're, say, 96% confident? It's natural to round up, but that leaves you occasionally being wrong on "100%"; if you take the floor instead, you'll get "90%" right more often than you should.
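
The distortion is easy to simulate; this sketch assumes you're truly 96% confident on a long run of questions:

```python
import random

random.seed(0)
trials = 10_000
hits = sum(random.random() < 0.96 for _ in range(trials))
print(f"true hit rate: {hits / trials:.1%}")   # ~96%
# Round up and you'll miss ~4% of your "100%" answers, which reads as
# overconfidence; take the floor and you'll beat the "90%" bin by ~6
# points, which reads as underconfidence.
```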

Anyway, after taking one such test and probably finding that you're overconfident, the next step is to try to calibrate yourself. Suggested tricks include: "try to imagine you're wrong, and ask what could have gone wrong"; "instead of picking a value and applying error bars, start with an absurdly wide range and narrow it"; "check that the true value is 95% likely to be under your upper bound, and 95% likely to be above your lower bound"; and equivalent bets.

An equivalent bet is imagining that you could win money for being right (especially for the first type of question), or win money by spinning a wheel with a 90% chance of paying out. If you'd rather spin the wheel, then you're not actually 90% confident in your answer and should widen your range; if you'd rather go with your answer, then your range is too wide and you should narrow it. This is supposedly the main silver bullet for calibration, though I've found it hard to apply.

The chapter also discusses evidence that risk estimation is a learnable skill (for 95% of people), and that getting better on trivia questions does generalize to more useful applications.

Also, I think you could make your own tests easily enough after seeing one: come up with questions where you don't know the answer but could easily look it up, try to answer them, then score yourself. 'Fun', plus you'll learn things!

I'm reading How To Measure Anything and it's had some surprising revelations.

Rule of Five: if you take 5 random samples from a population, there's a 93.75% chance the population median lies between the minimum and maximum of the sample. For it not to, all 5 samples would have to fall on the same side of the median. Each sample is random, so the chance of all 5 being below the median is 0.5**5 = 3.125%; all 5 being above is another 3.125%.

The same math gives that a sample set of 3 has a 75% chance of bracketing the population median! A set of 7, 98.4%.
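
A quick Monte Carlo check of those numbers (the skewed population here is arbitrary; the rule doesn't depend on the distribution):

```python
import random

random.seed(0)
population = [random.lognormvariate(0, 1) for _ in range(100_001)]
median = sorted(population)[len(population) // 2]

for n in (3, 5, 7):   # expect ~75%, ~93.75%, ~98.4%
    trials = 20_000
    hits = sum(min(s) <= median <= max(s)
               for s in (random.sample(population, n) for _ in range(trials)))
    print(f"n={n}: {hits / trials:.1%}")
```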


Single Sample Majority: this one's a bit trickier. Say you have a bunch of urns, each containing some mix of red and green marbles, with the red fraction uniformly distributed across urns. If you draw a single marble from each urn and bet that the urn's majority color matches your draw, you'll be right 75% of the time. Bayes:

p(majority red | drew red) = p(drew red | majority red) * p(majority red) / p(drew red)

By symmetry of the uniform distribution, p(majority red) = p(drew red) = 0.5, so those cancel.

For an individual urn, the chance of drawing red is just its red fraction: 51% red means a 51% chance of a red draw, 95% red means 95%. Averaging over the majority-red urns, whose red fractions run uniformly from 50% to 100%, gives p(drew red | majority red) = 0.75.
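
A simulation bears this out; the sketch assumes the red fraction is uniform from 0 to 1 across urns:

```python
import random

random.seed(0)
trials = 100_000
wins = 0
for _ in range(trials):
    p_red = random.random()              # this urn's red fraction
    drew_red = random.random() < p_red   # draw one marble
    majority_red = p_red > 0.5
    wins += drew_red == majority_red     # bet: the majority matches the draw
print(f"the bet wins {wins / trials:.1%} of the time")   # ~75%
```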

Honestly this one seems less generally useful than the Rule of Five, but it's still impressive -- if you don't know much, even a single sample can be meaningful.
