mindstalk | SQL and text files

In April a friend introduced me to csvkit, a suite of command line tools for manipulating CSV files, including doing SQL queries against them, and that sounded cool so I made a note. A bit later, friend Z Facebooked about q, which is the worst software name ever, which also ran queries against CSV files. I made another note.

My use case is my finances, which I'd been keeping in ad hoc text files like "May2015", with some awk scripts to sum up categories in a month, and crosscheck that the overall sum matched the sum of all categories, to detect miscategorization. It worked well for that task but wasn't very flexible, and late last year I had the idea of finally going to 'proper' software. At first I assumed a spreadsheet, because spreadsheets = finances, right? But then I realized that for the queries I wanted to do, SQL was more appropriate.

So I wrote a Python script to convert my years of files into one big CSV files, with date broken down into year and day for easy queries, and my text tags converted into a category column. Then I imported it into MySQL and it was good.

But what about going forward? I spend more, and make new text files... making notes in the full format (date, year, month, day, amount, category, notes) is a pain, and I kept forgetting how to import more into MySQL, and I just let things slide.

Last night I decided to get back to it, as part of checking my spending and savings, and checked out the old tools, with this year's spending in a simpler (date, amount, notes) CSV file.

Both programs work, and I figured out sqlite for extracting month on the fly (so I can group sums by month, or compare power spending across all Junes, say.) Sample queries:

q -H -d, "select sum(amount) from ./mon where code like '%rent%'"

q -H -d, "select strftime('%m', date) as month, sum(amount) from ./mon where code like '%transport%' group by month"

csvsql --query "select Year, sum(amount) from money2 where Month='06' group by year" money2.csv
#that's against the more complex CSV

How do they compare? Probably the more important is that q is way faster, perceptually instantaneous on a 7000+ line file, while csvsql has notable startup time. Both are Python, but csvkit also requires Java, so maybe it's starting a JVM in the background.

q is much lighter, an 1800 line Python program; csvkit has a long dependency list. I tried using the Arch AUR package, but don't have an AUR dependency tracer, so ended up using 'pip install csvkit' instead.

q needs to be told that the CSV file is actually comma separated, not space-separated, and has a header; OTOH csvsql needs to be told if you want to do a query, and the file you're querying.

It looks like both only do SELECT, not UPDATE; I'd wanted to do UPDATE in cleaning up my booklog CSV file but ended up resorting to another Python script. (After trying to push everything into a real sqlite database, but failing to get the weird CSV imported correctly.)

q only does queries; csvsql does more, I dunno exactly.

q has a man page, csvkit docs are entirely online.

I'll probably be using q.

Why not use an actual database? Mostly to cut out steps: new expenditures or books read are easy to update in a text file, and if I can treat that as a database, I don't need a step to update some other DB.

mysql felt heavy and clunky, though thanks to work I now know about the '~/.my.cnf' file which can store authentication. You still need a mysqld up. sqlite3 can run directly off a file and is certainly worth considering -- though as noted, I never got it actually working.

S	M	T	W	T	F	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Rich and Strange Aeons

SQL and text files

SQL and text files

VisiData

Profile

May 2025

Page Summary

Active Entries

Expand Cut Tags

Style Credit

Rich and Strange Aeons

SQL and text files

SQL and text files

VisiData

Profile

May 2025

Page Summary

Active Entries

Most Popular Tags

Expand Cut Tags

Style Credit