<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dw="https://www.dreamwidth.org">
  <id>tag:dreamwidth.org,2009-05-20:374172</id>
  <title>Rich and Strange Aeons</title>
  <subtitle>mindstalk</subtitle>
  <author>
    <name>mindstalk</name>
  </author>
  <link rel="alternate" type="text/html" href="https://mindstalk.dreamwidth.org/"/>
  <link rel="self" type="text/xml" href="https://mindstalk.dreamwidth.org/data/atom"/>
  <updated>2017-07-01T01:39:02Z</updated>
  <dw:journal username="mindstalk" type="personal"/>
  <entry>
    <id>tag:dreamwidth.org,2009-05-20:374172:480146</id>
    <link rel="alternate" type="text/html" href="https://mindstalk.dreamwidth.org/480146.html"/>
    <link rel="self" type="text/xml" href="https://mindstalk.dreamwidth.org/data/atom/?itemid=480146"/>
    <title>SQL and text files</title>
    <published>2017-07-01T01:39:02Z</published>
    <updated>2017-07-01T01:39:02Z</updated>
    <category term="sql"/>
    <category term="computer"/>
    <category term="database"/>
    <dw:security>public</dw:security>
    <dw:reply-count>1</dw:reply-count>
    <content type="html">In April a friend introduced me to &lt;a href="https://csvkit.readthedocs.io/en/1.0.2/"&gt;csvkit&lt;/a&gt;, a suite of command-line tools for manipulating CSV files, including doing SQL queries against them, and that sounded cool so I made a note.  A bit later, friend Z Facebooked about &lt;a href="https://github.com/harelba/q"&gt;q&lt;/a&gt;, which is the worst software name ever, and which also runs queries against CSV files.  I made another note.&lt;br /&gt;&lt;br /&gt;My use case is my finances, which I'd been keeping in ad hoc text files like "May2015", with some awk scripts to sum up categories in a month, and crosscheck that the overall sum matched the sum of all categories, to detect miscategorization.  It worked well for that task but wasn't very flexible, and late last year I had the idea of finally going to 'proper' software.  At first I assumed a spreadsheet, because spreadsheets = finances, right?  But then I realized that for the queries I wanted to do, SQL was more appropriate.&lt;br /&gt;&lt;br /&gt;So I wrote a Python script to convert my years of files into one big CSV file, with the date broken down into year, month, and day for easy queries, and my text tags converted into a category column.  Then I imported it into MySQL and it was good.&lt;br /&gt;&lt;br /&gt;But what about going forward?  I spend more, and make new text files, but making notes in the full format (date, year, month, day, amount, category, notes) is a pain, I kept forgetting how to import more into MySQL, and I just let things slide.&lt;br /&gt;&lt;br /&gt;Last night I decided to get back to it, as part of checking my spending and savings, and checked out the two tools I'd made notes about, with this year's spending in a simpler (date, amount, notes) CSV file.&lt;br /&gt;&lt;br /&gt;Both programs work, and I figured out the sqlite syntax for extracting the month on the fly (so I can group sums by month, or compare power spending across all Junes, say).  
Sample queries:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
q -H -d, "select sum(amount) from ./mon where code like '%rent%'"

q -H -d, "select strftime('%m', date) as month, sum(amount) from ./mon where code like '%transport%' group by month"

csvsql --query "select Year, sum(amount) from money2 where Month='06' group by year" money2.csv
#that's against the more complex CSV
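
# For comparison, a rough awk sketch of what the month-grouping q query above computes.
# This is illustrative only, not one of the original scripts: the rows fed in by printf
# are made-up sample data, and the plain comma split assumes no quoted fields with commas.
# The header row is skipped implicitly because its third field ("code") doesn't match.
printf 'date,amount,code\n2017-05-03,40,transport\n2017-06-01,25,transport\n2017-06-15,10,transport\n2017-06-20,900,rent\n' |
  awk -F, '$3 ~ /transport/ { split($1, d, "-"); sum[d[2]] += $2 } END { for (m in sum) print m, sum[m] }' |
  sort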
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;How do they compare?  Probably the most important difference is that q is way faster, perceptually instantaneous on a 7000+ line file, while csvsql has a noticeable startup time.  Both are Python, but csvkit also requires Java, so maybe it's starting a JVM in the background.&lt;br /&gt;&lt;br /&gt;q is much lighter, an 1800-line Python program; csvkit has a long dependency list.  I tried using the Arch AUR package, but don't have an AUR dependency tracer, so ended up using 'pip install csvkit' instead.&lt;br /&gt;&lt;br /&gt;q needs to be told that the CSV file is actually comma-separated, not space-separated, and that it has a header; OTOH csvsql needs to be told that you want to run a query, and which file you're querying.&lt;br /&gt;&lt;br /&gt;It looks like both only do SELECT, not UPDATE; I'd wanted to do UPDATE in cleaning up my booklog CSV file but ended up resorting to another Python script.  (After trying to push everything into a real sqlite database, but failing to get the weird CSV imported correctly.)&lt;br /&gt;&lt;br /&gt;q only does queries; csvsql does more, though I dunno exactly what.&lt;br /&gt;&lt;br /&gt;q has a man page; csvkit's docs are entirely online.&lt;br /&gt;&lt;br /&gt;I'll probably be using q.&lt;br /&gt;&lt;br /&gt;Why not use an actual database?  Mostly to cut out steps: new expenditures or books read are easy to add to a text file, and if I can treat that as a database, I don't need a step to update some other DB.&lt;br /&gt;&lt;br /&gt;MySQL felt heavy and clunky, though thanks to work I now know about the '~/.my.cnf' file which can store authentication.  You still need a mysqld up.  sqlite3 can run directly off a file and is certainly worth considering -- though as noted, I never got it actually working.</content>
  </entry>
</feed>
