<?xml version='1.0' encoding='utf-8' ?>

<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>Rich and Strange Aeons</title>
  <link>https://mindstalk.dreamwidth.org/</link>
  <description>Rich and Strange Aeons - Dreamwidth Studios</description>
  <lastBuildDate>Sat, 01 Jul 2017 01:39:02 GMT</lastBuildDate>
  <generator>LiveJournal / Dreamwidth Studios</generator>
  <lj:journal>mindstalk</lj:journal>
  <lj:journaltype>personal</lj:journaltype>
  <image>
    <url>https://v2.dreamwidth.org/241388/374172</url>
    <title>Rich and Strange Aeons</title>
    <link>https://mindstalk.dreamwidth.org/</link>
    <width>100</width>
    <height>75</height>
  </image>

<item>
  <guid isPermaLink='true'>https://mindstalk.dreamwidth.org/480146.html</guid>
  <pubDate>Sat, 01 Jul 2017 01:39:02 GMT</pubDate>
  <title>SQL and text files</title>
  <link>https://mindstalk.dreamwidth.org/480146.html</link>
  <description>In April a friend introduced me to &lt;a href=&quot;https://csvkit.readthedocs.io/en/1.0.2/&quot;&gt;csvkit&lt;/a&gt;, a suite of command-line tools for manipulating CSV files, including running SQL queries against them.  That sounded cool, so I made a note.  A bit later, friend Z Facebooked about &lt;a href=&quot;https://github.com/harelba/q&quot;&gt;q&lt;/a&gt; (the worst software name ever), which also runs queries against CSV files.  I made another note.&lt;br /&gt;&lt;br /&gt;My use case is my finances, which I&apos;d been keeping in ad hoc text files like &quot;May2015&quot;, with some awk scripts to sum up categories in a month and to cross-check that the overall sum matched the sum of all categories, to detect miscategorization.  That worked well for its task but wasn&apos;t very flexible, and late last year I had the idea of finally moving to &apos;proper&apos; software.  At first I assumed a spreadsheet, because spreadsheets = finances, right?  But then I realized that for the queries I wanted to do, SQL was more appropriate.&lt;br /&gt;&lt;br /&gt;So I wrote a Python script to convert my years of files into one big CSV file, with the date broken down into year, month, and day for easy queries, and my text tags converted into a category column.  Then I imported it into MySQL and it was good.&lt;br /&gt;&lt;br /&gt;But what about going forward?  I keep spending money and making new text files, but writing notes in the full format (date, year, month, day, amount, category, notes) is a pain, and I kept forgetting how to import more rows into MySQL, so I just let things slide.&lt;br /&gt;&lt;br /&gt;Last night I decided to get back to it, as part of checking my spending and savings, and tried out the two tools on this year&apos;s spending, kept in a simpler (date, amount, notes) CSV file.&lt;br /&gt;&lt;br /&gt;Both programs work, and I figured out the SQLite syntax for extracting the month on the fly (so I can group sums by month, or compare power spending across all Junes, say).  
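&lt;br /&gt;&lt;br /&gt;The month trick is plain SQLite, so it can also be tried without either tool, through Python&apos;s built-in sqlite3 module.  Here&apos;s a minimal sketch under some assumptions: dates are ISO YYYY-MM-DD strings (which strftime needs), the file has date, amount, and notes columns, and the sample rows and the monthly_totals function are made up for illustration.  It loads the CSV into an in-memory table and sums amounts per month, optionally filtered with a LIKE pattern the way the q queries filter by category.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
import csv
import io
import sqlite3

# Hypothetical sample in the simpler (date, amount, notes) layout;
# the column names and rows are made up for illustration.
SAMPLE = """date,amount,notes
2017-05-03,850.00,rent
2017-05-10,12.50,transport bus pass
2017-06-03,850.00,rent
2017-06-15,40.25,power bill
2017-06-20,2.75,transport subway
"""


def monthly_totals(csv_text, pattern="%"):
    """Sum amounts per month for rows whose notes match a LIKE pattern."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    # REAL affinity makes SQLite store the text "850.00" as the number 850.0.
    conn.execute("create table mon (date text, amount real, notes text)")
    conn.executemany("insert into mon values (:date, :amount, :notes)", rows)
    # strftime('%m', date) extracts the two-digit month from a YYYY-MM-DD date.
    query = (
        "select strftime('%m', date) as month, sum(amount) from mon "
        "where notes like ? group by month order by month"
    )
    result = conn.execute(query, (pattern,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for month, total in monthly_totals(SAMPLE):
        print(month, total)
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;With the sample data, monthly_totals(SAMPLE, &quot;%transport%&quot;) returns [(&apos;05&apos;, 12.5), (&apos;06&apos;, 2.75)], the same per-month grouping the second q query below does.&lt;br /&gt;&lt;br /&gt;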
Sample queries:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
q -H -d, &quot;select sum(amount) from ./mon where code like &apos;%rent%&apos;&quot;

q -H -d, &quot;select strftime(&apos;%m&apos;, date) as month, sum(amount) from ./mon where code like &apos;%transport%&apos; group by month&quot;

csvsql --query &quot;select Year, sum(amount) from money2 where Month=&apos;06&apos; group by year&quot; money2.csv
#that&apos;s against the more complex CSV
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;How do they compare?  Probably the most important difference is that q is much faster: perceptually instantaneous on a 7000+ line file, while csvsql has a noticeable startup delay.  Both are Python, so csvsql&apos;s delay may simply be the cost of loading its many libraries at startup.&lt;br /&gt;&lt;br /&gt;q is also much lighter, a single 1800-line Python program, while csvkit has a long dependency list.  I tried the Arch AUR package, but I don&apos;t have an AUR helper that resolves dependencies, so I ended up using &apos;pip install csvkit&apos; instead.&lt;br /&gt;&lt;br /&gt;q needs to be told that the file is comma-separated rather than space-separated (-d,) and that it has a header row (-H); csvsql, on the other hand, needs to be told that you want to run a query (--query) and which file you&apos;re querying.&lt;br /&gt;&lt;br /&gt;It looks like both only do SELECT, not UPDATE.  I&apos;d wanted UPDATE for cleaning up my booklog CSV file, but ended up resorting to another Python script, after trying to push everything into a real SQLite database and failing to get the weird CSV imported correctly.&lt;br /&gt;&lt;br /&gt;q only does queries; csvsql does more, such as generating CREATE TABLE statements from a CSV or loading it into an actual database.&lt;br /&gt;&lt;br /&gt;q has a man page, while csvkit&apos;s docs are entirely online.&lt;br /&gt;&lt;br /&gt;I&apos;ll probably be using q.&lt;br /&gt;&lt;br /&gt;Why not use an actual database?  Mostly to cut out steps: new expenditures or books read are easy to add to a text file, and if I can treat that file as a database, I don&apos;t need an extra step to update some other DB.&lt;br /&gt;&lt;br /&gt;MySQL felt heavy and clunky, though thanks to work I now know about the ~/.my.cnf file, which can store login credentials.  You still need a running mysqld, though.  
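&lt;br /&gt;&lt;br /&gt;On the missing UPDATE: one workaround that keeps the text file as the source of truth is to load the CSV into an in-memory SQLite table with Python&apos;s csv and sqlite3 modules, run the UPDATE there, and write the rows back out as CSV.  A minimal sketch, assuming a plain, well-formed CSV with a header row (a weirder file may need extra csv.reader options); the booklog columns, the sample rows, and the update_csv function and its table name t are all made up for illustration:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
import csv
import io
import sqlite3

# Hypothetical booklog contents; the column names are assumptions.
BOOKLOG = """title,author,status
Dune,Herbert,reading
Emma,Austen,read
Ubik,Dick,reading
"""


def update_csv(csv_text, sql, params=()):
    """Round-trip CSV text through SQLite so an UPDATE can be applied to it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    conn = sqlite3.connect(":memory:")
    # Quote each column name so headers containing spaces still work;
    # untyped columns keep every value as the original text.
    cols = ", ".join('"%s"' % name for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute("create table t (%s)" % cols)
    conn.executemany("insert into t values (%s)" % marks, rows)
    conn.execute(sql, params)
    # Ordering by rowid preserves the original line order.
    out_rows = conn.execute("select * from t order by rowid").fetchall()
    conn.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(out_rows)
    return out.getvalue()


if __name__ == "__main__":
    sql = "update t set status = ? where title = ?"
    print(update_csv(BOOKLOG, sql, ("read", "Dune")), end="")
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The result is a new CSV string with the original header and line order, ready to be written back over the text file.&lt;br /&gt;&lt;br /&gt;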
sqlite3 runs directly off a single file, with no server process, and is certainly worth considering, though as noted, I never actually got it working.</description>
  <comments>https://mindstalk.dreamwidth.org/480146.html</comments>
  <category>sql</category>
  <category>database</category>
  <category>computer</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
</channel>
</rss>
