It'd be interesting to compare runtimes as well. I would guess that there's some overhead in loading into the DB up front, but that you might gain some speedup by converting longer chains of Unix pipes into one SQL query. On the other hand you might lose some parallelism. Would take some testing on different kinds of queries and data sets to get an idea of the pros/cons I think.
Postgres has foreign data wrappers for this: one supports tabular data (file_fdw), and another handles JSON files (json_fdw). If you have files in other formats, you can also write your own fdw for them. This way, you get complete SQL coverage.
Also, if you don't want to pay the overhead of parsing the file every time, you can use the new materialized views feature for caching: http://www.postgresql.org/docs/9.3/static/rules-materialized...
(Disclaimer: Enthused Postgres user.)
Rather than making a custom tool to issue SQL, the idea is that regular CLI tools map well to the traditional relational algebra operations. sed is like selection, cat is like union, etc.
I guess it just applies tokenization and throws the text into a temporary database.
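A rough sketch of that guess, using only Python's standard sqlite3 module. This is not how the tool is actually implemented; the `load_text` helper, the table name `t` and the `c1`, `c2`, ... column names are assumptions made up for illustration.

```python
import sqlite3

def load_text(lines, delimiter=None):
    """Tokenize each line and load the fields into an in-memory SQLite table.

    Hypothetical helper: the table is called t and columns are c1, c2, ...
    """
    # Tokenization: split on the delimiter (whitespace when delimiter is None).
    rows = [line.split(delimiter) for line in lines if line.strip()]
    width = max(len(r) for r in rows)
    conn = sqlite3.connect(":memory:")  # temporary database, gone on close
    cols = ", ".join(f"c{i + 1}" for i in range(width))
    conn.execute(f"CREATE TABLE t ({cols})")
    placeholders = ", ".join("?" * width)
    # Pad short rows with NULLs so every row has the same number of fields.
    padded = [r + [None] * (width - len(r)) for r in rows]
    conn.executemany(f"INSERT INTO t VALUES ({placeholders})", padded)
    return conn

sample = ["alice 10", "bob 25", "carol 7"]
conn = load_text(sample)
# Fields are stored as text, so cast before comparing numerically.
big = conn.execute(
    "SELECT c1 FROM t WHERE CAST(c2 AS INTEGER) > 8 ORDER BY c1"
).fetchall()
print(big)
```

Everything lands in the table as text here, which is why the query casts the second column before comparing it.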
Quite similar to the Go project https://github.com/dinedal/textql, at least superficially.
https://news.ycombinator.com/item?id=7175830
> MS ADO / ODBC
> Perl DBI
> npm j (with jqa)
> Ruby (csv2sqlite)
> Python (csvkit)
> Go (textql, comp)
> Java (optiq, openrefine, H2 SQL)
> R (sqldf)
> Haskell (txt-sushi)
> XML (xmlstarlet, xmllint, xmlstar)
> HTML (HtmlAgilityPack, Chrome $x())
> Postgres file_fdw
> Oracle external tables
> SQL Server OPENDATASOURCE and OPENQUERY
> Log file viewers (MS LogParser, Apache asql, lnav)
There is obviously a lot of other software that can provide a similar capability, and while I haven't checked all of it out, I really believe that most of it does a great job. However, my rationale for creating this tool was to provide a seamless addition to the Linux command line toolset: a tool in the way most Linux commands are, and not a capability. The distinction I'm making here is that tools are reusable and composable, whereas a capability is usually less reusable in different contexts. I'm sure that some of the above are definitely tools. I just hope that the tool I have created provides value to people and helps them with their tasks.
As I posted here elsewhere, my complete rationale for creating the tool is available on the README of the github project. Comments and issues are most welcome.
Harel Ben-Attia
In this case, it might be... they are trying to make a command-line tool. So in theory, you'll be typing the command often, meaning that a short name is preferable.
But honestly, it would have probably been a better idea to use a more descriptive name.
http://en.wikipedia.org/wiki/Q_%28programming_language_from_...
import dataset
db = dataset.connect('sqlite:///:memory:')
table = db['sometable']
table.insert(dict(name='John Doe', age=37))
table.insert(dict(name='Jane Doe', age=34, gender='female'))
john = table.find_one(name='John Doe')
The Linux toolset is really great, and I use it extensively. The whole idea of the tool is not to replace any of the existing tools, but to extend the toolset to concepts which treat text as data. In a way, it's a metatool which provides an easy and familiar way to add more data processing concepts to the Linux toolset. There are many cases where I use 'wc -l' in order to count rows in a file, but if I need to count only the rows in which a specific column is larger than some value X, or get the sum of some column per group, then q is a simple and readable way to do it properly, without any need for "tricks".
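To make those two cases concrete, here is roughly what the queries look like, sketched with Python's built-in sqlite3 rather than q itself. The `files` table, its `owner` and `size` columns and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one row per file, with its owner and size in bytes.
conn.execute("CREATE TABLE files (owner TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("root", 4096), ("alice", 120), ("root", 900), ("alice", 5000)],
)

# Count only the rows whose column is larger than some value X.
x = 1000
count_large = conn.execute(
    "SELECT COUNT(*) FROM files WHERE size > ?", (x,)
).fetchone()[0]

# Sum of a column per group.
per_owner = conn.execute(
    "SELECT owner, SUM(size) FROM files GROUP BY owner ORDER BY owner"
).fetchall()

print(count_large)
print(per_owner)
```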
My rationale for creating it is also explained in the README of the github project.
Any more comments are most welcome.
Harel
NB. Follow link to original post to compare against standard regex version.
There are also some nice grammar parsers available in some languages which make this even easier. For examples of this see Perl6 Rules/Grammar, Perl5 Regexp::Grammars or (for something which doesn't use regex at all) Rebol Parse.
E.g., here is my Rebol version of the HN post above: http://www.reddit.com/r/programming/comments/1smpa1/why_rebo...
And here is a presentation which shows a great example using Perl6 grammars: http://jnthn.net/papers/2014-fosdem-perl6-today.pdf
Refs:
- http://en.wikibooks.org/wiki/Perl_6_Programming/Grammars
- http://en.wikipedia.org/wiki/Perl_6_rules
- https://metacpan.org/pod/Regexp::Grammars
- http://www.rebol.com/docs/core23/rebolcore-15.html
- http://blog.hostilefork.com/why-rebol-red-parse-cool/
PS. Alternatively, if you're looking for something interactive then check out tools like these: http://rebol.informe.com/blog/2013/07/01/parse-aid/ | https://metacpan.org/pod/Regexp::Debugger
There are tools like Regexper[1] that let you visualize the regex as an automaton graph, and there are tools like txt2re[2] which will allow you to put in text and visually generate a regex to match it.
I feel like better regex tools should exist on the command line, and it's potentially a great place for such tools to be rapidly developed and adopted. There are GUI tools for this like poirot[3], but the command line still exists because of its accessibility, uniformity, and extensibility.
links:
[2] http://txt2re.com/index.php3?s=24%3AFeb%3A2014+%22This+is+an...
In terms of 'verbosity' you can embed comments inside a regular expression, or build a regular expression over multiple lines, or make a set of regex objects and interpolate them into larger regexes. Perl has copious amounts of documentation to help you understand the many ways to use regexes in Perl.
http://perldoc.perl.org/perlrequick.html http://perldoc.perl.org/perlretut.html http://perldoc.perl.org/perlfaq6.html#How-can-I-hope-to-use-...
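The same three ideas (inline comments, multi-line layout, interpolating smaller patterns) can be sketched in Python with `re.VERBOSE`, as a rough analogue of Perl's /x flag and not the Perl code itself. The `DAY`, `MONTH` and `YEAR` names and the sample string are assumptions for illustration; the date format is borrowed from the txt2re link above.

```python
import re

# Small named pieces, each readable on its own.
DAY = r"(?P<day>\d{1,2})"
MONTH = r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
YEAR = r"(?P<year>\d{4})"

# Interpolate the pieces into a larger pattern. In VERBOSE mode, whitespace
# in the pattern is ignored and '#' starts a comment.
DATE = re.compile(
    rf"""
    {DAY}    # day of month, one or two digits
    :        # literal separator
    {MONTH}  # three-letter month abbreviation
    :
    {YEAR}   # four-digit year
    """,
    re.VERBOSE,
)

match = DATE.search("posted 24:Feb:2014 on the thread")
parts = match.groupdict()
print(parts)
```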
Something like SQL would be fine.
It's not a really thought-out theory, but I think I'd like to manipulate text via a programming language the way vi gods manipulate text with shortcuts.
Thanks for the links, I will have a look at them.
Are people really so bad at databases that they'll gladly suffer hacks like this to avoid using one?
This is, however, useful for one-off, throwaway queries in familiar SQL syntax, if you don't want to use awk, that is.