No shit. I used to work as a quant, and while I was an okay quant and a mediocre trader at best, I survived for three years in the industry because of my kdb+ proficiency: the firm I was at spent a couple of million dollars on kdb+ only to find out that most people could not wrap their heads around it, let alone debug it effectively.
My (former) colleagues were definitely smart people. In many ways, they were way smarter than me. But I somehow got a much better handle on kdb+'s idiosyncrasies, and my ability to stare at dense k/q code (usually no more than a dozen lines) and figure out what's wrong with it earned me a reputation as the "q guy" - and some level of job security.
The firm eventually phased out kdb+ completely after my boss and I left (the two proponents of kdb+).
Yeah, I know all about the training and First Derivative. My employer also hired them.
In their defense, every First Derivative KDB+ consultant that I worked with was very sharp and an excellent teacher. They really knew their stuff, and First Derivative is no small part of what has made KDB+ so successful. However, even with their excellent pedagogy, most of my co-workers were totally lost, or weren't willing to apply themselves to learn q/kdb+ well.
Here is another way to think about it: many people can't ever get their heads around certain conceptually difficult topics, say, measure theory or quantum physics. I don't think kdb+ is nearly as hard, but it seemed that way looking at my peers who were no slowpokes.
> ... came back buzzing with excrement
1. kdb+ was (and maybe is) a good solution to the problem that we had: doing complex data manipulation/simple statistical calculations against billions of rows of time series data. Hadoop is the term du jour for data processing, but truth of the matter is that finance doesn't have really huge data. At best, it's a couple of terabytes, and most of the time, you are working with a small subset of it. Running KDB+ on a beefy server or two would usually do the job (rather well).
2. Maybe it's because I studied math, but I find k/q's vectorial/functional semantics appealing. I think the syntax is horrible, but the semantics are very neat.
3. Finally, because it helped me keep my job. It was rather amazing to me that all these Ph.D. statisticians that I worked with couldn't bring themselves to learn kdb+ effectively. Apparently this stuff can be very hard for even the smartest people (or maybe they thought it was such a niche skill with a low ROI).
That's what irritated me most.
What I'd like to understand is - what led the author to this particular conclusion? Is it the fact that this language is super expressive and concise? Is it that it routinely [1] outperforms its C counterparts even if it ultimately translates to C? Is the Z graphical interface so superior that it'll blow the pants off Cocoa and Quartz and X.org or Wayland or what have you? Why would one rewrite emacs or vim on it? I don't want some basic 4 line text editor - I would like to be productive. Why would Mozilla spend energy porting firefox to it? Or Google, chrome? Or bash?
Talking about the history of K/kdb+ and how brilliant its creator is simply doesn't help the reader understand why they should be excited about it. If that was the intention of this article, then the real points to make should've started after that line.
That would've been much more interesting.
[1] - No pun intended, of course
Btw: k doesn't translate to C. It's actually a quite simple interpreter. The fact that it outperforms other languages so easily should be saying more about those languages than it should be saying anything about k.
As I commented above, the syntax is horrible. But I still think it is an effective tool for certain problems.
And to help calibrate, what are your preferred languages and styles?
The code is, well, not the easiest to understand.
First, get out the reference manual: http://kparc.com/k.txt and we'll do the first couple of lines.
The sequence that goes f x applies f to x. This f is unary.
The sequence that goes x f y applies f to x and y. This f is binary (and just labelled a verb).
Some things (adverbs) go f a x and apply f in some special way to x.
Last hint: you read this code from left to right (like English). Do not scan it.
Now let's dive in:
c::a$"\n"
This says: c is a view of where a is a newline. That is, if "a" contains a file, then c is the offsets of each newline. A view is a concept in k that is unusual in other languages: when "a" gets updated, "c" will automatically contain the new values. This is similar to how a cell in Excel can refer to other cells and reflect whatever the value of those cells happens to be.
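Views have no direct Python analogue, but here is a toy sketch of the dependency idea: a value that always reflects its source. (This recomputes on every read; k's views are presumably cached and invalidated on update rather than recomputed naively.)

```python
# A toy "view": a value recomputed from its source whenever it is read.
# Illustration only - k's actual view machinery is smarter than this.

class View:
    def __init__(self, compute):
        self.compute = compute  # a function of no arguments

    @property
    def value(self):
        return self.compute()

a = ["line one\nline two\n"]   # a mutable cell standing in for k's "a"

# c: offsets of every newline character in a[0]
c = View(lambda: [i for i, ch in enumerate(a[0]) if ch == "\n"])

print(c.value)     # [8, 17]
a[0] = "x\ny\n"    # updating "a" ...
print(c.value)     # [1, 3]  ... is automatically reflected in "c"
```

This is the Excel-cell behavior described above: read "c" at any time and it reflects the current "a".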
b::0,1+c
This says: b is a view of zero join one plus c. That is, where "c" contains the offsets of each newline, 1+c would contain the beginning of each new line. We join zero to the front because, of course, the beginning of the file is also the beginning of a line.

d::(#c),|/-':b
This says: d is the view of the (count of c) joined with the max-reduce of the pairwise differences of b. That sounds like a lot, but the pairwise differences of b (where b holds the positions of all the line starts) are the lengths of each line, and the max-reduce of that is the longest line. You might say that "d" is the dimensions of the file.

i::x,j-b x:b'j
This says: i is a view of x (?) joined with j (?) minus b (the offset of the beginning of each line) applied to x, which is defined as the bin of j in b. j hasn't been defined yet, but you can see that x is defined on the right of the definition and used to its left. This is because K (like APL) executes from right to left - much as in other languages: when you write a(b(c(d()))) in C, you execute the rightmost code first (d), then apply that leftwards. K lets you do this with anything, including definitions.
The other strange thing is that we know that b is the offset of the beginning of each line, and yet we're applying something to it. This is because k does not distinguish between function application and array indexing. Think of all the times you write in JavaScript x.map(function(a) {return a[whatever] }) when you'd really just like to write x[whatever] -- k lets you do this. It's a very powerful concept.
On that subject of binning: b'j is going to find the index of the value of b that is less than or equal to j. Since we remember that b is the offset of the beginning of each line, then if j is an offset in the file, this will tell us which line it is on (!)
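To make the walkthrough so far concrete, here is a rough Python sketch of what c, b, d, and the bin lookup compute (plain recomputation, no view machinery; the exact boundary handling of k's -': may differ slightly):

```python
import bisect

a = "hello\nhi\nlonger line\n"   # a hypothetical file

c = [i for i, ch in enumerate(a) if ch == "\n"]   # c::a$"\n"  - newline offsets
b = [0] + [i + 1 for i in c]                      # b::0,1+c   - line starts
deltas = [y - x for x, y in zip(b, b[1:])]        # pairwise differences of b
d = (len(c), max(deltas))                         # d::(#c),|/-':b - "dimensions"

def line_of(j):
    # the "bin" lookup b'j: index of the rightmost line start <= j
    return bisect.bisect_right(b, j) - 1

print(c)            # [5, 8, 20]
print(b)            # [0, 6, 9, 21]
print(d)            # (3, 12)
print(line_of(7))   # 1  - offset 7 falls on the second line
```

The pairwise differences of the line starts are the line lengths (newline included), so d really is (line count, longest line).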
But we don't understand what j is yet; it's the next definition:
j::*|k
This says: j is a view of the last (first of the reverse) of k. We don't know what k is yet.

f:""
This says: f is defined as an empty string.

g::a$f
This says: g is a view of the offsets of f (an empty string) in a. Initially this will be null, but other code will assign to f later, making g a value. Next line.
s::i&s|i-w-2
This is fairly straightforward: on numbers, & is min (and) and | is max (or). While Excel doesn't let us use a cell in its own definition, k does: it means the old value of s. So this is literally: the new view of s is i min (the old s max i minus w minus 2). We don't know what w is yet.
S:{s::0|x&d-w}
This is a function (lambda). x is the first argument. If we called S[5], it would set s to the max of zero and (the min of 5 and d (dimensions) minus w). Double-colon changes meaning here; inside a function it no longer means view, but set-global.

px:{S s+w*x}
This requires some knowledge of g/z: http://kparc.com/z.txt. Note that px is defining a callback for when the pageup/pagedown routines are called. x will be 1 if we page down and -1 if we page up. It may now become clear that S is a setter of s that checks it somehow. When we understand what w is (later in the program) it will be absolutely clear, but pageup/pagedown are changing s by negative w when pageup, and positive w when pagedown.
wx:{S s+4*x}
Consulting the g/z documentation, we can see this has to do with wheels. Note we modify s again, relative to 4 times x; x is -1 when wheel-up and 1 when wheel-down. It becomes clear that s is modified by pageup/pagedown and the mouse wheel.

cc:{9'*k_a}
Again: in the documentation, 9' stashes something in the clipboard. cc is a function that takes the first slice of offsets k (we still don't understand) in a (the file). The g/z documentation says that cc is the callback for control-C. This is expected, as control-C traditionally copies things to the clipboard. Since the slice of offsets k in a is being saved in the clipboard, we may guess at this point that k will contain the current selection. This process is time consuming, but that is to be expected: learning English took a while at first, and often required consulting various dictionaries. Eventually you got better at it and could read somewhat quickly.
I don't know if you want to try the next few lines yourself to see if you can get a feel for it, or if you want to try one of the less dense examples:
* http://www.kparc.com/$/view.k
* http://www.kparc.com/$/edit.k
... or if you want me to keep going like this, or if you want to ask a few questions. What are your thoughts?
Can you write this editor in one line of less than 6000 chars of javascript (without a "textarea")?
data:text/html, <html contenteditable>
Done :)
If you've worked with such things for your day job, exposure to an APL language is mind blowing in the same way as exposure to Lisp is. You'll rapidly find out that an awful lot of the numerics world is an ad-hoc reinvention of an APL language. Leading thinkers in the numerics world have noticed. Have a look at the Tensor type inside Torch7, or -idx- class in Lush (the same thing): they are, in fact, a sort of APL with a more conventional, aka painfully wordy, notation.
Writing an editor in an array language seems crazy, but then, writing a parallel processing system in a language that was designed to run applications in your web browser also seems crazy. If people had stuck with APL-style languages, well, databases, particularly distributed databases (Kx and 1010, both K-based systems, have scaled to petabytes for a long time), would suck less, as would CUDA programming. Their revival could make life easier in these problem domains.
The download link at http://www.kparc.com/ asks for a password, so I'm not sure what's going on with that.
Looking at his code on http://www.kparc.com/edit.k I'd like to disagree with that statement
I studied Arabic a lot, and Chinese about a year. I cannot speak to Chinese with only one hazy year under my belt, but I can speak to Arabic.
Because Arabic realizes a lot of its syntax at the morphological level, you can encode a whole sentence - subject (declension inherent, gender variable), verb (conjugated for passive/active, past/present/future, standard/subjunctive) and direct object (declension inherent, gender variable) - in one word, where English would need a full clause.
أضربه (a-dr-b-u): a (I) + dr-b (hit) + u (him/it) = "I hit him" (present tense)
And that is a super simple example. I have seen much more complicated sentences in one word, and even better in two or three. So, I hypothesized Arabic is very, very dense. I think Russian and others could be considered similar.
However, with this level of density (maybe we'd argue "compression" from a CS perspective), I noticed books and their translations were routinely about the same length in pages. Never identical, mind you, but never something crazy like 50 pages more (I am guessing; it has been a long time since I made such an experiment and would have trouble agreeing with someone on what is significant).
Now, one could hypothesize a shitload about what this means, but computation is realized as the same "stuff" (machine code instructions) in all programming languages, whereas no parallel exists for mapping human language to computation, as far as I know from my between-minor-and-major courseload in linguistics, specifically computational linguistics. If someone can contradict me, I would LOVE to read about measured cognition and language constructs.
atom (int, float, char, date, symbol, ...)
list (one dimensional array of atoms, dicts, flips or lists)
dict (a map from one list to another)
There's also a flip, which exchanges the first two indexes applied to an item (so, e.g., it effectively transposes a list of lists) but it is just sugar (both syntactic and semantic).
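The "exchanges the first two indexes" behavior can be sketched in Python (an analogy, not k's implementation): transposing a list of lists means flip(t)[j][i] equals t[i][j].

```python
# flip exchanges the first two indices of a nested list:
# flip(t)[j][i] == t[i][j]

def flip(t):
    return [list(row) for row in zip(*t)]

t = [[1, 2, 3],
     [4, 5, 6]]

ft = flip(t)
print(ft)                 # [[1, 4], [2, 5], [3, 6]]
print(t[0][2], ft[2][0])  # 3 3
```

Applying flip twice gets you back where you started, which is why it can be treated as pure sugar over indexing.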
You can trust Whitney that all of these are properly implemented, including appends.
It's not often that you actually need more. I've discovered this after using K for a while, and going back to python.
Back in my pre-K (ha!) C++ and Python days, I had an awful lot of classes everywhere. After using K for a while, my Python and C both have far fewer (structs in C more often than in Python, as C is missing Python's dict). And the code has gotten much shorter and more efficient - arguably, more readable as well. And I've essentially dropped C++ for C, because the extra complexity is just not worth it.
But the actual meaning of the program is still lost on me. I can only guess it has something to do with parsing files (note the checks for curly braces). Feeding it its own source code produces some output, but I have no idea what it actually modified.
I believe the audience is developers using K in a commercial/production environment.
The biggest differences, compared to Arthur's style of writing, are:

* Less code on each line
* A separate comment column on each line
* Nominal use of spaces for readability
We also use Forth, which I think is really another way of shifting your thinking.
Your code in whatever language you work will benefit from that rethinking.
The closest I could find is this [1] but "The model is expressed in SHARP APL", so from the start, it's circular.
https://www.youtube.com/watch?v=VSJpJt3c11c
It goes from solving some Euler problems to a full-blown web app in J.
Coming back to the question: for J, the vocabulary page on jsoftware.com is a good resource.
An OS this small is incredibly exciting to me.
This summer, Pierre and I got kOS to boot directly into g (the graphical interface; formally called z) with ISR, keymap, modesetting, basic filesystem, etc weighing in around 100 lines of C. That was pretty exciting. Could probably be done with less with some deeper changes to Arthur's code, but it's still very useful to run k under Linux. Oleg made a silly little game in kOS.
Arthur and Oleg did some performance benchmarks pitting k against q (current kdb+), Postgres, some "popular RDBMS" (that I can't name), and MongoDB. It was impressive that k is so much faster than q, but it also really underscores the cost of the wrong data structure (and how hard it is to get the right one with SQL or MongoDB).
I realize you have a thriving commercial software company and that's cool. But...wow. This is exactly the sort of thing Alan Kay's team has been working on for the past five years, and you guys seem to be beating them to it, with a completely different approach. It would be pretty amazing to be able to dig into it, find out how the whole system works, and contribute.
"Whitney sent Oleg and Pierre some of the C code he was working on, and notes on a problem he didn’t know how to solve. They emailed back a solution, coded in his style."
Did Pierre and Oleg think their solution out in standard code first and then make it Whitney-like, or did they find themselves thinking in Whitneyese straight away? I imagine their teacher may have noticed Whitney tendencies and that is what led to the original encouragement to make contact.
"Whitney’s strategy was to implement a core of the language – including the bits everyone thought most difficult, the operators and nested arrays – and use that to implement the rest of the language. The core was to be written in self-expanding C. As far as I know, the kdb+ interpreter is built the same way.
Unlike the tall skinny C programs in the textbooks, the code for this interpreter spills sideways across the page. It certainly doesn’t look like C."
Does this mean that the code is very unreadable C? Like a kind of code golf?
How would one replicate this to build a speedy interpreter? And what if I use Lua or Python instead?
The mainstream says: readable C has function and variable and type names that express meaning, so a function is read like a narrative with verbs, adjectives and nouns. The fact that this narrative scrolls over pages, is unimportant.
The APL/K/J school says: readable C has functions, variables, and types named with single letters, so that the totality of a function is short enough to fit in one glance - preferably, on one line that does not need a scrollbar. The function does exactly what it says, no more and no less; its intent is thus completely clear. To name it descriptively would ruin the ability to grasp the whole thing as a gestalt.
But how does that relate to building a fast interpreter? Faster than C?
However, what's really impressed me about KDB is that you can do so much more with it. In some banks it has effectively become the messaging middleware connecting hundreds of disparate data sources. In addition to passing messages, you get the data storage and analytics tools for free...
Its continued survival is a testament to the excellence of its marketing (as evidenced by this article!) rather than to its actual merits.
And with the rise of parallel programming, we're actually going to shift toward a more vector-oriented way of describing and solving problems. Just as Lisp ideas are spreading everywhere in modern languages, APL ideas are also fruitful.
There seems to be this strange idea going around that if we just get the right tool, everything else is going to change forever. I see this a lot with people trying to create IDEs that let non-programmers create programs without really knowing how to code.
But the thing is, most people just don't have anything worth coding. The problem isn't that the tools don't exist. They do, even if they aren't perfect. It's that making something that matters isn't an easy thing to do. And no tool can change that.
However if I then show you a programming language, a database engine (similar in capability to SQL but around 1000x faster), a graphical desktop with icons, mouse, editors, filesystems, ISR, and so on- and it's still under 400 lines of C code, then maybe it's easier to have this conversation that I think we need to have: That maybe all you programmers have simply been doing it wrong, and there's something fundamentally wrong with the way you and everyone else programs computers.
I think most programmers take so much offence at the thesis - that everyone programs wrongly (or badly) and that a fundamental shift in the way we program could make programming much better - that it's been difficult to actually work on this problem. Just look at all the people complaining about the text editor being difficult to read, without saying "I can't read this, and I want to get better."
And I think it should be obvious: You've built bridges for thousands of years, so you expect you're pretty good at it now, and yet there are still improvements in bridgemaking today; why do you expect programming to be any different? Why do you have "best practices" for programming if you've only been doing it for a few decades and don't really know how to do it yet?
Maybe it's much easier to think we're working on giving non-programmers the ability to program, but we're not: we're trying to make programming suck less.
At the same time, it does sound slightly off. What's the catch? In the text editor example, how capable is it? Often, such claims rely on a trick, like saying "<textarea/>" is a text editor in 11 characters. Or the tiny Haskell quicksort that's actually rather inefficient.
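For reference, the quicksort in question - here translated to Python rather than Haskell - is the classic short definition: elegant, but it allocates fresh lists at every level, which is the inefficiency being alluded to.

```python
# The famous short quicksort: beautiful, but not in-place - every call
# builds two new lists, trading efficiency for brevity.

def qsort(xs):
    if not xs:
        return []
    p, rest = xs[0], xs[1:]
    return qsort([x for x in rest if x < p]) + [p] + \
           qsort([x for x in rest if x >= p])

print(qsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

It sorts correctly, but a production quicksort partitions in place; the brevity hides a real cost, which is exactly the "trick" worry above.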
I think I'll try playing with it. How long should a Logo interpreter in K be?
It's that way with religion, paradigms in medicine, and even chemistry/crystallography (see e.g. Shechtman vs Pauling[1]). It's human nature.
It is also worth pointing out that while this specific version of K is new, K itself is 20 years old, and it is mostly purified APL, itself over 60 years old. People have been ignoring this since forever (APL found stellar success in the OR community in the 70s and 80s on its own, and on trading floors in the 80s and 90s thanks to Arthur Whitney, but has since mostly disappeared).
[0] http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem#Ficti...
[1] http://www.theguardian.com/science/2013/jan/06/dan-shechtman...
Right, and I'm not saying programming won't improve. It obviously has. No one wants to write a web app in C++ or assembly. What I'm specifically taking issue with is the idea that there is some sort of monumental change out there that is going to enable us to do... what exactly?... well no one can really answer that, but the claim is that it is big and exciting and will change everything.
To me, part of being an engineer is not taking offense when being told there are potentially better ways to do things, but at the same time I can understand why some people jump to that kind of statement since at first glance it seems like a regression of sorts, especially when the industry has a continuing history of trying to lower the entry barrier. Even though underneath the programming concepts may be revolutionary, people may also be quick to balk at it when it's in such a form (hence the first question).
Would you rather do:
IXDCCCLVI * VIIDCCCXLIX
or
9856 * 7849 ?
If it's really similar in capabilities, why not strap an SQL parser on it and sell it? You can do it client-side to avoid wasting performance where it matters. Surely there's money to be made in the database business if you are an order of magnitude faster than everyone else -- let alone three.
I'm trying not to add more words to that paragraph, though I could/should.
+/%#
You can set this up as:

avg=: +/%#

or:

tally =: #
divide =: %
sumall =: +/
How's that for a DSL? It now allows this:

avg=: sumall divide tally
avg 2 4 6
4
EDIT: I forgot to add that these operations, like others in J or APL, work on scalars, vectors and arrays without any special typing or handling. See this presentation for at least the first 5 minutes to see J in action with explanation: http://www.infoq.com/presentations/j-language
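The composition above can be mimicked in Python. This is a rough analogue with hypothetical helper names: J's fork (f g h) y means (f y) g (h y), which has no direct Python syntax.

```python
# Mimicking the J fork  avg =: sumall divide tally
# Fork rule: (f g h) y  ->  g(f(y), h(y))

def fork(f, g, h):
    return lambda y: g(f(y), h(y))

sumall = sum                       # +/  - sum reduce
tally  = len                       # #   - count
divide = lambda a, b: a / b        # %   - division

avg = fork(sumall, divide, tally)  # avg =: sumall divide tally

print(avg([2, 4, 6]))  # 4.0
```

The point of the J original is that none of this plumbing is needed there: juxtaposing three verbs builds the fork automatically.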
The conciseness allows you to start seeing the small patterns in the small expressions, just like mathematics, hence bringing clarity and speed of abstraction to your concept manipulations.

> todo
> files, procs, tcp/ip, usb, ..
Oh.
The rest of the wiki is quite useful including a references and tutorials for q.
k5 uses operators where q uses words.
Btw, being able to map/reduce with 1000 procs on an 8 GB Linux VM (using k5) is both useful and fun.
However, they've been working on a new version of k that's not publicly available yet and I suppose the kparc stuff requires that.
Tristan
When I mentioned to a friend that the current version of K5 was a binary < 100KB, his response was "I don't believe it. I can't write HelloWorld in less than 100 KB!".
Access to more resources does not mean that one should be wasteful.
K benefits from a dedication to avoiding waste and duplication and efficient use of mathematical concepts. The Fundamental New Computing Technologies at VPRI has similar goals (an entire end-user system including "Office" apps in 40 KLOC or less).
Being able to prototype a multi-proc map/reduce algorithm in k with 1000 procs on a laptop with 8 GB RAM is quite nice.
> 500 lines of C code that's meant to change the world? I really doubt it (or, these guys aren't using line breaks).

There are line breaks, undoubtedly. That said, I'm sure the code is concise - much like k code.
K3 was 1200 lines of code and included the language, windows(GUI), database, IPC, REPL (w/ simple debugger), FFI and OS interaction. The Windows executable was 320 KB.
The real challenge is that, 99% of the time, requirements and integration are what kill you, not raw performance. For the cases where performance (or formal correctness, or whatever) matters, the main challenge is usually to convince the market that it's worth paying for, and then finding the right developer-project match.