Products, services, or companies repeatedly lauded in the comment section, in my experience, are remarkably indicative of future broader trends.
For instance, in 2010 this user complained that bitcoin discussions were overflowing HN like some irritating internet meme: https://news.ycombinator.com/item?id=1998630 ... at the time of posting, bitcoins were selling for $0.06 each. Would it have been a smart idea to buy 10,000 after reading that? Probably.
I can imagine an arb-style subscription to the right sql queries could be packaged and resold for extremely good profit to the right people.
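To make the "right sql queries" idea a bit more concrete, here is a rough sketch of one such query: counting how many comments mention a term each month. The table name, column names, and sample rows are all made up for illustration; a real HN dump will have its own schema.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, posted TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO comments (posted, body) VALUES (?, ?)",
    [
        ("2010-11-03", "Has anyone tried bitcoin yet?"),
        ("2010-12-12", "Enough with the Bitcoin threads already."),
        ("2010-12-13", "Bitcoin again? This is turning into a meme."),
        ("2010-12-20", "Show HN: my weekend project"),
    ],
)


def monthly_mentions(conn, term):
    """Return [(YYYY-MM, count)] for comments whose body contains term."""
    query = """
        SELECT substr(posted, 1, 7) AS month, COUNT(*) AS mentions
        FROM comments
        WHERE lower(body) LIKE '%' || lower(?) || '%'
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query, (term,)).fetchall()


print(monthly_mentions(conn, "bitcoin"))
```

A sudden jump in the monthly count is the kind of "signal" being talked about here; whether it predicts anything is a separate question.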
The same signal would have also fired, much more strongly, from August 2013 through December 2013. The LPs of the VC firms who share your view of its predictive power are presently not very happy.
Despite it being public data, because the information circulated on HN is at the core of technology, it could prove valuable to investors with limited knowledge of it (and might well be worth packaging and selling, haha).
I'd also like to postulate that if you were to segment the market into "early adopters", hn would have a larger share of this segment than other forums in the same class, of an equivalent or greater volume of traffic.
If this postulation is correct, then effectively hn is "trendsetters with money" ... a good group to listen to.
I don't have data to back these claims up, but intuitively I feel they are pretty safe.
This of course doesn't give any indication of market velocity. I've made a number of investments based on HN at the wrong velocity - I presumed the stock had been undervalued because of hn content, when in fact, the market had YET to undervalue it. I forecasted a distant chance of success given an undervalued stock (in this case blackberry) - knowing that they were going to do an android with a physical keyboard <eventually>, and I invested upon this speculation, well before the market doubted the future of the company.
As a result, I bought it way early and it fell precipitously and is only rebounding slightly now. So no, this isn't a magic sauce to time the events or how they will affect the market price, just perhaps one to forecast their eventuality.
Analyzing submissions: http://minimaxir.com/2014/02/hacking-hacker-news/
Analyzing comments: http://minimaxir.com/2014/10/hn-comments-about-comments/
More recently I made a few charts about upvote probability by time slot: https://news.ycombinator.com/item?id=9864254
Edit: I think this[1] is it: "Hacker News as a case study to test the wisdom of the crowd theory". Not quite what you were asking, but you might find it interesting.
My old Posterous blog is one of the top domains ranked by average upvotes. That says something about the time when I was a better and/or more prolific essayist... And something about walled gardens.
I'm a lazy programmer who would love to blog again, but needs something as easy as Posterous. Any suggestions, anyone?
FWIW raganwald, you're one of those who people should continue to pay attention to, even 140 characters at a time. I know I do. Please keep 'em coming!
It's free for up to 200 emails per day. The most difficult parts were formatting issues between different HTML e-mail programs that you can probably ignore for your use case.
Edit: Oops, you said regular person. Didn't read carefully enough.
Math is hard. One out of every 9031 posts.
The peak starts at 12h UTC, is largest at 18h UTC, and goes down at midnight UTC – exactly what I’m used to from US people in the chats I’m in,
and exactly 4am PST, 10am PST, and 4pm PST.
or 7am Eastern Time, 1pm Eastern Time, and 7pm Eastern Time.
Which is Silicon Valley Morning/Workday, East Coast Workday, and European Evening.
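For anyone who wants to double-check the conversions, here is a quick sketch. It assumes fixed standard-time offsets (PST = UTC-8, EST = UTC-5); during daylight saving time each local hour would be one later.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed standard-time offsets (no daylight saving).
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")


def local_hour(utc_hour, tz):
    """Convert an hour of the day in UTC to the hour in a fixed-offset zone."""
    # The date is arbitrary; only the hour matters for a fixed offset.
    moment = datetime(2015, 1, 15, utc_hour, tzinfo=timezone.utc)
    return moment.astimezone(tz).hour


for utc_hour in (12, 18, 0):
    print(
        f"{utc_hour:02d}h UTC -> "
        f"{local_hour(utc_hour, PST):02d}h PST, "
        f"{local_hour(utc_hour, EST):02d}h EST"
    )
```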
Same as reddit.
https://github.com/fhoffa/notebooks/blob/master/analyzing%20...
(Python notebook - renders well on desktop, but GitHub might not show a nice rendering if you try it on mobile)
Personally, I'm glad the growth has been curbed. Too bad we can't go back to the good ol' days.
1. Cats
2. Honeybees
3. Dolphins
6. Pigeons[0]
[0] - http://blog.flypigeon.co/our-application-to-y-combinators-w1...
I wrote an overview of the 20 users with most total karma points (submissions+comments) about two years ago, which he is on when you count that way. Maybe still interesting: http://www.kmjn.org/notes/hacker_news_posters.html
Is this submissions and comments, or just subs, or just comments?
There are people who submit about 5 items per day, so I'd be mildly interested to see how many people submit, e.g., more than one article per day.
A quick inspection of user id would have confirmed this. Should read:
6 bootload 4212 28759 PR Programmer, Melbourne, Australia
'dd367, as you probably are aware by now, Kalzumeus is the company/blog of 'patio11.
Anyway, thanks for the great analysis! One thing that surprised me was the word "lisp" not appearing in "Most Commonly Upvoted Words" table.
Since the dataset is derived from the official HN API, there is no tabulation for Comment Karma, which will result in misleading rankings if attempting to reverse-engineer overall karma.
(This was also already publicly discussed somewhere on HN previously, albeit several years ago.)
[1] https://news.ycombinator.com/threads?id=nickb
[2] Smoking gun? https://news.ycombinator.com/item?id=151461
[edit: spelling]
To expect any consistent design principles on a development medium as ad-hoc and devoid of principles as the web is wishful thinking.
I have to calculate the indent of a comment based on the width of an image in the table layout!
Instead of nested tags for comments like
<comment>
<text>
<list-of-comment>
<comment>
...
</comment>
<comment>
...
</comment>
</list-of-comment>
</comment>
these people have all comments on main level, just indented with image width!
<tr><td><image src="" width="40"></td><td>Text</td></tr>
<tr><td><image src="" width="60"></td><td>Text</td></tr>
<tr><td><image src="" width="60"></td><td>Text</td></tr>
<tr><td><image src="" width="20"></td><td>Text</td></tr>
You sound like my managers.
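On the flat table rows indented by image width: the nesting can be recovered by treating the width as a depth and keeping a stack of ancestors. Here is a rough sketch using only Python's standard library. The 20-pixels-per-level step is an assumption read off the example rows above, the `RowParser` and `build_tree` names are made up, and the real markup will need more care than this.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect (indent_width, text) pairs from flat <tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._width = 0
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._width, self._text = 0, []
        elif tag in ("image", "img"):
            # The spacer image's width encodes the indent.
            self._width = int(dict(attrs).get("width") or 0)

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "tr":
            self.rows.append((self._width, "".join(self._text).strip()))


def build_tree(rows, pixels_per_level=20):
    """Turn flat (width, text) rows into nested {'text', 'children'} dicts."""
    roots = []
    stack = []  # (depth, node) pairs for the current chain of ancestors
    for width, text in rows:
        depth = width // pixels_per_level
        node = {"text": text, "children": []}
        # Anything at the same depth or deeper cannot be this row's parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((depth, node))
    return roots


sample = (
    '<tr><td><image src="" width="40"></td><td>A</td></tr>'
    '<tr><td><image src="" width="60"></td><td>B</td></tr>'
    '<tr><td><image src="" width="60"></td><td>C</td></tr>'
    '<tr><td><image src="" width="20"></td><td>D</td></tr>'
)
parser = RowParser()
parser.feed(sample)
parser.close()
tree = build_tree(parser.rows)
print(tree)
```

Storing the depth alongside each node on the stack means the first row does not have to start at depth zero, which matters for the example rows since they begin at width 40.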