- You are trading in a market with low liquidity or one that is controlled by a small number of market participants. I'm not an expert, but I think this applies more to markets like penny stocks and less to big markets like forex for major currency pairs.
- You are not taking transaction costs into account or not doing so properly
- Your bot makes a low number of trades, making the results close or equivalent to lucky coin flips
- Your bot is simply making trades that cannot be executed, or may be doing simulated trades of something that is not actually tradable. This applies to a large number of research papers that assume you can just buy and trade the S&P 500 itself. You can trade ETFs that are tied to an index, but an index is not a tradable instrument in and of itself. Once you realize this, a lot of papers seem very weird.
- You are not modelling other aspects of the trading process realistically, such as assuming the bot has infinite funds to trade, allowing it to take unlimited losses and continue trading when in reality you'd be hit with a margin call and your trading would be stopped
- Your code is committing any number of data snooping errors where the bot is asked to trade at time A (say the open of a trading session) but has access to future data (say the closing price of that day, future data that would not actually exist in a live environment)
- Depending on what you believe about how market conditions change over time, your bot may have worked in the past but would not work if used today. I.e., the market may have adapted to whatever edge your bot may have discovered
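The data-snooping bullet is the easiest of these to commit by accident and the easiest to demonstrate. A toy sketch (made-up bars, not a real strategy): the buggy backtest decides at the open using that same bar's close, so it only ever "trades" on profitable days.

```python
# Hypothetical daily bars; each row is one completed session.
bars = [
    {"open": 100.0, "close": 103.0},
    {"open": 103.5, "close": 101.0},
    {"open": 101.2, "close": 104.8},
]

def backtest_with_lookahead(bars):
    """BUG: decides at the open using that same day's close --
    information that would not exist yet in a live environment."""
    pnl = 0.0
    for bar in bars:
        if bar["close"] > bar["open"]:      # peeking at the future
            pnl += bar["close"] - bar["open"]
    return pnl

def backtest_honest(bars):
    """Decides at today's open using only yesterday's (known) close."""
    pnl = 0.0
    for prev, bar in zip(bars, bars[1:]):
        if prev["close"] > prev["open"]:    # signal from a completed bar
            pnl += bar["close"] - bar["open"]
    return pnl

print(backtest_with_lookahead(bars))  # flattering: wins every trade
print(backtest_honest(bars))          # same "strategy", now it can lose
```

The lookahead version looks like free money on any dataset; the honest version reveals there was never a signal at all.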
There are probably lots more pitfalls I don't even know about since I'm not an actual trader.
I'm not discouraging anyone from playing around or trying things, of course. I think it's great fun, which is why I do it.
Here's the good news: if you realize you don't actually have an edge and avoid risking your hard-earned money, you come out ahead of almost all people who ever trade.
* Doing a latency-sensitive trade when you don't have good execution. It's easy to go wild in simulation and think you can flip in and out of positions. But if you're a retail trader (and in this context, by that I mean "not connected directly to the exchanges, at the minimum"), the fills your simulation assumes simply won't be there in practice.
* Not taking into account the impact of your own trading on markets. This is obviously impossible to really simulate. Sometimes you can ignore it (if there's enough liquidity) but I've seen trades that looked great on paper and when trading at small-ish sizes, but then when you try to crank it up and do more volume, prices run away from you.
Obviously, there is money to be made in algo trading. It's big business, and obviously not everyone is doing crazy latency-sensitive stuff: there are quant trades that you could probably do with execution available to retail traders. And honestly I wouldn't be surprised if some retail traders manage it. But I will say that I think for an individual, it's not worth the effort even if you are one of the ones who can consistently make a profit. Just buy S&P 500 ETFs, sit on them, and do something else with your time.
I've found that to be the case a lot. People forget that on the other end of the price is a thinking human, or robo delegate thereof, trying to outsmart you. It's really hard to say what will happen until you try it with real money / assets.
I hear these horrid tales of people killing themselves over bad trades and lines of credit and I wonder how much is poorly written code and how much is the chaos and whim of the market (don't worry, I'm not going to try algo trading).
It didn't make money in the real world because the quality of decision that could be made at that speed wasn't good enough.
If you do too much of this you will get angry exchange network people yelling at you, as they may consider it spam/DDoS. Some exchanges explicitly limit this.
It's also worth considering the possible gain. If you're in the FPGA (0-150 ns) or CPU (300 ns-3 µs) space, the math can come out differently.
Failing the checksum was sort of common as well, but exchanges don't like it.
These days most tricks have to do with avoiding as much serialisation time as possible, e.g. by sending ahead part of the TCP payload and filling in with some innocent order if there was no opportunity.
Replace widget with liquidity and factories with traders and you should immediately see why most traders are losing money. They are operating in markets where their liquidity simply isn't needed.
If you want to make money off of trading then you have to find a market with low liquidity, where having more traders is actually welcome. But why do the hard work if you can just invest your money and still get good gains without working?
- management fees, margins, and available capital can easily be modelled properly
- you can easily set up constraints (like no fractional trading for your S&P example)
- you can set up a proper point-in-time database to avoid snooping (especially if you're using earnings reports or other fundamental data, which is often actually released _after_ its published release date...)
- you can set up regime-shifting simulation environments (various market conditions)
- you can avoid over-fitting if you're back-testing (with dozens of techniques, most notably: test once and forget about parameter optimization)
I would say that with paper trading and back-testing the serious problems are that:
- your orders don't show up on the book so no one sees and reacts to your limit orders
- your "filled" orders don't affect the book, so you're not affecting liquidity, so the market doesn't change in response to your trading
- your bot has no access to market micro-structure strategies and conditional orders (and if you want to trade fast or are placing big trades you need them)
These are the problems that make any simulation unrealistic, and they are fundamental. It's shadowboxing, which is not entirely devoid of value, but which is certainly insufficient on its own.
(I've worked as a quant developing strategies for several funds these past 15 years)
My main mistake was using the historical trades instead of the historical offers as a testing dataset.
Out of curiosity, what's the difference?
A couple of years ago I implemented a toy regex engine from scratch (building NFAs then turning them into DFAs). I thought it was an enlightening experience because it showed me that the core principles behind regular languages are fairly simple, although you could spend years optimizing and improving your implementation. How do you deal with unicode? How do you modify your implementation to know how many characters you can skip if you don't have a match in order to avoid testing every single position in a file?
It demystified the concept of a regex engine for me while at the same time making me realize how impressive the advanced, ultra optimized engines we use and take for granted are.
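The NFA-to-DFA step mentioned above is the classic subset construction: each DFA state is a set of NFA states. A minimal sketch (hand-built NFA over {a, b} recognizing strings that end in "ab"; epsilon transitions omitted for brevity, though a real engine would epsilon-close each set):

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepts):
    """Subset construction. nfa maps (state, symbol) -> set of next
    states; the resulting DFA maps (frozenset, symbol) -> frozenset."""
    symbols = {sym for (_, sym) in nfa}
    start_set = frozenset([start])
    dfa, queue, seen = {}, deque([start_set]), {start_set}
    while queue:
        cur = queue.popleft()
        for sym in symbols:
            nxt = frozenset(s2 for s in cur for s2 in nfa.get((s, sym), ()))
            dfa[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    dfa_accepts = {st for st in seen if st & accepts}
    return dfa, start_set, dfa_accepts

# NFA: 0 --a--> {0,1}, 0 --b--> {0}, 1 --b--> {2}; accept state 2.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
dfa, s0, acc = nfa_to_dfa(nfa, 0, {2})

def run(text):
    state = s0
    for c in text:
        state = dfa[(state, c)]
    return state in acc

print(run("aab"), run("aba"))  # True False
```

The exponential worst case lives in `seen`: a pathological NFA can force the construction to visit nearly every subset, which is exactly where those years of optimizing come in.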
I put a bunch of links and quotes about that here, including nascent implementations:
http://www.oilshell.org/blog/2020/07/ideas-questions.html
Also related: http://www.oilshell.org/blog/2020/07/eggex-theory.html
About Unicode, this derivatives project (with video linked in the post) appears to be motivated by Unicode support (though I don't recall exactly why, something about derivatives makes it easier?).
https://github.com/MichaelPaddon/epsilon
https://github.com/MichaelPaddon/epsilon/blob/master/epsilon...
If anyone wants to write a glob engine for https://www.oilshell.org/ let me know :) Right now we use libc but there are a couple reasons why we might have our own (globstar and extended globs)
Trivia: extended globs in bash give globs the power of regular expressions, e.g.
[[ abcXabcXXabcabc == +(abc|X) ]] ; echo $?
0
where +(abc|X) is equivalent to (abc|X)+ in "normal" regex syntax, and == is, very confusingly, the fnmatch() operator.

The derivatives approach makes Unicode support easier since it's able to keep the symbol sets for each transition edge (in the DFA) more compact by virtue of supporting negation. If you add in aggressive term normalization, hash-consing, and an efficient dense-set implementation (all of which I've done in my implementation), the derivatives approach can be extremely fast, even when generating the DFA for something like the lexer of a full programming language (in my case, F#).
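For the curious, the core of the derivatives approach (Brzozowski derivatives) fits in a page. A minimal sketch without the normalization, hash-consing, or symbol-set machinery described above — matching works by repeatedly taking the derivative of the regex with respect to each input character, then checking whether the final regex matches the empty string:

```python
# Regex AST as tuples; derivatives computed symbolically.
EMPTY, EPS = ('empty',), ('eps',)   # matches nothing / the empty string

def char(c): return ('chr', c)
def seq(a, b): return ('seq', a, b)
def alt(a, b): return ('alt', a, b)
def star(a): return ('star', a)

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ('eps', 'star'): return True
    if tag in ('empty', 'chr'): return False
    if tag == 'seq': return nullable(r[1]) and nullable(r[2])
    if tag == 'alt': return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    """Brzozowski derivative of r with respect to character c."""
    tag = r[0]
    if tag in ('empty', 'eps'): return EMPTY
    if tag == 'chr': return EPS if r[1] == c else EMPTY
    if tag == 'alt': return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == 'star': return seq(deriv(r[1], c), r)
    if tag == 'seq':
        d = seq(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# (ab)* matches "" and "abab" but not "aba"
r = star(seq(char('a'), char('b')))
print(matches(r, "abab"), matches(r, "aba"))  # True False
```

Without term normalization the intermediate regexes grow on every character, which is why the optimizations above matter: memoizing normalized derivatives is what turns this interpreter into a DFA generator.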
* searching
* implementation of automata in electronic circuits
* challenges of formal specifications for things like protocols and grammars, as well as for verifying their correctness; implementation strategies for applying these specifications
* computability and complexity
* programming language theory
* history of computer science
* LANGSEC arguments
in addition to having an austere mathematical beauty.
It's poor advice for someone who already has a STEM degree and wants to build something useful and profitable. If you already know how these things work, your time is better spent on the "edge of the circle": http://matt.might.net/articles/phd-school-in-pictures/ which applies to businesses and startups as well.
If you're in the latter group -- you've already got the skills to build real shit. Don't waste your time on homework problems. Find a problem you have and build a solution for it. Don't listen to people who tell you to work on homework problems that have already been solved; it's a complete waste of your time if you already know the fundamentals.
As for stock trading bots -- if you don't have a mathematics degree or equivalent (e.g. having incredible math skills), don't even bother. You won't be profitable, and you will learn nothing useful in the process, because you will approach the problem as a naive CS student would. Smarter people than you have made trading bots and have failed miserably. Without having an extremely strong foundation in mathematics, your trading bot will amount to nothing more than a futile exercise in gluing APIs together.
I disagree. Not everything is about business and money. Many people already build "real shit" for a living and want to simply have fun building other things, and focus on the cool parts, and not all the boring parts involved in a commercial project.
Also, CS is constantly evolving. Nobody knows the "fundamentals" once and for all. A ray tracer is still a ray tracer, but languages and technologies have changed immensely in just a few years. Git didn't exist 15 years ago. A language like Rust is 10 years old. React is 7 years old. We need these homework problems simply to keep up to date.
None of them was innovative (as in something new), nor do they belong to the CS fundamentals.
I'm having fun, and that's reason enough.
Which is why lists like this help.
> They are programmers, not creative after all.
Damn, this is a bold comment on HN. Since when does profession determine how creative you are? I've met "programmers" and "geeks" who are more creative than "artists". Your profession has little to do with your innate creativity -- it just determines how you are able to express that creativity.
Unpopular opinion, but lists like this are stupid for people who are trying to build companies. You have to try things and be pissed off at the status quo to find real problems. Nobody is going to find real problems for you, in the same way that no quant school is going to reveal their hedge fund's trading strategy to you. Finding ideas in a list is the last advice I would give to anyone. If it's public, it's probably not a profitable idea.
Fast forward a few years, I am taking CS219 with Prof Stark (random that I remember the course number) who is hard-core and really tough, since it's year 2000 and the class is full of kids who are taking CS cuz it's "the thing" but have no passion or talent for programming.
Me, I love programming but I don't have it very much together attendance-wise so I accidentally miss the midterm. OOPS. And obviously there's a "zero make-up test" policy, but surprisingly the prof lets me do the part of it which is a take-home coding assignment, since you can't really benefit from prior knowledge of the questions.
My lucky stars - the test is to make a rudimentary subset of - you guessed it - Tetris. Which I had "solved" for myself a year or two earlier. Apparently I was the only one in the class to nail the implementation.
2. If a course is super hard, maybe the class isn't 'full' of people who are just doing it to 'be cool'; maybe that is your judgement and it does not reflect reality.
3. Great that you solved Tetris beforehand, but is there a point here? Are you implying that high school you was smarter than university peers?
Sorry, but your post seems a little elitist, even though it's just an anecdote.
And it brings back a fun memory:
I wrote MacTetris[1] when I was in high school. This made the computer lab a whole lot more popular during free periods than it had been previously.
Two interesting bugs that I recall:
* Mathematically rotating pieces around an axis was a terrible idea, but it produced some entertaining artifacts (and made placement much harder!). I replaced the math by precomputed rotation maps for each piece, which was much better. My first pass at the maps introduced a displacement bug, so you could spin the pieces counterclockwise and they would walk in the negative X direction.
* I got an angry bug report in the lunch room from a kid who had no reason to know my name. He was having a really great game, and then his score started decreasing with every piece. He felt like his record high score had been stolen from him, and he was upset! I asked "what was your score??". He said "I don't know, but by the time I noticed, it was over 30,000 but it was going DOWN!". Aha..[2] :)
[1] I'm sure the statute of limitations has expired on my appropriation of copyrights and trademarks.
[2] Back in the day, "int" meant "signed 16-bit integer", which is not the proper data type for a score counter.
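The precomputed-rotation-map fix from the first bug can be sketched like this (hypothetical 4-cell piece coordinates; the real MacTetris maps were presumably hand-tuned per piece). The key is that rotation plus re-normalization is computed once at startup, so gameplay only does table lookups — and a bug in the normalization step is exactly the kind of thing that makes pieces "walk" sideways:

```python
def rotate90(cells):
    """Rotate cells a quarter turn, then shift back into a 0-based box.
    Forgetting the shift is the displacement bug described above."""
    rot = [(y, -x) for x, y in cells]
    min_x = min(x for x, _ in rot)
    min_y = min(y for _, y in rot)
    return sorted((x - min_x, y - min_y) for x, y in rot)

def rotation_map(cells):
    """All four orientations of a piece, computed once at startup."""
    states = [sorted(cells)]
    for _ in range(3):
        states.append(rotate90(states[-1]))
    return states

s_piece = [(1, 0), (2, 0), (0, 1), (1, 1)]  # hypothetical S-piece
for state in rotation_map(s_piece):
    print(state)
```

Four applications of `rotate90` bring a piece back to its starting orientation, which makes the maps easy to sanity-check.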
Damn, that was good.
Having to learn everything needed to write this... it was unbelievably educational from a software design point of view
If you enjoy horror or third person fantasy adventure games, give it a shot. If you do, please fill out the survey at the end so I can make it better for the next release
What is an "intuitive ordinal notation"? Definition: The set of intuitive ordinal notations is the smallest set P of computer programs with the following property. For every computer program p, if, when p is run, all of p's outputs are elements of P, then p is in P.
So "End.", the program which immediately ends with no outputs, is vacuously in P (all of its outputs are in P, because it has no outputs). It notates the ordinal 0. Likewise, "Print(`End.')" is in P, because its sole output, "End.", is in P; it notates the ordinal 1. Likewise, "Print(`Print(`End.')')" is in P, notating the ordinal 2. And so on.
The above can be short-circuited: "Let X=`End'; While(True){Print(X); X=`Print(\`'+X+`\')'}". This program outputs "End.", "Print(`End.')", "Print(`Print(`End.')')", and so on forever, all of which are in P, so this program itself is in P. It notates omega, the smallest infinite ordinal.
Here's a library of examples in Python, currently going up to a notation for the ordinal omega^omega: https://github.com/semitrivial/IONs
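The omega program described above is easy to transliterate. A sketch that enumerates its first few outputs (quoting convention follows the comment's pseudocode, not necessarily the linked library; the real program loops forever, so we cap the enumeration):

```python
def omega_notation(n_terms):
    """First n_terms outputs of the program notating omega:
    "End.", then "Print(`End.')", then "Print(`Print(`End.')')", ..."""
    x = "End."
    outs = []
    for _ in range(n_terms):
        outs.append(x)
        x = "Print(`" + x + "')"
    return outs

for program in omega_notation(4):
    print(program)
```

Each emitted string is itself a program notating a finite ordinal, so the emitter notates the limit of all of them — omega.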
Web scraping to gather data, databases for storing it, ML for analyzing, front and backend web dev to show the daily information and adjust.
And instead of having to deal with trading regulations, contests can be really small and easy to enter. There are daily contests for 5 cents an entry, and you can enter 150 optimized lineups from an uploaded csv for $7.50 a day. You can really learn a ton.
Could you describe your approach? What type of information would you scrape?
In fact, this summer I wrote a 3-part tutorial series on implementing a BASIC compiler: Let's make a Teeny Tiny compiler (https://web.eecs.utk.edu/~azh/blog/teenytinycompiler1.html)
My first program ran on a CHIP-8 machine (COSMAC VIP), though I didn’t realize I was targeting an interpreter and not machine code.
Great series of articles!
Parsing (should be) easy; the backend is hard but well documented and trodden; but the semantic analysis and error handling is where the real murky water is (especially when you start trying to optimize it, like adding caching or threading or deferred execution).
You write a compiler for a Java-like language in several steps: a parser that organizes the raw code, then a compiler that emits the virtual machine code, then a translator between the virtual machine code and assembly (and then, between assembly and binary).
I've considered reworking it into a compiler, but never quite gotten around to it. Perhaps a challenge for the new year.
By the way, adventofcode.com is currently ongoing. Though the challenges are easy compared to the projects in this list, I highly recommend it. It covers problems you might face in big projects. With these small puzzles it's easy to experiment. It prepares you for bigger things.
Plus there’s no good way within the context of the puzzles to find out what mathematical trick you need if you don’t already know; you need to go find a virtual water cooler.
I may simply be biased because each year it reveals how little I know, but I much prefer interesting programming problems that don’t require me to go to Reddit, read other people’s description of what math is required, go learn the math involved, and then finally implement a solution.
Most of the problems are around searching a space for some solution or just simulating some state changes. Recent problems involved implementing a higher dimensional version of Conway's game of life[0], a simple arithmetic expression evaluator [1], or a simulator for a simple number game[2] e.g.
The most recent one[3] involves solving a jigsaw puzzle by using a simple backtracking search (or any number of other methods). It's a bit complex, but not reliant on a particular math trick.
The vast majority of the problems in advent of code shouldn't require any math tricks, though they're often complex and involved, particularly as the month goes on.
[0] https://adventofcode.com/2020/day/17
[1] https://adventofcode.com/2020/day/18
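The higher-dimensional game of life is a good example of how small these simulations are. A sketch of a single step under the AoC 2020 day 17 rules (a sparse set of active (x, y, z) cells; an active cell survives with 2 or 3 active neighbors, an inactive cell activates with exactly 3):

```python
from itertools import product
from collections import Counter

def step(active):
    """One step of 3-D life over a sparse set of (x, y, z) cells."""
    neigh = Counter()
    for cell in active:
        for d in product((-1, 0, 1), repeat=3):
            if d != (0, 0, 0):
                neigh[tuple(c + dc for c, dc in zip(cell, d))] += 1
    # n == 3 activates (or keeps) a cell; n == 2 only keeps an active one.
    return {c for c, n in neigh.items()
            if n == 3 or (n == 2 and c in active)}
```

Keeping the board as a set of active cells rather than a dense array is what makes the jump from 2-D to 3-D (or 4-D) a one-line change.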
For 2015, and 2020 so far, it's mostly text parsing, data structure building, and basic iteration/permutations.
I stopped doing advent this year when I realized I was spending more time debugging my parsing than I was actually solving the problems. It's just not that fun for me to sometimes spend 2+ hours finding parse bugs before moving on to the actual puzzle.
They are almost all algorithm based rather than math based.
https://github.com/lexborisov/Modest
https://github.com/ArthurHub/HTML-Renderer
Along the same lines, some other challenging projects I recommend are to write decoders/renderers for existing formats like MP3, MP4, PDF, etc.
The book guides the reader in implementing a graphical web browser, starting with HTTP and HTML then moving on to the layout, the box model, CSS, browser chrome, forms, and scripts.
[0] https://browser.engineering
All my projects start with me thinking like that, then many hours, days or months later me thinking "hey it was more complex than I thought".
For 2021 I want to build a personal finance app for myself. The usual me thinks it will take a couple months. The realist me wonders if it will be finished in this decade :)
test <1 becomes test 1
Test< 2 becomes test 2
Test <a becomes test
Test < b becomes test b
(From memory)
What about: Test <fakeTag>?
Per tests I did, "test " was expected; however "test <fakeTag>" was seen as the plaintext version, suggesting there's a list of valid tags which is filtering the behavior.
The full details are in here somewhere: https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht...
It is always working on all the HTML files I have, but then people make new HTML files with other issues.
A fun project I did after that was writing an AI frame language to do goal-stack problem solving, specifically with path finding. I connected it to the ray tracer and made movies of spheres having wars. (I used an unlicensed DivX encoder to stitch together thousands of GIFs.)
You can very quickly get something on the screen with it, although getting an intuition for how it works may take a little longer and some reading around the concepts it introduces. But it does let you focus on the maths rather than worrying about also learning how Web/OpenGL works too.
- Count the number of zero crossings
- Find out where they are
- Create any shape of wave by adding together multiple sine waves
- Hard clip the signal
- Stretch a signal and interpolate it with new samples
- Invert and revert a signal
For level 2, you can start processing "live":
- Create a sine synthesizer
- Create a small ring buffer of samples
- Find out how to output that audio (system audio, soundcard)
- Add MIDI support
- Add polyphony support
DSP gets hard once it has to be in real time and the latency has to be minimal. It's great exercise to mess around with it.
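A level-1 exercise like generating a sine and counting its zero crossings needs nothing beyond the standard library. A quick sketch:

```python
import math

def make_sine(freq_hz, sample_rate, n_samples):
    """Generate a sine wave as a list of float samples in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    count = 0
    for a, b in zip(samples, samples[1:]):
        if (a >= 0) != (b >= 0):
            count += 1
    return count

# A 440 Hz tone over one second crosses zero roughly twice per cycle.
tone = make_sine(440, 44100, 44100)
print(count_zero_crossings(tone))
```

From there, finding *where* the crossings are, summing several `make_sine` outputs for additive synthesis, or clipping each sample to a range are all small variations on the same loop.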
I guess it would depend on how good the data is and which keyboard is being used.
I know that this is possible on Android.
Better yet: do it in C. There's no "dictionary" object type, so you have to make it yourself. You'll soon unlearn a whole bunch of fallacies about how those "dictionaries" actually work. After you've spent a good deal of time doing that, you can switch to authentication/authorization, logging, storage, tracing, API management, resource quotas, and a raft of distributed computing issues.
I recommend basing it on Consul, it has a better general model than etcd.
Why would you want to learn C? To better understand the machine at a fairly low level. I think there’s still a lot of value in that. I’ve found that programmers who never learned C often don’t fully understand how memory management works, for example (not that that necessarily makes them bad programmers!)
Most other non-garbage-collected languages would do the trick, like Rust or C++. But C arguably still has special value in that it’s a lot simpler than either of those -- no higher-level constructs or abstractions to distract you. Maybe Zig will be able to take over that role.
I love, most of all, how modular the project is. I can do an hour here or there and make meaningful progress.
I'm really eager to discover other very large programming projects that break down into sensible bites so well.
Your comment prompted me to go look up some other attempts, and I’m really glad I did. It seems much more approachable to me now. Thanks!
(the flip side of 80-20 is: "all systems are fault tolerant, it's just that in most of them, the human is the component which tolerates the faults")
Of my IT daily drivers, I've done toy:
- web browser / server
- email client
- document formatter
- text editor
- window manager
- 3D / 2D graphic slicers/rasterizers w/ alphabetic fonts
- shell
- interpreters / compilers
- operating system
- VHDLish CPU
- various data encodings (Hamming, MFM, etc.)
- discrete transistor logic
(when I was just starting to program, I discovered the home directory of a colleague of my father's contained many experiments of this kind, and reading his work taught me C)

[0] https://alessandrocuzzocrea.com/how-i-made-a-ray-tracer/
It's much easier than it may seem, architecting it is interesting, and there is a lot of "last 10%" stuff which keeps it fun as long as you keep going.
In the demystifying area, it demystified HTML and JS history for me, forced me to work with a minimal toolkit, and taught me how to build "modern" JS features in ways which will not break browsers which don't know how to do them or have them disabled.
Make a small 2D platform game, and it covers so many areas (and it is a lot of fun!).
Like, you probably won't make a lot of money making another 2D platformer no matter how well you code; they are so easy to make that there are millions of them out there already. However, if you make a performant and bug-free Factorio or Minecraft clone, you will at least get a few thousand people to try it, and from there it would grow if it is fun.
But I can tell you there is no shortage of oversellers in the programming community.
A programmer who's really a sleazy salesman at heart would jump at figuring out a FizzBuzz solution on their own for the first time in their life, and would proceed to mark it as the biggest achievement in the history of mankind, make a webservice out of it, shout at the top of their lungs on social media, and start knocking on VC doors. (hint hint: many of them actually get away with something not too different)
It starts with basic things like waypoint systems vs. area awareness systems plus the relevant routing algorithms like A*, but goes on to organizing a group of players and finding good strategies. And all of that with a limited time budget and a changing environment around you. Last but not least, you want to emulate human behavior, which is probably the hardest part as it includes changing your behavior according to your situation (don't run straight against a wall for 10 seconds) but also taking into account human weaknesses, e.g. that humans can't aim perfectly.
Granted, what I have done has a huge field of challenges, but even with a 2D engine I think you can learn a lot from the experience.
A long time ago I wanted to code a neogeo emulator but gave up before I even started, I didn't have a clue where to begin.
I am amazed at anyone that can code an emulator from scratch.
it mostly boils down to keeping a bunch of registers and a giant switch statement. Each case simply implements the opcode. You have an array of bytes for the memory, and some emulated devices (e.g. trigger a screen update when the framebuffer memory gets changed, or set the instruction pointer to a handler when a key gets pressed.)
It gets hard when instructions need lots of decoding, or you have 3d graphics hardware to emulate, or if you have something like a BIOS, or if you want to JIT.
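The "bunch of registers and a giant switch statement" shape looks like this in miniature (a made-up toy instruction set for illustration, not any real CPU's encoding):

```python
def run(memory, entry=0):
    """Toy fetch-decode-execute loop: 4 registers, a few invented opcodes."""
    regs = [0, 0, 0, 0]
    pc = entry
    while True:
        op = memory[pc]                    # fetch
        if op == 0x00:                     # HALT
            return regs
        elif op == 0x10:                   # LOAD r, imm
            r, imm = memory[pc + 1], memory[pc + 2]
            regs[r] = imm
            pc += 3
        elif op == 0x20:                   # ADD r, s   (r += s, 8-bit wrap)
            r, s = memory[pc + 1], memory[pc + 2]
            regs[r] = (regs[r] + regs[s]) & 0xFF
            pc += 3
        elif op == 0x30:                   # JNZ r, addr
            r, addr = memory[pc + 1], memory[pc + 2]
            pc = addr if regs[r] != 0 else pc + 3
        else:
            raise ValueError(f"bad opcode {op:#x} at {pc:#x}")

# LOAD r0,2; LOAD r1,3; ADD r0,r1; HALT
prog = [0x10, 0, 2, 0x10, 1, 3, 0x20, 0, 1, 0x00]
print(run(prog))  # [5, 3, 0, 0]
```

A real system like CHIP-8 adds memory-mapped devices on top (framebuffer, keypad, timers), but the inner loop keeps exactly this shape.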
I'm actually eyeing implementing a console on an FPGA as my holiday project, with something like Chisel.
Although that's not "from scratch" because it would still be using their libraries and plumbing, it could be "from scratch" in the sense that you might take an environment with 0% support for emulating a certain device or system and build it up to having 100% support. So it seems like a nice way to start.
Before attempting to do so I thought it was implemented as a simple seek over the string, maybe a bunch of regex stuff. I guess it can be done that way, at the cost of growing complexity; but the proper solution (with a stack, etc.) is so elegant (making it easy to add functions, operators, parentheses, variables, etc.) that it really makes one appreciate the value of good, thoughtful engineering.
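The stack-based approach alluded to here is the classic two-stack (shunting-yard style) algorithm: one stack for values, one for pending operators. A minimal sketch (no unary minus or error recovery, for brevity):

```python
import re

def evaluate(expr):
    """Evaluate + - * / with parentheses using two stacks."""
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    ops, vals = [], []

    def apply(op):
        b, a = vals.pop(), vals.pop()
        vals.append({'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op])

    for tok in re.findall(r'\d+(?:\.\d+)?|[-+*/()]', expr):
        if tok[0].isdigit():
            vals.append(float(tok))
        elif tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops[-1] != '(':
                apply(ops.pop())
            ops.pop()                       # discard the '('
        else:
            # pop anything of equal or higher precedence first
            while ops and ops[-1] != '(' and prec[ops[-1]] >= prec[tok]:
                apply(ops.pop())
            ops.append(tok)
    while ops:
        apply(ops.pop())
    return vals[0]

print(evaluate("2 + 3 * (4 - 1)"))  # 11.0
```

The elegance the comment mentions shows up when you extend it: a new binary operator is one entry in `prec` and one line in `apply`, and functions and variables slot into the same token loop.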
Has anyone tried CGI? If so, how has your experience been so far?
I want to try this. Where can you get access to historical pricing data that includes pricing changes during the day, not just end-of-day prices?