- You are trading in a market with low liquidity or one that is controlled by a small number of market participants. I'm not an expert, but I think this applies more to markets like penny stocks and less to big markets like forex for major currency pairs.
- You are not taking transaction costs into account or not doing so properly
- Your bot makes a low number of trades, making the results close or equivalent to lucky coin flips
- Your bot is simply making trades that cannot be executed, or may be doing simulated trades of something that is not actually tradable. This applies to a large number of research papers that assume you can just buy and trade the S&P 500 itself. You can trade ETFs that are tied to an index, but an index is not a tradable instrument in and of itself. Once you realize this, a lot of papers seem very weird.
- You are not modelling other aspects of the trading process realistically, such as assuming the bot has infinite funds to trade, allowing it to take unlimited losses and continue trading when in reality you'd be hit with a margin call and your trading would be stopped
- Your code is committing any number of data snooping errors where the bot is asked to trade at time A (say the open of a trading session) but has access to future data (say the closing price of that day, future data that would not actually exist in a live environment)
- Depending on what you believe about how market conditions change over time, your bot may have worked in the past but would not work if used today. I.e., the market may have adapted to whatever edge your bot may have discovered
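The data-snooping bullet is the easiest of these to commit by accident and the easiest to demonstrate. A toy sketch (made-up bars, not a real strategy): the buggy backtest decides at the open using that same bar's close, so it only ever "trades" on profitable days.

```python
# Hypothetical daily bars; each row is one completed session.
bars = [
    {"open": 100.0, "close": 103.0},
    {"open": 103.5, "close": 101.0},
    {"open": 101.2, "close": 104.8},
]

def backtest_with_lookahead(bars):
    """BUG: decides at the open using that same day's close --
    information that would not exist yet in a live environment."""
    pnl = 0.0
    for bar in bars:
        if bar["close"] > bar["open"]:      # peeking at the future
            pnl += bar["close"] - bar["open"]
    return pnl

def backtest_honest(bars):
    """Decides at today's open using only yesterday's (known) close."""
    pnl = 0.0
    for prev, bar in zip(bars, bars[1:]):
        if prev["close"] > prev["open"]:    # signal from a completed bar
            pnl += bar["close"] - bar["open"]
    return pnl

print(backtest_with_lookahead(bars))  # flattering: wins every trade
print(backtest_honest(bars))          # same "strategy", now it can lose
```

The lookahead version looks like free money on any dataset; the honest version reveals there was never a signal at all.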
There are probably lots more pitfalls I don't even know about since I'm not an actual trader.
I'm not discouraging anyone from playing around or trying things, of course. I think it's great fun, which is why I do it.
Here's the good news: if you realize you don't actually have an edge and avoid risking your hard-earned money, you come out ahead of almost all people who ever trade.
* Doing a latency-sensitive trade when you don't have good execution. It's easy to go wild in simulation and think you can flip in and out of positions. But if you're a retail trader (and in this context, by that I mean "not connected directly to the exchanges, at the minimum"), the fills your simulation assumes simply won't be there in practice.
* Not taking into account the impact of your own trading on markets. This is obviously impossible to really simulate. Sometimes you can ignore it (if there's enough liquidity) but I've seen trades that looked great on paper and when trading at small-ish sizes, but then when you try to crank it up and do more volume, prices run away from you.
Obviously, there is money to be made in algo trading. It's big business, and obviously not everyone is doing crazy latency-sensitive stuff: there are quant trades that you could probably do with execution available to retail traders. And honestly I wouldn't be surprised if some retail traders manage it. But I will say that I think for an individual, it's not worth the effort even if you are one of the ones who can consistently make a profit. Just buy S&P 500 ETFs, sit on them, and do something else with your time.
I've found that to be the case a lot. People forget that on the other end of the price is a thinking human, or robo delegate thereof, trying to outsmart you. It's really hard to say what will happen until you try it with real money / assets.
I hear these horrid tales of people killing themselves over bad trades and lines of credit and I wonder how much is poorly written code and how much is the chaos and whim of the market (don't worry, I'm not going to try algo trading).
It didn't make money in the real world because the quality of decision that could be made at that speed wasn't good enough.
If you do too much of this you will get angry exchange network people yelling at you, as they may consider it spam/DDoS. Some exchanges explicitly limit this.
It's also worth considering the possible gain. If you're in the FPGA (0-150 ns) or CPU (300 ns-3 µs) space, the math can come out differently.
Failing the checksum was sort of common as well, but exchanges don't like it.
These days most tricks have to do with avoiding as much serialisation time as possible, e.g. by sending ahead part of the TCP payload and filling in with some innocent order if there was no opportunity.
Replace widget with liquidity and factories with traders and you should immediately see why most traders are losing money. They are operating in markets where their liquidity simply isn't needed.
If you want to make money off of trading then you have to find a market with low liquidity, where having more traders is actually welcome. But why do the hard work if you can just invest your money and still get good gains without working?
- management fees, margins, and available capital can easily be modelled properly
- you can easily set up constraints (like no fractional trading for your S&P example)
- you can set up a proper point-in-time database to avoid snooping (especially if you're using earnings reports or other fundamental data, which is often actually released _after_ its published release date...)
- you can set up regime-shifting simulation environments (various market conditions)
- you can avoid over-fitting if you're back-testing (with dozens of techniques, most notably: test once and forget about parameter optimization)
I would say that with paper trading and back-testing the serious problems are that:
- your orders don't show up on the book so no one sees and reacts to your limit orders
- your "filled" orders don't affect the book, so you're not affecting liquidity, so the market doesn't change in response to your trading
- your bot has no access to market micro-structure strategies and conditional orders (and if you want to trade fast or are placing big trades you need them)
These are the problems that make any simulation unrealistic, and they are fundamental. It's shadowboxing, which is not entirely devoid of value, but which is certainly insufficient on its own.
(I've worked as a quant developing strategies for several funds these past 15 years)
My main mistake was using the historical trades instead of the historical offers as a testing dataset.
Out of curiosity, what's the difference?
A couple of years ago I implemented a toy regex engine from scratch (building NFAs then turning them into DFAs). I thought it was an enlightening experience because it showed me that the core principles behind regular languages are fairly simple, although you could spend years optimizing and improving your implementation. How do you deal with unicode? How do you modify your implementation to know how many characters you can skip if you don't have a match in order to avoid testing every single position in a file?
It demystified the concept of a regex engine for me while at the same time making me realize how impressive the advanced, ultra optimized engines we use and take for granted are.
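The NFA-to-DFA step mentioned above is the classic subset construction: each DFA state is a set of NFA states. A minimal sketch (hand-built NFA over {a, b} recognizing strings that end in "ab"; epsilon transitions omitted for brevity, though a real engine would epsilon-close each set):

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepts):
    """Subset construction. nfa maps (state, symbol) -> set of next
    states; the resulting DFA maps (frozenset, symbol) -> frozenset."""
    symbols = {sym for (_, sym) in nfa}
    start_set = frozenset([start])
    dfa, queue, seen = {}, deque([start_set]), {start_set}
    while queue:
        cur = queue.popleft()
        for sym in symbols:
            nxt = frozenset(s2 for s in cur for s2 in nfa.get((s, sym), ()))
            dfa[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    dfa_accepts = {st for st in seen if st & accepts}
    return dfa, start_set, dfa_accepts

# NFA: 0 --a--> {0,1}, 0 --b--> {0}, 1 --b--> {2}; accept state 2.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
dfa, s0, acc = nfa_to_dfa(nfa, 0, {2})

def run(text):
    state = s0
    for c in text:
        state = dfa[(state, c)]
    return state in acc

print(run("aab"), run("aba"))  # True False
```

The exponential worst case lives in `seen`: a pathological NFA can force the construction to visit nearly every subset, which is exactly where those years of optimizing come in.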
I put a bunch of links and quotes about that here, including nascent implementations:
http://www.oilshell.org/blog/2020/07/ideas-questions.html
Also related: http://www.oilshell.org/blog/2020/07/eggex-theory.html
About Unicode, this derivatives project (with video linked in the post) appears to be motivated by Unicode support (though I don't recall exactly why, something about derivatives makes it easier?).
https://github.com/MichaelPaddon/epsilon
https://github.com/MichaelPaddon/epsilon/blob/master/epsilon...
If anyone wants to write a glob engine for https://www.oilshell.org/ let me know :) Right now we use libc but there are a couple reasons why we might have our own (globstar and extended globs)
Trivia: extended globs in bash give globs the power of regular expressions, e.g.
[[ abcXabcXXabcabc == +(abc|X) ]] ; echo $?
0
where +(abc|X) is equivalent to (abc|X)+ in "normal" regex syntax, and == is, very confusingly, the fnmatch() operator.

The derivatives approach makes Unicode support easier since it's able to keep the symbol sets for each transition edge (in the DFA) more compact by virtue of supporting negation. If you add in aggressive term normalization, hash-consing, and an efficient dense-set implementation (all of which I've done in my implementation), the derivatives approach can be extremely fast, even when generating the DFA for something like the lexer of a full programming language (in my case, F#).
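For the curious, the core of the derivatives approach (Brzozowski derivatives) fits in a page. A minimal sketch without the normalization, hash-consing, or symbol-set machinery described above — matching works by repeatedly taking the derivative of the regex with respect to each input character, then checking whether the final regex matches the empty string:

```python
# Regex AST as tuples; derivatives computed symbolically.
EMPTY, EPS = ('empty',), ('eps',)   # matches nothing / the empty string

def char(c): return ('chr', c)
def seq(a, b): return ('seq', a, b)
def alt(a, b): return ('alt', a, b)
def star(a): return ('star', a)

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ('eps', 'star'): return True
    if tag in ('empty', 'chr'): return False
    if tag == 'seq': return nullable(r[1]) and nullable(r[2])
    if tag == 'alt': return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    """Brzozowski derivative of r with respect to character c."""
    tag = r[0]
    if tag in ('empty', 'eps'): return EMPTY
    if tag == 'chr': return EPS if r[1] == c else EMPTY
    if tag == 'alt': return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == 'star': return seq(deriv(r[1], c), r)
    if tag == 'seq':
        d = seq(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# (ab)* matches "" and "abab" but not "aba"
r = star(seq(char('a'), char('b')))
print(matches(r, "abab"), matches(r, "aba"))  # True False
```

Without term normalization the intermediate regexes grow on every character, which is why the optimizations above matter: memoizing normalized derivatives is what turns this interpreter into a DFA generator.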
* searching
* implementation of automata in electronic circuits
* challenges of formal specifications for things like protocols and grammars, as well as for verifying their correctness; implementation strategies for applying these specifications
* computability and complexity
* programming language theory
* history of computer science
* LANGSEC arguments
in addition to having an austere mathematical beauty.
It's poor advice for someone who already has a STEM degree and wants to build something useful and profitable. If you already know how these things work, your time is better spent on the "edge of the circle": http://matt.might.net/articles/phd-school-in-pictures/ which applies to businesses and startups as well.
If you're in the latter group -- you've already got the skills to build real shit. Don't waste your time on homework problems. Find a problem you have and build a solution for it. Don't listen to people who tell you to work on homework problems that have already been solved; it's a complete waste of your time if you already know the fundamentals.
As for stock trading bots -- if you don't have a mathematics degree or equivalent (e.g. having incredible math skills), don't even bother. You won't be profitable, and you will learn nothing useful in the process, because you will approach the problem as a naive CS student would. Smarter people than you have made trading bots and have failed miserably. Without having an extremely strong foundation in mathematics, your trading bot will amount to nothing more than a futile exercise in gluing APIs together.
I disagree. Not everything is about business and money. Many people already build "real shit" for a living and want to simply have fun building other things, and focus on the cool parts, and not all the boring parts involved in a commercial project.
Also, CS is constantly evolving. Nobody knows the "fundamentals" once and for all. A ray tracer is still a ray tracer, but languages and technologies have changed immensely in just a few years. Git didn't exist 15 years ago. A language like Rust is 10 years old. React is 7 years old. We need these homework problems simply to keep up to date.
None of them was innovative (as in something new), nor do they belong to the CS fundamentals.
I'm having fun, and that's reason enough.
Which is why lists like this help.
> They are programmers, not creative after all.
Damn, this is a bold comment on HN. Since when does profession determine how creative you are? I've met "programmers" and "geeks" who are more creative than "artists". Your profession has little to do with your innate creativity -- it just determines how you are able to express that creativity.
Unpopular opinion, but lists like this are stupid for people who are trying to build companies. You have to try things and be pissed off at the status quo to find real problems. Nobody is going to find real problems for you, in the same way that no quant school is going to reveal their hedge fund's trading strategy to you. Finding ideas in a list is the last advice I would give to anyone. If it's public, it's probably not a profitable idea.
Fast forward a few years, I am taking CS219 with Prof Stark (random that I remember the course number) who is hard-core and really tough, since it's year 2000 and the class is full of kids who are taking CS cuz it's "the thing" but have no passion or talent for programming.
Me, I love programming but I don't have it very much together attendance-wise so I accidentally miss the midterm. OOPS. And obviously there's a "zero make-up test" policy, but surprisingly the prof lets me do the part of it which is a take-home coding assignment, since you can't really benefit from prior knowledge of the questions.
My lucky stars - the test is to make a rudimentary subset of - you guessed it - Tetris. Which I had "solved" for myself a year or two earlier. Apparently I was the only one in the class to nail the implementation.
2. If a course is super hard, maybe the class isn't 'full' of people who are just doing it to 'be cool'; maybe that is your judgement and it does not reflect reality.
3. Great that you solved Tetris beforehand, but is there a point here? Are you implying that high school you was smarter than university peers?
Sorry, but your post seems a little elitist, even though it's just an anecdote.
And it brings back a fun memory:
I wrote MacTetris[1] when I was in high school. This made the computer lab a whole lot more popular during free periods than it had been previously.
Two interesting bugs that I recall:
* Mathematically rotating pieces around an axis was a terrible idea, but it produced some entertaining artifacts (and made placement much harder!). I replaced the math by precomputed rotation maps for each piece, which was much better. My first pass at the maps introduced a displacement bug, so you could spin the pieces counterclockwise and they would walk in the negative X direction.
* I got an angry bug report in the lunch room from a kid who had no reason to know my name. He was having a really great game, and then his score started decreasing with every piece. He felt like his record high score had been stolen from him, and he was upset! I asked "what was your score??". He said "I don't know, but by the time I noticed, it was over 30,000 but it was going DOWN!". Aha..[2] :)
[1] I'm sure the statute of limitations has expired on my appropriation of copyrights and trademarks.
[2] Back in the day, "int" meant "signed 16-bit integer", which is not the proper data type for a score counter.
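The precomputed-rotation-map fix from the first bug can be sketched like this (hypothetical 4-cell piece coordinates; the real MacTetris maps were presumably hand-tuned per piece). The key is that rotation plus re-normalization is computed once at startup, so gameplay only does table lookups — and a bug in the normalization step is exactly the kind of thing that makes pieces "walk" sideways:

```python
def rotate90(cells):
    """Rotate cells a quarter turn, then shift back into a 0-based box.
    Forgetting the shift is the displacement bug described above."""
    rot = [(y, -x) for x, y in cells]
    min_x = min(x for x, _ in rot)
    min_y = min(y for _, y in rot)
    return sorted((x - min_x, y - min_y) for x, y in rot)

def rotation_map(cells):
    """All four orientations of a piece, computed once at startup."""
    states = [sorted(cells)]
    for _ in range(3):
        states.append(rotate90(states[-1]))
    return states

s_piece = [(1, 0), (2, 0), (0, 1), (1, 1)]  # hypothetical S-piece
for state in rotation_map(s_piece):
    print(state)
```

Four applications of `rotate90` bring a piece back to its starting orientation, which makes the maps easy to sanity-check.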
Damn, that was good.
Having to learn everything needed to write this... it was unbelievably educational from a software design point of view
If you enjoy horror or third person fantasy adventure games, give it a shot. If you do, please fill out the survey at the end so I can make it better for the next release
What is an "intuitive ordinal notation"? Definition: The set of intuitive ordinal notations is the smallest set P of computer programs with the following property. For every computer program p, if, when p is run, all of p's outputs are elements of P, then p is in P.
So "End.", the program which immediately ends with no outputs, is vacuously in P (all of its outputs are in P, because it has no outputs). It notates the ordinal 0. Likewise, "Print(`End.')" is in P, because its sole output, "End.", is in P; it notates the ordinal 1. Likewise, "Print(`Print(`End.')')" is in P, notating the ordinal 2. And so on.
The above can be short-circuited: "Let X=`End'; While(True){Print(X); X=`Print(\`'+X+`\')'}". This program outputs "End.", "Print(`End.')", "Print(`Print(`End.')')", and so on forever, all of which are in P, so this program itself is in P. It notates omega, the smallest infinite ordinal.
Here's a library of examples in Python, currently going up to a notation for the ordinal omega^omega: https://github.com/semitrivial/IONs
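The omega program described above is easy to transliterate. A sketch that enumerates its first few outputs (quoting convention follows the comment's pseudocode, not necessarily the linked library; the real program loops forever, so we cap the enumeration):

```python
def omega_notation(n_terms):
    """First n_terms outputs of the program notating omega:
    "End.", then "Print(`End.')", then "Print(`Print(`End.')')", ..."""
    x = "End."
    outs = []
    for _ in range(n_terms):
        outs.append(x)
        x = "Print(`" + x + "')"
    return outs

for program in omega_notation(4):
    print(program)
```

Each emitted string is itself a program notating a finite ordinal, so the emitter notates the limit of all of them — omega.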
Web scraping to gather data, databases for storing it, ML for analyzing, front and backend web dev to show the daily information and adjust.
And instead of having to deal with trading regulations, contests can be really small and easy to enter. There are daily contests for 5 cents an entry, and you can enter 150 optimized lineups from an uploaded csv for $7.50 a day. You can really learn a ton.
Could you describe your approach? What type of information would you scrape?
In fact, this summer I wrote a 3-part tutorial series on implementing a BASIC compiler: Let's make a Teeny Tiny compiler (https://web.eecs.utk.edu/~azh/blog/teenytinycompiler1.html)
My first program ran on a CHIP-8 machine (COSMAC VIP), though I didn’t realize I was targeting an interpreter and not machine code.
Great series of articles!
Parsing (should be) easy; the backend is hard but well documented and trodden; but the semantic analysis and error handling is where the real murky water is (especially when you start trying to optimize it, like adding caching or threading or deferred execution).
You write a compiler for a Java-like language in several steps: a parser that organizes the raw code, then a compiler that emits the virtual machine code, then a translator between the virtual machine code and assembly (and then, between assembly and binary).
I've considered reworking it into a compiler, but never quite gotten around to it. Perhaps a challenge for the new year.
By the way, adventofcode.com is currently ongoing. Though the challenges are easy compared to the projects in this list, I highly recommend it. It covers problems you might face in big projects. With these small puzzles it's easy to experiment. It prepares you for bigger things.
Plus there’s no good way within the context of the puzzles to find out what mathematical trick you need if you don’t already know; you need to go find a virtual water cooler.
I may simply be biased because each year it reveals how little I know, but I much prefer interesting programming problems that don’t require me to go to Reddit, read other people’s description of what math is required, go learn the math involved, and then finally implement a solution.
Most of the problems are around searching a space for some solution or just simulating some state changes. Recent problems involved implementing a higher dimensional version of Conway's game of life[0], a simple arithmetic expression evaluator [1], or a simulator for a simple number game[2] e.g.
The most recent one[3] involves solving a jigsaw puzzle by using a simple backtracking search (or any number of other methods). It's a bit complex, but not reliant on a particular math trick.
The vast majority of the problems in advent of code shouldn't require any math tricks, though they're often complex and involved, particularly as the month goes on.
[0] https://adventofcode.com/2020/day/17
[1] https://adventofcode.com/2020/day/18
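The higher-dimensional game of life is a good example of how small these simulations are. A sketch of a single step under the AoC 2020 day 17 rules (a sparse set of active (x, y, z) cells; an active cell survives with 2 or 3 active neighbors, an inactive cell activates with exactly 3):

```python
from itertools import product
from collections import Counter

def step(active):
    """One step of 3-D life over a sparse set of (x, y, z) cells."""
    neigh = Counter()
    for cell in active:
        for d in product((-1, 0, 1), repeat=3):
            if d != (0, 0, 0):
                neigh[tuple(c + dc for c, dc in zip(cell, d))] += 1
    # n == 3 activates (or keeps) a cell; n == 2 only keeps an active one.
    return {c for c, n in neigh.items()
            if n == 3 or (n == 2 and c in active)}
```

Keeping the board as a set of active cells rather than a dense array is what makes the jump from 2-D to 3-D (or 4-D) a one-line change.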
For 2015, and 2020 so far, it's mostly text parsing, data structure building, and basic iteration/permutations.
I stopped doing advent this year when I realized I was spending more time debugging my parsing than I was actually solving the problems. It's just not that fun for me to sometimes spend 2+ hours finding parse bugs before moving on to the actual puzzle.
They are almost all algorithm based rather than math based.
https://github.com/lexborisov/Modest
https://github.com/ArthurHub/HTML-Renderer
Along the same lines, some other challenging projects I recommend are to write decoders/renderers for existing formats like MP3, MP4, PDF, etc.
The book guides the reader in implementing a graphical web browser, starting with HTTP and HTML then moving on to the layout, the box model, CSS, browser chrome, forms, and scripts.
[0] https://browser.engineering
All my projects start with me thinking like that, then many hours, days or months later me thinking "hey it was more complex than I thought".
For 2021 I want to build a personal finance app for myself. The usual me thinks it will take a couple months. The realist me wonders if it will be finished in this decade :)
test <1 becomes test 1
Test< 2 becomes test 2
Test <a becomes test
Test < b becomes test b
(From memory)
What about: Test <fakeTag>?
Per tests I did, "test " was expected; however "test <fakeTag>" was seen as the plaintext version, suggesting there's a list of valid tags which is filtering the behavior.
The full details are in here somewhere: https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht...
It is always working on all the HTML files I have, but then people make new HTML files with other issues.
A fun project I did after that was writing an AI frame language to do goal-stack problem solving, specifically with path finding. I connected it to the ray tracer and made movies of spheres having wars. (I used an unlicensed DivX encoder to stitch together thousands of GIFs.)
You can very quickly get something on the screen with it, although getting an intuition for how it works may take a little longer and some reading around the concepts it introduces. But it does let you focus on the maths rather than worrying about also learning how Web/OpenGL works too.
- Count the number of zero crossings
- Find out where they are
- Create any shape of wave by adding together multiple sine waves
- Hard clip the signal
- Stretch a signal and interpolate it with new samples
- Invert and revert a signal
For level 2, you can start processing "live":
- Create a sine synthesizer
- Create a small ring buffer of samples
- Find out how to output that audio (system audio, soundcard)
- Add MIDI support
- Add polyphony support
DSP gets hard once it has to be in real time and the latency has to be minimal. It's great exercise to mess around with it.
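A level-1 exercise like generating a sine and counting its zero crossings needs nothing beyond the standard library. A quick sketch:

```python
import math

def make_sine(freq_hz, sample_rate, n_samples):
    """Generate a sine wave as a list of float samples in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    count = 0
    for a, b in zip(samples, samples[1:]):
        if (a >= 0) != (b >= 0):
            count += 1
    return count

# A 440 Hz tone over one second crosses zero roughly twice per cycle.
tone = make_sine(440, 44100, 44100)
print(count_zero_crossings(tone))
```

From there, finding *where* the crossings are, summing several `make_sine` outputs for additive synthesis, or clipping each sample to a range are all small variations on the same loop.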
I guess it would depend on how good the data is and which keyboard is being used.
I know that this is possible on Android.
Better yet: do it in C. There's no "dictionary" object type, so you have to make it yourself. You'll soon unlearn a whole bunch of fallacies about how those "dictionaries" actually work. After you've spent a good deal of time doing that, you can switch to authentication/authorization, logging, storage, tracing, API management, resource quotas, and a raft of distributed computing issues.
I recommend basing it on Consul, it has a better general model than etcd.
Why would you want to learn C? To better understand the machine at a fairly low level. I think there’s still a lot of value in that. I’ve found that programmers who never learned C often don’t fully understand how memory management works, for example (not that that necessarily makes them bad programmers!)
Most other non-garbage-collected languages would do the trick, like Rust or C++. But C arguably still has special value in that it’s a lot simpler than either of those -- no higher-level constructs or abstractions to distract you. Maybe Zig will be able to take over that role.
I love, most of all, how modular the project is. I can do an hour here or there and make meaningful progress.
I'm really eager to discover other very large programming projects that break down into sensible bites so well.
Your comment prompted me to go look up some other attempts, and I’m really glad I did. It seems much more approachable to me now. Thanks!
(the flip side of 80-20 is: "all systems are fault tolerant, it's just that in most of them, the human is the component which tolerates the faults")
Of my IT daily drivers, I've done toy:
- web browser / server
- email client
- document formatter
- text editor
- window manager
- 3D / 2D graphic slicers/rasterizers w/ alphabetic fonts
- shell
- interpreters / compilers
- operating system
- VHDLish CPU
- various data encodings (Hamming, MFM, etc.)
- discrete transistor logic
(when I was just starting to program, I discovered the home directory of a colleague of my father's contained many experiments of this kind, and reading his work taught me C)

[0] https://alessandrocuzzocrea.com/how-i-made-a-ray-tracer/
It's much easier than it may seem, architecting it is interesting, and there is a lot of "last 10%" stuff which keeps it fun as long as you keep going.
In the demystifying area, it demystified HTML and JS history for me, forced me to work with a minimal toolkit, and taught me how to build "modern" JS features in ways which will not break browsers which don't know how to do them or have them disabled.
Make a small 2D platform game, and it covers so many areas (and it is a lot of fun!).
Like, you probably won't make a lot of money making another 2D platformer no matter how well you code; they are so easy to make that there are millions of them out there already. However, if you make a performant and bug-free Factorio or Minecraft clone, you will at least get a few thousand people to try it, and from there it would grow if it is fun.
But I can tell you there is no shortage of oversellers in the programming community.
A programmer who's really a sleazy salesman at heart would jump at figuring out a FizzBuzz solution on their own for the first time in their life, and would proceed to mark it as the biggest achievement in the history of mankind, make a webservice out of it, shout at the top of their lungs on social media, and start knocking on VC doors. (hint hint: many of them actually get away with something not too different)
It starts with basic things like waypoint systems vs. area awareness systems plus the relevant routing algorithms like A*, but goes on to organizing a group of players and finding good strategies. And all of that with a limited time budget and a changing environment around you. Last but not least, you want to emulate human behavior, which is probably the hardest part as it includes changing your behavior according to your situation (don't run straight against a wall for 10 seconds) but also taking into account human weaknesses, e.g. that humans can't aim perfectly.
Granted, what I have done has a huge field of challenges, but even with a 2D engine I think you can learn a lot from the experience.
A long time ago I wanted to code a neogeo emulator but gave up before I even started, I didn't have a clue where to begin.
I am amazed at anyone that can code an emulator from scratch.
it mostly boils down to keeping a bunch of registers and a giant switch statement. Each case simply implements the opcode. You have an array of bytes for the memory, and some emulated devices (e.g. trigger a screen update when the framebuffer memory gets changed, or set the instruction pointer to a handler when a key gets pressed.)
It gets hard when instructions need lots of decoding, or you have 3d graphics hardware to emulate, or if you have something like a BIOS, or if you want to JIT.
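The "bunch of registers and a giant switch statement" shape looks like this in miniature (a made-up toy instruction set for illustration, not any real CPU's encoding):

```python
def run(memory, entry=0):
    """Toy fetch-decode-execute loop: 4 registers, a few invented opcodes."""
    regs = [0, 0, 0, 0]
    pc = entry
    while True:
        op = memory[pc]                    # fetch
        if op == 0x00:                     # HALT
            return regs
        elif op == 0x10:                   # LOAD r, imm
            r, imm = memory[pc + 1], memory[pc + 2]
            regs[r] = imm
            pc += 3
        elif op == 0x20:                   # ADD r, s   (r += s, 8-bit wrap)
            r, s = memory[pc + 1], memory[pc + 2]
            regs[r] = (regs[r] + regs[s]) & 0xFF
            pc += 3
        elif op == 0x30:                   # JNZ r, addr
            r, addr = memory[pc + 1], memory[pc + 2]
            pc = addr if regs[r] != 0 else pc + 3
        else:
            raise ValueError(f"bad opcode {op:#x} at {pc:#x}")

# LOAD r0,2; LOAD r1,3; ADD r0,r1; HALT
prog = [0x10, 0, 2, 0x10, 1, 3, 0x20, 0, 1, 0x00]
print(run(prog))  # [5, 3, 0, 0]
```

A real system like CHIP-8 adds memory-mapped devices on top (framebuffer, keypad, timers), but the inner loop keeps exactly this shape.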
I'm actually eyeing implementing a console on an FPGA as my holiday project, with something like Chisel.
Although that's not "from scratch" because it would still be using their libraries and plumbing, it could be "from scratch" in the sense that you might take an environment with 0% support for emulating a certain device or system and build it up to having 100% support. So it seems like a nice way to start.
Before attempting to do so I thought it was implemented as a simple seek over the string, maybe a bunch of regex stuff. I guess it can be done that way, at the cost of growing complexity; but the proper solution (with a stack, etc.) is so elegant (making it easy to add functions, operators, parentheses, variables, etc.) that it really makes one appreciate the value of good, thoughtful engineering.
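The stack-based approach alluded to here is the classic two-stack (shunting-yard style) algorithm: one stack for values, one for pending operators. A minimal sketch (no unary minus or error recovery, for brevity):

```python
import re

def evaluate(expr):
    """Evaluate + - * / with parentheses using two stacks."""
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    ops, vals = [], []

    def apply(op):
        b, a = vals.pop(), vals.pop()
        vals.append({'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op])

    for tok in re.findall(r'\d+(?:\.\d+)?|[-+*/()]', expr):
        if tok[0].isdigit():
            vals.append(float(tok))
        elif tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops[-1] != '(':
                apply(ops.pop())
            ops.pop()                       # discard the '('
        else:
            # pop anything of equal or higher precedence first
            while ops and ops[-1] != '(' and prec[ops[-1]] >= prec[tok]:
                apply(ops.pop())
            ops.append(tok)
    while ops:
        apply(ops.pop())
    return vals[0]

print(evaluate("2 + 3 * (4 - 1)"))  # 11.0
```

The elegance the comment mentions shows up when you extend it: a new binary operator is one entry in `prec` and one line in `apply`, and functions and variables slot into the same token loop.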
Has anyone tried CGI? If so, how has your experience been so far?
I want to try this. Where can you get access to historical pricing data that includes pricing changes during the day, not just end-of-day prices?