A little bit of context: I have done a lot of Hadoop, and I'm also well aware of Spark and Storm. Storm is mostly suited to handling a stream of real-time data. Spark is built for running iterative algorithms: it can read from HDFS, and with the expressiveness of Scala, it's great for building machine-learning related stuff.
However, 5 GB of data is practically nothing, and that statement holds until your data size is at least 50-60 GB. Given that machines with 64 GB of RAM are now commodity, I would just load the entire thing into RAM and write a multi-threaded program. Sounds old school, but however well documented Hadoop, Spark, and Storm are, there is still a learning curve and a maintenance cost, both of which are only worth paying if you see your data rapidly growing to the X TB range. Otherwise, it might just be easier to stick it on a single machine and get stuff done.
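To make the "old school" option concrete, here is a minimal sketch of what I mean, in Java: load the data into memory and fan the work out across cores with a parallel stream. The word-count task and the `InMemoryCount`/`wordCounts` names are my own illustration, not anything from a specific framework.

```java
import java.util.*;
import java.util.stream.*;

// Sketch: process a dataset entirely in memory with a multi-threaded
// (parallel) stream instead of standing up a Hadoop/Spark cluster.
public class InMemoryCount {
    // Count occurrences of each token across all lines, in parallel.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.parallelStream()
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(tok -> !tok.isEmpty())
                    .collect(Collectors.groupingByConcurrent(
                            tok -> tok, Collectors.counting()));
    }

    public static void main(String[] args) {
        // In practice you'd stream the file in with Files.lines(Path);
        // a tiny in-memory stand-in here:
        List<String> lines = Arrays.asList("spark storm hadoop", "spark spark");
        System.out.println(wordCounts(lines));
    }
}
```

A few GB of text fits comfortably in a 64 GB heap, and `parallelStream` saturates all cores with no cluster, no scheduler, and no ops burden.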
You can stick to Scala/Java, and as long as you develop good abstractions around your core algorithms, you can always move to Spark/Hadoop when you need to. Feel free to send me an email if you want to talk more (email in profile).
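What "good abstractions" might look like, sketched in Java: keep the core operation behind a small interface so the single-machine version can later be swapped for a cluster-backed one without touching callers. The `Aggregator`/`LocalAggregator` names are hypothetical, for illustration only.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

// The algorithm's contract, independent of the execution engine.
interface Aggregator {
    Map<String, Long> countBy(List<String> records, Function<String, String> key);
}

// Single-machine implementation: plain parallel streams, no cluster.
class LocalAggregator implements Aggregator {
    public Map<String, Long> countBy(List<String> records,
                                     Function<String, String> key) {
        return records.parallelStream()
                      .collect(Collectors.groupingByConcurrent(
                              key, Collectors.counting()));
    }
}
// When the data outgrows one machine, a SparkAggregator implementing the
// same interface could do the equivalent grouping on an RDD, and the rest
// of the code base would not change.
```

The point is that the migration cost is confined to one class, so you defer the Hadoop/Spark learning curve until the data actually demands it.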