Those who follow LLM fine-tunes may know of a company called Nous Research, which has been releasing a series of fine-tuned models called Hermes that seem to perform well.
Since post-training is relatively cheap compared to pre-training, I also want to get into post-training and fine-tuning. Given that I'm GPU poor, with only an M4 MBP and some Tinker credits, I was wondering if you have any advice and/or recommendations for getting into post-training? For instance, do you think this book https://www.manning.com/books/the-rlhf-book is a good place to start? If not, what are your other recommendations?
I’m also currently reading “Hands-On Large Language Models” and “Build a Large Language Model (From Scratch)”, if that helps.
Many thanks for your time!
Recently I came across the Moth Fund [0], which seems like an interesting VC firm to me, since it appears to be run by a single person. The vibe conveyed by its website is also different from that of other VC firms. So for those whose companies have been backed by Moth Fund: may I ask what your experience has been like?
Many thanks!
[0]: https://www.mothfund.com/
I learned some basic ML from Andrew Ng's Coursera course more than 10 years ago. I recently graduated from a master's program in math and have some free time on my hands, so I am thinking about picking up ML/DL again.
In [Yacine's video](https://www.youtube.com/watch?v=ph6PIchDOcQ), he mentions [fast.ai's course](https://course.fast.ai/), which I had heard of in the past but never looked into much. The table of contents of [the book](https://www.amazon.com/Deep-Learning-Coders-fastai-PyTorch/dp/1492045527) looks pretty solid, but it was published in 2020. So given the pace of AI development, do you think this book or course series is still a good choice and relevant for today's learners?
*To provide more context about me*: I majored in math and minored in CS (with a Python background) during undergrad but never took any ML/DL courses (other than that Coursera one), and I just finished a master's program in math. I have a background and a longstanding interest in graph theory, combinatorics, and theoretical computer science.
I have two books, "Hands-On Machine Learning" by Géron and "Hands-On Large Language Models" by Alammar and Grootendorst, and plan to work through Stanford's CS224N and CS336 and CMU's Deep Learning Systems course once I have enough background knowledge. I am interested in building and improving intelligent systems such as DeepProver and AlphaProof that can be applied to mathematical proofs and research.
Thanks a lot!
I am a master's student in math in Germany and will graduate soon. Since I am also interested in programming and LLMs, I have been following HN and subreddits like r/MachineLearning and r/LocalLLaMA for some time. I have been really impressed by projects such as Unsloth that help people train and/or fine-tune models on consumer-grade GPUs.
I would also like to develop or contribute to projects such as Unsloth; however, I'm not sure what background knowledge or skill sets one should have in order to contribute. So I was wondering if you have any recommendations for resources and materials for learning more about (post-)training, fine-tuning, and general ML/LLM Ops?
Thanks a lot!
P.S.: I'm reading and working through the book "Hands-On Large Language Models", and I have a MacBook Pro but no NVIDIA GPU.
I am reading “Fire in the Valley”, and it seems that back then it was relatively easy for hobbyists to build and sell computers. Borland also had humble beginnings, selling Turbo Pascal.
Now software development is faster than before, and the hardware business seems to require more capital and expertise. Some big companies also seem to react to competition faster than before, not to mention that they have the ability to acquire small companies.
So I was wondering: is it harder to build a successful company or startup now than in the past? Many thanks!
I am currently a master's student in math interested in discrete math and theoretical computer science, and I have submitted PhD applications in these fields as well. However, given recent advances in the reasoning capabilities of foundation models, I'm also interested in pursuing *ML/LLM reasoning and mechanistic interpretability*, with goals such as applying reasoning models to formalized math proofs (e.g., in Lean) and understanding the theoretical foundations of neural networks and architectures such as the transformer.
If I do pursue a PhD in these directions, I may be torn between academic jobs and industry jobs, so I was wondering if you could help me with some questions:
1. I have learned here and elsewhere that AI research in academic institutions is really cut-throat, and that PhD students have to work hard (I'm not opposed to working hard, but to working too hard). Or would you say that only engineering-focused research teams are like this, and that theory groups are relatively more relaxed?
2. Other than academic research, I'm also interested in building a business based on ML/DL/LLMs, if possible. From your experience and/or discussions with others, do you think a PhD is more of a nice-to-have or a must-have in such scenarios? Or would you say it depends on the nature of the business/product? For instance, there's a weather forecasting company that uses atmospheric foundation models, which I believe would require knowledge of both CS and atmospheric science.
Many thanks!