+++
title = "Large Language Models Reading List"
date = "2023-07-19"
description = "A compilation of good resources to learn about LLMs"
+++

2023 has been the year of the LLM. Between the ChatGPT explosion and the AI bubble replacing the crypto wave, there has been a lot of interest in this new space. I've had a lot of clients and teammates reach out asking how to learn more about the technology. For the sake of having one place to point them to (plus giving myself a centralized repo for future reference), I've compiled them here:

- https://blog.eladgil.com/p/ai-revolution-transformers-and-large: a bit dated by the standards of change in the space (all the way from 2022), but presents a really good, high-level view on LLMs from a more venture / product focused perspective. A good read for those that wear business hats and/or those trying to build products that leverage LLMs. If you have more meetings than you have time to code or research, this is probably the one for you.
- https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/: truly one of the best explainers on what's going on under the hood of these large models. It can be a bit technical (especially for more business-facing folks), but I can't think of a better resource for anyone that wants to wrap their head around what these models are actually doing and dive deeper than an AI threadboy on Twitter. I'd say it's mandatory reading.
- https://arxiv.org/abs/1706.03762: the paper that kicked this all off. At the time of its release, the authors were working out of Google's AI research division. This paper introduced the transformer architecture, which led to the explosion of transformer-based foundation models now being released to the wider public. Some novel and key concepts of the architecture were first introduced here, such as the idea of an encoder and decoder, the self-attention mechanism, and multi-head attention (hence the title of the paper).
This paper was the shot heard around the world in NLP and machine learning (and may have led to some people discovering their life's work was being tossed out the window, aka those working with more traditional neural networks). Another mandatory read.

- https://twitter.com/sh_reya/status/1641106353971421185?s=46&t=ZLo4tFybLFJgtQK6NBj3fQ: a great Twitter thread offering an insider perspective on taking these models to production; it introduces some very pertinent questions around the Ops side of LLMs and LLM-based apps. A lot of the questions posed are yet to be answered, which highlights how early we are in this particular space. Exciting times!
- https://ai.stanford.edu/blog/understanding-incontext/: this piece focuses on one specific aspect of LLMs: the idea of "in-context" learning. Considered an emergent behavior (aka we still don't actually know why models can do this), in-context learning is the concept behind techniques like few-shot prompting, where we provide input-output examples to the model in order to shape its response without having to fine-tune or mess with parameters. Very wild to consider the implications of emergent behaviors with this technology. Exciting times?
- https://medium.com/@atmabodha/pre-training-fine-tuning-and-in-context-learning-in-large-language-models-llms-dd483707b122: another good piece that touches on in-context learning, but from a higher level. It also provides a good primer on alternative techniques like fine-tuning. I like this resource since I find it helpful to compare the techniques side by side in order to get a better grasp of each one's characteristics, pros, cons, etc.
- https://hai.stanford.edu/news/language-models-are-changing-ai-we-need-understand-them: a bit of a different subject matter compared to the other resources linked here, this piece puts forward a framework for benchmarking the different models out in the wild and tries to bring some standardization to the wild wild west of language models. I recommend this for anyone in a decision-maker role trying to evaluate the various options available in the space.
- https://daleonai.com/transformers-explained: a little older than some of the others linked here (2021), this piece explains transformers with some excellent visuals that really helped ground the concepts behind the architecture that defines transformer-based models. It is a very helpful companion to the original "Attention Is All You Need" paper. Strong recommend for anyone coming from a non-traditional ML background.
- https://jalammar.github.io/illustrated-transformer/: another good visual piece breaking down transformers and their architecture, going more in depth than the one mentioned above. Similarly, I think this is a good companion for the original paper since it provides more visually appealing content (love a good illustrated guide) and isn't necessarily geared towards a research audience, which makes it a gentler read.
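Since several of the resources above revolve around the self-attention mechanism from "Attention Is All You Need", here's a rough sketch of the core operation to make it concrete. The shapes and variable names are my own illustrative choices, not taken from the paper or any of the linked pieces, and this leaves out multi-head projections, masking, and everything else a real transformer layer needs:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Every token scores every other token; dividing by sqrt(d_k)
    # keeps the dot products from saturating the softmax.
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per token
```

The output has the same shape as the input, which is what lets transformers stack these layers dozens of times.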
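The in-context learning pieces above describe few-shot prompting as putting input-output examples directly in the prompt so the model picks up the task without any fine-tuning. A tiny illustration of what such a prompt might look like (the sentiment task, labels, and helper function here are all made up for the sake of the example):

```python
# Hypothetical few-shot prompt builder: a couple of labeled examples
# followed by the unlabeled query, leaving the model to complete
# the final "Sentiment:" line in the same pattern.
examples = [
    ("I loved this movie", "positive"),
    ("Total waste of time", "negative"),
]

def build_few_shot_prompt(examples, query):
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "An instant classic")
print(prompt)
```

That the model can generalize from two examples it has never been trained on is exactly the emergent behavior the Stanford piece digs into.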