Hello! Yes, this name is “borrowed” from Sebastian Lague’s Coding Adventures.

In this series of devlog-style blog posts I’ll explore how to create a toy LLM. By the end I expect to have a somewhat functional LLM that can do the following:

  • Complete sentences in a coherent way (at the very least).
  • Chat and respond coherently using provided context.
  • Be deployable somewhere and shareable with other people (both code and weights).

I’ll explain how I’m gonna achieve each point and what each one even means, but I’ll assume you already understand some of the basics. I should also say that this is more of a devlog than a tutorial.

Being able to complete sentences is the starting point if you wanna make something like this. This is what people call a base model: a model trained on a significant amount of data to learn the patterns and relations between the words of a language, so it can complete sentences in a way that makes at least some sense to a human.
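To make the idea concrete, here’s a deliberately silly sketch of “learning patterns between words and completing sentences”, using bigram counts instead of a neural network. The corpus and everything else here are toy placeholders, not the actual setup I’ll use:

```python
import random
from collections import defaultdict

# Toy "training data" — a real base model sees billions of words.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Learn which word tends to follow which (the "patterns" a base model picks up).
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def complete(prompt, n_words=5, seed=0):
    """Extend the prompt one word at a time, sampling from observed continuations."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(rng.choice(candidates))
    return " ".join(words)

print(complete("the cat"))
```

A real LLM replaces the lookup table with a neural network predicting the next token, but the loop — predict, append, repeat — is the same.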

Making it able to chat is another big undertaking: it’s not as computationally involved as training the base model, but it requires a lot of high-quality data. In this part we basically teach the model to output text in a very specific format, using a few special tokens and other techniques.
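The “special tokens” part looks roughly like this. The token names below (`<|user|>`, `<|assistant|>`, `<|end|>`) are hypothetical placeholders — every model family defines its own chat template — but the shape is the same:

```python
def format_chat(messages):
    """Turn a list of {role, content} dicts into one training/prompt string,
    with special tokens marking who said what."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    # End with an open assistant tag so the model learns to generate the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = format_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a token?"},
])
print(prompt)
```

Fine-tune the base model on enough conversations in this format and it learns to keep producing text that fits it, which is what makes it feel like a chatbot.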

Then there’s responding coherently with provided context: by that I mean fetching documents from somewhere and injecting them into some sort of prompt (what people usually call retrieval-augmented generation, or RAG). This way, even when the chatbot is asked a question it likely knows nothing about, it can still respond in a somewhat competent way.
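A toy sketch of that fetch-and-inject flow, assuming a made-up document list. Real systems use embeddings and a vector store; scoring by keyword overlap here is just a stand-in:

```python
# Hypothetical documents the chatbot itself was never trained on.
docs = [
    "The base model was trained on public domain books.",
    "Training ran for three days on a single GPU.",
    "The tokenizer uses byte-pair encoding.",
]

def retrieve(question, k=1):
    """Pick the k docs sharing the most words with the question (toy scoring)."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question):
    """Inject the retrieved docs into the prompt before the actual question."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How long did training take?"))
```

The model then answers from the injected context rather than from whatever it memorized during training.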

The last point is exactly what it sounds like: making the model’s code open source and letting people download the weights.

Ambitious? Maybe, but I think it will be a fun learning experience. I also have nothing better to do with my free time (lol).