(2025-10-13) Karpathy Personal Chatbot Recommendation

Andrej Karpathy tooted, replying to someone who wanted to use his new nanoChat repo to build a chatbot on their own writing: I think this is not a good repo for that. You should think of micro models maybe more as very young children (kindergarten etc.); they just don't have the raw intelligence of their larger cousins. If you finetune/train one on your own data you'll probably get some amusing parroting that feels like your writing in style, but it will be slop.

To achieve what you're looking for, you'd want something more like (rough code sketch after the list):

  • take your raw data
  • add extensive synthetic data generation rewrites on top (tricky, not obvious, researchy)
  • finetune a state of the art open LLM on it (e.g. tinker)
  • you'd possibly have to mix in a lot of pretraining data to not lose too much raw intelligence during finetuning.
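To make the rewrite and mixing bullets a bit more concrete, here's a rough sketch in Python. The rewrite_with_llm stub and the 3:1 pretraining-to-synthetic ratio are my own placeholders for illustration, not anything Karpathy specified (and this isn't the tinker API, which has its own interface):

```python
import random

def rewrite_with_llm(passage: str) -> list[str]:
    """Placeholder for the synthetic-data-generation step: in practice you'd ask a
    strong LLM to produce paraphrases, Q&A pairs, summaries, etc. grounded in the passage."""
    return [f"Q: What does the author say here?\nA: {passage}"]  # stub only

def build_finetune_mix(personal_passages: list[str],
                       pretrain_passages: list[str],
                       pretrain_ratio: int = 3) -> list[str]:
    """Interleave synthetic rewrites of the personal writing with generic
    pretraining-style text, so finetuning picks up the author's voice without
    washing out too much of the model's raw intelligence."""
    synthetic = [s for p in personal_passages for s in rewrite_with_llm(p)]
    n_pretrain = min(len(pretrain_passages), pretrain_ratio * len(synthetic))
    mix = synthetic + random.sample(pretrain_passages, n_pretrain)
    random.shuffle(mix)
    return mix
```

The hard, researchy part is entirely inside rewrite_with_llm (what kinds of rewrites, how many, how grounded); the mixing itself is mechanical.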

Basically I'd say getting this to work well is still the realm of research and not obvious.

Your best non-research bet is just giving all your writing to something like NotebookLM, which RAGs over it (i.e. references it in chunks). Your data makes it into context windows via RAG but doesn't impact the weights. So the model doesn't exactly "know you", but it's maybe the closest you can easily get.
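For contrast, here's roughly what "RAGs over it" means in practice. The toy bag-of-words scoring below is a stand-in for a real embedding model, and NotebookLM's actual pipeline isn't public; the point is only that retrieved chunks land in the prompt while the weights stay untouched:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 500) -> list[str]:
    """Split the raw writing into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercased words. A real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def answer_prompt(question: str, notes: str, k: int = 3) -> str:
    """Retrieve the k most relevant chunks and stuff them into the context window."""
    ranked = sorted(chunk(notes), key=lambda c: cosine(embed(c), embed(question)), reverse=True)
    context = "\n---\n".join(ranked[:k])
    # Only the prompt changes; the model itself never "learns" the writing.
    return f"Use these excerpts from my writing:\n{context}\n\nQuestion: {question}"
```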

cf (2024-12-01) Trying LLM for my Local Notes and Ebooks

