Studying
2/15/26
- 7:30 AM – 8:00 AM: Relearned comparative advantage
- Sources
- https://en.wikipedia.org/wiki/Comparative_advantage
- Notes
- So the only reason this works is relative opportunity cost. Even if party A has an absolute advantage over party B in everything, A still pays an opportunity cost whenever it spends time on the good where its relative advantage is smaller. So A leaves that good to party B (even though A is better at making it in absolute terms) and specializes where its relative edge is biggest. Worked example below.
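- Worked example (numbers are my own, not from the article): suppose A can make 6 widgets or 3 gadgets per hour, and B can make 1 widget or 1 gadget per hour. A's opportunity cost of one gadget is \(6/3 = 2\) widgets, while B's is only \(1\) widget, so B has the comparative advantage in gadgets even though A is absolutely better at both. Total output is higher if A specializes in widgets, B specializes in gadgets, and they trade.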
- 8:00 AM – 8:30 AM: Continued learning transformers
- Sources
- Karpathy’s tutorial on youtube
- Notes
- Biggest takeaway from this session was the difference between the encoder and the decoder, which he doesn't really explain until the end: decoder blocks use a causal self-attention mask so each token only attends to earlier positions, while encoder blocks let every token attend to every other token. Sketch after this entry.
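- A minimal sketch of that difference (my own code, not from the tutorial): the one mechanical change between encoder-style and decoder-style self-attention is the causal mask.

```python
# Minimal sketch (my own code, not from the tutorial): the mechanical difference
# between an encoder block and a decoder block is the causal mask in self-attention.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, causal: bool) -> torch.Tensor:
    """Single-head attention with no learned projections, just to show the mask."""
    B, T, D = x.shape
    scores = x @ x.transpose(1, 2) / D ** 0.5               # (B, T, T) attention logits
    if causal:
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
        scores = scores.masked_fill(~mask, float("-inf"))   # decoder: no peeking at future tokens
    weights = F.softmax(scores, dim=-1)
    return weights @ x                                       # weighted average of the "values"

x = torch.randn(1, 4, 8)
enc_out = self_attention(x, causal=False)  # encoder-style: every token sees every token
dec_out = self_attention(x, causal=True)   # decoder-style: token t only sees tokens <= t
```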
- 8:28 AM – 9:13 AM: Built this app, made some git repos and pushed changes
- 9:18 AM – 12:42 PM: Continued studying Asymptotics of RL paper
- Sources
- https://arxiv.org/pdf/1911.07304
- Notes
- The measure for the weight distribution is frozen in time, but the solution still evolves in time. Roughly, the network is so wide that each individual weight update is tiny enough that the distribution doesn't change, yet the accumulated change over all weights is \(O(1)\). The learning is driven by the kernel \(A\) and the TD error from the environment. The network acts like a fixed feature space instead of learning to represent new features; toy illustration below.
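- Toy illustration of that frozen-kernel picture (my own sketch, not the paper's construction; the random-walk MDP and random features are made up): values are updated by a fixed kernel applied to the TD errors, so no new features are ever learned.

```python
# Toy kernel-TD sketch (my own, not the paper's setup): in the wide-network /
# frozen-measure regime, learning looks like  V <- V + alpha * A @ td_error,
# where A is a fixed kernel built from frozen random features.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features = 5, 64
gamma, alpha = 0.9, 0.1

# Frozen random features -> fixed kernel A (stands in for the frozen weight distribution)
phi = rng.normal(size=(n_states, n_features))
A = phi @ phi.T / n_features

# Small reflecting random walk; reward 1 for being in the last state
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n_states - 1)] += 0.5
reward = np.zeros(n_states)
reward[-1] = 1.0

V = np.zeros(n_states)
for _ in range(2000):
    td_error = reward + gamma * P @ V - V   # expected TD error at every state
    V += alpha * A @ td_error               # learning driven only by the fixed kernel A

print(V)                                                       # kernel-TD estimate
print(np.linalg.solve(np.eye(n_states) - gamma * P, reward))   # exact values for comparison
```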
2/16/26
- 7:30 AM – 9:00 AM: Reviewed Asymptotics of RL paper and talked a bit with Gemini and NotebookLM
- Notes
- Mainly focused on the overall proof flow. The trickiest part is the stochastic decomposition where the martingale terms pop out; generic form sketched below.
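- For my own reference, the generic shape of that decomposition (standard stochastic-approximation form in my notation, not the paper's exact statement): split each noisy update into its conditional mean (the drift) plus a mean-zero remainder,
\[
\theta_{k+1}
= \theta_k + \alpha_k H(\theta_k, \xi_{k+1})
= \theta_k
+ \alpha_k \underbrace{\mathbb{E}\!\left[H(\theta_k, \xi_{k+1}) \mid \mathcal{F}_k\right]}_{\text{drift}}
+ \alpha_k \underbrace{\left(H(\theta_k, \xi_{k+1}) - \mathbb{E}\!\left[H(\theta_k, \xi_{k+1}) \mid \mathcal{F}_k\right]\right)}_{\text{martingale difference } M_{k+1}},
\]
so \(\mathbb{E}[M_{k+1} \mid \mathcal{F}_k] = 0\). The accumulated noise \(\sum_k \alpha_k M_{k+1}\) is a martingale that is controlled with martingale inequalities and vanishes in the limit, leaving only the drift.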
- 9:30 AM – 11:00 AM: Cleared Anki deck, roughly 250 cards.
- Notes
- Still struggling with colors; blues always give me a tough time