Study Journal

What I’m studying these days

Daily

February 21, 2026

  • 10:31 AM – 12:59 PM (2h 28m): Asymptotics of rl paper
    • Notes
      • Sowers wanted me to walk him through the "various scales and how they interact". I translated this to roughly mean: how are the processes coupled, and what are the convergence implications? For example, since $h_t^N := Q_{\lfloor Nt \rfloor}^N$, it appears that letting $N \to \infty$ means taking infinitely many Q-learning steps. However, since we scale the output by $1/\sqrt{N}$ and require the learning rate to be $O(N^{-1})$, this amounts to making essentially no parameter updates. So effectively, no learning is occurring even though the processes are coupled. However, once we obtain the limit $h_t$, training actually begins and converges to the Bellman solution as $t \to \infty$. I think this is roughly what he wants. (A sketch of how the scales fit together is below this day's entries.)
  • 9:17 AM – 9:54 AM (37m): cleared anki deck
  • 7:21 AM – 9:17 AM (1h 56m): cluesbysam and dailyintegral
    • Notes
      • Built a little helper app for Clues by Sam, and learned Legendre's formula for the dailyintegral puzzle (the formula is stated below this day's entries)
      • I solved the daily #CluesBySam, Feb 21st 2026 (Hard), in less than 41 minutes
      • 🟩🟩🟩🟩
      • 🟩🟨🟨🟨
      • 🟩🟩🟩🟩
      • 🟩🟩🟩🟩
      • 🟩🟩🟩🟩
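
A rough sketch of how the scales in the 10:31 AM note fit together (my own reading, using only the relations mentioned there):

\[
h_t^N := Q_{\lfloor Nt \rfloor}^N, \qquad \text{output scale } \frac{1}{\sqrt{N}}, \qquad \text{learning rate } O(N^{-1}).
\]

Sending $N \to \infty$ compresses $\lfloor Nt \rfloor$ Q-learning steps into one unit of limit time, while each individual step is a vanishing update, so no learning happens at the level of a single step; the accumulated effect survives as the limit $h_t$, and the actual training happens in the limit variable, with $h_t \to$ the Bellman solution as $t \to \infty$.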
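
For reference, Legendre's formula from the 7:21 AM entry: the exponent of a prime $p$ in $n!$ is

\[
\nu_p(n!) \;=\; \sum_{i \ge 1} \left\lfloor \frac{n}{p^i} \right\rfloor,
\]

a finite sum since the terms vanish once $p^i > n$. For example, $\nu_2(10!) = \lfloor 10/2 \rfloor + \lfloor 10/4 \rfloor + \lfloor 10/8 \rfloor = 5 + 2 + 1 = 8$.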

February 20, 2026

  • 3:03 PM – 5:27 PM (2h 24m): TeXed PDE homework
    • Notes
      • This never gets less painful
  • 1:26 PM – 2:26 PM (1h): Updated this site
  • 11:11 AM – 11:41 AM (30m): Anki
  • 7:58 AM – 9:58 AM (2h): test entry with new site
  • 9:00 AM – 9:10 AM (10m): test json write
  • 7:14 AM – 8:08 AM (54m): clues by sam and dailyintegral logic puzzle

February 19, 2026

  • 9:50 AM – 12:13 PM (2h 23m): Continued PDE homework, mainly practiced higher dimensional weak derivatives

February 18, 2026

  • 8:15 AM – 11:04 AM (2h 49m): PDE homework which mainly covered weak derivatives and more practice solving 1d wave equations

February 17, 2026

  • 11:20 AM – 12:59 PM (1h 39m): More asymptotics of rl
    • Notes
      • Went over the convergence in the finite time case and discovered why the result holds for all discount factors there
  • 10:55 AM – 11:20 AM (25m): Logic puzzle and integral
    • Sources
      • dailyintegral.com
  • 9:15 AM – 9:45 AM (30m): Asymptotics in RL with NN
  • 7:57 AM – 8:57 AM (1h): Read more of High-Dimensional Probability by Vershynin. Did the first exercise in Chapter 2
    • Notes
      • I got the lower bound by induction, and the upper bound using Markov's inequality after taking the log and exponentiating (a generic sketch of the exponentiate-then-Markov step is below this day's entries)
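
The exponentiate-then-Markov step from the 7:57 AM note, in generic form (this is the standard trick, not necessarily the exact statement of the exercise): for any $\lambda > 0$,

\[
\mathbb{P}(X \ge t) \;=\; \mathbb{P}\!\left(e^{\lambda X} \ge e^{\lambda t}\right) \;\le\; e^{-\lambda t}\, \mathbb{E}\, e^{\lambda X},
\]

by Markov's inequality applied to the nonnegative variable $e^{\lambda X}$; one then optimizes over $\lambda$.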

February 16, 2026

  • 9:30 AM – 11:00 AM (1h 30m): Cleared Anki deck, roughly 250 cards.
    • Notes
      • Still struggling with colors; blues always give me a tough time
  • 7:30 AM – 9:00 AM (1h 30m): Reviewed Asymptotics of RL paper and talked a bit with Gemini and NotebookLM
    • Notes
      • Mainly focused on the overall proof flow. The trickiest step is the stochastic decomposition step where the martingale terms pop out (a generic sketch of that kind of decomposition is below this day's entries).
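
The kind of decomposition meant in the 7:30 AM note, written generically (my notation, not necessarily the paper's): a stochastic-approximation update

\[
Q_{k+1} = Q_k + \alpha_k\, F(Q_k, \xi_{k+1})
\]

is split into its conditional mean plus a martingale-difference remainder,

\[
Q_{k+1} = Q_k + \alpha_k\, \overline{F}(Q_k) + \alpha_k\, M_{k+1},
\qquad
\overline{F}(Q_k) := \mathbb{E}\!\left[ F(Q_k, \xi_{k+1}) \mid \mathcal{F}_k \right],
\quad
M_{k+1} := F(Q_k, \xi_{k+1}) - \overline{F}(Q_k),
\]

so the drift term drives the limiting dynamics while the martingale terms are controlled separately.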

February 15, 2026

  • 9:18 AM – 12:42 PM (3h 24m): Continued studying Asymptotics of RL paper
    • Notes
      • The measure for the weight distribution is frozen in time, but the solution still evolves in time. Roughly, the network is so wide and each update so tiny that the distribution doesn't change, yet the accumulated change over all weights is \(O(1)\). The learning is driven by the kernel \(A\) and the TD error given by the environment. The network acts like a fixed feature space instead of learning to represent new features. (A sketch of these limiting dynamics is below this day's entries.)
  • 8:28 AM – 9:13 AM (45m): Built this app, made some git repos and pushed changes
  • 8:00 AM – 8:30 AM (30m): Continued learning transformers
    #ml
    • Sources
      • Karpathy's tutorial on YouTube
    • Notes
      • Biggest takeaway from this session was learning the difference between the encoder and the decoder, which he doesn't really explain until the end (a minimal sketch of the difference is below this day's entries)
  • 7:30 AM – 8:00 AM (30m): Relearned comparative advantage
    • Notes
      • So the only reason this works is relative opportunity cost. Even if party A has an absolute advantage in efficiency over party B, party A incurs an opportunity cost by spending time making something it isn't most efficient at. Thus, they leave it to the scrubs to make that thing (even though party A is better at it). A small made-up numerical example is below this day's entries.
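
A sketch of the picture in the 9:18 AM note, in generic notation (mine, not the paper's): with the weight distribution frozen at its initial law, the limit object evolves as

\[
\partial_t Q_t \;=\; A\, \delta_t, \qquad \delta_t := \mathcal{T} Q_t - Q_t,
\]

where \(A\) is a fixed kernel determined by the frozen weight distribution and \(\mathcal{T}\) is the Bellman operator, so \(\delta_t\) is the TD/Bellman error. The network only contributes features through \(A\); it does not learn new ones.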
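
On the 8:00 AM note about encoders vs. decoders: as I understand it, the key architectural difference is the causal mask in the decoder's self-attention (the encoder attends bidirectionally; the original encoder-decoder transformer also adds cross-attention, which is omitted here). A minimal sketch, assuming PyTorch and stripping out the learned projections and multi-head machinery:

```python
# Minimal sketch (assuming PyTorch): the difference between encoder-style and
# decoder-style self-attention is the causal mask. Learned Q/K/V projections,
# multiple heads, and cross-attention are omitted for brevity.
import torch
import torch.nn.functional as F

def toy_self_attention(x: torch.Tensor, causal: bool) -> torch.Tensor:
    # x: (batch, time, channels); x itself plays the role of queries/keys/values
    B, T, C = x.shape
    scores = (x @ x.transpose(-2, -1)) / C**0.5            # (B, T, T) similarity scores
    if causal:
        # decoder: position t may only attend to positions <= t
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                     # each row sums to 1
    return weights @ x                                      # (B, T, C) weighted averages

x = torch.randn(2, 5, 8)
enc = toy_self_attention(x, causal=False)  # encoder-style: bidirectional attention
dec = toy_self_attention(x, causal=True)   # decoder-style: causal / autoregressive
```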
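
A made-up numerical example of the 7:30 AM point (the numbers are hypothetical, just to show the opportunity-cost comparison): suppose in an hour A can make 10 widgets or 5 gadgets, while B can make 4 widgets or 4 gadgets. A has the absolute advantage in both, but A's opportunity cost of one gadget is $10/5 = 2$ widgets while B's is only $4/4 = 1$ widget, so B has the comparative advantage in gadgets; A does better making widgets and trading for gadgets.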

Tags

Browse days by topic
#anki · 2h 37m · 3 days
#dev · 1h · 1 day
#economics · 30m · 1 day
#exercise · 1h · 1 day
#homework · 7h 36m · 3 days
#latex · 2h 24m · 1 day
#ml · 30m · 1 day
#pde · 5h 12m · 2 days
#probability · 1h · 1 day
#puzzle · 3h 15m · 3 days
#research · 9h 31m · 4 days
#rl · 7h 52m · 4 days
#test · 10m · 1 day