Daily

Study Journal

What I’m studying these days

February 20, 2026

February 19, 2026

  • 9:50 AM – 12:13 PM (2h 23m): Continued PDE homework, mainly practicing higher-dimensional weak derivatives

February 18, 2026

  • 8:15 AM – 11:04 AM (2h 49m): PDE homework, mainly covering weak derivatives plus more practice solving 1D wave equations

February 17, 2026

  • 11:20 AM – 12:59 PM (1h 39m): More asymptotics of RL
    • Notes
      • Went over the convergence in the finite-time case and discovered why the result holds for all discount factors there
  • 10:55 AM – 11:20 AM (25m): Logic puzzle and integral
    • Sources
      • dailyintegral.com
  • 9:15 AM – 9:45 AM (30m): Asymptotics in RL with NN
  • 7:57 AM – 8:57 AM (1h): Read more of High-Dimensional Probability by Vershynin. Did the first exercise in Chapter 2
    • Notes
      • I got the lower bound by induction and the upper bound using Markov's inequality after taking logs and exponentiating

February 16, 2026

  • 9:30 AM – 11:00 AM (1h 30m): Cleared Anki deck, roughly 250 cards.
    • Notes
      • Still struggling with colors, blues always give me a tough time
  • 7:30 AM – 9:00 AM (1h 30m): Reviewed Asymptotics of RL paper and talked a bit with Gemini and NotebookLM
    • Notes
      • Mainly focused on overall proof flow. The trickiest step is the stochastic decomposition step where the martingale terms pop out.

February 15, 2026

  • 9:18 AM – 12:42 PM (3h 24m): Continued studying Asymptotics of RL paper
    • Notes
      • The measure for the weight distribution is frozen in time, but the solution still evolves in time. Roughly, the network is so wide that each update is tiny enough that the distribution doesn't change, yet the accumulated change over all weights is \(O(1)\). The learning is driven by the kernel \(A\) and the TD error supplied by the environment. The network acts like a fixed feature space instead of learning to represent new features.
  • 8:28 AM – 9:13 AM (45m): Built this app, made some git repos and pushed changes
  • 8:00 AM – 8:30 AM (30m): Continued learning transformers
    #ml
    • Sources
      • Karpathy's tutorial on YouTube
    • Notes
      • Biggest takeaway from this session was learning the difference between encoder and decoder, which he doesn't really explain until the end
  • 7:30 AM – 8:00 AM (30m): Relearned comparative advantage
    • Notes
      • So the only reason this works is relative opportunity cost. Even if party A has an absolute advantage in efficiency over party B, party A pays an opportunity cost whenever it spends time making the thing it is relatively less efficient at. Thus, they leave it to the scrubs to make that thing (even though party A is better at it).
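
The opportunity-cost argument in the comparative advantage note can be checked with a small sketch. The numbers below are made up for illustration (they are not from any source in this journal), and the names `rates`, `opportunity_cost`, and `comparative_advantage` are hypothetical helpers. Party A is faster at both goods, yet party B still ends up with the lower opportunity cost for one of them.

```python
# Hypothetical output per hour. Party A has the absolute advantage in both goods.
rates = {
    "A": {"widgets": 10.0, "gadgets": 5.0},
    "B": {"widgets": 2.0, "gadgets": 4.0},
}


def opportunity_cost(party, good, other):
    """Units of `other` a party gives up to make one unit of `good`."""
    return rates[party][other] / rates[party][good]


def comparative_advantage(good, other):
    """Party with the lowest opportunity cost for producing `good`."""
    return min(rates, key=lambda party: opportunity_cost(party, good, other))


if __name__ == "__main__":
    for good, other in [("widgets", "gadgets"), ("gadgets", "widgets")]:
        for party in rates:
            cost = opportunity_cost(party, good, other)
            print(f"{party}: one unit of {good} costs {cost:.2f} {other}")
        winner = comparative_advantage(good, other)
        print(f"Comparative advantage in {good}: {winner}")
```

With these assumed rates, each gadget costs A two widgets but costs B only half a widget, so B should make gadgets even though A makes them faster in absolute terms.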

Tags

Browse days by topic
#anki 2h 2
#dev 1h 1
#economics 30m 1
#exercise 1h 1
#homework 5h 12m 2
#ml 30m 1
#pde 5h 12m 2
#probability 1h 1
#puzzle 1h 19m 2
#research 7h 3m 3
#rl 5h 24m 3
#test 10m 1