Daily

Study Journal

What I’m studying these days

February 27, 2026

7:11 AM – 8:11 AM (1h): clues by sam and dailyintegral logic puzzle

#puzzle

February 25, 2026

9:07 AM – 10:22 AM (1h 15m): Asymptotics of RL paper

#research #rl
- Notes
  - Studied the interaction of the scales a bit more. The main idea is that the scaling $\frac{1}{\sqrt{N}} \times \frac{1}{N} \times \frac{1}{\sqrt{N}}$ coming from $\text{Xavier} \times \alpha^N \times \text{Gradient}$ is what ensures convergence. We end up with $\frac{1}{N^2}$, but one $\frac{1}{N}$ gets absorbed in the empirical measure, and the other gets absorbed in the discrete integral (equation 5.4). Thus, we get a prelimit trajectory $h^N_t$ with fluctuation terms that vanish while the desired term in Theorem 3.4 is preserved. Moreover, while the measures $\mu_s^N$ in (5.4) are random, they will converge to $\mu_0$ in the limit.
  - In terms of the evolution of the parameters, each update is on the order of $\frac{1}{N} \times \frac{1}{\sqrt{N}}$ coming from $\alpha^N \times \text{Xavier}$. Over $Nt$ training steps, this is still only on the order of $\frac{1}{\sqrt{N}}$. Thus, as the width goes off to infinity, the parameters make smaller and smaller updates, and in fact converge in distribution to the initial distribution.
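The per-update arithmetic in the second note can be sanity-checked with a toy simulation. This is a sketch under assumptions that are not from the paper: a single parameter, a gradient factor bounded by a hypothetical constant $G$, and i.i.d. uniform noise standing in for real gradients. Over $\lfloor Nt \rfloor$ steps of size $\frac{\alpha}{N} \cdot \frac{1}{\sqrt{N}}$, the total movement is at most $\alpha G t / \sqrt{N}$.

```python
import math
import random


def parameter_drift(N, t, alpha=1.0, G=1.0, seed=0):
    """Toy model of one parameter trained for floor(N * t) steps.

    Each update has size (alpha / N) * (1 / sqrt(N)) * g_k: the alpha^N = alpha / N
    learning rate times the 1/sqrt(N) Xavier-type factor, with a hypothetical
    gradient factor g_k drawn uniformly from [-G, G] (an assumption).
    Returns (realized |theta - theta_0|, worst-case bound on it).
    """
    rng = random.Random(seed)
    steps = math.floor(N * t)
    step_scale = (alpha / N) / math.sqrt(N)  # O(N^{-3/2}) per update
    theta = 0.0
    for _ in range(steps):
        theta += step_scale * rng.uniform(-G, G)
    bound = steps * step_scale * G  # <= alpha * G * t / sqrt(N)
    return abs(theta), bound


for N in (100, 1_000, 10_000, 100_000):
    drift, bound = parameter_drift(N, t=1.0)
    print(f"N={N:>7}  drift={drift:.2e}  bound={bound:.2e}  1/sqrt(N)={1 / math.sqrt(N):.2e}")
```

The realized drift comes out even smaller than the worst-case bound (on the order of $\sqrt{t}/N$ here) because the random increments partially cancel; either way it vanishes as $N \to \infty$.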
8:07 AM – 9:07 AM (1h): cluesbysam and dailyintegral logic puzzle

#puzzle
- Notes
  - Finally got a perfect score on cluesbysam

February 24, 2026

8:14 AM – 8:41 AM (27m): cluesbysam and dailyintegral logic puzzle

#puzzle
- Notes
  - Solved in less than 10 minutes
  - 🟩🟩🟨🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - The logic puzzle was the classic game of Nim
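Assuming the puzzle was standard normal-play Nim (players alternately remove any positive number of objects from a single heap; whoever takes the last object wins), Bouton's theorem gives the whole strategy: the player to move loses exactly when the XOR of the heap sizes (the nim-sum) is 0, and otherwise wins by moving to a position with nim-sum 0. A minimal sketch; the function names are my own, and the misère variant needs a different endgame rule:

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """XOR of all heap sizes."""
    return reduce(xor, heaps, 0)


def winning_move(heaps):
    """Return (heap_index, new_size) leaving nim-sum 0, or None if the
    player to move is losing (normal play: taking the last object wins)."""
    s = nim_sum(heaps)
    if s == 0:
        return None  # every move hands the opponent a nonzero nim-sum
    for i, h in enumerate(heaps):
        target = h ^ s
        if target < h:  # legal only if we remove objects from this heap
            return i, target
    return None  # unreachable when s != 0: the heap with s's top bit works


print(winning_move([3, 4, 5]))
```

For heaps `[3, 4, 5]` the nim-sum is 2, and reducing the first heap from 3 to 1 leaves `[1, 4, 5]`, whose nim-sum is 0.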
7:27 AM – 7:57 AM (30m): Anki

#anki

February 23, 2026

7:24 PM – 8:28 PM (1h 4m): PDE homework

#homework #pde
- Notes
  - Finished the last problem: continuous $L^2$ dependence on the data for the homogeneous wave equation in three dimensions.

February 22, 2026

12:55 PM – 3:00 PM (2h 5m): More PDE homework
- Notes
  - Converting $\int_{0}^t\int_{|\xi| = 1} f(x + s\xi)\, dS_{\xi}\, ds$ to $\int_{B_t(x)} \frac{f(y)}{|y - x|^2}\, dy$ via $y = x + s\xi$ took way too long. Basically, if you scale $\xi$ by $s$, you pick up an extra factor of $s^2$ in area. For the fixed angles defining the patch $dS_{\xi}$ at distance $1$ from $x$, scaling the vector by $s$ scales each side length of the patch by $s$, so its area is scaled by $s^2$. That factor is exactly cancelled by $\frac{1}{|y - x|^2} = \frac{1}{s^2}$.
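Written out with the ball centered at $x$ (since $s = |y - x|$), the change of variables is:

$$
\begin{aligned}
y &= x + s\xi, \qquad s = |y - x| \in [0, t], \quad |\xi| = 1, \qquad dy = s^2\, dS_{\xi}\, ds, \\
\int_{B_t(x)} \frac{f(y)}{|y - x|^2}\, dy
  &= \int_0^t \int_{|\xi| = 1} \frac{f(x + s\xi)}{s^2}\, s^2\, dS_{\xi}\, ds
   = \int_0^t \int_{|\xi| = 1} f(x + s\xi)\, dS_{\xi}\, ds.
\end{aligned}
$$

As a quick check, $f \equiv 1$ gives $4\pi t$ on both sides.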
10:07 AM – 12:31 PM (2h 24m): PDE homework

#pde #homework
- Notes
  - Breaking for lunch; the problems so far are on plane waves in three dimensions
8:45 AM – 9:41 AM (56m): cluesbysam and dailyintegral logic puzzle

#puzzle
- Notes
  - I solved the daily #CluesBySam, Feb 22nd 2026 (Hard), in less than 14 minutes
  - 🟩🟨🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - https://cluesbysam.com
  - I failed the logic puzzle on the daily integral, but I enjoyed the problem, especially since the solution uses the pigeonhole principle.
8:14 AM – 8:44 AM (30m): Reviewed CLT

#probability
- Sources
  - https://www.cs.toronto.edu/~yuvalf/CLT.pdf

February 21, 2026

10:31 AM – 12:59 PM (2h 28m): Asymptotics of RL paper

#research #rl
- Notes
  - Sowers wanted me to walk him through the "various scales and how they interact". I took this to mean roughly how the processes are coupled and what that implies for convergence. For example, since $h_t^N := Q_{\lfloor Nt \rfloor}^N$, letting $N \to \infty$ appears to mean taking infinitely many Q-learning steps. However, since we scale the output by $1/\sqrt{N}$ and take the learning rate to be $O(N^{-1})$, the parameter updates vanish in the limit. So effectively, no learning occurs at the level of the parameters, even though the processes are coupled. Once we obtain the limit $h_t$, however, training actually begins and converges to the Bellman solution as $t \to \infty$. I think this is roughly what he wants.
9:17 AM – 9:54 AM (37m): cleared anki deck

#anki
7:21 AM – 9:17 AM (1h 56m): cluesbysam and dailyintegral

#puzzle
- Notes
  - Built a little helper app for Clues by Sam, and learned Legendre's formula for the dailyintegral puzzle
  - I solved the daily #CluesBySam, Feb 21st 2026 (Hard), in less than 41 minutes
  - 🟩🟩🟩🟩
  - 🟩🟨🟨🟨
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
  - 🟩🟩🟩🟩
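Legendre's formula gives the exponent of a prime $p$ in $n!$ as $v_p(n!) = \sum_{i \ge 1} \lfloor n / p^i \rfloor$; the sum is finite because the terms vanish once $p^i > n$. A short sketch (the function name `legendre` is my own):

```python
def legendre(n: int, p: int) -> int:
    """Exponent of the prime p in n!, via v_p(n!) = sum_{i>=1} floor(n / p^i).

    Each term n // p^i counts the multiples of p^i up to n, so a number
    divisible by exactly p^k is counted once for each i = 1..k.
    """
    if n < 0 or p < 2:
        raise ValueError("need n >= 0 and p >= 2")
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


print(legendre(100, 5))  # trailing zeros of 100!
```

For example, `legendre(100, 5)` is $20 + 4 = 24$, which is the number of trailing zeros of $100!$ (there are always at least as many factors of 2 as of 5).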

February 20, 2026

3:03 PM – 5:27 PM (2h 24m): TeXed PDE homework

#latex #homework
- Notes
  - This never gets less painful
1:26 PM – 2:26 PM (1h): Updated this site

#dev
11:11 AM – 11:41 AM (30m): Anki

#anki
7:58 AM – 9:58 AM (2h): test entry with new site
9:00 AM – 9:10 AM (10m): test json write

#test
7:14 AM – 8:08 AM (54m): clues by sam and dailyintegral logic puzzle

#puzzle
- Sources
  - https://cluesbysam.com/
  - https://dailyintegral.com/

February 19, 2026

9:50 AM – 12:13 PM (2h 23m): Continued PDE homework, mainly practiced higher-dimensional weak derivatives

#homework #pde

February 18, 2026

8:15 AM – 11:04 AM (2h 49m): PDE homework, which mainly covered weak derivatives and more practice solving 1D wave equations

#homework #pde

February 17, 2026

11:20 AM – 12:59 PM (1h 39m): More asymptotics of rl

#research
- Notes
  - Went over the convergence proof in the finite-time case and worked out why the result holds for all discount factors there
10:55 AM – 11:20 AM (25m): Logic puzzle and integral

#puzzle
- Sources
  - dailyintegral.com
9:15 AM – 9:45 AM (30m): Asymptotics in RL with NN

#research #rl
7:57 AM – 8:57 AM (1h): Read more of High-Dimensional Probability by Vershynin. Did the first exercise in Chapter 2

#probability #exercise
- Notes
  - Got the lower bound by induction, and the upper bound using Markov's inequality after taking logs and exponentiating

February 16, 2026

9:30 AM – 11:00 AM (1h 30m): Cleared Anki deck, roughly 250 cards.

#anki
- Notes
  - Still struggling with colors; blues always give me a tough time
7:30 AM – 9:00 AM (1h 30m): Reviewed Asymptotics of RL paper and talked a bit with Gemini and NotebookLM

#research #rl
- Notes
  - Mainly focused on overall proof flow. The trickiest step is the stochastic decomposition step where the martingale terms pop out.

February 15, 2026

9:18 AM – 12:42 PM (3h 24m): Continued studying Asymptotics of RL paper

#research #rl
- Sources
  - https://arxiv.org/pdf/1911.07304
- Notes
  - The measure for the weight distribution is frozen in time, but the solution still evolves in time. Roughly, the network is so wide that each update is tiny and the weight distribution doesn't change, yet the accumulated change summed over all weights is $O(1)$. The learning is driven by the kernel $A$ and the TD error from the environment. The network acts like a fixed feature space instead of learning to represent new features.
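A toy version of "frozen kernel, moving output" can be sketched in a few lines. This is a simplified analogue, not the paper's limit equation: it does policy evaluation (no max, so not Q-learning) on a hypothetical 3-state chain, and a hand-picked symmetric positive-definite matrix stands in for the kernel $A$. Only the output vector `h` evolves, via Euler steps of $\dot h = A\,(r + \gamma P h - h)$.

```python
# Hypothetical 3-state chain; P is symmetric, hence doubly stochastic.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
r = [1.0, 0.0, 0.5]  # hypothetical rewards
gamma = 0.9
# Stand-in for the frozen kernel A: a fixed symmetric positive-definite matrix.
A = [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.3],
     [0.1, 0.3, 1.0]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def td_error(h):
    """Expected TD error r + gamma * P h - h (policy evaluation, no max)."""
    return [ri + gamma * ph - hi for ri, ph, hi in zip(r, matvec(P, h), h)]


def kernel_flow(steps=5000, eta=0.5):
    """Forward-Euler steps of dh/dt = A (r + gamma P h - h).

    Only h moves; A plays the role of the frozen feature kernel."""
    h = [0.0] * len(r)
    for _ in range(steps):
        h = [hi + eta * d for hi, d in zip(h, matvec(A, td_error(h)))]
    return h


def bellman_fixed_point(iters=1000):
    """Solve V = r + gamma P V by value iteration (a gamma-contraction)."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [ri + gamma * pv for ri, pv in zip(r, matvec(P, v))]
    return v


h, v = kernel_flow(), bellman_fixed_point()
print([round(x, 6) for x in h])
print([round(x, 6) for x in v])
```

Swapping in a different symmetric positive-definite `A` changes the path `h` takes but not its endpoint, since any fixed point must have zero TD error. With a non-symmetric chain, a general kernel, or the max in Q-learning, stability is no longer automatic, and this toy example says nothing about those cases.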
8:28 AM – 9:13 AM (45m): Built this app, made some git repos and pushed changes
8:00 AM – 8:30 AM (30m): Continued learning transformers

#ml
- Sources
  - Karpathy's tutorial on YouTube
- Notes
  - Biggest takeaway from this session was learning the difference between encoder and decoder, which he doesn't really explain until the end
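In a GPT-style setup like the tutorial's, the encoder/decoder distinction mostly comes down to the self-attention mask (as I understand it; a full encoder-decoder transformer also adds cross-attention, which this ignores). A minimal pure-Python sketch on made-up scores; `attention_weights` is a hypothetical helper, not from the tutorial:

```python
import math


def attention_weights(scores, causal):
    """Row-wise softmax of a T x T matrix of raw attention scores.

    causal=True mimics a decoder block: position i may attend only to j <= i.
    causal=False mimics an encoder block: every position attends to all others.
    """
    T = len(scores)
    weights = []
    for i in range(T):
        # Masked entries get -inf so that exp(...) is exactly 0.0.
        row = [scores[i][j] if (not causal or j <= i) else -math.inf
               for j in range(T)]
        m = max(row)  # finite: j = 0 is never masked
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


scores = [[0.0] * 4 for _ in range(4)]  # toy scores: all equal
for name, causal in (("encoder", False), ("decoder", True)):
    print(name)
    for row in attention_weights(scores, causal):
        print([round(w, 2) for w in row])
```

With equal scores, each encoder row spreads weight $1/4$ over all four positions, while decoder row $i$ spreads $1/(i+1)$ over positions $0, \dots, i$, so no token can peek at later tokens.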
7:30 AM – 8:00 AM (30m): Relearned comparative advantage

#economics
- Sources
  - https://en.wikipedia.org/wiki/Comparative_advantage
- Notes
  - The whole thing works only because of relative opportunity cost. Even if party A has an absolute advantage in efficiency over party B, party A pays an opportunity cost whenever it spends time making something it isn't most efficient at. Thus, A leaves that thing to the scrubs to make (even though A is better at it).
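A small worked example with made-up numbers (the goods, parties, and rates are hypothetical): A makes 6 widgets or 3 gadgets per hour, B makes 1 of either, so A has an absolute advantage in both. Comparing opportunity costs shows who should make what:

```python
from fractions import Fraction

# Hypothetical output per hour: (widgets, gadgets). A is better at both.
RATES = {"A": (6, 3), "B": (1, 1)}


def opportunity_cost(rates, party, good):
    """Units of the other good forgone to make one unit of `good`."""
    widgets, gadgets = rates[party]
    if good == "gadget":
        return Fraction(widgets, gadgets)
    if good == "widget":
        return Fraction(gadgets, widgets)
    raise ValueError(f"unknown good: {good}")


def comparative_advantage(rates, good):
    """Party with the lowest opportunity cost for `good`."""
    return min(rates, key=lambda p: opportunity_cost(rates, p, good))


for good in ("widget", "gadget"):
    costs = {p: str(opportunity_cost(RATES, p, good)) for p in RATES}
    print(good, comparative_advantage(RATES, good), costs)
```

A gives up 2 widgets per gadget versus B's 1, so gadgets go to B even though A makes them three times as fast; A specializes in widgets, where its opportunity cost (1/2 gadget per widget) is lower than B's (1 gadget).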
