Blog
-
Why we need new scaling paradigms
The idea has been floating around that pre-training scaling is hitting a soft wall, and that scaling inference-time compute is the new frontier. Why so? Why this shift in scaling paradigms? Why is scaling pre-training no longer effective? In this post, I share a technical perspective, drawing on my past experience in scaling law research.
-
Treating Data as Code: from linear algebra to agentic LLMs
Note
-
The embarrassing redundancy of reward whitening and reward normalization in PPO
In this post, I will theoretically prove that two common implementation tricks in PPO – reward whitening and reward normalization – are unnecessary and can be emulated by adjusting other free parameters.
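For readers unfamiliar with the two tricks, here is a minimal sketch of what they typically look like in a PPO implementation; the function names and the running-std flavor of normalization are illustrative assumptions, not taken from any particular codebase.

```python
import numpy as np

def whiten_rewards(rewards, eps=1e-8):
    # Reward whitening: subtract the batch mean and divide by the batch std.
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def normalize_rewards(rewards, running_std, alpha=0.99, eps=1e-8):
    # Reward normalization: rescale by a running estimate of the reward std
    # (no centering), updated across batches.
    rewards = np.asarray(rewards, dtype=np.float64)
    running_std = alpha * running_std + (1 - alpha) * rewards.std()
    return rewards / (running_std + eps), running_std
```

Note that whitening shifts and rescales within a batch, while this flavor of normalization only rescales, using cross-batch statistics.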
-
Reflections on Commonsense Explanations
To tackle the task of commonsense question answering, numerous works have proposed grounding the reasoning in explanations or relevant commonsense knowledge (Liu et al., 2021; Liu et al., 2022; Wang et al., 2022; inter alia). In this blog post, I reflect on whether these approaches are really logically sound and bulletproof.
-
What is missing from ChatGPT / GPT-4?
ChatGPT and GPT-4 are remarkable engineering breakthroughs. In this post I reflect on what is still missing from these models, and from most modern LLMs in general.
-
Handling the absorbing state in Beam Search Decoding [zh]
-
A note on BART
-
Theorem Proving - reading notes [zh]
-
A Dummy Trading Strategy - II
In the previous post we discussed how you can buy an asset early and cheaply, and take incremental profit as it skyrockets. That post does not tell you how to buy it (back) at a market dip. In this post, we introduce a unified buying-and-selling strategy so that you can make automated trading decisions and turn a profit in a volatile market.
-
Optimal Stopping [zh]
-
Bottom-Fishing in Market Crash
Every couple of years there is a market correction, or even a market crash. However, historical data tells us that the market always comes back, so we want to buy at a cheap price during such times, in the hope of making a large return when the market recovers. How should we do this, given that no one knows in advance where the “real” bottom is? We need a strategy.
-
A Dummy Trading Strategy
There are times in the financial market when a certain asset is traded at a low price but has huge potential for speculation. Back in 2013 you could buy bitcoins at \$13.50, and by the end of 2017 they were trading at nearly \$20k. When Tesla (TSLA) and Nio (NIO) took off in 2020, their stock prices rallied by 9x and 25x, respectively. In the recent WallStreetBets war with shorting institutions, GameStop (GME) saw its price skyrocket because of a short squeeze. The question is, how can we seize these opportunities and make some profit?
-
Boltzmann distribution, Restricted Boltzmann Machine [zh]
-
Pitfalls in Tensorflow [zh]
-
Reading Notes: MLAPP Chapter 21: Variational Inference
-
Reading Notes: MLAPP Chapter 11: Latent Linear Models
-
Reading Notes: MLAPP Chapter 10: Mixture Models and the EM Algorithm
-
Finetune Pre-trained LMs
Over the weekend, I played with fine-tuning GPT-2 and XLNet (on Colab). Huge applause to Huggingface Transformers; it makes all sorts of pre-trained LMs extremely accessible. The framework has evolved a lot from being a wrapper around pre-trained BERT. It now unifies all models under AutoModel* classes with different capabilities, so that we only have to know the model key and not worry about model-specific APIs. The repo also contains very handy fine-tuning and inference scripts.
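As a quick illustration of how accessible this makes things, here is a minimal sketch of loading a pre-trained LM through the Auto* classes; the choice of the gpt2 checkpoint and the tiny generation example are placeholders, not taken from the original post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Resolve the concrete architecture from the checkpoint key alone.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Generate a short continuation as a sanity check.
inputs = tokenizer("Over the weekend, I played with", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```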
-
We won Terminal Live @ UIUC
Our team TOAD ranked #1 in Terminal Live @ UIUC, sponsored by Correlation One and Citadel. We will be sharing a cash prize of $12,000!
-
Phrase Grounding by Soft-Label Chain Conditional Random Field (EMNLP-IJCNLP 2019 Long Paper)
Our paper Phrase Grounding by Soft-Label Chain Conditional Random Field was accepted as a long paper at EMNLP-IJCNLP 2019! arXiv link
-
ICPC World Finals 2019 in Porto, Portugal
-
ICPC Bytedance-Moscow Workshop 2019 in Beijing, China
-
DFTnet: efficiently training large neural networks
Recently I experimented with neural networks, replacing the matrix multiplication in the network's propagation with a convolution and using the FFT to speed up computation. This architecture allows training neural networks with larger layer sizes, given that we allow weights to be reused in a certain way. Preliminary experiments show 93% accuracy on the MNIST dataset.
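A minimal sketch of the idea as I understand it: parameterize a layer by a single filter and replace the dense matrix multiplication with a circular convolution computed via the FFT, which is equivalent to multiplying by a circulant (weight-sharing) matrix. The shapes and names below are illustrative assumptions, not the exact DFTnet architecture.

```python
import numpy as np

def fft_layer(x, w):
    # Circular convolution of input x with filter w via FFT.
    # Equivalent to multiplying x by the circulant matrix built from w,
    # so the layer stores O(n) weights and costs O(n log n) compute.
    n = x.shape[-1]
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w, n), n)

# Example: a 1024-unit layer parameterized by a single length-1024 filter.
x = np.random.randn(1024)
w = np.random.randn(1024) / np.sqrt(1024)
h = np.maximum(fft_layer(x, w), 0.0)  # ReLU nonlinearity
```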
-
360 Depth Correction: depth correction for virtual objects enclosed by 360 video
In virtual reality, when a 360 monocular video canvas surrounds virtual objects, there is a depth mismatch that creates artifacts. In this scenario, monocular depth cues provided by the canvas override binocular depth cues on the virtual object. In this paper, I propose an algorithm that geometrically transforms the virtual object to compensate for the mismatch. This allows natural fusion of virtual objects and 360 environments in virtual reality.