This is where I document my notes and writing on AI research, LLMs, and engineering: a mix of long-form posts hosted on this site and selected external articles.
How agents improve the software around a frozen model, the no-gradient twin of environment scaling: the propose-evaluate-select-archive loop, the five surfaces people evolve, the LLM-as-optimizer zoo, and why task supply and verifier honesty are the real bottlenecks, with a focus on code agents (SWE-bench, Terminal-bench).
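Evolving the "harness" (the agent's software shell) around a frozen model, the no-gradient twin of environment scaling: the propose-evaluate-select-archive loop, the five surfaces people evolve, the family of LLM-as-optimizer methods, and why task supply and verifier honesty are the real bottlenecks (for code agents: SWE-bench, Terminal-bench).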
A 2026 field guide to the frontier-LLM training pipeline — data, pre-training, post-training (RL), evaluation, and safety — synthesizing the MAI-Thinking-1, DeepSeek, Qwen, Kimi, Llama, GLM and other technical reports.
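A Chinese-language guide to the 2026 frontier-LLM training pipeline: data, pre-training, post-training (RL), evaluation, and safety, synthesizing the MAI-Thinking-1, DeepSeek, Qwen, Kimi, Llama, GLM, and other technical reports.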
A concept-first interview guide to RL for LLM post-training and agents, from PPO/GRPO/DPO and RLVR to environments, evaluation, and systems consistency.
A concept-driven interview review guide to LLM post-training and agentic RL: from PPO/GRPO/DPO and RLVR to environments, evaluation, and systems consistency.
A pedagogical tour of how the LLM-agent community turns environments into scalable, verifiable RL training signal — the recurring pipeline, the design axes, and the open challenges.
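A pedagogical tour of how the LLM-agent community turns environments into scalable, verifiable RL training signal: the recurring pipeline, the design axes, and the open challenges.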
Deep Research is about understanding, reasoning, and synthesis—combining adaptive planning, retrieval, analysis, and context engineering to produce long-form, well-cited research outputs. This article explores how Enterprise Deep Research bridges internal knowledge and external insights to serve strategic business goals.
A novel approach to enhancing Large Language Models through synthetic knowledge ingestion, from Intuit AI Research, presented at EMNLP 2024.