👋 Welcome to Jiaxin's Blog

I document my notes and writings on AI research, LLMs, and engineering here: a mix of long-form posts hosted on this site and selected external articles.

Self-Evolving Agentic Harnesses

How agents improve the software around a frozen model: the no-gradient twin of environment scaling. Covers the propose-evaluate-select-archive loop, the five surfaces people evolve, the LLM-as-optimizer zoo, and why task supply and verifier honesty are the real bottlenecks, with a focus on code agents (SWE-bench, Terminal-bench).

June 28, 2026 · 57 min read · agents llm self-improvement rl
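Self-Evolving Agentic Harnesses (Chinese version)

Evolving the "harness" (the agent's software shell) around a frozen model, the gradient-free twin of environment scaling: the propose-evaluate-select-archive loop, the five surfaces people evolve, the lineage of methods that use LLMs as optimizers, and why task supply and verifier honesty are the real bottlenecks (for code agents: SWE-bench, Terminal-bench).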

June 28, 2026 · 19 min read · agents llm self-improvement rl
How Frontier Labs Train Large Language Models

A 2026 field guide to the frontier-LLM training pipeline — data, pre-training, post-training (RL), evaluation, and safety — synthesizing the MAI-Thinking-1, DeepSeek, Qwen, Kimi, Llama, GLM and other technical reports.

June 27, 2026 · 62 min read · llm rl pretraining post-training
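How Frontier Labs Train Large Language Models (Chinese version)

A Chinese-language guide to the 2026 frontier-LLM training pipeline: data, pre-training, post-training (RL), evaluation, and safety, synthesizing the MAI-Thinking-1, DeepSeek, Qwen, Kimi, Llama, GLM, and other technical reports.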

June 27, 2026 · 24 min read · llm rl pretraining post-training
What I Learned from RL and Agentic RL Interview Questions

A concept-first interview guide to RL for LLM post-training and agents, from PPO/GRPO/DPO and RLVR to environments, evaluation, and systems consistency.

June 21, 2026 · 97 min read · rl rlhf grpo agents post-training
What I Learned from RL and Agentic RL Interview Questions (Chinese version)

A concept-driven interview review guide to LLM post-training and Agentic RL: from PPO/GRPO/DPO and RLVR to environments, evaluation, and systems consistency.

June 21, 2026 · 97 min read · rl rlhf grpo agents post-training
Environment Scaling for Agentic RL

A pedagogical tour of how the LLM-agent community turns environments into scalable, verifiable RL training signal — the recurring pipeline, the design axes, and the open challenges.

June 10, 2026 · 40 min read · rl agents environment-scaling llm
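Environment Scaling for Agentic RL (Chinese version)

A pedagogical tour of how the LLM-agent community turns environments into scalable, verifiable RL training signals: the recurring pipeline, the design axes, and the open challenges.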

June 10, 2026 · 40 min read · rl agents environment-scaling llm
Towards Trustworthy Enterprise Deep Research

Deep Research is about understanding, reasoning, and synthesis—combining adaptive planning, retrieval, analysis, and context engineering to produce long-form, well-cited research outputs. This article explores how Enterprise Deep Research bridges internal knowledge and external insights to serve strategic business goals.

October 24, 2025 · 9 min read · Salesforce Blog · llm agents deep-research
Enhancing LLMs with Synthetic Knowledge Ingestion

A novel approach to enhancing Large Language Models through synthetic knowledge ingestion, from Intuit AI Research, presented at EMNLP 2024.

November 8, 2024 · 5 min read · Medium · Intuit AI Research · llm fine-tuning knowledge