Jiaxin Zhang

AI - Senior Staff Research Scientist
Intuit AI Research
Office: 2535 Garcia Ave, Mountain View, CA 94043

Hi there! I’m Jiaxin 👋

🔭 I am an AI Senior Staff Research Scientist at Intuit AI Research, leading a research team specializing in Generative AI (including large language models, diffusion models, and vision-language models) and AI Reliability (focusing on uncertainty, confidence, and robustness). Prior to this, I was a Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL) under the US Department of Energy (DOE). My research at ORNL was dedicated to advancing AI for Science on state-of-the-art supercomputers such as Summit and Frontier. I received my Ph.D. from Johns Hopkins University with an emphasis on uncertainty quantification.

🤔 I am passionate about building reliable AI capabilities to assist humans in solving complex real-world challenges found at the convergence of language, vision, and science. My research concentrates on AI Reliability and Robustness, Uncertainty Quantification, LLM Alignment and Safety, Optimization, and AI4Science. I have authored over 50 papers, including 35+ as the first author, in leading AI conferences and journals such as NeurIPS, CVPR, EMNLP, AISTATS, and others. Additionally, I actively maintain several GitHub repositories that have collectively garnered over 2,000 stars.

⚡ Research Highlights

Hallucination Detection and Mitigation
Uncertainty Quantification in LLMs
Prompt Optimization w/wo Security and Safety Constraints
Knowledge Injection and Reliable RAG
LLM Adaptation and Fine-tuning
Interleaved Text-and-Image Generation and Holistic Evaluation
Constrained Generation and Inference-time Decoding
LLM Alignment with Feedback
Thinking LLM and Reasoning

👯 Academic Service

Area Chair: ACL, EMNLP, NAACL
Program Committee: NeurIPS, ICML, ICLR, AAAI, AISTATS, ACL, EMNLP, NAACL, CVPR, ECCV, WACV, KDD, SDM
Journal Reviewer: Transactions on Machine Learning Research (TMLR)

🏆 Awards

CTO Award, Intuit, 2024
A2D Innovation Award, Intuit, 2024
Promising Early‑Career Researcher Award, ORNL, 2020
NeurIPS Travel Award, 2019
Acheson J. Duncan Graduate Research Award, Johns Hopkins, 2018
Dean’s Fellowship, Johns Hopkins, 2014
National Scholarship of P.R. China, 2009, 2012

✈️ Conference Talks/Travels

Dec 2024, NeurIPS @ Vancouver 🇨🇦
Nov 2024, EMNLP @ Miami 🇺🇸
Jul 2024, ICML @ Vienna 🇦🇹
May 2024, AISTATS @ Valencia 🇪🇸
Jan 2024, WACV @ Hawaii 🇺🇸
Dec 2023, NeurIPS @ New Orleans 🇺🇸
Dec 2023, EMNLP @ Singapore 🇸🇬
Feb 2023, AAAI @ Washington DC 🇺🇸
Jul 2022, ICML @ Baltimore 🇺🇸
Jun 2022, CVPR @ New Orleans 🇺🇸
🦠COVID🦠, … 😷 … @ … 😷 …
Dec 2019, NeurIPS @ Vancouver 🇨🇦

💬 I’m always looking for highly motivated Ph.D. students to work with me in research internship positions. Please feel free to email me with your CV if interested.

news

Oct 15, 2024	[Invited Talk] I will give a talk at the NeurIPS 2024 Workshop “Interpretable AI: Past, Present and Future”, Dec 2024, Vancouver, Canada!
Oct 10, 2024	[EMNLP x 6] Six Long Papers (3 Main, 1 Findings, 2 Industry Track) are accepted by EMNLP 2024. 2 oral presentations and 4 poster presentations! See you in Miami!
Aug 1, 2024	Glad to share that I was promoted to Senior Staff Research Scientist @Intuit!
Jun 20, 2024	[Invited talk] I will present my research on hallucination detection and mitigation at Intuit Open Source Meetup!
Jun 1, 2024	Will serve as an Area Chair for EMNLP 2024!
May 1, 2024	One UQ paper was accepted by AISTATS 2024. See you in Valencia, Spain!
Mar 4, 2024	One paper on UQ for LLM was accepted by EACL 2024.
Nov 10, 2023	I created two GitHub repos to share resources and papers on LLM Prompt Optimization and LLM RAG. You are welcome to contribute and work together!
Oct 24, 2023	Two papers, “DECDM: Document Enhancement using Cycle-Consistent Diffusion Models” and “On the Quantification of Image Reconstruction Uncertainty without Training Data”, are accepted by WACV 2024!
Oct 22, 2023	Our paper on “A Divide-Conquer-Reasoning Approach to Consistency Evaluation and Improvement in Blackbox Large Language Models” is accepted by NeurIPS 2023 Workshop on Socially Responsible Language Modelling Research.
Oct 7, 2023	Our paper on SAC^3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency is accepted by EMNLP 2023! The code is coming soon!
Sep 28, 2023	One patent on “Model based document image enhancement” is issued and published.
Sep 21, 2023	Our paper on “Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision” is accepted by NeurIPS 2023! Cheers!
Mar 27, 2023	I was invited to be a Reviewer/PC member for NeurIPS 2023, ICLR 2024, ICASSP 2024, WACV 2024, SIAM SDM 2024.
Mar 21, 2023	I built a GitHub repo that contains a collection of resources and papers on Reliability, Robustness and Safety in Large Language Models (LLMs).
Feb 21, 2023	Our paper titled “Speech Privacy Leakage from Shared Gradients in Distributed Learning” is accepted by ICASSP 2023!
Dec 12, 2022	Two papers, “Accelerating Inverse Learning via Intelligent Localization with Exploratory Sampling” and “AutoNF: Automated Architecture Optimization of Normalizing Flows with Unconstrained Continuous Relaxation Admitting Optimal Discrete Solution”, are accepted by AAAI 2023!

selected publications

EMNLP 2024

Synthetic Knowledge Ingestion: Towards Knowledge Refinement and Injection for Enhancing Large Language Models

Jiaxin Zhang, Wendi Cui, Yiran Huang, Kamalika Das, and Sricharan Kumar

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Large language models (LLMs) are proficient in capturing factual knowledge across various domains. However, refining their capabilities on previously seen knowledge or integrating new knowledge from external sources remains a significant challenge. In this work, we propose a novel synthetic knowledge ingestion method called Ski, which leverages fine-grained synthesis, interleaved generation, and assemble augmentation strategies to construct high-quality data representations from raw knowledge sources. We then integrate Ski and its variations with three knowledge injection techniques: Retrieval Augmented Generation (RAG), Supervised Fine-tuning (SFT), and Continual Pre-training (CPT) to inject and refine knowledge in language models. Extensive empirical experiments are conducted on various question-answering tasks spanning finance, biomedicine, and open-generation domains to demonstrate that Ski significantly outperforms baseline methods by facilitating effective knowledge injection. We believe that our work is an important step towards enhancing the factual accuracy of LLM outputs by refining knowledge representation and injection capabilities.
EMNLP 2024

Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation

Zhuohang Li, Jiaxin Zhang, Chao Yan, Kamalika Das, Sricharan Kumar, Murat Kantarcioglu, and Bradley A Malin

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Language models (LMs) are known to suffer from hallucinations and misinformation. Retrieval augmented generation (RAG) that retrieves verifiable information from an external knowledge corpus to complement the parametric knowledge in LMs provides a tangible solution to these problems. However, the generation quality of RAG is highly dependent on the relevance between a user’s query and the retrieved documents. Inaccurate responses may be generated when the query is outside of the scope of knowledge represented in the external knowledge corpus or if the information in the corpus is out-of-date. In this work, we establish a statistical framework that assesses how well a query can be answered by an RAG system by capturing the relevance of knowledge. We introduce an online testing procedure that employs goodness-of-fit (GoF) tests to inspect the relevance of each user query to detect out-of-knowledge queries with low knowledge relevance. Additionally, we develop an offline testing framework that examines a collection of user queries, aiming to detect significant shifts in the query distribution which indicates the knowledge corpus is no longer sufficiently capable of supporting the interests of the users. We demonstrate the capabilities of these strategies through a systematic evaluation on eight question-answering (QA) datasets, the results of which indicate that the new testing framework is an efficient solution to enhance the reliability of existing RAG systems.
EMNLP 2024

HyQE: Ranking Contexts with Hypothetical Query Embeddings

Weichao Zhou, Jiaxin Zhang, Hilaf Hasson, Anu Singh, and Wenchao Li

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

In retrieval-augmented systems, context ranking techniques are commonly employed to reorder the retrieved contexts based on their relevance to a user query. A standard approach is to measure this relevance through the similarity between contexts and queries in the embedding space. However, such similarity often fails to capture the relevance. Alternatively, large language models (LLMs) have been used for ranking contexts. However, they can encounter scalability issues when the number of candidate contexts grows and the context window sizes of the LLMs remain constrained. Additionally, these approaches require fine-tuning LLMs with domain-specific data. In this work, we introduce a scalable ranking framework that combines embedding similarity and LLM capabilities without requiring LLM fine-tuning. Our framework uses a pre-trained LLM to hypothesize the user query based on the retrieved contexts and ranks the context based on the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other retrieval and ranking techniques. Experimental results show that our method improves the ranking performance across multiple benchmarks.
EMNLP 2024

Holistic evaluation for interleaved text-and-image generation

Minqian Liu, Zhiyang Xu, Zihao Lin, Trevor Ashby, Joy Rimchala, Jiaxin Zhang, and Lifu Huang

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs and outputs, and they only cover a limited number of domains and use cases. Also, current works predominantly use similarity-based metrics which fall short in assessing the quality in open-ended scenarios. To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. InterleavedBench features a rich array of tasks to cover diverse real-world use cases. In addition, we present InterleavedEval, a strong reference-free metric powered by GPT-4o to deliver accurate and explainable evaluation. We carefully define five essential evaluation aspects for InterleavedEval, including text quality, perceptual quality, image coherence, text-image coherence, and helpfulness, to ensure a comprehensive and fine-grained assessment. Through extensive experiments and rigorous human evaluation, we show that our benchmark and metric can effectively evaluate the existing models with a strong correlation with human judgments surpassing previous reference-based metrics. We also provide substantial findings and insights to foster future research in interleaved generation and its evaluation.
EMNLP 2024

Survival of the Safest: Towards Secure Prompt Optimization through Interleaved Multi-Objective Evolution

Ankita Sinha, Wendi Cui, Kamalika Das, and Jiaxin Zhang

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing - Industry Track, 2024

Large language models (LLMs) have demonstrated remarkable capabilities; however, the optimization of their prompts has historically prioritized performance metrics at the expense of crucial safety and security considerations. To overcome this shortcoming, we introduce "Survival of the Safest" (SoS), an innovative multi-objective prompt optimization framework that enhances both performance and security in LLMs simultaneously. SoS utilizes an interleaved multi-objective evolution strategy, integrating semantic, feedback, and crossover mutations to effectively traverse the prompt landscape. Differing from the computationally demanding Pareto front methods, SoS provides a scalable solution that expedites optimization in complex, high-dimensional discrete search spaces while keeping computational demands low. Our approach accommodates flexible weighting of objectives and generates a pool of optimized candidates, empowering users to select prompts that optimally meet their specific performance and security needs. Experimental evaluations across diverse benchmark datasets affirm SoS’s efficacy in delivering high performance and notably enhancing safety and security compared to single-objective methods. This advancement marks a significant stride towards the deployment of LLM systems that are both high-performing and secure across varied industrial applications.
EMNLP 2024

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models

Wendi Cui, Zhuohang Li, Damien Lopez, Kamalika Das, Bradley Malin, Sricharan Kumar, and Jiaxin Zhang

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing - Industry Track, 2024

Evaluating the quality and variability of text generated by Large Language Models (LLMs) poses a significant, yet unresolved research challenge. Traditional evaluation methods, such as ROUGE and BERTScore, which measure token similarity, often fail to capture the holistic semantic equivalence. This results in a low correlation with human judgments and intuition, which is especially problematic in high-stakes applications like healthcare and finance where reliability, safety, and robust decision-making are highly critical. This work proposes DCR, an automated framework for evaluating and improving the consistency of LLM-generated texts using a divide-conquer-reasoning approach. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison between two generated responses into individual sentence-to-paragraph comparisons, each evaluated based on predefined criteria. To facilitate this approach, we introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score. Beyond the consistency evaluation, we further present a reason-assisted improver (RAI) that leverages the analytical reasons with explanations identified by DCE to generate new responses aimed at reducing these inconsistencies. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +19.3% and +24.3% on the SummEval dataset) in evaluating the consistency of LLM generation across multiple benchmarks in semantic, factual, and summarization consistency tasks. Our approach also substantially reduces nearly 90% of output inconsistencies, showing promise for effective hallucination mitigation.
EACL 2024

SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models

Xiang Gao, Jiaxin Zhang, Lalla Mouatadid, and Kamalika Das

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

In recent years, large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities. However, a pressing challenge is their tendency to make confidently wrong predictions, highlighting the critical need for uncertainty quantification (UQ) in LLMs. While previous works have mainly focused on addressing aleatoric uncertainty, the full spectrum of uncertainties, including epistemic, remains inadequately explored. Motivated by this gap, we introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties. The method entails generating a set of perturbations for LLM inputs, sampling outputs for each perturbation, and incorporating an aggregation module that generalizes the sampling uncertainty approach for text generation tasks. Through extensive experiments on various datasets, we investigated different perturbation and aggregation techniques. Our findings show a substantial improvement in model uncertainty calibration, with a reduction in Expected Calibration Error (ECE) by 50% on average. Our findings suggest that our proposed UQ method offers promising steps toward enhancing the reliability and trustworthiness of LLMs.
arXiv

PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models

Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley Malin, and Sricharan Kumar

2024

Crafting an ideal prompt for Large Language Models (LLMs) is a challenging task that demands significant resources and expert human input. Existing work treats the optimization of prompt instruction and in-context learning examples as distinct problems, leading to sub-optimal prompt performance. This research addresses this limitation by establishing a unified in-context prompt optimization framework, which aims to achieve joint optimization of the prompt instruction and examples. However, formulating such optimization in the discrete and high-dimensional natural language space introduces challenges in terms of convergence and computational efficiency. To overcome these issues, we present PhaseEvo, an efficient automatic prompt optimization framework that combines the generative capability of LLMs with the global search proficiency of evolution algorithms. Our framework features a multi-phase design incorporating innovative LLM-based mutation operators to enhance search efficiency and accelerate convergence. We conduct an extensive evaluation of our approach across 35 benchmark tasks. The results demonstrate that PhaseEvo significantly outperforms the state-of-the-art baseline methods by a large margin whilst maintaining good efficiency.
AISTATS 2024

Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods

Jiaxin Zhang, Kamalika Das, and Sricharan Kumar

In International Conference on Artificial Intelligence and Statistics, 2024

Uncertainty estimation is a crucial aspect of deploying dependable deep learning models in safety-critical systems. In this study, we introduce a novel and efficient method for deterministic uncertainty estimation called Discriminant Distance-Awareness Representation (DDAR). Our approach involves constructing a DNN model that incorporates a set of prototypes in its latent representations, enabling us to analyze valuable feature information from the input data. By leveraging a distinction maximization layer over optimal trainable prototypes, DDAR can learn a discriminant distance-awareness representation. We demonstrate that DDAR overcomes feature collapse by relaxing the Lipschitz constraint that hinders the practicality of deterministic uncertainty methods (DUMs) architectures. Our experiments show that DDAR is a flexible and architecture-agnostic method that can be easily integrated as a pluggable layer with distance-sensitive metrics, outperforming state-of-the-art uncertainty estimation methods on multiple benchmark problems.
WACV 2024

DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, and Sricharan Kumar

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models, making it possible to apply domain-specific diffusion models to other pairs. DECDM trains on one dataset at a time, eliminating the need to scan both datasets concurrently, and effectively preserving data privacy from the source or target domain. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic data and benchmark datasets, such as document denoising and shadow removal, and demonstrate the superiority of performance quantitatively and qualitatively.
WACV 2024

On the Quantification of Image Reconstruction Uncertainty without Training Data

Jiaxin Zhang, Sirui Bi, and Victor Fung

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

Computational imaging plays a pivotal role in determining hidden information from sparse measurements. A robust inverse solver is crucial to fully characterize the uncertainty induced by these measurements, as it allows for the estimation of the complete posterior of unrecoverable targets. This, in turn, facilitates a probabilistic interpretation of observational data for decision-making. In this study, we propose a deep variational framework that leverages a deep generative model to learn an approximate posterior distribution to effectively quantify image reconstruction uncertainty without the need for training data. We parameterize the target posterior using a flow-based model and minimize their Kullback-Leibler (KL) divergence to achieve accurate uncertainty estimation. To bolster stability, we introduce a robust flow-based model with bi-directional regularization and enhance expressivity through gradient boosting. Additionally, we incorporate a space-filling design to achieve substantial variance reduction on both latent prior space and target posterior space. We validate our method on several benchmark tasks and two real-world applications, namely fastMRI and black hole image reconstruction. Our results indicate that our method provides reliable and high-quality image reconstruction with robust uncertainty estimation.
NeurIPS 2023

Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision

Jiaxin Zhang, Zhuohang Li, Kamalika Das, and Sricharan Kumar

In Advances in Neural Information Processing Systems, 2023

Large language models (LLMs) have demonstrated remarkable capabilities in various tasks. However, their suitability for domain-specific tasks is limited due to their immense scale at deployment, susceptibility to misinformation, and more importantly, high data annotation costs. We propose a novel Interactive Multi-Fidelity Learning (IMFL) framework for the cost-effective development of small domain-specific LMs under limited annotation budgets. Our approach formulates the domain-specific fine-tuning process as a multi-fidelity learning problem, focusing on identifying the optimal acquisition strategy that balances between low-fidelity automatic LLM annotations and high-fidelity human annotations to maximize model performance. We further propose an exploration-exploitation query strategy that enhances annotation diversity and informativeness, incorporating two innovative designs: 1) prompt retrieval that selects in-context examples from human-annotated samples to improve LLM annotation, and 2) variable batch size that controls the order for choosing each fidelity to facilitate knowledge distillation, ultimately enhancing annotation quality. Extensive experiments on financial and medical tasks demonstrate that IMFL achieves superior performance compared with single fidelity annotations. Given a limited budget of human annotation, IMFL significantly outperforms the human annotation baselines in all four tasks and achieves very close performance as human annotations on two of the tasks. These promising results suggest that the high human annotation costs in domain-specific tasks can be significantly reduced by employing IMFL, which utilizes fewer human annotations, supplemented with cheaper and faster LLM (e.g., GPT-3.5) annotations to achieve comparable performance.
EMNLP 2023

SAC^3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley Malin, and Sricharan Kumar

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC^3) that expands on the principle of self-consistency checking. Our SAC^3 approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC^3 outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.
AAAI 2023

Accelerating Inverse Learning via Intelligent Localization with Exploratory Sampling

Jiaxin Zhang, Sirui Bi, and Victor Fung

Proceedings of the AAAI Conference on Artificial Intelligence, 2023

In the scope of "AI for Science", solving inverse problems is a longstanding challenge in materials and drug discovery, where the goal is to determine the hidden structures given a set of desirable properties. Deep generative models are recently proposed to solve inverse problems, but these currently use expensive forward operators and struggle in precisely localizing the exact solutions and fully exploring the parameter spaces without missing solutions. In this work, we propose a novel approach (called iPage) to accelerate the inverse learning process by leveraging probabilistic inference from deep invertible models and deterministic optimization via fast gradient descent. Given a target property, the learned invertible model provides a posterior over the parameter space; we identify these posterior samples as an intelligent prior initialization which enables us to narrow down the search space. We then perform gradient descent to calibrate the inverse solutions within a local region. Meanwhile, a space-filling sampling is imposed on the latent space to better explore and capture all possible solutions. We evaluate our approach on three benchmark tasks and two created datasets with real-world applications from quantum chemistry and additive manufacturing, and find our method achieves superior performance compared to several state-of-the-art baseline methods.
AAAI 2023

AutoNF: Automated Architecture Optimization of Normalizing Flows Using a Mixture Distribution Formulation

Yu Wang, Jan Drgona, Jiaxin Zhang, Karthik Somayaji NS, Frank Y Liu, Malachi Schram, and Peng Li

Proceedings of the AAAI Conference on Artificial Intelligence, 2023

Normalizing flows (NF) build upon invertible neural networks and have wide applications in probabilistic modeling. Currently, building a powerful yet computationally efficient flow model relies on empirical fine-tuning over a large design space. While introducing neural architecture search (NAS) to NF is desirable, the invertibility constraint of NF brings new challenges to existing NAS methods whose application is limited to unstructured neural networks. Developing efficient NAS methods specifically for NF remains an open problem. We present AutoNF, the first automated NF architectural optimization framework. First, we present a new mixture distribution formulation that allows efficient differentiable architecture search of flow models without violating the invertibility constraint. Second, under the new formulation, we convert the original NP-hard combinatorial NF architectural optimization problem to an unconstrained continuous relaxation admitting the discrete optimal architectural solution, circumventing the loss of optimality due to binarization in architectural optimization. We evaluate AutoNF with various density estimation datasets and show its superior performance-cost trade-offs over a set of existing hand-crafted baselines.
CVPR 2022

Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage

Zhuohang Li, Jiaxin Zhang, Luyang Liu, and Jian Liu

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

The Federated Learning (FL) framework brings privacy benefits to distributed learning systems by allowing multiple clients to participate in a learning task under the coordination of a central server without exchanging their private data. However, recent studies have revealed that private information can still be leaked through shared gradient information. To further protect users’ privacy, several defense mechanisms have been proposed to prevent privacy leakage via gradient information degradation methods, such as using additive noise or gradient compression before sharing it with the server. In this work, we validate that the private training data can still be leaked under certain defense settings with a new type of leakage, i.e., Generative Gradient Leakage (GGL). Unlike existing methods that only rely on gradient information to reconstruct data, our method leverages the latent space of generative adversarial networks (GAN) learned from public image datasets as a prior to compensate for the informational loss during gradient degradation. To address the nonlinearity caused by the gradient operator and the GAN model, we explore various gradient-free optimization methods (e.g., evolution strategies and Bayesian optimization) and empirically show their superiority in reconstructing high-quality images from gradients compared to gradient-based optimizers. We hope the proposed method can serve as a tool for empirically measuring the amount of privacy leakage to facilitate the design of more robust defense mechanisms.
AAAI 2022

Gradient-based Novelty Detection Boosted by Self-supervised Binary Classification

Jingbo Sun, Li Yang, Jiaxin Zhang, Frank Liu, Mahantesh Halappanavar, Deliang Fan, and Yu Cao

In Proceedings of the AAAI Conference on Artificial Intelligence, 2022

Abs arXiv PDF

Novelty detection aims to automatically identify out-of-distribution (OOD) data, without any prior knowledge of them. It is a critical step in data monitoring, behavior analysis and other applications, helping enable continual learning in the field. Conventional methods of OOD detection perform multi-variate analysis on an ensemble of data or features, and usually resort to supervision with OOD data to improve the accuracy. In reality, such supervision is impractical as one cannot anticipate the anomalous data. In this paper, we propose a novel, self-supervised approach that does not rely on any pre-defined OOD data: (1) The new method evaluates the Mahalanobis distance of the gradients between the in-distribution and OOD data. (2) It is assisted by a self-supervised binary classifier to guide the label selection to generate the gradients, and maximize the Mahalanobis distance. In the evaluation with multiple datasets, such as CIFAR-10, CIFAR-100, SVHN and TinyImageNet, the proposed approach consistently outperforms state-of-the-art supervised and unsupervised methods in the area under the receiver operating characteristic (AUROC) and area under the precision-recall curve (AUPR) metrics. We further demonstrate that this detector is able to accurately learn one OOD class in continual learning.
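As a rough illustration of the Mahalanobis-distance scoring mentioned in the abstract above, the sketch below fits a Gaussian to in-distribution feature vectors and scores new vectors by their distance to it. It is a generic example, not the paper's pipeline: the feature vectors stand in for the gradient features, and the names `fit_gaussian` and `mahalanobis_score` and the ridge term `eps` are assumptions made for illustration.

```python
import numpy as np


def fit_gaussian(features, eps=1e-6):
    """Estimate the mean and precision matrix of in-distribution features.

    `features` has shape (n_samples, n_dims). The small ridge `eps` is an
    assumption added here to keep the covariance invertible.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis_score(x, mean, precision):
    """Mahalanobis distance of each row of `x` to the fitted Gaussian.

    Larger values suggest the sample is further from the in-distribution data.
    """
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))


# Toy usage: synthetic 4-D "in-distribution" features and one shifted sample.
rng = np.random.default_rng(0)
in_dist = rng.normal(size=(500, 4))
mean, precision = fit_gaussian(in_dist)
typical = mahalanobis_score(np.zeros(4), mean, precision)
shifted = mahalanobis_score(np.full(4, 6.0), mean, precision)
```

In this toy setup the shifted sample receives a larger score than the typical one; a threshold on the score would then separate the two groups.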
NeurIPS 2021

On the Stochastic Stability of Deep Markov Models

Jan Drgona, Sayak Mukherjee, Jiaxin Zhang, Frank Liu, and Mahantesh Halappanavar

Advances in Neural Information Processing Systems, 2021

Abs arXiv PDF Code

Deep Markov models (DMMs) are generative models that provide a scalable and expressive generalization of Markov models for representation, learning, and inference problems. However, the fundamental stochastic stability guarantees of such models have not been thoroughly investigated. In this paper, we present a novel stability analysis method and provide sufficient conditions for the stochastic stability of DMMs. The proposed stability analysis is based on the contraction of probabilistic maps modeled by deep neural networks. We connect the spectral properties of the neural networks’ weights and the types of activation functions used to the stability and overall dynamic behavior of DMMs with Gaussian distributions. Based on the theory, we propose a few practical methods for designing constrained DMMs with guaranteed stability. We empirically substantiate our theoretical results via intuitive numerical experiments using the proposed stability constraints.
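To make the link between weight spectra and contraction more concrete, here is a minimal sketch of a standard Lipschitz bound for a feed-forward map: the product of the layers' spectral norms times the activations' Lipschitz constants. A bound below 1 means the deterministic map is a contraction. This is a generic textbook bound, not the paper's exact sufficient conditions, and the function names and the rescaling rule are assumptions made for illustration.

```python
import numpy as np


def lipschitz_upper_bound(weights, act_lipschitz=1.0):
    """Upper bound on the Lipschitz constant of W_L(act(...act(W_1 x))).

    `act_lipschitz` is the Lipschitz constant of the activation
    (1.0 for ReLU or tanh); it is applied between consecutive layers.
    """
    bound = 1.0
    for k, W in enumerate(weights):
        bound *= np.linalg.norm(W, 2)  # spectral norm (largest singular value)
        if k < len(weights) - 1:
            bound *= act_lipschitz
    return bound


def rescale_to_contraction(weights, target=0.9):
    """Shrink each layer so the product of spectral norms is at most `target`.

    Hypothetical helper: every layer gets an equal share of the budget and is
    only ever scaled down, never up.
    """
    per_layer = target ** (1.0 / len(weights))
    rescaled = []
    for W in weights:
        norm = max(np.linalg.norm(W, 2), 1e-12)  # guard against a zero matrix
        rescaled.append(W * min(1.0, per_layer / norm))
    return rescaled


# Toy usage: three random 8x8 layers, then the constrained version.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 8)) for _ in range(3)]
before = lipschitz_upper_bound(layers)
after = lipschitz_upper_bound(rescale_to_contraction(layers, target=0.9))
```

Here `after` is at most the chosen target, whereas `before` is typically far above 1 for unconstrained random weights.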
UAI 2021

Enabling Long-range Exploration in Minimization of Multimodal Functions

Jiaxin Zhang, Hoang Tran, Dan Lu, and Guannan Zhang

In Uncertainty in Artificial Intelligence, 2021

Abs arXiv PDF Code

We consider the problem of minimizing multi-modal loss functions with a large number of local optima. Since the local gradient points to the direction of the steepest slope in an infinitesimal neighborhood, an optimizer guided by the local gradient is often trapped in a local minimum. To address this issue, we develop a novel nonlocal gradient to skip small local minima by capturing major structures of the loss landscape in black-box optimization. The nonlocal gradient is defined by a directional Gaussian smoothing (DGS) approach. The key idea of DGS is to conduct 1D long-range exploration with a large smoothing radius along $d$ orthogonal directions in $\mathbb{R}^d$, each of which defines a nonlocal directional derivative as a 1D integral. Such long-range exploration enables the nonlocal gradient to skip small local minima. The directional derivatives are then assembled to form the nonlocal gradient. We use the Gauss-Hermite quadrature rule to approximate the 1D integrals to obtain an accurate estimator. The superior performance of our method is demonstrated in three sets of examples, including benchmark functions for global optimization, and two real-world scientific problems.
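The construction described in the abstract above can be sketched in a few lines. For each orthonormal direction, the derivative of the Gaussian-smoothed 1D slice is written as an expectation over a standard normal variable and approximated with Gauss-Hermite quadrature; the directional derivatives are then assembled into a gradient. This is a minimal sketch under assumptions: the function name `dgs_gradient`, the default smoothing radius, the number of quadrature points, and the use of the coordinate axes as directions are illustrative choices, not the authors' reference implementation.

```python
import numpy as np


def dgs_gradient(f, x, sigma=1.0, num_quad=7, directions=None):
    """Nonlocal gradient of a black-box function via directional Gaussian smoothing.

    f          : callable mapping a 1-D array to a float
    x          : point at which the gradient is estimated
    sigma      : smoothing radius (larger values look further along each direction)
    num_quad   : number of Gauss-Hermite nodes per direction
    directions : rows form an orthonormal basis; defaults to the coordinate axes
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    xi = np.eye(d) if directions is None else np.asarray(directions, dtype=float)
    # Nodes t_m and weights w_m for integrals of the form  int exp(-t^2) g(t) dt.
    t, w = np.polynomial.hermite.hermgauss(num_quad)
    derivs = np.empty(d)
    for i in range(d):
        # Evaluate f along direction xi[i] at the rescaled quadrature nodes.
        vals = np.array([f(x + np.sqrt(2.0) * sigma * tm * xi[i]) for tm in t])
        # Derivative of the smoothed slice: (1 / sigma) * E_v[v * f(x + sigma * v * xi_i)].
        derivs[i] = np.sum(w * np.sqrt(2.0) * t * vals) / (np.sqrt(np.pi) * sigma)
    # Assemble the directional derivatives into a gradient in the original coordinates.
    return xi.T @ derivs


# Toy usage: a few descent steps on a 2-D function with many small local minima.
def wavy(z):
    return np.sum(z ** 2) + 2.0 * np.sum(np.sin(8.0 * z))


point = np.array([3.0, -2.5])
for _ in range(50):
    point = point - 0.1 * dgs_gradient(wavy, point, sigma=1.0, num_quad=15)
```

For a quadratic such as the sum of squared coordinates, the smoothed derivative coincides with the ordinary gradient, which gives a convenient sanity check; the behavior on the wavy function depends on the smoothing radius, step size, and node count chosen.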
AISTATS 2021

A Scalable Gradient Free Method for Bayesian Experimental Design with Implicit Models

Jiaxin Zhang, Sirui Bi, and Guannan Zhang

In International Conference on Artificial Intelligence and Statistics, 2021

Abs PDF

Bayesian experimental design (BED) addresses the question of how to choose designs that maximize the information gathered. For implicit models, where the likelihood is intractable but sampling is possible, conventional BED methods have difficulties in efficiently estimating the posterior distribution and maximizing the mutual information (MI) between data and parameters. Recent work proposed the use of gradient ascent to maximize a lower bound on MI to deal with these issues. However, the approach requires a sampling path to compute the pathwise gradient of the MI lower bound with respect to the design variables, and such a pathwise gradient is usually inaccessible for implicit models. In this paper, we propose a novel approach that leverages recent advances in stochastic approximate gradient ascent incorporated with a smoothed variational MI estimator for efficient and robust BED. Without the necessity of pathwise gradients, our approach allows the design process to be achieved through a unified procedure with an approximate gradient for implicit models. Several experiments show that our approach outperforms baseline methods, and significantly improves the scalability of BED in high-dimensional problems.
NeurIPS 2019

Learning Nonlinear Level Sets for Dimensionality Reduction in Function Approximation

Guannan Zhang, Jiaxin Zhang, and Jacob Hinkle

Advances in Neural Information Processing Systems, 2019

Abs arXiv PDF Code

We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data. This work is motivated by a variety of design tasks in real-world engineering applications, where practitioners would replace their computationally intensive physical models (e.g., high-resolution fluid simulators) with fast-to-evaluate predictive machine learning models, so as to accelerate the engineering design processes. There are two major challenges in constructing such predictive models: (a) high-dimensional inputs (e.g., many independent design parameters) and (b) small training data, generated by running extremely time-consuming simulations. Thus, reducing the input dimension is critical to alleviate the over-fitting issue caused by data insufficiency. Existing methods, including sliced inverse regression and active subspace approaches, reduce the input dimension by learning a linear coordinate transformation; our main contribution is to extend the transformation approach to a nonlinear regime. Specifically, we exploit reversible networks (RevNets) to learn nonlinear level sets of a high-dimensional function and parameterize its level sets in low-dimensional spaces. A new loss function was designed to utilize samples of the target function’s gradient to encourage the transformed function to be sensitive to only a few transformed coordinates. The NLL approach is demonstrated on three 2D functions and two 20D functions to show the improved approximation accuracy with the use of nonlinear transformation, as well as on an 8D composite material design problem for optimizing the buckling-resistance performance of composite shells of rocket inter-stages.