Homepage - Xin Ai

Selected Publications (view all )

DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention

Hao Yuan, Xin Ai, Qiange Wang, Peizheng Li, Jiayang Yu, Chaoyi Chen, Xinbo Yang, Yanfeng Zhang, Zhenbo Fu, Yingyou Wen, Ge Yu

Special Interest Group on Management of Data (SIGMOD) 2026

We introduce dependency attention, a novel graph-aware attention mechanism that restricts attention computation to token pairs with structural dependencies in the retrieved subgraph. Unlike standard self-attention that computes fully connected interactions, dependency attention prunes irrelevant token pairs and reuses computations along shared relational paths, substantially reducing inference overhead. Building on this idea, we develop DepCache, a KV cache management framework tailored for dependency attention.

DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention

Hao Yuan, Xin Ai, Qiange Wang, Peizheng Li, Jiayang Yu, Chaoyi Chen, Xinbo Yang, Yanfeng Zhang, Zhenbo Fu, Yingyou Wen, Ge Yu

Special Interest Group on Management of Data (SIGMOD) 2026

NeutronHeter: Optimizing Distributed Graph Neural Network Training for Heterogeneous Clusters

Chunyu Cao*, Xin Ai*, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Hao Yuan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Yu Gu, Ge Yu (* equal contribution)

Proceedings of the International Conference on Management of Data (SIGMOD) 2026

We present NeutronHeter, an efficient GNN training system for heterogeneous clusters. Our system leverages two key components to achieve its performance, including a multi-level workload mapping framework that transforms the complex multi-way mapping problem into a top-down workload mapping on a tree-like resource graph, and an adaptive communication migration strategy that reduces communication overhead by migrating communication from low-bandwidth links to local computation or high-bandwidth links.

NeutronHeter: Optimizing Distributed Graph Neural Network Training for Heterogeneous Clusters

Chunyu Cao*, Xin Ai*, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Hao Yuan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Yu Gu, Ge Yu (* equal contribution)

Proceedings of the International Conference on Management of Data (SIGMOD) 2026

NeutronAscend: Optimizing GNN Training with Ascend AI Processors

Xin Ai, Bing Zhang, Qiange Wang, Yanfeng Zhang, Hao Yuan, Shufeng Gong, Ge Yu

ACM Transactions on Architecture and Code Optimization (TACO) 2025

The Ascend AI processor is typically architected on multiple AI Cores that are physically decoupled and designed for dense matrix computation. When processing graph data with inherent sparsity and power-law distribution, the Ascend AI processors suffer from the inter-core workload imbalance and inefficient intra-core resource utilization. In this paper, we present NeutronAscend, an efficient GNN training framework tailored for the Ascend AI processor. NeutronAscend employs two critical designs for both inter-core and intra-core performance optimization.

[Paper]

NeutronAscend: Optimizing GNN Training with Ascend AI Processors

Xin Ai, Bing Zhang, Qiange Wang, Yanfeng Zhang, Hao Yuan, Shufeng Gong, Ge Yu

ACM Transactions on Architecture and Code Optimization (TACO) 2025

[Paper]

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

Xin Ai, Hao Yuan, Zeyu Ling, Xin Ai, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2025

We present NeutronTP, a load-balanced and efficient distributed full-graph GNN training system. NeutronTP leverages GNN tensor parallelism for distributed training, which partitions feature rather than graph structures. Compared to GNN data parallelism, NeutronTP eliminates cross-worker vertex dependencies and achieves a balanced workload.

[Paper] [Code]

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

Xin Ai, Hao Yuan, Zeyu Ling, Xin Ai, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2025

[Paper] [Code]

NeutronTask: Scalable and Efficient Multi-GPU GNN Training with Task Parallelism

Zhenbo Fu*, Xin Ai*, Qiange Wang, Yanfeng Zhang, Shizhan Lu, Chaoyi Chen, Chunyu Cao, Hao Yuan, Zhewei Wei, Yu Gu, Yingyou Wen, Ge Yu (* equal contribution)

Very Large Data Bases (VLDB) 2025

In this work, we propose NeutronTask, a multi-GPU GNN training system that adopts GNN task parallelism. Instead of partitioning the graph structure, NeutronTask partitions training tasks in each layer across different GPUs, which significantly reduces neighbor replication.

[Paper] [Code]

NeutronTask: Scalable and Efficient Multi-GPU GNN Training with Task Parallelism

Zhenbo Fu*, Xin Ai*, Qiange Wang, Yanfeng Zhang, Shizhan Lu, Chaoyi Chen, Chunyu Cao, Hao Yuan, Zhewei Wei, Yu Gu, Yingyou Wen, Ge Yu (* equal contribution)

Very Large Data Bases (VLDB) 2025

[Paper] [Code]

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2024

In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training.

[Paper] [Code] [Slides]

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2024

[Paper] [Code] [Slides]

HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management

Qiange Wang*, Xin Ai*, Yanfeng Zhang, Jing Chen, Ge Yu (* equal contribution)

International Conference on Data Engineering (ICDE) 2023

In this work, we propose a hybrid transfer management approach to take the merits of both the two approaches at runtime, with an objective to achieve the shortest execution time in each iteration. Based on the hybrid approach, we present HyTGraph, a GPU-accelerated graph processing framework, which is empowered by a set of effective task scheduling optimizations to improve the performance.

[Code] [Slides] [Video]

HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management

Qiange Wang*, Xin Ai*, Yanfeng Zhang, Jing Chen, Ge Yu (* equal contribution)

International Conference on Data Engineering (ICDE) 2023

[Code] [Slides] [Video]

Warning

Action required

Education

Honors & Awards

Selected Publications (view all )

DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention

DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention

NeutronHeter: Optimizing Distributed Graph Neural Network Training for Heterogeneous Clusters

NeutronHeter: Optimizing Distributed Graph Neural Network Training for Heterogeneous Clusters

NeutronAscend: Optimizing GNN Training with Ascend AI Processors

NeutronAscend: Optimizing GNN Training with Ascend AI Processors

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

NeutronTask: Scalable and Efficient Multi-GPU GNN Training with Task Parallelism

NeutronTask: Scalable and Efficient Multi-GPU GNN Training with Task Parallelism

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management

HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management

All publications