Homepage - Xin Ai

Selected Publications (view all)

NeutronAscend: Optimizing GNN Training with Ascend AI Processors

Xin Ai, Bing Zhang, Qiange Wang, Yanfeng Zhang, Hao Yuan, Shufeng Gong, Ge Yu

ACM Transactions on Architecture and Code Optimization (TACO) 2025

The Ascend AI processor is typically architected on multiple AI Cores that are physically decoupled and designed for dense matrix computation. When processing graph data with inherent sparsity and a power-law distribution, Ascend AI processors suffer from inter-core workload imbalance and inefficient intra-core resource utilization. In this paper, we present NeutronAscend, an efficient GNN training framework tailored for the Ascend AI processor. NeutronAscend employs two critical designs that target inter-core and intra-core performance optimization.

[Paper] [Code]

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

Xin Ai, Hao Yuan, Zeyu Ling, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2025

We present NeutronTP, a load-balanced and efficient distributed full-graph GNN training system. NeutronTP leverages GNN tensor parallelism for distributed training, which partitions features rather than graph structures. Compared to GNN data parallelism, NeutronTP eliminates cross-worker vertex dependencies and achieves a balanced workload.
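The idea of partitioning features rather than graph structure can be illustrated with a minimal sketch. This is not NeutronTP's implementation: the function names (`aggregate`, `split_columns`, `tensor_parallel_aggregate`) are hypothetical, the workers are simulated by a plain loop, and all communication is omitted. It only shows that when each worker holds the full graph and a column slice of the feature matrix, neighbor aggregation needs no features from other workers and every worker traverses the same set of edges.

```python
def aggregate(edges, features):
    """Sum-aggregate neighbor features: out[dst] += features[src] per edge."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in features]
    for src, dst in edges:
        for k in range(dim):
            out[dst][k] += features[src][k]
    return out


def split_columns(features, num_workers):
    """Partition the feature matrix by columns into contiguous slices."""
    dim = len(features[0])
    bounds = [dim * w // num_workers for w in range(num_workers + 1)]
    return [
        [row[bounds[w]:bounds[w + 1]] for row in features]
        for w in range(num_workers)
    ]


def tensor_parallel_aggregate(edges, features, num_workers):
    """Simulate feature-partitioned aggregation (assumed setup, no real workers)."""
    slices = split_columns(features, num_workers)
    # Each call below stands in for one worker: it sees every vertex and
    # every edge, but only its own slice of the feature dimensions.
    partials = [aggregate(edges, s) for s in slices]
    # Concatenate the per-worker column slices back into full rows.
    return [sum((p[v] for p in partials), []) for v in range(len(features))]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (0, 2)]  # (src, dst) pairs
    features = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
    ]
    full = aggregate(edges, features)
    split = tensor_parallel_aggregate(edges, features, num_workers=2)
    print(full == split)
```

Because each simulated worker processes all edges over an equal share of the feature columns, the per-worker cost does not depend on how skewed the vertex degrees are, which is the load-balancing intuition the abstract describes.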

[Paper] [Code] [Slides]

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu

Very Large Data Bases (VLDB) 2024

In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestration method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes the training task of the bottom layer down to the CPU. This significantly reduces the computational load and memory footprint of GPU training.

[Paper] [Code] [Slides]
