# Publications

## 2024

- KDDRelaxing Continuous Constraints of Equivariant Graph Neural Networks for Broad Physical Dynamics LearningZinan Zheng, Yang Liu, Jia Li, Jianhua Yao, and
*Yu Rong**In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, 2024Incorporating Euclidean symmetries (e.g. rotation equivariance) as inductive biases into graph neural networks has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of dynamics are frequently discrete due to the boundary conditions. Thus, existing GNNs either over-look necessary symmetry, resulting in suboptimal representation ability, or impose excessive equivariance, which fails to generalize to unobserved symmetric dynamics. In this work, we propose a general Discrete Equivariant Graph Neural Network (DEGNN) that guarantees equivariance to a given discrete point group. Specifically, we show that such discrete equivariant message passing could be constructed by transforming geometric features into permutation-invariant embeddings. Through relaxing continuous equivariant constraints, DEGNN can employ more geometric feature combinations to approximate unobserved physical object interaction functions. Two implementation approaches of DEGNN are proposed based on ranking or pooling permutation-invariant functions. We apply DEGNN to various physical dynamics, ranging from particle, molecular, crowd to vehicle dynamics. In twenty scenarios, DEGNN significantly outperforms existing state-of-the-art approaches. Moreover, we show that DEGNN is data efficient, learning with less data, and can generalize across scenarios such as unobserved orientation.

- Neural NetworksSolving the non-submodular network collapse problems via Decision TransformerKaili Ma, Han Yang, Shanchao Yang, Kangfei Zhao, Lanqing Li, Yongqiang Chen, and 3 more authors
*Neural Networks*, 2024Given a graph G, the network collapse problem (NCP) selects a vertex subset S of minimum cardinality from G such that the difference in the values of a given measure function f(G)−f(G∖S) is greater than a predefined collapse threshold. Many graph analytic applications can be formulated as NCPs with different measure functions, which often pose a significant challenge due to their NP-hard nature. As a result, traditional greedy algorithms, which select the vertex with the highest reward at each step, may not effectively find the optimal solution. In addition, existing learning-based algorithms do not have the ability to model the sequence of actions taken during the decision-making process, making it difficult to capture the combinatorial effect of selected vertices on the final solution. This limits the performance of learning-based approaches in non-submodular NCPs. To address these limitations, we propose a unified framework called DT-NC, which adapts the Decision Transformer to the Network Collapse problems. DT-NC takes into account the historical actions taken during the decision-making process and effectively captures the combinatorial effect of selected vertices. The ability of DT-NC to model the dependency among selected vertices allows it to address the difficulties caused by the non-submodular property of measure functions in some NCPs effectively. Through extensive experiments on various NCPs and graphs of different sizes, we demonstrate that DT-NC outperforms the state-of-the-art methods and exhibits excellent transferability and generalizability.

- VLDBInductive Attributed Community Search: To Learn Communities Across GraphsShuheng Fang, Kangfei Zhao,
*Yu Rong*, Zhixun Li, and Jeffrey Xu Yu*Proc. VLDB Endow.*, Aug 2024Attributed community search (ACS) aims to identify subgraphs satisfying both structure cohesiveness and attribute homogeneity in attributed graphs, for a given query that contains query nodes and query attributes. Previously, algorithmic approaches deal with ACS in a two-stage paradigm, which suffer from structural inflexibility and attribute irrelevance. To overcome this problem, recently, learning-based approaches have been proposed to learn both structures and attributes simultaneously as a one-stage paradigm. However, these approaches train a transductive model which assumes the graph to infer unseen queries is as same as the graph used for training. That limits the generalization and adaptation of these approaches to different heterogeneous graphs.In this paper, we propose a new framework, Inductive Attributed Community Search, IACS, by inductive learning, which can be used to infer new queries for different communities/graphs. Specifically, IACS employs an encoder-decoder neural architecture to handle an ACS task at a time, where a task consists of a graph with only a few queries and corresponding ground-truth. We design a three-phase workflow, "training-adaptation-inference", which learns a shared model to absorb and induce prior effective common knowledge about ACS across different tasks. And the shared model can swiftly adapt to a new task with small number of ground-truth. We conduct substantial experiments in 7 real-world datasets to verify the effectiveness of IACS for CS/ACS. Our approach IACS achieves 28.97% and 25.60% improvements in F1-score on average in CS and ACS, respectively.

- J COMPUT BIOLToward Robust Self-Training Paradigm for Molecular Prediction TasksHehuan Ma, Feng Jiang,
*Yu Rong*, Yuzhi Guo, and Junzhou Huang*Journal of Computational Biology*, 2024Molecular prediction tasks normally demand a series of professional experiments to label the target molecule, which suffers from the limited labeled data problem. One of the semisupervised learning paradigms, known as self-training, utilizes both labeled and unlabeled data. Specifically, a teacher model is trained using labeled data and produces pseudo labels for unlabeled data. These labeled and pseudo-labeled data are then jointly used to train a student model. However, the pseudo labels generated from the teacher model are generally not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss function to handle such noisy labels in two paradigms, that is, generic and adaptive. We have conducted experiments on three molecular biology prediction tasks with four backbone models to gradually evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, notably within molecular regression tasks, where there has been an average enhancement of 41.5%. Furthermore, the visualization analysis confirms the superiority of our method. Our proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the labeled data insufficient issue in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded with any prediction task, which serves as a universal approach for the bioinformatics community.

- Nature MethodsscPROTEIN: a versatile deep graph contrastive learning framework for single-cell proteomics embeddingWei Li, Fan Yang, Fang Wang,
*Yu Rong*, Linjing Liu, Bingzhe Wu, and 2 more authors*Nature Methods*, Apr 2024Single-cell proteomics sequencing technology sheds light on protein–protein interactions, posttranslational modifications and proteoform dynamics in the cell. However, the uncertainty estimation for peptide quantification, data missingness, batch effects and high noise hinder the analysis of single-cell proteomic data. It is important to solve this set of tangled problems together, but the existing methods tailored for single-cell transcriptomes cannot fully address this task. Here we propose a versatile framework designed for single-cell proteomics data analysis called scPROTEIN, which consists of peptide uncertainty estimation based on a multitask heteroscedastic regression model and cell embedding generation based on graph contrastive learning. scPROTEIN can estimate the uncertainty of peptide quantification, denoise protein data, remove batch effects and encode single-cell proteomic-specific embeddings in a unified framework. We demonstrate that scPROTEIN is efficient for cell clustering, batch correction, cell type annotation, clinical analysis and spatially resolved proteomic data exploration.

## 2023

- AAAIDrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise AnnotationsYuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Lanqing Li, Long-Kai Huang, and 11 more authors
*Proceedings of the AAAI Conference on Artificial Intelligence*, Jun 2023AI-aided drug discovery (AIDD) is gaining popularity due to its potential to make the search for new pharmaceuticals faster, less expensive, and more effective. Despite its extensive use in numerous fields (e.g., ADMET prediction, virtual screening), little research has been conducted on the out-of-distribution (OOD) learning problem with noise. We present DrugOOD, a systematic OOD dataset curator and benchmark for AIDD. Particularly, we focus on the drug-target binding affinity prediction problem, which involves both macromolecule (protein target) and small-molecule (drug compound). DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise level annotations, and rigorous benchmarking of SOTA OOD algorithms, as opposed to only providing fixed datasets. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have revealed a significant performance gap between in-distribution and out-of-distribution experiments, emphasizing the need for the development of more effective schemes that permit OOD generalization under noise for AIDD.

- TKDEAdversarial Attack Framework on Graph Embedding Models With Limited KnowledgeHeng Chang,
*Yu Rong*, Tingyang Xu, Wenbing Huang, Honglei Zhang, Peng Cui, and 3 more authors*IEEE Transactions on Knowledge and Data Engineering*, 2023With the success of the graph embedding model in both academic and industrial areas, the robustness of graph embeddings against adversarial attacks inevitably becomes a crucial problem in graph learning. Existing works usually perform the attack in a white-box fashion: they need to access the predictions/labels to construct their adversarial losses. However, the inaccessibility of predictions/labels makes the white-box attack impractical for a real graph learning system. This paper promotes current frameworks in a more general and flexible sense – we consider the ability of various types of graph embedding models to remain resilient against black-box driven attacks. We investigate the theoretical connection between graph signal processing and graph embedding models, and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. Therefore, we design a generalized adversarial attack framework: GF-Attack . Without accessing any labels and model predictions, GF-Attack can perform the attack directly on the graph filter in a black-box fashion. We further prove that GF-Attack can perform an effective attack without assumption on the number of layers/window-size of graph embedding models. To validate the generalization of GF-Attack , we construct GF-Attack on five popular graph embedding models. Extensive experiments validate the effectiveness of GF-Attack on several benchmark datasets.

- Information SciencesExploiting node-feature bipartite graph in graph convolutional networksYuli Jiang, Huaijia Lin, Ye Li,
*Yu Rong*, Hong Cheng, and Xin Huang*Information Sciences*, 2023In recent years, Graph Convolutional Networks (GCNs), which extend convolutional neural networks to graph structure, have achieved great success on many graph learning tasks by fusing structure and feature information, such as node classification. However, the graph structure is constructed from real-world data and usually contains noise or redundancy. In addition, this structural information is based on manually defined relations and is not potentially optimal for downstream tasks. In this paper, we utilize the knowledge from node features to enhance the expressive power of GCN models in a plug-and-play fashion. Specifically, we build a node-feature bipartite graph and exploit the bipartite graph convolutional network to model node-feature relations. By aligning results from the original graph structure and node-feature relations, we can make a more accurate prediction for each node in an end-to-end manner. Extensive experiments demonstrate that the proposed model can extract knowledge from two branches and improve the performance of various GCN models on typical graph data sets and 3D point cloud data.

- VLDBJLearned sketch for subgraph counting: a holistic approachKangfei Zhao, Jeffrey Xu Yu, Qiyan Li, Hao Zhang, and
*Yu Rong**The VLDB Journal*, 2023Subgraph counting, as a fundamental problem in network analysis, is to count the number of subgraphs in a data graph that match a given query graph by either homomorphism or subgraph isomorphism. The importance of subgraph counting derives from the fact that it provides insights of a large graph, in particular a labeled graph, when a collection of query graphs with different sizes and labels are issued. The problem of counting is challenging. On the one hand, exact counting by enumerating subgraphs is NP-hard. On the other hand, approximate counting by subgraph isomorphism can only support small query graphs over unlabeled graphs. Another way for subgraph counting is to specify it as an SQL query and estimate the cardinality of the query in RDBMS. Existing approaches for cardinality estimation can only support subgraph counting by homomorphism up to some extent, as it is difficult to deal with sampling failure when a query graph becomes large. A question that arises is how we support subgraph counting by machine learning (ML) and deep learning (DL). To devise an ML/DL solution, apart from the query graphs, another issue is to deal with large data graphs by ML/DL, as the existing DL approach for subgraph isomorphism counting can only support small data graphs. In addition, the ML/DL approaches proposed in RDBMS context for approximate query processing and cardinality estimation cannot be used, as subgraph counting is to do complex self-joins over one relation, whereas existing approaches focus on multiple relations. In this work, we propose an active learned sketch for subgraph counting (𝖠𝖫𝖲𝖲 ) with two main components: a learned sketch for subgraph counting and an active learner. The sketch is constructed by a neural network regression model, and the active learner is to perform model updates based on new arrival test query graphs. Our holistic learning framework supports both undirected graphs and directed graphs, whose nodes and/or edges are associated zero to multiple labels. We conduct extensive experimental studies to confirm the effectiveness and efficiency of 𝖠𝖫𝖲𝖲 using large real labeled graphs. Moreover, we show that 𝖠𝖫𝖲𝖲 can assist query optimizers in finding a better query plan for complex multi-way self-joins.

- TMLRNoise-robust Graph Learning by Estimating and Leveraging Pairwise InteractionsXuefeng Du, Tian Bian,
*Yu Rong*, Bo Han, Tongliang Liu, Tingyang Xu, and 3 more authors*Transactions on Machine Learning Research*, 2023Teaching Graph Neural Networks (GNNs) to accurately classify nodes under severely noisy labels is an important problem in real-world graph learning applications, but is currently underexplored. Although pairwise training methods have demonstrated promise in supervised metric learning and unsupervised contrastive learning, they remain less studied on noisy graphs, where the structural pairwise interactions (PI) between nodes are abundant and thus might benefit label noise learning rather than the pointwise methods. This paper bridges the gap by proposing a pairwise framework for noisy node classification on graphs, which relies on the PI as a primary learning proxy in addition to the pointwise learning from the noisy node class labels. Our proposed framework PI-GNN contributes two novel components: (1) a confidence-aware PI estimation model that adaptively estimates the PI labels, which are defined as whether the two nodes share the same node labels, and (2) a decoupled training approach that leverages the estimated PI labels to regularize a node classification model for robust node classification. Extensive experiments on different datasets and GNN architectures demonstrate the effectiveness of PI-GNN, yielding a promising improvement over the state-of-the-art methods. Code is publicly available at https://github.com/TianBian95/pi-gnn.

- AAAIHuman Mobility Modeling during the COVID-19 Pandemic via Deep Graph Diffusion InfomaxYang Liu,
*Yu Rong*, Zhuoning Guo, Nuo Chen, Tingyang Xu, Fugee Tsung, and 1 more author*Proceedings of the AAAI Conference on Artificial Intelligence*, Jun 2023Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have shown effectiveness to slow the transmission of COVID-19 by reducing the contact of people. To support policy-makers, multiple studies have first modelled human mobility via macro indicators (e.g., average daily travel distance) and then study the effectiveness of NPIs. In this work, we focus on mobility modelling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges and jointly model variables including a geometric graph, a set of diffusions and a set of locations, we propose a model named Deep Graph Diffusion Infomax (DGDI). We show the maximization of DGDI can be bounded by two tractable components: a univariate Mutual Information (MI) between geometric graph and diffusion representation, and a univariate MI between diffusion representation and location representation. To facilitate the research of COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.

- DASFAALearning With Small Data: Subgraph Counting QueriesKangfei Zhao, Jeffrey Xu Yu, Zongyan He, and
*Yu Rong**In Database Systems for Advanced Applications: 28th International Conference*, 2023Deep Learning (DL) has been widely used in many applications, and its success is achieved with large training data. A key issue is how to provide a DL solution when there is no efficient training data to learn initially. In this paper, we explore a meta learning approach for a specific problem, subgraph isomorphism counting, which is a fundamental problem in graph analysis to count the number of a given pattern graph, p, in a data graph, g, that matches p. This problem is NP-hard, and needs large training data to learn by DL in nature. To solve this problem, we design a Gaussian Process (GP) model which combines graph neural network with Bayesian nonparametric, and we train the GP by a meta learning algorithm on a small set of training data. By meta learning, we obtain a generalized meta-model to better encode the information of data and pattern graphs and capture the prior of small tasks. We handle a collection of pairs (g, p), as a task, where some pairs may be associated with the ground-truth, and some pairs are the queries to answer. There are two cases. One is there are some with ground-truth (few-shot), and one is there is none with ground-truth (zero-shot). We provide our solutions for both. We conduct substantial experiments to confirm that our approach is robust to model degeneration on small training data, and our meta model can fast adapt to new queries by few/zero-shot learning.

- TPAMISemi-Supervised Hierarchical Graph ClassificationJia Li, Yongfeng Huang, Heng Chang, and
*Yu Rong**IEEE Transactions on Pattern Analysis and Machine Intelligence*, 2023Node classification and graph classification are two graph learning problems that predict the class label of a node and the class label of a graph respectively. A node of a graph usually represents a real-world entity, e.g., a user in a social network, or a document in a document citation network. In this work, we consider a more challenging but practically useful setting, in which a node itself is a graph instance. This leads to a hierarchical graph perspective which arises in many domains such as social network, biological network and document collection. We study the node classification problem in the hierarchical graph where a “node” is a graph instance. As labels are usually limited, we design a novel semi-supervised solution named SEAL-CI. SEAL-CI adopts an iterative framework that takes turns to update two modules, one working at the graph instance level and the other at the hierarchical graph level. To enforce a consistency among different levels of hierarchical graph, we propose the Hierarchical Graph Mutual Information (HGMI) and further present a way to compute HGMI with theoretical guarantee. We demonstrate the effectiveness of this hierarchical graph modeling and the proposed SEAL-CI method on text and social network data.

- TNNLSStructure-Aware DropEdge Toward Deep Graph Convolutional NetworksJiaqi Han, Wenbing Huang,
*Yu Rong*, Tingyang Xu, Fuchun Sun, and Junzhou Huang*IEEE Transactions on Neural Networks and Learning Systems*, 2023It has been discovered that graph convolutional networks (GCNs) encounter a remarkable drop in performance when multiple layers are piled up. The main factor that accounts for why deep GCNs fail lies in oversmoothing, which isolates the network output from the input with the increase of network depth, weakening expressivity and trainability. In this article, we start by investigating refined measures upon DropEdge—an existing simple yet effective technique to relieve oversmoothing. We term our method as DropEdge ++ for its two structure-aware samplers in contrast to DropEdge: layer-dependent (LD) sampler and feature-dependent (FD) sampler. Regarding the LD sampler, we interestingly find that increasingly sampling edges from the bottom layer yields superior performance than the decreasing counterpart as well as DropEdge. We theoretically reveal this phenomenon with mean-edge-number (MEN), a metric closely related to oversmoothing. For the FD sampler, we associate the edge sampling probability with the feature similarity of node pairs and prove that it further correlates the convergence subspace of the output layer with the input features. Extensive experiments on several node classification benchmarks, including both full-and semi-supervised tasks, illustrate the efficacy of DropEdge ++ and its compatibility with a variety of backbones by achieving generally better performance over DropEdge and the no-drop version.

- VLDBComputing Graph Edit Distance via Neural Graph MatchingChengzhi Piao, Tingyang Xu, Xiangguo Sun,
*Yu Rong*, Kangfei Zhao, and Hong Cheng*Proc. VLDB Endow.*, Jun 2023Graph edit distance (GED) computation is a fundamental NP-hard problem in graph theory. Given a graph pair (G1, G2), GED is defined as the minimum number of primitive operations converting G1 to G2. Early studies focus on search-based inexact algorithms such as A*-beam search, and greedy algorithms using bipartite matching due to its NP-hardness. They can obtain a sub-optimal solution by constructing an edit path (the sequence of operations that converts G1 to G2). Recent studies convert the GED between a given graph pair (G1, G2) into a similarity score in the range (0, 1) by a well designed function. Then machine learning models (mostly based on graph neural networks) are applied to predict the similarity score. They achieve a much higher numerical precision than the sub-optimal solutions found by classical algorithms. However, a major limitation is that these machine learning models cannot generate an edit path. They treat the GED computation as a pure regression task to bypass its intrinsic complexity, but ignore the essential task of converting G1 to G2. This severely limits the interpretability and usability of the solution.In this paper, we propose a novel deep learning framework that solves the GED problem in a two-step manner: 1) The proposed graph neural network GEDGNN is in charge of predicting the GED value and a matching matrix; and 2) A post-processing algorithm based on k-best matching is used to derive k possible node matchings from the matching matrix generated by GEDGNN. The best matching will finally lead to a high-quality edit path. Extensive experiments are conducted on three real graph data sets and synthetic power-law graphs to demonstrate the effectiveness of our framework. Compared to the best result of existing GNN-based models, the mean absolute error (MAE) on GED value prediction decreases by 4.9% 74.3%. Compared to the state-of-the-art searching algorithm Noah, the MAE on GED value based on edit path reduces by 53.6% 88.1%.

- ICDEDecision Support System for Chronic Diseases Based on Drug-Drug InteractionsT. Bian, Y. Jiang, J. Li, T. Xu, Y. Rong, Y. Su, and 3 more authors
*In 2023 IEEE 39th International Conference on Data Engineering (ICDE)*, Apr 2023Many patients with chronic diseases resort to multiple medications to relieve various symptoms, which raises concerns about the safety of multiple medication use, as severe drug-drug antagonism can lead to serious adverse effects or even death. This paper presents a Decision Support System, called DSSDDI, based on drug-drug interactions to support doctors prescribing decisions. DSSDDI contains three modules, Drug-Drug Interaction (DDI) module, Medical Decision (MD) module and Medical Support (MS) module. The DDI module learns safer and more effective drug representations from the drug-drug interactions. To capture the potential causal relationship between DDI and medication use, the MD module considers the representations of patients and drugs as context, DDI and patients’ similarity as treatment, and medication use as outcome to construct counterfactual links for the representation learning. Furthermore, the MS module provides drug candidates to doctors with explanations. Experiments on the chronic data collected from the Hong Kong Chronic Disease Study Project and a public diagnostic data MIMIC-III demonstrate that DSSDDI can be a reliable reference for doctors in terms of safety and efficiency of clinical diagnosis, with significant improvements compared to baseline methods. Source code of the proposed DSSDDI is publicly available at https://github.com/TianBian95/DSSDDI.

- Nature Comm.Collaborative and privacy-preserving retired battery sorting for profitable direct recycling via federated machine learningShengyu Tao, Haizhou Liu, Chongbo Sun, Haocheng Ji, Guanjun Ji, Zhiyuan Han, and 11 more authors
*Nature Communications*, Dec 2023Unsorted retired batteries with varied cathode materials hinder the adoption of direct recycling due to their cathode-specific nature. The surge in retired batteries necessitates precise sorting for effective direct recycling, but challenges arise from varying operational histories, diverse manufacturers, and data privacy concerns of recycling collaborators (data owners). Here we show, from a unique dataset of 130 lithium-ion batteries spanning 5 cathode materials and 7 manufacturers, a federated machine learning approach can classify these retired batteries without relying on past operational data, safeguarding the data privacy of recycling collaborators. By utilizing the features extracted from the end-of-life charge-discharge cycle, our model exhibits 1% and 3% cathode sorting errors under homogeneous and heterogeneous battery recycling settings respectively, attributed to our innovative Wasserstein-distance voting strategy. Economically, the proposed method underscores the value of precise battery sorting for a prosperous and sustainable recycling industry. This study heralds a new paradigm of using privacy-sensitive data from diverse sources, facilitating collaborative and privacy-respecting decision-making for distributed systems.

- BRIEF BIOINFORMscMHNN: a novel hypergraph neural network for integrative analysis of single-cell epigenomic, transcriptomic and proteomic dataWei Li, Bin Xiang, Fan Yang,
*Yu Rong*, Yanbin Yin, Jianhua Yao, and 1 more author*Briefings in Bioinformatics*, Nov 2023Technological advances have now made it possible to simultaneously profile the changes of epigenomic, transcriptomic and proteomic at the single cell level, allowing a more unified view of cellular phenotypes and heterogeneities. However, current computational tools for single-cell multi-omics data integration are mainly tailored for bi-modality data, so new tools are urgently needed to integrate tri-modality data with complex associations. To this end, we develop scMHNN to integrate single-cell multi-omics data based on hypergraph neural network. After modeling the complex data associations among various modalities, scMHNN performs message passing process on the multi-omics hypergraph, which can capture the high-order data relationships and integrate the multiple heterogeneous features. Followingly, scMHNN learns discriminative cell representation via a dual-contrastive loss in self-supervised manner. Based on the pretrained hypergraph encoder, we further introduce the pre-training and fine-tuning paradigm, which allows more accurate cell-type annotation with only a small number of labeled cells as reference. Benchmarking results on real and simulated single-cell tri-modality datasets indicate that scMHNN outperforms other competing methods on both cell clustering and cell-type annotation tasks. In addition, we also demonstrate scMHNN facilitates various downstream tasks, such as cell marker detection and enrichment analysis.

- KDDPrivacy Matters: Vertical Federated Linear Contextual Bandits for Privacy Protected RecommendationZeyu Cao, Zhipeng Liang, Bingzhe Wu, Shu Zhang, Hangyu Li, Ouyang Wen, and 2 more authors
*In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, 2023Recent awareness of privacy protection and compliance requirement resulted in a controversial view of recommendation system due to personal data usage. Therefore, privacy-protected recommendation emerges as a novel research direction. In this paper, we first formulate this problem as a vertical federated learning problem, i.e., features are vertically distributed over different departments. We study a contextual bandit learning problem for recommendation in the vertical federated setting. To this end, we carefully design a customized encryption scheme named orthogonal matrix-based mask mechanism (O3M). O3M mechanism, a tailored component for contextual bandits by carefully exploiting their shared structure, can ensure privacy protection while avoiding expensive conventional cryptographic techniques. We further apply the mechanism to two commonly-used bandit algorithms, LinUCB and LinTS, and instantiate two practical protocols for online recommendation. The proposed protocols can perfectly recover the service quality of centralized bandit algorithms while achieving a satisfactory runtime efficiency, which is theoretically proved and analysed in this paper. By conducting extensive experiments on both synthetic and real-world datasets, we show the superiority of the proposed method in terms of privacy protection and recommendation performance.

- AAAIEnergy-Motivated Equivariant Pretraining for 3D Molecular GraphsRui Jiao, Jiaqi Han, Wenbing Huang,
*Yu Rong*, and Yang Liu*Proceedings of the AAAI Conference on Artificial Intelligence*, Jun 2023Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models incapable of characterizing 3D geometry and thus defective for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose to adopt an equivariant energy-based model as the backbone for pretraining, which enjoys the merits of fulfilling the symmetry of 3D space. Then we develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure the loss to be E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model pretrained from a large-scale 3D dataset GEOM-QM9 on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of our design for each proposed component. Code is available at https://github.com/jiaor17/3D-EMGP.

- CIKMGeometric Graph Learning for Protein Mutation Effect PredictionKangfei Zhao,
*Yu Rong*, Biaobin Jiang, Jianheng Tang, Hengtong Zhang, Jeffrey Xu Yu, and 1 more author*In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management*, 2023Proteins govern a wide range of biological systems. Evaluating the changes in protein properties upon protein mutation is a fundamental application of protein design, where modeling the 3D protein structure is a principal task for AI-driven computational approaches. Existing deep learning (DL) approaches represent the protein structure as a 3D geometric graph and simplify the graph modeling to different degrees, thereby failing to capture the low-level atom patterns and high-level amino acid patterns simultaneously. In addition, limited training samples with ground truth labels and protein structures further restrict the effectiveness of DL approaches. In this paper, we propose a new graph learning framework, Hierarchical Graph Invariant Network (HGIN), a fine-grained and data-efficient graph neural encoder for encoding protein structures and predicting the mutation effect on protein properties. For fine-grained modeling, HGIN hierarchically models the low-level interactions of atoms and the high-level interactions of amino acid residues by Graph Neural Networks. For data efficiency, HGIN preserves the invariant encoding for atom permutation and coordinate transformation, which is an intrinsic inductive bias of property prediction that bypasses data augmentations. We integrate HGIN into a Siamese network to predict the quantitative effect on protein properties upon mutations. Our approach outperforms 9 state-of-the-art approaches on 3 protein datasets. More inspiringly, when predicting the neutralizing ability of human antibodies against COVID-19 mutant viruses, HGIN achieves an absolute improvement of 0.23 regarding the Spearman coefficient.

- TKDEFinding Critical Users in Social Communities via Graph ConvolutionsKangfei Zhao, Zhiwei Zhang,
*Yu Rong*, Jeffrey Xu Yu, and Junzhou Huang*IEEE Transactions on Knowledge and Data Engineering*, 2023Finding critical users whose existence keeps a social community cohesive and large is an important issue in social networks. In the literature, such criticalness of a user is measured by the number of followers who will leave the community together when the user leaves. By taking a social community as a k -core, which can be computed in linear time, the problem of finding critical users is to find a set of nodes, U , with a user-given size b in a k -core community that maximizes the number of nodes (followers) to be deleted from the k -core when all nodes in U are deleted. This problem is known to be NP-hard. In the literature, the state-of-the-art approach, a greedy algorithm is proposed with no guarantee on the set of nodes U found, since there does not exist a submodular function the greedy algorithm can use to get a better answer iteratively. Furthermore, the greedy algorithm designed is to handle k -core in any social networks such that it does not consider the structural complexity of a given single graph and cannot get the global optimal by the local optimal found in iterations. In this paper, we propose a novel learning-based approach. Distinguished from traditional experience-based heuristics, we propose a neural network model, called Self-attentive Core Graph Convolution Network ( SCGCN ), to capture the hidden structure of the criticalness among node combinations that break the engagement of a specific social community. Supervised by sampling node combinations, SCGCN has the ability to inference the criticalness of unseen combinations of nodes. To further reduce the sampling and inference space, we propose a deterministic strategy to prune unpromising nodes on the graph. Our experiments conducted on many real-world graphs show that SCGCN significantly improves the quality of the solution compared with the state-of-the-art greedy algorithm.

## 2022

- BioinformaticsCross-dependent graph neural networks for molecular property predictionHehuan Ma, Yatao Bian,
*Yu Rong*, Wenbing Huang, Tingyang Xu, Weiyang Xie, and 2 more authors*Bioinformatics*, Jan 2022The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through graph neural networks (GNNs). Both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model ought to exploit both node (atom) and edge (bond) information simultaneously. Inspired by this observation, we explore the multi-view modeling with GNN (MVGNN) to form a novel paralleled framework, which considers both atoms and bonds equally important when learning molecular representations. In specific, one view is atom-central and the other view is bond-central, then the two views are circulated via specifically designed components to enable more accurate predictions. To further enhance the expressive power of MVGNN, we propose a cross-dependent message-passing scheme to enhance information communication of different views. The overall framework is termed as CD-MVGNN.We theoretically justify the expressiveness of the proposed model in terms of distinguishing non-isomorphism graphs. Extensive experiments demonstrate that CD-MVGNN achieves remarkably superior performance over the state-of-the-art models on various challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability power of CD-MVGNN.The code and data underlying this work are available in GitHub at https://github.com/uta-smile/CD-MVGNN.Supplementary data are available at Bioinformatics online.

- ICMLLocal Augmentation for Graph Neural NetworksSongtao Liu, Rex Ying, Hanze Dong, Lanqing Li, Tingyang Xu,
*Yu Rong*, and 3 more authors*In Proceedings of the 39th International Conference on Machine Learning*, 17–23 jul 2022Graph Neural Networks (GNNs) have achieved remarkable performance on graph-based tasks. The key idea for GNNs is to obtain informative representation through aggregating information from local neighborhoods. However, it remains an open question whether the neighborhood information is adequately aggregated for learning representations of nodes with few neighbors. To address this, we propose a simple and efficient data augmentation strategy, local augmentation, to learn the distribution of the node representations of the neighbors conditioned on the central node’s representation and enhance GNN’s expressive power with generated features. Local augmentation is a general framework that can be applied to any GNN model in a plug-and-play manner. It samples feature vectors associated with each node from the learned conditional distribution as additional input for the backbone model at each training iteration. Extensive experiments and analyses show that local augmentation consistently yields performance improvement when applied to various GNN architectures across a diverse set of benchmarks. For example, experiments show that plugging in local augmentation to GCN and GAT improves by an average of 3.4% and 1.6% in terms of test accuracy on Cora, Citeseer, and Pubmed. Besides, our experimental results on large graphs (OGB) show that our model consistently improves performance over backbones. Code is available at https://github.com/SongtaoLiu0823/LAGNN.

- arXivTransformer for Graphs: An Overview from Architecture PerspectiveErxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, and 4 more authors2022
Recently, Transformer model, which has achieved great success in many artificial intelligence fields, has demonstrated its great potential in modeling graph-structured data. Till now, a great variety of Transformers has been proposed to adapt to the graph-structured data. However, a comprehensive literature review and systematical evaluation of these Transformer variants for graphs are still unavailable. It’s imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks. In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective. We first disassemble the existing models and conclude three typical ways to incorporate the graph information into the vanilla Transformer: 1) GNNs as Auxiliary Modules, 2) Improved Positional Embedding from Graphs, and 3) Improved Attention Matrix from Graphs. Furthermore, we implement the representative components in three groups and conduct a comprehensive comparison on various kinds of famous graph data benchmarks to investigate the real performance gain of each component. Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.

- TPAMIGraph Convolutional Module for Temporal Action Localization in VideosRunhao Zeng, Wenbing Huang, Mingkui Tan,
*Yu Rong*, Peilin Zhao, Junzhou Huang, and 1 more author*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 2022Temporal action localization, which requires a machine to recognize the location as well as the category of action instances in videos, has long been researched in computer vision. The main challenge of temporal action localization lies in that videos are usually long and untrimmed with diverse action contents involved. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each of them individually, without explicitly exploiting their relations during learning. In this paper, we claim that the relations between action units play an important role in action localization, and a more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms. To be specific, we first construct a graph, where each action unit is represented as a node and their relations between two action units as an edge. Here, we use two types of relations, one for capturing the temporal connections between different action units, and the other one for characterizing their semantic relationship. Particularly for the temporal connections in two-stage methods, we further explore two different kinds of edges, one connecting the overlapping action units and the other one connecting surrounding but disjointed units. Upon the graph we built, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which is able to learn more informative representations to enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR [15] and R-C3D [47]) and one-stage methods (e.g., D-SSAD [22]), verifying the generality and effectiveness of our GCM. Moreover, with the aid of GCM, our approach significantly outperforms the state-of-the-art on THUMOS14 (50.9 percent versus 42.8 percent). Augmentation experiments on ActivityNet also verify the efficacy of modeling the relationships between action units. The source code and the pre-trained models are available at https://github.com/Alvin-Zeng/GCM .

- arXivGeometrically Equivariant Graph Neural Networks: A SurveyJiaqi Han,
*Yu Rong*, Tingyang Xu, and Wenbing Huang2022Many scientific problems require to process data in the form of geometric graphs. Unlike generic graph data, geometric graphs exhibit symmetries of translations, rotations, and/or reflections. Researchers have leveraged such inductive bias and developed geometrically equivariant Graph Neural Networks (GNNs) to better characterize the geometry and topology of geometric graphs. Despite fruitful achievements, it still lacks a survey to depict how equivariant GNNs are progressed, which in turn hinders the further development of equivariant GNNs. To this end, based on the necessary but concise mathematical preliminaries, we analyze and classify existing methods into three groups regarding how the message passing and aggregation in GNNs are represented. We also summarize the benchmarks as well as the related datasets to facilitate later researches for methodology development and experimental evaluation. The prospect for future potential directions is also provided.

- ICLREquivariant Graph Mechanics Networks with ConstraintsWenbing Huang, Jiaqi Han,
*Yu Rong*, Tingyang Xu, Fuchun Sun, and Junzhou Huang*In The Tenth International Conference on Learning Representations, ICLR 2022*, 2022Learning to reason about relations and dynamics over multiple interacting objects is a challenging topic in machine learning. The challenges mainly stem from that the interacting systems are exponentially-compositional, symmetrical, and commonly geometrically-constrained. Current methods, particularly the ones based on equivariant Graph Neural Networks (GNNs), have targeted on the first two challenges but remain immature for constrained systems. In this paper, we propose Graph Mechanics Network (GMN) which is combinatorially efficient, equivariant and constraint-aware. The core of GMN is that it represents, by generalized coordinates, the forward kinematics information (positions and velocities) of a structural object. In this manner, the geometrical constraints are implicitly and naturally encoded in the forward kinematics. Moreover, to allow equivariant message passing in GMN, we have developed a general form of orthogonality-equivariant functions, given that the dynamics of constrained systems are more complicated than the unconstrained counterparts. Theoretically, the proposed equivariant formulation is proved to be universally expressive under certain conditions. Extensive experiments support the advantages of GMN compared to the state-of-the-art GNNs in terms of prediction accuracy, constraint satisfaction and data efficiency on the simulated systems consisting of particles, sticks and hinges, as well as two real-world datasets for molecular dynamics prediction and human motion capture.

- TheWebConfDivide-and-Conquer: Post-User Interaction Network for Fake News Detection on Social MediaErxue Min,
*Yu Rong*, Yatao Bian, Tingyang Xu, Peilin Zhao, Junzhou Huang, and 1 more author*In Proceedings of the ACM Web Conference 2022*, 2022Fake News detection has attracted much attention in recent years. Social context based detection methods attempt to model the spreading patterns of fake news by utilizing the collective wisdom from users on social media. This task is challenging for three reasons: (1) There are multiple types of entities and relations in social context, requiring methods to effectively model the heterogeneity. (2) The emergence of news in novel topics in social media causes distribution shifts, which can significantly degrade the performance of fake news detectors. (3) Existing fake news datasets usually lack of great scale, topic diversity and user social relations, impeding the development of this field. To solve these problems, we formulate social context based fake news detection as a heterogeneous graph classification problem, and propose a fake news detection model named Post-User Interaction Network (PSIN), which adopts a divide-and-conquer strategy to model the post-post, user-user and post-user interactions in social context effectively while maintaining their intrinsic characteristics. Moreover,we adopt an adversarial topic discriminator for topic-agnostic feature learning, in order to improve the generalizability of our method for new-emerging topics. Furthermore, we curate a new dataset for fake news detection, which contains over 27,155 news from 5 topics, 5 million posts, 2 million users and their induced social graph with 0.2 billion edges. It has been published on https://github.com/qwerfdsaplking/MC-Fake. Extensive experiments illustrate that our method outperforms SOTA baselines in both in-topic and out-of-topic settings.

- ICMLFrustratingly Easy Transferability EstimationLong-Kai Huang, Junzhou Huang,
*Yu Rong*, Qiang Yang, and Ying Wei*In Proceedings of the 39th International Conference on Machine Learning*, Jul 2022Transferability estimation has been an essential tool in selecting a pre-trained model and the layers in it for transfer learning, to transfer, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. Through a single pass over examples of a target task, TransRate measures the transferability as the mutual information between features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to coding rate that serves as an effective alternative to entropy. From the perspective of feature representation, the resulting TransRate evaluates both completeness (whether features contain sufficient information of a target task) and compactness (whether features of each class are compact enough for good generalization) of pre-trained features. Theoretically, we have analyzed the close connection of TransRate to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of codes, TransRate performs remarkably well in extensive evaluations on 35 pre-trained models and 16 downstream tasks.

- VLDBQuery Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive AttributedYuli Jiang,
*Yu Rong*, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang*Proc. VLDB Endow.*, Feb 2022Given one or more query vertices, Community Search (CS) aims to find densely intra-connected and loosely inter-connected structures containing query vertices. Attributed Community Search (ACS), a related problem, is more challenging since it finds communities with both cohesive structures and homogeneous vertex attributes. However, most methods for the CS task rely on inflexible pre-defined structures and studies for ACS treat each attribute independently. Moreover, the most popular ACS strategies decompose ACS into two separate sub-problems, i.e., the CS task and subsequent attribute filtering task. However, in real-world graphs, the community structure and the vertex attributes are closely correlated to each other. This correlation is vital for the ACS problem. In this vein, we argue that the separation strategy cannot fully capture the correlation between structure and attributes simultaneously and it would compromise the final performance.In this paper, we propose Graph Neural Network (GNN) models for both CS and ACS problems, i.e., Query Driven-GNN (QD-GNN) and Attributed Query Driven-GNN (AQD-GNN). In QD-GNN, we combine the local query-dependent structure and global graph embedding. In order to extend QD-GNN to handle attributes, we model vertex attributes as a bipartite graph and capture the relation between attributes by constructing GNNs on this bipartite graph. With a Feature Fusion operator, AQD-GNN processes the structure and attribute simultaneously and predicts communities according to each attributed query. Experiments on real-world graphs with ground-truth communities demonstrate that the proposed models outperform existing CS and ACS algorithms in terms of both efficiency and effectiveness. More recently, an interactive setting for CS is proposed that allows users to adjust the predicted communities. We further verify our approaches under the interactive setting and extend to the attributed context. Our method achieves 2.37% and 6.29% improvements in F1-score than the state-of-the-art model without attributes and with attributes respectively.

- IJCAIFine-Tuning Graph Neural Networks via Graph Topology Induced Optimal TransportJiying Zhang, Xi Xiao, Long-Kai Huang,
*Yu Rong*, and Yatao Bian*In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence*, Jul 2022Recently, the pretrain-finetuning paradigm has attracted tons of attention in graph learning community due to its power of alleviating the lack of labels problem in many real-world applications. Current studies use existing techniques, such as weight constraint, representation constraint, which are derived from images or text data, to transfer the invariant knowledge from the pre-train stage to fine-tuning stage. However, these methods failed to preserve invariances from graph structure and Graph Neural Network (GNN) style models. In this paper, we present a novel optimal transport-based fine-tuning framework called GTOT-Tuning, namely, Graph Topology induced Optimal Transport fine-Tuning, for GNN style backbones. GTOT-Tuning is required to utilize the property of graph data to enhance the preservation of representation produced by fine-tuned networks. Toward this goal, we formulate graph local knowledge transfer as an Optimal Transport (OT) problem with a structural prior and construct the GTOT regularizer to constrain the fine-tuned model behaviors. By using the adjacency relationship amongst nodes, the GTOT regularizer achieves node-level optimal transport procedures and reduces redundant transport procedures, resulting in efficient knowledge transfer from the pre-trained models. We evaluate GTOT-Tuning on eight downstream tasks with various GNN backbones and demonstrate that it achieves state-of-the-art fine-tuning performance for GNNs.

- SIGIRNeighbour Interaction Based Click-Through Rate Prediction via Graph-Masked TransformerErxue Min,
*Yu Rong*, Tingyang Xu, Yatao Bian, Da Luo, Kangyi Lin, and 3 more authors*In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval*, 2022Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click an item, is an essential component of online advertising. Existing methods mainly attempt to mine user interests from users’ historical behaviours, which contain users’ directly interacted items. Although these methods have made great progress, they are often limited by the recommender system’s direct exposure and inactive interactions, and thus fail to mine all potential user interests. To tackle these problems, we propose Neighbor-Interaction based CTR prediction (NI-CTR), which considers this task under a Heterogeneous Information Network (HIN) setting. In short, Neighbor-Interaction based CTR prediction involves the local neighborhood of the target user-item pair in the HIN to predict their linkage. In order to guide the representation learning of the local neighbourhood, we further consider different kinds of interactions among the local neighborhood nodes from both explicit and implicit perspective, and propose a novel Graph-Masked Transformer (GMT) to effectively incorporates these kinds of interactions to produce highly representative embeddings for the target user-item pair. Moreover, in order to improve model robustness against neighbour sampling, we enforce a consistency regularization loss over the neighbourhood embedding. We conduct extensive experiments on two real-world datasets with millions of instances and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly. Meanwhile, the comprehensive ablation studies verify the effectiveness of every component of our model. Furthermore, we have deployed this framework on the WeChat Official Account Platform with billions of users. The online A/B tests demonstrate an average CTR improvement of 21.9% against all online baselines.

- Pattern RecognitionStructure-aware conditional variational auto-encoder for constrained molecule optimizationJunchi Yu, Tingyang Xu,
*Yu Rong*, Junzhou Huang, and Ran He*Pattern Recognition*, 2022The goal of molecule optimization is to optimize molecular properties by modifying molecule structures. Conditional generative models provide a promising way to transfer the input molecules to the ones with better property. However, molecular properties are highly sensitive to small changes in molecular structures. This leads to an interesting thought that we can improve the property of molecules with limited modification in structure. In this paper, we propose a structure-aware conditional Variational Auto-Encoder, namely SCVAE, which exploits the topology of molecules as structure condition and optimizes the molecular properties with constrained structural modification. SCVAE leverages graph alignment of two-level molecule structures in an unsupervised manner to bind the structure conditions between two molecules. Then, this structure condition facilitates the molecule optimization with limited structural modification, namely, constrained molecule optimization, under a novel variational auto-encoder framework. Extensive experimental evaluations demonstrate that structure-aware CVAE generates new molecules with high similarity to the original ones and better molecular properties.

- Diversified Multiscale Graph Learning with Graph Self-CorrectionYuzhao Chen, Yatao Bian, Jiying Zhang, Xi Xiao, Tingyang Xv, and
*Yu Rong**In Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022*, 2022Though the multiscale graph learning techniques have enabled advanced feature extraction frameworks, we find that the classic ensemble strategy shows inferior performance while encountering the high homogeneity of the learnt representation, which is caused by the nature of existing graph pooling methods. To cope with this issue, we propose a diversified multiscale graph learning model equipped with two core ingredients: a graph self-correction mechanism to generate informative embedded graphs, and a diversity boosting regularizer to achieve a comprehensive characterization of the input graph. The proposed mechanism compensates the pooled graph with the lost information during the graph pooling process by feeding back the estimated residual graph, which serves as a plug-in component for popular graph pooling methods. Meanwhile, pooling methods enhanced with the self-correcting procedure encourage the discrepancy of node embeddings, and thus it contributes to the success of ensemble learning strategy. The proposed regularizer instead enhances the ensemble diversity at the graph-level embeddings by leveraging the interaction among individual classifiers. Extensive experiments on popular graph classification benchmarks show that the approaches lead to significant improvements over state-of-the-art graph pooling methods, and the ensemble multiscale graph learning models achieve superior enhancement.

- ICLREnergy-Based Learning for Cooperative Games, with Applications to Valuation Problems in Machine LearningYatao Bian,
*Yu Rong*, Tingyang Xu, Jiaxiang Wu, Andreas Krause, and Junzhou Huang*In the Tenth International Conference on Learning Representations, ICLR*, 2022Valuation problems, such as feature interpretation, data valuation and model valuation for ensembles, become increasingly more important in many machine learning applications. Such problems are commonly solved by well-known game-theoretic criteria, such as Shapley value or Banzhaf value. In this work, we present a novel energy-based treatment for cooperative games, with a theoretical justification by the maximum entropy framework. Surprisingly, by conducting variational inference of the energy-based model, we recover various game-theoretic valuation criteria through conducting one-step fixed point iteration for maximizing the mean-field ELBO objective. This observation also verifies the rationality of existing criteria, as they are all attempting to decouple the correlations among the players through the mean-field approach. By running fixed point iteration for multiple steps, we achieve a trajectory of the valuations, among which we define the valuation with the best conceivable decoupling error as the Variational Index. We prove that under uniform initializations, these variational valuations all satisfy a set of game-theoretic axioms. We experimentally demonstrate that the proposed Variational Index enjoys lower decoupling error and better valuation performance on certain synthetic and real-world valuation problems.

- BCERobust Self-Training Strategy for Various Molecular Biology Prediction TasksHehuan Ma, Feng Jiang,
*Yu Rong*, Yuzhi Guo, and Junzhou Huang*In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics*, 2022Molecular biology prediction tasks suffer the limited labeled data problem since it normally demands a series of professional experiments to label the target molecule. Self-training is one of the semi-supervised learning paradigms that utilizes both labeled and unlabeled data. It trains a teacher model on labeled data, and uses it to generate pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then combined to train a student model. However, the pseudo labels generated from the teacher model are not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss function to handle such noisy labels, which is model and task agnostic, and can be easily embedded with any prediction tasks. We have conducted molecular biology prediction tasks to gradually evaluate the performance of proposed robust self-training strategy. The results demonstrate that the proposed method consistently boosts the prediction performance, especially for molecular regression tasks, which have gained a 41.5% average improvement.

- GNN BookGraph Neural Networks: ScalabilityHehuan Ma,
*Yu Rong*, and Junzhou Huang*In Graph Neural Networks: Foundations, Frontiers, and Applications*, 2022Over the past decade, Graph Neural Networks have achieved remarkable success in modeling complex graph data. Nowadays, graph data is increasing exponentially in both magnitude and volume, e.g., a social network can be constituted by billions of users and relationships. Such circumstance leads to a crucial question, how to properly extend the scalability of Graph Neural Networks? There remain two major challenges while scaling the original implementation of GNN to large graphs. First, most of the GNN models usually compute the entire adjacency matrix and node embeddings of the graph, which demands a huge memory space. Second, training GNN requires recursively updating each node in the graph, which becomes infeasible and ineffective for large graphs. Current studies propose to tackle these obstacles mainly from three sampling paradigms: node-wise sampling, which is executed based on the target nodes in the graph; layer-wise sampling, which is implemented on the convolutional layers; and graph-wise sampling, which constructs sub-graphs for the model inference. In this chapter, we will introduce several representative research accordingly.

- BIBMIntegrating Prior Knowledge with Graph Encoder for Gene Regulatory Inference from Single-cell RNA-Seq DataJiawei Li, Fan Yang, Fang Wang,
*Yu Rong*, Peilin Zhao, Shizhan Chen, and 3 more authors*In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)*, 2022Inferring gene regulatory networks based on single-cell transcriptomes is critical for systematically understanding cell-specific regulatory networks and discovering drug targets in tumor cells. Here we show that existing methods mainly perform co-expression analysis and apply the image-based model to deal with the non-euclidean scRNA-seq data, which may not reasonably handle the dropout problem and not fully take advantage of the validated gene regulatory topology. We propose a graph-based end-to-end deep learning model for GRN inference (GRNInfer) with the help of known regulatory relations through transductive learning. The robustness and superiority of the model are demonstrated by comparative experiments.

- NeurIPSEquivariant Graph Hierarchy-Based Neural NetworksJiaqi Han, Wenbing Huang, Tingyang Xu, and
*Yu Rong**In Advances in Neural Information Processing Systems*, 2022Equivariant Graph neural Networks (EGNs) are powerful in characterizing the dynamics of multi-body physical systems. Existing EGNs conduct flat message passing, which, yet, is unable to capture the spatial/dynamical hierarchy for complex systems particularly, limiting substructure discovery and global information fusion. In this paper, we propose Equivariant Hierarchy-based Graph Networks (EGHNs) which consist of the three key components: generalized Equivariant Matrix Message Passing (EMMP) , E-Pool and E-UnPool. In particular, EMMP is able to improve the expressivity of conventional equivariant message passing, E-Pool assigns the quantities of the low-level nodes into high-level clusters, while E-UnPool leverages the high-level information to update the dynamics of the low-level nodes. As their names imply, both E-Pool and E-UnPool are guaranteed to be equivariant to meet physic symmetry. Considerable experimental evaluations verify the effectiveness of our EGHN on several applications including multi-object dynamics simulation, motion capture, and protein dynamics modeling.

## 2021

- DPMLFedGraphNN: A Federated Learning System and Benchmark for Graph Neural NetworksChaoyang He, Keshav Balasubramanian, Emir Ceyani,
*Yu Rong*, Peilin Zhao, Junzhou Huang, and 2 more authors*ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML)*, 2021Graph Neural Network (GNN) research is rapidly growing thanks to the capacity of GNNs in learning distributed representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to privacy concerns, regulation restrictions, and commercial competitions. Federated learning (FL), a trending distributed learning paradigm, provides possibilities to solve this challenge while preserving data privacy. Despite recent advances in vision and language domains, there is no suitable platform for the FL of GNNs. To this end, we introduce FedGraphNN, an open FL benchmark system that can facilitate research on federated GNNs. FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support. Particularly for the datasets, we collect, preprocess, and partition 36 datasets from 7 domains, including both publicly available ones and specifically obtained ones such as hERG and Tencent. Our empirical analysis showcases the utility of our benchmark system, while exposing significant challenges in graph FL: federated GNNs perform worse in most datasets with a non-IID split than centralized GNNs; the GNN model that attains the best result in the centralized setting may not maintain its advantage in the FL setting. These results imply that more research efforts are needed to unravel the mystery behind federated GNNs. Moreover, our system performance analysis demonstrates that the FedGraphNN system is computationally efficient and secure to large-scale graphs datasets. We maintain the source code at this https URL.

- ICLRGraph Information Bottleneck for Subgraph RecognitionJunchi Yu, Tingyang Xu,
*Yu Rong*, Yatao Bian, Junzhou Huang, and Ran He*In 9th International Conference on Learning Representations, ICLR 2021*, 2021Given the input graph and its label/property, several key problems of graph learning, such as finding interpretable subgraphs, graph denoising and graph compression, can be attributed to the fundamental problem of recognizing a subgraph of the original one. This subgraph shall be as informative as possible, yet contains less redundant and noisy structure. This problem setting is closely related to the well-known information bottleneck (IB) principle, which, however, has less been studied for the irregular graph data and graph neural networks (GNNs). In this paper, we propose a framework of Graph Information Bottleneck (GIB) for the subgraph recognition problem in deep graph learning. Under this framework, one can recognize the maximally informative yet compressive subgraph, named IB-subgraph. However, the GIB objective is notoriously hard to optimize, mostly due to the intractability of the mutual information of irregular graph data and the unstable optimization process. In order to tackle these challenges, we propose: i) a GIB objective based-on a mutual information estimator for the irregular graph data; ii) a bi-level optimization scheme to maximize the GIB objective; iii) a connectivity loss to stabilize the optimization process. We evaluate the properties of the IB-subgraph in three application scenarios: improvement of graph classification, graph interpretation and graph denoising. Extensive experiments demonstrate that the information-theoretic IB-subgraph enjoys superior graph properties.

- CIKMSpectral Graph Attention Network with Fast Eigen-ApproximationHeng Chang,
*Yu Rong*, Tingyang Xu, Wenbing Huang, Somayeh Sojoudi, Junzhou Huang, and 1 more author*In Proceedings of the 30th ACM International Conference on Information & Knowledge Management*, 2021Variants of Graph Neural Networks (GNNs) for representation learning have been proposed recently and achieved fruitful results in various fields. Among them, Graph Attention Network (GAT) first employs a self-attention strategy to learn attention weights for each edge in the spatial domain. However, learning the attentions over edges can only focus on the local information of graphs and greatly increases the computational costs. In this paper, we first introduce the attention mechanism in the spectral domain of graphs and present Spectral Graph Attention Network (SpGAT) that learns representations for different frequency components regarding weighted filters and graph wavelets bases. In this way, SpGAT can better capture global patterns of graphs in an efficient manner with much fewer learned parameters than that of GAT. Further, to reduce the computational cost of SpGAT brought by the eigen-decomposition, we propose a fast approximation variant SpGAT-Cheby. We thoroughly evaluate the performance of SpGAT and SpGAT-Cheby in semi-supervised node classification tasks and verify the effectiveness of the learned attentions in the spectral domain.

- NeurocomputingMolecular graph enhanced transformer for retrosynthesis predictionKelong Mao, Xi Xiao, Tingyang Xu,
*Yu Rong*, Junzhou Huang, and Peilin Zhao*Neurocomputing*, 2021With massive possible synthetic routes in chemistry, retrosynthesis prediction is still a challenge for researchers. Recently, retrosynthesis prediction is formulated as a Machine Translation (MT) task. Namely, since each molecule can be represented as a Simplified Molecular-Input Line-Entry System (SMILES) string, the process of retrosynthesis is analogized to a process of language translation from the product to reactants. However, the MT models that applied on SMILES data usually ignore the information of natural atomic connections and the topology of molecules. To make more chemically plausible constrains on the atom representation learning for better performance, in this paper, we propose a Graph Enhanced Transformer (GET) framework, which adopts both the sequential and graphical information of molecules. Four different GET designs are proposed, which fuse the SMILES representations with atom embeddings learned from our improved Graph Neural Network (GNN). Empirical results show that our model significantly outperforms the vanilla Transformer model in test accuracy.

- IJCAIOn Self-Distilling Graph Neural NetworkYuzhao Chen, Yatao Bian, Xi Xiao,
*Yu Rong*, Tingyang Xu, and Junzhou Huang*In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence*, Aug 2021Recently, the teacher-student knowledge distillation framework has demonstrated its potential in training Graph Neural Networks (GNNs). However, due to the difficulty of training over-parameterized GNN models, one may not easily obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its applications in GNN models. In this paper, we propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD), that serves as a drop-in replacement of the standard training process. The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way. Based on this metric, we propose the adaptive discrepancy retaining (ADR) regularizer to empower the transferability of knowledge that maintains high neighborhood discrepancy across GNN layers. We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies. Experiments further prove the effectiveness and generalization of our approach, as it brings: 1) state-of-the-art GNN distillation performance with less training cost, 2) consistent and considerable performance enhancement for various popular backbones.

- AAAIHierarchical Graph Capsule NetworkJinyu Yang, Peilin Zhao,
*Yu Rong*, Chaochao Yan, Chunyuan Li, Hehuan Ma, and 1 more author*Proceedings of the AAAI Conference on Artificial Intelligence*, May 2021Graph Neural Networks (GNNs) draw their strength from explicitly modeling the topological information of structured data. However, existing GNNs suffer from limited capability in capturing the hierarchical graph representation which plays an important role in graph classification. In this paper, we innovatively propose hierarchical graph capsule network (HGCN) that can jointly learn node embeddings and extract graph hierarchies. Specifically, disentangled graph capsules are established by identifying heterogeneous factors underlying each node, such that their instantiation parameters represent different properties of the same entity. To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (part) and higher-level capsules (whole) by explicitly considering the structure information among the parts. Experimental studies demonstrate the effectiveness of HGCN and the contribution of each component. Code: https://github.com/uta-smile/HGCN

- TPAMIRecognizing Predictive Substructures with Subgraph Information BottleneckJunchi Yu, Tingyang Xu,
*Yu Rong*, Yatao Bian, Junzhou Huang, and Ran He*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 2021The emergence of Graph Convolutional Network (GCN) has greatly boosted the progress of graph learning. However, two disturbing factors, noise and redundancy in graph data, and lack of interpretation for prediction results, impede further development of GCN. One solution is to recognize a predictive yet compressed subgraph to get rid of the noise and redundancy and obtain the interpretable part of the graph. This setting of subgraph is similar to the information bottleneck (IB) principle, which is less studied on graph-structured data and GCN. Inspired by the IB principle, we propose a novel subgraph information bottleneck (SIB) framework to recognize such subgraphs, named IB-subgraph. However, the intractability of mutual information and the discrete nature of graph data makes the objective of SIB notoriously hard to optimize. To this end, we introduce a bilevel optimization scheme coupled with a mutual information estimator for irregular graphs. Moreover, we propose a continuous relaxation for subgraph selection with a connectivity loss for stabilization. We further theoretically prove the error bound of our estimation scheme for mutual information and the noise-invariant nature of IB-subgraph. Extensive experiments on graph learning and large-scale point cloud tasks demonstrate the superior property of IB-subgraph.

- NeurIPSNot All Low-Pass Filters are Robust in Graph Convolutional NetworksHeng Chang,
*Yu Rong*, Tingyang Xu, Yatao Bian, Shiji Zhou, Xin Wang, and 2 more authors*In Advances in Neural Information Processing Systems*, 2021Graph Convolutional Networks (GCNs) are promising deep learning approaches in learning representations for graph-structured data. Despite the proliferation of such methods, it is well known that they are vulnerable to carefully crafted adversarial attacks on the graph structure. In this paper, we first conduct an adversarial vulnerability analysis based on matrix perturbation theory. We prove that the low- frequency components of the symmetric normalized Laplacian, which is usually used as the convolutional filter in GCNs, could be more robust against structural perturbations when their eigenvalues fall into a certain robust interval. Our results indicate that not all low-frequency components are robust to adversarial attacks and provide a deeper understanding of the relationship between graph spectrum and robustness of GCNs. Motivated by the theory, we present GCN-LFR, a general robust co-training paradigm for GCN-based models, that encourages transferring the robustness of low-frequency components with an auxiliary neural network. To this end, GCN-LFR could enhance the robustness of various kinds of GCN-based models against poisoning structural attacks in a plug-and-play manner. Extensive experiments across five benchmark datasets and five GCN-based models also confirm that GCN-LFR is resistant to the adversarial attacks without compromising on performance in the benign situation.

- VLDBExploring Robustness of Unsupervised Domain Adaptation in Semantic SegmentationJinyu Yang, Chunyuan Li, Weizhi An, Hehuan Ma, Yuzhi Guo,
*Yu Rong*, and 2 more authors*In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*, Oct 2021Recent studies imply that deep neural networks are vulnerable to adversarial examples, i.e., inputs with a slight but intentional perturbation are incorrectly classified by the network. Such vulnerability makes it risky for some security-related applications (e.g., semantic segmentation in autonomous cars) and triggers tremendous concerns on the model reliability. For the first time, we comprehensively evaluate the robustness of existing UDA methods and propose a robust UDA approach. It is rooted in two observations: i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits model robustness in classification and recognition tasks, they fail to provide the critical supervision signals that are essential in semantic segmentation. These observations motivate us to propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space. Extensive empirical studies on commonly used benchmarks demonstrate that ASSUDA is resistant to adversarial attacks.

- ICMLLearning Diverse-Structured Networks for Adversarial RobustnessXuefeng Du, Jingfeng Zhang, Bo Han, Tongliang Liu,
*Yu Rong*, Gang Niu, and 2 more authors*In Proceedings of the 38th International Conference on Machine Learning*, 18–24 jul 2021In adversarial training (AT), the main focus has been the objective and optimizer while the model has been less studied, so that the models being used are still those classic ones in standard training (ST). Classic network architectures (NAs) are generally worse than searched NA in ST, which should be the same in AT. In this paper, we argue that NA and AT cannot be handled independently, since given a dataset, the optimal NA in ST would be no longer optimal in AT. That being said, AT is time-consuming itself; if we directly search NAs in AT over large search spaces, the computation will be practically infeasible. Thus, we propose diverse-structured network (DS-Net), to significantly reduce the size of the search space: instead of low-level operations, we only consider predefined atomic blocks, where an atomic block is a time-tested building block like the residual block. There are only a few atomic blocks and thus we can weight all atomic blocks rather than find the best one in a searched block of DS-Net, which is an essential tradeoff between exploring diverse structures and exploiting the best structures. Empirical results demonstrate the advantages of DS-Net, i.e., weighting the atomic blocks.

- CIKMUnsupervised Large-Scale Social Network Alignment via Cross Network EmbeddingZhehan Liang,
*Yu Rong*, Chenxin Li, Yunlong Zhang, Yue Huang, Tingyang Xu, and 2 more authors*In Proceedings of the 30th ACM International Conference on Information & Knowledge Management*, 2021Nowadays, it is common for a person to possess different identities on multiple social platforms. Social network alignment aims to match the identities that from different networks. Recently, unsupervised network alignment methods have received significant attention since no identity anchor is required. However, to capture the relevance between identities, the existing unsupervised methods generally rely heavily on user profiles, which is unobtainable and unreliable in real-world scenarios. In this paper, we propose an unsupervised alignment framework named Large-Scale Network Alignment (LSNA) to integrate the network information and reduce the requirement on user profile. The embedding module of LSNA, named Cross Network Embedding Model (CNEM), aims to integrate the topology information and the network correlation to simultaneously guide the embedding process. Moreover, in order to adapt LSNA to large-scale networks, we propose a network disassembling strategy to divide the costly large-scale network alignment problem into multiple executable sub-problems. The proposed method is evaluated over multiple real-world social network datasets, and the results demonstrate that the proposed method outperforms the state-of-the-art methods.

- ACS OmegaA Novel Scalarized Scaffold Hopping Algorithm with Graph-Based Variational Autoencoder for Discovery of JAK1 InhibitorsYang Yu, Tingyang Xu, Jiawen Li, Yaping Qiu,
*Yu Rong*, Zhen Gong, and 6 more authors*ACS Omega*, 2021We have developed a graph-based Variational Autoencoder with Gaussian Mixture hidden space (GraphGMVAE), a deep learning approach for controllable magnitude of scaffold hopping in generative chemistry. It can effectively and accurately generate molecules from a given reference compound, with excellent scaffold novelty against known molecules in the literature or patents (97.9% are novel scaffolds). Moreover, a pipeline for prioritizing the generated compounds was also proposed to narrow down our validation focus. In this work, GraphGMVAE was validated by rapidly hopping the scaffold from FDA-approved upadacitinib, which is an inhibitor of human Janus kinase 1 (JAK1), to generate more potent molecules with novel chemical scaffolds. Seven compounds were synthesized and tested to be active in biochemical assays. The most potent molecule has 5.0 nM activity against JAK1 kinase, which shows that the GraphGMVAE model can design molecules like how a human expert does but with high efficiency and accuracy.

- WISEGraph Ordering: Towards the Optimal by LearningKangfei Zhao,
*Yu Rong*, Jeffrey Xu Yu, Wenbing Huang, Junzhou Huang, and Hao Zhang*In Web Information Systems Engineering – WISE 2021*, 2021Graph ordering concentrates on optimizing graph layouts, which has a wide range of real applications. As an NP-hard problem, traditional approaches solve it via greedy algorithms. To overcome the shortsightedness and inflexibility of the hand-crafted heuristics, we propose a learning-based framework: Deep Ordering Network with Reinforcement Learning (DON-RL) to capture the hidden structure from partial vertex order sets over a specific large graph. In DON-RL, we propose a permutation invariant neural network DON to encode the information from partial vertex order. Furthermore, to alleviate the combinatorial explosion for partial vertex order sets and make the efficient training data sampling, we propose RL-Sampler, a reinforcement learning-based sampler to tune the vertex sampling probabilities adaptively during the training phase of DON. Comprehensive experiments on both synthetic and real graphs validate that our approach outperforms the state-of-the-art heuristic algorithm consistently. The case study on graph compression demonstrates the potentials of DON-RL in real applications.

- BIBMGradient-Norm Based Attentive Loss for Molecular Property PredictionHehuan Ma,
*Yu Rong*, Boyang Liu, Yuzhi Guo, Chaochao Yan, and Junzhou Huang*In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)*, 2021Molecular property prediction is one fundamental yet challenging task for drug discovery. Many studies have addressed this problem by designing deep learning algorithms, e.g., sequence-based models and graph-based models. However, the underlying data distribution is rarely explored. We discover that there exist easy samples and hard samples in the molecule datasets, and the overall distribution is usually imbalanced. Current research mainly treats them equally during the model training, while we believe that they shall not share the same weights since neural networks training is dominated by the majority class. Therefore, we propose to utilize a self-attention mechanism to generate a learnable weight for each data sample according to the associated gradient norm. The learned attention value is then embedded into the prediction models to construct an attentive loss for the network updating and back-propagation. It is empirically demonstrated that our proposed method can consistently boost the prediction performance for both classification and regression tasks.

- IJCNNTowards Feature-free TSP Solver Selection: A Deep Learning ApproachKangfei Zhao, Shengcai Liu, Jeffrey Xu Yu, and
*Yu Rong**In 2021 International Joint Conference on Neural Networks (IJCNN)*, 2021It is widely recognized that for the traveling salesman problem (TSP), there exists no universal best solver for all problem instances. This observation has greatly facilitated the research on Algorithm Selection (AS), which seeks to identify the solver best suited for each TSP instance. Such segregation usually relies on a prior representation step, in which problem instances are first represented by carefully established problem features. However, the creation of good features is non-trivial, typically requiring considerable domain knowledge and human effort. To alleviate this issue, this paper proposes a deep learning framework, named CTAS, for TSP solver selection. Specifically, CTAS exploits deep convolutional neural networks (CNN) to automatically extract informative features from TSP instances and utilizes data augmentation to handle the scarcity of labeled instances. Extensive experiments are conducted on a challenging TSP benchmark with 6,000 instances, which is the largest benchmark ever considered in this area. CTAS achieves over 2 × speedup of the average running time, compared with the single best solver. More importantly, CTAS is the first feature-free approach that notably outperforms classical AS models, showing huge potential of applying deep learning to AS tasks.

- DASFAATowards Expectation-Maximization by SQL in RDBMSKangfei Zhao, Jeffrey Xu Yu,
*Yu Rong*, Ming Liao, and Junzhou Huang*In Database Systems for Advanced Applications*, 2021Integrating machine learning techniques into RDBMSs is an important task since many real applications require modeling (e.g., business intelligence, strategic analysis) as well as querying data in RDBMSs. Without integration, it needs to export the data from RDBMSs to build a model using specialized ML toolkits and frameworks, and import the model trained back to RDBMSs for further querying. Such a process is not desirable since it is time-consuming and needs to repeat when data is changed. In this paper, we provide an SQL solution that has the potential to support different ML models in RDBMSs. We study how to support unsupervised probabilistic modeling, that has a wide range of applications in clustering, density estimation, and data summarization, and focus on Expectation-Maximization (EM) algorithms, which is a general technique for finding maximum likelihood estimators. To train a model by EM, it needs to update the model parameters by an E-step and an M-step in a while-loop iteratively until it converges to a level controlled by some thresholds or repeats a certain number of iterations. To support EM in RDBMSs, we show our solutions to the matrix/vectors representations in RDBMSs, the relational algebra operations to support the linear algebra operations required by EM, parameters update by relational algebra, and the support of a while-loop by SQL recursion. It is important to note that the SQL ’99 recursion cannot be used to handle such a while-loop since the M-step is non-monotonic. In addition, with a model trained by an EM algorithm, we further design an automatic in-database model maintenance mechanism to maintain the model when the underlying training data changes. We have conducted experimental studies and will report our findings in this paper.

- SIGMODA Learned Sketch for Subgraph CountingKangfei Zhao, Jeffrey Xu Yu, Hao Zhang, Qiyan Li, and
*Yu Rong**In Proceedings of the 2021 International Conference on Management of Data*, 2021Subgraph counting, as a fundamental problem in network analysis, is to count the number of subgraphs in a data graph that match a given query graph by either homomorphism or subgraph isomorphism. The importance of subgraph counting derives from the fact that it provides insights of a large graph, in particular a labeled graph, when a collection of query graphs with different sizes and labels are issued. The problem of counting is challenging. On one hand, exact counting by enumerating subgraphs is NP-hard. % On the other hand, approximate counting by subgraph isomorphism can only support 3/5-node query graphs over unlabeled graphs. % Another way for subgraph counting is to specify it as an SQL query and estimate the cardinality of the query in rdbm. Existing approaches for cardinality estimation can only support subgraph counting by homomorphism up to some extent, as it is difficult to deal with sampling failure when a query graph becomes large. A question that arises is if subgraph counting can be supported by machine learning (ML) and deep learning (DL). The existing DL approach for subgraph isomorphism can only support small data graphs. The ML/DL approaches proposed in rdbm context for approximate query processing and cardinality estimation cannot be used, as subgraph counting is to do complex self-joins over one relation, whereas existing approaches focus on multiple relations. In this paper, we propose an Active Learned Sketch for Subgraph Counting (ALSS) with two main components: a sketch learned (ŁSS) and an active learner (AL). The sketch is learned by a neural network regression model, and the active learner is to perform model updates based on new arrival test query graphs. % We conduct extensive experimental studies to confirm the effectiveness and efficiency of ALSS using large real labeled graphs. Moreover, we show that ALSS can assist query optimizers to find a better query plan for complex multi-way self-joins.

## 2020

- TheWebConfGraph Representation Learning via Graphical Mutual Information MaximizationZhen Peng, Wenbing Huang, Minnan Luo, Qinghua Zheng,
*Yu Rong*, Tingyang Xu, and 1 more author*In Proceedings of The Web Conference 2020*, 2020The richness in the content of various information networks such as social networks and communication networks provides the unprecedented potential for learning high-quality expressive representations without external supervision. This paper investigates how to preserve and extract the abundant information from graph-structured data into embedding space in an unsupervised manner. To this end, we propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations. GMI generalizes the idea of conventional mutual information computations from vector space to the graph domain where measuring mutual information from two aspects of node features and topological structure is indispensable. GMI exhibits several benefits: First, it is invariant to the isomorphic transformation of input graphs—an inevitable constraint in many existing graph representation learning algorithms; Besides, it can be efficiently estimated and maximized by current mutual information estimation methods such as MINE; Finally, our theoretical analysis confirms its correctness and rationality. With the aid of GMI, we develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder. Considerable experiments on transductive as well as inductive node classification and link prediction demonstrate that our method outperforms state-of-the-art unsupervised counterparts, and even sometimes exceeds the performance of supervised ones.

- AAAIRumor Detection on Social Media with Bi-Directional Graph Convolutional NetworksTian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang,
*Yu Rong*, and 1 more author*Proceedings of the AAAI Conference on Artificial Intelligence*, Apr 2020Social media has been developing rapidly in public due to its nature of spreading new information, which leads to rumors being circulated. Meanwhile, detecting rumors from such massive information in social media is becoming an arduous challenge. Therefore, some deep learning methods are applied to discover rumors through the way they spread, such as Recursive Neural Network (RvNN) and so on. However, these deep learning methods only take into account the patterns of deep propagation but ignore the structures of wide dispersion in rumor detection. Actually, propagation and dispersion are two crucial characteristics of rumors. In this paper, we propose a novel bi-directional graph model, named <em>Bi-Directional Graph Convolutional Networks</em> (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation; and a GCN with an opposite directed graph of rumor diffusion to capture the structures of rumor dispersion. Moreover, the information from source post is involved in each layer of GCN to enhance the influences from the roots of rumors. Encouraging empirical results on several benchmarks confirm the superiority of the proposed method over the state-of-the-art approaches.

- NeurIPSDeep Multimodal Fusion by Channel ExchangingYikai Wang, Wenbing Huang, Fuchun Sun, Tingyang Xu,
*Yu Rong*, and Junzhou Huang*In Advances in Neural Information Processing Systems*, 2020Deep multimodal fusion by using multiple sources of data for classification or regression has exhibited a clear advantage over the unimodal counterpart on various applications. Yet, current methods including aggregation-based and alignment-based fusion are still inadequate in balancing the trade-off between inter-modal fusion and intra-modal processing, incurring a bottleneck of performance improvement. To this end, this paper proposes Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance that is measured by the magnitude of Batch-Normalization (BN) scaling factor during training. The validity of such exchanging process is also guaranteed by sharing convolutional filters yet keeping separate BN layers across modalities, which, as an add-on benefit, allows our multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which provably affirm the advantage of each component we propose. Our code is available at https://github.com/yikaiw/CEN.

- AAAIA Restricted Black-Box Adversarial Framework Towards Attacking Graph Embedding ModelsHeng Chang,
*Yu Rong*, Tingyang Xu, Wenbing Huang, Honglei Zhang, Peng Cui, and 2 more authors*Proceedings of the AAAI Conference on Artificial Intelligence*, Apr 2020With the great success of graph embedding model on both academic and industry area, the robustness of graph embedding against adversarial attack inevitably becomes a central problem in graph learning domain. Regardless of the fruitful progress, most of the current works perform the attack in a white-box fashion: they need to access the model predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes the white-box attack impractical to real graph learning system. This paper promotes current frameworks in a more general and flexible sense – we demand to attack various kinds of graph embedding model with black-box driven. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way and formulate the graph embedding model as a general graph signal process with corresponding graph filter. As such, a generalized adversarial attacker: <em>GF-Attack</em> is constructed by the graph filter and feature matrix. Instead of accessing any knowledge of the target classifiers used in graph embedding, <em>GF-Attack</em> performs the attack only on the graph filter in a black-box attack fashion. To validate the generalization of <em>GF-Attack</em>, we construct the attacker on four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. Particularly by using our attack, even small graph perturbations like one-edge flip is able to consistently make a strong attack in performance to different graph embedding models.

- TheWebConfAdversarial Attack on Community Detection by Hiding IndividualsJia Li, Honglei Zhang, Zhichao Han,
*Yu Rong*, Hong Cheng, and Junzhou Huang*In Proceedings of The Web Conference 2020*, 2020It has been demonstrated that adversarial graphs, i.e., graphs with imperceptible perturbations added, can cause deep graph models to fail on node/graph classification tasks. In this paper, we extend adversarial graphs to the problem of community detection which is much more difficult. We focus on black-box attack and aim to hide targeted individuals from the detection of deep graph community detection models, which has many applications in real-world scenarios, for example, protecting personal privacy in social networks and understanding camouflage patterns in transaction networks. We propose an iterative learning framework that takes turns to update two modules: one working as the constrained graph generator and the other as the surrogate community detection model. We also find that the adversarial graphs generated by our method can be transferred to other learning based community detection models.

- ArxivTackling Over-Smoothing for General Graph Convolutional NetworksWenbing Huang,
*Yu Rong*, Tingyang Xu, Fuchun Sun, and Junzhou Huang*CoRR*, 2020Increasing the depth of GCN, which is expected to permit more expressivity, is shown to incur performance detriment especially on node classification. The main cause of this lies in over-smoothing. The over-smoothing issue drives the output of GCN towards a space that contains limited distinguished information among nodes, leading to poor expressivity. Several works on refining the architecture of deep GCN have been proposed, but it is still unknown in theory whether or not these refinements are able to relieve over-smoothing. In this paper, we first theoretically analyze how general GCNs act with the increase in depth, including generic GCN, GCN with bias, ResGCN, and APPNP. We find that all these models are characterized by a universal process: all nodes converging to a cuboid. Upon this theorem, we propose DropEdge to alleviate over-smoothing by randomly removing a certain number of edges at each training epoch. Theoretically, DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by dimension collapse. Experimental evaluations on simulated dataset have visualized the difference in over-smoothing between different GCNs. Moreover, extensive experiments on several real benchmarks support that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs.

- NeurIPSDirichlet Graph Variational AutoencoderJia Li, Jianwei Yu, Jiajin Li, Honglei Zhang, Kangfei Zhao,
*Yu Rong*, and 2 more authors*In Advances in Neural Information Processing Systems*, 2020Graph Neural Networks (GNN) and Variational Autoencoders (VAEs) have been widely used in modeling and generating graphs with latent factors. However there is no clear explanation of what these latent factors are and why they perform well. In this work, we present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors. Our study connects VAEs based graph generation and balanced graph cut, and provides a new way to understand and improve the internal mechanism of VAEs based graph generation. Specifically, we first interpret the reconstruction term of DGVAE as balanced graph cut in a principled way. Furthermore, motivated by the low pass characteristics in balanced graph cut, we propose a new variant of GNN named Heatts to encode the input graph into cluster memberships. Heatts utilizes the Taylor series for fast computation of Heat kernels and has better low pass characteristics than Graph Convolutional Networks (GCN). Through experiments on graph generation and graph clustering, we demonstrate the effectiveness of our proposed framework.

- KDDDeep Graph Learning: Foundations, Advances and Applications
*Yu Rong*, Tingyang Xu, Junzhou Huang, Wenbing Huang, Hong Cheng, Yao Ma, and 4 more authors*In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*, 2020Many real data come in the form of non-grid objects, i.e. graphs, from social networks to molecules. Adaptation of deep learning from grid-alike data (e.g. images) to graphs has recently received unprecedented attention from both machine learning and data mining communities, leading to a new cross-domain field—Deep Graph Learning (DGL). Instead of painstaking feature engineering, DGL aims to learn informative representations of graphs in an end-to-end manner. It has exhibited remarkable success in various tasks, such as node/graph classification, link prediction, etc.In this tutorial, we aim to provide a comprehensive introduction to deep graph learning. We first introduce the theoretical foundations on deep graph learning with a focus on describing various Graph Neural Network Models (GNNs). We then cover the key achievements of DGL in recent years. Specifically, we discuss the four topics: 1) training deep GNNs; 2) robustness of GNNs; 3) scalability of GNNs; and 4) self-supervised and unsupervised learning of GNNs. Finally, we will introduce the applications of DGL towards various domains, including but not limited to drug discovery, computer vision, medical image analysis, social network analysis, natural language processing and recommendation.

- VLDBMaximizing the Reduction Ability for Near-Maximum Independent Set ComputationChengzhi Piao, Weiguo Zheng,
*Yu Rong*, and Hong Cheng*Proc. VLDB Endow.*, Jul 2020Finding the maximum independent set is a fundamental NP-hard problem in graph theory. Recent studies have paid much attention to designing efficient algorithms that find a maximal independent set of good quality (the more vertices the better). Kernelization is a widely used technique that applies rich reduction rules to determine the vertices that definitely belong to the maximum independent set. When no reduction rules can be applied anymore, greedy strategies including vertex addition or vertex deletion are employed to break the tie. It remains an open problem that how to apply these reduction rules and determine the greedy strategy to optimize the overall performance including both solution quality and time efficiency. Thus we propose a scheduling framework that dynamically determines the reduction rules and greedy strategies rather than applying them in a fixed order. As an important reduction rule, degree-two reduction exhibits powerful pruning ability but suffers from high time complexity O(nm), where n and m denote the number of vertices and edges respectively. We propose a novel data structure called representative graph, based on which the worst-case time complexity of degree-two reduction is reduced to O(m log n). Moreover, we enrich the naive vertex addition strategy by considering the graph topology and develop efficient methods (active vertex index and lazy update mechanism) to improve the time efficiency. Extensive experiments are conducted on both large real networks and various types of synthetic graphs to confirm the effectiveness, efficiency and robustness of our algorithms.

- ICLRDropEdge: Towards Deep Graph Convolutional Networks on Node Classification
*Yu Rong*, Wenbing Huang, Tingyang Xu, and Junzhou Huang*In 8th International Conference on Learning Representations, ICLR 2020*, 2020Over-fitting and over-smoothing are two main obstacles of developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small dataset, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, our DropEdge is a general skill that can be equipped with many other backbone models (e.g. GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Codes are released on https://github.com/DropEdge/DropEdge.

- NeurIPSSelf-Supervised Graph Transformer on Large-Scale Molecular Data
*Yu Rong*, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying WEI, Wenbing Huang, and 1 more author*In Advances in Neural Information Processing Systems*, 2020How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent researches abstract molecules as graphs and employ Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capability to new-synthesized molecules. To address them both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks in node-, edge- and graph-level, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Rather, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular dataset without requiring any supervision, thus being immunized to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules—the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) from current state-of-the-art methods on 11 challenging benchmarks. The insights we gained are that well-designed self-supervision losses and largely-expressive pre-trained models enjoy the significant potential on performance boosting.

## 2019

- ICCVGraph Convolutional Networks for Temporal Action LocalizationRunhao Zeng, Wenbing Huang, Mingkui Tan,
*Yu Rong*, Peilin Zhao, Junzhou Huang, and 1 more author*In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*, Oct 2019Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using GraphConvolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and their relations between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other one for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14(49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships.

- CVPRProgressive Feature Alignment for Unsupervised Domain AdaptationChaoqi Chen, Weiping Xie, Wenbing Huang,
*Yu Rong*, Xinghao Ding, Yue Huang, and 2 more authors*In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, Jun 2019Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a fully-unlabeled target domain. To tackle this task, recent approaches resort to discriminative domain transfer in virtue of pseudo-labels to enforce the class-level distribution alignment across the source and target domains. These methods, however, are vulnerable to the error accumulation and thus incapable of preserving cross-domain category consistency, as the pseudo-labeling accuracy is not guaranteed explicitly. In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, via exploiting the intra-class variation in the target domain. To be specific, we first develop an Easy-to-Hard Transfer Strategy (EHTS) and an Adaptive Prototype Alignment (APA) step to train our model iteratively and alternatively. Moreover, upon observing that a good domain adaptation usually requires a non-saturated source classifier, we consider a simple yet efficient way to retard the convergence speed of the source classification loss by further involving a temperature variate into the soft-max function. The extensive experimental results reveal that the proposed PFAN exceeds the state-of-the-art performance on three UDA datasets.

- TheWebConfSemi-Supervised Graph Classification: A Hierarchical Graph PerspectiveJia Li,
*Yu Rong*, Hong Cheng, Helen Meng, Wenbing Huang, and Junzhou Huang*In The World Wide Web Conference*, 2019Node classification and graph classification are two graph learning problems that predict the class label of a node and the class label of a graph respectively. A node of a graph usually represents a real-world entity, e.g., a user in a social network, or a protein in a protein-protein interaction network. In this work, we consider a more challenging but practically useful setting, in which a node itself is a graph instance. This leads to a hierarchical graph perspective which arises in many domains such as social network, biological network and document collection. For example, in a social network, a group of people with shared interests forms a user group, whereas a number of user groups are interconnected via interactions or common members. We study the node classification problem in the hierarchical graph where a “node” is a graph instance, e.g., a user group in the above example. As labels are usually limited in real-world data, we design two novel semi-supervised solutions named SEmi-supervised grAph cLassification via Cautious/Active Iteration (or SEAL-C/AI in short). SEAL-C/AI adopt an iterative framework that takes turns to build or update two classifiers, one working at the graph instance level and the other at the hierarchical graph level. To simplify the representation of the hierarchical graph, we propose a novel supervised, self-attentive graph embedding method called SAGE, which embeds graph instances of arbitrary size into fixed-length vectors. Through experiments on synthetic data and Tencent QQ group data, we demonstrate that SEAL-C/AI not only outperform competing methods by a significant margin in terms of accuracy/Macro-F1, but also generate meaningful interpretations of the learned representations.

## 2018

- NeurIPSAdaptive Sampling Towards Fast Graph Representation LearningWenbing Huang, Tong Zhang,
*Yu Rong*, and Junzhou Huang*In Advances in Neural Information Processing Systems*, 2018Graph Convolutional Networks (GCNs) have become a crucial tool on learning representations of graph vertices. The main challenge of adapting GCNs on large-scale graphs is the scalability issue that it incurs heavy cost both in computation and memory due to the uncontrollable neighborhood expansion across layers. In this paper, we accelerate the training of GCNs through developing an adaptive layer-wise sampling method. By constructing the network layer by layer in a top-down passway, we sample the lower layer conditioned on the top one, where the sampled neighborhoods are shared by different parent nodes and the over expansion is avoided owing to the fixed-size sampling. More importantly, the proposed sampler is adaptive and applicable for explicit variance reduction, which in turn enhances the training of our method. Furthermore, we propose a novel and economical approach to promote the message passing over distant nodes by applying skip connections. Intensive experiments on several benchmarks verify the effectiveness of our method regarding the classification accuracy while enjoying faster convergence speed.

- KDDTATC: Predicting Alzheimer’s Disease with Actigraphy DataJia Li,
*Yu Rong*, Helen Meng, Zhihui Lu, Timothy Kwok, and Hong Cheng*In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*, 2018With the increase of elderly population, Alzheimer’s Disease (AD), as the most common cause of dementia among the elderly, is affecting more and more senior people. It is crucial for a patient to receive accurate and timely diagnosis of AD. Current diagnosis relies on doctors’ experience and clinical test, which, unfortunately, may not be performed until noticeable AD symptoms are developed. In this work, we present our novel solution named time-aware TICC and CNN (TATC), for predicting AD from actigraphy data. TATC is a multivariate time series classification method using a neural attention-based deep learning approach. It not only performs accurate prediction of AD risk, but also generates meaningful interpretation of daily behavior pattern of subjects. TATC provides an automatic, low-cost solution for continuously monitoring the change of physical activity of subjects in daily living environment. We believe the future deployment of TATC can benefit both doctors and patients in early detection of potential AD risk.

- DASFAAExploiting Ranking Consistency Principle in Representation Learning for Location PromotionSiyuan Zhang,
*Yu Rong*, Yu Zheng, Hong Cheng, and Junzhou Huang*In Database Systems for Advanced Applications*, 2018Location-based services, which use information of people’s geographical position as service context, are becoming part of our daily life. Given the large volume of heterogeneous data generated by location-based services, one important problem is to estimate the visiting probability of users who haven’t visited a target Point of Interest (POI) yet, and return the target user list based on their visiting probabilities. This problem is called the location promotion problem. The location promotion problem has not been well studied due to the following difficulties: (1) the cold start POI problem: a target POI for promotion can be a new POI with no check-in records; and (2) heterogeneous information integration. Existing methods mainly focus on developing a general mobility model for all users’ check-ins, but ignore the ranking utility from the perspective of POIs and the interaction between geographical and preference influence of POIs.

## 2017

- CIKMMinimizing Dependence between Graphs
*Yu Rong*, and Hong Cheng*In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management*, 2017In recent years, modeling the relation between two graphs has received unprecedented attention from researchers due to its wide applications in many areas, such as social analysis and bioinformatics. The nature of relations between two graphs can be divided into two categories: the vertex relation and the link relation. Many studies focus on modeling the vertex relation between graphs and try to find the vertex correspondence between two graphs. However, the link relation between graphs has not been fully studied. Specifically, we model the cross-graph link relation as cross-graph dependence, which reflects the dependence of a vertex in one graph on a vertex in the other graph. A generic problem, called Graph Dependence Minimization (GDM), is defined as: given two graphs with cross-graph dependence, how to select a subset of vertexes from one graph and copy them to the other, so as to minimize the cross-graph dependence. Many real applications can benefit from the solution to GDM. Examples include reducing the cross-language links in online encyclopedias, optimizing the cross-platform communication cost between different cloud services, and so on. This problem is trivial if we can select as many vertexes as we want to copy. But what if we can only choose a limited number of vertexes to copy so as to make the two graphs as independent as possible? We formulate GDM with a budget constraint into a combinatorial optimization problem, which is proven to be NP-hard. We propose two algorithms to solve GDM. Firstly, we prove the submodularity of the objective function of GDM and adopt the size-constrained submodular minimization (SSM) algorithm to solve it. Since the SSM-based algorithm cannot scale to large graphs, we design a heuristic algorithm with a provable approximation guarantee. We prove that the error achieved by the heuristic algorithm is bounded by an additive factor which is proportional to the square of the given budget. Extensive experiments on both synthetic and real-world graphs show that the proposed algorithms consistently outperform the well-studied graph centrality measure based solutions. Furthermore, we conduct a case study on the Wikipedia graphs with millions of vertexes and links to demonstrate the potential of GDM to solve real-world problems.

## 2016

- CIKMA Model-Free Approach to Infer the Diffusion Network from Event Cascade
*Yu Rong*, Qiankun Zhu, and Hong Cheng*In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management*, 2016Information diffusion through various types of networks, such as social networks and media networks, is a very common phenomenon on the Internet nowadays. In many scenarios, we can track only the time when the information reaches a node. However, the source infecting this node is usually unobserved. Inferring the underlying diffusion network based on cascade data (observed sequence of infected nodes with timestamp) without additional information is an essential and challenging task in information diffusion. Many studies have focused on constructing complex models to infer the underlying diffusion network in a parametric way. However, the diffusion process in the real world is very complex and hard to be captured by a parametric model. Even worse, inferring the parameters of a complex model is impractical under a large data volume.Different from previous works focusing on building models, we propose to interpret the diffusion process from the cascade data directly in a non-parametric way, and design a novel and efficient algorithm named Non-Parametric Distributional Clustering (NPDC). Our algorithm infers the diffusion network according to the statistical difference of the infection time intervals between nodes connected with diffusion edges versus those with no diffusion edges. NPDC is a model-free approach since we do not define any transmission models between nodes in advance. We conduct experiments on synthetic data sets and two large real-world data sets with millions of cascades. Our algorithm achieves substantially higher accuracy of network inference and is orders of magnitude faster compared with the state-of-the-art solutions.

## 2015

- KDDWhy It Happened: Identifying and Modeling the Reasons of the Happening of Social Events
*Yu Rong*, Hong Cheng, and Zhiyu Mo*In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2015In nowadays social networks, a huge volume of content containing rich information, such as reviews, ratings, microblogs, etc., is being generated, consumed and diffused by users all the time. Given the temporal information, we can obtain the event cascade which indicates the time sequence of the arrival of information to users. Many models have been proposed to explain how information diffuses. However, most existing models cannot give a clear explanation why every specific event happens in the event cascade. Such explanation is essential for us to have a deeper understanding of information diffusion as well as a better prediction of future event cascade.In order to uncover the mechanism of the happening of social events, we analyze the rating event data crawled from Douban.com, a Chinese social network, from year 2006 to 2011. We distinguish three factors: social, external and intrinsic influence which can explain the emergence of every specific event. Then we use the mixed Poisson process to model event cascade generated by different factors respectively and integrate different Poisson processes with shared parameters. The proposed model, called Combinational Mixed Poisson Process (CMPP) model, can explain not only how information diffuses in social networks, but also why a specific event happens. This model can help us to understand information diffusion from both macroscopic and microscopic perspectives. We develop an efficient Classification EM algorithm to infer the model parameters. The explanatory and predictive power of the proposed model has been demonstrated by the experiments on large real data sets.

## 2014

- TheWebConfA Monte Carlo Algorithm for Cold Start Recommendation
*Yu Rong*, Xiao Wen, and Hong Cheng*In Proceedings of the 23rd International Conference on World Wide Web*, 2014Recommendation systems have been widely used in E-commerce sites, social networks, etc. One of the core tasks in recommendation systems is to predict the users’ ratings on items. Although many models and algorithms have been proposed, how to make accurate prediction for new users with extremely few rating records still remains a big challenge, which is called the cold start problem. Many existing methods utilize additional information, such as social graphs, to cope with the cold start problem. However, the side information may not always be available. In contrast to such methods, we propose a more general solution to address the cold start problem based on the observed user rating records only. Specifically we define a random walk on a bipartite graph of users and items to simulate the preference propagation among users, in order to alleviate the data sparsity problem for cold start users. Then we propose a Monte Carlo algorithm to estimate the similarity between different users. This algorithm takes a precomputation approach, and thus can efficiently compute the user similarity given any new user for rating prediction. In addition, our algorithm can easily handle dynamic updates and can be parallelized naturally, which are crucial for large recommendation systems. Theoretical analysis is presented to demonstrate the efficiency and effectiveness of our algorithm, and extensive experiments also confirm our theoretical findings.