## VIBS-689: Single-cell Data Analysis via Machine Learning (Spring 2020)

Instructor: James Cai (jcai@tamu.edu)

Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 104 & 115

### Course Description

*The aim of the course is to provide a practical introduction to the analysis of single-cell data using machine learning. Topics will range from data visualization/exploration to advanced machine learning methods. Practical examples and applications will be illustrated using MATLAB, R, and Julia.*

Principles and concepts in single-cell RNA sequencing (scRNAseq) experiments; real-world applications of scRNAseq with examples; machine learning (ML) methods for single-cell data analysis; practical and effective ML methods and concepts; applications of ML methods to high-dimensional scRNAseq data; algorithm design and development of scientific software using high-level, high-performance scientific computing languages; emerging techniques for integrative single-cell data analysis, and the assumptions, advantages, and limitations of these techniques.

### Course Milestones

**1. Introduction to single-cell technologies (10x Genomics and C1)**; INTRODUCTION: mathematical prerequisites and programming languages MATLAB/R/JULIA, data exploration and visualization (1/14)

**2. Getting expression counts (UMIs) for individual cells (10x Genomics Cell Ranger and salmon-alevin pipelines)**; VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors (1/21)

**3. Basic statistics of scRNAseq data and QC**; DISTANCE & SIMILARITY (1/28)

**4. scRNAseq data modeling and normalization**; PROBABILITY & DISTRIBUTIONS (2/4)

**5. Dimension reduction and visualization**; DIMENSION REDUCTION: PCA, tSNE, UMAP, PHATE (2/11)

**6. Clustering analysis of high-dimensional scRNAseq data**; CLUSTERING: k-means, DBSCAN, spectral clustering (2/18)

**7. Marker genes and cell type identification**; CLASSIFICATION: classifiers, LDA, random forest, k-NN, neural network (2/25)

**8. Feature selection, identification of highly variable genes**; KERNELS: kernel methods, support vector machine (3/3)

=== Spring Break (3/9-3/13) (3/16-3/20) ===

**9. Differential expression analyses with scRNA-seq data**; REGRESSION & REGULARIZATION (3/24)

**10. scRNAseq vs bulk RNAseq**; COMPONENT & DECOMPOSITION: SVD, ICA, NMF (3/31)

**11. Pseudotime and trajectory analyses**; OPTIMIZATION: gradient descent, quadratic programming, evolutionary algorithm (4/7)

**12. Manifold alignment for combining different types of data, e.g., scRNAseq and scATACseq**; TENSOR FACTORIZATION & MANIFOLD LEARNING (4/14)

**13. Construction of single-cell gene regulatory networks (scGRNs)**; NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs (4/21)

**14. Final exam** (4/28)

### Learning objectives/outcomes

Participants will acquire a working knowledge of running a full analysis pipeline on scRNAseq data. After this course, participants should be able to:

- Develop and implement scRNAseq data visualization/exploration solutions
- Understand advanced data analysis strategies in the context of scRNAseq research
- Understand principles and applications of machine learning
- Implement reproducible data analysis workflows in real-world applications

### Course prerequisites

Basics of a programming language, basic statistical knowledge, and basic knowledge of linear algebra.