VIBS-689: Machine Learning for Biologists (Spring 2020)
Instructor: James Cai (jcai@tamu.edu)
Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 104 & 115
Course Description
The aim of the course is to provide a practical introduction to the analysis of “omics” and single-cell data using machine learning. Topics will range from data visualization/exploration to advanced machine learning methods. Practical examples and applications will be illustrated using MATLAB, R, and Julia.
The course covers practical and effective machine learning methods and concepts; principles and concepts in statistical experiment design and analytics; single-cell data analysis, population genetics, and evolutionary genomics; applications of quantitative approaches (computation, statistics, and mathematics) to the analysis of large-scale and complex biological data sets; algorithm design and development of scientific software using high-level, high-performance scientific programming languages; and emerging techniques for integrative data analysis, together with the assumptions, advantages, and limitations of these techniques.
Course Milestones
1. BASIC MATH & MATLAB/R/JULIA: data exploration, visualization, programming languages (1/14)
2. VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors (1/21)
3. DISTRIBUTIONS, CURVE FITTING & OPTIMIZATION: anomaly detection, gradient descent, evolutionary algorithm (1/28)
4. CORRELATION & REGRESSION: multivariate analysis, regularization, LASSO (2/4)
5. ENTROPY & MUTUAL INFORMATION (2/11)
6. COMPONENT & DECOMPOSITION: SVD, ICA, NMF (2/18)
7. DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE (2/25)
8. CLUSTERING: k-means, nearest neighbor, DBSCAN, spectral clustering (3/3)
=== Spring Break (3/9-3/13) ===
9. CLASSIFICATION: classifiers, LDA, random forest, neural network (3/17)
10. KERNELS: support vector machine (3/24)
11. NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs (3/31)
12. DIFFERENTIAL NETWORK ANALYSIS: WGCNA, single-cell regulatory network, graph comparison (4/7)
13. TENSOR FACTORIZATION & MANIFOLD LEARNING (4/14)
14. Student presentations (4/21) and final exam (4/28)
Learning objectives/outcomes
Participants will acquire a working knowledge of how to run a full classification/profiling pipeline on omics data (e.g., gene expression). After this course, participants should be able to:
- Develop and implement data visualization/exploration solutions
- Understand advanced data analysis strategies in the context of omics research
- Understand principles and applications of machine learning
- Implement reproducible data analysis workflows in real-world applications
Course prerequisites
Basics of a programming language, basic statistical knowledge, and basic knowledge of linear algebra.