Single-Cell RNA-seq: A Practical Overview

Single-cell RNA sequencing (scRNA-seq) has transformed our ability to study cellular heterogeneity. Instead of averaging gene expression across thousands of cells, as bulk RNA-seq does, we can profile each cell individually. That shift in resolution makes new questions tractable: which rare cell populations a tissue contains, for example, or how cells move between states.

The Core Workflow

A typical scRNA-seq analysis moves through these stages:

Alignment & quantification — map reads to a reference genome or transcriptome and count them per cell and gene (Cell Ranger, STARsolo, Salmon/Alevin)
Quality control — filter low-quality cells based on library size, number of detected genes, and mitochondrial read fraction
Normalization — correct for sequencing depth differences between cells
Dimensionality reduction — PCA (usually on a subset of highly variable genes), then UMAP or t-SNE for visualization
Clustering — identify groups of similar cells
Annotation — assign cell types to clusters using marker genes
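
To make the middle stages concrete, here is a minimal NumPy sketch of quality control, normalization, and PCA on a synthetic count matrix. It is an illustration under stated assumptions, not what Seurat or Scanpy do internally: the thresholds (min_counts, min_genes, max_mito), the 10,000-count scaling target, and the function names are all hypothetical choices for the toy data, and real cutoffs depend on the tissue and protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic counts matrix (cells x genes); real data would come from the
# quantification step. The last 20 genes stand in for mitochondrial genes.
n_cells, n_genes = 200, 500
counts = rng.poisson(lam=1.0, size=(n_cells, n_genes)).astype(float)
is_mito = np.zeros(n_genes, dtype=bool)
is_mito[-20:] = True
counts[:10, is_mito] *= 50  # make the first 10 cells look stressed or dying


def qc_mask(counts, is_mito, min_counts=200, min_genes=100, max_mito=0.20):
    """Boolean mask of cells passing three simple (hypothetical) QC thresholds."""
    library_size = counts.sum(axis=1)
    genes_detected = (counts > 0).sum(axis=1)
    mito_fraction = counts[:, is_mito].sum(axis=1) / np.maximum(library_size, 1)
    return (
        (library_size >= min_counts)
        & (genes_detected >= min_genes)
        & (mito_fraction <= max_mito)
    )


def log_normalize(counts, target_sum=1e4):
    """Scale each cell to the same total count, then apply log(1 + x)."""
    scaled = counts / counts.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(scaled)


def pca(matrix, n_components=10):
    """Project cells onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


keep = qc_mask(counts, is_mito)
filtered = counts[keep]
normalized = log_normalize(filtered)
embedding = pca(normalized)
print(filtered.shape, embedding.shape)
```

In practice the toolkits wrap each of these steps and add feature selection, scaling, and neighbor-graph construction on top; the sketch only shows what the stages do to the matrix.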

Tooling Landscape

The two dominant ecosystems are:

Seurat (R) — well-documented, opinionated, excellent for standard workflows
Scanpy (Python) — flexible, memory-efficient, integrates well with deep learning tools

For most exploratory work I reach for Seurat, but Scanpy becomes attractive when you’re working with very large datasets or integrating with models like scVI.

Common Pitfalls

Doublets — two cells captured together look like one weird cell. Always run a doublet detection tool (DoubletFinder, Scrublet) before clustering.
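
Doublet detection tools generally want a rough idea of how many doublets to expect. A back-of-the-envelope sketch, assuming a droplet-based protocol where the doublet rate grows roughly linearly with the number of cells recovered: the default of 0.8% per 1,000 cells is an assumption taken from a commonly quoted rule of thumb, so check the documentation for your own kit before using it. The function name is hypothetical.

```python
def expected_doublets(n_cells_recovered, rate_per_1000=0.008):
    """Return (expected doublet rate, expected number of doublets).

    rate_per_1000 is an assumed rule-of-thumb value, not a measured one.
    """
    rate = rate_per_1000 * n_cells_recovered / 1000
    return rate, round(rate * n_cells_recovered)


rate, n_doublets = expected_doublets(5000)
print(f"expected rate: {rate:.1%}, expected doublets: {n_doublets}")
```

An estimate like this is what you would typically pass to the detection tool as its expected doublet rate or expected doublet count.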

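Batch effects — samples processed on different days often cluster by batch rather than biology. Harmony and scVI are widely used integration methods; check afterwards that known cell types still separate, since aggressive correction can erase real biological differences.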
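
A quick way to see whether clusters follow batch is to tabulate the batch composition of each cluster. The sketch below uses toy labels, and both the function name and the 0.9 cutoff are hypothetical. A cluster dominated by one batch is only a flag for closer inspection, since a cell type can genuinely be present in a single sample.

```python
import numpy as np


def batch_fractions(cluster_labels, batch_labels):
    """Fraction of each cluster's cells that come from each batch."""
    clusters = np.unique(cluster_labels)
    batches = np.unique(batch_labels)
    table = np.zeros((len(clusters), len(batches)))
    for ci, c in enumerate(clusters):
        in_cluster = cluster_labels == c
        for bi, b in enumerate(batches):
            table[ci, bi] = np.mean(batch_labels[in_cluster] == b)
    return clusters, batches, table


# Toy labels: clusters 0 and 2 each come from a single batch, cluster 1 is mixed.
cluster_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
batch_labels = np.array(["A", "A", "A", "A", "A", "B", "A", "B", "B", "B", "B", "B"])

clusters, batches, table = batch_fractions(cluster_labels, batch_labels)
suspicious = clusters[table.max(axis=1) > 0.9]  # hypothetical cutoff
print(suspicious)
```

Running the same table before and after integration gives a simple sanity check on whether the correction actually mixed the batches.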

Over-clustering — it’s tempting to push resolution until you get many fine-grained clusters, but many of them won’t be biologically meaningful. Validate with known marker genes.
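
One simple validation is to average known marker genes within each cluster and look for clusters that share the same dominant marker. The sketch below assumes a normalized cells-by-genes matrix and uses a toy example with three well-known markers (CD3E, MS4A1, LYZ). The expression values and the function name are made up for illustration.

```python
import numpy as np


def mean_expression_by_cluster(expr, cluster_labels, gene_names, markers):
    """Mean expression of each marker gene within each cluster."""
    gene_index = {gene: i for i, gene in enumerate(gene_names)}
    cols = [gene_index[m] for m in markers]
    clusters = np.unique(cluster_labels)
    means = np.vstack(
        [expr[cluster_labels == c][:, cols].mean(axis=0) for c in clusters]
    )
    return clusters, means


gene_names = ["CD3E", "MS4A1", "LYZ"]
markers = ["CD3E", "MS4A1", "LYZ"]
# Toy data: six cells in three clusters; clusters 0 and 1 both look like T cells.
expr = np.array([
    [5.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [6.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 6.0, 0.0],
])
cluster_labels = np.array([0, 0, 1, 1, 2, 2])

clusters, means = mean_expression_by_cluster(expr, cluster_labels, gene_names, markers)
top_marker = [markers[i] for i in means.argmax(axis=1)]
print(dict(zip(clusters.tolist(), top_marker)))
```

Here clusters 0 and 1 share CD3E as their top marker with nothing else to distinguish them, which is the kind of pattern that suggests a split driven by resolution rather than biology.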

Reproducibility Note

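Container-based environments (Docker or Singularity) are essential here. Package versions, R packages in particular, can change results noticeably because defaults and implementations shift between releases. Pin everything, and fix random seeds for stochastic steps such as UMAP and clustering.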
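
Alongside the container image, it helps to record the exact versions an analysis actually ran with. A minimal sketch for the Python side, using only the standard library: the helper names and the package list are illustrative assumptions, and in R the session information printed by sessionInfo() plays the same role.

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Map each package name to its installed version (None if not installed)."""
    snapshot = {"python": platform.python_version()}
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot[name] = None
    return snapshot


def as_requirements(snapshot):
    """Render installed packages as pinned 'name==version' lines."""
    return [
        f"{name}=={version}"
        for name, version in snapshot.items()
        if name != "python" and version is not None
    ]


# Illustrative package list; adjust to whatever the analysis imports.
snapshot = environment_snapshot(["scanpy", "anndata", "numpy"])
print("\n".join(as_requirements(snapshot)))
```

Saving this output next to the results makes it possible to tell later whether two runs differed in code, data, or environment.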

scRNA-seq analysis is as much art as science right now — the field is moving fast, and best practices are still evolving. Stay close to the methods papers.