Single-cell RNA sequencing (scRNA-seq) has transformed our ability to study cellular heterogeneity. Instead of averaging gene expression across thousands of cells, we can profile each cell individually. That shift in resolution changes what questions we can ask.
The Core Workflow
A typical scRNA-seq analysis moves through these stages:
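- Alignment & quantification: map reads to a reference genome or transcriptome, assign them to cell barcodes, and count molecules per gene (using UMIs where the protocol has them) to build a cell-by-gene count matrix (Cell Ranger, STARsolo, Salmon/Alevin)
- Quality control: filter out low-quality cells based on library size, number of detected genes, and mitochondrial fraction
- Normalization: correct for differences in sequencing depth between cells
- Feature selection and dimensionality reduction: select highly variable genes, run PCA, then UMAP or t-SNE for visualization
- Clustering: group similar cells, typically with graph-based community detection (Louvain or Leiden) on a nearest-neighbor graph built from the PCA embedding
- Annotation: assign cell types to clusters using marker genes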
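To make the QC and normalization steps concrete, here is a minimal pure-Python sketch on a hypothetical four-cell toy matrix. The cutoffs (`min_counts`, `min_genes`, `max_pct_mt`) are illustrative assumptions; real thresholds depend on tissue and protocol. The normalization scales each cell to 10,000 total counts and applies log1p, the same idea as Seurat's default LogNormalize method and as `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` in Scanpy.

```python
import math

# Hypothetical toy data: rows are cells, columns are genes (raw UMI counts).
GENES = ["MT-CO1", "MT-ND1", "CD3E", "MS4A1", "LYZ", "GAPDH"]
COUNTS = [
    [5, 3, 40, 0, 2, 50],   # cell 0: looks healthy
    [80, 60, 1, 0, 0, 9],   # cell 1: mostly mitochondrial reads, likely damaged
    [2, 1, 0, 35, 3, 60],   # cell 2: looks healthy
    [0, 0, 0, 0, 1, 2],     # cell 3: tiny library, likely an empty droplet
]


def qc_metrics(row, genes):
    """Return (total counts, detected genes, percent mitochondrial counts)."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    # Human mitochondrial gene symbols start with "MT-" (mouse uses "mt-").
    mt = sum(c for c, g in zip(row, genes) if g.startswith("MT-"))
    pct_mt = 100.0 * mt / total if total else 0.0
    return total, n_genes, pct_mt


def passes_qc(row, genes, min_counts=20, min_genes=3, max_pct_mt=20.0):
    """Apply illustrative (assumed) QC cutoffs to one cell."""
    total, n_genes, pct_mt = qc_metrics(row, genes)
    return total >= min_counts and n_genes >= min_genes and pct_mt <= max_pct_mt


def normalize_log1p(row, target_sum=1e4):
    """Scale a cell to target_sum total counts, then apply log(1 + x)."""
    total = sum(row)
    return [math.log1p(c / total * target_sum) for c in row]


kept = [i for i, row in enumerate(COUNTS) if passes_qc(row, GENES)]
normalized = {i: normalize_log1p(COUNTS[i]) for i in kept}
print(kept)  # [0, 2]
```

Cell 1 fails on mitochondrial fraction (about 93%) and cell 3 fails on library size, so only cells 0 and 2 reach normalization.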
Tooling Landscape
The two dominant ecosystems are:
- Seurat (R) — well-documented, opinionated, excellent for standard workflows
- Scanpy (Python) — flexible, memory-efficient, integrates well with deep learning tools
For most exploratory work I reach for Seurat, but Scanpy becomes attractive when you’re working with very large datasets or integrating with models like scVI.
Common Pitfalls
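Doublets: two cells captured in the same droplet or well look like a single cell with a mixed expression profile, which can masquerade as a novel intermediate population. Always run a doublet detection tool (DoubletFinder, Scrublet) before clustering.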
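The idea shared by Scrublet and DoubletFinder can be sketched in a few lines: simulate doublets by summing the raw counts of pairs of observed cells, then score each real cell by the fraction of its nearest neighbors that are simulated. This is a simplified, hypothetical illustration, not either tool's implementation: it skips log transformation, PCA and score thresholding, and it simulates every pair instead of sampling random pairs so the toy result is deterministic.

```python
import math
from itertools import combinations


def normalize(row):
    """Library-size normalization: scale counts to proportions."""
    total = sum(row)
    return [c / total for c in row]


def doublet_scores(counts, k=5):
    """Fraction of each observed cell's k nearest neighbors that are simulated doublets."""
    observed = [normalize(row) for row in counts]
    # Simulate one doublet per pair of observed cells by summing their raw counts.
    simulated = [
        normalize([a + b for a, b in zip(counts[i], counts[j])])
        for i, j in combinations(range(len(counts)), 2)
    ]
    pool = [(v, False) for v in observed] + [(v, True) for v in simulated]
    scores = []
    for idx, cell in enumerate(observed):
        # Brute-force Euclidean kNN over observed + simulated cells, excluding the cell itself.
        neighbors = sorted(
            (math.dist(cell, v), is_sim)
            for j, (v, is_sim) in enumerate(pool)
            if j != idx
        )[:k]
        scores.append(sum(is_sim for _, is_sim in neighbors) / k)
    return scores


# Hypothetical toy data: 5 cell types, each with its own marker gene,
# plus one housekeeping gene shared by all cells (last column).
N_TYPES = 5


def pure_cell(cell_type, i):
    row = [0] * (N_TYPES + 1)
    row[cell_type] = 60 + 2 * i  # marker gene, with small within-type variation
    row[-1] = 40                 # housekeeping gene
    return row


cells = [pure_cell(t, i) for t in range(N_TYPES) for i in range(3)]
# Plant one real doublet: a type-0 cell and a type-1 cell captured together.
cells.append([a + b for a, b in zip(pure_cell(0, 0), pure_cell(1, 0))])

scores = doublet_scores(cells)
print(scores[-1], max(scores[:-1]))  # 1.0 0.6
```

Pure cells do not score zero: a doublet of two same-type cells looks like its parents after normalization, so it lands among their neighbors. That is why real tools threshold the score distribution, and why such homotypic doublets are hard to detect at all.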
Batch effects: samples processed on different days often cluster by batch rather than by biology. Harmony and scVI are widely used integration methods.
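Before reaching for an integration method, a quick diagnostic is to tabulate how much of each cluster comes from a single batch. The sketch below is a hypothetical helper on made-up labels, not part of Harmony or scVI, and the 0.9 cutoff is an arbitrary assumption.

```python
from collections import Counter, defaultdict


def batch_purity(clusters, batches):
    """For each cluster, return (dominant batch, fraction of its cells from that batch)."""
    counts = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        counts[cluster][batch] += 1
    purity = {}
    for cluster, counter in counts.items():
        batch, n = counter.most_common(1)[0]
        purity[cluster] = (batch, n / sum(counter.values()))
    return purity


def flag_batch_driven(clusters, batches, max_fraction=0.9):
    """List clusters where one batch contributes more than max_fraction of cells (assumed cutoff)."""
    purity = batch_purity(clusters, batches)
    return sorted(c for c, (_, frac) in purity.items() if frac > max_fraction)


# Hypothetical per-cell labels, e.g. from a clustering run and the sample sheet.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2"]
batches = ["day1", "day2", "day1", "day2",
           "day1", "day1", "day1", "day1",
           "day2", "day2", "day1", "day2"]

print(flag_batch_driven(clusters, batches))  # ['1']
```

A flagged cluster is not automatically an artifact: if batches coincide with biological conditions, a condition-specific population looks the same, so interpret flags against the experimental design. With unequal batch sizes, comparing each cluster's composition to the overall batch proportions is a fairer baseline.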
Over-clustering: it’s tempting to raise the clustering resolution until you get many fine-grained clusters, but not all of them will be biologically meaningful. Check each one against known marker genes before calling it a distinct population.
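One way to act on the marker-gene check is to score each cluster against known marker sets and see which clusters resolve to the same cell type. The sketch below is hypothetical: the marker lists are common human PBMC markers chosen for illustration, the per-cluster mean expression values are invented, and scoring by the mean of marker means is a deliberately simple choice.

```python
from collections import defaultdict

# Hypothetical marker sets (common human PBMC markers, for illustration only).
MARKERS = {
    "T cell": ["CD3E", "CD3D"],
    "B cell": ["MS4A1", "CD79A"],
    "Monocyte": ["LYZ", "CD14"],
}


def marker_score(cluster_means, genes):
    """Mean of a cluster's average expression over one marker set (missing genes count as 0)."""
    return sum(cluster_means.get(g, 0.0) for g in genes) / len(genes)


def best_cell_type(cluster_means, markers):
    """Cell type whose marker set scores highest in this cluster."""
    return max(markers, key=lambda t: marker_score(cluster_means, markers[t]))


def merge_candidates(means_by_cluster, markers):
    """Group clusters that resolve to the same cell type; only groups of 2+ are returned."""
    groups = defaultdict(list)
    for cluster, means in sorted(means_by_cluster.items()):
        groups[best_cell_type(means, markers)].append(cluster)
    return {t: cs for t, cs in groups.items() if len(cs) > 1}


# Invented per-cluster mean log-normalized expression values.
means_by_cluster = {
    "0": {"CD3E": 2.1, "CD3D": 1.9, "LYZ": 0.1},
    "1": {"CD3E": 2.0, "CD3D": 2.2, "MS4A1": 0.2},
    "2": {"MS4A1": 2.5, "CD79A": 2.0},
    "3": {"LYZ": 3.0, "CD14": 1.8, "CD3E": 0.1},
}

print(merge_candidates(means_by_cluster, MARKERS))  # {'T cell': ['0', '1']}
```

Clusters `0` and `1` both resolve to T cells, so they are candidates for merging. They may still be genuine subtypes (CD4 versus CD8 T cells, for example); the check only tells you that separating them needs finer markers.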
Reproducibility Note
Container-based environments (Docker or Singularity) are essential here. R package versions in particular can change results noticeably, since defaults and implementations shift between releases. Pin everything.
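On the Python side, a minimal way to capture exact versions for pinning is to read them from installed package metadata, similar to what `pip freeze` prints. This is a hypothetical helper using only the standard library's `importlib.metadata`; in R, `renv::snapshot()` records package versions in an `renv.lock` file for the same purpose.

```python
from importlib import metadata


def installed_versions():
    """Map each installed Python distribution (lowercased name) to its exact version."""
    versions = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # some environments contain entries without metadata
            continue
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:  # skip entries with incomplete metadata
            versions[name.lower()] = version
    return dict(sorted(versions.items()))


def to_requirements(versions):
    """Render exact pins in pip's requirements-file format."""
    return "\n".join(f"{name}=={version}" for name, version in versions.items())


print(to_requirements(installed_versions()))
```

Writing this output into the container image alongside the analysis code gives a record of exactly what produced a given result.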
scRNA-seq analysis is as much art as science right now — the field is moving fast, and best practices are still evolving. Stay close to the methods papers.