Gene regulatory networks (GRNs) describe how transcription factors (TFs) control the expression of target genes. Reconstructing these networks from transcriptomic data is one of the hardest problems in computational biology — and one of the most rewarding.
This post covers the current landscape of GRN inference methods, how to run them in practice, and the pitfalls that trip people up most often.
What Are We Actually Trying to Infer?
A GRN is a directed graph where:
- Nodes represent genes (or specifically, transcription factors and their targets)
- Edges represent regulatory relationships (TF → target gene)
- Edge weights represent the strength or confidence of regulation
┌──────────┐
│ GATA2 │ (Transcription Factor)
└────┬─────┘
│ activates
┌────────┼────────┐
▼ ▼ ▼
[KIT] [TAL1] [EPOR] (Target Genes)
│
│ represses
▼
[GATA1]
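In code, the toy network above is just a weighted edge list, the same shape GRNBoost2 returns. The weights and signs here are illustrative, not real measurements:

```python
import pandas as pd

# The toy network above as an edge list (illustrative weights)
edges = pd.DataFrame(
    [
        ("GATA2", "KIT",   0.9, "+"),
        ("GATA2", "TAL1",  0.8, "+"),
        ("GATA2", "EPOR",  0.7, "+"),
        ("TAL1",  "GATA1", 0.6, "-"),  # the repression edge as drawn in the diagram
    ],
    columns=["TF", "target", "importance", "sign"],
)

# Targets of a TF are the rows where it appears in the TF column
print(edges.loc[edges["TF"] == "GATA2", "target"].tolist())
# ['KIT', 'TAL1', 'EPOR']
```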
Most methods infer an undirected or signed network from co-expression patterns, then optionally layer in TF-binding evidence (ChIP-seq, motif databases) to add directionality.
The Core Methods
GENIE3 / GRNBoost2
GENIE3 was a landmark method: for each target gene, it trains a random forest to predict that gene's expression from the expression of all candidate TFs, then uses the resulting feature importances as edge weights.
GRNBoost2 is a gradient-boosted replacement that produces near-identical results at a fraction of the compute time. Both operate on a gene expression matrix and a list of known TFs.
from arboreto.algo import grnboost2
from arboreto.utils import load_tf_names
import pandas as pd
# ex_matrix: cells × genes DataFrame (log-normalised counts)
ex_matrix = pd.read_csv("expression_matrix.tsv", sep="\t", index_col=0)
# TF list — use a curated list from AnimalTFDB or TFcheckpoint
tf_names = load_tf_names("hs_hgnc_tfs.txt")
# Run GRNBoost2 (parallelised via Dask)
network = grnboost2(
expression_data=ex_matrix,
tf_names=tf_names,
verbose=True,
seed=42,
)
# Result: DataFrame with columns [TF, target, importance]
network.sort_values("importance", ascending=False).head(20)
This is conceptually simple: a high importance score means TF expression is a good predictor of target expression, suggesting a regulatory relationship.
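The core idea can be sketched in a few lines with scikit-learn: fit one regressor per target gene, with TF expression as features, and read off the feature importances. This uses synthetic data and a plain RandomForestRegressor standing in for GRNBoost2's tuned gradient boosting, so it is an illustration of the principle, not the real pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_cells = 300
tf_names = ["TF1", "TF2", "TF3"]

# Synthetic expression: the target gene is driven by TF1 plus noise
tf_expr = pd.DataFrame(rng.normal(size=(n_cells, 3)), columns=tf_names)
target_expr = 2.0 * tf_expr["TF1"] + rng.normal(scale=0.1, size=n_cells)

# One model per target gene: predict its expression from all TFs
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(tf_expr, target_expr)

# Feature importances become the candidate TF -> target edge weights;
# TF1 should dominate because it is the only true driver
for tf, imp in zip(tf_names, model.feature_importances_):
    print(f"{tf} -> target: {imp:.3f}")
```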
SCENIC / pySCENIC
SCENIC extends GRNBoost2 by adding two crucial steps:
- Regulon pruning via motif enrichment — keeps only TF→target edges where the TF’s binding motif is enriched in the target’s cis-regulatory elements
- Regulon activity scoring — scores each regulon (TF + its validated target set) per cell, giving a cell-level regulatory activity matrix
This workflow converts a noisy co-expression network into a set of high-confidence regulons backed by cis-regulatory evidence — a much more interpretable output.
GEX Matrix
│
▼
┌──────────────────┐
│ GRNBoost2 │ Step 1: Co-expression network
│ (TF→target │
│ importances) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Motif enrichment│ Step 2: Prune by TF binding motifs
│ (cisTarget DBs) │ → Validated TF-regulon pairs
└────────┬─────────┘
│
▼
┌──────────────────┐
│ AUCell scoring │ Step 3: Per-cell regulon activity
│ (regulon AUC │ → Cells × Regulons activity matrix
│ per cell) │
└──────────────────┘
Running pySCENIC from the command line (it is also distributed as a Docker/Singularity image for reproducibility):
# Step 1: GRN inference
pyscenic grn \
expression_matrix.loom \
hs_hgnc_tfs.txt \
-o adj.tsv \
--num_workers 16 \
--seed 42
# Step 2: Regulon pruning with cisTarget
pyscenic ctx \
adj.tsv \
hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather \
hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather \
--annotations_fname motifs-v9-nr.hgnc-m0.001-o0.0.tbl \
--expression_mtx_fname expression_matrix.loom \
--output regulons.csv \
--mask_dropouts \
--num_workers 16
# Step 3: AUCell scoring
pyscenic aucell \
expression_matrix.loom \
regulons.csv \
--output auc_matrix.loom \
--num_workers 16
The AUCell matrix is gold: each column is a regulon, each row is a cell, and values represent how active that regulatory program is in each cell. You can overlay this on a UMAP to see which TFs drive which cell states.
Visualising Regulon Activity on a UMAP
import scanpy as sc
import loompy
import pandas as pd
# Load AUCell scores: RegulonsAUC is a structured array in the column
# attributes, with one field per regulon, so pandas picks up the
# regulon names directly from the dtype
auc_loom = loompy.connect("auc_matrix.loom", mode="r")
auc_df = pd.DataFrame(
    auc_loom.ca["RegulonsAUC"],
    index=auc_loom.ca["CellID"],
)
auc_loom.close()
# Load your existing Scanpy AnnData with UMAP
adata = sc.read_h5ad("adata_clustered.h5ad")
adata.obs = adata.obs.join(auc_df, how="left")
# Plot activity of GATA2 regulon on UMAP
sc.pl.umap(adata, color="GATA2(+)", cmap="RdYlBu_r", vmin=0, vmax=0.3)
Benchmarking: What Actually Works?
The BEELINE benchmark evaluated 12 GRN inference methods on synthetic and real single-cell data. Key findings:
- No method is universally best; performance is highly dataset-dependent
- Simple baselines (correlation, mutual information) are surprisingly competitive
- Methods that integrate prior knowledge (motifs, ChIP-seq) consistently outperform pure expression-based methods
- Single-cell data is noisier than bulk; methods designed for scRNA-seq are not always better
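For reference, the correlation baseline that proved competitive in BEELINE is only a few lines. This is a rough sketch (Spearman correlation via ranked Pearson, |rho| as the edge score), not any specific benchmarked implementation:

```python
import pandas as pd


def correlation_grn(ex_matrix: pd.DataFrame, tf_names: list) -> pd.DataFrame:
    """Rank TF -> target edges by absolute Spearman correlation.

    ex_matrix: cells x genes expression DataFrame.
    A crude baseline for comparison, not a real inference method.
    """
    # Spearman = Pearson on ranks; rank once, then one corr() call
    ranks = ex_matrix.rank(axis=0)
    corr = ranks.corr()  # genes x genes correlation matrix

    tfs = [t for t in tf_names if t in corr.columns]
    edges = (
        corr.loc[tfs]
        .rename_axis("TF")
        .reset_index()
        .melt(id_vars="TF", var_name="target", value_name="rho")
    )
    edges = edges[edges["TF"] != edges["target"]]  # drop self-edges
    edges["score"] = edges["rho"].abs()
    return edges.sort_values("score", ascending=False)
```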
The practical takeaway: run multiple methods and look for consensus edges.
# Compare GRNBoost2 and SCENIC edge sets
# grn_df, scenic_df: edge tables with columns [TF, target, importance]
grn_edges = set(zip(grn_df["TF"], grn_df["target"]))
scenic_edges = set(zip(scenic_df["TF"], scenic_df["target"]))
overlap = grn_edges & scenic_edges
print(f"Overlapping edges: {len(overlap)} / {len(grn_edges)} "
      f"({100 * len(overlap) / len(grn_edges):.1f}%)")
# High-confidence set: edges recovered by both methods
in_both = grn_df.set_index(["TF", "target"]).index.isin(list(overlap))
consensus = grn_df[in_both]
Common Pitfalls
Using all genes as potential regulators. Always restrict TF candidates to a curated, species-specific TF list. Running GRNBoost2 over all 20,000 genes produces a co-expression network, not a regulatory network.
Ignoring dropout. scRNA-seq suffers from high dropout (zeros that aren’t truly zero expression). This inflates correlation-based edges. Use methods designed for sparse data, or impute cautiously with MAGIC before inference.
Confusing correlation with regulation. Co-expression can arise from shared upstream regulation, cell-type differences, or technical artifacts. A regulatory edge requires directional evidence — motif enrichment, ChIP-seq, or perturbation data.
Not accounting for cell type. GRN inference on a mixed cell population often reflects cell-type proportions rather than regulatory programs. Subset to a single cell type or state before running inference — or use methods like ANANSE that can model cell-type-specific networks.
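In practice, subsetting is just a boolean mask on the expression matrix before calling grnboost2. A minimal sketch, assuming a metadata table indexed by cell ID with a `cell_type` column (adapt the column name to your own annotation):

```python
import pandas as pd


def subset_by_cell_type(ex_matrix: pd.DataFrame,
                        cell_meta: pd.DataFrame,
                        cell_type: str) -> pd.DataFrame:
    """Restrict a cells x genes matrix to one annotated cell type.

    cell_meta shares ex_matrix's index and carries a 'cell_type'
    column (an assumption; adapt to your own metadata schema).
    """
    cells = cell_meta.index[cell_meta["cell_type"] == cell_type]
    return ex_matrix.loc[ex_matrix.index.isin(cells)]
```

Run inference per cell type, then compare the resulting networks; edges that appear only in mixed-population runs are prime suspects for composition artifacts.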
Treating the output as ground truth. GRN inference is hypothesis generation, not fact. Prioritise the top edges for experimental validation, and treat the network as a scaffold for exploration rather than a definitive wiring diagram.
Resources
- pySCENIC GitHub — well-maintained, Nextflow-compatible
- BEELINE benchmark — code for rigorous evaluation
- ANANSE — GRN inference with peak accessibility data
- Dorothea — curated TF-target gene sets for human and mouse
GRN inference sits at the intersection of statistics, machine learning, and molecular biology. The algorithms are a means to an end — the goal is biological insight, and that requires bringing domain knowledge to every step of the analysis.