Analysis Details
This page provides additional details on my analytical philosophy,
methodological choices, and practical considerations in single-cell
multi-omic data analysis.
The focus is not on exhaustive pipelines, but on why specific methods
are chosen and how common analytical pitfalls are avoided, especially
in the context of peer review.
General Principles
- Analytical choices are driven by biological questions, not by default pipelines
- Statistical assumptions are made explicit whenever possible
- Conservative decisions are preferred over aggressive choices that risk overfitting
- Visual clarity never overrides biological plausibility
Quality Control (QC)
Quality control is performed per sample rather than with fixed
global thresholds.
Metrics considered include:
- Number of detected genes
- Total UMI counts
- Mitochondrial gene fraction
- Hemoglobin gene expression
Thresholds are set by inspecting the distribution of each metric within each sample, balancing retention of biological signal against removal of low-quality cells.
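One common way to derive sample-specific thresholds is a median absolute deviation (MAD) rule applied within each sample. The sketch below is an illustration only, not necessarily the procedure used here; the function names and the cutoff of 3 MADs are assumptions, and in practice the resulting bounds would still be checked against the plotted distributions.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) bounds at median +/- n_mads * MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - n_mads * mad, med + n_mads * mad


def flag_outliers_per_sample(metric_by_sample, n_mads=3.0):
    """Flag cells that fall outside their own sample's MAD-based bounds.

    metric_by_sample maps a sample ID to a list of per-cell values
    (for example, log-transformed UMI counts). n_mads=3.0 is an
    illustrative default (assumption), not a universal cutoff.
    """
    flags = {}
    for sample, values in metric_by_sample.items():
        lower, upper = mad_bounds(values, n_mads)
        # True marks a cell outside the bounds computed for its own sample.
        flags[sample] = [not (lower <= v <= upper) for v in values]
    return flags
```

Because the bounds are computed per sample, the same raw value can be an outlier in one sample and unremarkable in another, which is the behaviour a fixed global threshold cannot reproduce.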
Batch Effect Correction
Batch correction strategies are selected based on:
- Experimental design
- Strength of batch effects
- Risk of biological overcorrection
Methods commonly evaluated include Harmony, CCA/rPCA, BBKNN, and scVI.
The primary goal is to reduce technical variation while preserving
true biological heterogeneity.
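The strength of a batch effect can be gauged, before and after correction, with a simple neighbourhood mixing score. The sketch below is one possible diagnostic and not a description of the evaluation actually used here; the function name, the choice of k, and the use of Euclidean distance are assumptions.

```python
import math


def mean_cross_batch_fraction(embedding, batches, k=2):
    """Mean fraction of each cell's k nearest neighbours from another batch.

    embedding: list of coordinate tuples (e.g. a PCA or corrected embedding).
    batches: list of batch labels, one per cell.
    Values near 0 suggest batches sit apart; higher values suggest mixing.
    """
    n = len(embedding)
    fractions = []
    for i in range(n):
        # Euclidean distance from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(embedding[i], embedding[j]), j)
            for j in range(n)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        other = sum(batches[j] != batches[i] for j in neighbours)
        fractions.append(other / len(neighbours))
    return sum(fractions) / n
```

A higher score after correction is not sufficient on its own: strong mixing can also result from overcorrection, so a score like this would be read alongside known biological labels.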
Cell Type Annotation
Cell annotation follows a hierarchical and iterative strategy:
- Initial low-resolution clustering
- Identification of major cell lineages
- Progressive refinement into functional subpopulations
This approach avoids forced overannotation and aligns with reviewer expectations regarding interpretability and reproducibility.
Differential Expression Analysis
Whenever applicable, differential expression is performed using a
pseudobulk framework combined with DESeq2.
Counts are aggregated at the sample level within each condition, which
avoids the pseudoreplication that arises when individual cells are
treated as independent biological replicates.
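The aggregation step can be sketched as follows. DESeq2 is an R package and is not called here; this Python sketch only shows how raw per-cell counts collapse into one count vector per sample, which would then form the input count matrix. The function and argument names are assumptions made for illustration.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids):
    """Sum raw per-cell counts into one vector per sample.

    counts: list of per-cell count lists (cells x genes), raw integers.
    sample_ids: list of sample labels, one per cell.
    Returns a dict mapping each sample label to its summed gene counts.
    """
    n_genes = len(counts[0])
    totals = defaultdict(lambda: [0] * n_genes)
    for cell_counts, sample in zip(counts, sample_ids):
        for gene_index, count in enumerate(cell_counts):
            totals[sample][gene_index] += count
    return dict(totals)
```

With four cells drawn from two samples, the result is two pseudobulk profiles, so the number of replicates seen by the statistical test is two rather than four. In practice the aggregation would typically be repeated separately for each cell type.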
Immune Repertoire (VDJ) Analysis
Immune repertoire data are re-quantified using MiXCR.
Two parallel representations are generated:
- Allele-aware data for clonotype similarity and public clonotype analysis
- Allele-collapsed data for clonal expansion and diversity analysis
This design allows flexibility across different analytical objectives.
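The difference between the two representations can be illustrated with a small sketch. It assumes a clonotype is described by V gene, J gene, and CDR3 sequence, with alleles written in the IMGT style (for example `TRBV7-2*01`). It does not reproduce MiXCR's output format or options, and all function names are hypothetical.

```python
import math
from collections import Counter


def strip_allele(gene):
    """Drop an IMGT-style allele suffix: 'TRBV7-2*01' -> 'TRBV7-2'."""
    return gene.split("*", 1)[0]


def collapse_alleles(clonotype_counts):
    """Merge clonotypes that differ only by V/J allele.

    clonotype_counts maps (v_gene, j_gene, cdr3) to a cell count.
    """
    collapsed = Counter()
    for (v, j, cdr3), n in clonotype_counts.items():
        collapsed[(strip_allele(v), strip_allele(j), cdr3)] += n
    return dict(collapsed)


def shannon_diversity(clonotype_counts):
    """Shannon entropy (natural log) of clonotype frequencies."""
    total = sum(clonotype_counts.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in clonotype_counts.values()
    )
```

When two entries share the same rearrangement but carry different allele calls, collapsing merges them into one clonotype and lowers the diversity estimate. The allele-aware table keeps them distinct, which is the level of detail that similarity and public clonotype comparisons rely on.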
Scope and Limitations
Not all datasets support all types of analyses.
Method selection depends on:
- Sample size and replication
- Biological context
- Data quality and experimental design
Analyses are adapted accordingly to ensure statistical validity and interpretability.
Contact
If you would like to discuss whether a specific dataset or study design is suitable for a particular analysis strategy, feel free to email me.