scMAVERICS - Empowering researchers to run multi-omic single-cell analysis faster and easier.

Apr 11

Quick summary:

scMAVERICS - Empowering researchers to run multi-omic single-cell analysis faster and easier. It combines standard tools into a containerized pipeline, and that stability lowers the barriers to data harmonization while increasing transparency and reproducibility. It is optimized for brain datasets and deployed at NIH’s CARD. Check out the beta version here and let us know what you think. Right now it handles transcriptomic and ATAC datasets, but spatial capabilities are on the way thanks to our very own Christina Como.

Amazing work from the team, including Adam Catching, Cory Weller, Amber Trujillo, Andy Henrie, Elizabeth Hutchins, and Eric Alsop, with help from Eugene Fong and Rachel Jiang. They completed pipeline curation and testing in less than two months, synthesizing workflows from various internal and external sources to best approximate community needs. Great hustle.

Also not actually a planned Top Gun reference, but RIP Val Kilmer.

Deeper dive:

scMAVERICS (single-cell Multiome Analysis using Variational-inference and Enhancer-driven Regulatory-networks to Inform Cell-atlas Structure) is a comprehensive pipeline for analyzing single-cell multiome data, developed by DataTecnica and collaborators for the NIH Center for Alzheimer's and Related Dementias (CARD). It integrates RNA and ATAC sequencing data to construct detailed cell atlases, with a particular focus on human brain samples. Support for spatial transcriptomics is planned for the next release.

🧠 Purpose and Motivation

Single-cell multiome technologies, like the 10x Genomics Multiome kit, provide simultaneous measurements of gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) from individual nuclei. However, processing and integrating these complex datasets can be challenging. scMAVERICS addresses this by offering a streamlined, reproducible workflow that minimizes processing time and technical barriers, enabling researchers to focus on data analysis and interpretation.

🔧 Key Features

Automated Workflow: Uses Snakemake to orchestrate the analysis steps, ensuring reproducibility and scalability.
Ambient RNA Correction: Incorporates CellBender to correct for ambient RNA contamination in single-cell RNA-seq data.
Data Processing and Integration:

RNA-seq: Processes gene expression data using Scanpy and scVI for dimensionality reduction and batch correction.
ATAC-seq: Processes chromatin accessibility data using snapATAC2 and pycisTopic, followed by peak calling with MACS.
Integration: Merges RNA and ATAC data into a unified multiome atlas using Muon, facilitating joint analysis.

Cell Type Annotation: Assigns cell types based on curated marker genes, enhancing biological interpretability.
Differential Analysis: Identifies differentially expressed genes and accessible chromatin regions across conditions or cell types.
Modular Design: Allows customization of parameters and integration of additional tools as needed.
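The marker-based annotation feature above can be illustrated with a small sketch. The post does not describe how scMAVERICS scores markers, so this assumes one common, simple approach: for each cluster, average the expression of each cell type's marker genes and pick the highest-scoring type. The marker lists, cluster IDs, and expression values below are hypothetical toy data, not the repository's curated lists.

```python
from statistics import mean

# Hypothetical marker lists (cell type -> marker genes); illustration only,
# not the curated marker file used by scMAVERICS.
MARKERS = {
    "Excitatory neuron": ["SLC17A7", "SATB2"],
    "Inhibitory neuron": ["GAD1", "GAD2"],
    "Astrocyte": ["GFAP", "AQP4"],
    "Oligodendrocyte": ["MBP", "PLP1"],
    "Microglia": ["CSF1R", "P2RY12"],
}


def score_cluster(expr: dict[str, float], markers: list[str]) -> float:
    """Mean expression of a marker set; genes absent from `expr` count as 0."""
    return mean(expr.get(gene, 0.0) for gene in markers)


def annotate_clusters(
    cluster_expr: dict[str, dict[str, float]],
    markers: dict[str, list[str]] = MARKERS,
) -> dict[str, str]:
    """Label each cluster with the cell type whose markers score highest."""
    labels = {}
    for cluster, expr in cluster_expr.items():
        scores = {ctype: score_cluster(expr, genes) for ctype, genes in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Toy mean normalized expression per (hypothetical) Leiden cluster.
cluster_expr = {
    "0": {"GAD1": 3.1, "GAD2": 2.8, "MBP": 0.2},
    "1": {"MBP": 4.0, "PLP1": 3.5, "GFAP": 0.3},
    "2": {"GFAP": 2.9, "AQP4": 3.3},
}
print(annotate_clusters(cluster_expr))
# {'0': 'Inhibitory neuron', '1': 'Oligodendrocyte', '2': 'Astrocyte'}
```

A raw mean is easy to audit but sensitive to highly expressed housekeeping-like markers; Scanpy's `sc.tl.score_genes`, which subtracts a background gene set, is a more robust variant of the same idea.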

🧬 Workflow Overview

Input Preparation: Requires metadata, marker gene lists, and CellRanger-ARC outputs.
Quality Control: Filters cells based on mitochondrial content, doublet scores, and gene counts.
Data Processing:

*RNA: Processes and filters gene expression data.
*ATAC: Processes and filters chromatin accessibility data.

Integration: Combines RNA and ATAC data into a multiome atlas.
Analysis:

*Dimensionality Reduction: Applies scVI for embedding.
*Clustering: Performs Leiden clustering and UMAP visualization.
*Cell Type Annotation: Assigns cell types using marker genes.
*Differential Analysis: Identifies differentially expressed genes and accessible regions.
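The quality-control step in the workflow above filters cells on mitochondrial content, doublet scores, and gene counts. Below is a minimal sketch of that filtering logic in plain Python. The thresholds (200 to 6000 detected genes, at most 20% mitochondrial counts, doublet score at most 0.25) and the per-cell record layout are hypothetical placeholders, not scMAVERICS defaults; in the real pipeline these metrics would be computed by its single-cell tooling rather than entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellQC:
    """Per-cell QC metrics (hypothetical record layout for illustration)."""
    barcode: str
    n_genes: int          # number of genes with nonzero counts in this cell
    pct_mito: float       # percent of counts from mitochondrial (MT-) genes
    doublet_score: float  # output of a doublet detector; higher = more doublet-like


# Hypothetical thresholds; in practice these are tuned per dataset and tissue.
MIN_GENES = 200
MAX_GENES = 6000
MAX_PCT_MITO = 20.0
MAX_DOUBLET_SCORE = 0.25


def passes_qc(cell: CellQC) -> bool:
    """Return True only if the cell passes every filter (bounds are inclusive)."""
    return (
        MIN_GENES <= cell.n_genes <= MAX_GENES
        and cell.pct_mito <= MAX_PCT_MITO
        and cell.doublet_score <= MAX_DOUBLET_SCORE
    )


def filter_cells(cells: list[CellQC]) -> list[CellQC]:
    """Keep cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]


cells = [
    CellQC("AAACGAA-1", n_genes=2500, pct_mito=3.2, doublet_score=0.05),   # passes
    CellQC("AAACGTT-1", n_genes=150, pct_mito=1.0, doublet_score=0.02),    # too few genes
    CellQC("AAAGCCA-1", n_genes=3100, pct_mito=35.0, doublet_score=0.04),  # high mito
    CellQC("AAATGCC-1", n_genes=5800, pct_mito=4.5, doublet_score=0.60),   # likely doublet
]
kept = filter_cells(cells)
print([c.barcode for c in kept])  # ['AAACGAA-1']
```

Keeping each criterion as a separate named threshold makes it easy to log how many cells each filter removes, which supports the transparency goal described above.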

🧪 Applications

scMAVERICS is particularly suited for studies involving:

Neurodegenerative diseases, such as Alzheimer's disease.
Comparative analyses across different conditions or treatments.
Construction of comprehensive cell atlases integrating transcriptomic and epigenomic data.

🚀 Getting Started

To use scMAVERICS:

1. Clone the repository:

```bash
git clone https://github.com/NIH-CARD/scMAVERICS.git
```

2. Prepare the required input files as specified in the repository.

3. Customize the `snakefile` parameters to match your dataset.

4. Run the pipeline:

```bash
bash snakemake.sh
```

Note: The pipeline is designed for high-performance computing environments using Slurm.
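Snakemake decides what to run by matching each rule's inputs to other rules' outputs, building a dependency graph, and executing jobs in dependency order; with a cluster executor such as Slurm, jobs that do not depend on each other can run concurrently. The sketch below is a toy model of that idea using Python's standard-library `graphlib`, not Snakemake's API. The step names loosely follow the workflow overview and are hypothetical, not the repository's actual rule names.

```python
from graphlib import TopologicalSorter

# Hypothetical step -> prerequisites map, loosely following the workflow overview.
# These are NOT scMAVERICS rule names; they only illustrate dependency ordering.
STEPS = {
    "cellbender": {"cellranger_arc_outputs"},
    "rna_qc": {"cellbender"},
    "atac_qc": {"cellranger_arc_outputs"},
    "rna_process": {"rna_qc"},
    "atac_process": {"atac_qc"},
    "integrate_multiome": {"rna_process", "atac_process"},
    "annotate_cell_types": {"integrate_multiome"},
    "differential_analysis": {"annotate_cell_types"},
}


def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches whose members have no dependencies on each other,
    so each batch could be dispatched to a scheduler at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking its dependents
    return batches


for i, batch in enumerate(execution_batches(STEPS), start=1):
    print(i, batch)
```

In this toy graph the RNA and ATAC branches advance side by side until integration, which mirrors why the RNA and ATAC processing steps can run in parallel on a cluster before the multiome atlas is assembled.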

---

For more detailed information and updates, visit the [scMAVERICS GitHub repository](https://github.com/NIH-CARD/scMAVERICS).