Generating synthetic scDNA-seq reads
This tutorial illustrates how to generate raw synthetic single-cell DNA-sequencing reads using SISTEM. Our experimental setup will mostly follow the Generating single-cell CNA profiles and cell tree tutorial, but we will only create one replicate and generating the reads requires an extra step. In addition to the scDNA-seq reads, it will also generate ground truth haplotype-specific CNA profiles and haplotype-specific read counts. As this is a single-cell experiment, we will be using a relatively low mean coverage of \(0.02\), and a read length of \(150\). To speed up the read simulation, we wil leverage multiple processors. The main simulation step may take around 25 minutes, while generating the sequencing reads will take much longer.
import os
from sistem import GrowthSimulator, Parameters
from sistem.selection import RandomRegionLibrary
from sistem.anatomy import SimpleAnatomy
from sistem.data import gen_reads
# First initialize parameters
params = Parameters(
out_dir='sim4_out',
nsites=1,
capacities=1e7,
min_detectable=5e6,
arm_rate=5e-4,
chromosomal_rate=1e-4,
SNV_pass_rate=0,
region_len=1e6,
CN_coeff=0.1,
OG_r=0.1,
TSG_r=0.1,
ncells_prim = 1000,
coverage = 0.02,
read_len = 150,
num_processors = 4
)
# Create a Library instance and initialize with random coefficients
library = RandomRegionLibrary(params=params)
library.initialize(params=params)
# Create the Anatomy instance and then GrowthSimulator. No need to initialize distances using SimpleAnatomy.
anatomy = SimpleAnatomy(library, params=params)
gs = GrowthSimulator(anatomy)
# Simulate tumor growth. This may take ~20 minutes. You can monitor the sim.log file in real time.
gs.simulate_agents(params=params)
gs.save_checkpoint(os.path.join(params.out_dir, 'gs.pkl'))
# Now sample cells and simulate the cell-lineage tree
gs.sample_cells(params=params)
tree = gs.simulate_singlecell_lineage(params=params)
# Call the gen_reads function with the first parameter being the Tree object outputted by gs.simulate_singlecell_lineage.
gen_reads(tree, params=params)