Selection models

class sistem.selection.RandomArmLibrary(**kwargs)

The chromosome-arm selection model with randomly generated selection coefficients.

Under this model, each chromosome arm is assigned a random selection coefficient in the range (-CN_coeff, CN_coeff). Arms with a positive coefficient correspond to arms carrying more oncogenes (OGs) than tumor suppressor genes (TSGs), while arms with a negative coefficient carry more TSGs than OGs. The fitness \(s_a(c)\) of cell \(c\) in site \(a\) is computed as

\[s_a(c) = \prod_{k=1}^K \prod_{arm \in \{p,q\}} (1 + \delta_{k,arm})^{x_{k,arm} / p_c},\]

where \(K\) is the number of chromosomes, \(\delta_{k,arm}\) is the selection coefficient of arm \(arm\) on chromosome \(k\), \(x_{k,arm}\) is the average copy number of that arm in cell \(c\), and \(p_c\) is the cell ploidy.
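The product above can be evaluated directly. The following is a minimal sketch in plain Python, not the library's implementation: the function name `arm_fitness`, the dictionary layout, and the example coefficients are all hypothetical, and ploidy is passed in explicitly because this page does not describe how \(p_c\) is derived.

```python
def arm_fitness(delta, copy_number, ploidy):
    """Product over chromosomes and arms of (1 + delta)^(x / ploidy)."""
    fitness = 1.0
    for chrom, arm_deltas in delta.items():
        for arm_delta, x in zip(arm_deltas, copy_number[chrom]):
            fitness *= (1.0 + arm_delta) ** (x / ploidy)
    return fitness


# Hypothetical two-chromosome genome; each tuple is (p arm, q arm).
delta = {"chr1": (0.10, -0.05), "chr2": (0.0, 0.20)}
diploid = {"chr1": (2.0, 2.0), "chr2": (2.0, 2.0)}
gained = {"chr1": (2.0, 2.0), "chr2": (2.0, 3.0)}  # one extra copy of chr2 q

# Ploidy is held at 2.0 in both calls for simplicity (an assumption).
base = arm_fitness(delta, diploid, ploidy=2.0)
after_gain = arm_fitness(delta, gained, ploidy=2.0)
print(base, after_gain)
```

In this toy example, gaining a copy of an arm with a positive coefficient raises fitness, and an arm with a coefficient of zero contributes a factor of one regardless of its copy number.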

initialize(params: Parameters | None = None, CN_coeff: float | None = None)

Method used to initialize selection coefficients. See Parameters for an explanation of the parameters.

Parameters:
  • params (Parameters, optional)

  • CN_coeff (float, optional)

class sistem.selection.FittedArmLibrary(**kwargs)

Uses the same model as RandomArmLibrary, but the selection coefficients are given by the user.

initialize(filepath: str | None = None, delta: Dict | None = None)

Method used to initialize selection coefficients.

Parameters:
  • filepath (str, optional) – Path to a tsv file containing the selection coefficients. The first entry is the chromosome name, followed by the selection coefficient of the short and long arm, respectively.

  • delta (dict, optional) – Instead of passing a file, users can pass a dictionary where keys are chromosome names and values are a 2-length array of coefficients.
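To illustrate how the two inputs relate, the sketch below parses text in the file layout described above into a dictionary of the shape expected by `delta`. It is a hedged illustration only: the helper `read_arm_coefficients` is hypothetical, and it assumes one chromosome per row, tab-separated fields, and no header row, none of which is confirmed by this page.

```python
import csv
import io


def read_arm_coefficients(handle):
    """Parse rows of: chromosome name, short-arm coefficient, long-arm coefficient."""
    delta = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, p_coeff, q_coeff = row
        delta[chrom] = [float(p_coeff), float(q_coeff)]
    return delta


# Hypothetical file contents, held in memory instead of on disk.
example_tsv = "chr1\t0.10\t-0.05\nchr2\t0.00\t0.20\n"
delta = read_arm_coefficients(io.StringIO(example_tsv))
print(delta)
```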

class sistem.selection.RandomRegionLibrary(**kwargs)

The region selection model with randomly generated selection coefficients.

This model is similar to the chromosome-arm model but operates at the region/gene level. In particular, each reference region \(i\) on chromosome \(k\) is assigned a selection coefficient in the range (-CN_coeff, CN_coeff), where positive values correspond to oncogenes (OGs), negative values correspond to tumor suppressor genes (TSGs), and zero corresponds to neutral regions (NEU). The fitness \(s_a(c)\) of cell \(c\) in site \(a\) is computed as

\[s_a(c) = \prod_{k=1}^K \prod_{i=1}^{m_k} (1 + \delta_{k,i})^{x_{k,i} / p_c},\]

where \(K\) is the number of chromosomes, \(m_k\) is the number of reference regions on chromosome \(k\), the \(\delta\)’s are selection coefficients, \(x_{k,i}\) is the copy number of region \(i\) on chromosome \(k\) in cell \(c\), and \(p_c\) is cell ploidy.
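As a rough sketch of the region-level product, the snippet below accumulates the fitness in log space, which is a numerical convenience chosen here and not necessarily what the library does. The function name `region_fitness` and the example values are hypothetical, and the sketch assumes every coefficient is greater than -1 so that the logarithm is defined.

```python
import math


def region_fitness(delta, copy_number, ploidy):
    """exp(sum of (x / ploidy) * log(1 + delta)) over all regions."""
    log_fitness = 0.0
    for chrom, coeffs in delta.items():
        for d, x in zip(coeffs, copy_number[chrom]):
            log_fitness += (x / ploidy) * math.log1p(d)
    return math.exp(log_fitness)


# Hypothetical chromosome with three regions: an OG, a neutral region, a TSG.
delta = {"chr1": [0.2, 0.0, -0.1]}

base = region_fitness(delta, {"chr1": [2, 2, 2]}, ploidy=2.0)
tsg_loss = region_fitness(delta, {"chr1": [2, 2, 1]}, ploidy=2.0)
neutral_gain = region_fitness(delta, {"chr1": [2, 5, 2]}, ploidy=2.0)
print(base, tsg_loss, neutral_gain)
```

Here losing a copy of the TSG region increases fitness, while changing the copy number of the neutral region leaves it unchanged (ploidy is held fixed across the three calls for simplicity).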

initialize(params: Parameters | None = None, CN_coeff: float | None = None, OG_r: float | None = None, TSG_r: float | None = None, EG_r: float | None = None)

Method used to initialize selection coefficients. See Parameters for an explanation of the parameters.

Parameters:
  • params (Parameters, optional)

  • CN_coeff (float, optional)

  • OG_r (float, optional)

  • TSG_r (float, optional)

  • EG_r (float, optional)

class sistem.selection.FittedRegionLibrary(**kwargs)

Uses the same model as RandomRegionLibrary, but the selection coefficients are given by the user.

initialize(filepath: str | None = None, delta: Dict | None = None)

Method used to initialize selection coefficients.

Parameters:
  • filepath (str, optional) – Path to a tsv file containing the selection coefficients. The first entry is the chromosome name, followed by the selection coefficients in order of reference regions.

  • delta (dict, optional) – Instead of passing a file, users can pass a dictionary where keys are chromosome names and values are an \(m_k\)-length array of coefficients.

class sistem.selection.RandomHybridLibrary(**kwargs)

The hybrid selection model with randomly generated selection coefficients.

The hybrid model extends the region model to also consider SNVs. The idea is that SNVs can disrupt the function of genes, thereby altering the selective effect of that gene on the cell’s fitness. Here we distinguish between driver SNVs, which have this effect, and passenger SNVs, which have no effect (synonymous). Each region \((k,i)\) is randomly assigned a second coefficient \(\lambda_{k,i}\) in the range (-SNV_coeff, SNV_coeff), where \(\lambda_{k,i} > 0\) if \((k,i)\) is a TSG, whereas \(\lambda_{k,i}\) may be either positive or negative if \((k,i)\) is an OG. Additionally, a fraction of NEU regions are set to be essential genes (EGs): if \((k,i)\) is an EG, then \(\delta_{k,i} = 0\) and \(\lambda_{k,i} > 0\), while \(\lambda_{k,i} = 0\) for the remaining neutral regions. Fitness is computed as

\[s_a(c) = \prod_{k=1}^K \prod_{i=1}^{m_k} \Big(1 + \delta_{k,i}\cdot(e^{-\hat{y}_{k,i} \lambda_{k,i}})\Big)^{x_{k,i}/p_c} \cdot \Big(e^{-\hat{y}_{k,i} \lambda_{k,i}}\Big)^{h(k,i)},\]

where \(\hat{y}_{k,i}\) is the average number of driver SNVs on any copy of region \(i\) of chromosome \(k\), and \(h\) is an indicator function with \(h(k,i) = 1\) if region \((k,i)\) is an EG and \(h(k,i) = 0\) otherwise.
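The hybrid formula can be read as the region model with each \(\delta_{k,i}\) scaled by \(e^{-\hat{y}_{k,i} \lambda_{k,i}}\), plus an extra penalty factor for essential genes. The sketch below evaluates it term by term for a flat list of regions. It is illustrative only: the function `hybrid_fitness`, the per-region dictionary keys, and the numbers are hypothetical and do not reflect the library's internal data structures.

```python
import math


def hybrid_fitness(regions, ploidy):
    """Evaluate the hybrid product over a flat list of region records."""
    fitness = 1.0
    for r in regions:
        # Factor by which driver SNVs scale this region's effect.
        snv_factor = math.exp(-r["y_hat"] * r["lam"])
        fitness *= (1.0 + r["delta"] * snv_factor) ** (r["x"] / ploidy)
        if r["essential"]:  # h(k, i) = 1 only for essential genes
            fitness *= snv_factor
    return fitness


def region(delta, lam, y_hat, x=2, essential=False):
    """Build one hypothetical region record."""
    return {"delta": delta, "lam": lam, "y_hat": y_hat, "x": x, "essential": essential}


# A TSG without and with one driver SNV per copy on average.
tsg_clean = hybrid_fitness([region(-0.1, 0.5, y_hat=0.0)], ploidy=2.0)
tsg_hit = hybrid_fitness([region(-0.1, 0.5, y_hat=1.0)], ploidy=2.0)

# An essential gene (delta = 0) without and with a driver SNV.
eg_clean = hybrid_fitness([region(0.0, 0.5, y_hat=0.0, essential=True)], ploidy=2.0)
eg_hit = hybrid_fitness([region(0.0, 0.5, y_hat=1.0, essential=True)], ploidy=2.0)
print(tsg_clean, tsg_hit, eg_clean, eg_hit)
```

With these toy values, a driver SNV in a TSG weakens its negative effect and so raises fitness, whereas a driver SNV in an essential gene lowers fitness.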

initialize(params: Parameters | None = None, CN_coeff: float | None = None, SNV_coeff: float | None = None, OG_r: float | None = None, TSG_r: float | None = None, EG_r: float | None = None)

Method used to initialize selection coefficients. See Parameters for an explanation of the parameters.

Parameters:
  • params (Parameters, optional)

  • CN_coeff (float, optional)

  • SNV_coeff (float, optional)

  • OG_r (float, optional)

  • TSG_r (float, optional)

  • EG_r (float, optional)

class sistem.selection.FittedHybridLibrary(**kwargs)

Uses the same model as RandomHybridLibrary, but the selection coefficients are given by the user.

initialize(filepath: str | None = None, delta: Dict | None = None, lam: Dict | None = None)

Method used to initialize selection coefficients.

Parameters:
  • filepath (str, optional) – Path to a tsv file containing the selection coefficients. The first entry is the chromosome name, followed by the selection coefficient pairs in order of reference regions. Each pair is of the form x,y where x is the CN coefficient and y is the SNV coefficient (\(\delta\) and \(\lambda\)).

  • delta (dict, optional) – Instead of passing a file, users can pass a dictionary where keys are chromosome names and values are an \(m_k\)-length array of CN coefficients. Must also be passed with lam in the same form.

  • lam (dict, optional) – Instead of passing a file, users can pass a dictionary where keys are chromosome names and values are an \(m_k\)-length array of SNV coefficients. Must also be passed with delta in the same form.
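The relationship between the file layout and the `delta`/`lam` dictionaries can be sketched as follows. This is a hedged illustration: the helper `read_hybrid_coefficients` is hypothetical, and it assumes one chromosome per line, tab-separated fields, and no header row, which this page does not state explicitly.

```python
def read_hybrid_coefficients(lines):
    """Split each 'x,y' pair into a CN coefficient (delta) and an SNV coefficient (lam)."""
    delta, lam = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        chrom, pairs = fields[0], fields[1:]
        split_pairs = [pair.split(",") for pair in pairs]
        delta[chrom] = [float(x) for x, _ in split_pairs]
        lam[chrom] = [float(y) for _, y in split_pairs]
    return delta, lam


# Hypothetical line describing three regions: an OG, an EG, and a TSG.
example_lines = ["chr1\t0.2,-0.3\t0.0,0.5\t-0.1,0.4\n"]
delta, lam = read_hybrid_coefficients(example_lines)
print(delta, lam)
```

Both dictionaries share the same keys and per-chromosome lengths, matching the requirement that `delta` and `lam` be passed together in the same form.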

class sistem.selection.BaseLibrary(params: Parameters | None = None, chrom_lens: dict | None = None, arm_ratios: float | dict | None = None, region_len: int | None = None, max_distinct_driv_ratio: float | None = None, max_region_CN: int | None = None, max_region_SNV: int | None = None, max_ploidy: int | float | None = None, min_ploidy: int | float | None = None)

Bases: ABC

Abstract base class for selection libraries. If creating a custom selection library, users must create a child class which inherits from this class. See Parameters for an explanation of the constructor parameters.

Parameters:
  • params (Parameters, optional)

  • chrom_lens (dict, optional)

  • arm_ratios (float, dict, optional)

  • region_len (int, optional)

  • max_distinct_driv_ratio (float, optional)

  • max_region_CN (int, optional)

  • max_region_SNV (int, optional)

  • max_ploidy (int, float, optional)

  • min_ploidy (int, float, optional)

is_driver_region_model

True if individual regions can be drivers, False otherwise.

Type:

bool

is_driver_SNV_model

True if individual SNVs can impact fitness, False otherwise.

Type:

bool

abstractmethod check_viability(clone)

Checks whether a clone passes the viability checkpoints, based on its mutated driver stats.

Parameters:

clone (Clone) – The clone object in question.

Returns:

True if passes, False if not.

Return type:

(bool)

abstractmethod compute_fitness(clone)

Computes the fitness of a clone.

abstractmethod get_driver_start_regions(cell, chromosome, size)

Identifies the possible start region indices of a driver event.

Parameters:
  • cell (Cell) – The current cell.

  • chromosome (Chromosome) – The current chromosome.

  • size (int) – The size of the event in number of regions (1 for SNVs, >=1 for CNAs).

abstractmethod get_passenger_start_regions(cell, chromosome, size)

Identifies the possible start region indices of a passenger event.

Parameters:
  • cell (Cell) – The current cell.

  • chromosome (Chromosome) – The current chromosome.

  • size (int) – The size of the event in number of regions (1 for SNVs, >=1 for CNAs).

abstractmethod init_attributes(clone)

Creates an Attributes object for a clone and updates it to reflect its genome state.

Parameters:

clone (Clone) – The clone object in question.

abstractmethod init_base_fit()

Sets the base_fit attribute to the fitness of an unmutated Cell.

abstractmethod init_max_fit()

Sets the max_fit attribute to an estimate of the maximum possible fitness.

abstractmethod initialize(**kwargs)

Function used to initialize the selection coefficients.

abstractmethod update_stats(clone, chromosome, start, end, mag=1)

Function which, when a mutation occurs, updates the intermediate stats stored in the clone's Attributes object to make fitness computation more efficient.

Parameters:
  • clone (Clone) – The clone undergoing a mutation.

  • chromosome (Chromosome) – The Chromosome object being mutated.

  • start (int) – The index of the starting region.

  • end (int) – The index of the ending region.

  • mag (int) – If the event is a CNA (start != end), this is the number of copies being gained. If the event is an SNV (start == end), the call is interpreted as an SNV being added in region start.