prism.extract module

prism.extract.character_indices(string, characters)

Returns a sorted list of the indices at which any of the given characters occur in the string.

Parameters:
  • string (string) – A string where the characters are expected to appear.
  • characters (list) – List of characters whose indices are returned.
Returns:

A list of indices of string where the characters appear.
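The behaviour described above can be sketched in a few lines. This is a minimal illustration written from the description only, not the module's actual source; the sample string in the usage line is made up for the example.

```python
def character_indices(string, characters):
    """Return sorted indices of `string` at which any of `characters` occurs."""
    wanted = set(characters)  # set membership keeps the scan linear
    # enumerate() walks left to right, so the result is already sorted
    return [i for i, ch in enumerate(string) if ch in wanted]


print(character_indices("..z..Z.h", ["z", "Z"]))  # [2, 5]
```

Because the string is scanned once from left to right, no explicit sort is needed in this sketch.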

prism.extract.discard_noninformative_reads(cpgs_reads_dict, depth_cutoff, num_cpg_cutoff)

Discards noninformative reads with low depth or few CpGs.

Parameters:
  • cpgs_reads_dict (dict) – Dictionary mapping a set of CpG coordinates to a group of reads.
  • depth_cutoff (int) – Minimum depth of a group of reads to be retained.
  • num_cpg_cutoff (int) – Minimum number of CpGs appearing in the reads to be retained.
Returns:

Retained (informative) CpG-Read group mapping dictionary.
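A rough sketch of this filter follows. It rests on assumptions that the description does not confirm: each key is taken to be a tuple of CpG coordinates, each value a list of reads, depth is taken to mean the number of reads in a group, and both cutoffs are treated as inclusive minimums. The read names are placeholders.

```python
def discard_noninformative_reads(cpgs_reads_dict, depth_cutoff, num_cpg_cutoff):
    """Keep only groups with enough reads and enough CpGs (assumed semantics)."""
    return {
        cpgs: reads
        for cpgs, reads in cpgs_reads_dict.items()
        # assumption: depth == number of reads; both cutoffs are inclusive
        if len(reads) >= depth_cutoff and len(cpgs) >= num_cpg_cutoff
    }


groups = {
    (100, 105, 120, 131): ["r1", "r2", "r3"],  # deep enough, 4 CpGs: kept
    (200, 210): ["r4", "r5", "r6"],            # only 2 CpGs: dropped
    (300, 310, 320, 330): ["r7"],              # only 1 read: dropped
}
kept = discard_noninformative_reads(groups, depth_cutoff=2, num_cpg_cutoff=4)
print(kept)  # {(100, 105, 120, 131): ['r1', 'r2', 'r3']}
```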

prism.extract.extend_region(region, read)

Returns the union of genomic regions covered by given region and read.

Parameters:
  • region (Region) – Genomic region to be extended.
  • read (AlignedSegment) – Read used for the extension.
Returns:

Extended genomic region, which is the union of region and read.
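The union can be illustrated as follows. The `Region` type here is a hypothetical stand-in (a named tuple with `contig`, `start` and `end` fields), since its real definition is not shown in this reference, and a `SimpleNamespace` stands in for a pysam `AlignedSegment`, of which only the `reference_start` and `reference_end` attributes are used.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical stand-in; the module's real Region type may differ.
Region = namedtuple("Region", ["contig", "start", "end"])


def extend_region(region, read):
    """Return the smallest region covering both `region` and `read`."""
    return Region(
        region.contig,
        min(region.start, read.reference_start),
        max(region.end, read.reference_end),
    )


# Stand-in for a pysam AlignedSegment, carrying only the attributes used here.
read = SimpleNamespace(reference_start=150, reference_end=260)
print(extend_region(Region("chr1", 100, 200), read))
# Region(contig='chr1', start=100, end=260)
```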

prism.extract.get_cpg_coordinates(read, paired=False)

Returns genomic coordinates of cytosines of CpGs in the read.

Parameters:
  • read (AlignedSegment) – Read to examine for the location of CpGs.
  • paired (bool) – True if the sequencing library is paired, otherwise False.
Returns:

Absolute genomic coordinates of CpGs that appear in the read.

prism.extract.has_overlap(region, read)

Returns True if given region and read overlap, otherwise False.

Parameters:
  • region (Region) – Genomic region to test overlap.
  • read (AlignedSegment) – Read to test overlap.
Returns:

True if given region and read overlap, otherwise False.
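One common way to express such a test is an interval comparison, sketched below. The `Region` named tuple is a hypothetical stand-in for the module's own type, coordinates are assumed to be half-open (0-based start, exclusive end, as in pysam), and requiring the same contig is also an assumption.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical stand-in; the module's real Region type may differ.
Region = namedtuple("Region", ["contig", "start", "end"])


def has_overlap(region, read):
    """True if `read` shares at least one base with `region` (half-open intervals)."""
    return (
        read.reference_name == region.contig
        and read.reference_start < region.end
        and region.start < read.reference_end
    )


region = Region("chr1", 100, 200)
# SimpleNamespace objects stand in for pysam AlignedSegment instances.
inside = SimpleNamespace(reference_name="chr1", reference_start=150, reference_end=250)
adjacent = SimpleNamespace(reference_name="chr1", reference_start=200, reference_end=300)
print(has_overlap(region, inside))    # True
print(has_overlap(region, adjacent))  # False
```

With half-open coordinates, a read that starts exactly where the region ends shares no base with it, which is why the second call returns False.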

prism.extract.run(input_fp, output_fp, depth_cutoff, num_cpg_cutoff, prepend_chr, paired, verbose)

prism.extract.save_met_file(handle, output_path, depth_cutoff, num_cpg_cutoff, paired, contigs)

Reads through the BAM file and identifies groups of reads mapped to the same genomic region. Groups of reads with insufficient depth or an insufficient number of CpGs are discarded. NOTE: The BAM file should be sorted to guarantee proper generation of the MET file.

Parameters:
  • handle (AlignmentFile) – pysam AlignmentFile object for BAM file.
  • output_path (string) – Output MET file path.
  • depth_cutoff (int) – Minimum depth for a group of reads to be retained.
  • num_cpg_cutoff (int) – Minimum number of CpGs appearing in the reads to be retained.
  • paired (bool) – True if the sequencing library is paired, otherwise False.
  • contigs (list) – A list of contigs to examine.
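The sorting requirement is easier to see with a sketch of how overlapping reads might be grouped in a single pass. This is an assumption about the general approach, not the module's implementation: reads are represented as plain `(contig, start, end)` tuples rather than pysam objects, and `group_overlapping` is a hypothetical helper name.

```python
def group_overlapping(reads):
    """Group coordinate-sorted (contig, start, end) reads into overlapping regions.

    Returns a list of (region, reads_in_region) pairs.
    """
    groups = []
    current, region = [], None
    for contig, start, end in reads:
        if region is not None and contig == region[0] and start < region[2]:
            # Read overlaps the running region: add it and extend the region.
            current.append((contig, start, end))
            region = (contig, region[1], max(region[2], end))
        else:
            # Gap or new contig: close the previous group and start a new one.
            if current:
                groups.append((region, current))
            current = [(contig, start, end)]
            region = (contig, start, end)
    if current:
        groups.append((region, current))
    return groups


reads = [("chr1", 0, 50), ("chr1", 40, 90), ("chr1", 200, 250), ("chr2", 10, 60)]
for region, members in group_overlapping(reads):
    print(region, len(members))
# ('chr1', 0, 90) 2
# ('chr1', 200, 250) 1
# ('chr2', 10, 60) 1
```

A single pass like this only compares each read with the region currently being built, so unsorted input would split reads that belong together into separate groups.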