IsoSLAM
IsoSLAM module.
append_data(assigned_conversions, coverage_counts, read_uid, assignment, results, schema)
Create a Polars dataframe combining the ''assigned_conversions'' and ''coverage_counts''.
Adds ''assignment'' to the resulting dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| assigned_conversions | set[list[Any]] | A set of assigned conversions. Each element of the set is a list of key features (CHECK WHAT THESE ARE). | required |
| coverage_counts | dict[str, int] | A dictionary of coverage counts indexed by CHECK. | required |
| read_uid | int | Integer representing the unique read ID. | required |
| assignment | str | Type of assignment, either ''Rep'' or ''Spl'' (for Splice). | required |
| results | DataFrame | Polars DataFrame to append data to. This will initially be empty but the schema matches the variables that are added. | required |
| schema | dict[str, type] | Schema dictionary for data frame. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Returns a Polars DataFrame of the data structure. |
Source code in isoslam/isoslam.py
conversions_per_read(read, conversion_from, conversion_to, convertible, converted_position, coverage, vcf_file)
Build sets of genome position for conversions, converted positions and coverage for a given read.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| read | dict[str, dict[str, Any]] | Aligned read. | required |
| conversion_from | str | The base pair the conversion is from, typically either ''T'' or ''C''. | required |
| conversion_to | str | The base pair the conversion is to, typically the opposite pairing of ''from'', i.e. ''A'' or ''C'' respectively. | required |
| convertible | set | Set, possibly empty, to which the genome position is added if the sequence at a given location matches ''conversion_from''. | required |
| converted_position | set | Set, possibly empty, to which the genome position is added if a conversion has occurred. | required |
| coverage | set | Set, possibly empty, to which the genome position is added for all aligned pairs of a read. | required |
| vcf_file | VariantFile | VCF file. | required |
Returns:
| Type | Description |
|---|---|
| tuple[set[str], set[str], set[str]] | Three sets of the ''convertible'', ''converted_position'' and ''coverage''. |
Source code in isoslam/isoslam.py
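The three sets described above can be illustrated with a minimal sketch. It is not the IsoSLAM implementation: the function name `conversions_sketch`, the `(read_pos, genome_pos, genome_base)` tuples, the `sequence` string and the `snp_positions` set (standing in for a VCF lookup) are all assumptions made for illustration, and genome positions are plain integers here.

```python
def conversions_sketch(aligned_pairs, sequence, conversion_from, conversion_to,
                       snp_positions=frozenset()):
    """Hypothetical helper: build convertible, converted and coverage sets."""
    convertible, converted_position, coverage = set(), set(), set()
    for read_pos, genome_pos, genome_base in aligned_pairs:
        # Assumption: pairs with a gap on either side are ignored.
        if read_pos is None or genome_pos is None:
            continue
        coverage.add(genome_pos)
        # Only positions whose reference base matches conversion_from can convert.
        if genome_base.upper() != conversion_from:
            continue
        convertible.add(genome_pos)
        # Assumption: known variant positions are not counted as conversions.
        if sequence[read_pos].upper() == conversion_to and genome_pos not in snp_positions:
            converted_position.add(genome_pos)
    return convertible, converted_position, coverage


pairs = [(0, 100, "T"), (1, 101, "A"), (2, 102, "T"), (3, None, None)]
sets_no_snp = conversions_sketch(pairs, "CATG", "T", "C")
sets_with_snp = conversions_sketch(pairs, "CATG", "T", "C", snp_positions={100})
```

In this toy input position 100 is convertible and converted, position 102 is convertible but unchanged, and passing `snp_positions={100}` removes the only conversion.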
count_conversions_across_pairs(forward_read, reverse_read, vcf_file, forward_conversion=None, reverse_conversion=None)
Count conversions across paired reads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| forward_read | dict[str, dict[str, Any]] | Aligned segment for forward read. | required |
| reverse_read | dict[str, dict[str, Any]] | Aligned segment for reverse read. | required |
| vcf_file | VariantFile | Variant File. | required |
| forward_conversion | dict | Forward conversion dictionary, typically ''{"from": "A", "to": "G"}''. | None |
| reverse_conversion | dict | Reverse conversion dictionary, typically ''{"from": "T", "to": "C"}''. | None |
Returns:
| Type | Description |
|---|---|
| tuple[int, int, int] | Tuple of the number of convertible base pairs, the number of conversions and the coverage of the paired alignments. |
Raises:
| Type | Description |
|---|---|
| ValueError | ValueError is raised if either ''forward_conversion'' or ''reverse_conversion'' is ''None''. |
Source code in isoslam/isoslam.py
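Both conversion arguments default to ''None'', yet ''None'' raises a ''ValueError'', so callers are expected to pass both dictionaries. A minimal sketch of that guard follows; `check_conversions_sketch` is a hypothetical name, and the real function goes on to count conversions, which is not shown here.

```python
def check_conversions_sketch(forward_conversion=None, reverse_conversion=None):
    """Hypothetical guard mirroring the documented ValueError behaviour."""
    if forward_conversion is None or reverse_conversion is None:
        raise ValueError("Both forward_conversion and reverse_conversion are required.")
    return forward_conversion, reverse_conversion


# The "typical" dictionaries quoted in the parameter table above.
forward = {"from": "A", "to": "G"}
reverse = {"from": "T", "to": "C"}
checked = check_conversions_sketch(forward, reverse)

try:
    check_conversions_sketch(forward_conversion=forward)
    raised = False
except ValueError:
    raised = True
```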
extract_features_from_pair(pair)
Extract features from a pair of reads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair | list[AlignedSegment] | A list of two aligned segments from | required |
Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, Any]] | Returns nested dictionaries of the |
Source code in isoslam/isoslam.py
extract_features_from_read(read)
Extract start, end and length from an aligned segment read.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| read | AlignedSegment | An aligned segment read from | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary of |
Source code in isoslam/isoslam.py
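As a rough illustration of extracting start, end and length, the sketch below reads three attributes that pysam's `AlignedSegment` exposes (`reference_start`, `reference_end`, `query_length`) from a stand-in object. The function name, the dictionary keys and the choice of `query_length` for "length" are assumptions; the actual keys returned by IsoSLAM are not given above.

```python
from types import SimpleNamespace


def features_sketch(read):
    """Hypothetical helper: collect start, end and length into a dictionary."""
    return {
        "start": read.reference_start,
        "end": read.reference_end,
        "length": read.query_length,  # assumption: "length" means query length
    }


# Stand-in for a pysam AlignedSegment so the example runs without a .bam file.
fake_read = SimpleNamespace(reference_start=100, reference_end=150, query_length=50)
features = features_sketch(fake_read)
```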
extract_segment_pairs(bam_file)
Extract pairs of AlignedSegments from a .bam file.
When there are two adjacent AlignedSegments with the same query_name only the first is paired, subsequent
segments are dropped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bam_file | str \| Path | Path to a | required |
Yields:
| Type | Description |
|---|---|
| Generator | Iterable of paired segments. |
Source code in isoslam/isoslam.py
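One possible reading of the pairing rule is sketched below: consecutive segments sharing a `query_name` form a run, the first two segments of a run are yielded as a pair, and any further segments in that run are dropped. This interpretation, the name `pair_segments_sketch`, and the choice to skip unpaired single segments are assumptions; the sketch also takes an in-memory iterable rather than a path to a .bam file.

```python
from itertools import groupby
from types import SimpleNamespace


def pair_segments_sketch(segments):
    """Hypothetical generator: yield the first two segments of each run of equal names."""
    for _, group in groupby(segments, key=lambda seg: seg.query_name):
        run = list(group)
        if len(run) >= 2:
            # Keep only the first pair; later segments in the run are dropped.
            yield run[:2]


# Stand-ins for AlignedSegment objects; only query_name is needed here.
names = ["a", "a", "a", "b", "c", "c"]
segments = [SimpleNamespace(query_name=name, index=i) for i, name in enumerate(names)]
pairs = list(pair_segments_sketch(segments))
paired_indices = [[seg.index for seg in pair] for pair in pairs]
```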
extract_strand_transcript(gtf_file)
Extract strand and transcript ID data from .gtf file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gtf_file | Path \| str | Path to a 'gtf' file. | required |
Returns:
| Type | Description |
|---|---|
| tuple[dict[str, tuple[str]], dict[str, tuple[str]]] | Two dictionaries are returned, one of the |
Source code in isoslam/isoslam.py
extract_transcripts(bed_file)
Extract features from .bed file and return as a dictionary indexed by transcript_id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bed_file | str \| Path | Path, as string or pathlib Path, to a | required |
Returns:
| Type | Description |
|---|---|
| dict[Any, list[tuple[Any, int, int, Any, Any]]] | Dictionary of |
Source code in isoslam/isoslam.py
extract_utron(features, gene_transcript, coordinates)
Extract and sum the utrons based on tag.
ACTION: This function needs better documentation. My guess is that it extracts the transcripts to genes, then gets some related information (what, I'm not sure) from the .bed file and adds these up.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | str | A tag from an assigned read. | required |
| gene_transcript | TextIO | Transcript to gene from a | required |
| coordinates | Any | Untranslated region coordinates from a | required |
Returns:
| Type | Description |
|---|---|
| list \| None | List of the length of assigned regions. |
Source code in isoslam/isoslam.py
filter_spliced_utrons(pair_features, blocks, read='read1')
Filter utrons where start is in the block ends or end is in the block start.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair_features | dict[str, dict] | Dictionary of extracted features and utron in both read directions. | required |
| blocks | dict[str, dict[str, set]] | Nested dictionary of start and ends for each read. Top level is read, with a dictionary of start and end. | required |
| read | str | Direction of read to filter on, default is ''read1'' but can also use ''read2''. | 'read1' |
Returns:
| Type | Description |
|---|---|
| dict[str, tuple[Any]] | Dictionary of the chromosome, start, end and strand of transcripts that are within introns. |
Source code in isoslam/isoslam.py
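The filter condition, "start is in the block ends or end is in the block start", can be shown on simplified inputs. In the sketch below the utrons are plain `(chromosome, start, end, strand)` tuples and the block starts and ends are passed as two sets for a single read. These shapes and the name `spliced_sketch` are assumptions and are flatter than the nested dictionaries the documented function takes.

```python
def spliced_sketch(utrons, block_starts, block_ends):
    """Hypothetical filter: keep utrons that touch a block boundary of the read."""
    return [
        utron
        for utron in utrons
        # utron[1] is the start, utron[2] is the end.
        if utron[1] in block_ends or utron[2] in block_starts
    ]


utrons = [
    ("chr1", 150, 200, "+"),  # start matches a block end, end matches a block start
    ("chr1", 300, 400, "+"),  # touches no block boundary
    ("chr1", 120, 500, "+"),  # end matches a block start
]
kept = spliced_sketch(utrons, block_starts={100, 200, 500}, block_ends={150, 250, 550})
```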
filter_within_introns(pair_features, blocks, read='read1')
Filter utrons that are within introns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair_features | dict[str, dict] | Dictionary of extracted features and utron in both read directions. | required |
| blocks | dict[str, dict[str, set]] | Nested dictionary of start and ends for each read. Top level is read, with a dictionary of start and end. | required |
| read | str | Direction of read to filter on, default is ''read1'' but can also use ''read2''. | 'read1' |
Returns:
| Type | Description |
|---|---|
| dict[str, tuple[Any]] | Dictionary of the chromosome, start, end and strand of transcripts that are within introns. |
Source code in isoslam/isoslam.py
remove_common_reads(retained, spliced)
Remove reads that are common to both retained and spliced sets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retained | set[list[Any]] | Set of retained reads. Each item is a tuple with ''transcript_id'' and a list of ''start'', ''end'', ''chromosome'' and ''strand''. | required |
| spliced | set[list[Any]] | Set of spliced reads. Each item is a tuple with ''transcript_id'' and a list of ''start'', ''end'', ''chromosome'' and ''strand''. | required |
Returns:
| Type | Description |
|---|---|
| tuple[set[list[Any]], set[list[Any]]] | A tuple of the ''retained'' (first) and ''spliced'' reads with common items removed. |
Source code in isoslam/isoslam.py
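Removing the items common to both sets amounts to two set differences. The sketch below is a minimal illustration, not the IsoSLAM source: the name `remove_common_sketch` is hypothetical, and the items are written as nested tuples because Python set members must be hashable.

```python
def remove_common_sketch(retained, spliced):
    """Hypothetical helper: drop items that appear in both sets."""
    common = retained & spliced
    return retained - common, spliced - common


retained = {("tx1", (10, 20, "chr1", "+")), ("tx2", (30, 40, "chr1", "-"))}
spliced = {("tx2", (30, 40, "chr1", "-")), ("tx3", (50, 60, "chr2", "+"))}
only_retained, only_spliced = remove_common_sketch(retained, spliced)
```

The inputs are left unmodified because the differences build new sets.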
unique_conversions(reads1, reads2)
Create a unique set of conversions that are to be retained.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reads1 | dict[str, list[tuple[Any]]] | A dictionary of reads mapped to transcripts (key) which overlap introns. Each read has the ''start'', ''end'', ''chromosome'' and ''strand'' recorded. | required |
| reads2 | dict[str, list[tuple[Any]]] | A dictionary of reads mapped to transcripts (key) which overlap introns. Each read has the ''start'', ''end'', ''chromosome'' and ''strand'' recorded. | required |
Returns:
| Type | Description |
|---|---|
| set[list[Any]] | Combines the two sets of observations and de-duplicates them, returning only the unique assigned conversions. |
Source code in isoslam/isoslam.py
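Combining and de-duplicating the two dictionaries can be sketched with a set comprehension. The name `unique_sketch` and the `(transcript_id, feature_tuple)` item layout are assumptions, chosen to match the item description given for ''remove_common_reads'' above.

```python
def unique_sketch(reads1, reads2):
    """Hypothetical helper: flatten both dictionaries into one de-duplicated set."""
    return {
        (transcript_id, feature)
        for reads in (reads1, reads2)
        for transcript_id, features in reads.items()
        for feature in features
    }


reads1 = {"tx1": [(10, 20, "chr1", "+")]}
reads2 = {"tx1": [(10, 20, "chr1", "+")], "tx2": [(30, 40, "chr2", "-")]}
unique = unique_sketch(reads1, reads2)
```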
zip_blocks(read)
Zip the block starts and ends into two lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| read | AlignedSegment | An individual aligned segment read from a ''.bam'' file. | required |
Returns:
| Type | Description |
|---|---|
| tuple[list[int], list[int]] | Tuple of two lists of integers: the first holds the start locations, the second the end locations. |
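Splitting blocks into starts and ends is a standard use of `zip(*...)`. The sketch below works on a plain list of `(start, end)` tuples, which is the shape pysam's `AlignedSegment.get_blocks()` returns. The name `zip_blocks_sketch` and the empty-input handling are assumptions, and the documented function takes the read itself rather than the block list.

```python
def zip_blocks_sketch(blocks):
    """Hypothetical helper: split (start, end) tuples into two lists."""
    if not blocks:
        # zip(*[]) yields nothing to unpack, so handle the empty case explicitly.
        return [], []
    starts, ends = zip(*blocks)
    return list(starts), list(ends)


block_starts, block_ends = zip_blocks_sketch([(100, 150), (200, 250)])
empty = zip_blocks_sketch([])
```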