List Comprehensions (Advanced) II/III

This is an advanced topic on list comprehensions.

Welcome back, bioinformatics enthusiasts!

In Part I, we introduced the power of list comprehensions and showed how to simplify tasks like generating lists and validating DNA/RNA sequences. In this post, we’ll explore more advanced applications of list comprehensions, focusing on translating RNA codons to amino acids, generating random sequences, and working with FASTA files.

Translating RNA Codons to Amino Acids

In bioinformatics, translating RNA sequences into their corresponding amino acids is a fundamental task. Using a dictionary of RNA codons (RNA_codon_table), we can map each codon to its amino acid in a clean, Pythonic way.

Function to Translate a Single Codon

Here’s a simple function that uses the dictionary to return the amino acid for a given RNA codon:

from random import randint

# Standard RNA codon table: each codon maps to a three-letter amino-acid code ('---' marks a stop codon)
RNA_codon_table = {
#                       Second Base
#      U             C             A               G
# U
 'UUU': 'Phe', 'UCU': 'Ser', 'UAU': 'Tyr', 'UGU': 'Cys', # UxU
 'UUC': 'Phe', 'UCC': 'Ser', 'UAC': 'Tyr', 'UGC': 'Cys', # UxC
 'UUA': 'Leu', 'UCA': 'Ser', 'UAA': '---', 'UGA': '---', # UxA
 'UUG': 'Leu', 'UCG': 'Ser', 'UAG': '---', 'UGG': 'Trp', # UxG
# C
 'CUU': 'Leu', 'CCU': 'Pro', 'CAU': 'His', 'CGU': 'Arg', # CxU
 'CUC': 'Leu', 'CCC': 'Pro', 'CAC': 'His', 'CGC': 'Arg', # CxC
 'CUA': 'Leu', 'CCA': 'Pro', 'CAA': 'Gln', 'CGA': 'Arg', # CxA
 'CUG': 'Leu', 'CCG': 'Pro', 'CAG': 'Gln', 'CGG': 'Arg', # CxG
# A
 'AUU': 'Ile', 'ACU': 'Thr', 'AAU': 'Asn', 'AGU': 'Ser', # AxU
 'AUC': 'Ile', 'ACC': 'Thr', 'AAC': 'Asn', 'AGC': 'Ser', # AxC
 'AUA': 'Ile', 'ACA': 'Thr', 'AAA': 'Lys', 'AGA': 'Arg', # AxA
 'AUG': 'Met', 'ACG': 'Thr', 'AAG': 'Lys', 'AGG': 'Arg', # AxG
# G
 'GUU': 'Val', 'GCU': 'Ala', 'GAU': 'Asp', 'GGU': 'Gly', # GxU
 'GUC': 'Val', 'GCC': 'Ala', 'GAC': 'Asp', 'GGC': 'Gly', # GxC
 'GUA': 'Val', 'GCA': 'Ala', 'GAA': 'Glu', 'GGA': 'Gly', # GxA
 'GUG': 'Val', 'GCG': 'Ala', 'GAG': 'Glu', 'GGG': 'Gly'  # GxG
}


def translate_RNA_codon(codon):
    """Returns the amino acid for the given codon."""
    return RNA_codon_table[codon]
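Typing out all 64 entries works, but the same kind of mapping can also be generated with comprehensions. The sketch below is an alternative, not the table used in this post: it uses one-letter amino-acid codes and '*' for stop codons (the ordering and convention of NCBI's standard genetic code table) instead of three-letter codes and '---', and the names BASES, AMINO_ACIDS and codon_table_1letter are our own.

```python
# Sketch: build the standard codon table with comprehensions instead of typing it.
# Assumption: one-letter amino-acid codes, '*' for stop (NCBI standard table order).
BASES = "UCAG"

# All 64 codons; the first base varies slowest, the third base fastest
codons = [first + second + third
          for first in BASES
          for second in BASES
          for third in BASES]

# Amino acids listed in exactly the same codon order as above
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Dictionary comprehension pairing each codon with its amino acid
codon_table_1letter = {codon: aa for codon, aa in zip(codons, AMINO_ACIDS)}

# Filtered list comprehension: which codons are stop codons?
stop_codons = [codon for codon, aa in codon_table_1letter.items() if aa == "*"]

print(len(codon_table_1letter))    # 64
print(codon_table_1letter["AUG"])  # M
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
```

Keeping the amino-acid string in the same order as the nested comprehension is what makes the zip() pairing correct; a mismatch would silently mislabel codons, so a spot check such as AUG → M is worthwhile.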

Translating Random Codons Using List Comprehension

Using the random_codons() function from Part I, we can now translate randomly generated codons into their corresponding amino acids:

def random_codons_translation(minlength=3, maxlength=10):
    """Generates a random list of amino acids between minimum and maximum length."""
    return [translate_RNA_codon(codon) for codon in random_codons(minlength, maxlength, RNAflag=True)]

Example Usage:

print(random_codons_translation(minlength=3, maxlength=10))
# Example Output: ['Leu', 'Ser', 'Tyr', 'Arg']

This function uses list comprehension to iterate over randomly generated codons, translating each one into its respective amino acid.
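random_codons() comes from Part I and isn't repeated here. If you want to run the example on its own, the sketch below is a hypothetical stand-in that only assumes the call signature used above, random_codons(minlength, maxlength, RNAflag); the default RNAflag=False and the body are our assumptions, and Part I's actual implementation may differ.

```python
from random import choice, randint


def random_codons(minlength=3, maxlength=10, RNAflag=False):
    """Hypothetical stand-in for Part I's helper: returns a list of between
    minlength and maxlength random codons (RNA bases if RNAflag is True)."""
    bases = "UCAG" if RNAflag else "TCAG"
    # Outer comprehension decides how many codons; inner generator builds each one
    return ["".join(choice(bases) for _ in range(3))
            for _ in range(randint(minlength, maxlength))]


print(random_codons(minlength=3, maxlength=5, RNAflag=True))
```

Since random.randint includes both endpoints, the list length can be exactly minlength or exactly maxlength.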

Handling FASTA Files Efficiently

FASTA files are widely used in bioinformatics to store nucleotide or protein sequences. Let’s dive into how list comprehensions simplify reading and processing these files.

Reading FASTA Strings

The first step is to read sequences from a FASTA file:

def read_FASTA_strings(filename):
    with open(filename) as file:
        return file.read().split(">")[1:]

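This function splits the file's content on the > character, which marks the start of each record. Because a FASTA file begins with >, the first element of the split is an empty string, and the [1:] slice discards it.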
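To see why the [1:] slice is needed, here is the same split applied to a small in-memory FASTA string (the records are made up for illustration):

```python
# A made-up two-record FASTA file, held in a string for illustration
fasta_text = ">seq1 example\nATGCGT\nACG\n>seq2 example\nGGCTAC\nGTT\n"

# Splitting on '>' leaves an empty string before the first record
chunks = fasta_text.split(">")
print(chunks[0] == "")  # True

# Dropping that empty first element leaves one string per record
records = chunks[1:]
print(records)  # ['seq1 example\nATGCGT\nACG\n', 'seq2 example\nGGCTAC\nGTT\n']
```

One caveat: the split happens at every >, so a header line that itself contains a > character would be cut into two records.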

Extracting FASTA Entries

We can use list comprehensions to extract each entry’s header and sequence:

def read_FASTA_entries(filename):
    return [seq.partition("\n") for seq in read_FASTA_strings(filename)]

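Here's how it works: str.partition("\n") splits each entry at its first newline and returns a three-part tuple (header, "\n", sequence). The header is the text that followed the > marker, and the sequence is everything after it, still containing the line breaks of a multi-line record.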
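A quick look at what partition() returns for a single record, including the edge case of a record with no newline at all (the record text is made up):

```python
# One record as produced by read_FASTA_strings(): header line, then sequence lines
entry = "seq1 example\nATGCGT\nACG\n"

# partition() splits at the FIRST newline only and always returns a 3-tuple
header, separator, sequence = entry.partition("\n")
print(repr(header))     # 'seq1 example'
print(repr(separator))  # '\n'
print(repr(sequence))   # 'ATGCGT\nACG\n'

# With no newline present, the last two parts are empty strings (no exception)
print("seq2 only a header".partition("\n"))  # ('seq2 only a header', '', '')
```

The guaranteed three-part result is what lets later code index positions 0 and 2 without checking the length; by contrast, "seq2 only a header".split("\n", 1) would return a one-element list.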

Formatting FASTA Sequences

To clean up the data and remove newline characters from sequences, use this function:

def read_FASTA_sequences(filename):
    return [[seq[0], seq[2].replace("\n", "")] for seq in read_FASTA_entries(filename)]

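This function processes the list of entries: for each (header, "\n", sequence) tuple it keeps the header (seq[0]) and the sequence (seq[2]) with its line breaks removed, returning a list of [header, sequence] pairs.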
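For testing without a file on disk, the three steps can be chained on a string. parse_FASTA_text() below is a hypothetical helper, not part of the post's code; it unpacks each partition() tuple into named variables instead of indexing seq[0] and seq[2], which gives the same result but reads more clearly.

```python
def parse_FASTA_text(fasta_text):
    """Hypothetical helper: the read/partition/clean steps applied to a string."""
    # Step 1: one string per record (drop the empty chunk before the first '>')
    records = fasta_text.split(">")[1:]
    # Step 2: (header, newline, sequence) tuples
    entries = [record.partition("\n") for record in records]
    # Step 3: unpack each tuple, discard the separator, join the sequence lines
    return [[header, sequence.replace("\n", "")] for header, _, sequence in entries]


print(parse_FASTA_text(">Header1\nATGCG\nTACG\n>Header2\nGGCTA\nCGTT\n"))
# [['Header1', 'ATGCGTACG'], ['Header2', 'GGCTACGTT']]
```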

Example Usage:

print(read_FASTA_sequences("seqdump.txt"))
# Example Output: [['Header1', 'ATGCGTACG'], ['Header2', 'GGCTACGTT']]

Combining List Comprehensions for Complex Tasks

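What makes list comprehensions powerful is that they can be stacked or combined. For example, you can read a FASTA file, clean up its sequences, and translate them into amino acids, with one comprehension feeding the next.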

Here’s a quick example:

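def translate_FASTA(filename):
    """Translates each FASTA sequence into a list of three-letter amino acids."""
    # FASTA files usually store DNA, so transcribe T -> U to match RNA_codon_table
    rna_sequences = [[header, seq.upper().replace("T", "U")]
                     for header, seq in read_FASTA_sequences(filename)]
    return [
        # Stopping at len(seq) - 2 skips an incomplete codon at the end
        [header, [translate_RNA_codon(seq[i:i+3]) for i in range(0, len(seq) - 2, 3)]]
        for header, seq in rna_sequences
    ]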

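This function reads the cleaned [header, sequence] pairs, slices each sequence into consecutive three-letter codons with an inner list comprehension, and translates every codon, returning one [header, amino_acids] pair per record.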
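The same nesting idea extends naturally: an outer comprehension over reading frames and an inner one over codons gives a three-frame translation. This is a self-contained sketch rather than part of the post's pipeline. To keep it short, it builds a one-letter codon table ('*' = stop) instead of using RNA_codon_table, transcribes DNA to RNA by replacing T with U, and skips incomplete trailing codons; the name three_frame_translation is our own.

```python
# Compact standard codon table (one-letter codes, '*' = stop), built as a sketch
BASES = "UCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def three_frame_translation(dna):
    """Translates a DNA sequence in reading frames 0, 1 and 2."""
    rna = dna.upper().replace("T", "U")  # transcribe DNA to RNA
    return [
        # Inner generator expression walks the codons of one frame; stopping
        # at len(rna) - 2 drops any incomplete codon at the end
        "".join(CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3))
        for frame in range(3)
    ]


print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```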

Takeaways

Key Benefits of List Comprehensions

Applications in Bioinformatics

What’s Next?

Ready to take your list comprehension skills even further? Head over to Part III: Advanced List Comprehensions, where we'll explore more complex applications in genomic data analysis. You won't want to miss these powerful techniques! 🧬

As always, feel free to comment below with questions or ideas. Let us know how you’re using Python to make breakthroughs in your research!

Happy coding! 🚀
