March 14, 2022 16 min to read

List Comprehensions (Advanced) II/III

This is an advanced topic in the List Comprehensions series.

Welcome back, bioinformatics enthusiasts!

In Part I, we introduced the power of list comprehensions and showed how to simplify tasks like generating lists and validating DNA/RNA sequences. In this post, we’ll explore more advanced applications of list comprehensions, focusing on translating RNA codons to amino acids, generating random sequences, and working with FASTA files.

Translating RNA Codons to Amino Acids

In bioinformatics, translating RNA sequences into their corresponding amino acids is a fundamental task. Using a dictionary of RNA codons (RNA_codon_table), we can map each codon to its amino acid in a clean, Pythonic way.

Function to Translate a Single Codon

Here’s a simple function that uses the dictionary to return the amino acid for a given RNA codon:

from random import randint

# First we will get our RNA_codon_table
RNA_codon_table = {
#                       Second Base
#      U             C             A               G
# U
 'UUU': 'Phe', 'UCU': 'Ser', 'UAU': 'Tyr', 'UGU': 'Cys', # UxU
 'UUC': 'Phe', 'UCC': 'Ser', 'UAC': 'Tyr', 'UGC': 'Cys', # UxC
 'UUA': 'Leu', 'UCA': 'Ser', 'UAA': '---', 'UGA': '---', # UxA
 'UUG': 'Leu', 'UCG': 'Ser', 'UAG': '---', 'UGG': 'Trp', # UxG
# C
 'CUU': 'Leu', 'CCU': 'Pro', 'CAU': 'His', 'CGU': 'Arg', # CxU
 'CUC': 'Leu', 'CCC': 'Pro', 'CAC': 'His', 'CGC': 'Arg', # CxC
 'CUA': 'Leu', 'CCA': 'Pro', 'CAA': 'Gln', 'CGA': 'Arg', # CxA
 'CUG': 'Leu', 'CCG': 'Pro', 'CAG': 'Gln', 'CGG': 'Arg', # CxG
# A
 'AUU': 'Ile', 'ACU': 'Thr', 'AAU': 'Asn', 'AGU': 'Ser', # AxU
 'AUC': 'Ile', 'ACC': 'Thr', 'AAC': 'Asn', 'AGC': 'Ser', # AxC
 'AUA': 'Ile', 'ACA': 'Thr', 'AAA': 'Lys', 'AGA': 'Arg', # AxA
 'AUG': 'Met', 'ACG': 'Thr', 'AAG': 'Lys', 'AGG': 'Arg', # AxG
# G
 'GUU': 'Val', 'GCU': 'Ala', 'GAU': 'Asp', 'GGU': 'Gly', # GxU
 'GUC': 'Val', 'GCC': 'Ala', 'GAC': 'Asp', 'GGC': 'Gly', # GxC
 'GUA': 'Val', 'GCA': 'Ala', 'GAA': 'Glu', 'GGA': 'Gly', # GxA
 'GUG': 'Val', 'GCG': 'Ala', 'GAG': 'Glu', 'GGG': 'Gly'  # GxG
}


def translate_RNA_codon(codon):
    """Returns the amino acid for the given codon."""
    return RNA_codon_table[codon]
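
A quick sanity check of the lookup can look like the sketch below. The four-entry table is only a stand-in for the full dictionary, and `safe_translate` is a hypothetical helper (not part of the original code) that uses `dict.get` to return a placeholder instead of raising `KeyError` when a codon is missing from the table, for example one containing `N`.

```python
# Abbreviated stand-in for the full RNA_codon_table (assumption: demo only).
RNA_codon_table = {'AUG': 'Met', 'UUU': 'Phe', 'GGC': 'Gly', 'UAA': '---'}


def translate_RNA_codon(codon):
    """Returns the amino acid for the given codon."""
    return RNA_codon_table[codon]


def safe_translate(codon, unknown='???'):
    """Hypothetical helper: like translate_RNA_codon, but tolerant of unknown codons."""
    return RNA_codon_table.get(codon.upper(), unknown)


print(translate_RNA_codon('AUG'))  # Met
print(safe_translate('aug'))       # Met (input is upper-cased first)
print(safe_translate('NNN'))       # ??? (not in the table)
```

Whether to fail loudly or return a placeholder depends on your data; the strict version is safer when an unknown codon means the input is corrupt.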

Translating Random Codons Using List Comprehension

Using the random_codons function from Part I (it must be defined or imported for the code below to run), we can now translate randomly generated codons into their corresponding amino acids:

def random_codons_translation(minlength=3, maxlength=10):
    """Generates a random list of amino acids between minimum and maximum length."""
    return [translate_RNA_codon(codon) for codon in random_codons(minlength, maxlength, RNAflag=True)]

Example Output:

print(random_codons_translation(minlength=3, maxlength=10))
# Example Output: ['Leu', 'Ser', 'Tyr', 'Arg']

This function uses list comprehension to iterate over randomly generated codons, translating each one into its respective amino acid.
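
Since `random_codons` lives in Part I and is not reproduced in this post, here is a minimal sketch of what it might look like. The helper names `random_base` and `random_codon` and their bodies are assumptions for illustration; the real Part I code may differ.

```python
from random import randint


def random_base(RNAflag=False):
    """Return one random base; U replaces T when RNAflag is set (assumed behavior)."""
    return ('UCAG' if RNAflag else 'TCAG')[randint(0, 3)]


def random_codon(RNAflag=False):
    """Return a random three-base codon."""
    return ''.join(random_base(RNAflag) for _ in range(3))


def random_codons(minlength=3, maxlength=10, RNAflag=False):
    """Return a list of between minlength and maxlength random codons."""
    return [random_codon(RNAflag) for _ in range(randint(minlength, maxlength))]


codons = random_codons(3, 10, RNAflag=True)
print(codons)  # output varies from run to run
```

Note that `randint` includes both endpoints, so the list can contain exactly `minlength` or exactly `maxlength` codons.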

Handling FASTA Files Efficiently

FASTA files are widely used in bioinformatics to store nucleotide or protein sequences. Let’s dive into how list comprehensions simplify reading and processing these files.

Reading FASTA Strings

The first step is to read sequences from a FASTA file:

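def read_FASTA_strings(filename):
    """Returns one string per FASTA record (header plus sequence lines)."""
    with open(filename) as file:
        # Anything before the first ">" is discarded by the [1:] slice.
        return file.read().split(">")[1:]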

This function splits the content of the file by the > delimiter, which marks the beginning of a new sequence.
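
To see what the split produces, the sketch below writes two made-up records to a temporary file and reads them back. The record names and sequences are invented for the demo.

```python
import os
import tempfile


def read_FASTA_strings(filename):
    with open(filename) as file:
        return file.read().split(">")[1:]


# Two made-up records, written to a temporary file for the demo.
fasta_text = ">seq1 demo record\nAUGGCC\nUUUUAA\n>seq2 another record\nGGGCCC\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(fasta_text)
    path = tmp.name

records = read_FASTA_strings(path)
os.remove(path)  # clean up the temporary file

print(records)
# ['seq1 demo record\nAUGGCC\nUUUUAA\n', 'seq2 another record\nGGGCCC\n']
```

Each record still contains its header line and embedded newlines. One caveat of this simple approach: a `>` character inside a header line would also be treated as a record boundary.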

Extracting FASTA Entries

We can use list comprehensions to extract each entry’s header and sequence:

def read_FASTA_entries(filename):
    """Returns a list of (header, '\\n', sequence) tuples, one per FASTA record."""
    return [seq.partition("\n") for seq in read_FASTA_strings(filename)]

Here’s how it works:

partition("\n") separates the header (before \n) from the sequence (after \n).
The result is a list of tuples: (header, '\n', sequence).
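
A small illustration of `str.partition` on a single made-up record shows why the sequence ends up at index 2:

```python
record = "seq1 demo record\nAUGGCC\nUUUUAA\n"

# partition splits at the FIRST newline only and always returns a 3-tuple.
header, separator, sequence = record.partition("\n")

print(repr(header))     # 'seq1 demo record'
print(repr(separator))  # '\n'
print(repr(sequence))   # 'AUGGCC\nUUUUAA\n'

# If the separator is missing, the whole string lands in the first slot.
print("header only".partition("\n"))  # ('header only', '', '')
```

Because only the first newline is consumed, the sequence part still contains its internal line breaks at this stage.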

Formatting FASTA Sequences

To clean up the data and remove newline characters from sequences, use this function:

def read_FASTA_sequences(filename):
    """Returns a list of [header, sequence] pairs with newlines removed from each sequence."""
    return [[seq[0], seq[2].replace("\n", "")] for seq in read_FASTA_entries(filename)]

This function processes the list of entries:

seq[0] extracts the header.
seq[2].replace("\n", "") removes newline characters from the sequence.

Example Usage:

print(read_FASTA_sequences("seqdump.txt"))
# Example Output: [['Header1', 'ATGCGTACG'], ['Header2', 'GGCTACGTT']]
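
`replace("\n", "")` removes newlines only. If your files may contain stray spaces or tabs inside the sequence lines, one possible variation is to drop all whitespace. The `clean_sequence` helper and the sample entries below are hypothetical, shown only to illustrate the idea.

```python
def clean_sequence(body):
    """Hypothetical helper: drop all whitespace (newlines, tabs, spaces) from a FASTA body."""
    return "".join(body.split())


# Made-up entries in the (header, separator, body) shape produced by partition.
entries = [
    ("seq1", "\n", "AUGGCC \nUUUUAA\t\n"),
    ("seq2", "\n", "GGG CCC\n"),
]

cleaned = [[header, clean_sequence(body)] for header, _, body in entries]
print(cleaned)
# [['seq1', 'AUGGCCUUUUAA'], ['seq2', 'GGGCCC']]
```

Unpacking the tuple as `header, _, body` also reads a little more clearly than indexing with `seq[0]` and `seq[2]`.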

Combining List Comprehensions for Complex Tasks

What makes list comprehensions powerful is their ability to stack or combine tasks. For example, you can:

Parse a FASTA file.
Translate its sequences into amino acids.
Filter sequences based on specific criteria, all in one clean, Pythonic expression.

Here’s a quick example:

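def translate_FASTA(filename):
    """Translates each FASTA sequence into a list of amino acids, codon by codon."""
    fasta_sequences = read_FASTA_sequences(filename)
    return [
        # Convert T to U so DNA records match the RNA table; a trailing partial codon is skipped.
        [header, [translate_RNA_codon(seq[i:i+3].upper().replace("T", "U")) for i in range(0, len(seq) - 2, 3)]]
        for header, seq in fasta_sequences
    ]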

This function:

Reads sequences from a FASTA file.
Translates each sequence into amino acids by splitting it into codons.
Returns a list of headers and their corresponding amino acid sequences.
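
The filtering step from the list above can be added as an `if` clause on the outer comprehension. The sketch below works on in-memory data so it runs on its own; the abbreviated codon table, the sample records, and the filter criteria (starts with `AUG`, length divisible by 3) are all assumptions chosen for illustration.

```python
# Abbreviated stand-in for the full RNA_codon_table (assumption: demo only).
RNA_codon_table = {'AUG': 'Met', 'UUU': 'Phe', 'GGC': 'Gly', 'GCC': 'Ala', 'UAA': '---'}


def translate_RNA_codon(codon):
    """Returns the amino acid for the given codon."""
    return RNA_codon_table[codon]


# Made-up [header, sequence] pairs in the shape returned by read_FASTA_sequences.
fasta_sequences = [
    ['seq1', 'AUGUUUGGCUAA'],  # starts with AUG, length divisible by 3
    ['seq2', 'GCCGGC'],        # no start codon, filtered out
    ['seq3', 'AUGGC'],         # incomplete final codon, filtered out
]

translated = [
    [header, [translate_RNA_codon(seq[i:i+3]) for i in range(0, len(seq), 3)]]
    for header, seq in fasta_sequences
    if seq.startswith('AUG') and len(seq) % 3 == 0
]

print(translated)
# [['seq1', ['Met', 'Phe', 'Gly', '---']]]
```

The condition is evaluated before the inner comprehension runs, so sequences that fail the filter are never translated.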

Takeaways

Key Benefits of List Comprehensions

Conciseness: Simplify multi-step operations into one-liners.
Readability: Make complex operations easy to understand at a glance.
Performance: Often run faster than an equivalent for loop that builds the list with repeated append calls.

Applications in Bioinformatics

Sequence translation.
FASTA file parsing.
Generating random DNA/RNA sequences for testing.

What’s Next?

Ready to take your comprehension skills even further? Head over to Part III: Set, Dictionary and Conditional Comprehensions, where we’ll explore more complex applications in genomic data analysis. You won’t want to miss these powerful techniques! 🧬

As always, feel free to comment below with questions or ideas. Let us know how you’re using Python to make breakthroughs in your research!

Happy coding! 🚀

Ali Hassan
