  • 454 and homopolymers

    I am planning to use Cortex to assemble 454 data, and I want to use the cut-homopolymer option, but I am having trouble deciding at what length homopolymers become a problem for 454. It seems to me that it would be somewhere in the 4-6 base range; however, I would like to disrupt my reads as little as possible.

    Does anyone have some insight into this? Has anyone looked at error rates as a function of homopolymer length in 454?

    Thanks,
    David
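One way to answer the error-rate question empirically is to align reads to a trusted reference and, for every reference homopolymer, compare its true length with the length the read reports. The sketch below assumes you have already extracted those (true length, observed length) pairs. That extraction is a hypothetical preprocessing step not covered here. The sketch simply tabulates, for each true length, the fraction of runs called at the wrong length. The function name `error_rate_by_length` is illustrative only.

```python
from collections import defaultdict


def error_rate_by_length(run_pairs):
    """Return {true_length: fraction of runs miscalled} for (true, observed) pairs.

    Hypothetical input: run_pairs would come from reads aligned to a reference,
    one pair per reference homopolymer covered by a read.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for true_len, observed_len in run_pairs:
        totals[true_len] += 1
        if observed_len != true_len:  # over- or under-call of the run length
            errors[true_len] += 1
    return {length: errors[length] / totals[length] for length in sorted(totals)}


if __name__ == "__main__":
    # Toy pairs for illustration only; these are not real 454 measurements.
    toy_pairs = [(2, 2), (2, 2), (3, 3), (3, 4), (5, 4), (5, 5), (5, 6), (6, 5)]
    for length, rate in error_rate_by_length(toy_pairs).items():
        print(f"homopolymer length {length}: error rate {rate:.2f}")
```

Plotting these rates against length on real data would show where the error rate starts to climb. That point is one reasonable place to set a cut threshold.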

  • #2
    Hi there

    I wrote Cortex, so I know about that, but I have somewhat limited 454 experience.
    If you want to be systematic, you can

    1. Load your reads multiple times, each time with a different homopolymer threshold, and use the --dump_filtered_readlen_distribution option to dump a file showing how the threshold affects your read lengths. That tells you how much read length you are throwing away.

    2. Then there's the issue of 454 homopolymer errors, and at what length they become prevalent. I can't help with that, to be honest.
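Step 1 can also be prototyped outside Cortex on a sample of reads. The sketch below assumes that "cutting" means discarding any homopolymer of length at least the threshold and keeping the sequence on either side as separate fragments. Cortex's exact rule may differ. `cut_homopolymers` and `readlen_distribution` are hypothetical helper names, not part of Cortex. The sketch builds a fragment-length histogram per threshold, similar in spirit to the file that --dump_filtered_readlen_distribution writes.

```python
import re
from collections import Counter


def cut_homopolymers(read, threshold):
    """Split a read around every homopolymer run of length >= threshold (threshold >= 2).

    Assumption: the run itself is dropped and the flanking sequence is kept.
    """
    run = re.compile(r"(.)\1{%d,}" % (threshold - 1))  # a base repeated >= threshold times
    fragments, start = [], 0
    for match in run.finditer(read):
        if match.start() > start:
            fragments.append(read[start:match.start()])
        start = match.end()
    if start < len(read):
        fragments.append(read[start:])
    return fragments


def readlen_distribution(reads, threshold):
    """Histogram {fragment length: count} after cutting every read."""
    counts = Counter()
    for read in reads:
        for fragment in cut_homopolymers(read, threshold):
            counts[len(fragment)] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTTTTGCA", "GGGGGACGTACGT", "ACGTACGTAC"]  # toy reads
    for threshold in range(3, 7):
        print(threshold, sorted(readlen_distribution(reads, threshold).items()))
```

Comparing the histograms across thresholds shows how quickly long reads break into short pieces as the threshold drops.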

    If you are just making variant calls, as I was when I did this, I would use a limit of 3 and see how I go, but that's quite conservative.



    • #3
      Replying to Zam (#2):
      I wrote a script to do #1, and if you set the threshold to 3 or 4, a lot of reads seem to get fragmented beyond use. I was hoping to balance the threshold against the desire to keep as many reads as possible. That is hard to do, however, without knowing at what homopolymer length the error rate explodes. I suppose I could determine this myself, but I am hoping someone on SEQanswers has some experience here.
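The trade-off described here can be reduced to a single number per threshold. Cortex builds a de Bruijn graph from k-mers, so a fragment shorter than k contributes nothing to it. One reasonable measure of "fragmented beyond use" is therefore the fraction of the original k-mers that survive cutting. The sketch below reuses the same assumed cutting rule: drop runs of length at least the threshold and keep the flanks. The function names and the toy k value are illustrative, not taken from Cortex.

```python
import re


def cut_homopolymers(read, threshold):
    """Assumed cutting rule: drop runs of length >= threshold, keep the flanks."""
    run = re.compile(r"(.)\1{%d,}" % (threshold - 1))  # a base repeated >= threshold times
    fragments, start = [], 0
    for match in run.finditer(read):
        if match.start() > start:
            fragments.append(read[start:match.start()])
        start = match.end()
    if start < len(read):
        fragments.append(read[start:])
    return fragments


def kmer_count(length, k):
    """Number of k-mers in a sequence of the given length (0 if shorter than k)."""
    return max(0, length - k + 1)


def kmer_retention(reads, threshold, k):
    """Fraction of the reads' original k-mers still present after cutting."""
    before = sum(kmer_count(len(read), k) for read in reads)
    after = sum(
        kmer_count(len(fragment), k)
        for read in reads
        for fragment in cut_homopolymers(read, threshold)
    )
    return after / before if before else 0.0


if __name__ == "__main__":
    reads = ["ACGTTTTGCA", "GGGGGACGTACGT", "ACGTACGTAC"]  # toy reads
    for threshold in range(3, 7):
        print(threshold, round(kmer_retention(reads, threshold, k=5), 2))
```

Running this on a sample of real reads, with your actual k and thresholds from 3 upward, shows how quickly usable sequence falls off. Paired with per-length error rates, the result gives a concrete basis for choosing the threshold.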



      • #4
        Fair enough - good luck!
