  • Cleaning paired-end Nextera prepped reads

    These reads were sequenced (2x100) on a HiSeq 2500 in rapid mode. The first pictures show the reads as they were when I got them:

    R1: [image not preserved]

    R2: [image not preserved]

    Then I ran trim_galore (0.3.7), using the reverse complement of the 3' end of the Nextera® transposase sequence as the adapter sequence:

    trim_galore -q 20 --fastqc -a CTGTCTCTTATA --stringency 1 --paired --retain_unpaired --length 50 $1 $2

    And this is what the reads look like now:

    R1: [image not preserved]

    R2: [image not preserved]

    I assume all the bias that is left at the 5' and 3' ends is due to the cutting preferences of the Nextera transposase. Is this assumption correct, i.e. are my reads clean?
    savetherhino.org

  • #2
    That's correct. You should be fine, though it does look like you gained a weird bias on your last base. However, FastQC makes it hard to tell whether that's 1 bp or several. I assume that you are overtrimming the reads by removing even 1 bp of sequence that matches the adapter, or something like that.


    • #3
      Originally posted by Brian Bushnell
      That's correct. You should be fine, though it does look like you gained a weird bias on your last base. However, FastQC makes it hard to tell whether that's 1 bp or several. I assume that you are overtrimming the reads by removing even 1 bp of sequence that matches the adapter, or something like that.
      Thanks for the reply. About the 3'-end, I'm indeed most likely overtrimming. In trim_galore:

      Code:
      --stringency <INT> Overlap with adapter sequence required to trim a sequence. Defaults to a very stringent setting of 1, i.e. even a single bp of overlapping sequence will be trimmed off from the 3' end of any read.
      Anyway, relatively little information is lost because of this, so there's no harm. I was more concerned about the 5' end, since I had never seen what trimmed Nextera reads are supposed to look like. I also thought it was a little bit weird how the Nextera transposase sequence started appearing from the middle of the reads onwards, but I guess that's normal.
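
To illustrate why a stringency of 1 could produce a bias at the last base, here is a minimal Python sketch of 3' adapter trimming with a minimum-overlap threshold. It is not trim_galore's or Cutadapt's actual implementation: it assumes exact matching only, with no error tolerance or quality trimming, and the function name `trim_3prime` and the toy reads are hypothetical.

```python
ADAPTER = "CTGTCTCTTATA"  # adapter sequence from the trim_galore command in this thread


def trim_3prime(read: str, adapter: str = ADAPTER, min_overlap: int = 1) -> str:
    """Remove an exact adapter match (full or 3'-partial) from a read.

    Simplified model (assumption): exact matches only, no mismatches allowed.
    """
    # A complete adapter copy anywhere in the read: cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for the longest adapter prefix that ends the read,
    # but only accept matches of at least min_overlap bases.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# Toy reads (hypothetical): ends in C, ends in G, ends in a 5-base adapter
# prefix, and contains the full adapter.
toy_reads = ["ACGTACGTAC", "ACGTACGTAG", "ACGTACGCTGTC", "ACGTCTGTCTCTTATACACA"]
for r in toy_reads:
    print(r, trim_3prime(r, min_overlap=1), trim_3prime(r, min_overlap=3))
```

Under this model, `min_overlap=1` removes the final base of any read that merely ends in C (the first adapter base), which would deplete C at the last position. With `min_overlap=3` such reads are left untouched, while the partial and full adapter matches are still removed.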
      savetherhino.org


      • #4
        I also thought it was a little bit weird how the Nextera transposase sequence started appearing from the middle of the reads onwards but I guess that's normal
        Your plots indicate that around 15% of the library fragments had inserts shorter than the number of sequencing cycles. If the library size cut-off had been a bit larger, the Nextera adapters would not have been sequenced, which would have provided more useful data.
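
The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration: the function names and the insert lengths are hypothetical and are not taken from the poster's library.

```python
def non_insert_bases(insert_len: int, cycles: int = 100) -> int:
    """Bases at the 3' end of a read that come from adapter rather than insert."""
    return max(0, cycles - insert_len)


def fraction_with_adapter(insert_lengths, cycles: int = 100) -> float:
    """Fraction of fragments whose insert is shorter than the number of cycles."""
    if not insert_lengths:
        return 0.0
    short = sum(1 for n in insert_lengths if n < cycles)
    return short / len(insert_lengths)


# Hypothetical insert lengths in bp, chosen only to illustrate the calculation.
toy_inserts = [60, 90, 150, 250, 400]
for n in toy_inserts:
    print(n, non_insert_bases(n))
print(fraction_with_adapter(toy_inserts))
```

With 100 cycles, an 80 bp insert leaves 20 bases of read-through at the 3' end, while any insert of 100 bp or more leaves none. This is why adapter sequence only starts to appear partway along the reads.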


        • #5
          Originally posted by nucacidhunter
          Your plots indicate that around 15% of the library fragments had inserts shorter than the number of sequencing cycles. If the library size cut-off had been a bit larger, the Nextera adapters would not have been sequenced, which would have provided more useful data.
          I have to admit that I'm a little bit ignorant of the technical details of the sequencing part. Is the library size cut-off related to the fragment size distribution? Also, what is the relation between fragment sizes and the number of sequencing cycles? The DNA comes from a metagenomic sample which was very small (6.8 ng), so overall I'm quite happy with the output.
          Last edited by rhinoceros; 04-24-2015, 01:38 AM.
          savetherhino.org


          • #6
            If the fragment is shorter than the read length, you will get adapter sequence at the 3' end of your reads. I think you can try running Brian's BBMerge to merge the reads by overlap.
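
As a rough illustration of overlap-based merging, here is a naive Python sketch. It is not BBMerge's algorithm: it assumes exact matches, ignores base qualities, and the function names, the 30-cycle read length, and the toy sequences are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
ADAPTER = "CTGTCTCTTATA"  # adapter sequence from the trim_galore command in this thread


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_insert(r1: str, r2: str, min_overlap: int = 10):
    """Return the insert implied by a read pair, or None if no overlap is found."""
    # Case 1: insert no longer than the reads. Both reads start with the
    # insert (R2 on the opposite strand) and then run into adapter.
    for n in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        if r1[:n] == revcomp(r2[:n]):
            return r1[:n]
    # Case 2: insert longer than one read but shorter than two. The end of R1
    # overlaps the start of reverse-complemented R2.
    rc2 = revcomp(r2)
    for ov in range(min(len(r1), len(rc2)) - 1, min_overlap - 1, -1):
        if r1[-ov:] == rc2[:ov]:
            return r1 + rc2[ov:]
    return None


def simulate_pair(insert: str, cycles: int = 30):
    """Toy paired-end reads from one fragment with adapter on both ends."""
    r1 = (insert + ADAPTER)[:cycles]
    r2 = (revcomp(insert) + ADAPTER)[:cycles]
    return r1, r2


short_insert = "GATTACAGGCTTAACGTCCA"                     # 20 bp, shorter than 30 cycles
long_insert = "GATTACAGGCTTAACGTCCATGCCATGGTCAAGTCCTAGC"  # 40 bp, reads overlap by 20
for ins in (short_insert, long_insert):
    r1, r2 = simulate_pair(ins)
    print(r1, r2, infer_insert(r1, r2))
```

In the short-insert case the recovered sequence is shorter than either read, so the adapter read-through is dropped as a side effect of merging rather than by matching an adapter sequence.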
