  • cut 5' and 3' ends of paired-ends reads

    Dear all,

    I have 2x250 paired-end reads for de novo assembly of bacterial genomes.

    The reads were processed with cutadapt to remove Illumina adaptors.

    In FastQC I see that I should remove some (~10) bases from the beginning of the reads (both reads in the pair) because of the base distribution in this region, even though quality is fine. I also wish to cut the ends of the reads due to low quality: ~10 bases on the first read and ~40 on the second one. This is almost constant across all genomes sequenced.

    Finally, I wish to filter out reads that are too short.

    I went through cutadapt, trimmomatic and a couple of other tools, but I cannot find how to cut a defined number of bases from both ends of both reads at the same time, so as to keep the reads paired.

    Suggestions would be much appreciated. Any other ideas on how to process these reads?

    Thanks!

  • #2
    The bias in the first bases of reads from libraries generated with random hexamer primers has been documented and discussed over and over again. Do not cut them! You will just be discarding perfectly good bases.


    With Trimmomatic, you have the option of setting the minimum quality of the leading or trailing bases, with the options LEADING and TRAILING. It's true that there doesn't seem to be an option to cut a specified number of bases off the tail. There is only an option for the head, HEADCROP. But it makes much more sense to trim by quality score anyway, unless you are using an aligner that absolutely requires all reads to have the same length.
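To make the idea of trimming by quality concrete, here is a minimal Python sketch of what leading/trailing quality trimming does in principle: bases are dropped from each end for as long as their quality is below a threshold. The function name, the thresholds, and the toy read are assumptions made up for illustration; this is not Trimmomatic's code.

```python
def trim_low_quality_ends(seq, quals, leading=3, trailing=3):
    """Drop bases from each end while their quality is below the threshold.

    seq   -- read sequence as a string
    quals -- per-base Phred scores as a list of ints (same length as seq)
    """
    start = 0
    end = len(seq)
    # Walk in from the 5' end past low-quality bases.
    while start < end and quals[start] < leading:
        start += 1
    # Walk in from the 3' end past low-quality bases.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return seq[start:end], quals[start:end]


# Toy read (hypothetical data): poor quality at both ends.
seq = "ACGTACGTAC"
quals = [2, 2, 30, 35, 36, 34, 30, 12, 2, 2]
trimmed_seq, trimmed_quals = trim_low_quality_ends(seq, quals, leading=3, trailing=20)
print(trimmed_seq, trimmed_quals)
```

Note that the number of bases removed differs from read to read, which is the point: good bases at the ends are kept, and only the poor ones go.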

    Frankly, I would just use the example command given in the Trimmomatic manual, and only change the minimum length, given that you will want to keep only reads long enough to do a proper assembly.

    With Cutadapt, you do have the option --cut, which lets you specify the number of bases you want to trim off the 5' and 3' ends (a positive value trims the 5' end, a negative value the 3' end). Again, it is preferable to trim by quality unless your assembler requires all reads to be of the same length, which is generally not the case.
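If a fixed cut is still wanted, the logic the original question asks for (a fixed number of bases off both ends of both mates, with pairs kept in sync) can be sketched in plain Python as below. The function names, the in-memory tuple representation of reads, and the minimum length of 50 are assumptions for illustration only; a real run would use a dedicated trimmer on the FASTQ files.

```python
def cut_ends(seq, qual, head, tail):
    """Remove `head` bases from the 5' end and `tail` bases from the 3' end."""
    end = len(seq) - tail
    if end <= head:
        # Nothing would be left after cutting.
        return "", ""
    return seq[head:end], qual[head:end]


def trim_pairs(pairs, r1_cuts, r2_cuts, min_len):
    """Trim both mates; keep a pair only if both mates are still >= min_len."""
    kept = []
    for (name1, seq1, qual1), (name2, seq2, qual2) in pairs:
        s1, q1 = cut_ends(seq1, qual1, *r1_cuts)
        s2, q2 = cut_ends(seq2, qual2, *r2_cuts)
        # Dropping both mates together is what keeps the two files paired.
        if len(s1) >= min_len and len(s2) >= min_len:
            kept.append(((name1, s1, q1), (name2, s2, q2)))
    return kept


# Hypothetical reads: one full-length 2x250 pair, one pair already shortened to 60 bp.
pairs = [
    (("read1/1", "A" * 250, "I" * 250), ("read1/2", "C" * 250, "I" * 250)),
    (("read2/1", "G" * 60, "I" * 60), ("read2/2", "T" * 60, "I" * 60)),
]
# Cuts from the question: 10 off each head, 10 off the R1 tail, 40 off the R2 tail.
kept = trim_pairs(pairs, r1_cuts=(10, 10), r2_cuts=(10, 40), min_len=50)
print([(len(m1[1]), len(m2[1])) for m1, m2 in kept])
```

With these inputs only the first pair survives, at 230 and 200 bases; the second pair is discarded as a unit because its mates fall below the minimum length.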

    There is also BBDuk, written by Brian Bushnell, an active member of this forum, which seems to have just about every option imaginable.


    • #3
      The Trimmomatic command CROP removes bases from the 3' end of the reads.


      • #4
        @mastal is correct.

        The parameters are a bit different though from HEADCROP. Rather than specifying the number of bases to cut, you specify the read length after cutting.
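As a worked example of that difference, the small Python sketch below converts "bases to cut from the tail" into the length to keep, and then mimics a crop-to-length step followed by a head cut. The helper names and the assumption that every read is exactly 250 bases long are illustrative; reads already shortened by adapter removal would lose fewer bases than intended when cropped to a fixed length.

```python
def crop_value(read_length, tail_cut):
    """Length to keep so that `tail_cut` bases are removed from the 3' end."""
    if not 0 <= tail_cut <= read_length:
        raise ValueError("tail_cut must be between 0 and read_length")
    return read_length - tail_cut


def apply_crop_then_headcrop(seq, crop, headcrop):
    """Keep the first `crop` bases, then drop the first `headcrop` bases."""
    return seq[:crop][headcrop:]


read = "N" * 250  # hypothetical full-length read
for label, tail in (("R1", 10), ("R2", 40)):
    crop = crop_value(len(read), tail)
    trimmed = apply_crop_then_headcrop(read, crop, headcrop=10)
    print(label, "keep-length:", crop, "final length:", len(trimmed))
```

For the numbers in the original question this gives a keep-length of 240 for the first read and 210 for the second, leaving 230 and 200 bases after the 10-base head cut.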


        • #5
          Thank you for the reply.

          I guess you are right and I should work on quality trimming.

          Our libraries however were prepared by mechanical shearing of gDNA, not the usual random hexamer protocol in Nextera, if that is what you meant.
