  • cut 5' and 3' ends of paired-ends reads

    Dear all,

    I have 2x250 paired-end reads for de novo assembly of bacterial genomes.

    The reads were processed with cutadapt to remove Illumina adaptors.

    In FastQC I see that I should remove some (~10) bases from the beginning of the reads (both reads in the pair) because of the base distribution in this region, even though quality is fine. I also wish to cut the end of the reads due to low quality: ~10 bases on the first read and ~40 on the second one. This is almost constant across all genomes sequenced.

    In the end I wish to filter out too short reads.

    I went through cutadapt, Trimmomatic and a couple of other tools, but I cannot find how to cut a defined number of bases from both ends of both reads at the same time while keeping the reads paired.

    Suggestions would be much appreciated. Any other ideas on how to process these reads?

    Thanks!

  • #2
    The bias in the first bases of reads from libraries generated with random hexamer primers has been documented and discussed over and over again. Do not cut them! You will just be discarding perfectly good bases.
    Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)


    With Trimmomatic, you can set the minimum quality of the leading or trailing bases with the options LEADING and TRAILING. It's true that there doesn't seem to be an option to cut a specified number of bases off the tail. There is only an option for the head, HEADCROP. But it makes much more sense to trim by quality score anyway, unless you are using an aligner that absolutely requires all reads to have the same length.

    Frankly, I would just use the example command given in the Trimmomatic manual, and only change the minimum length, given that you will want to keep only reads long enough to do a proper assembly.
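
To illustrate what trimming the leading and trailing bases by quality means, here is a minimal Python sketch. It is not Trimmomatic's code; the function names are made up for illustration, and it assumes Phred+33 quality strings. A pair is kept only if both mates are still long enough after trimming, which is what keeps the two files in sync.

```python
def trim_ends_by_quality(seq, qual, min_q, offset=33):
    """Drop leading and trailing bases whose Phred score is below min_q.

    Assumes `qual` is a Phred+33 encoded string of the same length as `seq`.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    # Skip low-quality bases at the 5' end.
    while start < end and scores[start] < min_q:
        start += 1
    # Skip low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def keep_pair(seq1, seq2, min_len):
    """Keep a pair only if both trimmed mates reach the minimum length."""
    return len(seq1) >= min_len and len(seq2) >= min_len
```

Bases in the middle of the read are left alone here, even if their quality is low; only the two ends are shortened.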

    With Cutadapt, you do have the option --cut, which lets you specify the number of bases you want to trim off the 5' and 3' ends. Again, it is preferable to trim by quality unless your assembler requires all reads to be of the same length, which is generally not the case.

    There is also BBDuk, written by Brian Bushnell, an active member of this forum, which seems to have just about every option imaginable.
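
For the fixed-length trimming asked about in the first post, the logic can be sketched in plain Python as below. This is only an illustration of the idea, not how any of the tools above implement it; the function names and the (head, tail) tuples are hypothetical. As far as I know, cutadapt applies --cut (-u) to the first read and has a separate -U for the second read, with positive values cutting from the 5' end and negative values from the 3' end, but please check the manual of your version.

```python
def cut_fixed(seq, qual, head, tail):
    """Remove `head` bases from the 5' end and `tail` bases from the 3' end."""
    end = len(seq) - tail
    if end <= head:
        # The read is too short to survive both cuts.
        return "", ""
    return seq[head:end], qual[head:end]


def trim_pair(r1, r2, cuts1, cuts2, min_len):
    """Trim both mates and keep the pair only if both are long enough.

    r1 and r2 are (seq, qual) tuples; cuts1 and cuts2 are (head, tail) tuples.
    Returns the trimmed pair, or None if either mate falls below min_len.
    """
    t1 = cut_fixed(*r1, *cuts1)
    t2 = cut_fixed(*r2, *cuts2)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2
```

With the numbers from the first post, trim_pair(r1, r2, (10, 10), (10, 40), min_len) would leave 230 bases of a full-length first read and 200 bases of a full-length second read. Dropping the whole pair when one mate is too short is the simplest way to keep the two output files paired.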



    • #3
      The Trimmomatic step CROP removes bases from the 3' end of the reads.



      • #4
        @mastal is correct.

        The parameter works a bit differently from HEADCROP's, though. Rather than specifying the number of bases to cut, you specify the read length to keep after cutting.
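
A small Python sketch of the difference, using the 250 bp reads from the first post. The functions only mimic the semantics described above and are not Trimmomatic code. It assumes the steps are applied in the order they are listed, so the CROP value depends on whether it comes before or after HEADCROP.

```python
def headcrop(seq, n):
    """HEADCROP-style: the value is the number of bases removed from the start."""
    return seq[n:]


def crop(seq, length):
    """CROP-style: the value is the length kept, counted from the start."""
    return seq[:length]


def crop_value(read_len, head_cut, tail_cut, crop_first=True):
    """Length to pass to a CROP-style step to drop `tail_cut` bases from the 3' end.

    If CROP runs before HEADCROP, the head bases are still present,
    so only the tail is subtracted from the read length.
    """
    if crop_first:
        return read_len - tail_cut
    return read_len - head_cut - tail_cut


read = "A" * 250
# CROP first (to 210), then remove 10 bases from the head.
a = headcrop(crop(read, crop_value(250, 10, 40)), 10)
# HEADCROP first, then CROP to 200.
b = crop(headcrop(read, 10), crop_value(250, 10, 40, crop_first=False))
print(len(a), len(b))
```

Note that a length-based crop only removes a fixed number of bases from full-length reads. Reads already shortened by adapter removal lose fewer bases from the tail, or none at all if they are already shorter than the target length.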



        • #5
          Thank you for the reply.

          I guess you are right and I should work on quality trimming.

          Our libraries however were prepared by mechanical shearing of gDNA, not the usual random hexamer protocol in Nextera, if that is what you meant.
