
  • Quality trimming & filtering Illumina reads

    Hi,

    I have an Illumina MiSeq data set, 32GB size genome, 300bp reads. The quality of the reads degrades towards the 3' end in both R1 & R2, more so in R2. I want to align the reads to their reference using BWA-MEM and later proceed to variant calling using the GATK pipeline.

    I decided to quality-trim the poor-quality bases. I used Trimmomatic with a window size of 5 and an average quality of 20, and discarded reads shorter than 70bp. Are these parameters too stringent?

    FASTQC reports for raw reads and trimmed reads are attached.

    The paired output from Trimmomatic retained 82% of reads for both R1 & R2. The unpaired sets were 8% and 1% for R1 & R2 respectively. In this case, is it OK to disregard the unpaired sets in the mapping step?

    Based on my raw data, is it advisable to move straight on to mapping & skip trimming?

    How could I verify that my mapping is satisfactory? Would you recommend any tool to check mapping quality?

    I would appreciate comments on these issues.

    Thanks
    Best Regards
    Rangika

  • #2
    Quality-trimming prior to mapping is not usually a good idea at a high level like Q20. If you want to do quality-trimming, something like Q6 is more appropriate. But that's not necessary unless you are using an aligner that is intolerant of errors. In general, every additional base will improve alignment accuracy. Variant-callers take base quality into consideration and should not make spurious calls from low-quality bases.
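To make the Q20-versus-Q6 difference concrete, here is a minimal Python sketch of sliding-window quality trimming. It is a simplified illustration, not Trimmomatic's or BBDuk's actual implementation: the function names `phred33` and `sliding_window_cut` and the example quality profile are made up for this post, and the rule assumed is "scan 5' to 3' and cut at the start of the first window whose mean quality drops below the threshold".

```python
def phred33(qual_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_cut(quals, window=5, min_avg=20):
    """Return how many leading bases to keep.

    Simplified rule (an assumption, not any tool's exact algorithm):
    scan from the 5' end and cut at the start of the first window whose
    mean quality is below min_avg. Reads shorter than the window are
    kept whole.
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


# Hypothetical 300 bp read: good start, mediocre middle, poor 3' end.
example_quals = [36] * 200 + [18] * 60 + [8] * 30 + [2] * 10

kept_q20 = sliding_window_cut(example_quals, window=5, min_avg=20)
kept_q6 = sliding_window_cut(example_quals, window=5, min_avg=6)
print(f"Q20 keeps {kept_q20} bp, Q6 keeps {kept_q6} bp")
```

On this made-up profile the Q20 threshold discards the last third of the read, while Q6 only removes the very worst bases at the 3' end, which is the trade-off described above.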

    Also, I suggest avoiding Trimmomatic because it needlessly generates multiple output files. The process is much easier if reads are maintained in a paired configuration, which BBDuk will do.

    There's no easy way to check mapping quality for real data (it's easy for synthetic data, though). You can either rigorously test and manually verify mappings, or have faith in the tools you are using.
    Last edited by Brian Bushnell; 09-09-2016, 08:55 AM.
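As noted above, there is no easy way to verify mappings of real data, but basic summary numbers can still flag an obviously bad run. The following is a rough Python sketch that tallies a few such numbers from plain-text SAM records using the standard FLAG bits (0x4 unmapped, 0x2 properly paired, 0x100 secondary, 0x800 supplementary). The function name `summarize_sam` and the example records are hypothetical, and these counts are only a sanity check; they say nothing about whether individual alignments are correct.

```python
def summarize_sam(lines):
    """Tally simple alignment statistics from SAM-format text lines.

    Header lines (starting with '@') and secondary/supplementary
    records are skipped, so each read is counted at most once.
    """
    total = mapped = proper = mapq_zero = mapq_sum = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 0x900:        # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if flag & 0x4:          # read is unmapped
            continue
        mapped += 1
        if flag & 0x2:          # aligner marked the pair as properly paired
            proper += 1
        if mapq == 0:
            mapq_zero += 1
        mapq_sum += mapq
    return {
        "total": total,
        "mapped": mapped,
        "properly_paired": proper,
        "mapq_zero": mapq_zero,
        "mapped_fraction": mapped / total if total else 0.0,
        "mean_mapq": mapq_sum / mapped if mapped else 0.0,
    }


# Made-up records, just enough columns to be valid SAM.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t73\tchr1\t500\t0\t50M\t=\t500\t0\t*\t*",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\t*\t*",
]
print(summarize_sam(example_sam))
```

On real data the same kind of counts are available from `samtools flagstat`; a low mapped fraction or a large share of MAPQ 0 reads is a reason to look more closely, not proof of a problem.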



    • #3
      Thank you Brian. I will proceed & see as you have suggested.



      • #4
        If you are worried about your base qualities (tbh the ends of your R2 are a bit low), bootstrapped BQSR may be useful:


        I'm assuming there isn't a reference SNP database for your organism. Also, no real reason to discard unpaired reads IMO.
