  • Adapter_and_kmer_trimming

    Hi everyone,

    I am using publicly available 51 bp paired-end RNA-seq data and I have some questions about quality trimming of the data before passing them to TopHat2 for mapping.

    Specifically, I do not know which adapters were used, so I ran fastqc and then used trim_galore to remove the default Illumina adapter "AGATCGGAAGAGC" and one overrepresented sequence "CTTTGTGTTTGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT".

    It is really important to remove as much adapter contamination as possible because my analysis has to do with discovering variations that may correspond to RNA editing, rather than studying gene expression.

    So my questions are:

    1) I am still getting 3 kmers, with counts ranging from 500 to 1800, that can be found within Illumina adapters and are reported in the middle of the reads (positions 20, 34 and 41). Each one is found in a different adapter and an RNA PCR Index primer.

    Should I use trim_galore to remove these kmers from my reads?

    kmers in fastqc found in Illumina adapters, marked in red boxes:

    [Image: SRR1524292_1_val_1_fastqc_kmers.png]

    2) I have already removed these kmers with trim_galore and trimmed the reads to improve the Per base sequence content.

    However, the kmer GTACGTA still appears in my fastqc report, and this kmer can be found in the TruSeq Adapter, Index 22. This adapter begins with GATCGGAAGAGC and should have been removed during the first step, trim_galore --illumina.

    Should this kmer be removed as well?

    In general, is it possible for kmers to be found within Illumina adapters by chance?

    3) After applying trim_galore --illumina, the Per base sequence content at the 3' end of the reads starts to diverge, and it gets worse every time I remove a sequence.

    Is this because the reads now differ in length, some being trimmed more than others (read length 51 before trimming, 20-51 after)?

    Should I trim the 3' end of the reads in this case?

    Data before trimming: [Image: Per_base_sequence_content_SRR1524292_1.png]

    Data after trim_galore --illumina: [Image: Per_base_sequence_content_SRR1524292_1_after_trim_galore--illumina.png]

    Data after trim_galore a) --illumina, b) kmers and c) overrepresented sequence: [Image: Per_base_sequence_content_after_removing_--illumina_kmers_overrepresented_seq.png]

    4) My last question is: is TopHat2 going to have a problem aligning paired-end reads with lengths ranging from 20 to 32?
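For question 3, one way to check whether the 3' divergence is mostly a read-count effect is to count how many reads still have a base at each position after trimming. The sketch below is only an illustration: `position_coverage` and `fastq_read_lengths` are hypothetical helper names, the FASTQ reader assumes an uncompressed file with exactly four lines per record, and the toy read lengths are made up rather than taken from SRR1524292.

```python
from collections import Counter


def position_coverage(read_lengths):
    """Entry i is the number of reads that still have a base at position i + 1."""
    if not read_lengths:
        return []
    length_counts = Counter(read_lengths)
    remaining = len(read_lengths)
    coverage = []
    for pos in range(1, max(read_lengths) + 1):
        coverage.append(remaining)
        # Reads of exactly this length end here and drop out afterwards.
        remaining -= length_counts.get(pos, 0)
    return coverage


def fastq_read_lengths(path):
    """Read lengths from an uncompressed FASTQ file (4 lines per record assumed)."""
    lengths = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


# Toy example: three reads left at 51 bp, one trimmed to 30 bp, one to 20 bp.
toy_lengths = [51, 51, 51, 30, 20]
cov = position_coverage(toy_lengths)
print(cov[19], cov[20], cov[30], cov[50])  # positions 20, 21, 31, 51 -> 5 4 3 3
```

If far fewer reads cover the last positions than the first ones, the base composition there is computed from a smaller and non-random subset of reads, so some divergence in the plot would not be surprising.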

  • #2
    I would recommend trying bbduk from BBMap. @Brian includes all the common adapters you are likely to run into in the "resources" directory of the BBMap download, and they are all scanned at the same time without you having to provide them ad hoc.
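As a rough illustration of what such a run could look like, the sketch below only builds the command line and prints it; it does not execute anything. The parameters (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follow the adapter-trimming example in the BBDuk guide. The helper name `bbduk_adapter_trim_command`, the output file names, the `_2` mate file name and the `resources/adapters.fa` path are assumptions that depend on your own files and on where BBMap is unpacked.

```python
import shlex


def bbduk_adapter_trim_command(in1, in2, out1, out2, adapters="resources/adapters.fa"):
    """Build (but do not run) a paired-end BBDuk adapter-trimming command."""
    return [
        "bbduk.sh",
        f"in1={in1}",
        f"in2={in2}",
        f"out1={out1}",
        f"out2={out2}",
        f"ref={adapters}",  # adapter FASTA shipped with BBMap (path is an assumption)
        "ktrim=r",          # trim adapters from the right (3') end
        "k=23",             # k-mer length used for matching
        "mink=11",          # allow shorter matches, down to 11 bp, at the read end
        "hdist=1",          # allow one mismatch
        "tpe",              # trim both mates of a pair to the same length
        "tbo",              # also trim adapters detected by pair overlap
    ]


cmd = bbduk_adapter_trim_command(
    "SRR1524292_1.fastq",
    "SRR1524292_2.fastq",          # assumed name of the mate file
    "SRR1524292_1.trimmed.fastq",  # hypothetical output names
    "SRR1524292_2.trimmed.fastq",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the paths are correct.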

    I would not worry about the kmers (unless you see an issue after alignment) since they may be real part of the data.
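To put a rough number on "may be real part of the data": a specific 7-mer such as GTACGTA is short enough to occur in ordinary transcript sequence quite often. The sketch below estimates the expected number of occurrences under the crude assumption of uniform, independent bases. The function name is hypothetical and the figure of one million reads is an assumed example, not the size of this dataset.

```python
def expected_chance_hits(kmer_length, read_length, n_reads):
    """Expected occurrences of one specific k-mer in random reads.

    Assumes every base is equally likely and independent, which real
    transcriptomes are not, so treat the result as an order of magnitude.
    """
    positions = read_length - kmer_length + 1  # start positions per read
    if positions <= 0:
        return 0.0
    return n_reads * positions / 4 ** kmer_length


# A 7-mer in 51 bp reads, for a hypothetical one million reads.
print(round(expected_chance_hits(7, 51, 1_000_000)))  # 2747
```

Under these assumptions, a count of 500 to 1800 for a 7-mer is within what chance alone could produce, although FastQC flags positional enrichment rather than mere presence, so this is not a full answer.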

    While you use TopHat, go ahead and try BBMap (it is splice aware) as an alternate aligner.

    • #3
      Thank you GenoMax, I will be sure to check out BBMap.

      "I would not worry about the kmers (unless you see an issue after alignment) since they may be real part of the data."
      So your opinion is that the presence of at least some of these kmers in Illumina adapters is random? Or that it will pose no problem?

      • #4
        Trim Galore appears to be overly aggressive at the ends of the reads: it trims on a match of only a few bp at the very end, or something similar. BBDuk's recommended default of "mink=11" avoids this by requiring a sequence match of at least 11 bp at the end. The histogram of the raw data showed no evidence of adapter contamination, but I still recommend trimming, since there is always some. Just not with such aggressive settings, as they will introduce bias.
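The effect of the minimum overlap can be illustrated with a toy trimmer. This is a simplified sketch with exact prefix matching only; it is not how BBDuk (k-mer based) or Trim Galore (a wrapper around Cutadapt) actually match adapters. The function name and the example reads are hypothetical, and the adapter is the default Illumina sequence quoted in the first post.

```python
def trim_3prime_adapter(read, adapter, min_overlap):
    """Trim a 3' adapter using exact matching only (toy model).

    A partial adapter at the read end is removed only if at least
    min_overlap bases of the adapter's start match the read's end.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the whole adapter is present somewhere in the read.
    full_hit = read.find(adapter)
    if full_hit != -1:
        return read[:full_hit]
    # Case 2: the read ends with a prefix of the adapter; try the longest first.
    longest = min(len(read), len(adapter) - 1)
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "AGATCGGAAGAGC"
read = "ACGTTGCATTGACCA"  # made-up read that merely ends in "A"

print(trim_3prime_adapter(read, ADAPTER, min_overlap=1))   # ACGTTGCATTGACC
print(trim_3prime_adapter(read, ADAPTER, min_overlap=11))  # ACGTTGCATTGACCA
```

With a 1 bp overlap, roughly a quarter of all adapter-free reads would lose their last base simply because they end in "A", which is the kind of composition bias described above. In Trim Galore the required overlap is controlled by the `--stringency` option.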
