  • paired-end adapter trimming

    Hi all,
    Sorry for posting twice in a row.
    I have paired-end RNA-Seq data generated on the Illumina GA as well as the HiSeq. I obtained this data from the SRA (NCBI).
    I am not sure whether adapter trimming is standard practice (a must-do) before aligning these reads to the reference genome.
    How would I know whether I need adapter trimming?

    Thanks

  • #2
    If you run FastQC, it will tell you whether some of the overrepresented sequences in the sample correspond to known Illumina adapters.
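For a quick sanity check alongside FastQC, something like the rough Python sketch below can estimate how many reads contain the start of an adapter. This is not how FastQC works internally; `adapter_fraction` is a made-up helper name, it assumes plain four-line FASTQ records, and it only finds exact matches. The search string is the start of the TruSeq adapter quoted later in this thread.

```python
import io

def adapter_fraction(fastq_handle, adapter_start):
    """Return the fraction of reads whose sequence contains adapter_start.

    Assumes unwrapped FASTQ: every record is exactly four lines.
    """
    total = 0
    hits = 0
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 == 1:  # the sequence is the 2nd line of each record
            total += 1
            if adapter_start in line:
                hits += 1
    return hits / total if total else 0.0

# Two toy reads: the first runs into the adapter, the second does not.
example = io.StringIO(
    "@r1\nACGTACGTAGATCGGAAGAGCACAC\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
)
print(adapter_fraction(example, "GATCGGAAGAGC"))  # 0.5
```

For a real file you would pass an open file handle instead of the `io.StringIO` toy input.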


    • #3
      If possible, you should compare results with and without adapter trimming. I have found little benefit from trimming adapters, even in samples with a high percentage of adapters present (>10%). Trimmomatic will do the trimming for you.


      • #4
        Hi all,
        Thanks for your quick suggestions. I think I figured out the problem. Adapter trimming is needed mainly if the insert size is very small (smaller than the read length, i.e. the number of sequencing cycles). At the 5' end, sequencing starts with the first base of the actual RNA/cDNA sequence, and the 5' adapter+primer acts as the binding site for the sequencing primer.
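The arithmetic behind that is simple enough to sketch. The helper name `adapter_bases_in_read` is made up for illustration, and the sketch ignores the case where the read runs past the far end of the adapter as well.

```python
def adapter_bases_in_read(insert_size, read_length):
    """Number of 3' bases in a read that come from the adapter.

    Sequencing starts at the first base of the insert, so adapter only
    shows up once the read is longer than the insert.
    """
    return max(0, read_length - insert_size)

print(adapter_bases_in_read(insert_size=60, read_length=100))   # 40
print(adapter_bases_in_read(insert_size=300, read_length=100))  # 0
```

So a 100-cycle read of a 60 bp insert ends with 40 bases of adapter, while a 300 bp insert gives a clean read.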


        • #5
          Originally posted by kga1978
          If possible, you should compare results with and without adapter trimming. I have found little benefit from trimming adapters, even in samples with a high percentage of adapters present (>10%). Trimmomatic will do the trimming for you.
          Agreed - adapter trimming isn't so critical for any kind of reference-based alignment, including RNA-seq. Usually the adapter-containing read will fail to align, and even if it contains partially useful data, the useful part may well be too short to align reliably. I tend to do it anyway, since clearing out the obvious junk speeds up the alignment step, and I usually get a small benefit, but it is pretty marginal.

          For de-novo work though, adapter trimming is essential.

          And glad you're finding Trimmomatic useful.


          • #6
            Do you reverse complement the adapters before trimming? For example, do I have to reverse complement this adapter:

            TruSeq Adapter, Index 1
            5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG


            • #7
              Originally posted by Palgrave
              Do you reverse complement the adapters before trimming? For example, do I have to reverse complement this adapter:

              TruSeq Adapter, Index 1
              5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
              No - Trimmomatic looks only for the sequence provided and, if the sequence is appropriately named, only in the forward or reverse read. This is both for performance reasons and to allow user control of exactly what is removed.

              Whether you should look for the reverse-complement is a more complex question - depending on how the sequence is used during library prep, it may be extremely unlikely (or quite likely) to end up in a reverse-complement state. In this case, there is a trade-off between removing a small number of genuine occurrences and removing good data which is merely 'adapter-like'.

              Generally I would advise focussing on getting the most common adapter / occurrence combinations out of the data, but not trying too hard to get every last one, at the cost of real data.
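If you want to see which orientation actually occurs in your data before deciding, a rough sketch along these lines may help. The function names are made up for illustration, matching is exact only, and `reads` is assumed to be a plain list of sequence strings.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]

def orientation_counts(reads, adapter):
    """Count reads containing the adapter as given, and as its reverse complement."""
    rc = reverse_complement(adapter)
    as_given = sum(adapter in read for read in reads)
    as_revcomp = sum(rc in read for read in reads)
    return as_given, as_revcomp

reads = ["TTTGATCGGAAGAGCTTT", "AAAGCTCTTCCGATCAAA", "ACGTACGT"]
print(reverse_complement("GATCGGAAGAGC"))         # GCTCTTCCGATC
print(orientation_counts(reads, "GATCGGAAGAGC"))  # (1, 1)
```

If one orientation hardly ever turns up, that is an argument for leaving it out of the adapter file rather than risking the loss of adapter-like genuine sequence.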


              • #8
                Thanks for posting here, Tony! I was wondering about the "palindromic" mode of Trimmomatic. I have assumed that in my particular experiment, we will have read into the TruSeq indexed adapter in some cases (for the forward read). Accordingly, I tried to tag the corresponding sequences in my ILLUMINACLIP file with "Prefix" at the start, and "/1" at the end of the name:

                >Prefix75_TruSeq Adapter, Index 1/1
                (some sequence ...)

                This is how I interpreted the instructions at http://www.usadellab.org/cms/index.php?page=trimmomatic

                However, when I run Trimmomatic using this file, I get the following output (amongst other things):

                ILLUMINACLIP: Using 0 prefix pairs, 148 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

                I had been expecting that the sequences I had tagged with "Prefix" and "/1" would have been tagged as "forward-only" sequences. Now all the removal is done in 'simple' mode (which is no disaster, of course.)

                Also, the manual seems to talk about pairs of sequences:

                For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter
                Is the idea that you feed Trimmomatic pairs of adapters where /1 is the one you expect to read into in the forward read, and /2 the one you expect to read into in the reverse read? In that case, do these sequences have to have identical names (modulo the /1 or /2), or do they get paired just based on /2 appearing after /1 in the ILLUMINACLIP file?

                In my case, should I have included a /2 sequence expected to appear in the reverse read for each of my /1 sequences?

                I hope the questions weren't too unclear?
                Last edited by kopi-o; 03-27-2012, 04:39 AM. Reason: grammar


                • #9
                  Originally posted by kopi-o
                  Is the idea that you feed Trimmomatic pairs of adapters where /1 is the one you expect to read into in the forward read, and /2 the one you expect to read into in the reverse read? In that case, do these sequences have to have identical names (modulo the /1 or /2), or do they get paired just based on /2 appearing after /1 in the ILLUMINACLIP file?
                  The prefix sequences are paired by name - PrefixX/1 goes with PrefixX/2, PrefixY/1 goes with PrefixY/2. Order within the adapter file is ignored.
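The name-based pairing can be sketched roughly as below. This is only an illustration of the naming rule described here, not Trimmomatic's actual parser; `group_prefix_pairs` is a made-up name, and how Trimmomatic itself treats a Prefix entry without a mate is not covered.

```python
from collections import defaultdict

def group_prefix_pairs(names):
    """Split adapter names into complete Prefix pairs, unmatched Prefix
    entries, and all other names. The order of the input does not matter."""
    mates = defaultdict(set)
    others = []
    for name in names:
        if name.startswith("Prefix") and name[-2:] in ("/1", "/2"):
            mates[name[:-2]].add(name[-1])  # key on the name without /1 or /2
        else:
            others.append(name)
    pairs = sorted(base for base, seen in mates.items() if seen == {"1", "2"})
    unmatched = sorted(base for base, seen in mates.items() if seen != {"1", "2"})
    return pairs, unmatched, others

names = ["PrefixY/2", "PrefixX/1", "PrefixY/1", "PrefixX/2", "PrefixZ/1", "PCR_Primer1"]
print(group_prefix_pairs(names))
# (['PrefixX', 'PrefixY'], ['PrefixZ'], ['PCR_Primer1'])
```

Running something like this over the names in your adapter file is a cheap way to spot a /1 entry that has no /2 partner.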

                  And most important: the prefix sequences are the sequences effectively ligated 'before' that read, not the sequence which is found within that read (which is always the 'opposite' adapter in a read-through scenario). In palindrome mode, Trimmomatic does an 'in silico' ligation of the prefixes, and attempts to semi-globally align the resulting forward and reverse 'prefix+read' sequences.

                  In my case, should I have included a /2 sequence expected to appear in the reverse read for each of my /1 sequences?
                  Indeed - palindrome mode requires 'matched' pairs of prefix sequences. Since Illumina pairs are almost always of equal length, both adapters should be present in such pairs, so the read-through scenario can be recognised with greater confidence.
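To illustrate the read-through idea, here is a heavily simplified Python sketch. It is not Trimmomatic's algorithm: it uses exact matching instead of a scored semi-global alignment, and `find_read_through` and `min_adapter` are made-up names. It only shows why both prefixes are needed: in a read-through pair, read 1 is the insert followed by the reverse complement of the prefix ligated before read 2, and vice versa.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_read_through(read1, read2, prefix1, prefix2, min_adapter=4):
    """Return the inferred insert length if both reads run into the
    opposite adapter, otherwise None. Exact matches only."""
    tail1 = reverse_complement(prefix2)  # what follows the insert in read 1
    tail2 = reverse_complement(prefix1)  # what follows the insert in read 2
    max_len = min(len(read1), len(read2))
    for insert_len in range(max_len - min_adapter + 1):
        k1 = len(read1) - insert_len  # bases of read 1 beyond the insert
        k2 = len(read2) - insert_len  # bases of read 2 beyond the insert
        if (read1[:insert_len] == reverse_complement(read2[:insert_len])
                and read1[insert_len:][:len(tail1)] == tail1[:k1]
                and read2[insert_len:][:len(tail2)] == tail2[:k2]):
            return insert_len
    return None

# Toy pair: a 12 bp insert sequenced with 20-cycle reads.
prefix1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
prefix2 = "TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
insert = "ATGGCCTTAGCA"
read1 = (insert + reverse_complement(prefix2))[:20]
read2 = (reverse_complement(insert) + reverse_complement(prefix1))[:20]
print(find_read_through(read1, read2, prefix1, prefix2))  # 12
```

Because the insert itself has to agree between the two reads, even a few bases of adapter at the end can be called with more confidence than a simple adapter search would allow.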

                  I hope the questions weren't too unclear?
                  Not at all, but apparently my manual page needs work
                  Last edited by tonybolger; 03-27-2012, 05:01 AM.


                  • #10
                    I am using cutadapt to remove adapters. Should I expect adapters at both ends or just at the 3' end of my paired-end reads? I got 2.5% adapters when trimming the 3' end.


                    • #11
                      I am not sure about other technologies, but for Illumina the sequencing cycles start right at the 5' end of the actual sequence. So adapter contamination in the final read would only be at the 3' end.
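A bare-bones sketch of 3' trimming, for illustration only: the function name and the `min_overlap` parameter are made up, and matching is exact, whereas real trimmers such as cutadapt tolerate sequencing errors.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut the read where the adapter starts, looking only towards the 3' end."""
    # Case 1: the whole adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way through the adapter.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

adapter = "GATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTGATCGGAAGAGC", adapter))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTACGTGATCG", adapter))     # ACGTACGTACGT
print(trim_3prime_adapter("ACGTACGTACGT", adapter))          # ACGTACGTACGT
```

The second call shows the common case where only the first few adapter bases fit before the read ends.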


                      • #12
                        Originally posted by tonybolger
                        The prefix sequences are paired by name - PrefixX/1 goes with PrefixX/2, PrefixY/1 goes with PrefixY/2. Order within the adapter file is ignored.

                        And most important: the prefix sequences are the sequences effectively ligated 'before' that read, not the sequence which is found within that read (which is always the 'opposite' adapter in a read-through scenario). In palindrome mode, Trimmomatic does an 'in silico' ligation of the prefixes, and attempts to semi-globally align the resulting forward and reverse 'prefix+read' sequences.

                        Indeed - palindrome mode requires 'matched' pairs of prefix sequences. Since Illumina pairs are almost always of equal length, both adapters should be present in such pairs, so the read-through scenario can be recognised with greater confidence.

                        Not at all, but apparently my manual page needs work
                        Tony, I truly appreciate your help on this forum. I am still confused about which adapter should be assigned to /1 and which to /2 in palindromic mode. Could you please provide a sample file, even if it is just a mock without the real adapters, showing how to construct the contaminants.fa file?

                        Am I correct in thinking that, with Illumina TruSeq libraries, the first round of sequencing could potentially yield a read through to the reverse complement of the indexed adapter? The second round of sequencing could potentially yield a read-through to the reverse complement of the universal adapter? Please correct me if my understanding of the technology is incorrect.

                        Does this mean that we should make a unique universal adapter sequence for each index, i.e., copy and rename it for each indexed adaptor?

                        example:

                        >PrefixIndex1/1
                        NNNNNNNNNNNNNNNNNNNN <------ where this is the reverse complement of the index1 adaptor
                        >PrefixIndex1/2
                        NNNNNNNNNNNNNNNN <------- where this is the reverse complement of the universal adapter
                        >PrefixIndex2/1
                        NNNNNNNNNNNNNNNNNNNN <------ where this is the reverse complement of the index2 adaptor
                        >PrefixIndex2/2
                        NNNNNNNNNNNNNNNN <------- where this is the reverse complement of the universal adapter

                        ...

                        and so on, for each of the indexed adapters?

                        And then, for each of the adaptors, do we also need to include a separate set if we want to search for them in "simple" mode?


                        • #13
                          Originally posted by safay
                          Tony, I truly appreciate your help on this forum. I am still confused about which adapter should be assigned to /1 and which to /2 in palindromic mode. Could you please provide a sample file, even if it is just a mock without the real adapters, showing how to construct the contaminants.fa file?
                          The suggested pair for TruSeq3 is:
                          >PrefixPE/1
                          ACACTCTTTCCCTACACGACGCTCTTCCGATCT
                          >PrefixPE/2
                          TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
                          I would also suggest lowering the palindrome threshold from 40 to 30: the adapters from the previous protocol were longer, so a higher threshold could be reached with them.
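As a rough sketch, the adapter file and the matching clipping step could be put together like this. The ILLUMINACLIP step has the form `ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome threshold>:<simple threshold>`; the palindrome threshold of 30 is the value suggested here, while the seed mismatches (2), the simple threshold (10), the file name and the helper names are assumptions for illustration, so check them against the manual for your version.

```python
# The suggested TruSeq3 prefix pair from this post.
PREFIX_PAIR = {
    "PrefixPE/1": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PrefixPE/2": "TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
}

def adapter_fasta_text(adapters):
    """Render a name -> sequence mapping as FASTA text."""
    lines = []
    for name, seq in adapters.items():
        lines.append(f">{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"

def illuminaclip_step(fasta, seed_mismatches=2,
                      palindrome_threshold=30, simple_threshold=10):
    """Build the ILLUMINACLIP step string for a Trimmomatic command line."""
    return (f"ILLUMINACLIP:{fasta}:{seed_mismatches}:"
            f"{palindrome_threshold}:{simple_threshold}")

print(adapter_fasta_text(PREFIX_PAIR), end="")
print(illuminaclip_step("adapters.fa"))  # ILLUMINACLIP:adapters.fa:2:30:10
```

Save the FASTA text to whatever file name you pass to `illuminaclip_step`, then append the step to your usual paired-end Trimmomatic command.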

                          Originally posted by safay
                          Am I correct in thinking that, with Illumina TruSeq libraries, the first round of sequencing could potentially yield a read through to the reverse complement of the indexed adapter? The second round of sequencing could potentially yield a read-through to the reverse complement of the universal adapter? Please correct me if my understanding of the technology is incorrect.
                          This is correct, though I haven't had a coffee yet.

                          Originally posted by safay
                          Does this mean that we should make a unique universal adapter sequence for each index, i.e., copy and rename it for each indexed adaptor?
                          It's not actually necessary - using just the 'common' part of all the indexed adapters (between the 'useful' DNA and the index) seems to be sufficient.

                          Originally posted by safay
                          And then, for each of the adaptors, do we also need to include a separate set if we want to search for them in "simple" mode?
                          Yes, but there's still the 'if you want to search for them' part.

                          It probably makes sense to search for the PCR primer sequences, but I'm not sure what other technical sequences occur regularly. I would suggest looking for over-represented sequences using, say, FastQC, and trimming relatively selectively, rather than adopting a brute-force strategy.

                          It's a balance - removing valid data which looks a bit like a technical sequence (whether caused by having far too many technical sequences or thresholds too low) is at least as bad as leaving true technical sequences in there, since you potentially lose all coverage of a specific region.
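A back-of-the-envelope sketch of why low thresholds cost real data. It assumes uniformly random bases and exact matches of a fixed length, which real genomes and Trimmomatic's alignment scores do not follow, so treat the numbers as orders of magnitude only. The function name is made up.

```python
def expected_chance_matches(n_reads, read_length, match_length):
    """Expected number of purely accidental exact matches to one
    adapter fragment of match_length bases across all reads."""
    positions = max(0, read_length - match_length + 1)
    return n_reads * positions * 0.25 ** match_length

# One million 100 bp reads, one adapter fragment:
print(expected_chance_matches(1_000_000, 100, 10))  # about 87 chance hits
print(expected_chance_matches(1_000_000, 100, 20))  # far below 1
```

Every extra technical sequence in the adapter file multiplies the first number again, which is the cost of the brute-force strategy.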


                          • #14
                            Trim adapter index using CLCBio

                            Has anyone here seen TruSeq adapters among the overrepresented sequences in their sample data? What should I do with this data? Should I trim the adapter or not? If yes, how? I'm using CLCBio.


                            • #15
                              Originally posted by azleen
                              Has anyone here seen TruSeq adapters among the overrepresented sequences in their sample data?
                              This is a pretty normal situation, especially for libraries with short insert sizes and/or less than perfect size selection.

                              Originally posted by azleen
                              What should I do with this data? Should I trim the adapter or not?
                              Pretty much every possible use of NGS benefits from trimming adapters.

                              Originally posted by azleen
                              If yes, how? I'm using CLCBio.
                              Presumably it supports it, but you also have the choice of many free tools.
