  • when do you pre-process Illumina reads before analysis?

    I have some PE Illumina reads that I want to analyze with TopHat.
    By looking at the quality plot, I see some deterioration of quality at the 3' end.

    Is it advisable to trim the reads before feeding them to TopHat? If so, what criteria do I use to decide where to trim? Do I trim all reads at the same length?

    Thanks
    PFS

  • #2
    Originally posted by PFS
    Is it advisable to trim the reads before feeding them to TopHat? If so, what criteria do I use to decide where to trim? Do I trim all reads at the same length?
    Although alignments are less likely to break than de novo assembly, I'd still recommend trimming reads (unless the alignment tool itself does it).

    Each read should be trimmed on its own merits, based on the quality score.

    Typically I use an adapter removal step, a hard trim of all 'B' quality bases from the tail, removal of N calls from both ends, and a multi-base sliding window that cuts the read off when the average score per base drops below 10-20, depending on the application.

    I also usually drop reads that fall below a minimum length after this process (typically something like 36 bases, to give a 40-base read a reasonable chance of survival), since shorter reads are not usually informative. This leaves me with both paired reads and unpaired reads, the latter being reads whose partner did not survive the cull.
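
A minimal Python sketch of three of the steps listed above (the 'B' tail trim, N removal from both ends, and the minimum-length filter). It is not the poster's tool: the function name `trim_read` and its parameters are hypothetical, adapter removal and the sliding window are left out, and it assumes the Phred+64 quality encoding of that era, in which a run of 'B' characters flags an unreliable read tail.

```python
def trim_read(seq, qual, min_len=36):
    """Return (seq, qual) after trimming, or None if the read ends up too short.

    Assumption: 'B' in the quality string marks a low-quality 3' tail
    (Illumina Phred+64 style encoding).
    """
    # 1. Hard-trim the run of 'B' quality characters from the 3' tail.
    keep = len(qual.rstrip("B"))
    seq, qual = seq[:keep], qual[:keep]

    # 2. Strip N calls from both ends, keeping the qualities in step.
    start = len(seq) - len(seq.lstrip("N"))
    end = len(seq.rstrip("N"))
    seq, qual = seq[start:end], qual[start:end]

    # 3. Drop reads that are shorter than the minimum length.
    if len(seq) < min_len:
        return None
    return seq, qual
```

Each read is judged on its own quality string, so two reads from the same lane can come out at different lengths.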



    • #3
      Originally posted by tonybolger
      This gives me both paired reads and unpaired reads, where the partner has not survived the cull.
      Thanks tonybolger!

      One more question: when you are left with unpaired reads, do you try to remove them or do you keep them in the analysis and maybe use SAM flags to identify them?

      THANKS
      PFS



      • #4
        Originally posted by PFS
        One more question: when you are left with unpaired reads, do you try to remove them or do you keep them in the analysis and maybe use SAM flags to identify them?
        After filtering, I have 4 FASTQ files per lane: forward paired, reverse paired, forward unpaired and reverse unpaired.

        The pipeline from then on generally treats the paired and unpaired data differently, e.g. with alignment tools I'd use paired mode vs single mode, but depending on the purpose, it might not make sense to use the unpaired data at all (e.g. scaffolding). On the other hand, sometimes I treat all the reads as single-ended (e.g. verifying de novo assembly, where I don't want the bias of assuming the pairing is correct to force a non-optimal alignment).

        If I'm creating SAM files against a reference, I'll typically end up with 3: one for the paired data, and one for each of the unpaired data files.
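
The four-way split described in this post can be sketched as below. This is only an illustration under assumptions: `split_pairs` and the bucket names are hypothetical, `trim` stands for any per-read filter that returns the trimmed read or `None` when the read is discarded, and the results are kept in lists rather than written to FASTQ files.

```python
def split_pairs(pairs, trim):
    """Sort trimmed mates into four buckets, one per output file.

    pairs: iterable of (forward, reverse) reads.
    trim:  function returning the trimmed read, or None if it was dropped.
    """
    out = {
        "fwd_paired": [],
        "rev_paired": [],
        "fwd_unpaired": [],
        "rev_unpaired": [],
    }
    for fwd, rev in pairs:
        f, r = trim(fwd), trim(rev)
        if f is not None and r is not None:
            # Both mates survived: keep them at the same index in both lists.
            out["fwd_paired"].append(f)
            out["rev_paired"].append(r)
        elif f is not None:
            out["fwd_unpaired"].append(f)
        elif r is not None:
            out["rev_unpaired"].append(r)
        # If neither mate survived, the pair is dropped entirely.
    return out
```

Because the two paired lists grow together, they stay in the same order, which is what paired-mode aligners generally expect of their two input files.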



        • #5
          Originally posted by tonybolger
          After filtering, I have 4 FASTQ files per lane: forward paired, reverse paired, forward unpaired and reverse unpaired.

          The pipeline from then on generally treats the paired and unpaired data differently, e.g. with alignment tools I'd use paired mode vs single mode, but depending on the purpose, it might not make sense to use the unpaired data at all (e.g. scaffolding). On the other hand, sometimes I treat all the reads as single-ended (e.g. verifying de novo assembly, where I don't want the bias of assuming the pairing is correct to force a non-optimal alignment).

          If I'm creating SAM files against a reference, I'll typically end up with 3: one for the paired data, and one for each of the unpaired data files.
          Hi TonyBolger,

          Please can you tell me what software you use for the trimming? And did you write custom scripts to separate the paired and unpaired reads into different files?

          Thanks!
          Anelda
          Last edited by Anelda; 04-01-2011, 02:37 AM. Reason: Wrong person addressed



          • #6
            Ideally we'd like to leave the data alone and let the aligners use the quality values to determine how best to align the sequences. In practice, however, we usually just trim off really bad sequence (where the majority of the library has dropped to somewhere close to Q0), since this lets us use more stringent parameters when mapping, which can greatly reduce the mapping time. Fortunately, these days most runs stay at high quality past 50 bp, which is enough for the types of experiment we run.



            • #7
              Originally posted by tonybolger
              Typically i use an adapter removal step, a hard trim of all 'B' quality bases from the tail, removal of N calls from both ends, and a multi-base sliding window, typically cutting off when the average score per base drops below 10-20, depending on the application.
              Can you please elaborate a little on the sliding window stage?
              What size of window do you use and do you use any existing tool to do it?
              thanks!



              • #8
                Originally posted by Anelda
                Hi TonyBolger,

                Please can you tell me what software you use to do the trimming with? And did you write custom scripts to separate the paired vs unpaired into different files?
                It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.



                • #9
                  Originally posted by tonybolger
                  It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                  You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.
                  Would be great :-))



                  • #10
                    Originally posted by reut
                    Can you please elaborate a little on the sliding window stage?
                    What size of window do you use and do you use any existing tool to do it?
                    Normally I use a window width of 4 bases, and an average quality per base within the window of between 10 and 20. It's a custom-written tool, soon to be made publicly available.
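
A rough Python sketch of a sliding-window trim with the parameters mentioned here (4-base window, average quality threshold somewhere between 10 and 20). The post does not say exactly where the read is cut once a window fails, so cutting at the start of the first failing window is an assumption, as are the function name `sliding_window_trim`, the default threshold of 15 and the default quality offset of 64 (use 33 for Sanger-encoded FASTQ).

```python
def sliding_window_trim(seq, qual, window=4, min_avg=15, offset=64):
    """Scan from the 5' end and cut where the window average first drops.

    Assumption: the read is truncated at the first base of the first
    window whose mean quality is below min_avg.
    """
    scores = [ord(c) - offset for c in qual]
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < min_avg:
            # Keep only the bases before the failing window.
            return seq[:i], qual[:i]
    # No window failed (or the read is shorter than one window).
    return seq, qual
```

Averaging over several bases means a single poor base inside an otherwise good stretch does not end the read, while a sustained drop in quality does.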



                    • #11
                      thanks

                      Thanks, please let us know when you publish the tool; it will be useful for us as well.



                      • #12
                        @Tonybolger
                        Yes, such a tool would be nice to have! Thanks in advance!



                        • #13
                          Originally posted by tonybolger
                          It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                          You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.
                          Ah look, it's been almost a month already.

                          Anyway, Trimmomatic is ready for release.

                          Just one issue: does anyone know if Illumina adapter and other sequences can be included in such a tool? I assume I would need to get specific clearance for this. Otherwise each user would need to find and organise the clipping sequences themselves, which is a bit of a pain.



                          • #14
                            FastQC includes adapters

                            I don't know if you can use the Illumina adapters in your tool,
                            but I do know the FastQC tool by Simon Andrews includes a library of adapters and possible contaminants.
                            If it's of any help...



                            • #15
                              BTW, we've been using fastx for the adapter clipping, N removal and 3' trimming (no window though). It works fast and well. The only part missing, which we wrote in-house, is a pass over the files afterwards to see which pairs aren't pairs anymore.

                              Like tonybolger, we discard reads when they fall below ~30 bp, so some pairs don't stay paired.

                              Our script creates 3 files: pair1, pair2 and singles.
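
The re-pairing pass described in this post might look roughly like the sketch below. It is not the in-house script: `repair` is a hypothetical name, reads are represented as `(read_id, seq, qual)` tuples held in memory, and the read IDs are assumed to be identical for both mates (any `/1` or `/2` suffix already removed).

```python
def repair(reads1, reads2):
    """Split two independently filtered mate files into pair1, pair2, singles.

    reads1, reads2: lists of (read_id, seq, qual) tuples that survived
    filtering; mates are assumed to share the same read_id.
    """
    ids1 = {r[0] for r in reads1}
    ids2 = {r[0] for r in reads2}

    # A read stays paired only if its mate's ID is present in the other file.
    pair1 = [r for r in reads1 if r[0] in ids2]
    pair2 = [r for r in reads2 if r[0] in ids1]

    # Everything else has lost its partner and goes to the singles file.
    singles = [r for r in reads1 if r[0] not in ids2]
    singles += [r for r in reads2 if r[0] not in ids1]
    return pair1, pair2, singles
```

If both input files keep the original read order, `pair1` and `pair2` come out in matching order as well.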
