
  • #16
    Here is the histogram:



    So the aberrant inserts of approx 300bp are clearly caused by an error in how Bowtie classifies insert length.

    I do think this might be for those inserts in the 280-300bp range that fail to have adapters trimmed in the FastQ. What do you think?

    If so, it would explain why there is a dip in apparent insert lengths in the 280-300bp range, and the anomalous peak at 300bp.



    • #17
      Since you now have BBMap installed, you can easily check with BBDuk to see how many reads still have adapter contamination. I assume you have not done any trimming on these reads. Specify the appropriate adapter file when you trim. Standard Illumina adapter files are in "/path_to/bbmap-xx.xx/bbmap/resources/".
      Last edited by GenoMax; 02-05-2015, 06:11 AM.



      • #18
        Trimming was performed automatically by Illumina prior to FASTQ download, but I suspect there will be short tails left on a subset of fragments.



        • #19
          No harm in trying a pass through BBDuk to verify (especially if you want to get rid of those tails).



          • #20
            Initial:
            Memory: free=1039m, used=21m

            Added 16767 kmers; time: 0.225 seconds.
            bbduk output:

            Memory: free=1032m, used=28m

            Input is being processed as paired
            Started output streams: 0.010 seconds.
            Processing time: 14.997 seconds.

            Input: 2546118 reads 645049609 bases.
            KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
            Result: 467188 reads (18.35%) 11545446 bases (1.79%)

            Time: 15.238 seconds.
            Reads Processed: 2546k 167.09k reads/sec
            Bases Processed: 645m 42.33m bases/sec
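
A quick way to sanity-check a summary like the one above is to recompute the kept fractions from the raw counts. This is only a minimal sketch: the regular expression is written against the three summary lines as they appear in this post, and the function names are made up for illustration rather than being part of BBTools.

```python
import re

# Summary lines copied from the BBDuk log in this post.
SUMMARY = """\
Input: 2546118 reads 645049609 bases.
KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
Result: 467188 reads (18.35%) 11545446 bases (1.79%)
"""

# Captures the label, the read count, and the base count on each line.
LINE = re.compile(r"^(Input|KTrimmed|Result):\s+(\d+) reads.*?(\d+) bases", re.M)


def parse_summary(text):
    """Return {label: (reads, bases)} for the summary lines found in text."""
    return {name: (int(reads), int(bases)) for name, reads, bases in LINE.findall(text)}


def fraction_kept(stats):
    """Return (fraction of reads kept, fraction of bases kept)."""
    in_reads, in_bases = stats["Input"]
    out_reads, out_bases = stats["Result"]
    return out_reads / in_reads, out_bases / in_bases


stats = parse_summary(SUMMARY)
reads_kept, bases_kept = fraction_kept(stats)
print(f"reads kept: {reads_kept:.2%}, bases kept: {bases_kept:.2%}")
```

When only a few percent of the bases survive adapter trimming, as here, the parameters are a more likely culprit than the library.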



            • #21
              Hmmm..I don't think I have bbduk working correctly.

              Original Fastq file:

              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTTTACATACTGTCGTTCATTATCCTCTTATCTTATCAAACCTTGCTTTTCATCTTTCTTTTTTTTTTTTTTCTTTTCTCTTTCTTTTCTCCTTTCTACCCTCTATTTTGTTTTCTTTTTTTCTTCAGTTTTTCATCTTTTATTATTCTCTTTTATCTCTGTTTGGATAGCGCCTGACTGAAGTTATAACTTCGGTCATGTTATCTGT
              +
              CBCCCFFFFE9FAFFG9@,C,C@,CC8FF<8CC,CC@F,CC,C,;C,C@,,,;6CE,6,<CC@,,;<CC@,<C,,+78@++7++6<C,<6<,,,<<C?,,5,,,,,<,,,9,,:9,,,9,,,::,:@???<=++8,88,,,,:<8,,,5,,,:,,:,,,5,,,::A,,575,,:8,,+,,,,3+38++,8>,,,,,,,,,,7>@=**66*,,77,@@>,,
              @M00561:19:000000000-ABAUW:1:1101:13213:1452 1:N:0:9
              CCGTATTACCTGCCGCATCATTGTGAGTTGAAGATACATGTGCGGTTGATTTTATCTGGCTAGGCTACGTATTTCTATTTTTTTTCTCCCTTTCTTTTCCTTTTTTCTTTTTTTTTTCTTTTTTCTTTTCTTCTTCTGTTTTTTTTTATTTTTTCTTTTCTTACTCTTTTACTCCAGTGCCTTCAGATGTCTTTTTCTTCGCATTTTCCATTCTTTTTTATTTTTCCTATTTCCTTTATTTCCTCTACCCATCGTTAATATCACTTTCGTTTACTTTCACGTTAGGTTATACCG
              +
              8@<A9EF-C@C<AC+@+B@,CF9C9,,CF,,,,,C,C,C,<,6,+BF+,ECEF,C@E,,;C,,,6,,;,,,,,,69,<,9,,8++:96,::,,6<CC,969,,9,+4,,,<9+++4+4<,,9:@4,<8<8<,,,,5,,<,8,:6++8,::B,,+5,:,:7>,,7,7@,,7,7@>3,,7,77,77,,,7,,7@@<DB>,,6***66@,42,66,6,6,,3*5++5+35++++++3++5+5+5+3+54+2+*+30**2**+**0*2::C***09*2*2*1***0*)))19*2)/*)
              @M00561:19:000000000-ABAUW:1:1101:15428:1464 1:N:0:9
              CACCACCTCTTTTCATGGTACCATTTGCACGCTCCAAACTTGCATAGTGACCTTTTTCGATTAATTGACCAAAGTCAACATTATAACCGTCCTTTTTTTCATCCCTCTTTTCCTCCTCTCTCTTCTCCTCCTCTCCCATCCCCCTATTCAACAGGCCATCTCGTTTTCCTTCTTCTTCTTTTTAACCAATATTTTCTTTTCTTTTCCTTCTTCTTTTTTCCTTTTCCTTTTTTTTTTTTCTCTCTTTTCTCTTCTTTTTTTCCCTCTTTTCTCCTTTTTTTCTTTCTTTTTTCTTCTTT
              +
              8-BB9FFGFFGGGG9F9-C<EE9FFFACAF,@@@@<89DFE9F9F,<C69CCFFEFFE+,<C,,<,,,<;,,,,<,,,9,,,,,,,<,+88@@C,,:++9,,<,:,9:,<<66994=,<:5:9C,<,5,4,94994,,,9:4+8++,,:,,,,,,4+,448:,:+6+:7,:8???@,7A<,,,,,,,,+,,,,,,,,,,,,33,,,,,,,,,,,,,,,+,,,,,,,,,,,,,*******,1++++++++++++++++++))+**+**0*0***0**/**)))********)))****-*
              @M00561:19:000000000-ABAUW:1:1101:12573:1467 1:N:0:9
              CAGCTTATCACCCCGGAATTGGTTTATCCGGAGATGGGGTCTTATGGCTGGAAGAGGCCAGCACCTTTTCTCCCTCCTTTTCTCTTCTGCCGGCCCTTTATATTCCACTCGTATTTTTTGTTTTCTTTCCCTTTCTTACTTTTAACCTCTTCTTGTCTCCTATGTGACCAGCCTCTATTTTTTATTATAATTTTGATAACGTTTGTCTGCTCTTTATCTCCTTCACTTCTTGTTACCTATTTTCTCTCTTCTTCGTGTTTTTAGTGCCTTGGTCTGCCGCAGCGGGCGTGCTTGTTGAC

              Cleaned FASTQ:


              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTT
              +
              CBCCCFFFFE9FAFF
              @M00561:19:000000000-ABAUW:1:1101:13609:1492 1:N:0:9
              TTGTAAAGCATCG
              +
              BC<CCCD<F9FF>
              @M00561:19:000000000-ABAUW:1:1101:17917:1554 1:N:0:9
              TGCTGGACCTGTG
              +
              6-AAB9EFGG8F,
              @M00561:19:000000000-ABAUW:1:1101:10142:1572 1:N:0:9
              TTACTGGCGTCCTTGCTTTCTCCTTC


              It appears to have truncated all my reads by a massive amount.
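
The truncation can be quantified by listing read lengths before and after cleaning. Below is a minimal sketch that assumes plain four-line FASTQ records. The three sequences are the complete cleaned records shown above; the quality strings are replaced with placeholder characters of the same length, and the function name is made up for illustration.

```python
from statistics import median


def read_lengths(fastq_text):
    """Return the sequence length of every record in four-line FASTQ text."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected complete four-line FASTQ records")
    # The sequence is the second line of each record.
    return [len(lines[i + 1].strip()) for i in range(0, len(lines), 4)]


# Cleaned sequences from this post; qualities are placeholders ("I").
cleaned_sequences = ["TCTCCCTTTTATCTT", "TTGTAAAGCATCG", "TGCTGGACCTGTG"]
cleaned_fastq = "\n".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}" for i, seq in enumerate(cleaned_sequences)
)

lengths = read_lengths(cleaned_fastq)
print("lengths:", lengths, "median:", median(lengths))
```

Running the same function on the original file and comparing the two medians shows at a glance whether a trimming step has removed far more than adapter tails.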



              • #22
                Yikes!

                Can you post your command line for BBDuk? You are trimming the original files, correct (not the merged ones)?



                • #23
                  Sorry my fault. I wrote:

                  k=28 k=12

                  instead of

                  k=28 mink=12

                  New Output:


                  Memory: free=1041m, used=19m

                  Added 126482 kmers; time: 0.091 seconds.
                  Memory: free=1032m, used=28m

                  Input is being processed as paired
                  Started output streams: 0.008 seconds.
                  Processing time: 69.585 seconds.

                  Input: 2546118 reads 645049609 bases.
                  KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
                  Result: 2546034 reads (100.00%) 644452600 bases (99.91%)

                  Time: 69.691 seconds.
                  Reads Processed: 2546k 36.53k reads/sec
                  Bases Processed: 645m 9.26m bases/sec
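
A rough back-of-the-envelope calculation suggests why the mistyped run trimmed so much. The sketch below assumes the first run effectively used k=12, that read bases are uniformly random and windows independent, and that matching is exact. None of that is confirmed for BBDuk or true of these reads, so it is only an order-of-magnitude illustration. The kmer counts come from the two logs in this thread; the read length is the average implied by the Input line.

```python
def chance_hit_probability(num_ref_kmers, k, read_length):
    """P(at least one k-long window of a uniformly random read equals some reference kmer)."""
    windows = read_length - k + 1
    per_window = min(1.0, num_ref_kmers / 4 ** k)
    return 1.0 - (1.0 - per_window) ** windows


READ_LENGTH = 253  # 645049609 bases / 2546118 reads, rounded

# 16767 kmers were loaded in the mistyped run, 126482 in the corrected one.
p_k12 = chance_hit_probability(16767, 12, READ_LENGTH)
# Using the full 126482 count for k=28 overstates it, since that total
# presumably includes the shorter kmers added by mink.
p_k28 = chance_hit_probability(126482, 28, READ_LENGTH)

print(f"k=12 chance hit per read: {p_k12:.3f}")
print(f"k=28 chance hit per read: {p_k28:.2e}")
```

Even under this idealized model, a substantial fraction of reads pick up a spurious 12-mer hit, while 28-mer hits are negligible. Low-complexity reads like the ones posted earlier would presumably fare worse than random sequence, which may explain why nearly every read was trimmed.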



                  • #24
                    Much better. So a few had some adapters left over.



                    • #25
                      If I understand it correctly, adapter fragments shorter than 12 bases will still be left even after this cleaning process. Should I specify a smaller value for mink than 12 to deal with this?

                      Or perhaps there is a different tool for trimming based upon the degree of overlap between read pairs?

                      i.e. IF overlap is less than read length THEN truncate reads to overlap length.
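
The rule above can be written out as a minimal Python sketch. It assumes both reads are given 5' to 3' as they come off the sequencer, so that when the insert is shorter than the read, the first part of read 1 is the reverse complement of the first part of read 2. The parameters min_insert and max_mismatch_rate are made up for illustration, and this is not how BBMerge or BBDuk implement overlap detection.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_insert_length(r1, r2, min_insert=10, max_mismatch_rate=0.1):
    """Return the insert length if the pair overlaps end to end, else None.

    A candidate length fits when the first n bases of read 1 match the
    reverse complement of the first n bases of read 2.
    """
    limit = min(len(r1), len(r2))
    for n in range(limit, min_insert - 1, -1):
        a = r1[:n]
        b = reverse_complement(r2[:n])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_rate * n:
            return n
    return None


def trim_pair_by_overlap(r1, r2, **kwargs):
    """Truncate both reads to the insert length when one is detected."""
    n = find_insert_length(r1, r2, **kwargs)
    if n is None:
        return r1, r2
    return r1[:n], r2[:n]
```

Because it does not need to recognize the adapter sequence itself, this kind of check can in principle remove adapter tails of only a few bases, which kmer matching with mink=12 cannot see.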



                      • #26
                        Actually it looks like bbmerge can do this with the tbo flag.


                        I wonder if there is any benefit to processing our reads through this pipeline prior to aligning to the reference?



                        • #27
                          Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned the tbo flag in his earlier post.



                          • #28
                            In case anyone is interested, the FASTQ 2x300bp files cleaned up with bbduk with parameters of k=28 mink=12 resulted in two cleaned reads, which when realigned with Bowtie2 and then opened in SeqMonk create a read distribution that looks like this:



                            So clearly the residual adapter sequences are causing some issues with Bowtie correctly calling insert length, something that is partially corrected by bbduk (but not fully, probably due to the mink=12 parameter).

                            If we have time tomorrow, we'll redo with the tbo flag.
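
The mink limitation mentioned above can be illustrated with a simplified model of right-end kmer trimming. This is a sketch under stated assumptions, not BBDuk's implementation: it uses exact matching only, checks a single adapter, looks for the first k adapter bases anywhere in the read, and looks for shorter adapter prefixes (down to mink) only at the very end of the read. The adapter string is a placeholder.

```python
# Placeholder adapter; substitute the sequence for the kit actually used.
ADAPTER = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"


def trim_adapter_tail(read, adapter=ADAPTER, k=28, mink=12):
    """Trim adapter from the 3' end of a read (simplified, exact matching)."""
    k = min(k, len(adapter))
    # A full-length kmer match anywhere removes it and everything to its right.
    pos = read.find(adapter[:k])
    if pos != -1:
        return read[:pos]
    # Shorter adapter prefixes are only trusted at the tip of the read.
    for n in range(k - 1, mink - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    # Tails shorter than mink are left in place.
    return read


insert = "ACGTTGCAAGGCTTACGATCGATTGACC"
print(len(trim_adapter_tail(insert + ADAPTER[:15])))  # tail of 15 bases
print(len(trim_adapter_tail(insert + ADAPTER[:8])))   # tail of 8 bases
```

In this model a 15-base adapter tail is removed but an 8-base tail survives with mink=12. Lowering mink would catch shorter tails at the cost of more chance matches, which is why an overlap-based check is an attractive complement.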



                            • #29
                              Originally posted by GenoMax
                              Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned the tbo flag in his earlier post.
                              Oh dear. My apologies for not reading/understanding it all. Yes, thanks to Brian and GenoMax. It's been a tiring day, but thanks for all the help given to a complete novice!



                              • #30
                                A tiring day perhaps, but I am glad it ended well.

                                Good Luck with the rest of your analysis.
