  • #16
    Here is the histogram:



    So the aberrant inserts of approx 300bp are clearly caused by an error in how Bowtie classifies insert length.

    I think this may affect those inserts in the 280-300bp range whose adapters were not trimmed in the FASTQ. What do you think?

    If so, it would explain why there is a dip in apparent insert lengths in the 280-300bp range, and the anomalous peak at 300bp.

    Comment


    • #17
      Since you now have BBMap installed, you can easily check with BBDuk to see how many reads still have adapter contamination. I assume you have not done any trimming on these reads. Specify the appropriate adapter file when you trim. Standard Illumina adapter files are in "/path_to/bbmap-xx.xx/bbmap/resources/".
      Last edited by GenoMax; 02-05-2015, 06:11 AM.
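
As a rough cross-check that does not depend on BBDuk, the sketch below counts reads that contain an adapter or end in the first few bases of one. It uses exact string matching only, so it is much cruder than BBDuk's k-mer matching. `ADAPTER_PREFIX` is an assumption (the start of the common TruSeq adapter); the thread does not say which kit was used, so substitute the sequence for your own library.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: start of the common Illumina TruSeq adapter. Replace it with the
# adapter for your own library prep; the thread does not name the kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def has_adapter_tail(seq: str, adapter: str, min_len: int = 6) -> bool:
    """True if seq contains the adapter, or ends with >= min_len of its leading bases."""
    if adapter in seq:
        return True
    longest = min(len(adapter) - 1, len(seq))
    return any(seq.endswith(adapter[:n]) for n in range(longest, min_len - 1, -1))


def count_adapter_reads(lines: Iterable[str], adapter: str = ADAPTER_PREFIX,
                        min_len: int = 6) -> Tuple[int, int]:
    """Return (reads with adapter residue, total reads)."""
    seqs = list(fastq_sequences(lines))
    hits = sum(has_adapter_tail(s, adapter, min_len) for s in seqs)
    return hits, len(seqs)
```

To use it on a real file, pass an open file handle, e.g. `count_adapter_reads(open("reads_R1.fastq"))`, where the file name is hypothetical.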

      Comment


      • #18
        Trimming was performed automatically by Illumina prior to FASTQ download, but I suspect there will be short tails left on a subset of fragments.

        Comment


        • #19
          No harm in trying a pass through BBDuk to verify (especially if you want to get rid of those tails).

          Comment


          • #20
            Initial:
            Memory: free=1039m, used=21m

            Added 16767 kmers; time: 0.225 seconds.
            bbduk output:

            Memory: free=1032m, used=28m

            Input is being processed as paired
            Started output streams: 0.010 seconds.
            Processing time: 14.997 seconds.

            Input: 2546118 reads 645049609 bases.
            KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
            Result: 467188 reads (18.35%) 11545446 bases (1.79%)

            Time: 15.238 seconds.
            Reads Processed: 2546k 167.09k reads/sec
            Bases Processed: 645m 42.33m bases/sec
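
A quick way to sanity-check a summary like the one above is to recompute the percentages from the raw counts. The sketch below does that for the three summary lines copied from this log; the regular expression is an assumption about the line layout as posted here, not a documented BBDuk format. In this log the KTrimmed and Result base counts add up to the input, which suggests the KTrimmed base figure is the number of bases removed, leaving only 1.79% of bases.

```python
import re

# The three summary lines, copied from the log above.
SUMMARY = """\
Input: 2546118 reads 645049609 bases.
KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
Result: 467188 reads (18.35%) 11545446 bases (1.79%)
"""

# Assumed layout: "Label: N reads [...] M bases"
LINE_RE = re.compile(r"^(\w+):\s+(\d+) reads.*?(\d+) bases", re.MULTILINE)


def parse_summary(text):
    """Return {label: (reads, bases)} for every matching summary line."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in LINE_RE.finditer(text)}


def percent_of_input(text):
    """Express each non-input line as (% of input reads, % of input bases)."""
    counts = parse_summary(text)
    in_reads, in_bases = counts["Input"]
    return {
        label: (round(100 * reads / in_reads, 2), round(100 * bases / in_bases, 2))
        for label, (reads, bases) in counts.items()
        if label != "Input"
    }


if __name__ == "__main__":
    for label, (pct_reads, pct_bases) in percent_of_input(SUMMARY).items():
        print(f"{label}: {pct_reads}% of reads, {pct_bases}% of bases")
```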

            Comment


            • #21
              Hmmm... I don't think I have BBDuk working correctly.

              Original Fastq file:

              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTTTACATACTGTCGTTCATTATCCTCTTATCTTATCAAACCTTGCTTTTCATCTTTCTTTTTTTTTTTTTTCTTTTCTCTTTCTTTTCTCCTTTCTACCCTCTATTTTGTTTTCTTTTTTTCTTCAGTTTTTCATCTTTTATTATTCTCTTTTATCTCTGTTTGGATAGCGCCTGACTGAAGTTATAACTTCGGTCATGTTATCTGT
              +
              CBCCCFFFFE9FAFFG9@,C,C@,CC8FF<8CC,CC@F,CC,C,;C,C@,,,;6CE,6,<CC@,,;<CC@,<C,,+78@++7++6<C,<6<,,,<<C?,,5,,,,,<,,,9,,:9,,,9,,,::,:@???<=++8,88,,,,:<8,,,5,,,:,,:,,,5,,,::A,,575,,:8,,+,,,,3+38++,8>,,,,,,,,,,7>@=**66*,,77,@@>,,
              @M00561:19:000000000-ABAUW:1:1101:13213:1452 1:N:0:9
              CCGTATTACCTGCCGCATCATTGTGAGTTGAAGATACATGTGCGGTTGATTTTATCTGGCTAGGCTACGTATTTCTATTTTTTTTCTCCCTTTCTTTTCCTTTTTTCTTTTTTTTTTCTTTTTTCTTTTCTTCTTCTGTTTTTTTTTATTTTTTCTTTTCTTACTCTTTTACTCCAGTGCCTTCAGATGTCTTTTTCTTCGCATTTTCCATTCTTTTTTATTTTTCCTATTTCCTTTATTTCCTCTACCCATCGTTAATATCACTTTCGTTTACTTTCACGTTAGGTTATACCG
              +
              8@<A9EF-C@C<AC+@+B@,CF9C9,,CF,,,,,C,C,C,<,6,+BF+,ECEF,C@E,,;C,,,6,,;,,,,,,69,<,9,,8++:96,::,,6<CC,969,,9,+4,,,<9+++4+4<,,9:@4,<8<8<,,,,5,,<,8,:6++8,::B,,+5,:,:7>,,7,7@,,7,7@>3,,7,77,77,,,7,,7@@<DB>,,6***66@,42,66,6,6,,3*5++5+35++++++3++5+5+5+3+54+2+*+30**2**+**0*2::C***09*2*2*1***0*)))19*2)/*)
              @M00561:19:000000000-ABAUW:1:1101:15428:1464 1:N:0:9
              CACCACCTCTTTTCATGGTACCATTTGCACGCTCCAAACTTGCATAGTGACCTTTTTCGATTAATTGACCAAAGTCAACATTATAACCGTCCTTTTTTTCATCCCTCTTTTCCTCCTCTCTCTTCTCCTCCTCTCCCATCCCCCTATTCAACAGGCCATCTCGTTTTCCTTCTTCTTCTTTTTAACCAATATTTTCTTTTCTTTTCCTTCTTCTTTTTTCCTTTTCCTTTTTTTTTTTTCTCTCTTTTCTCTTCTTTTTTTCCCTCTTTTCTCCTTTTTTTCTTTCTTTTTTCTTCTTT
              +
              8-BB9FFGFFGGGG9F9-C<EE9FFFACAF,@@@@<89DFE9F9F,<C69CCFFEFFE+,<C,,<,,,<;,,,,<,,,9,,,,,,,<,+88@@C,,:++9,,<,:,9:,<<66994=,<:5:9C,<,5,4,94994,,,9:4+8++,,:,,,,,,4+,448:,:+6+:7,:8???@,7A<,,,,,,,,+,,,,,,,,,,,,33,,,,,,,,,,,,,,,+,,,,,,,,,,,,,*******,1++++++++++++++++++))+**+**0*0***0**/**)))********)))****-*
              @M00561:19:000000000-ABAUW:1:1101:12573:1467 1:N:0:9
              CAGCTTATCACCCCGGAATTGGTTTATCCGGAGATGGGGTCTTATGGCTGGAAGAGGCCAGCACCTTTTCTCCCTCCTTTTCTCTTCTGCCGGCCCTTTATATTCCACTCGTATTTTTTGTTTTCTTTCCCTTTCTTACTTTTAACCTCTTCTTGTCTCCTATGTGACCAGCCTCTATTTTTTATTATAATTTTGATAACGTTTGTCTGCTCTTTATCTCCTTCACTTCTTGTTACCTATTTTCTCTCTTCTTCGTGTTTTTAGTGCCTTGGTCTGCCGCAGCGGGCGTGCTTGTTGAC

              Cleaned FASTQ:


              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTT
              +
              CBCCCFFFFE9FAFF
              @M00561:19:000000000-ABAUW:1:1101:13609:1492 1:N:0:9
              TTGTAAAGCATCG
              +
              BC<CCCD<F9FF>
              @M00561:19:000000000-ABAUW:1:1101:17917:1554 1:N:0:9
              TGCTGGACCTGTG
              +
              6-AAB9EFGG8F,
              @M00561:19:000000000-ABAUW:1:1101:10142:1572 1:N:0:9
              TTACTGGCGTCCTTGCTTTCTCCTTC


              It appears to have truncated all my reads by a massive amount.

              Comment


              • #22
                Yikes!

                Can you post your command line for BBDuk? You are trimming the original files, correct (not the merged ones)?

                Comment


                • #23
                  Sorry, my fault. I wrote:

                  k=28 k=12

                  instead of

                  k=28 mink=12

                  New Output:


                  Memory: free=1041m, used=19m

                  Added 126482 kmers; time: 0.091 seconds.
                  Memory: free=1032m, used=28m

                  Input is being processed as paired
                  Started output streams: 0.008 seconds.
                  Processing time: 69.585 seconds.

                  Input: 2546118 reads 645049609 bases.
                  KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
                  Result: 2546034 reads (100.00%) 644452600 bases (99.91%)

                  Time: 69.691 seconds.
                  Reads Processed: 2546k 36.53k reads/sec
                  Bases Processed: 645m 9.26m bases/sec
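
One possible reading of the difference between the two runs: if the repeated `k=12` overrode `k=28` (an assumption; the thread does not confirm how BBDuk handled the duplicate), every 12-mer in the adapter set became a trimming trigger. The back-of-the-envelope sketch below estimates how often a random read would hit such a set by chance. It assumes uniform, independent bases, exact matches only, and 300-base reads; the k-mer counts come from the two logs.

```python
def spurious_hit_probability(k: int, n_ref_kmers: int, read_len: int) -> float:
    """Chance that at least one k-mer of a random read exactly matches the reference set.

    Assumes uniform, independent bases and treats overlapping windows as
    independent, so this is only an order-of-magnitude estimate.
    """
    if read_len < k:
        return 0.0
    p_window = min(1.0, n_ref_kmers / 4 ** k)  # chance one window matches
    n_windows = read_len - k + 1               # k-mer windows per read
    return 1.0 - (1.0 - p_window) ** n_windows


# k-mer counts from the two logs; 300-base reads assumed (2x300 run).
p_short = spurious_hit_probability(12, 16767, 300)   # first run, k presumably 12
p_long = spurious_hit_probability(28, 126482, 300)   # second run, k=28 mink=12

if __name__ == "__main__":
    print(f"k=12: {p_short:.3f}   k=28: {p_long:.2e}")
```

Under these assumptions roughly a quarter of purely random reads would be hit at k=12, against essentially none at k=28. That alone does not reach the 99.90% seen in the first run. Any mismatch tolerance in the command, and the low-complexity tails visible in these reads, would presumably push the rate higher, but the thread does not show enough to confirm this.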

                  Comment


                  • #24
                    Much better. So a few had some adapters left over.

                    Comment


                    • #25
                      If I understand it correctly, adapter fragments shorter than 12 bases will still be left even after this cleaning process. Should I specify a smaller value for mink than 12 to deal with this?

                      Or perhaps there is a different tool for trimming based upon the degree of overlap between read pairs?

                      i.e. IF overlap is less than read length THEN truncate reads to overlap length.
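
The rule proposed above can be sketched in a few lines. This is a minimal illustration of the idea, not how BBDuk or BBMerge implement it: when the insert is shorter than the read, the first `insert` bases of read 1 equal the reverse complement of the first `insert` bases of read 2, and everything after that is adapter. The function and parameter names are hypothetical, and quality strings are ignored.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_by_overlap(r1: str, r2: str, min_insert: int = 12, max_mismatches: int = 0):
    """Truncate both reads to the insert length when the insert is shorter than the reads.

    Tries candidate insert lengths from longest to shortest and accepts the first
    one where read 1 matches the reverse complement of read 2. Reads are returned
    unchanged if no candidate fits.
    """
    longest = min(len(r1), len(r2))
    for insert in range(longest, min_insert - 1, -1):
        if mismatches(r1[:insert], revcomp(r2[:insert])) <= max_mismatches:
            return r1[:insert], r2[:insert]
    return r1, r2


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGT"            # 20-base insert
    read1 = fragment + "AGATCGGAAG"               # insert + made-up adapter tail
    read2 = revcomp(fragment) + "CTGTCTCTTA"      # insert + made-up adapter tail
    print(trim_by_overlap(read1, read2))
```

Because this works from the overlap rather than from the adapter sequence, it can remove adapter tails of any length, including those shorter than a `mink` cutoff. Real data would need a mismatch allowance and some protection against low-complexity sequence producing false overlaps.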

                      Comment


                      • #26
                        Actually, it looks like BBMerge can do this with the tbo flag, as discussed in another thread on this forum.


                        I wonder if there is any benefit to processing our reads through this pipeline prior to aligning to the reference?

                        Comment


                        • #27
                          Having reads free of extraneous sequence is going to benefit all downstream analysis, so you should do that trimming. Brian mentioned the tbo flag in his earlier post.

                          Comment


                          • #28
                            In case anyone is interested: cleaning the 2x300bp FASTQ files with BBDuk (parameters k=28 mink=12) produced two cleaned read files which, when realigned with Bowtie2 and opened in SeqMonk, give a read distribution that looks like this:



                            So the residual adapter sequences are clearly causing some issues with how Bowtie calls insert length, something that is partially corrected by BBDuk (but not fully, probably because of the mink=12 parameter).

                            If we have time tomorrow, we'll redo with the tbo flag.
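
For anyone who wants to reproduce this kind of insert-length distribution outside SeqMonk, the sketch below bins the template length (TLEN, column 9 of the SAM format) from an aligner's SAM output. It is a minimal illustration under stated assumptions: only properly paired alignments (flag bit 0x2) are counted, each pair is counted once via its positive TLEN, and the 10 bp bin width is arbitrary.

```python
from collections import Counter
from typing import Dict, Iterable


def insert_size_histogram(sam_lines: Iterable[str], bin_width: int = 10) -> Dict[int, int]:
    """Bin template lengths (SAM column 9) of properly paired reads.

    Returns {bin start: pair count}, sorted by bin start.
    """
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 = properly paired; tlen > 0 keeps only the leftmost mate,
        # so each pair contributes one count.
        if flag & 0x2 and tlen > 0:
            hist[(tlen // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))
```

Running it on the alignments before and after trimming, e.g. `insert_size_histogram(open("aligned.sam"))` with a hypothetical file name, would show whether the peak at 300bp shrinks.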

                            Comment


                            • #29
                              Originally posted by GenoMax
                              Having reads free of extraneous sequence is going to benefit all downstream analysis, so you should do that trimming. Brian mentioned the tbo flag in his earlier post.
                              Oh dear. My apologies for not reading and understanding it all. Yes, thanks to Brian and GenoMax. It's been a tiring day, but thanks for all the help given to a complete novice!

                              Comment


                              • #30
                                A tiring day perhaps, but I am glad it ended well.

                                Good luck with the rest of your analysis.

                                Comment
