Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • In addition to the @RG tag and such in the header, there's then an "RG:Z:Something" auxiliary field in each read.

    @frozenlyse, Bison might work for you (it's more of a niche application), if you have a bit more computational resources to use and are happy with using bowtie2 as the underlying aligner. I haven't tried using read groups, but they should work exactly as if you were using bowtie2 directly.

    Comment


    • Originally posted by fkrueger View Post
      We just released a new version of Bismark (v0.8.1) which addresses a few minor issues such as plot colours and legends, but notably also extends the '--ignore' option of the methylation extractor to now work independently for Read 1 and Read 2 of paired-end files. This allows more flexible control of removing biases that can now be visualised via the M-bias plot. Here are the current changes in more detail:

      • Methylation Extractor: Changed the function of the option '--ignore <int>' to ignore the first <int> bp from the 5' end of single-end reads or Read 1 of paired-end files. In addition, added a new option '--ignore_r2 <int>' to ignore the first <int> bp from the 5' end of Read 2 of paired-end files. Since the first couple bases in Read 2 of BS-Seq experiments show a severe bias towards non-methylation as a result of the end-repair of sonicated fragments with unmethylated cytosines (see M-bias plot), it is recommended that the the first couple of bp of Read 2 are removed before starting downstream analysis. Please see the section on M-bias plots in the Bismark User Guide for more details
      • Methylation Extractor: Changed colours, legends and background colour of the M-bias plot
      • Bismark: Changed the way in which the alignment overview file is being named to actually work

      Together with the options '--clip_r1' and '--clip_r2' of Trim Galore one can now choose to remove known biases, e.g. for PBAT-Seq, prior to running the alignments using Trim Galore, or ignoring biased methylation calls after the alignments have been carried out using the methylation extractor.

      Bismark can be downloaded here: https://www.bioinformatics.babraham....jects/bismark/
      I'm very happy to see the M-bias feature added to Bismark! Especially because it is better than my own basic implementation of this idea and thus saves me the trouble of refining mine

      I've had a quick look at the code, but not yet had a chance to run it, and have a couple of questions:
      (1) Since the M-bias subroutines are part of the bismark_methylation_extractor, does this mean it is compatible with Bismark files generated from earlier versions of Bismark (version < 0.8)?
      (2) Is a different M-bias plot computed for each different read length? For instance, the BAM contains reads from multiple sequencing runs, 75bp PE and 100bp PE, of the same sample.
      (3) How do you deal with 5' trimming of reads? I ran into this challenge with my own script since the first position of a read may not in fact correspond to the first cycle of the sequencing run. This can mean that if you 'stack' all reads according to their first position you might be comparing different sequencing cycles across reads for a given read position. If trimming was applied prior to alignment then it seems impossible to tell which sequencing cycle the first base of the read actually corresponds to. However, if (soft) clipping of the 5' end of the read has been performed by the aligner then it should be possible to retain the full correspondence between read position and sequencing cycle by using the CIGAR (I'm not actually sure whether Bismark allows Bowtie1/2 to perform soft-clipping of reads, so this may be a moot point).

      Comment


      • Hi Pete,

        To 1) Yes it should be downward compatible, at least for SAM files (the old vanilla format does a version check at the start which might fail, but I doubt anyone is using the old format still)

        To 2) As it stands, only 1 M-bias plot is generated per read. Since reads are routinely trimmed before alignment it would probably mean that you get >150 M-bias plots for a paired-end library, and I doubt that anyone would want to look through all of them. Technically it should be possible though.

        To 3) I don't see this as a problem because Bismark doesn't allow soft-clipping to take place (Bowtie 1 doesn't do it, and for Bowtie 2 it is not enabled). Trim Galore for example also doesn't do adaptive trimming from the 5' end, and the new options '--clip_r1' etc would trim reads uniformly. So in any case, position 1 in the read should always be the very first position.

        Comment


        • Originally posted by fkrueger View Post
          Hi Pete,

          To 1) Yes it should be downward compatible, at least for SAM files (the old vanilla format does a version check at the start which might fail, but I doubt anyone is using the old format still)

          To 2) As it stands, only 1 M-bias plot is generated per read. Since reads are routinely trimmed before alignment it would probably mean that you get >150 M-bias plots for a paired-end library, and I doubt that anyone would want to look through all of them. Technically it should be possible though.

          To 3) I don't see this as a problem because Bismark doesn't allow soft-clipping to take place (Bowtie 1 doesn't do it, and for Bowtie 2 it is not enabled). Trim Galore for example also doesn't do adaptive trimming from the 5' end, and the new options '--clip_r1' etc would trim reads uniformly. So in any case, position 1 in the read should always be the very first position.
          1) Great, I can retire my (slow) Python script!
          2) True, however Kasper Hansen convinced me of the benefit of this level of stratification in M-bias plots (see e.g. Figure 2b of Hansen, K. D., Langmead, B. & Irizarry, R. A. Genome Biol 13, R83 (2012)). Having given this a bit more thought, for this particular example I think that read length is a proxy for library/sequencing run. I suppose that dealing with this extra complication in the M-bias plots isn't really the role of Bismark; this example is perhaps more a warning to the user to be very careful when combining data from multiple libraries/sequencing runs and in how you perform data QC and methylation calling in such datasets since different libraries/sequencing runs can and will have very different biases and error profiles.
          3) Thanks for clarifying the nature of 5' trimming in Trim Galore and Bismark. That makes my concerns irrelevant

          Comment


          • I agree that one should be careful when merging reads of different length into the same file and then analysing them jointly. Fig. 2B in the Hansen paper works nicely for 5 read lengths and just CpG context, but imagine the mess when plotting out CpG, CHH and CHG context as well as their coverages for every single read length encountered in the sample :P

            I guess the coverage plot could also be used as an indication that something weird is going, e.g. if you see some sort of bias around bp 75 when you also see a huge spike in general methylation call coverage for this position. Thanks for sharing your considerations though.

            Comment


            • Agreed. I'm having a closer look at M-bias as a colleague suggested most of the 3' M-bias he had observed in his data was simply due to the inherent decline in quality of base calls at the 3' end of reads. This would suggest that the 3' end of some reads could be "saved", provided the base calls were of a high enough quality. Now, whether all that extra work is worth the effort is another matter... But I've been dealing with datasets where I'm effectively throwing away > 15% of each aligned read due to extensive M-bias. In such cases it would be desirable to get back some of those read positions if their quality was high enough for me to be confident they were unlikely to be affected by M-bias.

              One more request: could you please add an --mbias option to bismark_methylation_extractor so as to just produce the M-bias output without doing the methylation calling. Thanks!

              Comment


              • Originally posted by PeteH View Post
                One more request: could you please add an --mbias option to bismark_methylation_extractor so as to just produce the M-bias output without doing the methylation calling. Thanks!
                That seems fair enough, I'll put it on my To-Do list!

                Comment


                • Originally posted by PeteH View Post
                  One more request: could you please add an --mbias option to bismark_methylation_extractor so as to just produce the M-bias output without doing the methylation calling. Thanks!
                  Hi Pete,

                  I had a go at implementing a new function '--mbias_only' which you can find attached. If this option is specified, only a report and the M-bias plot(s) and their data are being written out. It will not extract any methylation data, also the bedGraph and cytosine report conversion are not allowed.

                  The new version of the extractor (v0.8.1a) is attached; if you find it working alright I will include it into a new release once I am back from ISMB.
                  Attached Files

                  Comment


                  • Hi Felix - never mind about read groups, I've pipelined adding them during postprocessing!

                    One small bug I've just seen is that the TLEN of some reads in the output bam file is sometimes 0 (this is with v0.7.12 in case this has been fixed in the new 0.8 versions)

                    eg on a paired test paired end alignment of 250,000 150bp PE reads
                    Code:
                    samtools view test_reads.rmdup.bam | awk '{if ($9=="0") print $0}' | wc -l
                    2518
                    samtools view test_reads.rmdup.bam | awk '{if ($9=="0") print $0}' | head
                    HWI-D00119:19:H092KADXX:1:1101:19147:6616_1:N:0:	115	chr1	1248010	255	136M	=	1248010	0	CCCTACAAAAATACTAAAACTTATAAACAAGGTCACCGTCTCGCCATTAACCAACATATAACAATTAACCCCTAAACCCCGAAAAAAAAAAAAACCTCAACCCAAACTACCCGTACTAACCCGAAATAAACCCCAC	'BB<B7BFFBBB07BBB<0000<<BB0000'0'7'70''0'07BBBBBB7B<<B<B<BBB<<FFBBB<BBBB7B77<BB7B<BBBBBFFIIFFFIIFFFFIIIFFBB00FFBB7FFBFFFFFFFFFFFFFFFF<BB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:45	XM:Z:......zxhh..h..xhhh...h.hhh..x.Z.....Z....Z.....hh..zx...h..h..x..h......xh.....Zxhhh..h.h.h.h.....x....xh..x...Z.h..x....Z.hh.hhh......	XR:Z:CT	XX:Z:G5GGGG2G2GGGG3G1GGG2GC17GG2GG3G2G2G2G6GG6GGGG2G1G1G1G5G4GG2G5G2G6GG1GGG6
                    HWI-D00119:19:H092KADXX:1:1101:19147:6616_1:N:0:	179	chr1	1248010	255	96M	=	1248010	0	ACCTACAAAAATACTAAAACTTATAAACAACGTCACCGTCTCCCCATTAACCAACATATAACAATTAACCCCTAAACCCCGAAAAAAAAAAAAACC	BBBFFFFFFFFFFFFFFIFFFFFFFFFFFFF00BBFF0<B0B0BB7''BBFFBBFF0<0<7BB<B<B<BFFF<7<BBFFF'00777B7<B<B707B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:h.....zxhh..h..xhhh...h.hhh..x.Z.....Z..........hh..zx...h..h..x..h......xh.....Zxhhh..h.h.h.h..	XR:Z:GA	XX:Z:G5GGGG2G2GGGG3G1GGG2G12G5GG2GG3G2G2G2G6GG6GGGG2G1G1G1G2
                    HWI-D00119:19:H092KADXX:1:1101:20264:11719_1:N:0:	115	chr1	2628067	255	150M	=	2628067	0	CAACACCCACATCGCCAAACAAACATCCGCCAACCTAAAACAACACCCACACCCCCAAATAAACATCCGACAACCTAAAACAAAACCCACACCCCTAGATAAACATCCGACACCGTAAAACAACACCCCACACCCACAAATAAACATCTA	7BB70BFFBBBBBBBBFBBBFBBBB70'07BBBBBBFBBB<BBB<BBBB<7FFFBBFFFBFBBBB7<7<FBFBFBBFFFBBFFBFFFFFB<FFFFBFFFFFFFFF<<7<FB<BFFFIIIIIFFFBIIIIFFFIFFFIFFFFFFFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:32	XM:Z:..x..........Z...xh.z.h.....Z...x...xh....x..............xh.h.h.....Z...x...xh.h..xh.............Hh.h.......Z.....Z.hh.h..x...............hh.h.h.....x	XR:Z:CT	XX:Z:2G14GG1G1G9G3GG4G14GG1G1G9G3GG1G2GG14G1G11T3GG1G2G13T1GG1G1G5G
                    HWI-D00119:19:H092KADXX:1:1101:20264:11719_1:N:0:	179	chr1	2628067	255	149M	=	2628067	0	CAACACCCACATCGCCAAACAAACATCCGCCAACCTAAAACAACACCCACACCCCCAAATAAACATCCGACAACCTAAAACAAAACCCACACCCCTAGATAAACATCCGACACCGTAAAACAACACCCCACACCCACAAATAAACATCT	BBBFFFFFFFFFF0BFFIIIIIFBFFII0BFFIIIIIIIIIIIIIIIIFIBFFFFB<BBBBBBBBBBB0007<BBBBBBBB07BBBBBBB7BBBB77'0BB<<<B<<B'077<B'077BBBBBB7BB<707BBBB7<<BBBBBBBBBB<	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:31	XM:Z:..x..........Z...xh.z.h.....Z...x...xh....x..............xh.h.h.....Z...x...xh.h..xh.............Hh.h.......Z.....Z.hh.h..x...............hh.h.h.....	XR:Z:GA	XX:Z:2G14GG1G1G9G3GG4G14GG1G1G9G3GG1G2GG14G1G11T3GG1G2G13T1GG1G1G5
                    HWI-D00119:19:H092KADXX:1:1101:4148:2705_1:N:0:	115	chr1	3104936	255	143M	=	3104936	0	TTTTTTTTTAAAAAAATAACCCGTATCACCATATCCGGGTAATACCAAACCCAACGCAAACTACGCCACTACCTACCGATACCTATTAACTTTATAATATATTTATCCCGAATATATATATACGTAAAAAAACGCGCTCTCCT	BBBBBBB<BBBBBBB<B<'''00'0'0''0<07'''0'0'<'<BBBB770<77000B<00'0''0<BB<'B70'007<700'0<700B<<<<<BBFF<<0<7BB<77007<BBFBBBFBBB77BFIIIIIFBBBBBFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:............h....hh...Z.........h...Z.Z.hh.h...xh....x.Z..x...x.Z.....x...x..Zx.h...x..h.....h.hh.h.h........Zxh.h.h.h.h...Z.h.hhh...Z.Z.......	XR:Z:CT	XX:Z:12G4GG13G4C2GG1G3GG4G4G3G7G3G3G1G3G2G5G1GG1G1G9GG1G1G1G1G5G1GGG13
                    HWI-D00119:19:H092KADXX:1:1101:4148:2705_1:N:0:	179	chr1	3104936	255	22M	=	3104936	0	TTTTTTTTTAAAAAAATAACCC	<<<BBBBBB<<BBBBB07B'<B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:3	XM:Z:............h....hh...	XR:Z:GA	XX:Z:12G4GG3
                    HWI-D00119:19:H092KADXX:1:1101:12448:3866_1:N:0:	115	chr1	5254199	255	150M	=	5254199	0	CAAAAAATCACCCAATATAAAACTAAAAAACAAAAACCAAAATATCATCCTATCAAATTTCTCTTCACATAAAAATCTAAAAACAAAAAAAATTCCCTAACTTCCTATCTATTCAATAAACAAAACAAACAAAAATTAACTAAATTCCCA	7<BB<FB<77'<'<<000<<B<077B<<B<0<BBB<<0BFB<0<'<<0'<0<'0<BB<7'7<<7''B<<7BBB<<700<<B77FFFB77'70'7B'77FB'07BFBB'<BB0<0FF<BBB0<IIIFF<F7F<BFF<FFFFB<B00B0B<B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:33	XM:Z:...h..h...........hh....x.hhhh....hh...xhh.........x...xh..............h.........hh.....h.h...............x........x.h.........xh..xh.h..h...x.h......	XR:Z:CT	XX:Z:3G2G11GG4G1GGGG4GG3GGG9G3GG14G9GG5G1G15G8G1G9GG2GG1G2G3G1G6
                    HWI-D00119:19:H092KADXX:1:1101:12448:3866_1:N:0:	179	chr1	5254199	255	149M	=	5254199	0	CAAAAAATCACCCAATATAAAACTAAAAAACAAAAACCAAAATATCATCCTATCAAATTTCTCTTCACATAAAAATCCCAAAACAAAAAAAATTCCCTCACTTCCTATCTATTCAATAAACAAAACAAACAAAAATTAACTAAATTCCC	BBB<0<FBBFBFFFBBFBFF<0B7BB<FFB77FFBFBFBFII77BBF<<<0BFB''700'0<<7'<B<''0<BB<000'77<<7<07B77'70'000<'000'00'0<<'<'00<<'0<<<<<B'7<B''7<<B'0'0000<<0''<00	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:36	XM:Z:...h..h...........hh....x.hhhh....hh...xhh.........x...xh..............h.........hh.....h.h...............x........x.h.........xh..xh.h..h...x.h.....	XR:Z:GA	XX:Z:3G2G11GG4G1GGGG4GG3GGG9G3GG14G5TA2GG5G1G7A7G8G1G9GG2GG1G2G3G1G5
                    HWI-D00119:19:H092KADXX:1:1101:17706:11652_1:N:0:	115	chr1	9776942	255	150M	=	9776942	0	AAATCCTAAAAATCCTAATATCCAAAAAATAATTAAAACCGCCCCACCAACCGCTCACCCTACACCCCGTCTCAATACATCTACAACTACCTACACAATAAATTAACCCCTCACCTAACCATAATCCATTCCTCCTCCATCCTCGCCATA	<<0<B7<BBB<'0B0<BB<00<7FFFFB<<<'0BB70'0'0FFFFFFB<7<70BBB7FBB<B<7FB<7<BBBFFB<BFFBBBBBFBB<<FB<FFFFFFFIFFBBB<IIIFBFBIIFIFIIIBIIFIIIFBIIFIIFIFFFFFBFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:35	XM:Z:hhh....xhh......xh.h....xhh.h.hh..hhhh..Z........x..Z........x......Z.....x.......x..x..x...x....x.hhh..h...........x.....hh....................Z....h	XR:Z:CT	XX:Z:GGG4GGG6GG1G4GGG1G1GG2GGGG11G11G12G7G2G2G3G4G1GGG2G11G5GG25G
                    HWI-D00119:19:H092KADXX:1:1101:17706:11652_1:N:0:	179	chr1	9776942	255	149M	=	9776942	0	AAATCCTAAAAATCCTAATATCCAAAAAATAATTAAAACCGCCCCACCAACCGCTCACCCTACACCCCGTCTCAATACATCTACAACTACCTACACAATAAATTAACCCCTCACCTAACCATAATCCATTCCTCCTCCATCCTCGCCAT	BBBFFFFFFFFFBFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIFIIIFFFFBFFFBBFBFFFFFBBFFFFBFBBBFFBB<BBBBBFBBBBBBBBBFBBFFFBFFF<BBBF<BBFFBBBBBFFBFFFFFFFFFFFFFFFFFFFBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:hhh....xhh......xh.h....xhh.h.hh..hhhh..Z........x..Z........x......Z.....x.......x..x..x...x....x.hhh..h...........x.....hh....................Z....	XR:Z:GA	XX:Z:GGG4GGG6GG1G4GGG1G1GG2GGGG11G11G12G7G2G2G3G4G1GGG2G11G5GG25
                    It appears to happen when one read is fully contained by the other (due to trimming I presume)

                    Cheers,
                    Aaron

                    Comment


                    • This supposedly happens only for Bowtie 2 alignments? I will take a look at the issue.

                      Comment


                      • Yes bowtie2 -
                        "bismark -p 4 --bowtie2 -X 1000 --unmapped --ambiguous --gzip --bam " are the bismark options i am using

                        Comment


                        • methylation_extractor bug

                          Hi,
                          I just want to report a little bug in methylation_extractor with option --CX_context , if the output directory is not the current directory, the CX_report won't be create as the bed file is searched in the current directory and not the output directory.
                          Cheers
                          Céline

                          Comment


                          • Originally posted by cnoirot View Post
                            Hi,
                            I just want to report a little bug in methylation_extractor with option --CX_context , if the output directory is not the current directory, the CX_report won't be create as the bed file is searched in the current directory and not the output directory.
                            Cheers
                            Céline
                            thanks for your message, I shall look into this soon.

                            Comment


                            • Originally posted by frozenlyse View Post
                              Hi Felix - never mind about read groups, I've pipelined adding them during postprocessing!

                              One small bug I've just seen is that the TLEN of some reads in the output bam file is sometimes 0 (this is with v0.7.12 in case this has been fixed in the new 0.8 versions)

                              eg on a paired test paired end alignment of 250,000 150bp PE reads
                              Code:
                              samtools view test_reads.rmdup.bam | awk '{if ($9=="0") print $0}' | wc -l
                              2518
                              samtools view test_reads.rmdup.bam | awk '{if ($9=="0") print $0}' | head
                              HWI-D00119:19:H092KADXX:1:1101:19147:6616_1:N:0:	115	chr1	1248010	255	136M	=	1248010	0	CCCTACAAAAATACTAAAACTTATAAACAAGGTCACCGTCTCGCCATTAACCAACATATAACAATTAACCCCTAAACCCCGAAAAAAAAAAAAACCTCAACCCAAACTACCCGTACTAACCCGAAATAAACCCCAC	'BB<B7BFFBBB07BBB<0000<<BB0000'0'7'70''0'07BBBBBB7B<<B<B<BBB<<FFBBB<BBBB7B77<BB7B<BBBBBFFIIFFFIIFFFFIIIFFBB00FFBB7FFBFFFFFFFFFFFFFFFF<BB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:45	XM:Z:......zxhh..h..xhhh...h.hhh..x.Z.....Z....Z.....hh..zx...h..h..x..h......xh.....Zxhhh..h.h.h.h.....x....xh..x...Z.h..x....Z.hh.hhh......	XR:Z:CT	XX:Z:G5GGGG2G2GGGG3G1GGG2GC17GG2GG3G2G2G2G6GG6GGGG2G1G1G1G5G4GG2G5G2G6GG1GGG6
                              HWI-D00119:19:H092KADXX:1:1101:19147:6616_1:N:0:	179	chr1	1248010	255	96M	=	1248010	0	ACCTACAAAAATACTAAAACTTATAAACAACGTCACCGTCTCCCCATTAACCAACATATAACAATTAACCCCTAAACCCCGAAAAAAAAAAAAACC	BBBFFFFFFFFFFFFFFIFFFFFFFFFFFFF00BBFF0<B0B0BB7''BBFFBBFF0<0<7BB<B<B<BFFF<7<BBFFF'00777B7<B<B707B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:h.....zxhh..h..xhhh...h.hhh..x.Z.....Z..........hh..zx...h..h..x..h......xh.....Zxhhh..h.h.h.h..	XR:Z:GA	XX:Z:G5GGGG2G2GGGG3G1GGG2G12G5GG2GG3G2G2G2G6GG6GGGG2G1G1G1G2
                              HWI-D00119:19:H092KADXX:1:1101:20264:11719_1:N:0:	115	chr1	2628067	255	150M	=	2628067	0	CAACACCCACATCGCCAAACAAACATCCGCCAACCTAAAACAACACCCACACCCCCAAATAAACATCCGACAACCTAAAACAAAACCCACACCCCTAGATAAACATCCGACACCGTAAAACAACACCCCACACCCACAAATAAACATCTA	7BB70BFFBBBBBBBBFBBBFBBBB70'07BBBBBBFBBB<BBB<BBBB<7FFFBBFFFBFBBBB7<7<FBFBFBBFFFBBFFBFFFFFB<FFFFBFFFFFFFFF<<7<FB<BFFFIIIIIFFFBIIIIFFFIFFFIFFFFFFFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:32	XM:Z:..x..........Z...xh.z.h.....Z...x...xh....x..............xh.h.h.....Z...x...xh.h..xh.............Hh.h.......Z.....Z.hh.h..x...............hh.h.h.....x	XR:Z:CT	XX:Z:2G14GG1G1G9G3GG4G14GG1G1G9G3GG1G2GG14G1G11T3GG1G2G13T1GG1G1G5G
                              HWI-D00119:19:H092KADXX:1:1101:20264:11719_1:N:0:	179	chr1	2628067	255	149M	=	2628067	0	CAACACCCACATCGCCAAACAAACATCCGCCAACCTAAAACAACACCCACACCCCCAAATAAACATCCGACAACCTAAAACAAAACCCACACCCCTAGATAAACATCCGACACCGTAAAACAACACCCCACACCCACAAATAAACATCT	BBBFFFFFFFFFF0BFFIIIIIFBFFII0BFFIIIIIIIIIIIIIIIIFIBFFFFB<BBBBBBBBBBB0007<BBBBBBBB07BBBBBBB7BBBB77'0BB<<<B<<B'077<B'077BBBBBB7BB<707BBBB7<<BBBBBBBBBB<	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:31	XM:Z:..x..........Z...xh.z.h.....Z...x...xh....x..............xh.h.h.....Z...x...xh.h..xh.............Hh.h.......Z.....Z.hh.h..x...............hh.h.h.....	XR:Z:GA	XX:Z:2G14GG1G1G9G3GG4G14GG1G1G9G3GG1G2GG14G1G11T3GG1G2G13T1GG1G1G5
                              HWI-D00119:19:H092KADXX:1:1101:4148:2705_1:N:0:	115	chr1	3104936	255	143M	=	3104936	0	TTTTTTTTTAAAAAAATAACCCGTATCACCATATCCGGGTAATACCAAACCCAACGCAAACTACGCCACTACCTACCGATACCTATTAACTTTATAATATATTTATCCCGAATATATATATACGTAAAAAAACGCGCTCTCCT	BBBBBBB<BBBBBBB<B<'''00'0'0''0<07'''0'0'<'<BBBB770<77000B<00'0''0<BB<'B70'007<700'0<700B<<<<<BBFF<<0<7BB<77007<BBFBBBFBBB77BFIIIIIFBBBBBFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:............h....hh...Z.........h...Z.Z.hh.h...xh....x.Z..x...x.Z.....x...x..Zx.h...x..h.....h.hh.h.h........Zxh.h.h.h.h...Z.h.hhh...Z.Z.......	XR:Z:CT	XX:Z:12G4GG13G4C2GG1G3GG4G4G3G7G3G3G1G3G2G5G1GG1G1G9GG1G1G1G1G5G1GGG13
                              HWI-D00119:19:H092KADXX:1:1101:4148:2705_1:N:0:	179	chr1	3104936	255	22M	=	3104936	0	TTTTTTTTTAAAAAAATAACCC	<<<BBBBBB<<BBBBB07B'<B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:3	XM:Z:............h....hh...	XR:Z:GA	XX:Z:12G4GG3
                              HWI-D00119:19:H092KADXX:1:1101:12448:3866_1:N:0:	115	chr1	5254199	255	150M	=	5254199	0	CAAAAAATCACCCAATATAAAACTAAAAAACAAAAACCAAAATATCATCCTATCAAATTTCTCTTCACATAAAAATCTAAAAACAAAAAAAATTCCCTAACTTCCTATCTATTCAATAAACAAAACAAACAAAAATTAACTAAATTCCCA	7<BB<FB<77'<'<<000<<B<077B<<B<0<BBB<<0BFB<0<'<<0'<0<'0<BB<7'7<<7''B<<7BBB<<700<<B77FFFB77'70'7B'77FB'07BFBB'<BB0<0FF<BBB0<IIIFF<F7F<BFF<FFFFB<B00B0B<B	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:33	XM:Z:...h..h...........hh....x.hhhh....hh...xhh.........x...xh..............h.........hh.....h.h...............x........x.h.........xh..xh.h..h...x.h......	XR:Z:CT	XX:Z:3G2G11GG4G1GGGG4GG3GGG9G3GG14G9GG5G1G15G8G1G9GG2GG1G2G3G1G6
                              HWI-D00119:19:H092KADXX:1:1101:12448:3866_1:N:0:	179	chr1	5254199	255	149M	=	5254199	0	CAAAAAATCACCCAATATAAAACTAAAAAACAAAAACCAAAATATCATCCTATCAAATTTCTCTTCACATAAAAATCCCAAAACAAAAAAAATTCCCTCACTTCCTATCTATTCAATAAACAAAACAAACAAAAATTAACTAAATTCCC	BBB<0<FBBFBFFFBBFBFF<0B7BB<FFB77FFBFBFBFII77BBF<<<0BFB''700'0<<7'<B<''0<BB<000'77<<7<07B77'70'000<'000'00'0<<'<'00<<'0<<<<<B'7<B''7<<B'0'0000<<0''<00	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:36	XM:Z:...h..h...........hh....x.hhhh....hh...xhh.........x...xh..............h.........hh.....h.h...............x........x.h.........xh..xh.h..h...x.h.....	XR:Z:GA	XX:Z:3G2G11GG4G1GGGG4GG3GGG9G3GG14G5TA2GG5G1G7A7G8G1G9GG2GG1G2G3G1G5
                              HWI-D00119:19:H092KADXX:1:1101:17706:11652_1:N:0:	115	chr1	9776942	255	150M	=	9776942	0	AAATCCTAAAAATCCTAATATCCAAAAAATAATTAAAACCGCCCCACCAACCGCTCACCCTACACCCCGTCTCAATACATCTACAACTACCTACACAATAAATTAACCCCTCACCTAACCATAATCCATTCCTCCTCCATCCTCGCCATA	<<0<B7<BBB<'0B0<BB<00<7FFFFB<<<'0BB70'0'0FFFFFFB<7<70BBB7FBB<B<7FB<7<BBBFFB<BFFBBBBBFBB<<FB<FFFFFFFIFFBBB<IIIFBFBIIFIFIIIBIIFIIIFBIIFIIFIFFFFFBFFFFBBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:35	XM:Z:hhh....xhh......xh.h....xhh.h.hh..hhhh..Z........x..Z........x......Z.....x.......x..x..x...x....x.hhh..h...........x.....hh....................Z....h	XR:Z:CT	XX:Z:GGG4GGG6GG1G4GGG1G1GG2GGGG11G11G12G7G2G2G3G4G1GGG2G11G5GG25G
                              HWI-D00119:19:H092KADXX:1:1101:17706:11652_1:N:0:	179	chr1	9776942	255	149M	=	9776942	0	AAATCCTAAAAATCCTAATATCCAAAAAATAATTAAAACCGCCCCACCAACCGCTCACCCTACACCCCGTCTCAATACATCTACAACTACCTACACAATAAATTAACCCCTCACCTAACCATAATCCATTCCTCCTCCATCCTCGCCAT	BBBFFFFFFFFFBFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIFIIIFFFFBFFFBBFBFFFFFBBFFFFBFBBBFFBB<BBBBBFBBBBBBBBBFBBFFFBFFF<BBBF<BBFFBBBBBFFBFFFFFFFFFFFFFFFFFFFBB	PG:Z:MarkDuplicates	XG:Z:GA	NM:i:34	XM:Z:hhh....xhh......xh.h....xhh.h.hh..hhhh..Z........x..Z........x......Z.....x.......x..x..x...x....x.hhh..h...........x.....hh....................Z....	XR:Z:GA	XX:Z:GGG4GGG6GG1G4GGG1G1GG2GGGG11G11G12G7G2G2G3G4G1GGG2G11G5GG25
                              It appears to happen when one read is fully contained by the other (due to trimming I presume)

                              Cheers,
                              Aaron
                              Hi Aaron,

                              The TLEN value was set to 0 whenever a read was completely contained within the other one, like this:

                              Code:
                              ------------------------->     read 1
                              <------------------------      read 2
                              I have now changed this behavior to just use the coordinates of the longer read instead, giving it a + sign for read 1 if read 1 was longer, and - sign if read 2 was longer (this seems somewhat arbitrary to me though, which is probably why had set it to 0 in the first place). If you think this is alright I'll keep it this way and make a new release as soon as I have looked at the other issues.

                              Comment


                              • Hi Felix - yep that makes total sense to me, awesome!

                                One other thing I just noticed is an inconsistency in result file names between single and paired end (at least for bowtie2) - paired end have "_bismark_bt2_pe.bam" appended whereas single end have "_bt2_bismark.bam" appended - the bt2 and bismark are swapped. Not a problem, just made writing my wrapper for bismark a bit more complicated looking!

                                Cheers,
                                Aaron

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin


                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
                                  Yesterday, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                55 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                51 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                45 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                55 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X