SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   RNA Sequencing (http://seqanswers.com/forums/forumdisplay.php?f=26)
-   -   what is wrong with samtools flagstat or read mapping with tophat? (http://seqanswers.com/forums/showthread.php?t=74370)

rajesh1989 02-21-2017 07:51 PM

what is wrong with samtools flagstat or read mapping with tophat?
 
I have 6,673,385 (around 6 million) reads in each pair end file after quality filtering. but when i map it using tophat and run samtools flagstat on bam file it gives following output
1343686 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1343686 + 0 mapped (100.00%:-nan%)
1343686 + 0 paired in sequencing
670808 + 0 read1
672878 + 0 read2
1203600 + 0 properly paired (89.57%:-nan%)
1311198 + 0 with itself and mate mapped
32488 + 0 singletons (2.42%:-nan%)
15874 + 0 with mate mapped to a different chr
452 + 0 with mate mapped to a different chr (mapQ>=5)
I am not very sure how to interpret samtools flagstat output, but as i assume there are only 670808 (around 0.6 million) reads in pair1 are mapped 672878 (around 0.6 million) from pair2. is it correct? That is 1/10 th of the total input reads. where are rest of my reads???

Report produced by tophat shows some other statistics

Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
of these: 216780 (48.9%) have multiple alignments (1 have >20)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
of these: 217699 (49.0%) have multiple alignments (1 have >20)
94.7% overall read mapping rate.
Aligned pairs: 433356
of these: 211726 (48.9%) have multiple alignments
5512 ( 1.3%) are discordant alignments
91.3% concordant pair alignment rate.
why is tophat saying it mapped around 94% of reads when there are around 6 million reads in beginning?
how to interpret all these numbers thank you.

gringer 02-22-2017 02:26 AM

samtools flagstat can only report what's in the file, so if there are no unmapped reads in the BAM file then the calculated mapping rate will be 100% (with some reduction in that due to unpaired and low-quality mappings, if included).

rajesh1989 02-22-2017 06:21 AM

thank you for the reply.
this is output of tophat prep_reads.info

left_min_read_len=25
left_max_read_len=101
left_reads_in =6673385
left_reads_out=6667431
right_min_read_len=25
right_max_read_len=101
right_reads_in =6673385
right_reads_out=6673220

where are rest of the reads if tophat didn't map them. i also checked unmapped.bam it's size is very small.

fanli 02-22-2017 07:12 AM

Quote:

Originally Posted by rajesh1989 (Post 204505)
Report produced by tophat shows some other statistics

Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
of these: 216780 (48.9%) have multiple alignments (1 have >20)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
of these: 217699 (49.0%) have multiple alignments (1 have >20)
94.7% overall read mapping rate.
Aligned pairs: 433356
of these: 211726 (48.9%) have multiple alignments
5512 ( 1.3%) are discordant alignments
91.3% concordant pair alignment rate.

This says your input to tophat is only ~460k read pairs. This directly contradicts what you posted in the prep_reads.info. Are you sure you don't have mismatched files?

rajesh1989 02-22-2017 07:21 AM

Hello,

whatever i have written here is correct i just copied details and pasted here.

what do you mean by mismatched files?

that is my actual query why tophat is taking only ~460k read pairs?

fanli 02-22-2017 09:17 AM

Like the prep_reads.info is from one sample and the tophat align_summary is from another?

rajesh1989 02-22-2017 09:44 PM

no they are not in very same folder i have those two files.

fanli 02-23-2017 07:16 AM

Perhaps you have mixed up files in your script. You may want to check the logs in your tophat output directory.

As an example, here's what my align_summary.txt looks like:
Code:

Left reads:
          Input    :  6551998
          Mapped  :  5980941 (91.3% of input)
            of these:    199516 ( 3.3%) have multiple alignments (10560 have >10)
Right reads:
          Input    :  6551998
          Mapped  :  5574400 (85.1% of input)
            of these:    184354 ( 3.3%) have multiple alignments (10346 have >10)
88.2% overall read mapping rate.

Aligned pairs:  5394272
    of these:    177939 ( 3.3%) have multiple alignments
                  148603 ( 2.8%) are discordant alignments
80.1% concordant pair alignment rate.

and the corresponding prep_reads.info:
Code:

left_min_read_len=75
left_max_read_len=75
left_reads_in =6551998
left_reads_out=6544622
right_min_read_len=75
right_max_read_len=75
right_reads_in =6551998
right_reads_out=6495499

Note that both files refer to 6551998 as the number of read pairs input.

rajesh1989 02-23-2017 06:18 PM

i got the answer. i think this is some issue with multi threading. when i run tophat on single core i get correct results. other peoples have also reported this issue.


All times are GMT -8. The time now is 06:53 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.