I use TopHat to map paired-end RNA-seq reads. Some of the records in the resulting .sam file do not make sense. Please see the examples below:
Example line 1:
NB500923:20:H7WCJBGXX:1:12109:13961:14748 385 chr1 11646 3 151M chr22 114359002 0 CTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGAT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEE/AEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEA/EAA<EEEEEEAEA/<AEE XA:i:0 MD:Z:151 NM:i:0 NH:i:2 CC:Z:chr15 CP:i:102519374 HI:i:0
In the above line, the start position of the mate on chr22 is given as 114359002; however, the size of chr22 is only 51304566.
Example line 2:
NB500923:20:H7WCJBGXX:1:23107:14442:20318 323 chr1 11696 0 151M chrUn_GL000249 155257832 0 GTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGTGCAAATTTGCCGGATTTCCTTCGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEEEAEEEEEEE<EEEEEEEEEEEEEEEAAEEA<EEEEEE XA:i:2 MD:Z:85C21T43 NM:i:2 NH:i:8 CC:Z:= CP:i:11696 HI:i:4
Here the start position of the mate on chrUn_GL000249 is given as 155257832; however, the total size of chrUn_GL000249 is only 38502.
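To see how many records are affected, the mate position (column 8, PNEXT) can be compared against the reference lengths listed in the @SQ header lines. Below is a minimal Python sketch of that check. It assumes a tab-separated SAM file that still has its header, and the function name is made up for illustration. It only reports the suspicious records; it does not explain or repair them.

```python
def find_out_of_range_mates(sam_lines):
    """Yield (read name, mate reference, mate position, reference length)
    for records whose PNEXT exceeds the length of the mate's reference."""
    lengths = {}
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header section: collect reference lengths from @SQ lines.
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, rname, rnext = fields[0], fields[2], fields[6]
        pnext = int(fields[7])
        if rnext == "=":
            rnext = rname  # "=" means the mate is on the same reference
        # RNEXT "*" (no mate information) is not in the header, so it is skipped.
        if rnext in lengths and pnext > lengths[rnext]:
            yield qname, rnext, pnext, lengths[rnext]
```

It can be run over an open file handle, for example `for hit in find_out_of_range_mates(open(path)): print(hit)`, where `path` points to the SAM file.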
Can anyone tell me how this could happen, and how to fix the file?
Thanks,