I've aligned my Solexa data with TopHat.
After the alignment, I got a BAM file.
Since I want to find fusion junctions, where a single read spans two different genes, I have to search the BAM file.
I first converted the BAM to SAM.
Part of the SAM file is as follows:
HWUSI-EAS1812:1:100:10011:5732#0 147 chr6 63921573 3 76M = 63921570 0 GATTCCTCCCGGGACAGAAACAAGCCCTTTAAGTTTATGCTAGGCAAGCAGGAGGTGATCCGAGGCTGGGAAGAAG ghhghhghghhhhgfhhhhhhhghgghhhhhhhhchhffhahhhhhhhhhhhhhhghhhgfhhhhhfhhhhhhhhh NM:i:0 NH:i:2
HWUSI-EAS1812:1:100:10011:5732#0 163 chr20 1356146 3 76M = 1356149 0 CTTCTTCCCAGCCTCGGATCACCTCCTGCTTGCCTAGCATAAACTTAAAGGGCTTGTTTCTGTCCCGGGAGGAATC hhhhhhhhhfhhhhhfghhhghhhhhhhhhhhhhhahffhhchhhhhhhhgghghhhhhhhfghhhhghghhghhg NM:i:0 NH:i:2 CC:Z:chr6 CP:i:63921573
HWUSI-EAS1812:1:100:10011:5732#0 83 chr20 1356149 3 76M = 1356146 0 CTTCCCAGCCTCGGATCACCTCCTGCTTGCCTAGCATAAACTTAAAGGGCTTGTTTCTGTCCCGGGAGGAATCAAA gefhhhhhhhhhhhhhhdghhhhhhhhhhhhhhhhhhhhghhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 NH:i:2 CC:Z:chr6 CP:i:63921570
HWUSI-EAS1812:1:100:10011:5732#0 99 chr6 63921570 3 76M = 63921573 0 TTTGATTCCTCCCGGGACAGAAACAAGCCCTTTAAGTTTATGCTAGGCAAGCAGGAGGTGATCCGAGGCTGGGAAG hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhghhhhhhhhhhhhhhhhhhhhgdhhhhhhhhhhhhhhfeg NM:i:0 NH:i:2
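For reference, the second column of each record is the FLAG, a bit field that says whether a record is the first or second mate of a pair and which strand it aligned to. A minimal sketch of decoding it, assuming the bit definitions from the SAM specification (the names `FLAG_BITS` and `decode_flag` are only illustrative), might look like this:

```python
# Bit definitions as listed in the SAM specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]

# The four FLAG values that appear in the records above.
for flag in (147, 163, 83, 99):
    print(flag, decode_flag(flag))
```

Decoded this way, 99 and 83 are first mates (`read1`) and 163 and 147 are second mates (`read2`), so the four records look like two mate pairs rather than four unrelated reads.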
I just want to find the two reads of each paired-end pair and analyze their CIGAR strings.
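A rough sketch of that matching step, under some assumptions: records are whitespace-separated as pasted here (real SAM files use tabs), a record's mate is the record with the same read name whose position equals this record's mate position (columns 7 and 8), and header lines start with `@`. The function names are hypothetical, not part of any library:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '30M1000N46M' into (length, op) tuples."""
    if cigar == "*":  # '*' means no CIGAR is available
        return []
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def parse_record(line):
    """Read the mandatory SAM columns needed for mate matching."""
    f = line.split()  # assumption: whitespace-separated columns
    return {
        "qname": f[0], "flag": int(f[1]), "rname": f[2], "pos": int(f[3]),
        "cigar": parse_cigar(f[5]), "rnext": f[6], "pnext": int(f[7]),
    }

def mate_pairs(records):
    """Pair each record with the record that sits at its mate position."""
    waiting = {}  # records whose mate has not been seen yet
    pairs = []
    for rec in records:
        # '=' in RNEXT means the mate is on the same reference sequence.
        mate_chrom = rec["rname"] if rec["rnext"] == "=" else rec["rnext"]
        mate_key = (rec["qname"], mate_chrom, rec["pnext"], rec["pos"])
        if mate_key in waiting:
            pairs.append((waiting.pop(mate_key), rec))
        else:
            own_key = (rec["qname"], rec["rname"], rec["pos"], rec["pnext"])
            waiting[own_key] = rec
    return pairs

def pairs_from_sam(lines):
    """Skip header and blank lines, then match mates."""
    records = (parse_record(l) for l in lines
               if l.strip() and not l.startswith("@"))
    return mate_pairs(records)
```

Used on an open SAM file, e.g. `pairs_from_sam(open("accepted_hits.sam"))` (the file name is only an example), this returns tuples of two records, each with its CIGAR already split into operations. An `N` operation in a CIGAR marks a gap skipped on the reference, which is how a spliced read would show up; the records above are all `76M`, so none of them is spliced.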
I'm not sure why the same read name (HWUSI-EAS1812:1:100:10011:5732#0) appears at so many different positions.
This seems odd to me.
Could anybody explain what these records mean?
Thanks a lot.