Hi guys,
I have a question about counting reads for specific regions of a gene. The regions I am interested in are not genes, transcripts, or exons; I define them myself as follows. Suppose we have a transcript with 5 exons, 1-2-3-4-5, and exon 3 is very short, so that a single read can span 2-3-4. How can I count the number of reads that fall in each of 1, 1-2, 2, 2-3, 2-3-4, 3-4, 4, 4-5, and 5, based on a BAM file produced by TopHat or Bowtie? In other words, is there any way to generate a new variable for each line of the BAM file that denotes the read's type?
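To make the idea concrete, here is a rough sketch of the labelling I have in mind. It assumes each read has already been reduced to its aligned gapless blocks as 0-based half-open (start, end) pairs (pysam's `AlignedSegment.get_blocks()` gives such a list, as far as I understand). The function name `read_type` and the exon coordinates are made up for illustration.

```python
from collections import Counter

def read_type(blocks, exons):
    """Label a read by the exons its aligned blocks overlap, e.g. '2-3-4'.

    blocks: list of (start, end) aligned segments of one read (half-open).
    exons:  list of (start, end) exon intervals, sorted by position.
    Returns None if the read overlaps no exon.
    """
    hits = []
    for number, (ex_start, ex_end) in enumerate(exons, start=1):
        # A block overlaps an exon if the two half-open intervals intersect.
        if any(b_start < ex_end and ex_start < b_end for b_start, b_end in blocks):
            hits.append(str(number))
    return "-".join(hits) if hits else None

# Hypothetical coordinates for a 5-exon transcript; exon 3 is only 20 bp long.
EXONS = [(100, 200), (300, 400), (500, 520), (600, 700), (800, 900)]

# Hypothetical reads, each given as its list of aligned blocks.
reads = [
    [(120, 170)],                           # inside exon 1
    [(180, 200), (300, 330)],               # spliced, exon 1 to exon 2
    [(380, 400), (500, 520), (600, 610)],   # spliced through the short exon 3
    [(510, 520), (600, 640)],               # spliced, exon 3 to exon 4
]

# Count how many reads fall into each type.
counts = Counter(read_type(blocks, EXONS) for blocks in reads)
```

This only checks overlap, so a read that runs into an intron would still be labelled by the exons it touches; stricter rules would need extra checks.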
At the same time, how can I build my own annotation file based on the existing exon annotation and the TopHat junction file? For example, the known annotation file has a transcript 1-2-3, but TopHat finds a new 5' donor site within exon 1. In that case, I want to split the original exon 1 into two parts, 1 and 1'.
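The splitting step could look roughly like the sketch below. It assumes exons are 0-based half-open (start, end) pairs and that the novel donor position has already been extracted from the junction file; `split_exon` and all coordinates are hypothetical names and values, not part of any existing tool.

```python
def split_exon(exon, donor_site):
    """Split one exon at a novel donor site that lies strictly inside it.

    exon:       (start, end), half-open.
    donor_site: position where the new junction leaves the exon.
    Returns two sub-exons (1 and 1') or the original exon unchanged
    when the site is not strictly inside the interval.
    """
    start, end = exon
    if start < donor_site < end:
        return [(start, donor_site), (donor_site, end)]
    return [exon]

def split_annotation(exons, donor_sites):
    """Apply every donor site to every exon and return the refined exon list."""
    refined = list(exons)
    for site in sorted(donor_sites):
        next_round = []
        for exon in refined:
            next_round.extend(split_exon(exon, site))
        refined = next_round
    return refined

# Hypothetical 3-exon transcript with one novel donor site inside exon 1.
known_exons = [(100, 200), (300, 400), (500, 600)]
new_exons = split_annotation(known_exons, donor_sites=[150])
```

The refined list could then be written back out as a new annotation file and used for the read counting.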
Does anyone have any ideas for coding this?
Thanks for your help!