I just ran my first TopHat analysis and am worried about the output. Two things concern me:
1. The bitwise FLAG in the SAM file is 16 for every read. I know that 16 means a read from the reverse strand. But shouldn't half of the reads map to the reverse, and the other half to the forward? I used TruSeq3 Stranded for my library prep. If my reads are only mapping to the reverse strand, doesn't that mean I will miss the expression of any gene that resides on the forward strand?
2. TopHat aligned 84.1% of reads (~29 million) to the genome, of which 7.9% (~2 million) were mapped multiple times. Since I chose the option to allow a maximum of 1 alignment per read, I would expect my SAM file to be approximately 29 million lines long. However, it is about 32 million lines long! How can this be?
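In case it helps with diagnosing both points, here is a minimal Python sketch of how the file could be checked. It relies only on the SAM format itself: header lines start with `@`, the read name is column 1, the FLAG is column 2, and bit 0x10 of the FLAG marks a reverse-strand alignment. The function names `summarize_sam` and `is_reverse` are made up for illustration, and this is not how TopHat produces its own summary.

```python
from collections import Counter


def is_reverse(flag):
    """Return True if bit 0x10 (reverse strand) is set in a SAM FLAG."""
    return bool(flag & 0x10)


def summarize_sam(lines):
    """Tally header lines, alignment lines, FLAG values and distinct read names.

    `lines` is any iterable of SAM text lines, e.g. an open file handle.
    Note: the set of read names is held in memory, which can be large
    for tens of millions of reads.
    """
    header_lines = 0
    flag_counts = Counter()
    read_names = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            header_lines += 1  # header lines are not alignments
            continue
        fields = line.rstrip("\n").split("\t")
        read_names.add(fields[0])          # column 1: QNAME
        flag_counts[int(fields[1])] += 1   # column 2: FLAG
    return {
        "header_lines": header_lines,
        "alignment_lines": sum(flag_counts.values()),
        "distinct_reads": len(read_names),
        "flag_counts": dict(flag_counts),
    }
```

Run on the real file with something like `with open("my_alignments.sam") as fh: print(summarize_sam(fh))`, where the path is a placeholder. Comparing `alignment_lines` with `distinct_reads` would show whether the extra lines come from header lines or from reads that appear more than once, and `flag_counts` would confirm whether 16 really is the only FLAG value present.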
Thanks to anyone who can help me with this.