Hi experts,
I am new to sequencing and the related bioinformatics, so to avoid generating the BAM files myself I do the following:
I first generate TopHat alignments in Illumina BaseSpace.
The alignments and their summaries look reasonable. For example, in a rapid run of 90 samples, BaseSpace reports the following for one of the samples:
Number of Reads: Read1-4,709,154 Read2-4,709,154
Total Aligned Reads (% Reads): Read1-81.68% Read2-85.20%
Abundant Reads (% Reads) Read1-18.52% Read2-19.53%
Unaligned Reads (% Reads) Read1-18.32% Read2-14.80%
I then take the BAM files [*.alignment.bam] and count the reads against the human genes .gtf file using the Bioconductor RNA-seq workflow. I downloaded the human genes .gtf file from iGenomes.
When I check how many million fragments were uniquely assigned to genes, using the " round( colSums(assay(se)) / 1e6, 1 ) " command in R,
I get only 0.5 million reads assigned to genes.
Given the TopHat summary from BaseSpace (shown above), even if I take 81% of the 4,709,154 reads and then subtract the percentages for unaligned and abundant reads, I should still get around 2 million reads for this sample.
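To make the arithmetic behind that estimate explicit, here is a minimal sketch in Python that uses only the Read 1 figures quoted above. The variable names are illustrative, and the 0.5 million figure is the rounded output of the R command, so this is a rough check rather than an exact count.

```python
# Read 1 figures copied from the BaseSpace summary above.
total_reads = 4_709_154
aligned_pct = 81.68
abundant_pct = 18.52
unaligned_pct = 18.32

# Conservative estimate as described in the post: start from the aligned
# percentage, then subtract both the abundant and the unaligned percentages.
expected_pct = aligned_pct - abundant_pct - unaligned_pct
expected_reads = total_reads * expected_pct / 100

# Rounded value reported by round(colSums(assay(se)) / 1e6, 1) for this sample.
observed_reads = 0.5e6

print(f"expected share of reads: {expected_pct:.2f}%")
print(f"expected reads: {expected_reads / 1e6:.1f} million")
print(f"observed / expected: {observed_reads / expected_reads:.2f}")
```

The aligned and unaligned percentages already sum to 100%, so subtracting the unaligned percentage a second time only makes the estimate more conservative. Even so, the observed count is roughly a quarter of it.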
Where are my reads going when I count them against the .gtf file?
Thanks in advance for your valuable time.
Ram