SEQanswers > Bioinformatics > Bioinformatics

04-03-2017, 04:43 PM   #1
Junior Member
Location: japan

Join Date: Mar 2013
Posts: 4
Where are my reads going? Millions of fragments that uniquely aligned to the genes

Hi experts,

I am new to sequencing and to the bioinformatics that goes with it, so to avoid generating the BAM files myself I do the following:

First, I generate a TopHat alignment in Illumina BaseSpace.
The alignment and its summary look reasonable. For example, in a rapid run of 90 samples, one of the samples gives the following in BaseSpace:

Metric                           Read1        Read2
Number of reads                  4,709,154    4,709,154
Total aligned reads (% reads)    81.68%       85.20%
Abundant reads (% reads)         18.52%       19.53%
Unaligned reads (% reads)        18.32%       14.80%

I then take the BAM files (*.alignment.bam) and count their reads over the genes in the human genes .gtf file using the Bioconductor RNA-seq workflow. I downloaded the human genes .gtf file from iGenomes.

When I check the millions of fragments that uniquely aligned to the genes with the R command "round(colSums(assay(se)) / 1e6, 1)", I get only 0.5 million reads assigned to genes.
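For readers less familiar with R, the expression above sums each column of the count matrix returned by assay(se) (one column per sample) and reports the total in millions, rounded to one decimal place. A rough Python equivalent, using a hypothetical toy matrix rather than the poster's data:

```python
# Rough Python equivalent of round(colSums(assay(se)) / 1e6, 1) in R:
# one value per sample (column), in millions, rounded to one decimal place.
# The toy matrix is hypothetical: rows are genes, columns are samples.


def millions_per_sample(counts):
    """Sum each column of a genes x samples count matrix and express it in millions."""
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [round(total / 1e6, 1) for total in totals]


if __name__ == "__main__":
    toy_counts = [
        [300_000, 1_200_000],  # gene A: sample 1, sample 2
        [150_000, 800_000],    # gene B
        [60_000, 40_000],      # gene C
    ]
    print(millions_per_sample(toy_counts))  # [0.5, 2.0]
```

Only reads that the counting step actually assigns to a gene contribute to these totals; reads aligned outside annotated genes never enter the sum, whatever the aligner reported.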

Given the TopHat summary from BaseSpace shown above, even if I take 81% of 4,709,154 and then subtract the percentages of unaligned and abundant reads, I should still get around 2 million reads for this sample.
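The estimate in the post can be reproduced as below. This is a sketch that assumes every percentage is relative to the Read1 count (4,709,154) and that the categories can be subtracted exactly as the post describes:

```python
# Reproduces the back-of-the-envelope estimate from the post.
# Assumption: all percentages are relative to the Read1 count shown in BaseSpace.

READ1_READS = 4_709_154
ALIGNED_PCT = 81.68
ABUNDANT_PCT = 18.52
UNALIGNED_PCT = 18.32


def reads_from_pct(total, pct):
    """Convert a percentage of `total` into a read count."""
    return total * pct / 100


def poster_estimate(total=READ1_READS):
    """Aligned minus abundant minus unaligned, as described in the post."""
    aligned = reads_from_pct(total, ALIGNED_PCT)
    abundant = reads_from_pct(total, ABUNDANT_PCT)
    unaligned = reads_from_pct(total, UNALIGNED_PCT)
    return aligned - abundant - unaligned


def aligned_minus_abundant(total=READ1_READS):
    """Same estimate without subtracting the unaligned share."""
    return reads_from_pct(total, ALIGNED_PCT) - reads_from_pct(total, ABUNDANT_PCT)


if __name__ == "__main__":
    print(f"aligned Read1 reads:    {reads_from_pct(READ1_READS, ALIGNED_PCT) / 1e6:.2f} M")
    print(f"poster's estimate:      {poster_estimate() / 1e6:.2f} M")
    print(f"aligned minus abundant: {aligned_minus_abundant() / 1e6:.2f} M")
```

This prints about 3.85 M, 2.11 M and 2.97 M. Note that in the table the aligned and unaligned percentages add up to 100% for each read (81.68 + 18.32 and 85.20 + 14.80), so unaligned reads are already excluded from the aligned figure and subtracting them again counts them twice.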

Where are my reads going when I count them against the .gtf file?

Thanks in advance for your valuable time.

rammohanshukla
04-03-2017, 08:57 PM   #2
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 97

The first thing to always check is whether your FASTA and GTF use the same chromosome naming (for example "chr1" versus "1").
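A minimal sketch of that check, assuming the BAM header text has been dumped with samtools view -H and the GTF is read as plain text lines. The helper names are hypothetical and the toy inputs are made up for illustration:

```python
# Compare contig names in a SAM/BAM header (@SQ SN: fields) with GTF seqnames.
# Assumption: header text comes from `samtools view -H file.bam`.
# All function names here are hypothetical helpers, not a library API.


def bam_header_contigs(header_text):
    """Return the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Return the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare(bam_names, gtf_names):
    """Report which names are shared and which appear on only one side."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
    }


if __name__ == "__main__":
    header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
    gtf = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
        'MT\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g2";',
    ]
    print(compare(bam_header_contigs(header), gtf_seqnames(gtf)))
```

Any names listed under only_in_bam or only_in_gtf point to reads or genes that cannot be matched up during counting.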
wdecoster
04-04-2017, 03:05 AM   #3
Junior Member
Location: japan

Join Date: Mar 2013
Posts: 4

But I would guess that if the chromosome notations were different, the command would not run at all. If I am getting some results, doesn't that mean the notations are the same?
Besides, I am using the GTF file from Illumina iGenomes, and the BAM files were also generated by Illumina.
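Whether getting some counts proves that every name matches can be illustrated with a toy overlap counter on hypothetical data. This is not any specific Bioconductor function, and whether a real tool warns about mismatched names depends on the tool; the sketch only shows that a partial mismatch need not cause an error:

```python
# Toy overlap counter (hypothetical data): mismatched chromosome names do not
# raise an error, the affected reads simply never overlap any gene.
# Unlike real counting tools, a read overlapping several genes counts for each.


def count_reads_per_gene(reads, genes):
    """reads: list of (chrom, pos); genes: gene_id -> (chrom, start, end), 1-based inclusive."""
    counts = {gene_id: 0 for gene_id in genes}
    for chrom, pos in reads:
        for gene_id, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[gene_id] += 1
    return counts


if __name__ == "__main__":
    genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr2", 100, 200)}
    reads_matching = [("chr1", 150), ("chr2", 120), ("chr2", 180)]
    reads_mismatched = [("chr1", 150), ("2", 120), ("2", 180)]  # "2" vs "chr2"
    print(count_reads_per_gene(reads_matching, genes))    # {'geneA': 1, 'geneB': 2}
    print(count_reads_per_gene(reads_mismatched, genes))  # {'geneA': 1, 'geneB': 0}
```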

In your answer, you mean the BAM file and the GTF file, right?

rammohanshukla
04-04-2017, 06:31 AM   #4
Senior Member
Location: uk

Join Date: Mar 2009
Posts: 667

The chromosome names in the BAM file and its header will be the same as those in the reference FASTA file the reads were aligned to.

Have you checked whether the BaseSpace alignment and the GTF file use the same version of the human genome?
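On the BAM side, one way to check the build is to look at the @SQ length fields in the header (for example from samtools view -H). The sketch below is a hypothetical helper that only knows the chr1 length of two human builds, GRCh37/hg19 and GRCh38/hg38; the GTF's build has to be checked separately, usually from the iGenomes directory it was downloaded from:

```python
# Guess the human genome build from the chr1 length in a SAM/BAM header.
# Assumptions: header text from `samtools view -H`; only chr1 is inspected and
# only two builds are known to this hypothetical lookup table.

CHR1_LENGTHS = {
    249_250_621: "GRCh37/hg19",
    248_956_422: "GRCh38/hg38",
}


def sq_lengths(header_text):
    """Map each @SQ SN: name to its LN: length."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def guess_build(header_text):
    """Return the build whose chr1 length matches, accepting 'chr1' or '1' naming."""
    lengths = sq_lengths(header_text)
    chr1 = lengths.get("chr1", lengths.get("1"))
    return CHR1_LENGTHS.get(chr1, "unknown")


if __name__ == "__main__":
    print(guess_build("@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n"))  # GRCh37/hg19
```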

Last edited by mastal; 04-04-2017 at 06:34 AM.
mastal

All times are GMT -8.
