SEQanswers

11-06-2015, 01:31 AM   #1
ea11
Member
Location: Southampton
Join Date: Jun 2015
Posts: 36

Tophat2 high discordant alignments

Hi,

I am mapping paired-end RNA-seq data with TopHat2, but the alignment summary shows a very high discordant alignment rate. The only TopHat options I am specifying are -p 16 and -o "DIR". Below is the output from TopHat2:


Left reads:
          Input     :  88556961
           Mapped   :  76938162 (86.9% of input)
            of these:  20429665 (26.6%) have multiple alignments (622137 have >20)
Right reads:
          Input     :  88556961
           Mapped   :  75252663 (85.0% of input)
            of these:  20114304 (26.7%) have multiple alignments (621700 have >20)
Unpaired reads:
          Input     :     68008
           Mapped   :     56927 (83.7% of input)
            of these:      8045 (14.1%) have multiple alignments (9 have >20)
85.9% overall read mapping rate.

Aligned pairs:  65389463
     of these:  18622479 (28.5%) have multiple alignments
                61775607 (94.5%) are discordant alignments
 4.1% concordant pair alignment rate

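The percentages in the TopHat summary can be re-derived from its raw counts, which makes it easier to see what the 4.1% figure means. A minimal Python sketch; the formula for the concordant rate, (aligned pairs - discordant pairs) / input pairs, is an assumption inferred from how these particular numbers line up, not taken from the TopHat documentation:

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in TopHat's summary."""
    return round(100.0 * part / whole, 1)


# Raw counts copied from the TopHat2 summary in the post.
input_pairs = 88556961        # reads in each of the left/right input files
left_mapped = 76938162
right_mapped = 75252663
unpaired_input = 68008
unpaired_mapped = 56927
aligned_pairs = 65389463
discordant_pairs = 61775607

overall = pct(left_mapped + right_mapped + unpaired_mapped,
              2 * input_pairs + unpaired_input)
discordant = pct(discordant_pairs, aligned_pairs)
# Assumption: concordant pairs = aligned pairs minus discordant pairs,
# expressed as a fraction of all input pairs.
concordant_rate = pct(aligned_pairs - discordant_pairs, input_pairs)

print(f"overall mapping rate:   {overall}%")
print(f"discordant among pairs: {discordant}%")
print(f"concordant pair rate:   {concordant_rate}%")
```

This reproduces the 85.9%, 94.5% and 4.1% reported above: only about 3.6 million of the 88.6 million input pairs aligned concordantly, even though most individual reads mapped.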
The flagstat output I get is also below:

341377625 + 0 in total (QC-passed reads + QC-failed reads)
189129873 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
341377625 + 0 mapped (100.00%:-nan%)
152190825 + 0 paired in sequencing
76938162 + 0 read1
75252663 + 0 read2
263998 + 0 properly paired (0.17%:-nan%)
130778926 + 0 with itself and mate mapped
21411899 + 0 singletons (14.07%:-nan%)
116104620 + 0 with mate mapped to a different chr
80799382 + 0 with mate mapped to a different chr (mapQ>=5)
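To put the flagstat numbers side by side, the counts can be parsed and expressed as fractions of the paired reads. A small sketch, assuming the usual "<QC-passed> + <QC-failed> <category>" line layout of samtools flagstat; `parse_flagstat` is a hypothetical helper, not part of samtools:

```python
import re

# One flagstat line: "<QC-passed> + <QC-failed> <category> [(percentages)]"
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.*)$")
# A trailing parenthetical containing a percent sign, e.g. "(0.17%:-nan%)".
# Other parentheticals such as "(mapQ>=5)" are kept so labels stay distinct.
PERCENT_SUFFIX = re.compile(r"\s*\([^)]*%[^)]*\)$")


def parse_flagstat(text: str) -> dict:
    """Map each flagstat category to its QC-passed count."""
    counts = {}
    for line in text.strip().splitlines():
        m = FLAGSTAT_LINE.match(line.strip())
        if m:
            label = PERCENT_SUFFIX.sub("", m.group(3))
            counts[label] = int(m.group(1))
    return counts


FLAGSTAT = """
341377625 + 0 in total (QC-passed reads + QC-failed reads)
189129873 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
341377625 + 0 mapped (100.00%:-nan%)
152190825 + 0 paired in sequencing
76938162 + 0 read1
75252663 + 0 read2
263998 + 0 properly paired (0.17%:-nan%)
130778926 + 0 with itself and mate mapped
21411899 + 0 singletons (14.07%:-nan%)
116104620 + 0 with mate mapped to a different chr
80799382 + 0 with mate mapped to a different chr (mapQ>=5)
"""

c = parse_flagstat(FLAGSTAT)
paired = c["paired in sequencing"]
print(f"properly paired:         {100 * c['properly paired'] / paired:.2f}%")
print(f"mate on a different chr: {100 * c['with mate mapped to a different chr'] / paired:.1f}%")
```

For these numbers it prints 0.17% properly paired and 76.3% with the mate on a different chromosome. Roughly three quarters of paired reads landing on a different chromosome from their mate is consistent with record i in the R1 file no longer belonging with record i in the R2 file.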

I am using cutadapt to remove adapters and low-quality reads, and that step runs fine. But when I pass the paired files on to TopHat, the results don't look good. From what I have read, this is caused by the mate pairs no longer being in sync between the two FASTQ files. Is there a way around this, to get the number of discordant alignments down?

I have tried aligning the FASTQ files with TopHat2 without passing them through cutadapt first; the alignment is fine and the discordant alignment rate is very low, so I'm guessing the FASTQ files are good and something is happening at the cutadapt step.
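Whether two FASTQ files are still in sync can be checked directly by walking both files record by record and comparing read IDs. A minimal sketch, assuming strict 4-line FASTQ records and Illumina-style read names in which the mates differ only by a "/1" vs "/2" suffix or by the comment after the first space; the function names are hypothetical:

```python
import gzip
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID from each 4-line FASTQ record.

    The ID is the header up to the first whitespace, minus a trailing
    /1 or /2, so "@name/1" and "@name 1:N:0:..." style mates compare equal.
    """
    for i, line in enumerate(handle):
        if i % 4 == 0:
            rid = line[1:].split(maxsplit=1)[0]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def first_mismatch(h1, h2):
    """Return (record_number, id1, id2) for the first pair whose IDs differ,
    or None if the files are in sync. A missing record shows up as None."""
    for n, (a, b) in enumerate(zip_longest(read_ids(h1), read_ids(h2)), start=1):
        if a != b:
            return n, a, b
    return None


def check_fastq_pair(path1, path2):
    """Open two (optionally gzipped) FASTQ files and compare their read IDs."""
    def opener(p):
        return gzip.open(p, "rt") if p.endswith(".gz") else open(p)
    with opener(path1) as f1, opener(path2) as f2:
        return first_mismatch(f1, f2)


# In-memory demo: the second file has lost read "b".
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@c/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@c/2\nACGT\n+\nIIII\n")
print(first_mismatch(r1, r2))  # (2, 'b', 'c')
```

Running `check_fastq_pair` on the trimmed R1/R2 files would show whether, and where, the pairing breaks after the cutadapt step.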
Just as a note, I am not using Galaxy for the analysis.

Thanks
11-06-2015, 04:36 AM   #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,547

Quote:
Originally Posted by ea11
From what I have read, it is to do with the mate pairs no longer being in sync in the two fastq files. Is there a way around this and to get the number of discordant alignments down?
Thanks
Use a paired-end-aware trimmer such as Trimmomatic or BBDuk (from BBMap), which keep the paired-end files in sync after trimming.
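The key property of a paired-end-aware trimmer is that the keep/discard decision is made per pair rather than per file. The sketch below illustrates only that principle; the naive 3' quality scan and the names `trim_3prime`, `trim_pairs` and `trim_single` are hypothetical stand-ins, not the algorithms used by cutadapt, Trimmomatic or BBDuk. Quality strings are assumed to be Phred+33.

```python
def trim_3prime(rec, min_qual, offset=33):
    """Cut bases off the 3' end while their Phred score is below min_qual.

    rec is a (name, sequence, quality) tuple.
    """
    name, seq, qual = rec
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_qual:
        end -= 1
    return name, seq[:end], qual[:end]


def trim_pairs(pairs, min_qual=20, min_len=30):
    """Paired-aware filtering: keep or drop both mates together."""
    for r1, r2 in pairs:
        t1, t2 = trim_3prime(r1, min_qual), trim_3prime(r2, min_qual)
        if len(t1[1]) >= min_len and len(t2[1]) >= min_len:
            yield t1, t2


def trim_single(reads, min_qual=20, min_len=30):
    """Filtering one file on its own, ignoring the mates (breaks sync)."""
    for rec in reads:
        t = trim_3prime(rec, min_qual)
        if len(t[1]) >= min_len:
            yield t


seq, good, bad = "A" * 40, "I" * 40, "#" * 40   # 'I' = Q40, '#' = Q2 (Phred+33)
pairs = [
    (("r1", seq, good), ("r1", seq, good)),
    (("r2", seq, good), ("r2", seq, bad)),      # mate 2 of r2 is all low quality
    (("r3", seq, good), ("r3", seq, good)),
]

print([(a[0], b[0]) for a, b in trim_pairs(pairs)])
# [('r1', 'r1'), ('r3', 'r3')]

left = [r[0] for r in trim_single(a for a, _ in pairs)]
right = [r[0] for r in trim_single(b for _, b in pairs)]
print(list(zip(left, right)))
# [('r1', 'r1'), ('r2', 'r3')]  <- mate 1 of r2 is now "paired" with mate 2 of r3
```

Dropping a single read from only one file shifts every later record by one position, which is exactly the kind of shift that turns most pairs discordant.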

That said, if you are happy with the cutadapt results and just want to fix the PE read order, you can do so with repair.sh from BBMap (example for paired-end reads in two files): http://seqanswers.com/forums/showpos...0&postcount=45
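The idea behind re-pairing is to match mates by read ID and set aside reads whose mate was removed. The following is only an in-memory illustration of that idea, assuming Illumina-style names; it is not how repair.sh is implemented, and `base_id` and `repair` are hypothetical names. Holding every R1 record in a dict would not scale to the 88 million reads in this thread.

```python
def base_id(header):
    """Read ID without '@', the comment, or a trailing /1 or /2."""
    rid = header.lstrip("@").split(maxsplit=1)[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def repair(r1_records, r2_records):
    """Match mates by read ID and return (pairs, singletons).

    Records are (header, sequence, quality) tuples. All R1 records are
    kept in memory, so this version only suits small inputs.
    """
    pending = {base_id(rec[0]): rec for rec in r1_records}
    pairs, singletons = [], []
    for rec in r2_records:
        mate = pending.pop(base_id(rec[0]), None)
        if mate is None:
            singletons.append(rec)        # R2 read whose mate was removed
        else:
            pairs.append((mate, rec))
    singletons.extend(pending.values())   # R1 reads whose mate was removed
    return pairs, singletons


r1 = [("@a/1", "ACGT", "IIII"), ("@b/1", "ACGT", "IIII"), ("@c/1", "ACGT", "IIII")]
r2 = [("@c/2", "TTTT", "IIII"), ("@a/2", "TTTT", "IIII")]
pairs, singles = repair(r1, r2)
print([(m1[0], m2[0]) for m1, m2 in pairs])  # [('@c/1', '@c/2'), ('@a/1', '@a/2')]
print([s[0] for s in singles])               # ['@b/1']
```

The singletons would then be aligned as unpaired reads, while the matched pairs go to the aligner as two files in the same order.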
11-06-2015, 04:37 AM   #3
ea11
Member
Location: Southampton
Join Date: Jun 2015
Posts: 36

Thanks for the reply. I thought cutadapt did that with the -p option for paired-end data.
I shall give BBDuk a try and see the results. I was not happy with the results of Trimmomatic on my data, so I am staying away from that trimmer for now.

Thanks
11-06-2015, 04:43 AM   #4
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,547

Just checking: you are not switching the R1/R2 files by mistake when you use them as input for TopHat? That would produce discordant results for obvious reasons.
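One quick way to confirm which file is which, assuming the original Illumina read names survived trimming, is to read the mate number from the first header of each file. A sketch with a hypothetical helper `mate_number` that recognises the older "/1" suffix and the Casava 1.8+ comment field ("1:Y:18:ATCACG"); headers in any other naming scheme return None:

```python
import re

# Casava 1.8+ comment field: "<mate>:<is filtered>:<control>:<index>"
CASAVA18 = re.compile(r"^([12]):[YN]:")


def mate_number(header):
    """Return 1 or 2 for the mate a FASTQ header belongs to, or None if unknown.

    Examples of the two recognised Illumina styles:
      old:         @HWUSI-EAS100R:6:73:941:1973#0/1
      Casava 1.8+: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
    """
    fields = header.lstrip("@").split()
    if fields and fields[0].endswith(("/1", "/2")):
        return int(fields[0][-1])
    if len(fields) > 1:
        m = CASAVA18.match(fields[1])
        if m:
            return int(m.group(1))
    return None


print(mate_number("@HWUSI-EAS100R:6:73:941:1973#0/1"))                         # 1
print(mate_number("@EAS139:136:FC706VJ:2:2104:15343:197393 2:Y:18:ATCACG"))    # 2
```

Applied to the first line of each file passed to TopHat, the file given first should report 1 and the file given second should report 2.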
11-06-2015, 04:46 AM   #5
ea11
Member
Location: Southampton
Join Date: Jun 2015
Posts: 36

Nope, I am not. The R1 files come before the R2 files in the script.
11-06-2015, 04:49 AM   #6
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,547

BBMap can do spliced alignments, so after running BBDuk you may want to try BBMap on the side while your TopHat2 runs are going.
11-06-2015, 04:52 AM   #7
ea11
Member
Location: Southampton
Join Date: Jun 2015
Posts: 36

Thanks, I shall have a read and see what the results look like with BBDuk/BBMap while my TopHat jobs are running.

Tags
discordant alignments, tophat2

All times are GMT -8.