SEQanswers

Old 12-21-2016, 10:50 AM   #1
ronaldrcutler
Member
 
Location: Virginia

Join Date: May 2016
Posts: 80
Hisat2: Differing amount of reads in mates

Hello all,

I am aligning paired-end reads using Hisat2. Unfortunately, some of the fastq files were corrupted in transfer. I was able to recover them using a gzip recovery protocol, but was left with about half the data. I have previously used Tophat2 to align these "recovered" fastq files, but am getting this error when trying to align with Hisat2:
Code:
Error, fewer reads in file specified with -2 than in file specified with -1
How is Tophat2 dealing with this differently than Hisat2?
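One quick way to confirm the mismatch before aligning is to count the records in each mate file. The following is a minimal sketch, assuming plain 4-line FASTQ records (optionally gzipped); the function names are hypothetical and not part of Hisat2 or Tophat2.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, anything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial last record is a common sign of a truncated file.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def mates_match(r1_path, r2_path):
    """Return (reads_in_r1, reads_in_r2, counts_equal) for a pair of mate files."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    return n1, n2, n1 == n2
```

Equal counts do not prove the files are properly paired (the reads could still be out of order), but unequal counts are enough to explain the error above.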

To fix this, I am trying repair.sh from the BBTools package to keep the reads that have mates in both files and output the singletons to a separate file. I will then try using all three files with Hisat2.
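For anyone unsure what re-pairing does conceptually, here is a rough sketch of the idea: match reads across the two files by name, keep matched pairs, and set unmatched reads aside as singletons. This is not how repair.sh is implemented. It assumes read names are unique within each file, that mates share a name apart from an optional trailing /1 or /2, and that both files fit in memory. All function names are hypothetical.

```python
def pair_key(header):
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_fastq(lines):
    """Yield (pair_key, record) from an iterable of FASTQ lines (4 per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError("truncated FASTQ record: " + header.strip())
        record = tuple(s.rstrip("\n") for s in [header] + rest)
        yield pair_key(record[0]), record


def repair(r1_lines, r2_lines):
    """Split two mate files into matched pairs and leftover singletons."""
    r1 = dict(read_fastq(r1_lines))
    r2 = dict(read_fastq(r2_lines))
    # Pairs are emitted in the order the reads appear in the first file.
    paired = [(r1[k], r2[k]) for k in r1 if k in r2]
    singles = [r1[k] for k in r1 if k not in r2]
    singles += [r2[k] for k in r2 if k not in r1]
    return paired, singles
```

The point of the sketch is only that pairing is decided by read name rather than by position in the file, which is what makes the repaired files safe to pass to a paired-end aligner.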

I can't seem to find a reference to this error anywhere else. Any advice on how I should deal with this?
Old 12-21-2016, 11:47 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

Don't use paired-end reads with unmatched pairs, since that can lead to erroneous discordant alignments. repair.sh is the way to go.
Old 12-21-2016, 01:45 PM   #3
ronaldrcutler
Member
 
Location: Virginia

Join Date: May 2016
Posts: 80

Thanks, GenoMax. I assume that in the Tophat2 run with these "recovered" files, any erroneous discordant alignments would have been discarded and so would not have affected the overall alignment.

I was able to run Hisat2 successfully after the repair, using the new paired-end files together with the singleton file. A high percentage of the singletons mapped uniquely - surely these are not erroneous alignments?
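For reference, a run of this shape can be sketched as an argument list built in Python. The flags used here (-x for the index, -1 and -2 for mate files, -U for unpaired reads, -S for SAM output) are standard Hisat2 options, but every file name and the helper function itself are placeholders, not the actual command that was run.

```python
import shlex


def build_hisat2_command(index, r1, r2, singletons, out_sam):
    """Build a hisat2 argument list for repaired pairs plus singletons."""
    return [
        "hisat2",
        "-x", index,        # basename of the Hisat2 index
        "-1", r1,           # repaired mate 1 file
        "-2", r2,           # repaired mate 2 file
        "-U", singletons,   # reads whose mate was lost
        "-S", out_sam,      # SAM output file
    ]


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Such a list could be passed to subprocess.run, or printed with as_shell to check it before running anything.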
Old 12-21-2016, 04:59 PM   #4
dcameron
Member
 
Location: Australia

Join Date: Mar 2013
Posts: 25

Firstly, since the underlying issue is data corruption, I would strongly recommend you re-download the corrupted data. As it is, your results will not be reproducible from the original data.
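After re-downloading, it is worth checking each file against a checksum so the same corruption does not go unnoticed again. A minimal sketch, assuming the data source publishes MD5 checksums for its files (that is an assumption; use whatever checksum type is actually provided). The function names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(path, expected_md5):
    """True if the local file's MD5 matches the checksum given by the source."""
    return file_md5(path) == expected_md5.strip().lower()
```

A file that fails this check should be downloaded again rather than recovered.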

Quote:
Originally Posted by ronaldrcutler
A high percentage of the singletons mapped uniquely - surely these are not erroneous alignments?
If only one of the .1.fq.gz/.2.fq.gz pair was corrupted, then there will be a large number of singleton reads from the file that was successfully copied. You would expect the unique mapping rate of these singleton reads to be only slightly lower than the unique mapping rate for the paired-end reads. The difference between the two is due to the aligner being able to use the partner read to disambiguate the mapping location for the paired-end reads, but not for the singletons.

TLDR: that behaviour is expected; they're probably correct; redownload the correct data before continuing

Tags
alignment, bbtools, hisat2, tophat2
