Seqanswers Leaderboard Ad

**id0** · 08-19-2013, 01:48 PM

Originally posted by offspring View Post

It's included, lines 85-89 in the current version. However, the bug has been fixed in TopHat 2.0.9.

That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

Eventually, Picard terminated with this:

Code:

Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
	... 5 more

I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.

**offspring** · 08-19-2013, 11:27 PM

Originally posted by id0 View Post

That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

Eventually, Picard terminated with this:

Code:

Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
	... 5 more

I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.

Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

Code:

samtools view -f 0x8 unmapped.bam

If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.

**id0** · 08-20-2013, 07:01 AM

Originally posted by offspring View Post

Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

Code:

samtools view -f 0x8 unmapped.bam

If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.

Looks like you already checked yourself, but just to confirm, I am not getting any reads with that flag. The BAM file has over 100,000,000 reads, so that should definitely be plenty.

Thanks for the quick follow-up! I'll try the updated script and see how it goes.

Also, I tried running Picard on the faulty merged file without CREATE_INDEX argument. It still spits out "Mapped mate should have mate reference name" errors, but completes without terminating. Indexing with samtools works. The merged file still has the same number of reads before and after Picard processing, so the erroneous reads are not being thrown away. The old script might still be useable.

**id0** · 08-20-2013, 07:17 AM

I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.

**offspring** · 08-27-2013, 06:13 AM

Originally posted by id0 View Post

I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.

Yes, this error occurred for files generated with TopHat < 2.0.7. Basically I messed up the order in which the processing steps should be performed. The updated version should fix this.

**id0** · 08-29-2013, 07:00 AM

I downloaded the newest fix_tophat_unmapped_reads.py. I saw the changed order. I ran it on TopHat 2.0.9 data, but I am still getting the "Mapped mate should have mate reference name" errors.

Sorry about the continued complaints.

**offspring** · 09-26-2013, 04:20 AM

Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

Code:

samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

Hopefully we can pin down the problem.

**id0** · 09-26-2013, 11:07 AM

Originally posted by offspring View Post

Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

Code:

samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000

Hopefully we can pin down the problem.

As expected, accepted_hits.bam does not match anything.

Column 5 in unmapped.bam is 255, but is changed to 0 in unmapped_fixup.bam.

I trimmed my starting FASTQs to just 100,000 reads for testing. The resulting BAMs are less than 10 MBs combined. I posted them here if you want to check for yourself:

Dropbox - 404

https://dl.dropboxusercontent.com/u/5959741/TopHat-100K/accepted_hits.bam

Dropbox - 404

https://dl.dropboxusercontent.com/u/5959741/TopHat-100K/unmapped.bam

Dropbox - 404

https://dl.dropboxusercontent.com/u/5959741/TopHat-100K/unmapped_fixup.bam

(Whoever is reading this thread a few weeks from now, these will probably be broken links. Sorry.)

**offspring** · 09-30-2013, 02:26 AM

Can you reproduce the problem you described with the downsampled files you provided?

I am getting the following Picard exception with them:

Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chr1 LN=195471971, ref chr1 LN=249250621

To me this seems to indicate a problem with how those files were derived from the originals.

**id0** · 09-30-2013, 07:29 AM

I think we are just using different reference sequences. I was using the iGenomes version of mm10.

I trimmed the original FASTQs to generate those BAMs.

Which Picard tool are you using? I just tried using SortSam, which does not require a reference sequence. That still gives me the same "Mapped mate should have mate reference name" errors for the merged BAM.

Code:

java -Xmx16G -jar ${PICARD_ROOT}/SortSam.jar \
VERBOSITY=WARNING QUIET=true VALIDATION_STRINGENCY=LENIENT \
SORT_ORDER=coordinate \
INPUT=all.bam \
OUTPUT=all.sortsam.bam

These are the specific errors (just the first few):

Code:

Ignoring SAM validation error: ERROR: Record 113550, Read name HISEQ01:267:H77T6ADXX:1:1101:15356:2352, Mapped mate should have mate reference name
Ignoring SAM validation error: ERROR: Record 114728, Read name HISEQ01:267:H77T6ADXX:1:1101:6588:2773, Mapped mate should have mate reference name
Ignoring SAM validation error: ERROR: Record 115145, Read name HISEQ01:267:H77T6ADXX:1:1101:15768:2826, Mapped mate should have mate reference name
Ignoring SAM validation error: ERROR: Record 115811, Read name HISEQ01:267:H77T6ADXX:1:1101:12628:3135, Mapped mate should have mate reference name
Ignoring SAM validation error: ERROR: Record 116103, Read name HISEQ01:267:H77T6ADXX:1:1101:19872:3119, Mapped mate should have mate reference name

**offspring** · 09-30-2013, 02:30 PM

You were right, with the mm10 genome I can reproduce the problem. I'll see what I can find.

**offspring** · 10-01-2013, 02:27 AM

The problematic reads are unmapped ones in unmapped.bam with the "mate is unmapped" flag not set. This means their mates would be expected to be mapped and in accepted_hits.bam; however they are not in there. This is a problem, as my script needs these mapped reads to update some fields in their unmapped mates. Because these fields cannot be updated, Picard barfs later on. The samtools output you posted earlier indicates this is also the problem in the original files.

Do you filter your fastq files in any way before handing them to TopHat?

**id0** · 10-01-2013, 02:37 PM

The FASTQs were not filtered.

I am thinking this may be due to TopHat parameters like "--prefilter-multihits" to exclude multi-mapped reads. I believe those reads would not end up in either accepted_hits.bam or unmapped.bam.

**offspring** · 10-02-2013, 05:50 AM

Yes, that's quite possible. We don't use any options like this in our TopHat commandline, as far as I see.

**offspring** · 10-30-2014, 10:29 AM

Short update: the script posted earlier in this thread has seen some updates (Python 3 compatibility, better argument handling etc) and is now called tophat_recondition.

It can be found here: https://github.com/cbrueffer/tophat-recondition

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News