Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by offspring View Post
    It's included, lines 85-89 in the current version. However, the bug has been fixed in TopHat 2.0.9.
    That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

    Eventually, Picard terminated with this:
    Code:
    Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
    	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
    	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
    	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
    	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
    Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
    	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
    	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
    	... 5 more
    I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.
    Last edited by id0; 08-19-2013, 01:50 PM.

    Comment


    • #17
      Originally posted by id0 View Post
      That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

      Eventually, Picard terminated with this:
      Code:
      Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
      	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
      	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
      	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
      	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
      	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
      	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
      Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
      	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
      	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
      	... 5 more
      I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.
      Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

      Code:
      samtools view -f 0x8 unmapped.bam
      If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

      I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

      EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.
      Last edited by offspring; 08-20-2013, 05:21 AM.

      Comment


      • #18
        Originally posted by offspring View Post
        Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

        Code:
        samtools view -f 0x8 unmapped.bam
        If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

        I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

        EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.
        Looks like you already checked yourself, but just to confirm, I am not getting any reads with that flag. The BAM file has over 100,000,000 reads, so that should definitely be plenty.

        Thanks for the quick follow-up! I'll try the updated script and see how it goes.

        Also, I tried running Picard on the faulty merged file without CREATE_INDEX argument. It still spits out "Mapped mate should have mate reference name" errors, but completes without terminating. Indexing with samtools works. The merged file still has the same number of reads before and after Picard processing, so the erroneous reads are not being thrown away. The old script might still be useable.

        Comment


        • #19
          I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.

          Comment


          • #20
            Originally posted by id0 View Post
            I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.
            Yes, this error occurred for files generated with TopHat < 2.0.7. Basically I messed up the order in which the processing steps should be performed. The updated version should fix this.

            Comment


            • #21
              I downloaded the newest fix_tophat_unmapped_reads.py. I saw the changed order. I ran it on TopHat 2.0.9 data, but I am still getting the "Mapped mate should have mate reference name" errors.

              Sorry about the continued complaints.

              Comment


              • #22
                Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

                Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

                Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

                Code:
                samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                
                samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                
                samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                Hopefully we can pin down the problem.

                Comment


                • #23
                  Originally posted by offspring View Post
                  Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

                  Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

                  Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

                  Code:
                  samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  
                  samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  
                  samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  Hopefully we can pin down the problem.
                  As expected, accepted_hits.bam does not match anything.

                  Column 5 in unmapped.bam is 255, but is changed to 0 in unmapped_fixup.bam.

                  I trimmed my starting FASTQs to just 100,000 reads for testing. The resulting BAMs are less than 10 MBs combined. I posted them here if you want to check for yourself:



                  (Whoever is reading this thread a few weeks from now, these will probably be broken links. Sorry.)

                  Comment


                  • #24
                    Can you reproduce the problem you described with the downsampled files you provided?

                    I am getting the following Picard exception with them:

                    Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chr1 LN=195471971, ref chr1 LN=249250621
                    To me this seems to indicate a problem with how those files were derived from the originals.

                    Comment


                    • #25
                      I think we are just using different reference sequences. I was using the iGenomes version of mm10.

                      I trimmed the original FASTQs to generate those BAMs.

                      Which Picard tool are you using? I just tried using SortSam, which does not require a reference sequence. That still gives me the same "Mapped mate should have mate reference name" errors for the merged BAM.

                      Code:
                      java -Xmx16G -jar ${PICARD_ROOT}/SortSam.jar \
                      VERBOSITY=WARNING QUIET=true VALIDATION_STRINGENCY=LENIENT \
                      SORT_ORDER=coordinate \
                      INPUT=all.bam \
                      OUTPUT=all.sortsam.bam
                      These are the specific errors (just the first few):
                      Code:
                      Ignoring SAM validation error: ERROR: Record 113550, Read name HISEQ01:267:H77T6ADXX:1:1101:15356:2352, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 114728, Read name HISEQ01:267:H77T6ADXX:1:1101:6588:2773, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 115145, Read name HISEQ01:267:H77T6ADXX:1:1101:15768:2826, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 115811, Read name HISEQ01:267:H77T6ADXX:1:1101:12628:3135, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 116103, Read name HISEQ01:267:H77T6ADXX:1:1101:19872:3119, Mapped mate should have mate reference name
                      Last edited by id0; 09-30-2013, 07:31 AM.

                      Comment


                      • #26
                        You were right, with the mm10 genome I can reproduce the problem. I'll see what I can find.

                        Comment


                        • #27
                          The problematic reads are unmapped ones in unmapped.bam with the "mate is unmapped" flag not set. This means their mates would be expected to be mapped and in accepted_hits.bam; however they are not in there. This is a problem, as my script needs these mapped reads to update some fields in their unmapped mates. Because these fields cannot be updated, Picard barfs later on. The samtools output you posted earlier indicates this is also the problem in the original files.

                          Do you filter your fastq files in any way before handing them to TopHat?

                          Comment


                          • #28
                            The FASTQs were not filtered.

                            I am thinking this may be due to TopHat parameters like "--prefilter-multihits" to exclude multi-mapped reads. I believe those reads would not end up in either accepted_hits.bam or unmapped.bam.

                            Comment


                            • #29
                              Yes, that's quite possible. We don't use any options like this in our TopHat commandline, as far as I see.

                              Comment


                              • #30
                                Short update: the script posted earlier in this thread has seen some updates (Python 3 compatibility, better argument handling etc) and is now called tophat_recondition.

                                It can be found here: https://github.com/cbrueffer/tophat-recondition

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                18 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                22 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                17 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                49 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X