Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • TopHat .sam output format not recognised

    I am using TopHat v. 1.0.9 to analyse 36bp Illumina reads. The out put files .sam .wig and .bed have been generated successfully. I then try to view the SAM format results using the samtools and encounter probs. The pileup will not work.

    samtools pileup -cf ref.fa accepted_hits.sam > pileup.txt generates the following error:

    [bam_pileup] fail to read the header: non-exisiting file or wrong format.

    even though the SAM format looks correct. The tview tool just generates a blank window when I try to view the BAM file I generated from the SAM file using samtools view command.

    Anyone any ideas why the SAM format output is not recognised? Thanks for your help.

  • #2
    TopHat 1.0.9 has some lingering SAM compliance issues which may be contributing to this issue. I plan to release a new version this week that fixes these issues. Can you please email me a small sample of the accepted_hits.sam file that will reproduce this error, along with the samtools version info for your machine?

    Comment


    • #3
      Read orientation in Tophat SAM output

      Though I've been successful in getting Tophat's SAM output into a SAMtools pileup, I've been having considerable trouble with a related issue.

      According to the SAM specifications, for each alignment entry:
      "7. All mapped reads are represented on the forward genomic strand. *The bases are reverse complemented from the unmapped read sequence and the quality scores and cigar strings are recorded consistently with the bases. *This
      applies to information in the mate tags (R2, Q2, S2, etc.) and any other tags that are strand sensitive. *The strand bits in the flag simply indicates whether this reverse complement transform was applied from the original read sequence to obtain the bases listed in the SAM file."

      I just noticed in my Tophat accepted_hits.sam file, this is not the case: - strand reads are NOT "on the forward genomic strand."

      Example:
      HWI-E3:34:837:1749#0 0 chr11 100579857 255 34M * 0 0 GTTGTATCGTCTAAGCAGACAGTCTGTGTGAGGA _abbabb`[bbab\_aab`U_\_Za]THZXHW]Y NM:i:0

      HWI-EA:3:94:45:1276#0 16 chr11 100579857 255 34M * 0 0 TCCTCACACAGACTGTCTGCTTAGACGATACAAC aa_^^[^U`^\__]_W^_]]_`QWYSZU\VKLSY NM:i:0

      Both reads are given the same start coordinate, but are clearly reported in reverse orientations. I'm aligning single-end 34nt reads to mouse genome.

      Though the different orientations seem to be showing up in the FLAG column, the sequences themselves don't get along with SAMtools. At present, my output is completely incompatible with pileup -c and SNP calling.

      I know the alignments are good - if I take the tophat tmp bowtie output file (left_kept_reads_seg1.bwtout) and use samtools bowtie2sam.pl to convert to SAM format, orientations are correct and pileup -c, etc. work perfectly.

      Any ideas what I might be doing incorrectly to have the Tophat output flipped like this? Or is this a Tophat issue with SAM output in general?

      Thanks so much for the help!

      Comment


      • #4
        This is a new bug in 1.0.9, and is fixed in the imminent 1.0.10 release. Sorry for the inconvenience.

        Comment


        • #5
          I've just released 1.0.10 on the site. Please let me know if you still encounter any SAM-related issues.

          Comment


          • #6
            I am using the Tophat-1.0.8 and Bowtie-0.9.9.3.
            From the accepted_hits.sam, I picked up some short-reads and and tried to search them on BLAT Search Genome (http://genome.ucsc.edu/cgi-bin/hgBlat),
            But I found a confused problem that the strand is changed from "-" to "+" on the BLAT search result;
            And I also tried other records and they have the same problem.
            I don't know if it is also one new bug.

            Example:
            HWI-EAS185:3:30:709:1165#0/1 X 153280051 - 0 0 77 77 77 4 34 1 0 52 gatcttgGgccgCCGCCCCGCCCGTTGGTGAGTCTTGAATCCGTGTACTTTC %%%879/?9672>@?AB=BAAAA=A?@>?BABBBBBBAB@BB@BBBBAABBB
            Last edited by iloveneworleans; 07-31-2009, 01:11 PM.

            Comment


            • #7
              I just downloaded the latest Tophat-1.0.10, and found there is an segment join failed error at the last.
              And Tophat cannot generate the output files.
              But the previous version did not have this error.
              Is this a new incompatibility or anything else?

              [Fri Jul 31 13:58:04 2009] Beginning TopHat run (v1.0.10)
              -----------------------------------------------
              [Fri Jul 31 13:58:04 2009] Preparing output location ./tophat_out/
              [Fri Jul 31 13:58:04 2009] Checking for Bowtie index files
              [Fri Jul 31 13:58:04 2009] Checking for reference FASTA file
              [Fri Jul 31 13:58:04 2009] Checking for Bowtie
              Bowtie version: 0.10.0.0
              [Fri Jul 31 13:58:04 2009] Checking reads
              seed length: 50bp
              format: fastq
              quality scale: --phred33-quals
              [Fri Jul 31 13:59:06 2009] Reading known junctions from GFF file
              Splitting reads into 2 segments
              [Fri Jul 31 14:00:21 2009] Mapping reads against h_sapiens_asm with Bowtie
              [Fri Jul 31 14:05:34 2009] Mapping reads against h_sapiens_asm with Bowtie
              [Fri Jul 31 14:10:42 2009] Searching for junctions via segment mapping
              [Fri Jul 31 14:22:10 2009] Retrieving sequences for splices
              [Fri Jul 31 14:24:18 2009] Indexing splices
              Warning: Encountered reference sequence with only gaps
              Warning: Encountered reference sequence with only gaps
              .....

              [Fri Jul 31 14:24:41 2009] Mapping reads against segment_juncs with Bowtie
              [Fri Jul 31 14:26:35 2009] Mapping reads against segment_juncs with Bowtie
              [Fri Jul 31 14:28:30 2009] Joining segment hits
              [FAILED]
              Error: Segment join failed with err = 1

              Comment


              • #8
                Can you please email me the logs directory from this run?

                Comment


                • #9
                  Tophat 1.0.10

                  Hi Cole-

                  The new version (1.0.10) seems to be working fine for me. Strand orientation in the output SAM file looks good, as far as I can tell.

                  Thanks so much for the rapid turnaround!

                  Comment


                  • #10
                    Is there any way to have Tophat output an appropriate value for the "Mapping Qualities" field in the SAM format? At present, it outputs a straight "255" (which the SAM specification suggests when there is no value available), but this seems to be interfering with SNP calling in SAMtools (ie varFilter doesn't 'filter' much of anything...I end up with LOTS of false positives).

                    Alternatively, can anyone comment on an appropriate method of calling quality-aware, reliable cSNPs from Tophat output without a 'mapping quality' value?

                    Thanks!

                    Comment


                    • #11
                      I'll see what I can do for the next release

                      Comment


                      • #12
                        Thanks Cole.

                        Sorry for the sequential posting, but...Taking a closer look, it looks like the issue could be the base qualities returned in the 'accepted_hits.sam' output from Tophat.

                        I'm using version 1.0.10 with the "--solexa1.3-quals" flag...can you (or anyone) confirm that Tophat outputs Phred-scaled (ASCII -33 offset) base quality scores (as described in the SAM specifications)? I seem to be getting impossibly high base qualities which could be contributing to my problems in SNP filtering.

                        Thanks.

                        Comment


                        • #13
                          It should be, although TopHat lets Bowtie handle the actual conversion to base 33 Phred scale. Can you email me a handful of reads that wind up with bad qualities after the mapping?

                          Comment


                          • #14
                            Ah, I see where I'm going wrong. I'll have this fixed shortly.

                            Comment


                            • #15
                              Great! Looking forward to the update.

                              I can still email you sample data if necessary...just let me know.

                              Thanks again.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Essential Discoveries and Tools in Epitranscriptomics
                                by seqadmin




                                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                04-22-2024, 07:01 AM
                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              59 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              57 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              53 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              56 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X