  • EOF marker is absent - BAM file error in TopHat

    Hi,

    I've been running TopHat (with Bowtie2) on my RNA-seq data, mapping the reads to the transcriptome (in .gtf format) and the genome.

    I keep getting the error shown below:

    Code:
    user@it053392:~$ tophat -G /home/user/Desktop/_sam/RNAseq_beta_data/Transcriptome/merged.remDup.gtf -o /home/user/Desktop/tophatout.sam /home/user/Downloads/Bowtie/bowtie2-2.2.6/indexes/hg19 /home/user/Desktop/_sam/RNAseq_beta_data/trimmed/36w62_trimmed.fastq
    
    [2015-11-02 11:46:46] Beginning TopHat run (v2.0.9)
    -----------------------------------------------
    [2015-11-02 11:46:46] Checking for Bowtie
    		  Bowtie version:	 2.1.0.0
    [2015-11-02 11:46:46] Checking for Samtools
    		Samtools version:	 0.1.19.0
    [2015-11-02 11:46:46] Checking for Bowtie index files (genome)..
    [2015-11-02 11:46:46] Checking for reference FASTA file
    	Warning: Could not find FASTA file /home/user/Downloads/Bowtie/bowtie2-2.2.6/indexes/hg19.fa
    [2015-11-02 11:46:46] Reconstituting reference FASTA file from Bowtie index
      Executing: /usr/bin/bowtie2-inspect /home/user/Downloads/Bowtie/bowtie2-2.2.6/indexes/hg19 > /home/user/Desktop/tophatout.sam/tmp/hg19.fa
    [2015-11-02 11:46:59] Generating SAM header for /home/user/Downloads/Bowtie/bowtie2-2.2.6/indexes/hg19
    	format:		 fastq
    	quality scale:	 phred33 (default)
    [2015-11-02 11:47:04] Reading known junctions from GTF file
    [2015-11-02 11:47:06] Preparing reads
    	 left reads: min. length=85, max. length=85, 25297338 kept reads (13185 discarded)
    [2015-11-02 11:54:01] Building transcriptome data files..
    [2015-11-02 11:54:18] Building Bowtie index from merged.remDup.fa
    [2015-11-02 13:15:14] Mapping left_kept_reads to transcriptome merged.remDup with Bowtie2 
    	[FAILED]
    Error running:
    /usr/bin/bam2fastx --all --fastq /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.bam|/usr/bin/bowtie2-align -q -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /home/user/Desktop/tophatout.sam/tmp/merged.remDup -|/usr/bin/fix_map_ordering --bowtie2-min-score 15 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --sam-header /home/user/Desktop/tophatout.sam/tmp/merged.remDup.bwt.samheader.sam - - /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.m2g_um.bam | /usr/bin/map2gtf --sam-header /home/user/Desktop/tophatout.sam/tmp/hg19_genome.bwt.samheader.sam /home/user/Desktop/tophatout.sam/tmp/merged.remDup.fa.tlst - /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.m2g.bam > /home/user/Desktop/tophatout.sam/logs/m2g_left_kept_reads.out
    user@it053392:~$ /usr/bin/bam2fastx --all --fastq /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.bam|/usr/bin/bowtie2-align -q -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 1 --sam-no-hd -x /home/user/Desktop/tophatout.sam/tmp/merged.remDup -|/usr/bin/fix_map_ordering --bowtie2-min-score 15 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --sam-header /home/user/Desktop/tophatout.sam/tmp/merged.remDup.bwt.samheader.sam - - /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.m2g_um.bam | /usr/bin/map2gtf --sam-header /home/user/Desktop/tophatout.sam/tmp/hg19_genome.bwt.samheader.sam /home/user/Desktop/tophatout.sam/tmp/merged.remDup.fa.tlst - /home/user/Desktop/tophatout.sam/tmp/left_kept_reads.m2g.bam > /home/user/Desktop/tophatout.sam/logs/m2g_left_kept_reads.out
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    Error at parsing .tlst line (invalid strand):
    	31958 TCONS_00032473 scaffold40. 5-1634

    I figured that it's some kind of error with the BAM file, but since it's Bowtie/TopHat that produces the BAM file, I'm not sure what I can do. Any ideas?

  • #2
    Just download and install the latest version of TopHat, version 2.1.0.

    This issue has been fixed since TopHat 2.0.13.
    It's actually a bug in samtools 0.1.19.
    Since version 2.0.13, TopHat comes bundled with an older, stable version of samtools, version 0.1.18, to address this bug.
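
For anyone who wants to confirm whether a BAM file really lacks its end-of-file marker: a complete BAM file ends with a fixed 28-byte empty BGZF block, and the "EOF marker is absent" message means that block was not found. Below is a minimal sketch of that check, assuming a shell with tail, od and tr available. The function name has_bgzf_eof is made up for illustration and is not part of samtools or TopHat.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a file ends with the 28-byte BGZF EOF block.
# The marker bytes are taken from the SAM/BAM format specification;
# has_bgzf_eof is a hypothetical helper name.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

has_bgzf_eof() {
    local bam="$1"
    local tail_hex
    # Hex-encode the last 28 bytes and strip od's spaces and newlines.
    tail_hex=$(tail -c 28 "$bam" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

It could be used as `has_bgzf_eof left_kept_reads.bam && echo "marker present"`. A missing marker only shows that the file does not end cleanly; it does not say which step of the pipeline was responsible.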


    • #3
      OK, great, thanks. I'm a bit of a noob with this. Is there a quick way to update the software without re-installing it? I'm on Linux and tried sudo apt-get install, but it is still running 2.0.0.


      • #4
        No, but there is a binary version of TopHat available.
        Just download, unzip, and update your PATH.
        Installation cannot be any simpler.
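
The "download, unzip" step can be sketched roughly as follows, assuming the binary release is a gzipped tarball with a single top-level directory. The helper name unpack_release and the paths in the usage note are placeholders, not names taken from the TopHat site.

```shell
#!/usr/bin/env bash
# Sketch: extract a release tarball into DEST_DIR and print the directory
# that holds the unpacked files. unpack_release is a hypothetical helper
# and assumes the archive contains one top-level directory.
unpack_release() {
    local tarball="$1" dest="$2" top
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # The first archive entry names the top-level directory.
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    printf '%s\n' "$dest/$top"
}
```

For example, `unpack_release /path/to/tophat-release.tar.gz "$HOME/tools"` would print the unpacked directory, which is the one that then needs to be on your PATH.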


        • #5
          OK. Sorry, I know this is really basic, but when you say update your PATH, can this also mean copying the executables into the /bin folder? On the TopHat site it says 'make sure the TopHat binaries are in a directory in your PATH environment variable'.


          • #6
            The /bin folder is already in your PATH.
            So, yes, you can just copy the binary files to your /bin folder.
            Just make sure that you are calling the correct version of TopHat, if you haven't removed the older version of TopHat.
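
One way to see which copy gets called is to list every executable with that name on your PATH, in lookup order; the first one printed is the one the shell runs. This is only a sketch, and list_on_path is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: print every executable called NAME found on PATH, in the order
# the shell searches it. list_on_path is a hypothetical helper name.
list_on_path() {
    local name="$1" dir
    local IFS=:
    # With IFS set to ':', the unquoted $PATH splits into its directories.
    for dir in $PATH; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    return 0
}
```

Running `list_on_path tophat` should show the old and new copies if both are still installed. Each path can then be run with `--version`, assuming the binary supports that flag, to confirm which release it is.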

            The two other options are to type the full path to the TopHat binary explicitly, or to add its folder to your PATH if you put TopHat in a folder that is not already in your PATH.
            /path/to/binary/tophat
            or
            export PATH=/path/to/binaryFolder:$PATH

            For the last option, you should put the command in your ~/.bash_profile to make it permanent.
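
Making the export permanent could look something like the sketch below, which appends the line to a profile file only if it is not already there, so running it twice does not duplicate the entry. The helper name add_to_path_persistently is an assumption for illustration, and the default file follows the ~/.bash_profile suggestion above.

```shell
#!/usr/bin/env bash
# Sketch: append "export PATH=DIR:$PATH" to a profile file unless that
# exact line is already present. add_to_path_persistently is a
# hypothetical helper; PROFILE defaults to ~/.bash_profile.
add_to_path_persistently() {
    local dir="$1" profile="${2:-$HOME/.bash_profile}"
    # \$PATH stays literal so it is expanded when the profile is read.
    local line="export PATH=$dir:\$PATH"
    touch "$profile" || return 1
    grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

After something like `add_to_path_persistently /path/to/binaryFolder`, the change takes effect in new login shells, or in the current one after `source ~/.bash_profile`.
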
            Last edited by blancha; 11-03-2015, 07:12 AM.
