
  • Truncated TopHat Files

    So I have four 32-mer Illumina sequence files, each around 9.5 GB, in FASTQ format. TopHat runs exceedingly slowly for two of them and outputs files that are usable by cufflinks and samtools. The other two run quite quickly and produce all of the same files (no errors in the logs), and the juncs and coverage files are all in good order. The only difference I can find is that accepted_hits.sam is ~3 GB for the failed files and ~1 GB for the successful files.
    If I run cufflinks on these files I get the following error:

    cufflinks /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam
    Counting hits in map
    Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at Chr1:7405, last one was at Chr1:66939
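
    The Cufflinks message above says a hit at a lower coordinate followed one at a higher coordinate on the same reference. A minimal sketch for locating the first such record in a SAM file, assuming tab-separated SAM with the reference name in column 3 and the 1-based position in column 4. The function name `first_unsorted_record` is hypothetical, and this only approximates the kind of ordering check Cufflinks performs; it is not Cufflinks' own code.

```python
import sys


def first_unsorted_record(sam_lines):
    """Return (line_no, ref, pos, prev_pos) for the first out-of-order
    alignment, or None if the records look coordinate-sorted.

    prev_pos is None when a reference name reappears after other
    references have been seen (records for it are not contiguous).
    """
    seen_refs = set()
    current_ref = None
    last_pos = -1
    for line_no, line in enumerate(sam_lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"line {line_no}: fewer than 4 SAM columns")
        ref, pos = fields[2], int(fields[3])
        if ref != current_ref:
            if ref in seen_refs:
                return (line_no, ref, pos, None)
            seen_refs.add(ref)
            current_ref = ref
            last_pos = pos
        elif pos < last_pos:
            return (line_no, ref, pos, last_pos)
        else:
            last_pos = pos
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_sam_order.py accepted_hits.sam
    with open(sys.argv[1]) as handle:
        print(first_unsorted_record(handle))
```

    Running it on the failing accepted_hits.sam would show the line where the order first breaks, which helps tell a file that was never sorted from one whose sort was interrupted partway through.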

    If I run samtools I get this error:
    samtools sort /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam /home/james/Desktop/accepted_hits.sam
    [bam_header_read] EOF marker is absent.
    [bam_sort_core] truncated file. Continue anyway.
    Segmentation fault

    Has anyone come across this problem?

  • #2
    There are a couple of calls to 'sort' at the end of the pipeline, which can take a while on some machines and could be the difference between success and failure here. What kind of machine are you running this on? How much memory does it have?


    • #3
      It is an Ubuntu box running a Core 2 duo quad core 9400 with 6 GB of memory and 30 GB of swap space. I am going to try running the datasets without novel discovery to see if that affects the runs. A curious aspect of the files is that the two successful runs have roughly the same sized juncs, coverage, and accepted_hits files, and the two failed ones showed the same trend. The failed runs had larger accepted_hits files but smaller juncs.bed files compared to the successful ones, which makes me a little suspicious that more reads are aligning under the two failed treatments.


      • #4
        UNIX sort is probably doing on-disk sorting followed by merges on a machine like that. Do you have plenty of free disk?


        • #5
          Yes, I have about 600 GB free on the drive that TopHat is running on.


          • #6
            Hmm. Very strange. Can you send me the logs, along with the first 10k or so lines from the failed accepted_hits.sam? You'll probably have to post them on the web somewhere rather than email them. If that's not possible, would you please at least email me the logs? I've not seen this before.


            • #7
              fixed?

              Has there been any resolution to this question? I've got the same problem...

              thanks!


              • #8
                Originally posted by chrisbala:
                has there been any resolution to this question? I've got the same problem...

                thanks!
                Not yet - I have the logs, and some ideas about what's going on, but I haven't resolved the problem. It's possible it's an issue with a misformatted FASTQ. We'll let you know.


                • #9
                  clarification

                  Hey Cole,

                  I should clarify, my problem is actually not with the tophat output.

                  I actually just have a .sam file, derived by other means, that I've converted to .bam with samtools. That conversion seemed to go smoothly, but when I tried to sort, I got the same error as above.

                  Maybe this info will somehow be helpful in sorting out what the issue with the tophat output described above is... or maybe it will not... but if anyone has any thoughts about what might cause that error from samtools that would be a big help.

                  chris


                  • #10
                    I have the same problem as Chris...

                    As for the origin, I do have a FASTQ conversion step beforehand, solid2fastq.pl, for use with bwa. I changed the script several times, but the latest change was setting the QV from -1 to 0. Could this be the cause?

                    Coming back to my problem now, it looks like this:


                    chengguo@statgenpro:~/CRS/samtools-0.1.16$ samtools sort /home/chengguo/CRS/bwa-0.5.0/12F.bam 12F.sorted.bam
                    [bam_header_read] EOF marker is absent. The input is probably truncated
                    chengguo@statgenpro:~/CRS/samtools-0.1.16$
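
                    The "EOF marker is absent" message refers to the 28-byte empty BGZF block that a complete BAM file ends with. As far as I know, `samtools sort` in the 0.1.x series expects BAM input, so a plain-text SAM file (as in the first post) or a BAM whose writer was interrupted can both trigger it. A minimal sketch, assuming only the marker bytes given in the SAM/BAM specification, that tells these cases apart without samtools. The function name `classify_alignment_file` and its return labels are hypothetical.

```python
import os
import sys

# The empty BGZF block that terminates a complete BAM file (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)
assert len(BGZF_EOF) == 28


def classify_alignment_file(path):
    """Return 'not-compressed', 'missing-eof', or 'ok'.

    'not-compressed' means the file does not start with the gzip magic
    bytes, so it is probably text SAM rather than BAM.
    'missing-eof' means it looks compressed but does not end with the
    BGZF EOF block, which suggests truncation.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":
            return "not-compressed"
        if size < len(BGZF_EOF):
            return "missing-eof"
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = handle.read()
    return "ok" if tail == BGZF_EOF else "missing-eof"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_bam_eof.py 12F.bam
    print(classify_alignment_file(sys.argv[1]))
```

                    A result of 'not-compressed' points at converting SAM to BAM before sorting; 'missing-eof' points at an interrupted or failed conversion, in which case comparing the BAM size against free disk space and rerunning the conversion is a reasonable next step. An 'ok' result only shows the file ends cleanly, not that every block inside it is valid.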
