
  • Truncated TopHat Files

    So I have four 32-mer Illumina sequence files, each around 9.5 GB, in FASTQ format. TopHat runs exceedingly slowly for two of them and outputs files that are usable by cufflinks and samtools. The other two run quite quickly and produce all of the same files (no errors in the logs), and the juncs and coverage files are all in good order. The only difference I can find is that accepted_hits.sam is ~3 GB for the failed files and ~1 GB for the successful files.
    If I run cufflinks on these files I get the following error:

    cufflinks /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam
    Counting hits in map
    Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at Chr1:7405, last one was at Chr1:66939

    If I run samtools I get this error:
    samtools sort /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam /home/james/Desktop/accepted_hits.sam
    [bam_header_read] EOF marker is absent.
    [bam_sort_core] truncated file. Continue anyway.
    Segmentation fault

    Has anyone come across this problem?
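
The cufflinks message above reports a record at Chr1:7405 arriving after one at Chr1:66939. A minimal sketch of a check that finds the first such out-of-order record in a SAM file is shown here; the function name `first_sort_violation` is hypothetical, and the check assumes only that records for each reference must be contiguous and non-decreasing in position, which may be stricter or looser than what cufflinks itself enforces.

```python
def first_sort_violation(sam_lines):
    """Return (line_number, reason) for the first record that breaks
    coordinate order, or None if every record is in order.

    Assumption: "sorted" means all records for a reference are contiguous
    and their positions never decrease within that reference.
    """
    finished = set()              # references we have already moved past
    prev_chrom, prev_pos = None, -1
    for line_no, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue              # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # SAM has 11 mandatory columns
            return line_no, "truncated record (fewer than 11 fields)"
        chrom, pos = fields[2], int(fields[3])
        if chrom != prev_chrom:
            if chrom in finished:
                return line_no, f"{chrom} reappears after other references"
            if prev_chrom is not None:
                finished.add(prev_chrom)
            prev_chrom, prev_pos = chrom, pos
        elif pos < prev_pos:
            return line_no, f"{chrom}:{pos} follows {chrom}:{prev_pos}"
        else:
            prev_pos = pos
    return None
```

It can be run on a file with `with open("accepted_hits.sam") as fh: print(first_sort_violation(fh))`. A "truncated record" result would suggest that the file was cut off mid-write, not merely left unsorted.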

  • #2
    There are a couple of calls to 'sort' at the end of the pipeline, which can take a while on some machines and could be the difference between success and failure here. What kind of machine are you running this on? How much memory does it have?



    • #3
      It is an Ubuntu box running a Core 2 Duo quad-core 9400 with 6 GB of memory and 30 GB of swap space. I am going to try running the datasets without novel discovery to see if that affects the runs. A curious aspect of the files is that the two successful runs have roughly the same sized juncs, coverage, and accepted_hits files, and the two failed ones show the same trend. The failed runs had larger accepted_hits files but smaller juncs.bed files than the successful ones, which makes me a little suspicious that more reads are aligning under the two failed treatments.



      • #4
        UNIX sort is probably doing on-disk sorting followed by merges on a machine like that. Do you have plenty of free disk?
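
For readers unfamiliar with on-disk sorting followed by merges, here is a minimal sketch of the general external merge sort technique. It is not the actual implementation of UNIX sort; the function name `external_sort` and the `chunk_size` parameter are illustrative assumptions.

```python
import heapq
import tempfile


def external_sort(lines, key, chunk_size=100_000):
    """Yield lines in sorted order while holding at most chunk_size
    lines in memory at once (illustrative sketch only)."""
    runs = []    # temporary files, each holding one sorted run
    chunk = []   # lines collected for the current run

    def flush():
        # Sort the in-memory chunk and spill it to a temporary file.
        chunk.sort(key=key)
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        # Make sure every line ends in a newline so runs read back cleanly.
        chunk.append(line if line.endswith("\n") else line + "\n")
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # k-way merge of the sorted runs, reading each file lazily.
        yield from heapq.merge(*runs, key=key)
    finally:
        for run in runs:
            run.close()
```

The point relevant to this thread is that the sorted runs are written to temporary storage, so the sort needs free disk space on the order of the input size in addition to the final output. If that space runs out, or the process is killed partway, the result can be incomplete.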



        • #5
          Yes I have about 600gb free on the drive that tophat is running on.



          • #6
            Hmm. Very strange. Can you send me the logs, along with the first 10k or so lines from the failed accepted_hits.sam? You'll probably have to post on the web somewhere, rather than email. If that's not possible, would you please at least email me the logs? I've not seen this before.



            • #7
              fixed?

              has there been any resolution to this question? I've got the same problem...

              thanks!



              • #8
                Originally posted by chrisbala
                has there been any resolution to this question? I've got the same problem...

                thanks!
                Not yet - I have the logs, and some ideas about what's going on, but I haven't resolved the problem. It's possible it's an issue with a misformatted FASTQ. We'll let you know.



                • #9
                  clarification

                  Hey Cole,

                  I should clarify, my problem is actually not with the tophat output.

                  I actually just have a .sam file, derived by other means, that I've converted to .bam with samtools. That conversion seemed to go smoothly, but when I tried to sort, I got the same error as above.

                  Maybe this info will somehow be helpful in sorting out what the issue with the tophat output described above is... or maybe it will not... but if anyone has any thoughts about what might cause that error from samtools that would be a big help.

                  chris



                  • #10
                    I have the same problem as Chris...

                    As for the origin: I do run a FASTQ conversion step beforehand, solid2fastq.pl, when using bwa1. I changed the script several times, but the latest change was setting the QV from -1 to 0. Could this be the cause?

                    Coming back to my problem, it looks like this:


                    chengguo@statgenpro:~/CRS/samtools-0.1.16$ samtools sort /home/chengguo/CRS/bwa-0.5.0/12F.bam 12F.sorted.bam
                    [bam_header_read] EOF marker is absent. The input is probably truncated
                    chengguo@statgenpro:~/CRS/samtools-0.1.16$
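
The "EOF marker is absent" message refers to the empty 28-byte BGZF block that the BAM format defines as an end-of-file marker. Below is a minimal sketch for checking a file by hand. The function name `diagnose_bam` and its return strings are hypothetical, and it only inspects the first two and last 28 bytes, so an "ok" result does not prove the rest of the file is intact.

```python
import os

# The empty BGZF block that marks the end of a complete BAM file (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def diagnose_bam(path):
    """Return a short description of why the EOF check might fail."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        if fh.read(2) != b"\x1f\x8b":
            # No gzip magic number: this is not a BAM file at all,
            # for example a plain-text SAM file.
            return "not gzip/BGZF compressed (plain-text SAM?)"
        if size < len(BGZF_EOF):
            return "too short to hold a BGZF EOF block"
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = fh.read()
    if tail == BGZF_EOF:
        return "ok"
    return "EOF block missing (truncated, or written by a tool that omits it)"
```

A "not gzip/BGZF compressed" result would apply to the first post in this thread, where a `.sam` file was passed directly to `samtools sort`. As far as I know, that command in the samtools 0.1.x series expects BAM input, so converting first with `samtools view -bS` may be worth trying. An "EOF block missing" result on a real `.bam` points to an interrupted write or an incomplete copy.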
