SEQanswers




Old 03-02-2010, 10:12 AM   #1
jrober04
Junior Member
 
Location: Vancouver, BC

Join Date: Apr 2009
Posts: 4
Truncated TopHat Files

So I have four 32-mer Illumina sequence files, each around 9.5 GB, in FASTQ format. TopHat runs exceedingly slowly for two of them and outputs files that are usable by cufflinks and samtools. The other two run quite quickly and produce all of the same files (no errors in the logs), and the juncs and coverage files are all in good order. The only difference I can find is that accepted_hits.sam is ~3 GB for the failed files and ~1 GB for the successful files.
If I run cufflinks on these files, I get the following error:

cufflinks /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam
Counting hits in map
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at Chr1:7405, last one was at Chr1:66939
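
To locate where the ordering breaks, a small script can scan the SAM text and report the first record that is out of order. This is a minimal sketch and not cufflinks' own check: it assumes "sorted" means positions ascend within each reference and each reference appears as one contiguous block. The function name `first_unsorted_record` and the sample records are made up for illustration.

```python
def first_unsorted_record(sam_lines):
    """Return (line_no, current, previous) for the first out-of-order record, or None."""
    seen_refs = set()
    prev_ref, prev_pos = None, -1
    for line_no, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, pos = fields[2], int(fields[3])  # RNAME and POS columns
        if ref != prev_ref:
            if ref in seen_refs:
                # this reference already appeared earlier in the file
                return line_no, (ref, pos), (prev_ref, prev_pos)
            seen_refs.add(ref)
        elif pos < prev_pos:
            return line_no, (ref, pos), (prev_ref, prev_pos)
        prev_ref, prev_pos = ref, pos
    return None


# Illustrative records only, shaped like the positions in the error above.
sample = [
    "@HD\tVN:1.0\n",
    "r1\t0\tChr1\t66939\t255\t32M\t*\t0\t0\t*\t*\n",
    "r2\t0\tChr1\t7405\t255\t32M\t*\t0\t0\t*\t*\n",
]
print(first_unsorted_record(sample))  # (3, ('Chr1', 7405), ('Chr1', 66939))
```

Passing an open file handle instead of the list streams the file line by line, so a multi-gigabyte accepted_hits.sam does not need to fit in memory.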

If I run samtools, I get this error:
samtools sort /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam /home/james/Desktop/accepted_hits.sam
[bam_header_read] EOF marker is absent.
[bam_sort_core] truncated file. Continue anyway.
Segmentation fault
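
One thing that may be worth ruling out here: as far as I know, `samtools sort` in releases of that era reads BAM (BGZF-compressed) input, while the command above is given a text `.sam` file. The sketch below only inspects the first bytes of a file to guess which kind it is. The function name `sniff_alignment_format` is hypothetical, and the check is a heuristic, not a full format validation.

```python
def sniff_alignment_format(path):
    """Guess whether a file is BGZF (as BAM uses), plain gzip, or text."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    # gzip magic (1f 8b), deflate (08), FEXTRA flag set (04): the BGZF block header
    if head == b"\x1f\x8b\x08\x04":
        return "bgzf"
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    return "text"
```

If this returns "text" for accepted_hits.sam, the file would need converting to BAM before sorting, and the EOF-marker message alone would not say whether the TopHat output is incomplete.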

Has anyone come across this problem?
Old 03-02-2010, 11:32 AM   #2
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

There are a couple of calls to 'sort' at the end of the pipeline, which can take a while on some machines and could be the difference between success and failure here. What kind of machine are you running this on? How much memory does it have?
Old 03-02-2010, 12:19 PM   #3
jrober04
Junior Member
 
Location: Vancouver, BC

Join Date: Apr 2009
Posts: 4

It is an Ubuntu box running a Core 2 Duo quad-core 9400 with 6 GB of memory and 30 GB of swap space. I am going to try running the datasets without novel discovery to see if that affects the runs. A curious aspect of the files is that the two successful runs have roughly the same-sized juncs, coverage, and accepted_hits files, and the two failed ones show the same trend. The failed runs had larger accepted_hits files but smaller juncs.bed files than the successful ones, which makes me a little suspicious that more reads are aligning in the two failed treatments.
Old 03-02-2010, 12:59 PM   #4
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

UNIX sort is probably doing on-disk sorting followed by merges on a machine like that. Do you have plenty of free disk space?
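
For anyone unfamiliar with the idea, on-disk sorting followed by merges can be sketched roughly as below. This is a generic external merge sort, not the actual implementation inside UNIX sort; the function name `external_sort` and the `chunk_size` parameter are illustrative. It also shows why free disk space matters: every sorted run is written to a temporary file before the merge.

```python
import heapq
import tempfile


def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order using sorted on-disk runs and a k-way merge."""
    runs = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and spill it to a temporary file (one "run").
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(sorted(chunk))
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line if line.endswith("\n") else line + "\n")
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge reads the runs lazily, holding one line per run in memory.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()


print(list(external_sort(["b\n", "a\n", "c\n"], chunk_size=2)))  # ['a\n', 'b\n', 'c\n']
```

Memory use is bounded by `chunk_size` lines plus one line per run, while temporary disk use is roughly the size of the input.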
Old 03-02-2010, 08:06 PM   #5
jrober04
Junior Member
 
Location: Vancouver, BC

Join Date: Apr 2009
Posts: 4

Yes, I have about 600 GB free on the drive that TopHat is running on.
Old 03-03-2010, 09:27 AM   #6
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

Hmm. Very strange. Can you send me the logs, along with the first 10k or so lines from the failed accepted_hits.sam? You'll probably have to post on the web somewhere, rather than email. If that's not possible, would you please at least email me the logs? I've not seen this before.
Old 04-13-2010, 09:58 AM   #7
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
fixed?

Has there been any resolution to this question? I've got the same problem...

Thanks!
Old 04-13-2010, 06:19 PM   #8
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

Quote:
Originally Posted by chrisbala
has there been any resolution to this question? I've got the same problem...

thanks!

Not yet - I have the logs, and some ideas about what's going on, but I haven't resolved the problem. It's possible it's an issue with a misformatted FASTQ. We'll let you know.
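
A quick way to test the misformatted-FASTQ idea is to scan the input for the first record that breaks the basic layout. The sketch below assumes plain 4-line records (no wrapped sequence lines) and only checks structure, not quality encoding. The function name `find_fastq_problem` and the sample reads are made up for illustration.

```python
from itertools import zip_longest


def find_fastq_problem(lines):
    """Return (record_no, message) for the first malformed record, or None."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # Group the stream into 4-line records; a short final group is padded with None.
    records = zip_longest(*[stripped] * 4)
    for record_no, (header, seq, plus, qual) in enumerate(records, start=1):
        if qual is None:
            return record_no, "incomplete record (file may be truncated)"
        if not header.startswith("@"):
            return record_no, "header line does not start with '@'"
        if not plus.startswith("+"):
            return record_no, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return record_no, "sequence and quality lengths differ"
    return None


good = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
bad = good + ["@r2\n", "ACGT\n", "+\n", "III\n"]
print(find_fastq_problem(good))  # None
print(find_fastq_problem(bad))   # (2, 'sequence and quality lengths differ')
```

Passing an open file handle streams the reads, so the 9.5 GB inputs can be checked without loading them into memory.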
Old 04-14-2010, 07:39 AM   #9
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
clarification

Hey Cole,

I should clarify, my problem is actually not with the tophat output.

I actually just have a .sam file, derived by other means, that I've converted to .bam with samtools. That conversion seemed to go smoothly, but when I tried to sort, I got the same error as above.

Maybe this info will somehow be helpful in sorting out what the issue with the tophat output described above is... or maybe it will not... but if anyone has any thoughts about what might cause that error from samtools that would be a big help.

chris
Old 06-30-2011, 12:57 AM   #10
guo
Junior Member
 
Location: hong kong

Join Date: Jun 2011
Posts: 8

I have the same problem as Chris...

As for the origin: I did run a FASTQ conversion step beforehand, using the solid2fastq.pl script for bwa. I changed the script several times; the latest change was setting the QV from -1 to 0. Could this be the cause?

Coming back to my problem now, it looks like this:


chengguo@statgenpro:~/CRS/samtools-0.1.16$ samtools sort /home/chengguo/CRS/bwa-0.5.0/12F.bam 12F.sorted.bam
[bam_header_read] EOF marker is absent. The input is probably truncated
chengguo@statgenpro:~/CRS/samtools-0.1.16$
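
The "EOF marker" in this message refers to the empty 28-byte BGZF block that the SAM/BAM specification defines as the end-of-file marker. A minimal sketch for checking whether a BAM file ends with it is below. The function name `has_bgzf_eof_marker` is hypothetical, and a missing marker does not by itself prove data loss: the file may have been cut short during conversion, or it may have been written by a tool that does not append the marker.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof_marker(path):
    """Return True if the file ends with the BGZF end-of-file block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling it on the unsorted input (12F.bam in the transcript above) would show whether the file was already incomplete before the sort step.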
All times are GMT -8.