SEQanswers

Old 08-04-2010, 05:31 AM   #1
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11
Samtools view: fail to open file for reading.

Hi all,

Every now and then, when I try to convert a .sam file into a .bam file by calling
Code:
samtools view -bT hg.fa -o xxx.bam xxx.sam
I get this kind of error:

Code:
[main_samview] fail to open file for reading.
I'm pretty sure that xxx.sam is readable and in the working directory. The header and the first few records look like this:


Code:
@HD     VN:1.0  SO:sorted
@PG     ID:TopHat       VN:1.0.13       CL:/scratch/ngsvin/ruping/CancerGenomics/tophat-1.0.13/bin/tophat -o /scratch/ngsvin/RNA-seq/MPI-NF/mimik_pairend/ --solexa1.3-quals -p 5 -r 46 --mate-std-dev 14 --segment-length 20 -G /scratch/ngsvin/RNA-seq/MPI-NF/Hs.genes.gff /scratch/ngsvin/ruping/CancerGenomics/bowtie-0.12.5/indexes/hg18 s_4_1fq.chopped s_4_2fq.chopped
Run0009Lane4Tile57x3887y5410Multi0      65      chr1    461     255     36M     =       154912309       154911848       CTAACCCTGGCGGTACCCTCAGCCGGCCCGCCCGCC    GGAEGGGGGFGGFGDGGGGG?FFFFGFGGGFGGGFG    NM:i:1
Run0009Lane4Tile28x19254y9909Multi0     73      chr1    537     0       36M     *       0       0       ACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCG    CGGDGGGFGGFGGGGGFGGGGGGFGGGGEGGGGGGG    NM:i:1
Run0009Lane4Tile119x16602y20937Multi0   161     chr1    2792    255     36M     =       3160    403     CTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCT    FEFFFFEFFFFFFFFCFDFFEFAFFFFEFFEDFFED    NM:i:0
Run0009Lane4Tile48x11762y17580Multi0    147     chr1    3112    255     36M     =       3130    -17     TGCCAGCATAGTGCTCCTGGACCAGCGATACGCCCG    EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    NM:i:2
Run0009Lane4Tile24x15875y8494Multi0     83      chr1    3113    255     36M     =       3120    -28     GCCAGCATAGTGCTCCTGGACCAGCGATACGCCCGG    3>:.@+,31@56/?50;>CBB0)6@766-67/6@77    NM:i:2

In contrast, I successfully converted some other .sam files into .bam files, and their headers look exactly the same as the one above. The only difference may be the file size. The above .sam file is very big (10 GB), but I have sufficient memory to load it (>250 GB). So it is quite confusing that I keep getting this error. I tried to understand the C code in sam.c but couldn't figure out what the problem is. Can anyone help me? Thanks a lot!

Old 08-04-2010, 06:43 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you whether the problem is the header rather than the file size.
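
A minimal sketch of that check, assuming a POSIX shell with awk: it keeps every header line (those starting with "@") and the first N alignment records. The function name sam_head and the file names in the usage comment are hypothetical, not part of samtools.

```shell
# sam_head FILE N
# Print all SAM header lines (they start with "@") followed by the
# first N alignment records, then stop reading the file.
sam_head() {
    awk -v n="$2" '
        /^@/     { print; next }          # header line: always keep
        seen < n { print; seen++; next }  # keep the first n records
                 { exit }                 # enough records: stop early
    ' "$1"
}

# Hypothetical usage with the file names from this thread:
#   sam_head xxx.sam 20 > test.sam
#   samtools view -bS -T hg.fa -o test.bam test.sam
```

Unlike a plain head -100, this does not depend on guessing how many lines the header occupies.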
Old 08-04-2010, 07:12 AM   #3
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126

Try -bT <in.bam> -o <out.sam>
Old 08-04-2010, 07:35 AM   #4
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11

Quote:
Originally Posted by maubp View Post
Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you whether the problem is the header rather than the file size.

That's a good point. I tried it, and it works for the small chopped file:

Code:
head -100 xxx.sam >test.sam
samtools view -bT hg.fa test.sam >test.bam
[sam_header_read2] 25 sequences loaded.

So does that mean I cannot convert large SAM files into BAM?
Old 08-04-2010, 07:43 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.
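
One way to break the file into chunks, sketched under the assumption of a POSIX shell with grep and split: each chunk gets a copy of the header so it can be converted on its own. The function name sam_chunks and the output naming are made up for illustration, and the sketch assumes PREFIX has not been used before in the target directory.

```shell
# sam_chunks FILE RECORDS_PER_CHUNK PREFIX
# Write PREFIX.body.aa.sam, PREFIX.body.ab.sam, ... where each file holds
# the full header plus at most RECORDS_PER_CHUNK alignment records.
sam_chunks() {
    grep '^@' "$1" > "$3.header"                      # header lines only
    grep -v '^@' "$1" | split -l "$2" - "$3.body."    # records, N per piece
    for part in "$3".body.*; do
        cat "$3.header" "$part" > "$part.sam"         # header + piece
        rm "$part"
    done
    rm "$3.header"
}

# Hypothetical usage with the file names from this thread:
#   sam_chunks xxx.sam 1000000 chunk
#   for f in chunk.body.*.sam; do
#       echo "== $f"; samtools view -bS -T hg.fa -o "$f.bam" "$f"
#   done
```

If every chunk converts cleanly, a single bad read becomes a less likely explanation than something that depends on the size of the whole file.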
Old 08-04-2010, 07:50 AM   #6
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11

Quote:
Originally Posted by maubp View Post
So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.
Good suggestion, I'm doing it.
Old 08-04-2010, 07:50 AM   #7
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Code:
samtools import hg.fa xxx.sam xxx.bam
Old 08-04-2010, 07:55 AM   #8
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11

Quote:
Originally Posted by adamdeluca View Post
Code:
samtools import hg.fa xxx.sam xxx.bam
Thanks, but this doesn't work either.
Old 08-04-2010, 08:20 AM   #9
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

"samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.
Old 08-04-2010, 08:49 AM   #10
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11

Quote:
Originally Posted by nilshomer View Post
"samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.
I have tried with and without -S; the result is the same.

I used head to write different numbers of leading lines into a new file and then tested whether the conversion works. I found:

Code:
head -13394305 xxx.sam >head.sam
samtools view -bST hg18.fa head.sam -o head.bam
[sam_header_read2] 25 sequences loaded.

head -13394306 xxx.sam >head.sam
samtools view -bST hg18.fa head.sam -o head.bam
[main_samview] fail to open file for reading.
I checked line 13394306 and found nothing special there.
Interestingly, the two files differ in size as follows:
Code:
-rw------- 1 ruping xxx 2.0G Aug  4 17:42 head.sam  (for 13394305 lines)
-rw------- 1 ruping xxx 2.1G Aug  4 17:43 head.sam  (for 13394306 lines)
I think there might be a file size limit for the conversion, caused either by my machine or by samtools. However, the memory of my server is sufficient (>250 GB), and there is no problem loading other large data into memory.

So, what do you think?

Last edited by ruping; 08-04-2010 at 09:08 AM.
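
The jump from 2.0G to 2.1G is consistent with crossing the 2^31-byte (2 GiB) mark: a signed 32-bit file offset tops out at 2^31 - 1, and programs built without large-file support typically cannot open files beyond that size. Whether that is what happens in samtools 0.1.8 is an assumption that this thread does not confirm. A rough sketch for testing the idea, with hypothetical function names, assuming a POSIX shell with awk and wc:

```shell
LIMIT_32BIT=2147483647   # 2^31 - 1, the largest signed 32-bit file offset

# exceeds_32bit_offset BYTES: succeed if BYTES is above the 32-bit limit.
exceeds_32bit_offset() {
    [ "$1" -gt "$LIMIT_32BIT" ]
}

# file_bytes FILE: print the size of FILE in bytes.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# first_line_past FILE LIMIT: print the number of the first line at which
# the cumulative size exceeds LIMIT bytes (assumes one-byte characters and
# "\n" line endings, which holds for plain ASCII SAM files).
first_line_past() {
    awk -v limit="$2" '{ bytes += length($0) + 1 } bytes > limit { print NR; exit }' "$1"
}

# Hypothetical usage with the file names from this thread:
#   exceeds_32bit_offset "$(file_bytes head.sam)" && echo "over 2 GiB"
#   first_line_past xxx.sam "$LIMIT_32BIT"
```

If the size hypothesis holds, first_line_past should report the same line number that was found by trial and error with head.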
Old 08-04-2010, 09:10 AM   #11
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57

I had a similar issue with tview, where it couldn't find the .bai index file. Running samtools index [whatever] fixed the issue.
Old 08-04-2010, 09:13 AM   #12
ruping
Member
 
Location: Germany

Join Date: Jul 2010
Posts: 11

I should mention that the version of samtools I'm using is 0.1.8.

Something interesting happened: I tried another version of samtools (0.1.7-6 (r530)), and now it works! But this doesn't give me a scientific explanation...

Code:
/home/somebody/samtools/samtools view -bST hg18.fa head.sam -o head.bam
[sam_header_read2] 25 sequences loaded.
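
Since the two versions behave differently, it helps to confirm which binary each command actually runs. A small sketch, assuming (as with the 0.1.x releases) that running samtools without arguments prints a usage message containing a "Version:" line; the function name is hypothetical.

```shell
# samtools_version BINARY
# Print the "Version:" line from the usage text of the given samtools binary.
# The usage text may go to stderr, so both streams are merged before filtering.
samtools_version() {
    "$1" 2>&1 | grep -i '^Version'
}

# Hypothetical usage: compare the binary on PATH with a specific build.
#   samtools_version "$(command -v samtools)"
#   samtools_version /home/somebody/samtools/samtools
```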
Old 08-24-2010, 07:24 PM   #13
wuhoucdc
Member
 
Location: Nashville

Join Date: Oct 2009
Posts: 14

Hi ruping,

Quote:
Originally Posted by ruping
So does that mean I cannot convert large SAM files into BAM?

I think you can convert SAM files of any size to BAM. I have converted a SAM file larger than 100 GB.

Wu