  • Samtools view: fail to open file for reading.

    Hi, all

    Every now and then, when I try to convert a .sam file into a .bam file by calling
    Code:
    samtools view -bT hg.fa -o xxx.bam xxx.sam
    I get this kind of error:

    Code:
    [main_samview] fail to open file for reading.
    I'm pretty sure that the xxx.sam file is readable and in the working directory, and the header is like this:


    Code:
    @HD     VN:1.0  SO:sorted
    @PG     ID:TopHat       VN:1.0.13       CL:/scratch/ngsvin/ruping/CancerGenomics/tophat-1.0.13/bin/tophat -o /scratch/ngsvin/RNA-seq/MPI-NF/mimik_pairend/ --solexa1.3-quals -p 5 -r 46 --mate-std-dev 14 --segment-length 20 -G /scratch/ngsvin/RNA-seq/MPI-NF/Hs.genes.gff /scratch/ngsvin/ruping/CancerGenomics/bowtie-0.12.5/indexes/hg18 s_4_1fq.chopped s_4_2fq.chopped
    Run0009Lane4Tile57x3887y5410Multi0      65      chr1    461     255     36M     =       154912309       154911848       CTAACCCTGGCGGTACCCTCAGCCGGCCCGCCCGCC    GGAEGGGGGFGGFGDGGGGG?FFFFGFGGGFGGGFG    NM:i:1
    Run0009Lane4Tile28x19254y9909Multi0     73      chr1    537     0       36M     *       0       0       ACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCG    CGGDGGGFGGFGGGGGFGGGGGGFGGGGEGGGGGGG    NM:i:1
    Run0009Lane4Tile119x16602y20937Multi0   161     chr1    2792    255     36M     =       3160    403     CTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCT    FEFFFFEFFFFFFFFCFDFFEFAFFFFEFFEDFFED    NM:i:0
    Run0009Lane4Tile48x11762y17580Multi0    147     chr1    3112    255     36M     =       3130    -17     TGCCAGCATAGTGCTCCTGGACCAGCGATACGCCCG    EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    NM:i:2
    Run0009Lane4Tile24x15875y8494Multi0     83      chr1    3113    255     36M     =       3120    -28     GCCAGCATAGTGCTCCTGGACCAGCGATACGCCCGG    3>:.@+,31@56/?50;>CBB0)6@766-67/6@77    NM:i:2

    In contrast, I did successfully convert some other .sam files into .bam files, and their headers look exactly the same as the one above. The only difference may be the file size. The above .sam file is very big (10GB), but I have sufficient memory to load it (>250GB). So it is quite confusing to me that I always get this error. I tried to understand the C code in sam.c but couldn't figure out what the problem is. Can anyone help me? Thanks a lot!


  • #2
    Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you if it is the header that is the problem, rather than the file size.


    • #3
      Try -bT <in.bam> -o <out.sam>


      • #4
        Originally posted by maubp
        Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you if it is the header that is the problem, rather than the file size.

        That's a good point. I tried it, and it works for the chopped small file:

        Code:
        head -100 xxx.sam >test.sam
        samtools view -bT hg.fa test.sam >test.bam
        [sam_header_read2] 25 sequences loaded.

        So that means I can not convert large sam files into bam?


        • #5
          So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

          I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.
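
A minimal sketch of the chunking idea above, assuming a POSIX shell and a SAM file whose header lines all start with '@' and come before the reads. The function name split_sam and the output naming scheme are hypothetical, chosen only for illustration. Each chunk keeps the full header so that it can be converted on its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): split a SAM file into
# chunks of N reads, each prefixed with a copy of the original header.
# Usage: split_sam <in.sam> <reads_per_chunk> <out_prefix>
split_sam() {
    in=$1; n=$2; prefix=$3

    # Header lines start with '@'; read names may not, per the SAM spec.
    grep '^@' "$in" > "$prefix.header"

    # Split the alignment lines into pieces of n lines (suffixes aa, ab, ...).
    grep -v '^@' "$in" | split -l "$n" - "$prefix.body."

    for body in "$prefix".body.*; do
        [ -e "$body" ] || continue          # no reads at all: nothing to do
        chunk="$prefix.chunk.${body##*.}.sam"
        cat "$prefix.header" "$body" > "$chunk"
        rm "$body"
    done
    rm "$prefix.header"
}
```

For example, `split_sam xxx.sam 1000000 part` would write part.chunk.aa.sam, part.chunk.ab.sam, and so on. Each chunk could then be passed to the same samtools view command used elsewhere in this thread to see which one, if any, fails.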


          • #6
            Originally posted by maubp
            So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

            I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.
            Good suggestion, I'm doing it.


            • #7
              Code:
              samtools import hg.fa xxx.sam xxx.bam


              • #8
                Originally posted by adamdeluca
                Code:
                samtools import hg.fa xxx.sam xxx.bam
                Thanks, but this doesn't work either.


                • #9
                  "samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.


                  • #10
                    Originally posted by nilshomer
                    "samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.
                    I have tried both with and without -S; the result is the same.

                    I "headed" different numbers of lines into a new file and then tested whether the conversion works. I found:

                    Code:
                    head -13394305 xxx.sam >head.sam
                    samtools view -bST hg18.fa head.sam -o head.bam
                    [sam_header_read2] 25 sequences loaded.
                    
                    head -13394306 xxx.sam >head.sam
                    samtools view -bST hg18.fa head.sam -o head.bam
                    [main_samview] fail to open file for reading.
                    I checked line 13394306; there is nothing special there.
                    Interestingly, if I look at the difference in file size:
                    Code:
                    -rw------- 1 ruping xxx 2.0G Aug  4 17:42 head.sam  (for 13394305 lines)
                    -rw------- 1 ruping xxx 2.1G Aug  4 17:43 head.sam  (for 13394306 lines)
                    I think there might be a file-size limit for the conversion, caused either by my machine or by samtools. However, my server has sufficient memory (>250GB), and there is no problem loading other large data into memory.

                    So, what do you think?
                    Last edited by ruping; 08-04-2010, 08:08 AM.
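
The two sizes listed above appear to sit on either side of 2 GiB (2^31 = 2,147,483,648 bytes); a signed 32-bit file offset tops out at 2^31 - 1. One possible explanation, not confirmed anywhere in this thread, is that the failing binary was built without large-file support. The sketch below only checks a file's byte count against that threshold. The function name exceeds_limit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if FILE is larger than LIMIT bytes.
# LIMIT defaults to 2^31 - 1, the largest value of a signed 32-bit offset.
# Usage: exceeds_limit <file> [limit_in_bytes]
exceeds_limit() {
    limit=${2:-2147483647}
    # wc -c reports the size in bytes; tr strips padding some systems add.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt "$limit" ]
}

# Example (commented out; head.sam is the file name used in this thread):
# if exceeds_limit head.sam; then
#     echo "head.sam is larger than 2^31 - 1 bytes"
# fi
```

If the failing head.sam is reported as over the limit while the working one is not, that would be consistent with (though not proof of) a 32-bit offset problem in that particular build.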


                    • #11
                      I had a similar issue with tview, where it couldn't find the .bai index file. Running samtools index [whatever] fixed the issue.


                      • #12
                        I should mention that the version of samtools I'm using is 0.1.8.

                        Something interesting happened: I tried another version of samtools (0.1.7-6 (r530)), and now it works! But this doesn't give me a scientific explanation...

                        Code:
                        /home/somebody/samtools/samtools view -bST hg18.fa head.sam -o head.bam
                        [sam_header_read2] 25 sequences loaded.


                        • #13
                          Hi ruping,

                          "So that means I can not convert large sam files into bam?"

                          I think you can convert SAM files to BAM however large they are. I have converted a SAM file larger than 100G.

                          Wu
