  • I am mad because of samtools sort command

    I got two .sam files from Bowtie2.
    Now I want to merge these two files. First, I ran:
    samtools view -bSh ERR1.sam >ERR1.bam
    samtools view -bSh ERR2.sam >ERR2.bam

    and got the BAM files (they should include the header).
    Then I ran:
    samtools sort ERR1.bam ERR1.sorted.bam (here I got the sorted file, luckily)
    samtools sort ERR2.bam ERR2.sorted.bam
    For ERR2.bam I didn't get a sorted file; this was the output:

    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_sort_core] truncated file. Continue anyway.
    Segmentation fault (core dumped)

    Why? Is it just because ERR2.sam is too big (about 66 GB)?

  • #2
    Supplement:
    I ran: samtools view ERR2.bam | less -S
    and got this:
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "ERR173170_paired.bam".


    • #3
      If the file is too big for sorting you could split the .sam file on chromosome, sort each, recombine, then convert to .bam.


      • #4
        Originally posted by biocomputer
        If the file is too big for sorting you could split the .sam file on chromosome, sort each, recombine, then convert to .bam.
        You mean the fault happened when producing the SAM file? But my SAM file is okay!


        • #5
          As you suggest, maybe I need to split my big SAM file.


          • #6
            66 gigs isn't too big to sort. The original BAM file was corrupt, likely due to running out of disk space or a hardware problem. Make sure you have enough space and then remake the BAM file.
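
One way to test the corruption theory before remaking the file is to look at the raw bytes. The sketch below assumes the BAM was written as standard BGZF: it checks for the gzip magic bytes at the start and for the 28-byte end-of-file block at the end, whose absence usually points to a truncated write (for example, a full disk). The function names (has_gzip_magic, has_bgzf_eof, check_bam) are made up for illustration and are not samtools commands; newer samtools releases also ship `samtools quickcheck`, which performs a similar test.

```shell
# Hex dump of the 28-byte BGZF end-of-file block described in the SAM/BAM specification.
BGZF_EOF=1f8b08040000000000ff0600424302001b0003000000000000000000

# Succeeds if the file starts with the gzip magic bytes (1f 8b) that every BGZF file has.
has_gzip_magic() {
    [ "$(head -c 2 "$1" | od -An -v -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Succeeds if the file ends with the BGZF EOF block; a missing block suggests truncation.
has_bgzf_eof() {
    [ "$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')" = "$BGZF_EOF" ]
}

# Hypothetical wrapper: prints a one-line verdict and returns non-zero on a suspect file.
check_bam() {
    if ! has_gzip_magic "$1"; then
        echo "$1: not a BGZF/BAM file"
        return 1
    fi
    if ! has_bgzf_eof "$1"; then
        echo "$1: EOF block missing, file may be truncated"
        return 1
    fi
    echo "$1: looks intact"
}
```

Running `check_bam ERR2.bam` prints a single verdict line. Passing both checks does not prove the file is valid, only that it is not obviously truncated or mislabeled.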


            • #7
              There is enough disk space. Could memory be the problem?


              • #8
                The whole SAM file isn't loaded into memory; it's processed line by line (and compressed in blocks).


                • #9
                  A transient hardware error is the most likely cause of this sort of thing.


                  • #10
                    I recently got an error like that because I switched my reads with my reference sequence while mapping. I.e. my alignment was of my reference to my reads. Maybe that's your problem?

                    Here's a (correct) bash function that I used to map reads to a reference and only grab the mapped reads from the sam. Hopefully this can help guide you:
                    Code:
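                    map () {
                    	# $1 is the sample directory name; $refseq must be set by the caller.
                    	bwa index -a bwtsw "$refseq"
                    	bwa bwasw "$refseq" "../temp/$1/sampled_reads.fasta" > "../temp/$1/alignment.sam"
                    	# -F 4 drops unmapped reads while converting SAM to BAM.
                    	samtools view -bS -F 4 "../temp/$1/alignment.sam" > "../temp/$1/mapped.alignment.bam"
                    	# Old-style (samtools 0.1.x) sort: the second argument is an output prefix and ".bam" is appended.
                    	samtools sort "../temp/$1/mapped.alignment.bam" "../results/$1/mapped.sorted.alignment"
                    	samtools index "../results/$1/mapped.sorted.alignment.bam"
                    }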


                    • #11
                      Thank you very much for sharing your beautiful bash function.
                      However, it seems that your function is suited to unpaired alignments. I think so because the command samtools view -bS -F 4 ../temp/$1/alignment.sam > ../temp/$1/mapped.alignment.bam discards the unmapped reads. But how should I set the -F parameter for paired-end reads (.sam)?
                      Thanks, all.


                      • #12
                        -F 4 will remove unmapped reads in either case. If you want to remove those reads with an unmapped mate then just filter according to that bit in the flag.
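
To see which bits a given FLAG value carries, it can help to decode it. The sketch below is only an illustration: the bit values (1, 2, 4, 8, ...) follow the SAM specification, but the function name decode_flag and the short labels it prints are made up here and are not samtools output.

```shell
# Prints a space-separated list of labels for the bits set in a SAM FLAG value.
# The labels are informal names chosen for this example.
decode_flag() {
    flag=$1
    names=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
        bit=${pair%%:*}     # number before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"       # strip the leading space
}

decode_flag 4     # unmapped
decode_flag 73    # paired mate_unmapped first_in_pair
```

Bit 4 marks the read itself as unmapped, and bit 8 marks its mate as unmapped; those are the two bits relevant to the filtering discussed here.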


                        • #13
                          About the parameter -F of samtools view

                          Originally posted by dpryan
                          -F 4 will remove unmapped reads in either case. If you want to remove those reads with an unmapped mate then just filter according to that bit in the flag.
                          Hi, dpryan. Thank you for your answer. Now I have some similar questions about -F, and I would appreciate your help.
                          1) Should I remove the unmapped reads (whose mates are mapped) or the unmapped mates (whose reads are mapped)?
                          2) For paired reads, if I want to remove all of the above, should I use -F 12? However, it seems that no read has a flag value of 12. What about 77 or 141?


                          • #14
                            1) It depends on what you want to do with the results.
                            2) -F 12 is correct. There don't have to be any flags with that value since this is a bit comparison.
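
The bit comparison can be reproduced with plain shell arithmetic. This is a minimal sketch, assuming the documented behavior that `samtools view -F MASK` drops a read when any bit of MASK is set in its FLAG; the function name passes_F is hypothetical.

```shell
# Prints "kept" if a read with the given FLAG would pass `-F MASK`, else "dropped".
# A read is dropped when FLAG and MASK share at least one set bit.
passes_F() {
    mask=$1
    flag=$2
    if [ $(( flag & mask )) -eq 0 ]; then
        echo kept
    else
        echo dropped
    fi
}

# 12 = 4 (read unmapped) + 8 (mate unmapped)
for flag in 99 73 133 77 141; do
    printf '%s\t%s\n' "$flag" "$(passes_F 12 "$flag")"
done
```

Only 99 (both ends mapped) is kept. 73 and 133 each have one of the two bits set, and 77 and 141 have both, so all four are dropped even though none of them equals 12.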


                            • #15
                              Thank you, dpryan.
                              Today I tested it, and the result is consistent with your answer!
                              Thanks, again.
