Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools & flagstat & awk

    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards

  • #2
    Originally posted by tonio100680 View Post
    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards
    truly, the awk command can not generate 'bam' file, you can try making 'san' file then convert to bam file by samtools, but I am not sure it can work.

    Comment


    • #3
      Tonio,

      As chenyao already correctly stated, the output of your samtools view | awk pipe is plain text. You could save that to essai_chr1.sam and then convert that sam file to bam.

      There is an easier way though. You don't need to use awk to filter the alignments you want; samtools can do this for you, and output a bam file directly. You can use region specifiers in your samtools view command. These define regions using chromosome names and coordinates, you can give multiple regions on one command line. They are written in the format
      Code:
      <chromosome_name>:<start_position>-<end_position>
      and should be placed after the input sam/bam file name. Your command pipe above could be rewritten:

      Code:
      samtools view -bh -o essai_chr1.bam  L002_LBCO1.bam chr1:45787123-45787316 chr1:45790335-45790528
      -b tells sammtools to output in bam format
      -h will include the header lines in the output file
      -o gives the name of the output file

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Essential Discoveries and Tools in Epitranscriptomics
        by seqadmin




        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
        04-22-2024, 07:01 AM
      • seqadmin
        Current Approaches to Protein Sequencing
        by seqadmin


        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
        04-04-2024, 04:25 PM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Yesterday, 08:47 AM
      0 responses
      16 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-11-2024, 12:08 PM
      0 responses
      60 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 10:19 PM
      0 responses
      60 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 09:21 AM
      0 responses
      54 views
      0 likes
      Last Post seqadmin  
      Working...
      X