Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools & flagstat & awk

    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards

  • #2
    Originally posted by tonio100680 View Post
    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards
    truly, the awk command can not generate 'bam' file, you can try making 'san' file then convert to bam file by samtools, but I am not sure it can work.

    Comment


    • #3
      Tonio,

      As chenyao already correctly stated, the output of your samtools view | awk pipe is plain text. You could save that to essai_chr1.sam and then convert that sam file to bam.

      There is an easier way though. You don't need to use awk to filter the alignments you want; samtools can do this for you, and output a bam file directly. You can use region specifiers in your samtools view command. These define regions using chromosome names and coordinates, you can give multiple regions on one command line. They are written in the format
      Code:
      <chromosome_name>:<start_position>-<end_position>
      and should be placed after the input sam/bam file name. Your command pipe above could be rewritten:

      Code:
      samtools view -bh -o essai_chr1.bam  L002_LBCO1.bam chr1:45787123-45787316 chr1:45790335-45790528
      -b tells sammtools to output in bam format
      -h will include the header lines in the output file
      -o gives the name of the output file

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Current Approaches to Protein Sequencing
        by seqadmin


        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
        04-04-2024, 04:25 PM
      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 04-11-2024, 12:08 PM
      0 responses
      24 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 10:19 PM
      0 responses
      25 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 09:21 AM
      0 responses
      22 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-04-2024, 09:00 AM
      0 responses
      52 views
      0 likes
      Last Post seqadmin  
      Working...
      X