Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools & flagstat & awk

    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards

  • #2
    Originally posted by tonio100680 View Post
    Hi,
    Here is my command :
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would compare files L002_LBCO1.bam (merge result) with essai_chr1. bam (awk result). Apparently he does not recognize the file essai_chr1.bam as bam file.

    Do you have a solution?

    Best regards
    truly, the awk command can not generate 'bam' file, you can try making 'san' file then convert to bam file by samtools, but I am not sure it can work.

    Comment


    • #3
      Tonio,

      As chenyao already correctly stated, the output of your samtools view | awk pipe is plain text. You could save that to essai_chr1.sam and then convert that sam file to bam.

      There is an easier way though. You don't need to use awk to filter the alignments you want; samtools can do this for you, and output a bam file directly. You can use region specifiers in your samtools view command. These define regions using chromosome names and coordinates, you can give multiple regions on one command line. They are written in the format
      Code:
      <chromosome_name>:<start_position>-<end_position>
      and should be placed after the input sam/bam file name. Your command pipe above could be rewritten:

      Code:
      samtools view -bh -o essai_chr1.bam  L002_LBCO1.bam chr1:45787123-45787316 chr1:45790335-45790528
      -b tells sammtools to output in bam format
      -h will include the header lines in the output file
      -o gives the name of the output file

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM
      • seqadmin
        Techniques and Challenges in Conservation Genomics
        by seqadmin



        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

        Avian Conservation
        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
        03-08-2024, 10:41 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 03-27-2024, 06:37 PM
      0 responses
      13 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-27-2024, 06:07 PM
      0 responses
      12 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-22-2024, 10:03 AM
      0 responses
      53 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-21-2024, 07:32 AM
      0 responses
      69 views
      0 likes
      Last Post seqadmin  
      Working...
      X