Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • SAMtools pileup command

    I am having trouble with the samtools pileup command and would appreciate any advice. I'm probably doing something wrong, but I have read the documentation and can't figure out what. Originally I was starting with a sam file produced by BWA. But that wasn't working so I switched to a sam file generated by using bowtie2sam.pl on a bowtie file.

    I used the following command to create the unsorted bam file:
    samtools view -buSt chrM.fa -o file.bam file.sam

    And then the following to sort the file:
    samtools sort file.bam file.bam.sorted

    Then I tried a variety of things to produce the pileup file.

    samtools pileup -f chrM.fa file.bam.sorted.bam
    --> no output, no error

    samtools pileup -t chrM.fa.fai file.bam.sorted.bam
    samtools pileup -f chrM.fa -t chrM.fa.fai file.bam.sorted.bam
    -->
    [sam_header_read2] 1 sequences loaded.
    [sam_read1] reference '$(((((((((((((((((((((((((NMCX1CMDZ7N0N27' is recognized as '*'.
    [sam_read1] reference '(((((((((((((((((((((((((((((NMCX1CMDZ7N0N27' is recognized as '*'.
    Parse error at line 2: invalid CIGAR character
    Abort trap

    samtools pileup -f chrM.fa -S file.sam
    -->
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!

    samtools pileup -f chrM.fa file.sam
    -->
    [bam_pileup] fail to read the header: non-exisiting file or wrong format.


    The only way I could get pileup output was as follows:
    samtools pileup -t chrM.fa.fai file.sam
    samtools pileup -t chrM.fa.fai -S file.sam
    --> note there is no -S flag in the first one and yet it produced correct output

    I want to be able to use the pileup command on the bam file because I am assuming it will be much faster on large files. Can I do that?

    And I was told that it's better to use -f than -t--is that true, or does it not matter?

  • #2
    I have same issues here when I use the conversion tools. I mean, if I perform the alignment with bwa (which produces SAM output) everything works fine, like this:

    bwa aln genome file.fq > file.sai
    bwa samse file.sai | samtools view -bt genome.fai -o results.bam -

    samtools sort results.bam sorted && mv sorted.bam results.bam
    samtools index results.bam

    samtools pileup [-f genome] results.bam

    I've tried export2sam.pl for old runs but it has some issues which I believe are related to the fact export files have the chromosome (or genome) file name in the 'chromosome' column. This is reported in SAM from the converter but there's no, say, chr5.fa chromosome in the reference used for samtools. The worst happens when I convert eland_export files aligned against a multifasta reference: the expected column for chromosome contains only, say, genome.fa... the export2sam.pl places all the reads as they were aligned to chromosome 'genome.fa':

    11II-08F1-GA_1:6:1:0:1237 81 genome.fa 138895 275 36M = 138757 -174 CGAATTAGTAGATTGTTTGTAAAAATCGGTATTGTN <:<997<;<:<<::;9999;;;:;9<<:
    <<;:;;/& MD:Z:C35 SM:i:133
    11II-08F1-GA_1:6:1:0:1237 161 genome.fa 138757 275 36M = 138895 174 ACTTGTTTTAAATAAACAGATTTTCCACTCATGTTG @@@@?=BBBCBB@?@BAB=5>@B@B@A@
    AA@@?<@@ MD:Z:36 SM:i:142
    11II-08F1-GA_1:6:1:0:399 97 genome.fa 179127 275 36M = 179285 194 NATATCATAACCATAAAAAAAAAAGCACAATTCAAC &1<<<<<<<<<<<<<<<<<<<<<8579:
    :<<:::98 MD:Z:A35 SM:i:133

    I don't have bowtie ouputs here at the moment, you may check if the bowtie chromosome column is the name of the file or the chromosome...

    HTH

    Davide

    Comment


    • #3
      * character in pileup output

      Here is an example of line in pileup output which contains * characters
      chrIII 8884091 T W 95 126 60 12 ,,*****,*aA^]A aab`a^aHa^aa

      I do not believe this character is documented. Would be nice to hear back what it means.

      Comment


      • #4
        How can I pileup all the sites of reference sequence , but not only the sites covered by at least 1 ?

        Comment


        • #5
          Originally posted by baohua100 View Post
          How can I pileup all the sites of reference sequence , but not only the sites covered by at least 1 ?
          You can get a bioinformatician or computer scientist to write a quick perl script to do this for you. We need jobs!

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          29 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          52 views
          0 likes
          Last Post seqadmin  
          Working...
          X