Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trouble getting SNPS from bwa/samtools

    Hello all,

    I have a segment from a yeast, and a yeast genome to compare it to. I want to get a list of SNPS, and eventually find a known SNP in the yeast sequence.

    When I run this through bwa and samtools it seems to work fine up until its time for me to get the SNP list. When I use the pileup command I get empty files as output.

    Here's what I'm running.

    Code:
    #reference is yeast.nt
    #query is yeast.fasta
    
    bwa index /home/jill/bwa-0.5.8c/Project/yeast.fasta 
    
    bwa index /home/jill/bwa-0.5.8c/Project/yeast.nt
    
    bwa aln /home/jill/bwa-0.5.8c/Project/yeast.nt /home/jill/bwa-0.5.8c/Project/yeast.fasta  > /home/jill/bwa-0.5.8c/Project/yeastbwaout.sai
    
    #[bwa_aln] 17bp reads: max_diff = 2
    #[bwa_aln] 38bp reads: max_diff = 3
    #[bwa_aln] 64bp reads: max_diff = 4
    #[bwa_aln] 93bp reads: max_diff = 5
    #[bwa_aln] 124bp reads: max_diff = 6
    #[bwa_aln] 157bp reads: max_diff = 7
    #[bwa_aln] 190bp reads: max_diff = 8
    #[bwa_aln] 225bp reads: max_diff = 9
    #[bwa_aln_core] calculate SA coordinate... 0.15 sec
    #[bwa_aln_core] write to the disk... 0.00 sec
    #[bwa_aln_core] 1 sequences have been processed.
    
    bwa samse /home/jill/bwa-0.5.8c/Project/yeast.nt  /home/jill/bwa-0.5.8c/Project/yeastbwaout.sai /home/jill/bwa-0.5.8c/Project/yeast.fasta  > /home/jill/bwa-0.5.8c/Project/yeastbwaout.sam 
    
    #[bwa_aln_core] convert to sequence coordinate... 0.01 sec
    #[bwa_aln_core] refine gapped alignments... 0.00 sec
    #[bwa_aln_core] print alignments... 0.01 sec
    #[bwa_aln_core] 1 sequences have been processed.
    
    samtools faidx /home/jill/bwa-0.5.8c/Project/yeast.nt
    
    samtools view -bt /home/jill/bwa-0.5.8c/Project/yeast.nt.fai /home/jill/bwa-0.5.8c/Project/yeastbwaout.sam > /home/jill/bwa-0.5.8c/Project/yeastbwaout.bam
    
    #[samopen] SAM header is present: 17 sequences.
    
    #index bam
    samtools index /home/jill/bwa-0.5.8c/Project/yeastbwaout.bam
    
    #Sorts bam
    samtools sort /home/jill/bwa-0.5.8c/Project/yeastbwaout.bam /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort 
    
    samtools index /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort.bam
    
    samtools pileup -vcf /home/jill/bwa-0.5.8c/Project/yeast.nt.fai /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort.bam | tee /home/jill/bwa-0.5.8c/Project/raw.txt | /home/jill/samtools-0.1.11/misc/samtools.pl varFilter -D100 > /home/jill/bwa-0.5.8c/Project/flt.txt
    awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' /home/jill/bwa-0.5.8c/Project/flt.txt > /home/jill/bwa-0.5.8c/Project/final.txt
    I made sure both my files have an equal line length, 80, formatted with fastx. Any insight would be very appreciated. Thanks so much!

  • #2
    Try passing the actual fasta file instead of the of the index (fai) in the last command (samtools pileup).
    -drd

    Comment


    • #3
      Tried this:

      Code:
      samtools pileup -vcf [B]/home/jill/bwa-0.5.8c/Project/yeast.nt[/B] /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort.bam | tee /home/jill/bwa-0.5.8c/Project/raw.txt | /home/jill/samtools-0.1.11/misc/samtools.pl varFilter -D100 > /home/jill/bwa-0.5.8c/Project/flt.txt
      awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' /home/jill/bwa-0.5.8c/Project/flt.txt > /home/jill/bwa-0.5.8c/Project/final.txt
      Without the .fai . Still got empty pileup files.

      Comment


      • #4
        I've been trying to do the same thing with MAQ, that is get SNPS, and I'm getting an error there too. I'm not sure if the two are related or not though.

        Code:
        maq fasta2bfa /home/jill/maq/Project/yeast.nt /home/jill/maq/Project/yeast.nt.bfa
        maq fasta2bfa /home/jill/maq/Project/yeast.fasta /home/jill/maq/Project/yeast.fasta.bfa
        #-- 1 sequences have been converted.
        maq match /home/jill/maq/Project/yeast.fasta.map /home/jill/maq/Project/yeast.nt.bfa /home/jill/maq/Project/yeast.fasta.bfa
        
        #[ma_load_reads] loading reads...
        #[ma_load_reads] set length of the first read as 4624380.
        #[ma_load_reads] 1*2 reads loaded.
        #[ma_longread2read] encoding reads... 2 sequences processed.
        #[ma_match] set the minimum insert size as 4624381.
        #[match_core] Total length of the reference: 12155026
        #[match_core] round 1/3...
        #[match_core] making index...
        #[match_index_sorted] no reasonable reads are available. Exit!
        ETA: I also tried the bam/sam script I have above with completely different files. Still, nothing.
        Last edited by mindlessbrain; 12-13-2010, 05:16 AM.

        Comment


        • #5
          Why only one read processed?

          Comment


          • #6
            Originally posted by swbarnes2 View Post
            Why only one read processed?
            No idea. I was hoping someone could tell me. This is my first time using bwa/sam tools.

            Comment


            • #7
              Originally posted by mindlessbrain View Post
              Tried this:

              Code:
              samtools pileup -vcf [B]/home/jill/bwa-0.5.8c/Project/yeast.nt[/B] /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort.bam | tee /home/jill/bwa-0.5.8c/Project/raw.txt | /home/jill/samtools-0.1.11/misc/samtools.pl varFilter -D100 > /home/jill/bwa-0.5.8c/Project/flt.txt
              awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' /home/jill/bwa-0.5.8c/Project/flt.txt > /home/jill/bwa-0.5.8c/Project/final.txt
              Without the .fai . Still got empty pileup files.
              Try with /home/jill/maq/Project/yeast.fasta, instead of yeast.nt
              -drd

              Comment


              • #8
                Well, this obviously won't work if you only have one read processed. Does your .sam file only have one non-header line in it? Something must be wrong with your fastq.

                Comment


                • #9
                  I got it working! Somewhere along the line my fastq had been corrupted. So the commands I had above were fine, with the exception of swapping the reference for the .fai file.

                  Now I have an SNP file, from my last command:

                  Code:
                  samtools pileup -vcf /home/jill/bwa-0.5.8c/Project/yeast_mrna_genes.fasta /home/jill/bwa-0.5.8c/Project/yeastbwaoutsort.bam.bam | tee /home/jill/bwa-0.5.8c/Project/raw.txt | /home/jill/samtools-0.1.11/misc/samtools.pl varFilter -D100 > /home/jill/bwa-0.5.8c/Project/flt.txt
                  awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' /home/jill/bwa-0.5.8c/Project/flt.txt > /home/jill/bwa-0.5.8c/Project/final.txt
                  I've been trying to figure out what each column corresponds to. I know the first is the reference sequence name, then position of SNP, then SNPreference, SNPquery(?), after that though, no idea. The documentation has a slightly different format, except for the one they don't explain. =)

                  AB016599 1118 G R 215 215 36 81 a,,,a,,,.a,,a,,.,,,,,,,,,.aa,,,A.A.,.A....aA.AAAaAAa.,a,,a,,,,,,aaaaa,,,a,,a,,,.. CCACA0ACCBCC%CCCC%CC>CD%CCC@%CCCCCCCACCC?CCC%CACBBC:CCC%CCCCCCCBC<CCCC%CCC%C%CCAC
                  EF123147 124 T G 42 42 35 5 ^Fg^:g^Fg^Fg^Fg CC;@C
                  X84042 4 T A 36 36 25 3 ^:A^:A^:A ACC
                  X84042 5 C A 36 36 25 3 AAA ACC

                  Comment


                  • #10
                    Explained here.
                    Also, since you are on it, try mpileup instead of pileup and compare results.
                    -drd

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    25 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    29 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    52 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X