  • ksw9
    Member
    • May 2013
    • 32

    Genes of interest from SAM/BAM files

    Hi,
    I am trying to compare paired-end Illumina data with standard MLST approaches for genotyping bacteria. What is the best way to extract known marker genes (using coordinates from a reference) from SAM/BAM files created by mapping samples to this reference?

    Ultimately, I just want to compare the genotyping capacity of this shotgun data with traditional MLST methods and ensure that we are getting good coverage of the MLST markers of choice. I would like to generate consensus sequences from short-read data for the markers of interest and am unsure of the best way to do this.

    Thank you!
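For the coverage check described above, here is a minimal sketch, assuming per-base depth is taken from `samtools depth` and the marker coordinates sit in a BED file. The function name `mean_depth_per_marker` is hypothetical, not something from the thread. The sum is divided by the interval length rather than by the number of reported positions, because `samtools depth` can omit zero-coverage positions.

```shell
# mean_depth_per_marker BED DEPTH   (hypothetical helper, not part of samtools)
# BED:   contig, 0-based start, exclusive end, optional marker name
# DEPTH: contig, 1-based position, depth (the three columns `samtools depth` prints);
#        pass "-" to read the depth table from stdin
mean_depth_per_marker() {
    awk '
        NR == FNR {                          # first file: BED intervals
            if (NF < 3 || $1 ~ /^#/) next
            n++
            chrom[n] = $1
            first[n] = $2 + 1                # convert to 1-based inclusive start
            last[n]  = $3
            name[n]  = (NF >= 4) ? $4 : $1 ":" first[n] "-" last[n]
            next
        }
        {                                    # second file: per-base depth
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 >= first[i] && $2 <= last[i])
                    sum[i] += $3
        }
        END {                                # one line per marker: name, length, mean depth
            for (i = 1; i <= n; i++) {
                len  = last[i] - first[i] + 1
                mean = (len > 0) ? sum[i] / len : 0
                printf "%s\t%d\t%.2f\n", name[i], len, mean
            }
        }
    ' "$1" "$2"
}

# Possible usage (file names taken from later posts in this thread):
#   samtools depth -b MLST.bed Sample_006_004.bam | mean_depth_per_marker MLST.bed -
```

Markers with a mean depth near zero would be the ones where a consensus call is least trustworthy.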
  • crazyhottommy
    Senior Member
    • Apr 2012
    • 187

    #2
    I am not sure what you want to do, but look at bedtools. It may help you.

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      1. http://seqanswers.com/forums/showthread.php?t=39766

      Assuming that you are referring to extracting reads that map to certain marker genes: samtools view should allow you to pull out reads from specified gene regions.

      2. http://seqanswers.com/forums/showthread.php?t=38969

      Samtools mpileup should generate the consensus sequence.
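As a rough illustration of the `samtools view` suggestion: region strings passed to `samtools view` are 1-based and inclusive, while BED starts are 0-based, so the coordinates need shifting by one. The sketch below only does that conversion. The helper name `bed_to_regions` is hypothetical, and the usage lines assume a coordinate-sorted, indexed BAM.

```shell
# bed_to_regions: read BED on stdin, print samtools-style regions "contig:start-end".
# BED start is 0-based and the end is exclusive; samtools regions are 1-based inclusive.
bed_to_regions() {
    awk '
        /^#/ || /^track/ || /^browser/ || NF < 3 { next }   # skip headers and short lines
        { printf "%s:%d-%d\n", $1, $2 + 1, $3 }
    '
}

# Possible usage (file names taken from later posts in this thread):
#   samtools index Sample_006_004.bam
#   samtools view -b Sample_006_004.bam $(bed_to_regions < MLST.bed) > Sample_006_004_MLST.bam
```

The region names must match the contig names in the BAM header exactly, otherwise no reads are returned for that region.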

      • ksw9
        Member
        • May 2013
        • 32

        #4
        Thank you for your help!

        • ksw9
          Member
          • May 2013
          • 32

          #5
          I am trying to generate a consensus sequence using the following command:

          samtools mpileup -uf B31.fna Sample_Bbcap10_006_004_MLST.bam | bcftools view -cg - | /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl vcf2fq > cns.fq

          and struggling with the error:

          Use of uninitialized value in length at /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl line 544, <> line 57.


          Does anyone have any insight on this? I'm just beginning with SAMtools and really appreciate the support.

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Have you indexed your reference genome file (B31.fna) with samtools faidx? That is a FASTA-format file, correct?

            Code:
            $ samtools faidx B31.fna
            Last edited by GenoMax; 02-21-2014, 11:44 AM.

            • ksw9
              Member
              • May 2013
              • 32

              #7
              Yes, it is a fasta and I did index with samtools faidx.

              • shuoguo
                Member
                • Sep 2012
                • 23

                #8
                It looks to me like the error occurs at the last step. Does "vcfutils.pl" take stdin?

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  @ksw9: This may be an obvious question, but was this the reference file used to generate the indexes and the alignments that produced the BAMs? Based on the name, it looks like that may be a 454 sequence file. What aligner did you use?

                  • ksw9
                    Member
                    • May 2013
                    • 32

                    #10
                    I used BWA for mapping paired-end Illumina reads to the reference genome.

                    • ksw9
                      Member
                      • May 2013
                      • 32

                      #11
                      Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.

                      I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                      samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                      then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                      Thank you for all of your help!
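The BED labelling problem mentioned above can be checked before running anything else. This is a minimal sketch, assuming the reference was indexed with `samtools faidx` so that a `.fai` file exists next to it. The function name `bed_contigs_missing_from_fai` is hypothetical.

```shell
# bed_contigs_missing_from_fai BED FAI   (hypothetical helper, not part of samtools)
# Prints each contig name used in BED that has no entry in the .fai index.
# No output means every BED contig label matches a reference sequence name.
bed_contigs_missing_from_fai() {
    awk '
        NR == FNR { known[$1] = 1; next }   # first file: .fai, column 1 is the contig name
        NF >= 3 && $1 !~ /^#/ && !($1 in known) && !seen[$1]++ { print $1 }
    ' "$2" "$1"
}

# Possible usage (file names taken from this thread):
#   samtools faidx B31.fna
#   bed_contigs_missing_from_fai MLST.bed B31.fna.fai
```

On the question of building a reduced reference: `samtools mpileup` has a `-l` option that takes a BED or position list and limits the pileup to those sites, so it may be enough to pass `-l MLST.bed` with the full reference. Whether that fits this particular pipeline is an assumption that was not tested in the thread.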

                      • shuoguo
                        Member
                        • Sep 2012
                        • 23

                        #12
                        Good to know, thanks!
