  • Genes of interest from SAM/BAM files

    Hi,
    I am trying to compare paired-end Illumina data with standard MLST approaches for genotyping bacteria. What is the best way to extract known marker genes (using coordinates from a reference) from SAM/BAM files created by mapping samples to this reference?

    Ultimately, I just want to compare the genotyping capacity of this shotgun data with traditional MLST methods and make sure that we are getting good coverage of the MLST markers of choice. I would like to generate consensus sequences of the markers of interest from the short-read data, and I am unsure of the best way to do this.

    Thank you!

  • #2
    I am not sure what you want to do, but have a look at bedtools; it may help you.

    • #3
      1. http://seqanswers.com/forums/showthread.php?t=39766

      Assuming you are referring to extracting the reads that map to certain marker genes: samtools view should let you pull out reads from specified gene regions (given as region strings, or as a BED file with -L).

      2. http://seqanswers.com/forums/showthread.php?t=38969

      samtools mpileup, piped through bcftools, should let you generate the consensus sequence.
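For the read-extraction step in point 1, note that samtools region strings are 1-based and inclusive (contig:start-end), whereas BED coordinates are 0-based and half-open, so BED start positions need shifting by one. A minimal sketch of that conversion, assuming a plain BED file whose first three columns are contig, start and end; `bed_to_regions` is a hypothetical helper, not a samtools command:

```shell
# Hypothetical helper (not part of samtools): convert BED intervals
# (0-based, half-open) into samtools region strings (1-based, inclusive),
# e.g. "contig1<TAB>99<TAB>200" -> "contig1:100-200".
# Lines starting with "#", "track" or "browser" are skipped.
# Reads the files given as arguments, or stdin when none are given.
bed_to_regions() {
    awk '!/^(#|track|browser)/ && NF >= 3 { print $1 ":" ($2 + 1) "-" $3 }' "$@"
}
```

Assuming a coordinate-sorted BAM, `samtools index Sample_006_004.bam` followed by `samtools view -b Sample_006_004.bam $(bed_to_regions MLST.bed) > markers.bam` would then extract reads overlapping the marker loci (region arguments require the index). samtools view can also take the BED file directly with `-L`.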

      • #4
        Thank you for your help!

        • #5
          I am trying to generate a consensus sequence using the following command:

          Code:
          samtools mpileup -uf B31.fna Sample_Bbcap10_006_004_MLST.bam | bcftools view -cg - | /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl vcf2fq > cns.fq

          and am struggling with this error:

          Use of uninitialized value in length at /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl line 544, <> line 57.


          Does anyone have any insight on this? I'm just beginning with SAMtools and really appreciate the support.

          • #6
            Have you indexed your reference genome file (B31.fna) with samtools faidx? That is a FASTA-format file, correct?

            Code:
            $ samtools faidx B31.fna
            Last edited by GenoMax; 02-21-2014, 11:44 AM.
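A mismatch between the contig names in a BED file and those in the reference is an easy mistake to make, and it turned out to be the culprit later in this thread. A minimal sketch of a consistency check, assuming the reference has been indexed as above so that B31.fna.fai exists; `missing_bed_contigs` is a hypothetical helper name:

```shell
# Hypothetical helper: print contig names used in a BED file that are
# absent from a FASTA index (.fai, as written by samtools faidx).
# Column 1 of both files holds the sequence name. Each missing name is
# printed once, in order of first appearance in the BED file.
#   $1  FASTA index (.fai)
#   $2  BED file
missing_bed_contigs() {
    awk 'FILENAME == ARGV[1] { known[$1] = 1; next }   # names from the .fai
         /^(#|track|browser)/ || NF < 3 { next }       # skip BED header lines
         !($1 in known) && !seen[$1]++ { print $1 }' "$1" "$2"
}
```

`missing_bed_contigs B31.fna.fai MLST.bed` prints nothing when every contig named in the BED file is present in the reference. The same names should also appear in the `@SQ SN:` lines that `samtools view -H` shows for the BAM, since those come from the reference used for alignment.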

            • #7
              Yes, it is a FASTA file, and I did index it with samtools faidx.

              • #8
                It looks to me like the error occurs at the last step. Does vcfutils.pl take stdin?

                • #9
                  @ksw9: This may be an obvious question, but was the same reference file used both to build the indexes and to generate the alignments in the BAMs? Based on the name, it looks like it may be a 454 sequence file. What aligner did you use?

                  • #10
                    I used BWA for mapping paired-end Illumina reads to the reference genome.

                    • #11
                      Okay, I think my problem was in my original .bed file, which now has its contigs labeled appropriately.

                      I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                      Code:
                      samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                      should I then create a new reference sequence containing only the loci of interest, so that the consensus sequence is generated only for those loci (and not for the entire genome)?

                      Thank you for all of your help!
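One way to get a consensus for the loci only, without building a separate reference, is to restrict the consensus call to each marker region. A minimal sketch, assuming a samtools release that provides the `consensus` subcommand (added in samtools 1.16, much newer than the 0.1.19 used in this thread) and a coordinate-sorted BAM indexed with `samtools index`; `consensus_for_loci` is a hypothetical helper name, and the flags should be checked against the installed version's documentation:

```shell
# Hypothetical helper, assuming samtools >= 1.16 (which provides
# `samtools consensus`) and a coordinate-sorted, indexed BAM.
# Writes the consensus for each region to stdout, one call per region.
#   $1     BAM file
#   $2...  regions in samtools syntax, e.g. contig:start-end (1-based)
consensus_for_loci() {
    bam=$1
    shift
    for region in "$@"; do
        # -r limits the call to one region (requires the BAM index);
        # -a also reports positions with no aligned reads;
        # -f fasta selects FASTA output.
        samtools consensus -a -f fasta -r "$region" "$bam" || return
    done
}
```

For example, `consensus_for_loci Sample_006_004.bam <contig>:<start>-<end> > MLST_consensus.fa`, with one region argument per MLST locus taken from MLST.bed (remembering that BED start coordinates are 0-based). Because samtools consensus builds the sequence from the aligned reads, the original whole-genome reference can stay as it is.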

                      • #12
                        Good to know, thanks!
