  • Separating a .sam or .bam file that is based on alignment to multiple sequences

    Hey people!
    I have the following setup. I have a .sam file based on the alignment of E. coli genome reads to a multi-FASTA file containing the sequences of several plasmids. Now I want to separate the .sam file into several .sam files, each containing the header and mapping information for one plasmid, so that I can later calculate the % coverage per plasmid.

    My plan is to build a dictionary in Python, with the header of each reference sequence as the key and the corresponding alignment lines as the value (they are identifiable because the key should also be present in each line of the alignment/mapping block). Then I would write each entry to a file, convert it to .bam, and so on.

    Now my questions: Is this an appropriate way to do what I want to do or would I lose information? Is there a better way to do this when the file is in .bam format?

    Thanks for your help!

  • #2
    Unless you want to write your own code you can use "bamtools split" (or other programs) as noted in this thread: https://www.biostars.org/p/46327/
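
If you do want to write your own code, the dictionary idea from the first post could look roughly like the sketch below. It is only a minimal sketch under some assumptions: the input is plain-text SAM with an `@SQ` header line per plasmid, and the function name `split_sam_by_reference` is made up for illustration. Unmapped reads (RNAME `*`) are dropped. One caveat to check on your own data: for paired-end reads whose mate maps to a different plasmid, the per-plasmid file may mention a reference that is no longer in its header.

```python
from collections import defaultdict


def split_sam_by_reference(sam_lines):
    """Group SAM lines by reference name (hypothetical helper).

    Returns a dict mapping each reference name to a list of lines:
    the shared header lines, that reference's own @SQ line, then
    the alignment lines whose RNAME column equals that reference.
    """
    shared_header = []            # @HD, @RG, @PG, @CO: copied to every output
    sq_lines = {}                 # reference name -> its @SQ header line
    records = defaultdict(list)   # reference name -> alignment lines

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                sq_lines[tags["SN"]] = line
            else:
                shared_header.append(line)
            continue
        rname = line.split("\t")[2]   # column 3 of an alignment line is RNAME
        if rname != "*":              # skip unmapped reads
            records[rname].append(line)

    return {
        name: shared_header + [sq] + records.get(name, [])
        for name, sq in sq_lines.items()
    }
```

Each value can then be written to its own file, one line per entry, for example with `"\n".join(lines)`.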

    • #3
      Which program did you use for aligning your reads? If you used a multi-FASTA as reference, you should normally have the sequence headers of this multi-FASTA as header lines in your SAM file. Likewise, you should have the sequence IDs as "reference id" in each SAM line. Then it's quite straightforward to use the SAM/BAM file as it is and compute the coverage of the individual reference sequences.
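
To illustrate why no splitting is needed, here is a rough Python sketch that computes the percentage of covered positions per reference directly from SAM text. It assumes the `@SQ` header lines (with `SN` and `LN` tags) are present, and it counts a position as covered only when a read has an aligned base there (CIGAR `M`, `=` or `X`); whether deletions should count is a choice you may want to make differently. The name `breadth_from_sam` is invented for this example, and storing positions in sets is only sensible for small references such as plasmids.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # CIGAR operations that advance along the reference


def breadth_from_sam(sam_lines):
    """Return {reference name: percent of positions covered by at least one read}."""
    lengths = {}   # reference name -> length from the @SQ LN tag
    covered = {}   # reference name -> set of covered 1-based positions

    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            tags = dict(f.split(":", 1) for f in fields[1:])
            lengths[tags["SN"]] = int(tags["LN"])
            covered[tags["SN"]] = set()
            continue
        if line.startswith("@") or len(fields) < 11:
            continue   # other header lines or malformed records
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname not in covered or cigar == "*":
            continue   # unmapped read, unknown reference, or no CIGAR
        ref_pos = pos
        for count, op in CIGAR_RE.findall(cigar):
            n = int(count)
            if op in "M=X":
                covered[rname].update(range(ref_pos, ref_pos + n))
            if op in REF_CONSUMING:
                ref_pos += n

    return {name: 100.0 * len(covered[name]) / lengths[name] for name in lengths}
```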

      • #4
        @WhatsOEver: I used bowtie2. But how can I calculate the coverage of each reference sequence using only my file?

        @GenoMax: Thanks, that looks useful!
        Last edited by sequence_hard; 02-09-2016, 06:41 AM.

        • #5
          Either you use a different mapper that has this capability (e.g. bbmap does this with the scafstats parameter) or (what I prefer) you use a program like bedtools genomecov (e.g. bedtools genomecov -ibam yourAlignedData.bam -g yourMultiFasta.fasta). How bedtools genomecov works is nicely explained here: http://bedtools.readthedocs.org/en/l...genomecov.html
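
For the "% coverage per plasmid" asked about at the top, the default histogram output of bedtools genomecov can be post-processed with a few lines of Python. The sketch below assumes the default five tab-separated columns (reference name, depth, number of bases at that depth, reference length, fraction) and a function name, `breadth_from_genomecov`, that is purely illustrative. Two points worth verifying against the bedtools documentation for your version: as far as I know the BAM has to be sorted by position, and with `-ibam` the reference lengths come from the BAM header, while `-g` otherwise expects a tab-separated name/length file rather than a FASTA.

```python
def breadth_from_genomecov(hist_lines):
    """Return {reference name: percent of bases with depth >= 1}.

    Expects lines from the default `bedtools genomecov` histogram:
    name, depth, bases at that depth, reference length, fraction.
    The summary rows named "genome" are reported like any other reference.
    """
    sizes = {}       # reference name -> total length
    uncovered = {}   # reference name -> number of bases at depth 0

    for line in hist_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 5:
            continue   # skip blank or unexpected lines
        name, depth, n_bases, size = parts[0], int(parts[1]), int(parts[2]), int(parts[3])
        sizes[name] = size
        if depth == 0:
            uncovered[name] = n_bases

    return {
        name: 100.0 * (size - uncovered.get(name, 0)) / size
        for name, size in sizes.items()
    }
```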

          • #6
            Samtools depth may work in a pinch. You could also use Qualimap to get visual maps of coverage.
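
If you go the samtools depth route, its output is one tab-separated line per position (reference name, position, depth), so the per-plasmid percentage can be tallied as in the sketch below. This assumes you supply the reference lengths yourself (for example from the `@SQ` header lines), since positions with zero depth are normally left out of the output unless you ask for them. The function name and the `min_depth` parameter are made up for illustration.

```python
from collections import Counter


def breadth_from_depth(depth_lines, ref_lengths, min_depth=1):
    """Return {reference name: percent of positions with depth >= min_depth}.

    depth_lines: lines of `samtools depth` output (name, position, depth).
    ref_lengths: dict of reference name -> length, supplied by the caller.
    """
    covered = Counter()   # reference name -> number of positions passing the threshold
    for line in depth_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue      # skip blank or malformed lines
        if int(parts[2]) >= min_depth:
            covered[parts[0]] += 1
    return {
        name: 100.0 * covered[name] / length
        for name, length in ref_lengths.items()
    }
```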

            • #7
              @WhatsOEver: Awesome, thanks! I am already using bedtools genomecov but I did not know it had that function. Nice!

              • #8
                Originally posted by sequence_hard
                @WhatsOEver: Awesome, thanks! I am already using bedtools genomecov but I did not know it had that function. Nice!
                You are welcome.
                In my experience, there is pretty much nothing you can't do with bedtools when it comes to sequence coverage.
