  • Separating a .sam or .bam file that is based on alignment to multiple sequences

    Hey people!
    I have the following setup. I have a .sam file based on the alignment of E. coli genome reads to a multi-FASTA file containing the sequences of several plasmids. Now I want to split the .sam file into several .sam files, each containing the header and mapping information for one plasmid, so that I can later calculate the % coverage per plasmid.

    My plan is to build a dictionary in Python, using the header of each reference sequence as the key and the corresponding alignment lines as the value (they are identifiable because the key should also be present in each line of the alignment/mapping block). Then I would write each entry to its own file, convert it to .bam, and so on.
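    A minimal sketch of the dictionary approach described above, assuming a plain-text SAM file with @SQ header lines. The function name split_sam_by_reference is hypothetical, and the sketch only groups lines in memory; writing the files and converting to .bam is left out.

```python
from collections import defaultdict


def split_sam_by_reference(sam_lines):
    """Group SAM lines by reference name (RNAME, the third column).

    Returns {reference name: list of lines}, where each list holds the
    shared header lines, that reference's @SQ line, and its alignments.
    """
    shared_header = []            # @HD, @PG, @CO, ... kept for every output
    sq_lines = {}                 # reference name -> its @SQ header line
    records = defaultdict(list)   # reference name -> alignment lines

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                sq_lines[tags["SN"]] = line
            else:
                shared_header.append(line)
            continue
        rname = line.split("\t")[2]
        if rname != "*":          # "*" means the read has no reference
            records[rname].append(line)

    return {
        name: shared_header + [sq] + records.get(name, [])
        for name, sq in sq_lines.items()
    }
```

    One caveat with this kind of split: for paired reads whose mate maps to a different plasmid, the mate reference column will name a sequence that is no longer in the per-plasmid header, so some downstream tools may complain.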

    Now my questions: Is this an appropriate way to do what I want to do or would I lose information? Is there a better way to do this when the file is in .bam format?

    Thanks for your help!

  • #2
    Unless you want to write your own code, you can use "bamtools split" (or other programs), as noted in this thread: https://www.biostars.org/p/46327/



    • #3
      Which program did you use for aligning your reads? If you used a multi-FASTA as reference, you should normally have the sequence headers of this multi-FASTA as header lines in your SAM file. Likewise, you should have the sequence IDs as the "reference id" in each SAM line. Then it's quite straightforward to use the SAM/BAM file as it is and compute the coverage of the individual reference sequences.
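      To illustrate the point about headers and reference IDs, here is a small sketch that reads the @SQ lines (SN = name, LN = length) and counts mapped reads per reference from the third column. The function name reference_summary is made up for this example, and it assumes an uncompressed SAM file.

```python
from collections import Counter


def reference_summary(sam_lines):
    """Return {reference name: (length from @SQ, number of mapped reads)}."""
    lengths = {}
    read_counts = Counter()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            # Header tags look like "SN:plasmid1" and "LN:4361"
            tags = dict(f.split(":", 1) for f in fields[1:])
            lengths[tags["SN"]] = int(tags["LN"])
        elif not fields[0].startswith("@") and len(fields) > 2:
            # Alignment line: column 3 is the reference the read mapped to
            if fields[2] != "*":
                read_counts[fields[2]] += 1
    return {name: (length, read_counts[name]) for name, length in lengths.items()}
```

      Read counts alone are not coverage, but together with the lengths they show that everything needed per reference is already in the one file.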



      • #4
        @WhatsOEver: I used bowtie2. But how can I calculate the coverage of each reference sequence using only my file?

        @GenoMax: Thanks, that looks useful!
        Last edited by sequence_hard; 02-09-2016, 06:41 AM.



        • #5
          Either you use a different mapper that has this capability (e.g. bbmap does this with the scafstats parameter) or, which is what I prefer, you use a program like bedtools genomecov (e.g. bedtools genomecov -ibam yourAlignedData.bam -g yourMultiFasta.fasta). The function of bedtools is nicely explained here: http://bedtools.readthedocs.org/en/l...genomecov.html
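          For turning the genomecov output into a % coverage per plasmid, a rough sketch follows. It assumes the default histogram output (no -d or -bg), which has five tab-separated columns: reference name, depth, number of bases at that depth, reference size, and fraction of bases at that depth, plus summary rows labelled "genome". The function name percent_covered is hypothetical.

```python
def percent_covered(genomecov_text):
    """Return {reference: percent of bases covered by at least one read}."""
    covered = {}   # reference -> bases with depth >= 1
    sizes = {}     # reference -> total length
    for line in genomecov_text.splitlines():
        if not line.strip():
            continue
        name, depth, n_bases, size, _fraction = line.split("\t")
        if name == "genome":      # skip the whole-genome summary rows
            continue
        sizes[name] = int(size)
        covered.setdefault(name, 0)
        if int(depth) > 0:
            covered[name] += int(n_bases)
    return {name: 100.0 * covered[name] / sizes[name] for name in sizes}
```

          The text could come from redirecting the command's output to a file and reading it back in; a plasmid with no reads at all shows up with a single depth-0 row and comes out as 0%.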



          • #6
            Samtools depth may work in a pinch. You could also use Qualimap to get visual maps of coverage.
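            If you go the samtools depth route, something like the following sketch could summarise its output. It assumes the usual three tab-separated columns for a single BAM (reference name, 1-based position, depth) and that you supply the reference lengths yourself, e.g. from the @SQ header lines. The names breadth_from_depth and ref_lengths are made up for this example.

```python
from collections import Counter


def breadth_from_depth(depth_text, ref_lengths):
    """Return {reference: percent of positions with depth >= 1}.

    depth_text  -- text output of samtools depth for one BAM file
    ref_lengths -- {reference name: length}, supplied by the caller
    """
    covered = Counter()
    for line in depth_text.splitlines():
        if not line.strip():
            continue
        name, _pos, depth = line.split("\t")[:3]
        # Positions with zero depth may or may not be listed, depending on
        # the options used, so only count rows with depth > 0.
        if int(depth) > 0:
            covered[name] += 1
    return {
        name: 100.0 * covered[name] / length
        for name, length in ref_lengths.items()
    }
```

            Passing the lengths in separately means references with no reads still get an entry (0%) even if they never appear in the depth output.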



            • #7
              @WhatsOEver: Awesome, thanks! I am already using bedtools genomecov but I did not know it had that function. Nice!



              • #8
                Originally posted by sequence_hard:
                @WhatsOEver: Awesome, thanks! I am already using bedtools genomecov but I did not know it had that function. Nice!
                You are welcome.
                In my experience, there is pretty much nothing you can't do with bedtools when it comes to sequence coverage.

