Hey people!
I have the following setup: a .sam file from aligning E. coli genome reads to a multi-FASTA file containing the sequences of several plasmids. I want to split this .sam file into several .sam files, each containing the header and the alignments for one plasmid, so that I can later calculate the percent coverage per plasmid.
My plan is to build a dictionary in Python, using each plasmid's sequence name as a key and collecting the corresponding alignment lines as its value. Each alignment line contains the reference name, so I can tell which plasmid it belongs to. Then I would write each entry to its own file, convert it to .bam, and so on.
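The dictionary approach described above could look roughly like this. It is a minimal sketch, assuming plain-text SAM input where the reference name is in column 3 (RNAME). The function names `split_sam_by_reference` and `write_groups` are hypothetical, and copying the full header into every output file is a design choice, not the only option.

```python
from collections import defaultdict


def split_sam_by_reference(sam_lines):
    """Group SAM lines by reference name (RNAME, column 3).

    Every group gets a copy of the full header, so each piece is valid SAM.
    Unmapped records (RNAME == '*') are collected under the key '*'.
    """
    header = []
    records = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)  # @HD, @SQ, @PG, ... lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        records[fields[2]].append(line)
    return {rname: header + lines for rname, lines in records.items()}


def write_groups(groups, out_prefix):
    """Write each group to '<out_prefix><rname>.sam' (hypothetical naming)."""
    for rname, lines in groups.items():
        name = "unmapped" if rname == "*" else rname
        with open(f"{out_prefix}{name}.sam", "w") as out:
            out.writelines(lines)
```

Usage would be something like `write_groups(split_sam_by_reference(open("aln.sam")), "split_")`. Nothing is dropped, but records are split on RNAME only: mates of a pair that map to different plasmids, or a read's secondary/supplementary alignments, end up in different files. Also, because every file keeps all `@SQ` lines, a downstream coverage tool would report the other plasmids as zero-coverage references.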
My questions: Is this an appropriate approach, or would I lose information? Is there a better way to do this when the file is in .bam format?
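If the end goal is only percent coverage per plasmid, it may be possible to skip the splitting step and compute breadth of coverage for every reference in one pass. The sketch below assumes plain-text SAM with `@SQ` lines giving each plasmid's length, counts a position as covered only when a read base is aligned to it (CIGAR `M`, `=`, `X`; deletions and `N` skips advance but do not count), and skips unmapped, secondary, QC-fail and duplicate records. Those filtering choices and the function name `breadth_of_coverage` are assumptions for illustration.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
COVERING_OPS = set("M=X")  # reference positions with an aligned read base
SKIPPING_OPS = set("DN")   # consume reference but add no read base
# unmapped (0x4), secondary (0x100), QC-fail (0x200), duplicate (0x400)
DEFAULT_SKIP = 0x4 | 0x100 | 0x200 | 0x400


def breadth_of_coverage(sam_lines, skip_flags=DEFAULT_SKIP):
    """Return {reference name: percent of positions covered by >= 1 read}."""
    covered = {}  # reference name -> bytearray of 0/1 per position
    for line in sam_lines:
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            covered[tags["SN"]] = bytearray(int(tags["LN"]))
            continue
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        rname, cigar = fields[2], fields[5]
        if int(fields[1]) & skip_flags or rname not in covered or cigar == "*":
            continue
        cov = covered[rname]
        pos = int(fields[3]) - 1  # SAM POS is 1-based
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in COVERING_OPS:
                end = min(pos + n, len(cov))  # clip reads overhanging the end
                if end > pos:
                    cov[pos:end] = b"\x01" * (end - pos)
                pos += n
            elif op in SKIPPING_OPS:
                pos += n
    return {rname: 100.0 * sum(cov) / len(cov) for rname, cov in covered.items()}
```

For a .bam file, samtools can do similar work directly: `samtools view -b in.bam <plasmid_name>` extracts one reference from a coordinate-sorted, indexed BAM, and `samtools coverage` (samtools 1.10 or later) reports per-reference breadth of coverage. Its filtering defaults may differ from the sketch above.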
Thanks for your help!