Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting mitochondrial reads

    We have sequenced plant through whole-genome short gun sequencing approach with illumina paired-end reads. I am thinking to extract mitochondiral reads by mapping against available mitochondrial genomes in NCBI. Two options comes to my mind

    1. Denovo genome assembly using velvet and blast against genbank mitochondrial genomes to extract mitochondrial reads and proceed for denovo mitochondrial genome assembly.
    2. To align the reads to each mitochondiral genome using tophat or any other aligner like bowtie or bwa and extract mitochondiral related reads and then proceed for denovo mitochondrial genome assembly

    I like the option two, because we can extract without denovo genome assembly. But the daunting task will be mapping raw reads and extracting mapped reads for each genome. Are there any script available already to extract mapped reads?

    Which method is the best one you think?
    Last edited by bioman1; 03-31-2014, 08:30 PM.

  • #2
    A common aligner such as bowtie2 can output 'mapped' reads.
    If you want reads mapped to different mitochondrial genomes going to seperate files. You can either:
    1) Make seperate indexes for each mitochondrial genome, then run a mapping process seperately. This way you get a seperate set of reads that align to different genomes.
    2) Brian Bushnell recently announced a set of tools including BBMap and BBSplit. If they work like he says, then you should be able to get what you want with just a single run. I can't be sure though, since I haven't tested these tools...

    Comment


    • #3
      You could try MITObim,

      Comment


      • #4
        Originally posted by yueluo View Post
        1) Make seperate indexes for each mitochondrial genome, then run a mapping process seperately. This way you get a seperate set of reads that align to different genomes.
        Since MT-Genomes aren't that large, imo he could also 'cat' all MTgenomes.fa he wants to into one genome. Map against this, since the mapping location will give rise to which genome the reads map you also get the information in just one mapping run. Need to cat before though.

        Comment


        • #5
          Thanks maubp..this tools seems to interesting to try out.
          @sphil: do you think is that right approach to combine all plant mitochondrial genomes (93 genome) in to one MTgenome.fa using 'cat' and mapping reads to them will be right approach?. If it works, it will be good idea.

          I am also aiming to separate chloroplast reads by using similar approach. Any tools available for this like MITobim for chloroplast?

          Comment


          • #6
            If you have a good reference you can map directly to it. I have done this with Newbler before and got quick and good mtDNA genomes out of total DNA pools

            Comment


            • #7
              Originally posted by bioman1 View Post
              Thanks maubp..this tools seems to interesting to try out.
              @sphil: do you think is that right approach to combine all plant mitochondrial genomes (93 genome) in to one MTgenome.fa using 'cat' and mapping reads to them will be right approach?. If it works, it will be good idea.

              I am also aiming to separate chloroplast reads by using similar approach. Any tools available for this like MITobim for chloroplast?
              I acutally don't see any problem with that. However, I actually have no clue how much sequence similarity there is between those MT-Genomes. But anyways I would definitely give it a try!!!

              Comment


              • #8
                Originally posted by yueluo View Post
                2) Brian Bushnell recently announced a set of tools including BBMap and BBSplit. If they work like he says, then you should be able to get what you want with just a single run. I can't be sure though, since I haven't tested these tools...
                That's correct, BBSplit was designed specifically for this scenario.

                bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                That will create 3 output files, "ox.fq", "oy.fq", and "oz.fq". The reads that map best to x.fa will go to ox.fq, and so forth.

                Some reads can map ambiguously to multiple references, if the sequences are highly conserved; you can control that with the "ambig2" flag. "ambig2=best" will map the read to the first best-matching reference; "toss" will discard away ambiguous reads; "all" will send the read to the output for every reference to which it maps; and "split" will put the ambiguous reads in a separate file (i.e. AMBIGUOUS_ox.fq) for each reference to which it maps.

                Comment


                • #9
                  @Brian Bunshell, I think BBsplit will helpful in my work. Thanks.
                  regarding command line

                  bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                  for me ref will be 93 genomes, should I need to give individually like ref= r1.fa, r2.fa, r3.fa..r93.fa or can I combine all reference genomes in one file using 'cat' as sphil suggested.

                  I will be paired end reads, should I need make interleaved as single file?. Did the output best match ox.fa will be single file. Can I use this reads for further denovo assembly?

                  Comment


                  • #10
                    Originally posted by bioman1 View Post
                    @Brian Bunshell, I think BBsplit will helpful in my work. Thanks.
                    regarding command line

                    bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                    for me ref will be 93 genomes, should I need to give individually like ref= r1.fa, r2.fa, r3.fa..r93.fa or can I combine all reference genomes in one file using 'cat' as sphil suggested.

                    I will be paired end reads, should I need make interleaved as single file?. Did the output best match ox.fa will be single file. Can I use this reads for further denovo assembly?
                    bioman,

                    If your reads are paired in two files, you should use "in1=reads1.fq in2=reads2.fq". For paired input, the output will be interleaved, but the bbtools package includes another program, "reformat.sh", which can interleave, de-interleave, change quality offset, file format, and do various other things like subsampling and trimming, very fast.

                    To interleave:
                    reformat.sh in1=read1.fq in2=read2.fq out=reads.fq

                    To de-interleave:
                    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

                    By default, if there is a single input file, BBMap will autodetect whether the reads are interleaved or single-ended based on the names, assuming normal Hiseq/Miseq naming conventions. You can override this with the "interleaved=t" or "interleaved=f" flag. All flags are in any order.

                    If you have any trouble with RAM regarding bbsplit, you can set the amount of memory to use with the flag "-Xmx1g" for (for example) 1 GB RAM. This should be set to about 85% of physical memory, but don't bother unless you encounter a problem.

                    -Brian

                    P.S. I forgot to answer your question... No, you MUST specify "ref=r1.fa,r2.fa,r3.fa..r93.fa" (without any spaces). If you use cat, it will not work.
                    Last edited by Brian Bushnell; 04-01-2014, 07:56 PM.

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM
                    • seqadmin
                      Techniques and Challenges in Conservation Genomics
                      by seqadmin



                      The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                      Avian Conservation
                      Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                      03-08-2024, 10:41 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 03-27-2024, 06:37 PM
                    0 responses
                    13 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-27-2024, 06:07 PM
                    0 responses
                    11 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-22-2024, 10:03 AM
                    0 responses
                    53 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-21-2024, 07:32 AM
                    0 responses
                    69 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X