  • Filtration of plasmid metagenome paired end reads

    Hi all,

    I am studying the plasmid metagenome of clinical samples. The plasmids were captured from metagenomic DNA by digesting linear DNA (which leaves closed circular DNA intact), randomly inserting a transposon, and cloning into E. coli. I then used the plasmids purified from the E. coli clones to construct the sequencing library. My next step is to assemble the paired-end reads generated from the sequencing. I expect the reads to contain a large amount of transposon and E. coli sequence introduced during plasmid isolation. My question is: what is the best way to filter out reads that belong to the transposon and E. coli, leaving only transposon-free and E. coli-free reads for the assembly step? I think these sequences will strongly affect the assembly. I tried Bowtie 2.0, but it doesn't seem to do a good job, since most of the scaffolds I got after the de novo assembly belong to the cloning strain.
    The platform is Illumina HiSeq 2500 (reads are 2x150 bp).
    The assembler is SOAPdenovo.

    I hope someone can help me with this.

    Cheers,
    TJ

  • #2
    Have you checked what pattern a typical read has? For example, [META_GENOMIC_SEQ] [TRANSPOSON_ELEMENT] [E_COLI_SEQ] [ADAPTER_SEQ].

    Can we divide the reads into three categories, namely (1) pure metagenomic DNA; (2) junction DNA; (3) E. coli-only DNA?
    If so, you can filter out the category 3 reads by aligning to the E. coli reference genome, and trim the boundary sequences from the junction DNA.


    • #3
      Thanks relipmoc for your reply,

      Forgive my ignorance, but I am not quite sure what you mean in the first part. It is a shotgun library, so I don't know whether the reads would have any pattern other than barcodes at the 5' ends.

      Dividing the reads into three categories is exactly what I want to do. Do you know what software I should use for this? I assume I need to do it in two steps: first align against the E. coli genome and keep the unaligned reads, then align those against the transposon sequence. The reads still unaligned after the second step would then be used for my plasmid de novo assembly. Is this the right way?

      Thanks,
      TJ


      • #4
        I made a program specifically for separating reads, since we work with a lot of metagenomic communities.

        First, download BBMap

        Then run this:

        bbsplit.sh in=reads.fq ref=ecoli.fa,transposon.fa basename=out_%.fq outu=clean.fq int=t

        This will produce 3 output files:
        out_ecoli.fq (ecoli reads)
        out_transposon.fq (transposon reads)
        clean.fq (all other reads: 'outu' means unmapped output)

        It's very fast. The command above is for paired reads that are interleaved (the int=t flag). If the paired reads are in 2 files, use 'in1=' and 'in2=' and leave off the 'int' flag. The output will be interleaved, but if you want it in twin files, you can say 'outu1=clean1.fq outu2=clean2.fq'
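
For the two-file case described above, the call might look like the following. This is only a sketch: the read and reference file names are hypothetical placeholders, and the script prints the command first and runs it only if bbsplit.sh is on the PATH and the input files exist.

```shell
#!/usr/bin/env bash
# Sketch: BBSplit call for paired reads stored in two files.
# All file names here are hypothetical placeholders, not from the thread.

r1="reads_R1.fq"
r2="reads_R2.fq"
refs="ecoli.fa,transposon.fa"

# Build the command line from the flags described above:
# in1=/in2= for twin input files, outu1=/outu2= for twin unmapped output.
build_bbsplit_cmd() {
    printf 'bbsplit.sh in1=%s in2=%s ref=%s basename=out_%%.fq outu1=clean1.fq outu2=clean2.fq\n' \
        "$1" "$2" "$3"
}

cmd="$(build_bbsplit_cmd "$r1" "$r2" "$refs")"
echo "$cmd"

# Run only when BBMap is installed and both input files are present.
# $cmd is left unquoted on purpose so it splits into words; this assumes
# the file names contain no spaces.
if command -v bbsplit.sh >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
    $cmd
fi
```

The clean1.fq and clean2.fq files would then be the input for the assembler.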


        • #5
          Hello Brian,

          I downloaded the program, and when I ran the command I got the following error:

          Exception in thread "main" java.lang.UnsupportedClassVersionError: align2/BBSplitter : Unsupported major.minor version 51.0
          at java.lang.ClassLoader.defineClass1(Native Method)
          at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
          at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
          at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
          at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
          at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
          Could not find the main class: align2.BBSplitter. Program will exit.

          The command line that I used was as follows:
          bbsplit.sh in1=plasmid_R1.fastq in2=plasmid_R2.fastq ref=Escherichia_coli.fasta,Transposon.fasta basename=out.fastq outu1=plasmid_R1_clean.fastq outu2=plasmid_R2_clean.fastq qin=33 -Xmx200g

          I really appreciate your help.

          Cheers,
          TJ


          • #6
            TJ,

            You have Java 6 or earlier installed; BBMap is compiled for Java 7. You can download and install Java 7 (the JRE or the JDK), wait for me to post a version compiled for Java 6 (I'll do that later today), or recompile it yourself if you have javac in your path (run compile.sh). I'll post here once I put up a Java 6 version, but I suggest you install Java 7 (or get a sysadmin to do it).
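
For reference, the "major.minor version 51.0" in the error is the class-file format version, and for Java 5 and later the release number is the major version minus 44. A small sketch to decode it and to check which Java the shell picks up (the function name is made up for illustration):

```shell
# Map a class-file major version to the Java release that uses it.
# Valid for major versions 49 and up (Java 5 onward): release = major - 44.
class_major_to_java() {
    echo $(( $1 - 44 ))
}

class_major_to_java 51   # version from the error message; prints 7

# Show which Java the shell will launch, if any.
# 'java -version' writes to stderr, hence the redirect.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
fi
```

If the reported Java version is lower than the one the class files need, the UnsupportedClassVersionError above is the expected result.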


            • #7
              Hi Brian,

              I will try to install Java 7 and re-run the command. I will let you know the result.

              Thanks,
              TJ


              • #8
                Hi Brian,

                I have updated Java and the software worked perfectly! As you said, it is very quick compared to the other software I tried.

                For the de novo assembly, almost all of my scaffolds now belong to plasmids, with only a very small number of contigs belonging to the cloning strain.

                My conclusion is that your software performed better than the 4 other alignment programs I used before. BBMap is excellent for filtering unwanted reads (in my case, reads belonging to the transposon and the cloning bacteria) from metagenomic data.

                Thank you very much for your help.
                TJ


                • #9
                  You're welcome; and thanks for the feedback!


                  • #10
                    Hi Abujamel_t,
                    I was wondering why you used linear DNA digestion to remove non-plasmid DNA. Is there a particular reason you did not want to use a plasmid extraction kit? Thank you for your answer.


                    • #11
                      Hello rnaeye,

                      The main reason is that I expect a very limited amount of plasmid in the metagenomic DNA, and I wanted to remove the linear DNA to increase the chance of recovering the plasmids from the total DNA. I thought about plasmid purification kits (and I did a couple of trials), but I think these kits are made mainly for purifying plasmids from E. coli and will not work well with other bacteria, such as Gram-positive bacteria (which are hard to lyse). Therefore, it is better to extract the metagenomic DNA with very efficient methods, such as chemical and mechanical lysis, and then try to purify the plasmids from there.

                      I hope I answered your question.

                      Cheers,
                      TJ


                      • #12
                        Thank you, Abujamel_t, for the information. It's very helpful. Good luck with your research.
