  • How to deal with 16S rDNA data from Illumina

    Hi everyone,
    I am very new to bioinformatics. I have Illumina sequencing data of 16S rDNA from water samples. I looked at the quality score distribution, and the scores lie in the 30-35 range, which suggests the data are good enough to start with. What should the next step be?
    I am thinking of doing this:
    1) BLAST the reads against a ribosomal database

    Can anyone give me some ideas on how to start with this data?


    Thanks for any help!
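For context on the 30-35 range mentioned above: a Phred quality score Q encodes an estimated base-call error probability of P = 10^(-Q/10), so Q30 corresponds to about 1 error in 1,000 bases and Q35 to roughly 1 in 3,000. The sketch below decodes a FASTQ quality string and computes these values. It assumes the Phred+33 ASCII encoding used by current Illumina pipelines (older Illumina data used an offset of 64), and the quality string in the demo is made up.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII-encoded FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in one read (a simple summary, not an error rate)."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    demo_quality = "?B"  # hypothetical read: '?' -> Q30, 'B' -> Q33
    print(phred_scores(demo_quality))  # [30, 33]
    print(mean_quality(demo_quality))  # 31.5
    print(error_probability(30))
```

Averaging Phred scores is only a rough per-read summary; because the scale is logarithmic, converting to error probabilities first gives a more faithful expected error count.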

  • #2
    How many reads do you have? The more reads you have, the longer BLAST takes (sometimes even 3 months), so it's probably not a good idea. You should check whether you can make your dataset smaller.
    Maybe you should also try a program other than BLAST, which is slower than a lot of other tools, for example: https://github.com/csmiller/EMIRGE
    If it still takes a long time, let me know; I can help you look for other options too.

    Good luck.
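One straightforward way to make the dataset smaller, as suggested in this reply, is to take a random subsample of reads. The sketch below uses reservoir sampling (a standard technique chosen here for illustration, not something mentioned in the thread), which draws a uniform sample of k reads in a single pass without loading the whole file into memory. It assumes an uncompressed FASTQ file with exactly four lines per record; the file name in the comment is hypothetical.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 1) -> List[Record]:
    """Return a uniform random sample of k records (all records if there are fewer than k)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec  # keeps this record with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    # Toy input; for real data you would pass open("reads.fastq") (hypothetical name).
    toy = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
    subsample = reservoir_sample(read_fastq(io.StringIO(toy)), k=3)
    for header, seq, plus, qual in subsample:
        print(header)
```

Keep in mind that subsampling trades sensitivity for speed: rare taxa may not appear in a small subsample.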



    • #3
      In reply to newBioinfo:
      Take a look at QIIME (www.qiime.org, and the overview tutorial there) or mothur (www.mothur.org). Both provide standard pipelines for processing 16S sequences. BLASTing some of the sequences against a reference database such as RDP or Greengenes is usually one step in those pipelines.



      • #4
        In reply to RickBioinf:
        Thanks, RickBioinf, for the help.
        I have around 78 million reads, and after filtering out every read that contains an 'N' I am left with about 77 million. I tried BLASTing the data against the non-redundant database, but it was taking too long. I will try what you suggested. If I understand correctly, this database contains only ribosomal DNA.
        Thanks for the help!
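The 'no N' filter described in this post can be done in a single streaming pass. The sketch below keeps only FASTQ records whose sequence line contains no ambiguous 'N' base. It assumes an uncompressed file with four lines per record, and the toy reads are made up. (If you go the mothur route, its `screen.seqs` command has a `maxambig` option that serves the same purpose.)

```python
import io
from typing import Iterator, TextIO


def reads_without_n(handle: TextIO) -> Iterator[str]:
    """Yield 4-line FASTQ records (as text) whose sequence has no 'N' (case-insensitive)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0].strip():
            return  # end of file (or trailing blank line)
        if "N" not in record[1].upper():
            yield "".join(record)


if __name__ == "__main__":
    toy = "@a\nACGT\n+\nIIII\n@b\nACNT\n+\nIIII\n"
    kept = list(reads_without_n(io.StringIO(toy)))
    print(len(kept))  # 1
    print(kept[0], end="")
```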



        • #5
          In reply to wkrhc4mia:
          Thanks, wkrhc4mia.
          I am thinking of using mothur. So you mean I do not have to BLAST the data against a database separately, because that will be part of the mothur pipeline?

          Thanks for the help!



          • #6
            In reply to newBioinfo:
            A single BLAST search over the whole dataset is going to take a long time. Instead, you could split your query file into multiple smaller files and run the searches in parallel (this works best if you have access to a compute cluster).

            There are also parallel implementations of BLAST, such as mpiBLAST (http://www.mpiblast.org/), that can be useful. Fair warning, though: installing and using mpiBLAST is not trivial.
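Splitting the query file for parallel searches, as described in this reply, can be sketched as follows. This assumes a FASTA query file (sequence lines may be wrapped) and distributes reads round-robin across a chosen number of outputs so each chunk gets roughly the same number of reads. It streams the input, so tens of millions of reads never need to sit in memory at once.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header = None
    parts: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(handle: TextIO, outputs: List[TextIO]) -> List[int]:
    """Write records round-robin into the output handles; return how many went to each."""
    counts = [0] * len(outputs)
    for i, (header, seq) in enumerate(read_fasta(handle)):
        k = i % len(outputs)
        outputs[k].write(f"{header}\n{seq}\n")
        counts[k] += 1
    return counts


if __name__ == "__main__":
    toy = ">r1\nACGT\n>r2\nAC\nGT\n>r3\nTTTT\n"
    chunks = [io.StringIO(), io.StringIO()]
    print(split_fasta(io.StringIO(toy), chunks))  # [2, 1]
```

With real data you would pass open file handles instead (for example, hypothetical names such as `query_part000.fasta`, `query_part001.fasta`) and submit each chunk as a separate job, e.g. `blastn -query query_part000.fasta -db <your_database> -outfmt 6 -out part000.tsv`, then concatenate the tabular results.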



            • #7
              There is a program called MEGAN (http://ab.inf.uni-tuebingen.de/software/megan/) that you could use. For the reference database, you could choose SILVA, Greengenes, or RDP.



              • #8
                If you use MEGAN, be careful not to waste time running your BLAST searches against the wrong database.

                MEGAN relies on the NCBI taxonomy when you use BLASTN, BLASTX, or BLASTP to compare against NCBI-NT, NCBI-NR, or genome-specific databases. MEGAN can also parse files generated by the RDP website or by SILVA, as well as files in SAM format.
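Since MEGAN can read SAM files, it may help to see what a SAM alignment line carries. The sketch below only illustrates the format; it is not how MEGAN assigns taxonomy (MEGAN uses a lowest-common-ancestor approach). It counts primary mapped reads per reference sequence using the third column (RNAME), skipping header lines and records whose FLAG marks them as unmapped (0x4), secondary (0x100), or supplementary (0x800). The toy records and reference names are made up.

```python
import io
from collections import Counter
from typing import TextIO

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_hits_per_reference(handle: TextIO) -> Counter:
    """Count primary mapped reads per reference sequence (RNAME) in a SAM file."""
    counts: Counter = Counter()
    for line in handle:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, only where it is mapped
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    toy = (
        "@HD\tVN:1.6\n"
        "r1\t0\trefA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
        "r3\t16\trefA\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    )
    print(count_hits_per_reference(io.StringIO(toy)))  # Counter({'refA': 2})
```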
