
  • Illumina Data Analysis

    Hi All,
    I am a new member of this forum.
    I used to work with 454 data, and now I am switching to Illumina.
    I am getting around 300 million reads (100 bp) from a metagenomic sample, so I am really confused about how to start my analysis.
    Previously I used approaches like blastx, but now I think this is not a good option.
    So I was wondering if anyone has done something like this or has some ideas.

    I would really appreciate your help.

    Thanks
    SS

  • #2
    check out MG-RAST.



    • #3
      You should check: A human gut microbial gene catalogue established by metagenomic sequencing (doi:10.1038/nature08821).

      To my knowledge this is the only study that has published large-scale metagenomics using Illumina/short reads.

      rgds
      MA



      • #4
        @adamdeluca, thanks for the suggestion. I have used MG-RAST before for 454 data, but I am not sure how it will handle short reads, or whether it can handle 300 million of them.

        @MadsAlbertsen, I will surely read the paper. As far as I know, they used SOAP assembly for their analysis.



        • #5
          I have been doing some similar work, though I don't have as much data. One thing we have been doing is finding the 16S and 23S sequences, using blat and various rDNA databases. That is pretty good at identifying what is there at the genus level, and there are a large number of such reads since rDNA is ~0.1% of the genome.

          I tried assembly with SOAPdenovo - in my case it worked well on a mock community of 10 species but less well on the real sample (which I expect is due to lack of data). I think the question is how much you can trust the assembly, and how much you can confirm it.

          I am curious why you think blastx is not a viable approach - is it lack of computing resources? I have seen claims of increased speed with different software and hardware, not sure if anyone has direct experience.
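To put the "~0.1%" figure from this post next to the 300 million reads mentioned in the first post, here is a minimal arithmetic sketch. The function name `expected_rrna_reads` is made up for illustration, and the fraction is just the rough estimate quoted above, not a measured value; real samples may differ.

```python
def expected_rrna_reads(total_reads: int, rrna_fraction: float = 0.001) -> int:
    """Rough number of reads expected to come from rRNA genes.

    rrna_fraction defaults to the ~0.1% figure quoted in the thread;
    it is an assumption, not a measurement.
    """
    if total_reads < 0 or not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("total_reads must be >= 0 and rrna_fraction in [0, 1]")
    return round(total_reads * rrna_fraction)


# 300 million reads, as in the original question
print(expected_rrna_reads(300_000_000))
```

Under that assumption, a run of this size would contain on the order of a few hundred thousand rDNA reads, which is a far smaller set to search than the full data.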



          • #6
            @cliffbeall,
            thanks for your input. Actually, finding rRNA is not a problem. I've made a small representative rRNA database and it is doing a pretty good job of removing rRNA (via blast).
            I also tried assembly (velvet), but I don't trust it that much with such diverse environmental data. I will surely give SOAPdenovo a shot, though (I have heard a lot about it).
            Yes, you are right, blastx is not viable because of the large amount of data. I have computing resources, but blasting around 300 million reads will still take quite a while.
            I am still working on finding the best procedure (most people voted for assembly).
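Nobody in the thread describes how they would spread a search over a cluster, so the following is only a sketch of one common approach: cut the reads into fixed-size chunks so each chunk can be submitted as a separate job. It assumes the reads are in FASTA format; `read_fasta` and `chunk_records` are hypothetical helper names, and the chunk size is arbitrary.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller chunk


# Tiny in-memory demo; a real run would iterate over an open file instead.
demo = [">r1", "ACGT", ">r2", "GG", "CC", ">r3", "TTTT"]
chunks = list(chunk_records(read_fasta(demo), chunk_size=2))
print([len(c) for c in chunks])
```

Because both helpers are generators, the whole file never has to sit in memory at once, which matters at hundreds of millions of reads. Each chunk could then be written to its own file and handed to whatever search tool and scheduler the cluster uses.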



            • #7
              Have you considered using the USEARCH package instead of BLAST? It might make large-scale database searches feasible.

              rgds
              MA



              • #8
                @MA,
                Yes, I considered using Ublastx, but the 64-bit version requires a paid license, so it is going to be expensive if I install it on the clusters.
