  • Volume of Output data in Illumina HiSeq 2000

    Hi all,

    Does anyone know the volume of output data for metagenomics samples (such as a gut sample) on the Illumina HiSeq 2000?

    I ask because of the data I found at the following link:


    The paired data is only about one gigabyte! Is that the whole data set produced by the HiSeq 2000, or did the authors trim it?

    Thank you,

  • #2
    Data output from a HiSeq is going to be independent of the kind of sample you are running. HiSeq 2000 has been discontinued but you can see the technical specs for a HiSeq 2500 at this link: http://www.illumina.com/systems/hise...fications.html

    Basically one can get a boatload of data (to be scientific about it) as long as you have good libraries. The paper you are looking at should have information about how the data was processed (trimming etc.) by the authors.


    • #3
      That's definitely not the entire output of data from a HiSeq 2000, at least, it's not if we're assuming that the libraries were of decent quality.

      One lane should be around 187 million reads, assuming v3 SBS chemistry. Output per lane should be something like 37 Gbases. For metagenomics, though, especially a gut sample, the vast majority of reads (>90%, if I remember correctly) are going to come from the human genome, with only a small proportion of reads aligning to the metagenome. If the host sequence is filtered out, the data sets can seem unusually small for HiSeq output.

      The larger files in the data you linked to each appear to represent the output from separate lanes on a HiSeq run of 2x250 cycles (one is lane 7, the other is lane 8). Those are the datasets that are around 11 million reads, 1.8 Gbases each. That would put them around 5% of the expected number of reads for a lane, which sounds about right to me, assuming the host sequence is filtered.
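The arithmetic behind these figures can be checked with a short sketch. The 187 million reads per lane, 11 million reads, and 1.8 Gbases are taken from the post; the 2x100 bp read configuration is an assumption (it is not stated in the thread, but it reproduces the quoted figure of roughly 37 Gbases per lane), and the function names are hypothetical.

```python
# Back-of-the-envelope yield check using the figures quoted in the post.
READS_PER_LANE = 187e6   # clusters per lane quoted for v3 SBS chemistry
READ_LENGTH = 100        # ASSUMPTION: 2x100 bp run, not stated in the thread
ENDS = 2                 # paired-end: two reads per cluster


def lane_yield_gbases(clusters, read_length, ends=2):
    """Expected bases per lane, in Gbases."""
    return clusters * read_length * ends / 1e9


def fraction_of_expected(observed, expected):
    """Observed amount as a fraction of the expected amount."""
    return observed / expected


expected_gb = lane_yield_gbases(READS_PER_LANE, READ_LENGTH, ENDS)
read_fraction = fraction_of_expected(11e6, READS_PER_LANE)  # 11 M reads in the SRA file
base_fraction = fraction_of_expected(1.8, expected_gb)      # 1.8 Gbases in the SRA file

print(f"expected yield per lane: {expected_gb:.1f} Gbases")
print(f"fraction of expected reads: {read_fraction:.1%}")
print(f"fraction of expected bases: {base_fraction:.1%}")
```

Under these assumptions the expected yield comes out at about 37.4 Gbases per lane, and the deposited files hold roughly 5 to 6% of a lane, consistent with the estimate above.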

      GenoMax also makes a good suggestion: the paper you're looking at should have a breakdown of the data (how much was discarded as host sequence, whether any other trimming or filtering was performed, etc.). There may be something going on with the data that isn't readily apparent in the metadata on the linked page.
      Last edited by Jessica_L; 06-03-2015, 08:59 AM. Reason: edited for clarity



      • #4
        Dear Jessica,

        Thank you for your complete answer. You cleared up a lot of things for me, because I am new to Illumina and to metagenomics and there is a lot I don't know. Thank you, really.



        I also sent an email to the authors of this paper about their results and the two paired SRA files (the two large files). Here is the exchange:

        My Email:
        Dear Dr. Bork and Dr. Sobhani,

        I am a PhD student in bioinformatics at The University of Hong Kong working on metagenomics. Recently I came across your valuable publication



        I have a question about the metagenomics data used in your work. When I checked different sample IDs in the NCBI SRA database, I found more than one SRA file for each of them, e.g. for the sample CCIS17669415ST-4-0:


        or for CCIS71301801ST-4-0:


        I know 2 of them are single-end and 2 are paired-end, but I wonder why you used 2 SRA files for each one (for more accuracy?), and which of them was used in your study. Did you choose one of the paired-end files as representative of the others, or did you merge all the paired-end files into one read file?

        Best Wishes,
        Mohamad Koohi-Moghadam


        The authors' answer:
        Dear Mohamad,

        sorry for the delayed answer.

        The quick answer to your question is: you should merge all these files corresponding to one sample ID to construct profiles for this patient. Note that in most cases they are split due to technical reasons (enforced by legal restrictions and ENA technicalities) rather than independently processed sequencing samples. More details below.

        Best wishes,
        Georg


        As we can only upload reads that are not matching the human genome (for legal reasons to prevent patient identification) and our screening against the human genome is done per read (rather than per read-pair), some reads lose their mate. So we uploaded each original sample as three files: pair 1, pair 2, and single, following EBI-ENA's suggestion. Some samples we actually sequenced on two to four independent lanes (technical replicates), so for these the total number of single and paired files will vary. We didn't analyse these separately, but pooled all the reads to construct taxonomic and functional profiles.
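The file layout described in the authors' reply can be illustrated with a minimal sketch. It assumes each read pair is a tuple of `(read_id, seq1, seq2)` and that host screening is available as a per-read predicate `is_host`; both the data layout and the function names are hypothetical, and the real screening would be an alignment against the human genome rather than a simple predicate. This is not the authors' pipeline, only an illustration of why per-read screening yields three files per lane.

```python
def split_after_host_screen(pairs, is_host):
    """Screen each mate independently and sort survivors into
    pair 1, pair 2, and single (orphaned) reads."""
    pair1, pair2, single = [], [], []
    for read_id, seq1, seq2 in pairs:
        keep1 = not is_host(seq1)
        keep2 = not is_host(seq2)
        if keep1 and keep2:
            # Both mates survive: the pair stays intact.
            pair1.append((read_id, seq1))
            pair2.append((read_id, seq2))
        elif keep1:
            # Mate 2 matched the host: mate 1 loses its partner.
            single.append((read_id + "/1", seq1))
        elif keep2:
            # Mate 1 matched the host: mate 2 loses its partner.
            single.append((read_id + "/2", seq2))
        # If both mates match the host, nothing is kept.
    return pair1, pair2, single


def pool_lanes(lanes):
    """Pool the (pair1, pair2, single) results of several lanes
    (technical replicates) into one set for a sample."""
    pooled = ([], [], [])
    for lane in lanes:
        for destination, source in zip(pooled, lane):
            destination.extend(source)
    return pooled


# Toy example: the string "HUMAN" stands in for a host-derived read.
toy_pairs = [
    ("r1", "ACGT", "TTGA"),
    ("r2", "HUMAN", "ACGT"),
    ("r3", "HUMAN", "HUMAN"),
    ("r4", "GGCC", "HUMAN"),
]
toy_is_host = lambda seq: seq == "HUMAN"

lane_result = split_after_host_screen(toy_pairs, toy_is_host)
sample_result = pool_lanes([lane_result, lane_result])
```

Pooling keeps the pair 1 and pair 2 lists in the same order, so mates still line up after the lanes are merged, which matches the authors' advice to merge all files belonging to one sample ID.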




        • #5
          You're welcome! I'm glad you got everything figured out! =)



          • #6
            Thank you, Jessica,

            Could you possibly send me your email address? I would really like to stay in touch with you and benefit from your advice. My email address is mkoohim {AT} gmail.com

            Thank you,
