  • FASTQC trends

    I have some metagenomic data from whole-genome shotgun sequencing on an Illumina HiSeq. The reads are 100 bp paired-end, and when I examine them in FastQC I see a couple of things. First, the per-base sequence content and per-base GC content are very skewed at the beginning of the reads (roughly bp 1-16), and the per-base N content has a spike at bp 4. I also have overrepresented k-mers at the beginning of the reads that do not belong to any adapters (as far as I can tell). I know these trends are sometimes seen in RNA-seq data because of the (not so) random hexamer priming, but I am confused about why I see them in whole-genome data. I am also not sure about the N spike at bp 4. I have attached images of what I mentioned and would appreciate any insight.

    thanks.

  • #2
    I'm assuming these were sequenced on a HiSeq? The spike at cycle 4 is most likely a phenomenon known as Bottom Middle Swath (or BMS in Illumispeak). Before scanning, the HiSeq attempts to find focus at a fixed point near the inlet port. If a bubble is present at this point, the instrument mis-focuses and that particular swath is scanned out of focus. You should be able to see this if you look at the thumbnail images for cycle 4. Basecalling can't be done on these images, so each cluster is given an N at this position.
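
    If you want to confirm the N spike outside of FastQC, a rough sketch like the one below tallies the fraction of N calls at each cycle. It assumes plain 4-line FASTQ records; the function name `n_fraction_per_cycle` and the three example reads are made up for illustration and are not from the original data.

```python
from collections import Counter
from itertools import islice


def n_fraction_per_cycle(fastq_lines):
    """Return a list where entry i is the fraction of reads with an N at cycle i + 1."""
    n_counts = Counter()     # 0-based position -> number of N calls
    read_counts = Counter()  # 0-based position -> number of reads reaching that position
    # In a 4-line FASTQ record the sequence is the second line of each record.
    for seq in islice(fastq_lines, 1, None, 4):
        for pos, base in enumerate(seq.strip().upper()):
            read_counts[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return [n_counts[pos] / read_counts[pos] for pos in range(len(read_counts))]


# Hypothetical reads: two of the three carry an N at cycle 4.
example = [
    "@r1", "ACGNACGT", "+", "III#IIII",
    "@r2", "TTGNACGA", "+", "III#IIII",
    "@r3", "GGCAACGT", "+", "IIIIIIII",
]
fractions = n_fraction_per_cycle(example)
print(fractions[3])  # fraction of reads with an N at cycle 4
```

    For a real file, pass an open file handle instead of the list. Grouping the same tally by the tile field in the read name, where the header carries one, would show whether the Ns are confined to one part of the flow cell.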



    • #3
      Thanks tonybrooks, yes it was on a HiSeq. I had not heard about this issue before; thanks for bringing it to my attention.



      • #4
        See here for more info

        Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)



        • #5
          I've seen the same fluctuation in GC content over the first 20 or so bases in samples run on both the HiSeq and the MiSeq. I typically have enough coverage to just trim them off, even though the Q scores are always above 30.



          • #6
            Thanks lac302. So far I have trimmed the sequences up to bp 16 and worked from there, as you seem to have done, but I can't figure out the cause, or whether it is a bit wasteful to trim off 15 bp of useful sequence.
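
            For reference, the hard trim described here can be sketched in a few lines. This is a minimal illustration that assumes 4-line FASTQ records; `trim_5prime` and the example read are hypothetical, and an established trimming tool would be the usual choice for real data.

```python
def trim_5prime(fastq_lines, n_bases):
    """Yield FASTQ lines with the first n_bases removed from each sequence and quality line."""
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each 4-line record hold the sequence and its qualities,
        # so both must be clipped by the same amount to stay in register.
        if i % 4 in (1, 3):
            line = line[n_bases:]
        yield line


# Hypothetical 20 bp read; trimming 15 bases leaves the last 5.
record = ["@r1", "ACGTACGTACGTACGTACGT", "+", "I" * 20]
trimmed = list(trim_5prime(record, 15))
print(trimmed[1], trimmed[3])

# Cost of the trim for the 100 bp reads in this thread: 15 of 100 bases per read.
fraction_discarded = 15 / 100
print(fraction_discarded)
```

            With 100 bp reads the trim discards 15% of the sequenced bases, which is the trade-off against coverage being weighed above.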



            • #7
              Was the library prep done using a Nextera kit?



              • #8
                I believe so, but I am not sure and have asked the people who generated the data. Would using a Nextera kit explain what is seen?



                • #9
                  Yes. There was a recent thread discussing this. I will post a link if I can find it.
