  • reubennowell
    Member
    • Jan 2013
    • 18

    First 20bp of MiSeq reads are weird

    Hello,

    The first 20 bases of my MiSeq reads show abnormal %A, T, G and C, as evidenced by the 'per base sequence content' tab of the FastQC report (see the attached PNG). The per base GC content is similarly weird, but the quality of these bases is good.

    The issue can be easily rectified by removing the first 20 bp of each read, but can anyone enlighten me as to what is causing this? I have used both CutAdapt and TagDust on these reads to get rid of adapter sequences. I thought maybe it was the Illumina barcodes, except the barcode sequence is usually contained within the fastq header, thus:

    Code:
    @MVM-RI-I124161:11:000000000-A3985:1:1101:18249:1757 1:N:0:TAAGGCGANAGATCGC
    And searching for this sequence (i.e. TAAGGCGANAGATCGC as above) doesn't reveal it to be at the start of the read.
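The check described above (does the index from the header appear at the start of the read?) can be sketched in Python as follows. This is only an illustration: the helper names `index_from_header` and `starts_with_index` are made up here, the header is assumed to follow the layout shown in the post (index as the last colon-separated field), and `N` is treated as a wildcard.

```python
def index_from_header(header: str) -> str:
    """Return the index field from a FASTQ header laid out like the one in the post."""
    # The part after the space looks like "1:N:0:TAAGGCGANAGATCGC";
    # the index is assumed to be the last colon-separated field.
    return header.strip().split(" ")[-1].split(":")[-1]


def starts_with_index(read: str, index: str) -> bool:
    """True if the read begins with the index, treating N (in either) as a wildcard."""
    if len(read) < len(index):
        return False
    return all(i == "N" or r == "N" or i == r for i, r in zip(index, read))


# Header copied from the post; the reads tested against it would come from the FASTQ file.
header = "@MVM-RI-I124161:11:000000000-A3985:1:1101:18249:1757 1:N:0:TAAGGCGANAGATCGC"
index = index_from_header(header)
```

Calling `starts_with_index(sequence, index)` for each read and counting the hits would show whether the index accounts for the odd composition; by the poster's account it does not.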

    What is it?? And what's the best way of dealing with it? Simply chop the first 20bp off my reads or is it something that requires a bit more QC?

    Thanks!
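For anyone wanting to reproduce the 'per base sequence content' numbers outside FastQC, a minimal Python sketch is below. It is not FastQC's actual implementation; the function name `per_base_content` and the toy reads are hypothetical, and it simply reports the percentage of A, C, G and T at each of the first few positions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_base_content(reads: Iterable[str], n_positions: int = 20) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each of the first n_positions read positions."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for pos, base in enumerate(read[:n_positions]):
            counts[pos][base] += 1

    result = []
    for c in counts:
        total = sum(c.values())  # includes N calls, so the four values may not sum to 100
        result.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return result


# Toy input (made up); real input would be the sequence lines of the FASTQ file.
toy_reads = ["ACGT", "ACGA", "TCGA", "ACTA"]
content = per_base_content(toy_reads, n_positions=4)
```

Comparing the values for positions 1 to 20 with those further along the read gives a rough numerical version of what the FastQC plot shows.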
  • Bukowski
    Senior Member
    • Jan 2010
    • 388

    #2
    Is that RNA-Seq by any chance?

    Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the ...


    • mastal
      Senior Member
      • Mar 2009
      • 666

      #3

      We are seeing this in most of our Illumina data too, and we aren't doing RNA-Seq.

      I think the reasons have been discussed in previous threads; if I can find one of the discussions, I'll post the link.


      • reubennowell
        Member
        • Jan 2013
        • 18

        #4
        Thanks guys,

        Nope, not RNA-Seq - this is bacterial genomic DNA. Mastal, can you remember what you did to account for it? Trim the first X bases off the 5' end? In my datasets, it seems to disappear completely after base #20.


        • mastal
          Senior Member
          • Mar 2009
          • 666

          #5

          I just remove those 20 or so bases from the start of the reads.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            This is not an abnormal observation. Generally you should not need to trim this data.

            Is this a known genome or unknown? If known you can take a few reads and map them. There should be no problems getting this data to map.

            Last edited by GenoMax; 05-23-2013, 11:33 AM.


            • JackieBadger
              Senior Member
              • Mar 2009
              • 385

              #7
              I've seen something similar at ~9 bases in one or two amplicon libraries.
              I didn't delve too deeply into the cause; I just knew that it had to go.


              • nickloman
                Senior Member
                • Jul 2009
                • 355

                #8
                If this is from libraries made with Nextera it may represent biases in incorporation sites favoured by the transposase. We see this phenomenon frequently and don't find it necessary to trim it.

