  • yaaraore
    Junior Member
    • Oct 2011
    • 3

    GC bias at the 5’ end of transcriptome reads

    Hi all,
    We have just received data from our latest bacterial transcriptome analysis and got a very strange result. The first 12 bases show a high GC content that cannot be random (see attached file). Has this ever happened to you, and do you have any idea why it happens? It is not caused by contamination or an adapter problem, as all sequences (including the first 12 bases of each sequence) can be mapped to our genome.
    Any ideas would be welcome.
    Thanks,
    Yaara
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    These two papers should answer your question:

    Hansen KD, Brenner SE, Dudoit S.
    Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Research. 2010:gkq224.


    Levin JZ, Yassour M, Adiconis X, et al.
    Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods. 2010;7(9):709-715


    • simonandrews
      Simon Andrews
      • May 2009
      • 870

      #3
      In addition to the hexamer bias (which everyone gets), you've also got something else contaminating that library (possibly adapter dimers?). The content plot should flatten out after about 12 bases but yours has ongoing fluctuations. Hopefully the overrepresented sequences result will pinpoint what this is.
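A minimal sketch of how the per-position GC content behind such a plot could be computed, so the first ~12 bases can be compared with the rest of the read. This is not FastQC's implementation; the function names `read_fastq_sequences` and `per_position_gc` and the toy reads are hypothetical, and the FASTQ reader assumes a plain four-line-per-record file.

```python
from itertools import islice


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a four-line FASTQ file."""
    with open(path) as handle:
        # Lines 2, 6, 10, ... (index 1, step 4) hold the sequences.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def per_position_gc(reads, max_len=None):
    """Return the GC fraction at each read position (N bases are ignored)."""
    gc, total = [], []
    for read in reads:
        seq = read.upper()
        if max_len is not None:
            seq = seq[:max_len]
        for i, base in enumerate(seq):
            if i == len(gc):
                # First time we see this position: extend both counters.
                gc.append(0)
                total.append(0)
            if base in "ACGT":
                total[i] += 1
                if base in "GC":
                    gc[i] += 1
    return [g / t if t else float("nan") for g, t in zip(gc, total)]


# Toy reads (made up) with a GC-rich start, loosely mimicking the bias.
toy_reads = ["GCGCAT", "GGCCTA", "CGATAT", "GCTTAA"]
profile = per_position_gc(toy_reads)
print(profile)
```

On real data one would pass `read_fastq_sequences("reads.fastq")` (a placeholder file name) and look at whether the profile settles to a flat line after the primed region or keeps fluctuating along the read.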


      • yaaraore
        Junior Member
        • Oct 2011
        • 3

        #4
        Two more things…

        Thank you very much for your reply. There are two things I am still not sure I understood:
        1. I understand that the first bases are always problematic, but isn’t the bias I got more severe than what one usually gets? Can I use the data from this run (after correction, of course), or would you consider it a dead end?
        2. As for the second comment (simonandrews), I am not sure how I can check for adapter dimer contamination. 80% of my reads mapped to the genome, so I assumed that I do not have any problem there. Could you please elaborate?

        Thank you all! It helps a lot!


        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          #5
          There might be one specific sequence that is repeated over and over in your remaining 20%, and this is what causes the wiggles in the middle and right-hand part of the reads in your plot. Some read quality assessment tools give you a list of the most frequently repeated sequences among your reads. Try this and see if you recognize your adapters in those sequences.

          Regarding tools: Martin Morgan's ShortRead Bioconductor package gives you lists of the most common reads, and Simon Andrews's FastQC lists the most common k-mers.
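The idea of listing the most common reads and k-mers and checking them against the adapters can also be sketched by hand. This is only an illustration, not what ShortRead or FastQC do internally; the helper names, the toy reads, and the adapter string are assumptions, and the adapter should be replaced by the one from your own library prep kit.

```python
from collections import Counter


def most_common_reads(reads, n=5):
    """Return the n most frequent full-length reads with their counts."""
    return Counter(reads).most_common(n)


def most_common_kmers(reads, k=5, n=5):
    """Return the n most frequent k-mers over all reads with their counts."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts.most_common(n)


def matches_adapter(seq, adapter, min_overlap=12):
    """True if seq contains the first min_overlap bases of the adapter."""
    return adapter[:min_overlap] in seq


# ASSUMPTION: placeholder adapter; substitute your kit's adapter sequence.
ADAPTER = "AGATCGGAAGAGC"

# Toy reads (made up): one sequence repeated three times plus two others.
toy_reads = ["AGATCGGAAGAGCACACGTC"] * 3 + [
    "TTGACCGATTACGGATCCAA",
    "CCATGGTTAACGTAGCTAGC",
]

top_reads = most_common_reads(toy_reads, n=2)
flagged = [seq for seq, _ in top_reads if matches_adapter(seq, ADAPTER)]
top_kmers = most_common_kmers(toy_reads, k=5, n=3)
print(top_reads[0], flagged)
```

Running this on the unmapped 20% of reads, rather than on the whole run, would focus the search on the sequences most likely to be responsible for the fluctuations.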
