  • Low GC% peak in one end of paired-end reads

    Hi,
    I have paired-end RNA-seq data prepared from Brassica napus using a TruSeq kit. After adapter trimming, FastQC's per-sequence GC content plot shows a second, low-GC% peak in the _1.fq files. The _2 files all look OK.
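FastQC's per-sequence GC plot is, roughly, a histogram of each read's GC percentage. A minimal Python sketch of that computation, useful for comparing the _1 and _2 files side by side, is below. It assumes plain 4-line FASTQ records (optionally gzip-compressed); the function names are hypothetical, and this is not FastQC's own implementation, which may bin reads and treat N calls differently.

```python
import gzip
import sys
from collections import Counter
from typing import Iterator, TextIO


def read_fastq_seqs(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq


def gc_percent(seq: str) -> float:
    """GC% of one read; N calls are left out of the denominator (a choice made here)."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(handle: TextIO) -> Counter:
    """Count reads per whole-percent GC bin (0-100)."""
    return Counter(round(gc_percent(s)) for s in read_fastq_seqs(handle))


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fastq(sys.argv[1]) as fh:
        hist = gc_histogram(fh)
    for gc in range(101):
        print(f"{gc}\t{hist.get(gc, 0)}")
```

Running it once on a _1 file and once on the matching _2 file (file names are up to you) gives two tab-separated tables whose low-GC bins can be compared directly.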

    The low-GC% reads don't align to our reference transcriptome, but when I BLASTed a small proportion of the unaligned reads, they didn't appear to be contamination from another organism (the hits are mostly predicted genes from Brassica species).
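To repeat that BLAST check on a larger, unbiased sample, one option is reservoir sampling: a single pass over the FASTQ that keeps a uniform random subset of reads below a GC cutoff and writes them as FASTA. A minimal sketch, assuming 4-line FASTQ; the cutoff and sample size are hypothetical values to be read off your own GC histogram, and note that this selects reads by GC content rather than by alignment status as in the check above.

```python
import random
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from a 4-line FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' line
        handle.readline()  # qualities
        yield header[1:].split()[0], seq


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sample_low_gc(handle: TextIO, max_gc: float, n: int,
                  seed: int = 0) -> List[Tuple[str, str]]:
    """Reservoir-sample up to n reads whose GC fraction is below max_gc."""
    rng = random.Random(seed)
    reservoir: List[Tuple[str, str]] = []
    seen = 0  # number of low-GC reads encountered so far
    for rid, seq in read_fastq(handle):
        if gc_fraction(seq) >= max_gc:
            continue
        seen += 1
        if len(reservoir) < n:
            reservoir.append((rid, seq))
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < n:
                reservoir[j] = (rid, seq)
    return reservoir


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Format (id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

Writing `to_fasta(sample_low_gc(fh, max_gc=0.35, n=500))` to a file gives a query set for BLAST; 0.35 and 500 are placeholders, not recommendations.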

    The average GC content is consistent across the length of the reads.
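The "consistent across the length of the reads" observation comes from a per-position view rather than a per-read one. A minimal sketch of that computation, assuming uncompressed 4-line FASTQ (so every fourth line, starting from the second, is a sequence); excluding N calls from each position's denominator is a choice made here, not necessarily what FastQC does.

```python
from typing import List, TextIO


def per_position_gc(handle: TextIO) -> List[float]:
    """Mean GC% at each read position (1st base, 2nd base, ...) over all reads.

    Reads of different lengths (e.g. after trimming) contribute only to the
    positions they cover, so each position has its own denominator.
    """
    gc_counts: List[int] = []
    totals: List[int] = []
    for i, line in enumerate(handle):
        if i % 4 != 1:
            continue  # keep only the sequence line of each 4-line record
        seq = line.strip().upper()
        # grow the per-position tallies to fit the longest read seen so far
        if len(seq) > len(totals):
            extra = len(seq) - len(totals)
            gc_counts.extend([0] * extra)
            totals.extend([0] * extra)
        for pos, base in enumerate(seq):
            if base == "N":
                continue
            totals[pos] += 1
            if base in "GC":
                gc_counts[pos] += 1
    return [100.0 * g / t if t else 0.0 for g, t in zip(gc_counts, totals)]
```

A flat profile here together with a bimodal per-read histogram suggests that whole reads differ in GC, rather than one end of every read being AT-rich.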

    Does anyone know what might be causing this, particularly in only one read of each pair?

    thanks,
    Alex

  • #2
    RNA-seq?
    Tell us more about the libraries.
    Are you saying the forward reads show this bimodal GC distribution but the reverse reads do not? Or do "_1" and "_2" mean something else?
    --
    Phillip



    • #3
      Hi Phillip, thanks for your attention. What would you like to know about the libraries?

      Yes, exactly: the forward ("_1" file) reads are the red and orange lines in the thumbnail, and the reverse reads are the green ones. Some of the reverse-read samples have a slight shoulder in the low-GC region, but it is much more minor than in the _1 files.



      • #4
        How were the libraries constructed? What average insert size did they have? Were the libraries stranded?

        --
        Phillip



        • #5
          They were made using the "NEBNext Ultra Directional" library kit, which uses the dUTP method to retain strandedness and should give an insert size of ~200 bp.



          • #6
            Okay, then my hypothesis is that the reverse read always reads the 5' portion of the cDNA relative to the forward read, so the elevated AT% is just the poly(A) tail. Or, since you mention hits to "predicted genes", the elevated AT% may just come from the 3' or 5' untranslated regions. (I'm not sure which orientation the NEB kits retain, nor whether a 5' or 3' bias is likely in your sequence.)
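One quick way to test the poly(A) idea is to count how many reads contain a long A or T homopolymer and compare that fraction between the low-GC reads and the rest, or between _1 and _2. A minimal sketch; the 12-base run length is an arbitrary, hypothetical threshold, and both A and T runs are checked because the thread was unsure which orientation the kit retains.

```python
import re
from typing import Iterable


def has_homopolymer(seq: str, base: str, min_len: int) -> bool:
    """True if seq contains a run of at least min_len copies of base."""
    pattern = f"{re.escape(base)}{{{min_len},}}"  # e.g. "A{12,}"
    return re.search(pattern, seq.upper()) is not None


def polya_fraction(seqs: Iterable[str], min_len: int = 12) -> float:
    """Fraction of reads containing a poly(A) or poly(T) run of >= min_len bases."""
    total = hits = 0
    for seq in seqs:
        total += 1
        if has_homopolymer(seq, "A", min_len) or has_homopolymer(seq, "T", min_len):
            hits += 1
    return hits / total if total else 0.0
```

A clearly higher fraction in the low-GC reads would support the poly(A) explanation; similar fractions would be more consistent with the untranslated-region explanation.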

            The untranslated regions of plant genes are often replete with transposable elements, which can themselves have lower GC content or, with time after insertion, often become depleted in GC due to cytosine methylation. That is, deamination of C is easily repaired because it produces U, and "U's" don't belong in DNA. However, 5-methylcytosine (5-me-C) deaminates to T. So, over evolutionary time, simply methylating transposable elements has a sort of slow-motion "RIPping" (repeat-induced point mutation) effect.
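The methylation argument can be illustrated with a toy simulation: if every CpG is methylated on both strands and each deaminated 5-methylcytosine becomes a fixed T, GC content can only fall over time. This is a deliberately simplified sketch, not a model of real plant genomes. It ignores CHG/CHH methylation, treats unmethylated C-to-U deamination as always repaired (no change), and the rate and number of rounds are hypothetical.

```python
import random
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def deaminate_cpg(seq: str, rate: float, rng: random.Random) -> str:
    """One round of toy 5-mC deamination at CpG sites on both strands.

    The CpG's C on this strand can become T. The CpG's C on the other
    strand pairs with the G here, so its conversion to T shows up as G -> A.
    Sites are found in the input sequence, so each round is independent.
    """
    out = list(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            if rng.random() < rate:
                out[i] = "T"  # methylated C on this strand -> T
            if rng.random() < rate:
                out[i + 1] = "A"  # other strand's methylated C -> T, i.e. G -> A here
    return "".join(out)


def simulate_gc_decay(seq: str, rounds: int, rate: float,
                      seed: int = 0) -> Tuple[str, List[float]]:
    """Apply deaminate_cpg repeatedly, recording the GC fraction after each round."""
    rng = random.Random(seed)
    history = [gc_fraction(seq)]
    for _ in range(rounds):
        seq = deaminate_cpg(seq, rate, rng)
        history.append(gc_fraction(seq))
    return seq, history
```

Because each round can only turn C into T or G into A, the recorded GC fraction never increases; how fast it falls depends entirely on the assumed rate and the CpG density of the starting sequence.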

            Just speculation on my part, of course.

            --
            Phillip



            • #7
              Could you also post the "Per base sequence content" plot from the FastQC output?



              • #8
                Please see the attached "Per base sequence content" plot for one of the problematic reverse-read files after trimming. (Sorry, in a previous post I got the forward and reverse reads mixed up: _1 is the reverse read, relative to the mRNA.)

                I think the gradient of the G and C lines is consistent with Phillip's idea that the AT-rich 3' UTR is a factor.
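Since _1 is the reverse read relative to the mRNA (as expected for a dUTP-stranded library), reverse-complementing it puts it back in mRNA sense. A read and its reverse complement also have identical GC content, so orientation alone cannot explain the _1/_2 difference; presumably the two mates are sampling different parts of the fragments, which is what the 3' UTR / poly(A) idea proposes. A minimal sketch with hypothetical helper names:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_count(seq: str) -> int:
    """Number of G or C bases; unchanged by reverse complementing."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def leading_t_run(read1: str) -> int:
    """Length of the run of T's at the start of read 1.

    If read 1 is antisense to the mRNA and its fragment reached the
    poly(A) tail, read 1 would be expected to begin with poly(T).
    """
    run = 0
    for base in read1.upper():
        if base != "T":
            break
        run += 1
    return run
```

Tallying `leading_t_run` over the low-GC _1 reads is one way to check whether they are enriched for fragments ending at the poly(A) tail.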