SEQanswers

Old 03-28-2018, 07:15 AM   #1
AlexCalderwood
Junior Member
 
Location: Norwich

Join Date: Mar 2018
Posts: 4
low GC% peak in one end of paired end reads

Hi,
I have paired-end RNA-seq data prepared from Brassica napus using a TruSeq kit. After adapter trimming, FastQC shows a second, low-GC% peak in the per-sequence GC content of the _1.fq files. The _2 files all look OK.
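
To compare the two mates outside FastQC, the per-read GC distribution can be recomputed directly from the FASTQ records. This is only a minimal sketch, assuming standard 4-line FASTQ records; the function names and the 1% bin width are illustrative choices and are not taken from FastQC itself.

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of every record in a 4-line FASTQ file.

    Plain-text and gzip-compressed files are both accepted.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield line.strip().upper()


def gc_percent(seq):
    """GC percentage of one read; N and other ambiguous bases are ignored."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_histogram(seqs):
    """Count reads per integer GC% bin (values are truncated to whole percent)."""
    hist = Counter()
    for seq in seqs:
        hist[int(gc_percent(seq))] += 1
    return hist
```

Calling `gc_histogram(read_sequences("sample_1.fq"))` and the same for the matching `_2` file (both file names are placeholders) gives two histograms that can be plotted side by side to see whether the low-GC peak is really confined to one mate.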

The low-GC% reads don't align to our reference transcriptome, but after BLASTing a small proportion of the unaligned reads, they don't appear to be contamination from another organism (hits are mostly predicted genes for Brassicas).
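
For anyone wanting to repeat this kind of check, the low-GC reads can be pulled out of the FASTQ and written as FASTA for a BLAST search. The sketch below is hedged: the 30% cutoff and the function names are assumptions chosen for illustration, and the cutoff should be set from the position of the second peak in the actual data.

```python
def low_gc_records(fastq_lines, max_gc=30.0):
    """Yield (read_id, sequence) for reads whose GC% is at or below max_gc.

    fastq_lines is any iterable of lines from a well-formed 4-line FASTQ file.
    max_gc is a hypothetical cutoff, not a recommended value.
    """
    records = zip(*[iter(fastq_lines)] * 4)  # group lines four at a time
    for header, seq, _plus, _qual in records:
        seq = seq.strip().upper()
        if not seq:
            continue
        gc = seq.count("G") + seq.count("C")
        if 100.0 * gc / len(seq) <= max_gc:
            read_id = header.strip()[1:].split()[0]  # drop '@' and any comment
            yield read_id, seq


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Passing an open file handle as `fastq_lines` and writing the result of `to_fasta` to disk gives a file that can be subsampled and submitted to BLAST.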

The average GC content is consistent across the length of the reads.

Does anyone know what might be causing this, particularly in only one of each set of read pairs?

thanks,
Alex
Attached image: Screen Shot 2018-03-28 at 13.32.31.png
Old 03-30-2018, 08:40 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,291

RNAseq?
Tell us more about the libraries.
Are you saying that the forward reads show this bimodal GC distribution but the reverse reads do not? Or do "_1" and "_2" mean something else?
--
Phillip
Old 03-30-2018, 11:36 AM   #3
AlexCalderwood
Junior Member
 
Location: Norwich

Join Date: Mar 2018
Posts: 4

Hi Phillip, thanks for your attention - what would you like to know about the libraries?

Yes, exactly. The forward ("_1" file) reads are the red and orange lines in the thumbnail, and the reverse reads are the green ones. Some of the reverse-read samples have a slight shoulder in the low-GC region, but it is much more minor than in the _1 files.
Old 03-30-2018, 01:07 PM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,291

How were the libraries constructed? What average insert size did they have? Were the libraries stranded?

--
Phillip
Old 04-03-2018, 01:31 AM   #5
AlexCalderwood
Junior Member
 
Location: Norwich

Join Date: Mar 2018
Posts: 4

They were made using "NEB next ultra directional library kit", which uses dUTP method to retain strandedness, and should give an insert size of ~200bp
Old 04-03-2018, 08:38 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,291

Quote:
Originally Posted by AlexCalderwood View Post
They were made using "NEB next ultra directional library kit", which uses dUTP method to retain strandedness, and should give an insert size of ~200bp
Okay, then my hypothesis is that the reverse read is always reading 5' in the cDNA of the forward read, so that elevated AT% is just polyA tail. Or, since you mention hits to "predicted genes", the elevated AT% may just be 3' or 5' non-translated sequence. (I'm not sure which orientation the NEB kits retain, nor whether a 5' or 3' bias is likely in your sequence.)
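
One rough way to test the polyA part of this idea is to count how many reads in each file contain a long run of A or T. The sketch below is an illustration only: the 15-base run length and the function names are assumptions, and both A and T runs are counted because a polyA tail appears as polyT when it is read from the opposite strand.

```python
import re

# Matches any uninterrupted run of A or any uninterrupted run of T.
_AT_RUN = re.compile(r"A+|T+")


def longest_at_run(seq):
    """Length of the longest homopolymer run of A or of T in the read."""
    runs = _AT_RUN.findall(seq.upper())
    return max((len(run) for run in runs), default=0)


def fraction_with_homopolymer(seqs, min_run=15):
    """Fraction of reads containing an A or T run of at least min_run bases.

    min_run is a hypothetical threshold; adjust it to the read length.
    """
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if longest_at_run(seq) >= min_run:
            hits += 1
    return hits / total if total else 0.0
```

If the fraction is clearly higher in the file with the low-GC peak, polyA tails are a plausible contributor; if the two files give similar values, AT-rich untranslated sequence is the more likely explanation.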

The non-translated regions of plants are often replete with transposable elements, which can themselves have lower GC content. Or, with time after insertion, they often become reduced in GC content due to cytosine methylation. That is, deamination of unmethylated C gives U, which is easily repaired because "U's" don't belong in DNA. However, 5-me-C deaminates to "T", which is a normal DNA base and so is much less likely to be repaired. So, over evolutionary time, simply methylating transposable elements has a sort of slow-motion "RIPping" effect.
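
A common way to look for this kind of deamination footprint is the CpG observed/expected ratio, which drops below 1 when CG dinucleotides have been lost faster than C and G individually. The function below is a hedged sketch with an illustrative name; it covers only the CG context, whereas plant methylation also occurs in other sequence contexts, so a normal ratio would not rule the mechanism out.

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, where the ratio is undefined.
    """
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    return cpg * len(seq) / (c * g)
```

Applied to the low-GC reads, or better to the assembled regions they come from, values well below 1 would be consistent with long-term loss of methylated cytosines.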

Just speculation on my part, of course.

--
Phillip
Old 04-03-2018, 01:53 PM   #7
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,179

Could you also post the "Per base sequence content" plot from the FastQC output?
Old 04-04-2018, 03:37 AM   #8
AlexCalderwood
Junior Member
 
Location: Norwich

Join Date: Mar 2018
Posts: 4

Please see attached for "per base sequence content" for one of the reverse read problem files post trimming. (Sorry, in a previous post I screwed up forward and reverse reads -> _1 is reverse, relative to mRNA)

I think the gradient of the GC lines is consistent with Phillip's idea of the AT rich 3'UTR being a factor.
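
The gradient can also be put into numbers by computing the GC fraction at each read position. This is a minimal sketch with illustrative names, assuming the reads are supplied as plain sequence strings; it is not how FastQC computes its plot internally.

```python
from collections import Counter


def per_position_gc(seqs):
    """Return a list whose entry i is the GC fraction at read position i.

    Bases other than A, C, G and T are skipped. Positions with no usable
    bases are reported as 0.0.
    """
    gc = Counter()
    total = Counter()
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if base in "ACGT":
                total[i] += 1
                if base in "GC":
                    gc[i] += 1
    length = max(total, default=-1) + 1
    return [gc[i] / total[i] if total[i] else 0.0 for i in range(length)]
```

A steady fall or rise in these values along the read would match the sloped lines in the plot, and comparing the two mates position by position shows where they diverge.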
Attached image: Screen Shot 2018-04-04 at 12.16.23.png

Tags
fastqc, gc content, paired end sequencing, rna seq
