SEQanswers

Old 12-03-2013, 08:07 PM   #1
lovenlong
Member
 
Location: Guangzhou, CN

Join Date: Jan 2013
Posts: 16
Large difference (5%) in GC content of Illumina read sets derived from the same line

Hi all,

I have four Illumina 101 bp paired-end (101PE) datasets, all sequenced from samples derived from a single indica rice line.
The samples are related as follows:
Wild-type
Generation_1 --> selfing --> Generation_2
Mutagenized-type
Generation_1 --> mutagenize --> Mutant_1 --> selfing --> Mutant_2

Pooled samples of G_1, G_2, M_1 and M_2 were each sequenced separately.
However, they were not all sequenced in the same experiment: G_1 and M_1 were sequenced together in one experiment, and G_2 and M_2 together in another.

In theory, these four samples should not differ significantly in genetic composition, and can even be regarded as the same variety.
But here is my problem: they show large discrepancies in GC content:
G_1, 44.08%; M_1, 44.51%;
G_2, 38.61%; M_2, 40.61%.

As far as I can tell, the GC% of my samples should be around 42-44%, given that the Nipponbare reference genome is 43.7% GC.
Later, I found that the GC% discrepancies are still present in my BAM files (generated by mapping the preprocessed reads of each sample to the reference).
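
For anyone who wants to reproduce per-sample GC% figures like the ones above, here is a minimal sketch in Python. It is not the pipeline used in this thread; the function names `gc_percent` and `open_fastq` are made up for illustration, and it assumes standard 4-line FASTQ records (no wrapped sequence lines). N bases are left out of the denominator, which is one possible convention among several.

```python
import gzip
import io
from typing import TextIO


def gc_percent(fastq: TextIO) -> float:
    """Overall GC% across all A/C/G/T bases in a 4-line-per-record FASTQ stream.

    N (and any other non-ACGT symbol) is ignored, which is an assumption
    of this sketch rather than a universal convention.
    """
    gc = 0
    acgt = 0
    for i, line in enumerate(fastq):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        seq = line.strip().upper()
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        acgt += g_c + a_t
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Tiny in-memory demo: two reads, one containing N calls.
demo = io.StringIO("@r1\nGGCCAATT\n+\nIIIIIIII\n@r2\nGCNNATAT\n+\nIIIIIIII\n")
demo_gc = gc_percent(demo)  # 6 G/C bases out of 14 called bases
```

On real data this would be called once per sample, e.g. `gc_percent(open_fastq("G_1_R1.fastq.gz"))` (hypothetical file name), and running it separately on read 1 and read 2 might help show whether the shift is specific to one end of the pairs.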

Are these datasets normal or reasonable?
What factors could be responsible for the GC% discrepancies?
Can anyone give me some suggestions?

Thanks a bunch!

Last edited by lovenlong; 12-03-2013 at 08:25 PM.
Old 12-03-2013, 10:18 PM   #2
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521

Any difference in the insert size of the libraries? The GC bias in PCR amplification might be different depending on insert size.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 12-04-2013, 02:23 AM   #3
lovenlong
Member
 
Location: Guangzhou, CN

Join Date: Jan 2013
Posts: 16

Quote:
Originally Posted by SNPsaurus View Post
Any difference in the insert size of the libraries? The GC bias in PCR amplification might be different depending on insert size.
The insert sizes of the libraries for my samples were nearly the same:
G_1, 26066;
G_2, 28059;
M_1, 26762;
M_2, 28348.

In addition, the samples themselves cannot be the cause of the GC% differences.
All individuals in the pools were derived from one pure rice line that had been selfed for at least 20 generations, and all were checked with 24 SSR markers before being selected for pooling.

Thanks.
Old 12-04-2013, 03:13 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

Have you asked your sequence provider if there was anything peculiar about the two sequencing runs as far as base signal intensities (assume this is illumina sequencing) were concerned?
Old 12-05-2013, 12:27 AM   #5
lovenlong
Member
 
Location: Guangzhou, CN

Join Date: Jan 2013
Posts: 16

Quote:
Originally Posted by GenoMax View Post
Have you asked your sequence provider if there was anything peculiar about the two sequencing runs as far as base signal intensities (assume this is illumina sequencing) were concerned?
Hi,

I'm waiting for their reply now, but it seems they have not encountered this kind of thing before.

I'm wondering whether a GC% discrepancy like this can also occur in the PhiX174 control across different HiSeq 2000 sequencing experiments.

In the evaluation paper by Minoche et al. (Genome Biology 2011, 12:R112, doi:10.1186/gb-2011-12-11-r112), the authors found a higher-than-expected GC% in HiSeq datasets:
"The GC content of the unfiltered HiSeq reads was higher than expected: 40% for Bv + PhiX data and 45.5% for At + PhiX. The B. vulgaris reference sequence has a %GC of 35% [8] and that of the A. thaliana genome is 36% (calculated from TAIR10 [9]). The fraction of PhiX reads (44.7% GC) accounts for only 1 to 2% of the data. For the PhiX sample sequenced on the GAIIx the %GC of 45.7% is much closer to the expected value of 44.7%."

This looks really strange.
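
To put the quoted numbers in perspective, the small sketch below computes the GC% one would expect from a simple mixture of genome reads and PhiX reads, using only the figures in the quote. The function name `mixture_gc` is made up for illustration, and it assumes that the PhiX share of reads equals its share of bases (equal read lengths). Under those assumptions, even a 2% PhiX spike-in moves the expected GC% by only about 0.2 points, so it cannot by itself account for the reported 40% and 45.5%.

```python
def mixture_gc(genome_gc: float, spike_gc: float, spike_fraction: float) -> float:
    """Expected GC% of a read set that mixes genome reads with spike-in reads.

    Assumes equal read lengths, so the fraction of reads equals the
    fraction of bases (an assumption of this sketch).
    """
    return (1.0 - spike_fraction) * genome_gc + spike_fraction * spike_gc


# Figures taken from the Minoche et al. quote; 2% is the upper end of "1 to 2%".
bv_expected = mixture_gc(genome_gc=35.0, spike_gc=44.7, spike_fraction=0.02)
at_expected = mixture_gc(genome_gc=36.0, spike_gc=44.7, spike_fraction=0.02)

# Gap between the reported HiSeq GC% and the simple mixture expectation.
bv_excess = 40.0 - bv_expected
at_excess = 45.5 - at_expected
```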

Thanks!

Last edited by lovenlong; 12-05-2013 at 12:32 AM.
Old 12-05-2013, 03:18 PM   #6
Melissa
Senior Member
 
Location: Switzerland

Join Date: Aug 2008
Posts: 124

The datasets make sense to me because you're not sequencing exactly the same individuals in each pool. Sequencing bias is also a good explanation. M1 is most similar to G1 because mutagenesis doesn't affect many sites in the genome. I think somaclonal variation during the tissue culture stage can also affect genome content. As for G2 and M2, I would expect the genome to change due to selfing (becoming more homozygous). This doesn't necessarily mean a higher GC% as observed in your case.