Hi all,
I have four Illumina 101 bp paired-end (101PE) datasets; all of the sequenced samples were derived from an indica rice line.
Their relationships are as follows:
Wild-type
Generation_1 --> selfing --> Generation_2
Mutagenized-type
Generation_1 --> mutagenize --> Mutant_1 --> selfing --> Mutant_2
A pooled sample from each of G_1, G_2, M_1 and M_2 (G = Generation, M = Mutant) was sequenced.
However, they were not all sequenced in the same experiment: G_1 and M_1 were sequenced together in one experiment, and G_2 and M_2 together in another.
In theory, these four samples should show no significant differences in genetic composition, and could even be regarded as the same variety.
But here is my problem: their GC contents differ considerably:
G_1, 44.08%; M_1, 44.51%;
G_2, 38.61%; M_2, 40.61%.
As far as I can tell, the GC% of my samples should be 42%~44%, judging by the Nipponbare reference genome (43.7%).
Later, I found that the GC% discrepancies are still present in my BAM files (generated by mapping the preprocessed reads of each sample to the reference).
Are these datasets normal or reasonable?
What factors could be responsible for the GC% discrepancies?
Can anyone give me some suggestions?
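For anyone who wants to reproduce this kind of figure, here is a minimal sketch of how a read-level GC% could be computed from a FASTQ file. It is only an illustration, not necessarily how the numbers above were obtained. The function names are hypothetical, the input is assumed to be standard 4-line FASTQ records (optionally gzipped), and leaving N bases out of the denominator is my own assumption.

```python
import gzip
import io
from typing import Iterable, Iterator, TextIO


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each record (assumes 4-line FASTQ records)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def gc_percent(sequences: Iterable[str]) -> float:
    """Return GC% over all A/C/G/T bases; N and other symbols are ignored (assumption)."""
    gc = 0
    total = 0
    for seq in sequences:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        total += g_c + a_t
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total


def gc_percent_of_fastq(path: str) -> float:
    """Compute GC% for a plain or gzipped FASTQ file (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gc_percent(fastq_sequences(handle))


if __name__ == "__main__":
    # Tiny in-memory example with two made-up reads
    demo = io.StringIO("@r1\nGCAT\n+\nIIII\n@r2\nAAAA\n+\nIIII\n")
    print(gc_percent(fastq_sequences(demo)))
```

Running the same function on each of the four read sets (and on the reference sequence) gives numbers that are directly comparable, since they all use the same rule for counting bases.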
Thanks a bunch!