Hi Community:
I am doing paired-end DNA-seq of a eudicot plant. FastQC produces "per base sequence content" plots (one per paired-end FASTQ file) that puzzle me. As you can see, %A and %T are very different, and so are %C and %G. On the other hand, the two plots are in a sense "complementary", i.e. A1+A2 ~= T1+T2 and C1+C2 ~= G1+G2.
Additionally, I counted the nucleotides in each complete paired-end file, using only the reads that BWA aligned. Please compare them with the corresponding figures for the complete genome:
1_1 reads (FASTQ file):
A 1454788765 32.45%
T 1007930585 22.48%
C 875193373 19.52%
G 1109913625 24.76%
1_2 reads (FASTQ file):
A 1051644195 23.46%
T 1422158391 31.72%
C 1075609393 23.99%
G 896848970 20.01%
Complete genome (deposited FASTA file):
A 153891618 31.65%
T 153909273 31.65%
C 81246927 16.71%
G 81219831 16.70%
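For anyone who wants to reproduce counts like these, here is a minimal Python sketch of one way to do it. It is not necessarily how the figures above were obtained: it assumes standard 4-line FASTQ records (optionally gzipped) and counts every read in the file, so restricting to BWA-aligned reads would be a separate filtering step. The function names `count_bases`, `percentages` and `skews` are made up for this illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_bases(fastq_path):
    """Count bases over all sequence lines of a 4-line-per-record FASTQ file."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # In 4-line FASTQ the sequence is the 2nd line of each record,
        # i.e. lines 1, 5, 9, ... when counting from 0.
        for seq in islice(handle, 1, None, 4):
            counts.update(seq.strip().upper())
    return counts


def percentages(counts):
    """Convert raw counts to percentages of all counted symbols (N included)."""
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


def skews(counts):
    """Return (AT skew, GC skew): (A-T)/(A+T) and (G-C)/(G+C).

    Both are close to 0 when complementary bases are balanced.
    """
    a, t, c, g = (counts[b] for b in "ATCG")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew
```

Calling `count_bases` on each of the two files and adding the resulting `Counter` objects gives pooled counts, so the A1+A2 vs T1+T2 comparison can be made directly. Comparing `skews` for each file shows whether the imbalance flips sign between the two mates.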
A previous paired-end DNA-seq experiment showed A=T and C=G within each single file (see attached image no. 3). That seems to me the logical result, as I presume each file contains an unbiased set of reads.
Have you seen this kind of result before? Do you have an explanation for both a) the imbalance between complementary bases and b) the complementarity between the results of the two paired-end files?
Thank you in advance!
Rafael