Dear Forum,
Does anyone have information about a G vs C bias? I am analysing 1000 Genomes data based on raw fastq files ~5Gb in size. For unrelated reasons I counted the occurrences of each base and found, to my surprise, that 'C's outnumber 'G's by about 10% (though the size of the bias varies between sequencing centres). To guard against artefacts, I then counted G and C frequencies at each base position in the reads (I only look at forward reads, so have bases 1 to 100). The G vs C bias varies with base position: it is very variable over the first and last 5 bases and otherwise generally rises (= more 'C's) along the read.
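For anyone wanting to reproduce the per-position count, it can be sketched roughly as follows. This is a minimal illustration, not necessarily the script used for the numbers above. It assumes uncompressed fastq with strict 4-line records, and the function names (`per_position_base_counts`, `c_over_g`) and the `read_length` parameter are made up for the example.

```python
from collections import Counter
from itertools import islice


def per_position_base_counts(fastq_lines, read_length=100):
    """Count bases at each read position (assumes 4-line FASTQ records)."""
    counts = [Counter() for _ in range(read_length)]
    # The sequence is the 2nd line of every 4-line record: index 1, step 4.
    for seq in islice(fastq_lines, 1, None, 4):
        for pos, base in enumerate(seq.strip().upper()[:read_length]):
            counts[pos][base] += 1
    return counts


def c_over_g(counts):
    """Return the C/G count ratio per position (None where no G was seen)."""
    return [c["C"] / c["G"] if c["G"] else None for c in counts]


if __name__ == "__main__":
    import io

    # Tiny made-up input, only to show the call pattern.
    demo = io.StringIO("@r1\nCCGT\n+\nIIII\n@r2\nCGGA\n+\nIIII\n")
    print(c_over_g(per_position_base_counts(demo, read_length=4)))
```

On a real file, pass an open file handle instead of the demo string. A ratio above 1 at a position means more 'C's than 'G's there.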
Is this a well-known phenomenon? Any thoughts gratefully received!
Cheers
Bill