SEQanswers


mattanswers 08-13-2013 10:26 AM

fastqc kmer relative enrichment
 
I was wondering how to interpret the Kmer content graph in FastQC (attached). From my graph, it seems that 50% of all reads contain the 6 k-mers listed. Is this a normal graph?

GenoMax 08-13-2013 11:55 AM

Is this RNA-seq data?

The following thread may be useful (it refers to a MiSeq run, but the issue applies to Illumina sequencing in general): http://seqanswers.com/forums/showthread.php?t=30448

One more: http://seqanswers.com/forums/showthread.php?t=17219

mattanswers 08-13-2013 12:18 PM

Thank you very much, GenoMax, for the links. They are very informative.

This is RNA-Seq. It seems the bias in the first 12 or so bases is due to 'random' priming, but I was also wondering why the lines on the graph stay up at ~50% for the whole length of the graph. Random priming would explain the first 12 or so bases, but why the steady percentage for the rest of the sequence?

GenoMax 08-13-2013 02:48 PM

http://www.bioinformatics.babraham.a...d%20Kmers.html

mattanswers 08-14-2013 11:20 AM

Thanks again for your help, GenoMax.

My sequence length is only 50 bases and the quality is very good.

From what I read on the linked site, it seems that I have 6 k-mers that are 50-fold enriched throughout the length of my reads. But what does this mean in terms of sample quality?

If I have 25-30 million reads and there is a 50-fold enrichment of these k-mers (most likely, I would guess, from the adaptor), how many sequences does that affect? For example, if 100,000 sequences had adaptor sequence at various positions other than the end of the read, what would the fold-enrichment be? 100,000 affected sequences may be enough to make the fold-enrichment high, yet they are only a small percentage of the total. On the other hand, if I had a much smaller total number of sequences, the same fold-enrichment might be a problem. So I guess I want to know how to relate fold-enrichment to the total number of sequences, in order to tell whether the enrichment is a problem or comes from an insignificant fraction of the total.
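One rough way to explore this relationship is a back-of-the-envelope model. The Python sketch below is only that: it is not FastQC's actual calculation. It assumes uniform base composition (each base at 0.25), a k-mer length of 5 (an illustrative choice, not taken from this thread), and that each affected read carries exactly one extra copy of the k-mer. All function names are hypothetical.

```python
def expected_kmer_count(total_reads, read_length, k, base_freq=0.25):
    """Expected occurrences of one specific k-mer across the library,
    assuming independent bases that each occur with frequency base_freq."""
    positions = read_length - k + 1  # k-mer start positions per read
    return total_reads * positions * base_freq ** k


def fold_enrichment(total_reads, affected_reads, read_length, k):
    """Observed/expected ratio when each affected read adds one extra
    copy of the k-mer on top of the background expectation."""
    expected = expected_kmer_count(total_reads, read_length, k)
    observed = expected + affected_reads
    return observed / expected


def extra_copies_per_read_for_fold(fold, read_length, k, base_freq=0.25):
    """Average number of extra k-mer copies per read needed to reach a
    given fold-enrichment. The total read count cancels out of the model."""
    positions = read_length - k + 1
    return (fold - 1) * positions * base_freq ** k


if __name__ == "__main__":
    # Numbers from the question: 25 million 50-base reads, 100,000 affected.
    print(fold_enrichment(25_000_000, 100_000, 50, 5))
    # Same affected fraction (0.4%) in a library ten times smaller.
    print(fold_enrichment(2_500_000, 10_000, 50, 5))
    # Extra copies per read that a 50-fold enrichment would imply.
    print(extra_copies_per_read_for_fold(50, 50, 5))
```

Under these assumptions the fold-enrichment depends on the fraction of affected reads rather than on the absolute number of reads, because the expected count scales with library size in the same way the affected count does. A real library with skewed base composition, or a different definition of the expected count, would change the numbers.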

