SEQanswers

Old 08-12-2010, 08:12 AM   #1
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default genome (%) coverage - what people usually get?

Hi all,

I am doing some mRNA-Seq analysis and computed genome coverage (the length of the genome covered by mapped reads, as a percentage of the whole genome length). Most of the numbers seem quite low: only one lane is at ~92%, whereas some others are ~50% and some lanes are quite low (~20%). Are those numbers normal? What numbers do people usually get? Will these numbers differ between sequencing types such as ChIP-Seq, RNA-Seq, mRNA-Seq, etc.?

Thanks,

D.
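
For reference, the "genome coverage" figure described in this post (bases covered by at least one mapped read, divided by the genome length) could be computed along these lines. This is only a minimal sketch: the function name, the half-open `(start, end)` interval format, and the single-sequence toy reference are assumptions for illustration, not the tool the poster actually used.

```python
def breadth_of_coverage(intervals, genome_length):
    """Percentage of reference positions covered by at least one read.

    intervals: iterable of (start, end) half-open alignment coordinates
               on a single reference sequence (assumed format).
    genome_length: total reference length in bp.
    """
    covered = 0
    block_start, block_end = None, None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end:
            # Disjoint from the running block: bank it and start a new one.
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start
    return 100.0 * covered / genome_length


# Toy example: three 36 bp reads on a 1,000 bp reference, two overlapping.
toy_reads = [(0, 36), (20, 56), (500, 536)]
toy_breadth = breadth_of_coverage(toy_reads, 1000)
print(f"breadth of coverage: {toy_breadth:.1f}%")
```

Note that this number is independent of the percentage of reads that align: a lane can have most of its reads mapped and still cover only a small part of the genome.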
Old 08-12-2010, 10:45 AM   #2
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

It depends on the size of the genome, the number of lanes (or octets) and the length of the sequencing run (36 bp, 72bp, 100 bp etc)
Old 08-12-2010, 11:13 AM   #3
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by NextGenSeq View Post
It depends on the size of the genome, the number of lanes (or octets) and the length of the sequencing run (36 bp, 72 bp, 100 bp etc.)
Can you give some numbers for illustration? My datasets are Illumina 1 x 36 and 2 x 36 runs with mouse and human samples, 7 lanes + PhiX. I really need some numbers to see how good or bad our runs are.

Thanks,

D.
Old 08-12-2010, 11:53 AM   #4
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

We usually get 25-30 million reads from a single lane of a flow cell. We usually run 2 x 72 bp. 80 to 90% map to the reference genome. The size of the human genome is 3.3 Gb and ~1 to 2% is coding.

If you get 92% of reads mapping, that's pretty good; 20%, not so good.
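
To put the figures in this post side by side, here is a rough back-of-the-envelope sketch. It assumes that each of the 25-30 million reads is a single 72 bp read, that every read maps, and that reads land uniformly at random on the genome (the idealised Lander-Waterman model, which suits genomic DNA far better than mRNA). The function names are made up for illustration.

```python
import math


def expected_depth(num_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def expected_breadth(depth):
    """Fraction of the genome hit at least once, assuming reads fall
    uniformly at random (Poisson approximation): 1 - exp(-depth)."""
    return 1.0 - math.exp(-depth)


GENOME_SIZE_BP = 3.3e9   # human genome size quoted in the post
READ_LENGTH_BP = 72      # assumption: one 72 bp read per counted read

for num_reads in (25e6, 30e6):
    depth = expected_depth(num_reads, READ_LENGTH_BP, GENOME_SIZE_BP)
    breadth = expected_breadth(depth)
    print(f"{num_reads:.0f} reads: depth {depth:.2f}x, "
          f"expected breadth {100 * breadth:.0f}%")
```

Under these assumptions a single lane gives well under 1x average depth, and even uniformly spread genomic reads would touch roughly 40 to 50% of the genome. If the 25-30 million figure counts read pairs, the depth doubles. mRNA reads concentrate in exons, so their breadth should be lower still.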
Old 08-12-2010, 12:53 PM   #5
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

If only ~1-2% of the genome is coding, then isn't 92% genome coverage a bit too high? Am I missing something?
Old 08-12-2010, 01:02 PM   #6
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

The way I read it was that 20 to 92% of his reads map to his reference genome, depending on the lane. If it's RNA-Seq, hopefully those are mapping in the coding regions.
Old 08-12-2010, 01:11 PM   #7
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

" I got some numbers for genome coverage (percentage of mapping genome and whole genome length),"
OK, I thought he was talking about the percentage of the genome covered.
Do you know whether, when people calculate RPKM, the "million mapped reads" counts only reads from coding regions or reads from the whole genome?
Old 08-12-2010, 01:18 PM   #8
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

Only reads mapping on exon sequences. The RPKM value is normalized for total exon length and the total number of matches in an experiment, in order to compare different experiments.
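
As a worked illustration of that normalization, RPKM (reads per kilobase of exon per million mapped reads) can be sketched as below. The function and parameter names are hypothetical, and which reads go into `total_mapped_reads` (exonic only, as stated above, or all mapped reads) is a convention you have to pick and apply consistently across experiments.

```python
def rpkm(reads_in_gene, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    reads_in_gene: reads assigned to the gene's exons.
    exon_length_bp: summed exon length of the gene, in bp.
    total_mapped_reads: library-size denominator (convention-dependent).
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp to kb) * 1e6 (reads to millions of reads)
    return reads_in_gene * 1e9 / (exon_length_bp * total_mapped_reads)


# Toy numbers: 500 reads on a gene with 2,000 bp of exons,
# in a library with 20 million mapped reads.
toy_rpkm = rpkm(reads_in_gene=500, exon_length_bp=2000, total_mapped_reads=20e6)
print(f"RPKM = {toy_rpkm:.1f}")
```

With the toy numbers this gives an RPKM of 12.5. Switching the denominator from exonic reads to all mapped reads lowers every gene's value by the same factor within a sample, so it matters mainly when comparing samples with different exonic fractions.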
Old 08-12-2010, 01:27 PM   #9
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by NextGenSeq View Post
The way I read it was that 20 to 92% of his reads map to his reference genome, depending on the lane. If it's RNA-Seq, hopefully those are mapping in the coding regions.
I meant the percentage of the genome covered by mapped reads over the whole genome length, and it is mRNA-Seq. If the percentage of mapped genome is low (20-50%), does that mean most of the reads are mapping in the coding regions?

As for the alignment score (percentage of reads that map), we get quite good numbers: 80-90% for single-end analysis (treating the two ends as two single-end runs) and about 60-70% for paired-end analysis.

Are your numbers (80-90%) the paired-end alignment score or the genome percentage?

Thanks,

D.
Old 08-12-2010, 01:53 PM   #10
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

Are you aligning to the human or mouse genome or to only coding regions?

We run mostly paired end genomic DNA and we get 80 to 90% aligning to our reference genome (which contains coding and noncoding).
Old 08-12-2010, 02:07 PM   #11
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by NextGenSeq View Post
Are you aligning to the human or mouse genome or to only coding regions?
I mapped the reads to the human genome (and the mouse genome; we have both human and mouse samples), and these are mRNA samples. Should I map against the coding regions only? How do I do that?
Quote:
Originally Posted by NextGenSeq View Post
We run mostly paired end genomic DNA and we get 80 to 90% aligning to our reference genome (which contains coding and noncoding).
I guess your number is fine, since yours is paired-end genomic DNA, right?
Old 08-13-2010, 06:24 AM   #12
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269
Default

Quote:
Originally Posted by dukevn View Post
I meant percentage of mapped genome over the whole genome length, and it is mRNA-Seq.
In this case, I would say it is quite strange to get 92% of the genome covered by RNA-seq reads. No?
Old 08-13-2010, 06:32 AM   #13
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by steven View Post
In this case, I would say it is quite strange to get 92% of the genome covered by RNA-seq reads. No?
Agreed. But I have no idea what the number should be. That is why I am asking whether people have some numbers for comparison.
Old 08-13-2010, 06:36 AM   #14
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

See
http://www.genetics.wustl.edu/bio548...Wold_NMeth.pdf

It's kinda old and only 25 bp reads but should give you a ballpark.
Old 08-13-2010, 01:03 PM   #15
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by NextGenSeq View Post
See
http://www.genetics.wustl.edu/bio548...Wold_NMeth.pdf

It's kinda old and only 25 bp reads but should give you a ballpark.
Thanks! Very useful paper for my purposes. It looks like for an RNA-Seq run (or miRNA-Seq, etc.), the % genome coverage should be really low (the paper says that >93% of uniquely mapped reads fell into exon regions). In that case, our numbers are funny. Oh well, I don't know what is happening here.
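
One way to check a lane against that kind of figure is to measure what fraction of its mapped reads overlap annotated exons. The sketch below is illustrative only: it assumes reads and exons on a single chromosome given as half-open `(start, end)` tuples, uses a simple quadratic scan that is fine for toy data but too slow for a real lane, and the function name is made up.

```python
def fraction_in_exons(read_intervals, exon_intervals):
    """Fraction of reads overlapping at least one exon.

    Both arguments are lists of (start, end) half-open intervals on the
    same reference sequence (assumed format).
    """
    if not read_intervals:
        return 0.0

    def overlaps(read, exon):
        # Half-open intervals overlap when each starts before the other ends.
        return read[0] < exon[1] and exon[0] < read[1]

    hits = sum(
        1 for read in read_intervals
        if any(overlaps(read, exon) for exon in exon_intervals)
    )
    return hits / len(read_intervals)


# Toy example: two exons and four 36 bp reads, one of them intronic.
toy_exons = [(100, 200), (400, 450)]
toy_reads = [(90, 126), (150, 186), (300, 336), (430, 466)]
toy_fraction = fraction_in_exons(toy_reads, toy_exons)
print(f"{100 * toy_fraction:.0f}% of reads overlap an exon")
```

A high exonic fraction together with a high genome-wide breadth would be contradictory for mRNA-Seq, which is a hint that one of the two numbers is being computed differently from what its label suggests.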
Old 08-15-2010, 09:51 AM   #16
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269
Default

Quote:
Originally Posted by dukevn View Post
Thanks! Very useful paper for my purposes. It looks like for an RNA-Seq run (or miRNA-Seq, etc.), the % genome coverage should be really low (the paper says that >93% of uniquely mapped reads fell into exon regions). In that case, our numbers are funny. Oh well, I don't know what is happening here.
It also depends on what kind of RNA you are working with (total RNA, polyA+ selected, cytoplasmic, nuclear, ribo-minus, etc.). Plus, this number of 93% is not a gold standard.
You may be interested in these previous threads: here and there