Old 04-03-2012, 09:12 AM   #1
Junior Member
Location: Boston, MA

Join Date: Apr 2012
Posts: 2
How do I calculate the probability of sequence coverage at a particular window?

Hi all,

A newbie question:

How do I calculate the probability of a random set of sequences (at a specified length, say short reads of 25bp) aligning to a set window length (say 10kb)? Essentially, I'd like to know the sequence coverage probability along a specified length of DNA.

I'd like to use this sequence coverage probability to test whether what I see (for example, say I see 3 reads within a particular 10kb window) is truly significant or could have arisen by random chance.

Please let me know your thoughts and whether this is a valid question to ask in the first place.

vkpilla is offline   Reply With Quote
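
One common way to frame the window question is a Poisson model. The sketch below assumes reads land uniformly at random across the genome, so the read count in a window is roughly Poisson with mean n_reads * window_len / genome_len. The function names and the library size of 1 million reads are hypothetical, chosen only for illustration; real data are rarely uniform, so treat the result as a rough baseline rather than a definitive test.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


def window_pvalue(n_reads, genome_len, window_len, observed):
    """Expected reads per window and the chance of seeing >= observed reads.

    Assumes uniform random read placement (an idealisation).
    """
    lam = n_reads * window_len / genome_len
    return lam, poisson_tail(observed, lam)


# Hypothetical example: 1 million reads, a 3 Gb genome, 3 reads in a 10kb window.
lam, p = window_pvalue(
    n_reads=1_000_000,
    genome_len=3_000_000_000,
    window_len=10_000,
    observed=3,
)
print(f"expected reads per window: {lam:.2f}, P(X >= 3) = {p:.3f}")
```

Under these assumed numbers, about 3.3 reads per window are expected anyway, so seeing 3 reads would not stand out. The conclusion depends entirely on the total read count and genome size you plug in.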
Old 04-03-2012, 11:12 AM   #2
Simon Anders
Senior Member
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

If I understand your question correctly, you want to calculate the probability that a read with a completely random sequence aligns to a place in your genome by chance. The answer is usually: zero.

There are 4^25=1.1e15 possible 25-bp reads. The human genome has about 3e9 base pairs, hence the probability of a random 25-mer actually occurring in the human genome is roughly 3e9/1.1e15=3e-6.

Hence, if you see a read aligning somewhere, it has almost certainly been amplified from a real biological template, i.e., it is either from the sample or from contamination.

Contrary to popular belief, there is no such thing as alignment noise in high-throughput sequencing.
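
The arithmetic in this reply can be checked with a short sketch. It assumes a human genome of roughly 3e9 bp and treats every 25-mer position in the genome as a distinct sequence, which gives an upper bound because repeats reduce the number of distinct 25-mers. The function name and the both_strands option are illustrative, not part of any library.

```python
def random_kmer_hit_probability(read_len, genome_len, both_strands=False):
    """Upper bound on the chance that a uniformly random read of
    length read_len matches the genome exactly somewhere.

    Assumes every genomic k-mer is distinct (an overestimate).
    """
    n_possible = 4 ** read_len              # all possible reads of this length
    n_positions = genome_len - read_len + 1  # k-mer start positions in the genome
    if both_strands:
        n_positions *= 2                     # count the reverse strand as well
    return min(1.0, n_positions / n_possible)


# 25-bp reads against an assumed 3e9 bp genome
p = random_kmer_hit_probability(25, 3_000_000_000)
print(f"4^25 = {4 ** 25:.2e}, P(exact hit) <= {p:.1e}")
```

Counting both strands doubles the bound, and allowing mismatches raises it further, but it stays tiny for reads of this length.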

Simon Anders is offline   Reply With Quote
Old 10-22-2013, 03:58 AM   #3
Location: UK

Join Date: Sep 2012
Posts: 61

Hello guys

I am having almost the same problem, and I am confused about how to calculate the probability. Here is my issue:

I have sequenced a 2kb PCR fragment from an original reference sequence of 7.5kb. I used Illumina HiSeq paired-end sequencing to generate 5 million 80bp reads at 30x coverage, as I am looking for a recombination event between two virus serotypes, which is a rare event.

I know that the event occurs at a rate of about 1%, so I expect 1% of the reads to represent the recombination event. Some of these reads will span the junction point of the recombinants and will therefore not align to any of the reference sequences. I want to calculate the probability of coverage by the reads that span the junction point, so that I can calculate the error rate between the expected and the observed.

Need help!

Many thanks
Fad2012 is offline   Reply With Quote
Old 10-22-2013, 04:13 AM   #4
Location: UK

Join Date: Sep 2012
Posts: 61

Hi again

I forgot to add that the junction point could occur at any nucleotide within the 2kb fragment.

Thanks a lot
Fad2012 is offline   Reply With Quote
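
A rough way to get an expected count of junction-spanning reads is sketched below. It assumes read start positions are uniform along the fragment, treats each read as a single 80bp read (ignoring pairing), assumes the junction is not within one read length of a fragment end, and takes the 1% recombinant fraction at face value. The function name and the min_overhang parameter (bases required on each side of the junction) are hypothetical.

```python
def expected_junction_reads(n_reads, recomb_fraction, read_len,
                            fragment_len, min_overhang=1):
    """Expected number of reads spanning one interior junction.

    A read spans the junction if it has at least min_overhang bases
    on each side of it. Assumes uniform read start positions.
    """
    n_starts = fragment_len - read_len + 1            # possible read starts
    spanning_starts = read_len - 2 * min_overhang + 1  # starts that span the junction
    if n_starts <= 0 or spanning_starts <= 0:
        return 0.0
    spanning_starts = min(spanning_starts, n_starts)
    return n_reads * recomb_fraction * spanning_starts / n_starts


# Numbers from the post: 5 million 80bp reads, 2kb fragment, 1% recombinants.
# min_overhang=20 is an assumed threshold for recognising a chimeric read.
expected = expected_junction_reads(5_000_000, 0.01, 80, 2000, min_overhang=20)
print(f"expected junction-spanning reads: {expected:.0f}")
```

Because the spanning fraction is the same for any interior junction position, this expectation does not depend on where in the 2kb fragment the junction falls. If recombinants have many different junctions, the expected count per junction is divided among them.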
