06-25-2012, 03:08 AM   #2
xied75

Ben,

1, When a read has multiple equally good hits, BWA uses the drand48() function to choose one of them as the output.
2, drand48() relies on srand48() to initialise it with a seed, and BWA SAMPE always uses a fixed number as the seed.
3, If you seed srand48() with the same number, drand48() always gives you the same sequence of numbers afterwards, e.g. run 1: 1,3,4,8,9,10,12...90,92; run 2: 1,3,4,8,9,10,12...90,92, etc. (The real values are fractions between 0 and 1; the integers here are only for readability.)
4, Within a single BWA SAMPE run, srand48() is called only once.
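
To make point 3 concrete, here is a minimal sketch of the drand48() family written as the 48-bit linear congruential generator that POSIX describes. It is not BWA code; the class name Rand48, the helper first_values and the seed 11 are made up for illustration, and BWA itself simply calls the C library functions.

```python
# Sketch of srand48()/drand48() as a 48-bit linear congruential generator.
# Constants follow the POSIX description; Rand48 and the seed 11 are
# hypothetical names/values used only for this illustration.
A = 0x5DEECE66D          # multiplier
C = 0xB                  # increment
MASK = (1 << 48) - 1     # keep 48 bits of state


class Rand48:
    def __init__(self, seed):
        # srand48(): the seed fills the high 32 bits, the low 16 bits are 0x330E
        self.state = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def drand48(self):
        # Advance the state and scale it into [0.0, 1.0)
        self.state = (A * self.state + C) & MASK
        return self.state / (1 << 48)


def first_values(seed, n):
    """Seed once, then draw n numbers, like one program run."""
    rng = Rand48(seed)
    return [rng.drand48() for _ in range(n)]


run1 = first_values(11, 5)
run2 = first_values(11, 5)
print(run1 == run2)  # True: same seed, same sequence
```

The only thing that matters for the argument is the last line: two runs that start from the same seed draw exactly the same numbers in the same order.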

Thus in your first run you call BWA SAMPE on 6 sai files in 3 pairs, i.e. 3 separate runs, so srand48() is called 3 times and every run starts the random sequence from the beginning. But if you cat the fastq files first, you call BWA SAMPE once on 2 sai files and srand48() is called only once. The first 1/3 of the output will be the same, then you'll start to see differences, because the remaining 2/3 gets its numbers from further along the drand48() sequence instead of from a restarted one.

It's something like:

before cat:
sampe call 1: 1,3,4,8,9,10,12...90,92,.....STOP
sampe call 2: 1,3,4,8,9,10,12...90,92,.....STOP
sampe call 3: 1,3,4,8,9,10,12...90,92,.....STOP

after cat:
sampe call 1: 1,3,4,8,9,10,12...90,92,.....(no longer STOP here and continues) ... 110,113,114,115,...
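
The same picture as a small simulation, purely as a sketch: Python's random.Random stands in for srand48()/drand48(), and the seed, the number of reads per file and the number of equally good hits per read are all invented. It only mimics "seed once per run, then pick one hit per read", not what BWA really does internally.

```python
import random

SEED = 11              # hypothetical fixed seed, reused for every run
READS_PER_FILE = 1000  # hypothetical number of reads in each fastq
HITS = 4               # assume every read has 4 equally good hits


def sampe_run(n_reads, seed=SEED):
    """One simulated run: seed once, then choose a hit index for each read."""
    rng = random.Random(seed)  # stands in for the single srand48() call
    return [int(rng.random() * HITS) for _ in range(n_reads)]


# Before cat: three separate runs, each one reseeded from scratch.
separate = []
for _ in range(3):
    separate += sampe_run(READS_PER_FILE)

# After cat: one run over all reads, seeded only once.
merged = sampe_run(3 * READS_PER_FILE)

first_third_same = separate[:READS_PER_FILE] == merged[:READS_PER_FILE]
rest_same = separate[READS_PER_FILE:] == merged[READS_PER_FILE:]
print(first_third_same, rest_same)
```

The first third agrees because both cases start from the same seed; after that the merged run keeps drawing new numbers while the separate runs start over.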

If you compare the two outputs with a graphical file compare tool, like WinMerge on Windows, you'll see where they start to differ.

Best,

dong