Old 02-22-2011, 03:30 PM   #43
themwg
Junior Member
 
Location: Madison, WI

Join Date: Jan 2011
Posts: 6

I have a question or two about the mapping stage.

I'm working with datasets that each consist of a contig file assembled from both paired-end and mate-pair data. I'm running SSPACE with that contig file against the mate-pair reads for scaffolding. In my best case I have 80 million inserted pairs, 10 million single reads, and 7 million pairs with pairing contigs. In other cases I have 25 million inserted pairs, 600k single reads, and 400k pairs with pairing contigs.

In the first case I do end up with extensive scaffolding, despite only ~6% of the reads mapping. In the other cases, with less than 1% of the reads used for mapping, I get very little scaffolding. I'm a little concerned about the low fraction of reads mapping to my contigs. Without getting into the details of my datasets (they are different species, which could be the source of the difference), I'm curious whether you have any thoughts on this from the program's point of view.
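As a quick sanity check on the ~6% figure, here is a small Python sketch of the arithmetic. It assumes that "inserted pairs" counts read pairs (so the number of individual reads is twice that) and that "single reads" counts individual reads placed on contigs; both are my reading of the summary output, not something confirmed by the readme, and the function name is made up for illustration.

```python
def mapped_read_fraction(inserted_pairs, single_reads_on_contigs):
    """Fraction of individual reads that were placed on contigs.

    Assumption: each inserted pair contributes two individual reads.
    """
    total_reads = 2 * inserted_pairs
    return single_reads_on_contigs / total_reads


# Best case from the post: 80 million inserted pairs, 10 million single reads.
best_case = mapped_read_fraction(80_000_000, 10_000_000)
print(f"best case: {best_case:.2%} of reads mapped")
```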

Perhaps I just need some clarification of some of the terms.
#number of single reads found on contigs =
(I use an insert size of 3000 bp with a std dev of 0.5.)
Regarding the mapping step: does this mean you take the 4500 bp from the left and right edges of each contig and use those for mapping, or do you delete 4500 bp from each edge and use only the middle of each contig for mapping? I assume it's the first option, but the readme uses the word "subtracted", which is somewhat misleading.
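To make the two readings concrete, here is a minimal Python sketch of both. It is not SSPACE's code: the function names are hypothetical, and the edge length is assumed to be insert size plus insert size times 0.5, which is how I arrive at 4500 bp.

```python
def edge_length(insert_size, error_fraction):
    """Assumed edge length: insert size plus the allowed deviation."""
    return int(insert_size * (1 + error_fraction))


def keep_edges(contig, edge):
    """Reading 1: map only against the left and right edges of the contig."""
    if len(contig) <= 2 * edge:
        return [contig]  # edges would overlap, so keep the whole contig
    return [contig[:edge], contig[len(contig) - edge:]]


def trim_edges(contig, edge):
    """Reading 2: discard both edges and map only against the middle."""
    if len(contig) <= 2 * edge:
        return ""  # nothing is left once both edges are removed
    return contig[edge:len(contig) - edge]


edge = edge_length(3000, 0.5)  # 4500 for the settings in this post
toy_contig = "A" * edge + "C" * 1000 + "G" * edge

edges = keep_edges(toy_contig, edge)
middle = trim_edges(toy_contig, edge)
print(edge, [len(e) for e in edges], len(middle))
```

Under the first reading a 10 kb contig contributes 9 kb of mappable sequence; under the second it contributes only the 1 kb in the middle, which would make a low mapping rate much less surprising.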

#number of pairs found with pairing contigs =
For "pairing contigs" I get numbers that are greater than half the number of single reads. If SSPACE uses 10 million single reads for mapping, I would expect to get at most 5 million pairs, yet it reports 7 million.
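The mismatch can be checked in a few lines of Python. This is only a sketch of my expectation, assuming every reported pair needs two distinct mapped reads; the helper name is hypothetical and the numbers are the ones quoted above.

```python
def max_pairs_from_reads(single_reads):
    """Upper bound on pairs if each pair needs two distinct mapped reads."""
    return single_reads // 2


# (single reads on contigs, pairs found with pairing contigs)
reported = {
    "best case": (10_000_000, 7_000_000),
    "other case": (600_000, 400_000),
}

exceeds = {}
for name, (single_reads, pairing_pairs) in reported.items():
    limit = max_pairs_from_reads(single_reads)
    exceeds[name] = pairing_pairs > limit
    print(f"{name}: limit {limit}, reported {pairing_pairs}, exceeds: {exceeds[name]}")
```

Both datasets exceed the bound, so either "single reads" is not a count of individual mapped reads or my assumption about how pairs are counted is wrong.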

#total pairs =
I'm unclear about what this number means. Is it the total number of read pairs used in mapping? If so, I don't see how it relates to the single reads. My understanding is that SSPACE/Bowtie takes all the read pairs that don't contain Ns and maps each read individually to the contigs. It then determines which of the mapped reads are paired, which contigs they lie on, and so on.
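Here is that understanding written out as a toy Python sketch, in case it helps pinpoint where I've gone wrong. It is not SSPACE's implementation: exact substring matching stands in for Bowtie, reverse complements and insert-size checks are ignored, and all names and sequences are invented for illustration.

```python
def find_contig(read, contigs):
    """Toy stand-in for Bowtie: exact substring match, first hit wins."""
    for name, seq in contigs.items():
        if read in seq:
            return name
    return None


def summarize(pairs, contigs):
    total_pairs = 0      # pairs kept after dropping any pair containing N
    single_reads = 0     # individual reads placed on some contig
    pairing_pairs = 0    # pairs whose mates land on two different contigs
    for left, right in pairs:
        if "N" in left or "N" in right:
            continue
        total_pairs += 1
        hit_left = find_contig(left, contigs)
        hit_right = find_contig(right, contigs)
        single_reads += (hit_left is not None) + (hit_right is not None)
        if hit_left and hit_right and hit_left != hit_right:
            pairing_pairs += 1
    return total_pairs, single_reads, pairing_pairs


contigs = {"c1": "ACGTACGTAA", "c2": "TTGGCCTTGG"}
pairs = [
    ("ACGTA", "GGCCT"),  # mates on different contigs
    ("ACGTA", "CGTAA"),  # both mates on c1
    ("ACNTA", "GGCCT"),  # contains N, dropped before mapping
    ("ACGTA", "CCCCC"),  # only one mate maps
]
summary = summarize(pairs, contigs)
print(summary)
```

Counted this way, the number of pairs with pairing contigs can never exceed half the number of single reads, which is why the reported values puzzle me.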

Any light you could shed would be greatly appreciated. I'm fully prepared to find out that I'm just being dense.