Old 02-23-2012, 09:19 PM   #6

I would recommend Newbler, since it was designed specifically for 454 data.
I am assuming that by mapping the reads back you are trying to get read counts per contig/isotig/isogroup, yes?

If you use Newbler, you can get read counts per contig from the 454ReadStatus.txt file that is produced when you perform a transcriptome assembly. Just grep for 'Assembled' and count the number of times each contig appears. If you have different samples in different lanes, you can do the appropriate grep to subset them as well. This file lists the contig matched by the 5' end and by the 3' end of each read, so you effectively count each read twice. I don't think that is a problem, since the reads are generally pretty long to begin with. With this method some contigs may end up with a zero or low read count, but every read is counted, so that should not be a problem once you sum the read counts of contigs to form read counts per isotig.
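If you would rather script the counting than chain greps, here is a rough Python sketch of the same idea. It assumes 454ReadStatus.txt is tab-delimited with the columns accession, read status, 5' contig, 5' position, 5' strand, 3' contig, 3' position, 3' strand; check that against your own file before trusting the numbers. The function name and the sample rows are made up for illustration. Note that it matches the status exactly, whereas a plain grep for 'Assembled' would also pick up any status that merely contains that word.

```python
from collections import Counter

def count_reads_per_contig(lines, status="Assembled"):
    """Count contig hits for reads with the given status.

    Assumed column layout (tab-delimited, not verified against every
    Newbler version): accno, status, 5' contig, 5' pos, 5' strand,
    3' contig, 3' pos, 3' strand.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header, short lines, and reads with another status.
        if len(fields) < 8 or fields[1] != status:
            continue
        counts[fields[2]] += 1  # contig hit by the 5' end
        counts[fields[5]] += 1  # contig hit by the 3' end
    return counts

# Made-up rows in the assumed layout, only to show the behaviour.
sample = [
    "Accno\tRead Status\t5' Contig\t5' Position\t5' Strand\t3' Contig\t3' Position\t3' Strand",
    "read1\tAssembled\tcontig00001\t1\t+\tcontig00001\t400\t+",
    "read2\tAssembled\tcontig00001\t350\t+\tcontig00002\t120\t+",
    "read3\tSingleton\t\t\t\t\t\t",
    "read4\tPartiallyAssembled\tcontig00002\t10\t+\tcontig00002\t200\t+",
]

counts = count_reads_per_contig(sample)
for contig, n in sorted(counts.items()):
    print(contig, n)
```

On a real file you would pass the open file handle instead of the sample list, e.g. `count_reads_per_contig(open("454ReadStatus.txt"))`. Every assembled read adds two to the total, which is the double counting mentioned above.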

Alternatively, you can grep 'Assembled' to make a subset of the assembled reads and then map them back to your contigs using GSMapper. I recommend only using reads with the Assembled status to minimise false mapping. I also use mapping for SNP discovery, so I set -ais 1, which means that the mapped read needs to be a very good match.
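To build that subset, something like the following sketch would pull out the accessions of the assembled reads. It again assumes the accession is in the first tab-delimited column and the status in the second; the function name and sample rows are hypothetical. The resulting list of IDs can then be fed to whatever tool you use to extract those reads before mapping.

```python
def assembled_read_ids(lines, status="Assembled"):
    """Return accessions of reads whose status matches exactly.

    Assumes column 1 is the read accession and column 2 is the read
    status (tab-delimited); adjust if your file differs.
    """
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == status:
            ids.append(fields[0])
    return ids

# Made-up rows in the assumed layout.
status_rows = [
    "Accno\tRead Status\t5' Contig\t5' Position\t5' Strand\t3' Contig\t3' Position\t3' Strand",
    "read1\tAssembled\tcontig00001\t1\t+\tcontig00001\t400\t+",
    "read2\tSingleton\t\t\t\t\t\t",
    "read3\tAssembled\tcontig00002\t5\t-\tcontig00002\t310\t-",
]

ids = assembled_read_ids(status_rows)
print("\n".join(ids))
```

Writing one accession per line keeps the output easy to use as an include list.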

Last edited by Jeremy; 02-23-2012 at 09:22 PM.