08-27-2012, 01:26 AM #2
mbayer
Location: Dundee, Scotland

Hi rndouglas,

it sounds like SeqTrimMap is only mapping small RNA reads and reporting everything else as unmapped.

Here is what I would do:

1. Convert the SAM file that SeqTrimMap produced to BAM format.

2. Extract the unmapped reads from this. You can do that with samtools like so:

samtools view -f4 sample.bam > sample.unmapped.sam

3. Convert the reads in the SAM file you just created into FASTQ format. I don't know if there is a tool for this, so you may have to script it yourself.

4. Map those reads with another mapping tool that maps reads regardless of length, e.g. Bowtie or BWA. Convert the output into BAM if it isn't already in that format.

5. Merge the two BAM files with samtools merge (see http://samtools.sourceforge.net/samtools.shtml) using the -r flag. samtools merge expects coordinate-sorted input, so sort both files with samtools sort first if they aren't sorted already. The result is a single BAM file containing both the shorter and the longer reads, and they stay distinguishable because -r makes samtools merge attach a read group (RG) tag to each read, named after the file it came from. This gives you a trail of where each read came from.

6. You can visualise this quite nicely in Tablet (http://bioinf.scri.ac.uk/tablet/). It includes a "colour by read group" colouring scheme, which colours each read according to the read group it belongs to. If you select a contig and then click on the 4th tab in the left-hand panel, you will see a list of the read groups. The check boxes let you switch the colouring for each read group on and off individually, so you can grey out one set and make the other stand out.
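Steps 1 to 5 can be strung together roughly as follows. This is a minimal sketch rather than a tested pipeline. It assumes samtools 1.3 or later (for the `sort -o` syntax) and uses Bowtie 2 as one example of a mapper that handles longer reads (the post mentions Bowtie or BWA). The function names, file names and index name are hypothetical placeholders. The awk one-liner for step 3 assumes unmapped reads are stored in their original orientation, so SEQ and QUAL are written out without reverse-complementing. Current samtools releases also include a `samtools fastq` subcommand that can do this conversion.

```shell
# Step 1: SAM -> BAM. -b writes BAM; -S (input is SAM) is only needed by
# old samtools versions and is ignored by newer ones.
sam_to_bam() {
  samtools view -bS "$1" > "$2"
}

# Step 2: keep only unmapped reads (FLAG bit 4), as in step 2 of the post.
extract_unmapped() {
  samtools view -f4 "$1" > "$2"
}

# Step 3: SAM records -> FASTQ. Header lines (starting with @) are skipped;
# QNAME is column 1, SEQ is column 10 and QUAL is column 11.
# Assumes unmapped reads keep their original orientation (no reverse-complement).
sam_to_fastq() {
  awk -F'\t' '!/^@/ { printf "@%s\n%s\n+\n%s\n", $1, $10, $11 }' "$1"
}

# Step 4: map the leftover reads with Bowtie 2 (index built beforehand with
# bowtie2-build) and pipe its SAM output straight into BAM ("-" = stdin).
map_long_reads() {
  bowtie2 -x "$1" -U "$2" | samtools view -bS - > "$3"
}

# Step 5: samtools merge expects coordinate-sorted input, so sort both files,
# then merge with -r so each read gets an RG tag named after its input file.
sort_and_merge() {
  samtools sort -o "${1%.bam}.sorted.bam" "$1"
  samtools sort -o "${2%.bam}.sorted.bam" "$2"
  samtools merge -r "$3" "${1%.bam}.sorted.bam" "${2%.bam}.sorted.bam"
  # Index the merged file so viewers can jump to regions quickly.
  samtools index "$3"
}

# Example usage (hypothetical file and index names):
#   sam_to_bam sample.sam sample.bam
#   extract_unmapped sample.bam sample.unmapped.sam
#   sam_to_fastq sample.unmapped.sam > sample.unmapped.fastq
#   map_long_reads reference_index sample.unmapped.fastq sample.long.bam
#   sort_and_merge sample.bam sample.long.bam sample.merged.bam
```

Wrapping each step in a small function keeps the file names in one place, and the merged BAM can then be opened in Tablet as described in step 6.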

Hope this helps.

Micha