SEQanswers

01-12-2012, 06:43 PM   #1
MeganS
Member
 
Location: US

Join Date: Sep 2010
Posts: 14
Why retain unmapped reads?

I have heard that it is important for downstream analyses to retain unmapped reads. I am interested to know the reason for this recommendation.

Specifically, I am using BWA + GATK to call SNPs from Illumina data. It is not clear to me if the GATK SNP calling pipeline ever utilizes unmapped reads. We expect a large proportion of unmapped reads, so we could save a lot of disk space by getting rid of them.
01-12-2012, 08:06 PM   #2
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437

You don't *have* to retain unmapped reads if you are calling SNPs. Especially if you are archiving the original FASTQ files, you could remove unmapped reads from the BAMs...
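To make "remove unmapped reads" concrete, here is a minimal sketch in plain Python that works on SAM text lines. It relies only on the SAM FLAG field, where bit 0x4 marks a read as unmapped. The function name `drop_unmapped` and the example records are made up for illustration; this is not part of BWA or GATK.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped


def drop_unmapped(sam_lines):
    """Yield header lines and mapped records; skip records flagged unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            yield line


# Hypothetical records: one header, one mapped read, one unmapped read.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
kept = list(drop_unmapped(example))
```

On real BAM files the same filter is usually applied with `samtools view -b -F 4`. Note that on paired-end data this can leave a mapped read without its mate in the file.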

Last edited by adaptivegenome; 01-12-2012 at 08:07 PM. Reason: typo
01-13-2012, 05:23 AM   #3
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

If you want to call structural variants at some point, you will probably want to keep the unmapped reads as they could cover breakpoints that prevent them from aligning.

However, if you only want to call SNPs and you are guaranteed not to care about calling anything else, then I agree with genericforms.
01-13-2012, 06:18 AM   #4
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

So that others can re-run the data. It tells others what the real source data is. There's also other information in the unmapped reads: often viral or bacterial sequences that may be of interest (e.g. the sample has herpesviruses).

A classic example is paired-end RNA-seq. One read of a pair may not map, but you still need it for paired-end processing; aligners require both mates to be present. Improvements in alignment software for something as tricky as RNA alignment are likely (someday). Another case might be a very wacky indel. Trying to align all the reads to a small area or an alternate genome build using different software might provide insight.
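The paired-end case can be read straight off the SAM FLAG field: bit 0x1 means the read is paired, 0x4 means this read is unmapped, and 0x8 means its mate is unmapped. The sketch below classifies a record from those three bits; the function name `pair_status` and the category labels are hypothetical and chosen only for this example.

```python
PAIRED = 0x1         # read is part of a pair
READ_UNMAPPED = 0x4  # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def pair_status(flag):
    """Describe the mapping state of a read and its mate from a SAM FLAG."""
    if not flag & PAIRED:
        return "single-end"
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both unmapped"
    if read_unmapped:
        return "unmapped, mate mapped"
    if mate_unmapped:
        return "mapped, mate unmapped"
    return "both mapped"


# FLAG 73 = 0x1 + 0x8 + 0x40: a mapped first-in-pair read whose mate is unmapped.
status = pair_status(73)
```

Records in the "mapped, mate unmapped" and "unmapped, mate mapped" groups are the ones that get split apart if unmapped reads are simply deleted.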
01-13-2012, 09:45 AM   #5
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Depends what you want

If there are unmapped reads, either the mapper has made a mistake, the reference has gaps, or the sample is different from the reference in some way that the mapper cannot compensate for. The differences may be structural variants, repeats, paralogues of genes, duplications of regions, etc.

If you want a set of conservative SNPs and you don't care about accessing all variation, then that's fine: you can ignore those problematic parts of the genome.

If you have some phenotype you are investigating, or you want a complete/sensitive set of variants, then you may be concerned about missing SNPs or more complex variants. In that case you want to keep the unmapped reads so you can work with them (count them, or assemble the unmapped reads, or assemble ALL the reads, or use paired ends to detect structural variants, etc.).
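Counting is the cheapest of those options. As a rough sketch, the snippet below tallies mapped and unmapped reads from SAM text lines and reports the unmapped fraction. Secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once. The function names and the example records are assumptions made for illustration.

```python
from collections import Counter

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_tally(sam_lines):
    """Count primary records as 'mapped' or 'unmapped' from SAM text lines."""
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        tally["unmapped" if flag & UNMAPPED else "mapped"] += 1
    return tally


def unmapped_fraction(tally):
    """Fraction of counted reads that are unmapped (0.0 if nothing was counted)."""
    total = tally["mapped"] + tally["unmapped"]
    return tally["unmapped"] / total if total else 0.0


# Hypothetical records: two mapped reads, one secondary alignment, two unmapped reads.
example = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
    "r4\t16\tchr1\t200\t60\t4M\t*\t0\t0\tCCCC\tIIII",
]
tally = mapping_tally(example)
fraction = unmapped_fraction(tally)
```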
01-13-2012, 08:42 PM   #6
aeonsim
Member
 
Location: Belgium

Join Date: Jun 2011
Posts: 45

You can also use the reads to look for potential contamination. Throw them into an assembler and BLAST the bigger contigs you get out. If you see decent-size contigs for a viral or bacterial species, you may want to add a contamination filter step to the beginning of the mapping pipeline and see how that changes your results.
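Most assemblers take FASTQ input, so the first step is getting the unmapped reads back out of the alignment file. Below is a minimal sketch in plain Python that turns unmapped SAM records into FASTQ records. It assumes that a record with the reverse-strand bit (0x10) set stores its sequence reverse-complemented and restores the original orientation. The function name and example records are hypothetical, and the assembly and BLAST steps themselves are not shown.

```python
UNMAPPED = 0x4
REVERSE = 0x10
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines):
    """Yield one FASTQ record (as a string) per unmapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]  # SEQ and QUAL columns
        if not flag & UNMAPPED:
            continue
        if flag & REVERSE:
            # Assumption: restore the as-sequenced orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical records: one mapped read and two unmapped reads.
example = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "r3\t20\t*\t0\t0\t*\t*\t0\t0\tAACG\tABCD",
]
records = list(unmapped_to_fastq(example))
```

The resulting FASTQ text can then be written to a file and handed to whichever assembler you prefer.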