**adaptivegenome** · 01-12-2012, 09:06 PM

You don't *have* to retain unmapped reads if you are calling SNPs, especially if you are archiving the original FASTQ files. In that case you could remove unmapped reads from the BAMs...
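As a rough illustration of what removing unmapped reads means at the record level, here is a minimal Python sketch that works on SAM text rather than binary BAM. It relies only on the SAM convention that FLAG bit 0x4 marks a read as unmapped; the function name `drop_unmapped` and the toy records are made up for this example. In practice this is usually done with an existing tool, for instance `samtools view -b -F 4`.

```python
import io

UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def drop_unmapped(sam_lines):
    """Yield header lines and every record whose FLAG lacks the 0x4 bit."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            yield line


# Toy input (hypothetical records): r1 is mapped, r2 is unmapped.
sam = io.StringIO(
    "@HD\tVN:1.6\tSO:unsorted\n"
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n"
)
kept = list(drop_unmapped(sam))
```

Note that this filter looks at each read on its own, so in paired-end data it can leave a mapped read without its mate.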

**Heisman** · 01-13-2012, 06:23 AM

If you want to call structural variants at some point, you will probably want to keep the unmapped reads as they could cover breakpoints that prevent them from aligning.

However, if you only want to call SNPs and you are guaranteed not to care about calling anything else, then I agree with genericforms.

**Richard Finney** · 01-13-2012, 07:18 AM

So that others can re-run the data. It tells others what the real source data is. There's other information in the unmapped reads: often viral or bacterial sequences that may be of interest (e.g. the sample has herpesviruses).

A classic example is paired-end RNA-seq. One read of a pair may not map, but you still need it to do paired-end processing; aligners require both reads of the pair to be there. Improvements in alignment software for something as tricky as RNA alignment are likely (someday). Another case might be a very wacky indel. Trying to align all the reads to a small area or an alternate genome build using different software might provide insight.
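To make the paired-end point concrete, here is a small Python sketch that classifies a read by its SAM FLAG bits: 0x1 (read is paired), 0x4 (read unmapped) and 0x8 (mate unmapped). The function name `pair_status` and the label strings are invented for illustration. It shows why filtering on 0x4 alone would drop one read of a pair whose mate is mapped.

```python
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the other read of the pair is unmapped


def pair_status(flag):
    """Describe the mapping state of a read and its mate from the SAM FLAG."""
    if not flag & PAIRED:
        return "single-end"
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both unmapped"
    if read_unmapped:
        return "unmapped, mate mapped"
    if mate_unmapped:
        return "mapped, mate unmapped"
    return "both mapped"


# A few commonly seen FLAG values for paired-end data.
for flag in (99, 73, 133, 77):
    print(flag, pair_status(flag))
```

Reads reported as "unmapped, mate mapped" are the ones a per-read filter would remove while leaving their partner behind.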

**Zam** · 01-13-2012, 10:45 AM

Depends what you want

If there are unmapped reads, either the mapper has made a mistake, the reference has gaps, or the sample is different from the reference in some way that the mapper cannot compensate for. The differences may be structural variants, repeats, paralogues of genes, duplications of regions, etc.

If you want a set of conservative SNPs and you don't care about accessing all variation, then that's fine; you don't care about those problematic parts of the genome.

If you have some phenotype you are investigating, or you want a complete/sensitive set of variants, then you may be concerned about missing SNPs or more complex variants. In that case you want to keep the unmapped reads to do stuff with them (count them, or assemble the unmapped reads, or assemble ALL the reads, or use paired-ends to detect structural variants, etc.).

**aeonsim** · 01-13-2012, 09:42 PM

You can also use the reads to look for potential contamination. Throw them into an assembler and BLAST the bigger contigs you get out. If you see decent-sized contigs for a viral or bacterial species, you may want to add a contamination filter step to the beginning of the mapping pipeline and see how that changes your results.
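As a sketch of the first step in that workflow, the Python below pulls unmapped records (FLAG bit 0x4) out of SAM text and writes them as FASTQ entries that could be handed to an assembler. The function name `unmapped_to_fastq` and the toy records are hypothetical. It skips records with no stored sequence or quality and does not label mates as /1 and /2, so treat it as a simplification; an existing tool such as `samtools fastq -f 4` is the more likely choice in a real pipeline.

```python
import io

UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def unmapped_to_fastq(sam_lines):
    """Yield one FASTQ entry per unmapped SAM record that has SEQ and QUAL."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        # Simplification: ignore records whose sequence or quality is "*".
        if flag & UNMAPPED and seq != "*" and qual != "*":
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Toy input (hypothetical records): only r2 is unmapped.
sam = io.StringIO(
    "@HD\tVN:1.6\tSO:unsorted\n"
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n"
)
fastq_entries = list(unmapped_to_fastq(sam))
```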

why retain unmapped reads?

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News