SEQanswers > Bioinformatics

themysticgeek 01-07-2014 11:07 PM

How to handle Ns in the middle of reads
For my Illumina data, FastQC shows N's at positions 13, 14 and 15 in 101 bp reads. If I crop the first 15 bases with Trimmomatic, that solves the problem, but I lose a lot of data. If I retain the N's, what sort of problems would they cause during alignment (BWA + Stampy) and variant calling (UnifiedGenotyper), and how can I handle those problems?

If anybody has faced a similar problem, how did you handle it? Similar questions have been asked on other forums, but none were answered :(. I could not find a resource on how variant calling programs handle N's. Do they ignore them, or do they treat them as variants with low confidence scores?

GenoMax 01-08-2014 03:04 AM

Are there N's at those positions in *all* reads? That would almost certainly indicate a technical problem of some kind with this run. In general your sequence provider should not have released this data if that is the case.

themysticgeek 01-08-2014 04:42 AM

The N's are in ~50% of the reads. I have attached the FastQC image for per-base N content. This is particular to this sequencing run; I did not observe the problem in the other runs :(
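
For anyone who wants the numbers behind the FastQC per-base N plot, here is a minimal Python sketch that tallies the fraction of N's at each read position and the fraction of reads containing at least one N. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name `n_fraction_per_position` and the two example reads are made up for illustration and are not from the run discussed here.

```python
import io
from collections import Counter


def n_fraction_per_position(fastq_handle):
    """Return ({1-based position: fraction of N}, fraction of reads with any N).

    Assumes each record is exactly four lines (header, sequence, '+', qualities).
    """
    n_counts = Counter()   # number of N calls seen at each position
    depth = Counter()      # number of reads covering each position
    reads = 0
    reads_with_n = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:     # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        reads += 1
        if "N" in seq:
            reads_with_n += 1
        for pos, base in enumerate(seq, start=1):
            depth[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    per_pos = {pos: n_counts[pos] / depth[pos] for pos in sorted(depth)}
    frac_reads = reads_with_n / reads if reads else 0.0
    return per_pos, frac_reads


# Two invented reads: the first has N's at positions 5-7, the second has none.
example = io.StringIO(
    "@r1\nACGTNNNACGT\n+\nIIII###IIII\n"
    "@r2\nACGTACGACGT\n+\nIIIIIIIIIII\n"
)
per_pos, frac_reads = n_fraction_per_position(example)
print(per_pos[5], frac_reads)
```

On a real file you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the `io.StringIO` example.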

benjaminsb 01-20-2014 04:25 AM

Similar problem
I've just run FastQC on a published RNA-seq dataset (SRX294957) and I see a very similar pattern: at positions 21-23, 80% of reads have N's. As I'm only interested in expression, I could accept the low read quality as long as the aligner accepts it. I'm using RSEM + bowtie for this, so I'm wondering whether bowtie will match NNN against anything in the reference?

dpryan 01-20-2014 04:27 AM

It will. You can alter the mismatch score due to an N and the maximum number of allowed Ns if you need to (you'll likely need to tweak the minimum allowable score if you do so).
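
To make the trade-off concrete, here is a rough sketch of the arithmetic, assuming the settings meant are Bowtie 2's `--np` (penalty per ambiguous base), `--n-ceil` (maximum N's allowed, as a function of read length) and `--score-min` (minimum alignment score), with their documented end-to-end defaults of `1`, `L,0,0.15` and `L,-0.6,-0.6`. That mapping is an assumption: the thread only says "bowtie", and the original bowtie uses a different mismatch model, so check the manual for the version RSEM actually calls. The helper names `linear` and `n_budget` are hypothetical.

```python
def linear(const, coeff, read_len):
    """Evaluate a Bowtie 2 style 'L,const,coeff' function: const + coeff * read_len."""
    return const + coeff * read_len


def n_budget(read_len, n_in_read, np_penalty=1,
             n_ceil=(0.0, 0.15), score_min=(-0.6, -0.6)):
    """Estimate how N's eat into the end-to-end alignment score budget.

    Defaults mirror Bowtie 2's documented end-to-end defaults (assumed here).
    """
    ceil = linear(n_ceil[0], n_ceil[1], read_len)
    min_score = linear(score_min[0], score_min[1], read_len)
    n_cost = np_penalty * n_in_read
    return {
        "n_ceil": ceil,                       # reads with more N's are filtered
        "passes_n_ceil": n_in_read <= ceil,
        "min_score": min_score,               # best possible score is 0
        "remaining_penalty_budget": -min_score - n_cost,
    }


# 101 bp read with three N's, as in the first post.
print(n_budget(101, 3))
```

Under those assumed defaults, a 101 bp read may contain up to about 15 N's, the minimum score is about -61.2, and three N's use 3 of that budget, leaving roughly 58.2 for real mismatches and gaps. Raising the N penalty shrinks that remainder, which is why the minimum score may need adjusting as well.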

All times are GMT -8.
