N character in Illumina fastq format

balamudiam

Junior Member

Join Date: Oct 2009

Posts: 4
#1

12-13-2009, 10:40 AM

Hi everyone,

I downloaded Illumina data from the NCBI Short Read Archive (SRA). I used the script at http://maq.sourceforge.net/fq_all2std.pl to convert the data to FASTA format. The converted FASTA data contains N characters in addition to the regular DNA characters A, C, G and T.

My assembler accepts only A, C, G and T. Is the conversion correct? Is there an additional script I can run to get rid of the N characters?

Thanks
Bala
simonandrews

Simon Andrews

Join Date: May 2009

Posts: 871
#2

12-14-2009, 12:53 AM

You can certainly get N in sequences from the Illumina pipeline. If your assembler can't handle those then you'll either need to trim your sequences at the first N, convert all Ns to an arbitrary normal base, or just discard the sequences which contain Ns. I don't know if there are scripts out there already which will do this, but any of those options should be pretty easy to implement.
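As a rough illustration of those three options, here is a minimal Python sketch that works on FASTA records. Everything in it is an assumption made for the example, not an existing tool: the function names (`read_fasta`, `trim_at_first_n`, `replace_n`, `clean`), the choice of `A` as the arbitrary replacement base, and the `min_len` cutoff that drops reads left too short after trimming.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_at_first_n(seq: str) -> str:
    """Option 1: keep only the bases before the first N."""
    pos = seq.upper().find("N")
    return seq if pos == -1 else seq[:pos]


def replace_n(seq: str, base: str = "A") -> str:
    """Option 2: swap every N for an arbitrary base (assumed 'A' here)."""
    return seq.replace("N", base.upper()).replace("n", base.lower())


def clean(records: Iterable[Record], mode: str = "discard",
          min_len: int = 1) -> Iterator[Record]:
    """Apply one strategy: 'trim', 'replace', or 'discard' (option 3)."""
    for header, seq in records:
        if mode == "discard":
            if "N" in seq.upper():
                continue
        elif mode == "trim":
            seq = trim_at_first_n(seq)
        elif mode == "replace":
            seq = replace_n(seq)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Skip reads that became too short (e.g. N at the first position).
        if len(seq) >= min_len:
            yield header, seq


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python clean_n.py trim < reads.fa > reads.clean.fa
    mode = sys.argv[1] if len(sys.argv) > 1 else "discard"
    for header, seq in clean(read_fasta(sys.stdin), mode):
        print(f">{header}\n{seq}")
```

Which option is appropriate depends on the assembler and the data: replacing Ns introduces bases that were never called, while trimming or discarding loses some coverage.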