10-03-2015, 09:11 AM   #1
gauravdube
Junior Member
 
Location: India

Join Date: Feb 2014
Posts: 7
Non-ATGC characters, lowercase characters and lots of 'N's in FASTQ file's sequence

Hi All,

My FASTQ file contains a lot of non-ATGC characters. What are these characters, and how should I handle them?

Commands used:
bwa index ref.fa
bwa aln -t 9 ref.fa D2_R2.fastq -f D2_R2.sai && bwa aln -t 9 ref.fa D2_R1.fastq -f D2_R1.sai
bwa sampe ref.fa D2_R1.sai D2_R2.sai D2_R1.fq D2_R2.fq > D2-aln-pe2.sam
samtools faidx ref.fa
samtools view -bt ref.fa.fai D2-aln-pe2.sam > D2-aln-pe2.bam
samtools sort D2-aln-pe2.bam D2-aln-pe2.bam.srt
samtools index D2-aln-pe2.bam.srt.bam
samtools mpileup -uf ref.fa D2-aln-pe2.bam.srt.bam | bcftools view -cg - | vcfutils.pl vcf2fq > CONSENSUS.fq


The CONSENSUS.fq file looks like this:
@scaffold_1
nnngtttggtggtagtattggtatttcaaacacgctaggtgtttgttggttttgagtagg
tgtagctggagtagactctatctccatttctctatcagtttgggcctctggccctaggct
ctcctgtctgttttcttgagtatttactacaatagtatcactgtctggcggcattttatt
actaagctcttttcttagtaagcaactagatggtctgtgtgtttttgttttcgtgagtga
gacgtgttcagattagctactttaccagcttctagctctatagcgcgtgggctgcacgag
ttggcactagttgtaatcgatttcttgggatggatttgtatataattcgctaaaattaca
cctattctgaaaaactcgnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnTAATGTTACAAGTAAYAAGAAGGATYCTYTCCTTRACAAATRACGAGATGGC
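
To put numbers on what the excerpt above contains, here is a minimal sketch that tallies the character classes in sequence text. It assumes the uppercase non-ACGT letters (such as Y and R) are IUPAC ambiguity codes, where R stands for A or G and Y for C or T. The function name `count_classes` is hypothetical, not part of bwa or samtools, and it expects only sequence lines on stdin (no `@` header or quality lines).

```shell
# count_classes: hypothetical helper, not part of bwa/samtools.
# Reads sequence lines on stdin and prints one line of counts per class.
# Assumption: letters in RYSWKMBDHV are IUPAC ambiguity codes.
count_classes() {
  # Join all input lines into one string.
  seq=$(tr -d '\n')
  # tr -cd keeps only the listed characters; wc -c counts them.
  # $(( ... )) strips the padding some wc implementations add.
  upper=$(( $(printf '%s' "$seq" | tr -cd 'ACGT' | wc -c) ))
  lower=$(( $(printf '%s' "$seq" | tr -cd 'acgt' | wc -c) ))
  n=$(( $(printf '%s' "$seq" | tr -cd 'Nn' | wc -c) ))
  ambig=$(( $(printf '%s' "$seq" | tr -cd 'RYSWKMBDHVryswkmbdhv' | wc -c) ))
  echo "ACGT=$upper acgt=$lower N=$n IUPAC=$ambig"
}
```

For example, `printf 'nnnngtttggTAAYAAG\n' | count_classes` would report each class separately, which makes it easier to see how much of a scaffold is uncalled ('n'), lowercase, or ambiguous.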

Could you also explain how to handle the lowercase characters and the 'N's?

Thanks in advance.