SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Samtools "is recognized as '*'" "truncated file" error axiom7 Bioinformatics 3 11-26-2014 02:53 AM
"refine gapped alignments... Segmentation fault" while using bwa sampe dd_genome Bioinformatics 3 01-03-2013 10:42 AM
strategy to convert low GQ calls to "no call" manducasexta Bioinformatics 0 08-27-2012 08:49 AM
Specifying the region with "samtools mpileup -r " changes the output calls vplagnol Bioinformatics 0 06-07-2012 04:26 AM
How to call "gaps" from spliced read alignment? sulicon Bioinformatics 3 06-09-2011 08:28 AM

Reply
 
Thread Tools
Old 01-04-2013, 08:59 AM   #1
TabeaK
Member
 
Location: Germany

Join Date: Oct 2012
Posts: 48
Default Samtools mpileup - why does it call an "N" when there ARE correct alignments?

Hi all!

Happy 2013!

I am having a bit of an issue with generating consensus sequences from an Alignment BAM file.

The alignment was done with Stampy. It is a mix of single end and paired end data. Looking at the BAM in IGV shows me I have good coverage across the board and reasonable mapping qualites.

I then do the routine samtools mpileup (using -u -f -r paramters; also tried the -A); then bcftools view (- c -g ); followed by conversion to fastq and ultimately fasta.

In my consensus fasta however; I have an inordinate amount of Ns; even in positions; where I can clearly see correct alignment/mapping.

Does anyone know when mpileup is calling an N and what I might be doing wrong?

Funnily enough; I have aligned the exact same data with BWA and done the same process; even though BWA maps less reads properly; with lower mapping quality and coverage; calling the consensus only spits out Ns if there is either no coverage at a given position or there is already an N in the reference.

So, why is this not working from my stampy-generated BAM?
Argh!

Any other recommendations for tools to generate consensus sequences?
TabeaK is offline   Reply With Quote
Old 01-07-2013, 12:08 AM   #2
TabeaK
Member
 
Location: Germany

Join Date: Oct 2012
Posts: 48
Default

Bumpety-bump.
TabeaK is offline   Reply With Quote
Old 01-07-2013, 06:04 AM   #3
TabeaK
Member
 
Location: Germany

Join Date: Oct 2012
Posts: 48
Default

Ha! I think this is the problem: http://seqanswers.com/forums/showthread.php?t=17486

I also get only 'N' in the ref column...no I just need to figure out what the hell is going on.
TabeaK is offline   Reply With Quote
Old 01-07-2013, 08:33 AM   #4
TabeaK
Member
 
Location: Germany

Join Date: Oct 2012
Posts: 48
Default

Okay, problem solved...

This little thing:

11.2 Parsing NCBI fasta files

NCBI fasta files use a >gi|nnn|ref|xxx identifier. By default
Stampy parses this and uses only the "xxx" part. Use --noparseNCBI
to switch off this behaviour and use the full NCBI identifier.


from the stampy manual broke my neck. In the BAM file created with Stampy; the identifiers were parse. While creating the fasta index; however samtools; DOES NOT parse. Ergo the identifers did not match.

I had to do some rather elaborate messing around with the reference fasta to get the strings to match...*sigh*
All good in the end....just in case anyone runs into this problem in the future.
TabeaK is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:55 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO