SEQanswers

Old 06-17-2011, 08:05 AM   #1
Firebird
Member
 
Location: Germany

Join Date: Jun 2010
Posts: 18
N in pileup file

Hi,

I have sequence information generated on an Illumina sequencer. The alignment was done with ELAND and the output is a .bam file.

Now I wanted to generate a pileup file with the mpileup command from samtools. I used this command:
samtools mpileup -f REFERENCEFILE.fa SEQUENCES.bam > pileup.tab

The problem is that I don't get a reference base, only N where I expect A, T, G or C.

I also ran the sort command beforehand.
Maybe something went wrong with faidx, because I only have 1 line in the .fai file.

Does anyone have an answer for that?

Thanks a lot.
Old 06-17-2011, 09:49 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

You have to have a proper faidx file for pileup to recognize the equivalence between the reference sequence and the references named in the .bam file. The faidx file is supposed to be rather short, but if it's broken, that would explain what you are seeing.

Sometimes, I make reference files where there aren't line breaks in the middle of the sequence, but samtools faidx won't tolerate this. So if you did this, remake the reference file so there's a line break every 60 or 80 bases, or whatever, and rerun the faidx command.
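
For anyone who wants to try the re-wrapping step, here is a minimal Python sketch. The function name `rewrap_fasta` and the default width of 60 are my own choices for illustration, not anything samtools prescribes; it simply rewrites every record so that each sequence line has the same number of bases.

```python
def rewrap_fasta(text, width=60):
    """Return FASTA text with every sequence re-wrapped to `width` bases per line.

    Hypothetical helper for illustration; header lines are kept unchanged.
    """
    out = []
    seq_parts = []

    def flush():
        # Emit the sequence collected so far in fixed-width chunks.
        if seq_parts:
            seq = "".join(seq_parts)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
            seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(rewrap_fasta(">a\nACGTACGTAC\n", width=4), end="")
```

After rewriting the reference this way, the old .fai no longer matches the file, so samtools faidx has to be rerun on the new file.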

The second thing to check is if the reference names in the .bam really match the reference names in your reference fasta. Spaces, or special characters may be treated differently between your aligner and samtools, so fixing the names might help.
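
A rough way to do that name check in Python, as a sketch only: it assumes you have saved the BAM header as text (for example from `samtools view -H`) and that the reference name is the header text after '>' up to the first whitespace, which is how I understand samtools to read it. The function names are made up for this example.

```python
def fasta_names(fasta_text):
    """Record names: text after '>' up to the first whitespace (assumed convention)."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def sam_header_names(header_text):
    """Reference names taken from the SN: tag of each @SQ header line."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def compare_names(fasta_text, header_text):
    """Return (names only in the BAM header, names only in the FASTA)."""
    fa = set(fasta_names(fasta_text))
    bam = set(sam_header_names(header_text))
    return sorted(bam - fa), sorted(fa - bam)


if __name__ == "__main__":
    fasta = ">chr1 some description\nACGT\n"
    header = "@HD\tVN:1.0\n@SQ\tSN:chr1.fa\tLN:4\n"
    print(compare_names(fasta, header))
```

If both lists come back empty, the names agree and the problem is somewhere else.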
Old 06-20-2011, 01:32 AM   #3
Firebird
Member
 
Location: Germany

Join Date: Jun 2010
Posts: 18

Hi,

thanks for the ideas.

I used the "view" command to have a look at the .bam file. It says that the reads were aligned against a file called "chr1.fa". So I renamed my reference file to chr1.fa and changed the fasta header to that name as well. Afterwards I ran faidx again.
Unfortunately it didn't work and I still get N.
Old 06-20-2011, 12:16 PM   #4
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

It's not about the name of the files, it's about the name in the .sam file, and the name of each sequence in the reference multi-fasta.

You need the text after the '>' to match the text in column 3 of the .sam file.
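
Here is a small Python sketch of that comparison, working on the alignment records themselves rather than on file names. It is only an illustration: the function names are invented, unmapped reads (column 3 set to `*`) are skipped, and the FASTA name is taken as the text after '>' up to the first whitespace, which is an assumption about how samtools reads it.

```python
def rnames_in_sam(sam_text):
    """Collect the distinct values of column 3 (RNAME) from SAM alignment lines."""
    names = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] != "*":
            names.add(fields[2])
    return names


def fasta_header_names(fasta_text):
    """Names from FASTA headers: text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def unmatched_rnames(sam_text, fasta_text):
    """Reference names used in the SAM that have no FASTA record."""
    return sorted(rnames_in_sam(sam_text) - fasta_header_names(fasta_text))


if __name__ == "__main__":
    sam = (
        "r1\t0\t1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r3\t0\tchr1.fa\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
    )
    print(unmatched_rnames(sam, ">1\nACGT\n>2\nACGT\n"))
```

Any name this prints is one that mpileup cannot look up in the reference.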
Old 07-19-2011, 08:25 AM   #5
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21

I am having this N problem at the moment too, and it is driving me nuts. I have simplified the headers in my fasta file to plain numbers from 1 to over 1 million (each record is only 64 bases long). I then built a bowtie index and used bowtie to map short reads back to these 'consensus tags'. I converted the resulting sam file to bam and sorted it. Running mpileup then calls SNPs at every base position because the reference is treated as N.

When I index my fasta file with faidx, I get a .fai file. It is quite small and doesn't contain any actual sequence data, just the following:

1 64 3 64 65
2 64 71 64 65
3 64 139 64 65
4 64 207 64 65
5 64 275 64 65
............................................

Anyone any ideas?
Old 07-19-2011, 09:14 AM   #6
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

That's a perfectly normal faidx file, if your chromosomes are named 1,2,3,4, and 5, and each is 64 letters long.
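
To spell out why those lines look right: as far as I know the five .fai columns are name, sequence length, byte offset of the first base, bases per line, and bytes per line (bases plus the newline). The Python sketch below parses the lines posted above and checks that the offsets are consistent. It assumes each header line is exactly '>' plus the name, and the names `FaiEntry`, `parse_fai`, `sequence_bytes` and `offsets_consistent` are just illustrative.

```python
from collections import namedtuple

FaiEntry = namedtuple("FaiEntry", "name length offset linebases linewidth")


def parse_fai(text):
    """Parse .fai lines; splits on any whitespace so pasted forum text works too."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, offset, linebases, linewidth = line.split()
        entries.append(
            FaiEntry(name, int(length), int(offset), int(linebases), int(linewidth))
        )
    return entries


def sequence_bytes(e):
    """Bytes occupied by the sequence lines of one record, newlines included."""
    full, rem = divmod(e.length, e.linebases)
    newline = e.linewidth - e.linebases
    return full * e.linewidth + (rem + newline if rem else 0)


def offsets_consistent(entries):
    """Check each offset against the previous record (assumes header is '>' + name)."""
    for prev, cur in zip(entries, entries[1:]):
        header = len(">" + cur.name) + (prev.linewidth - prev.linebases)
        if prev.offset + sequence_bytes(prev) + header != cur.offset:
            return False
    return True


if __name__ == "__main__":
    posted = "1 64 3 64 65\n2 64 71 64 65\n3 64 139 64 65\n4 64 207 64 65\n5 64 275 64 65\n"
    print(offsets_consistent(parse_fai(posted)))
```

For the first record, the sequence starts at byte 3 (after ">1" and a newline) and takes 65 bytes, and the next header takes 3 more, which gives the 71 in the second line.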
Old 07-19-2011, 10:28 AM   #7
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21

OK, I can generate a .fai file from the fasta file without problems. Any ideas why my pileup is basically treating my reference as all Ns and therefore calling SNPs at every position?

Cheers
Old 07-19-2011, 11:25 AM   #8
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Generally, it means there is a mismatch between what your .bam file says each reference sequence is named, and what your reference fasta says each reference contig is named. So check first to make sure that you really are using the same reference file in the mpileup that you used in aligning.

Second, I'd try simplifying the names of the reference sequences. Maybe there are spaces, or special characters in the names of your reference sequences, and your aligner handles that differently than mpileup.
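
If you want to find the headers worth simplifying, something like this Python sketch would list them. The "safe" character set (letters, digits, underscore, dot, hyphen) is my own conservative assumption, not a rule from samtools or any aligner, and `risky_headers` is a made-up name.

```python
import re

# Assumed conservative whitelist; adjust to taste.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def risky_headers(fasta_text):
    """Return FASTA headers (without '>') containing spaces or unusual characters."""
    risky = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:]
            if not SAFE_NAME.fullmatch(header):
                risky.append(header)
    return risky


if __name__ == "__main__":
    print(risky_headers(">1\nACGT\n>chr 2\nACGT\n>contig|3\nAC\n"))
```

Headers that show up in the list are candidates for renaming before rebuilding the aligner index and rerunning faidx.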
Old 07-19-2011, 11:55 AM   #9
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21

Thanks for the suggestions.

This is what I initially thought the problem was after searching around on this forum, so I renamed the records in my fasta reference with a simple numeric naming scheme. This is the file I used to build my index, and after mapping with bowtie I checked the 3rd column of the sam file and it has the same simple numbering scheme.

I guess the fact that each fasta record in my reference file is 64bp and doesn't even fill a single line is hardly an issue.
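
To convince myself that short single-line records are addressable, here is a Python sketch of how a base can be located from the .fai fields alone. This is my understanding of the index layout, not code from samtools: the byte position of base `pos` (0-based) is offset + (pos // linebases) * linewidth + pos % linebases. The function name `fetch` and the toy data are made up.

```python
import io


def fetch(fasta_bytes, offset, linebases, linewidth, start, end):
    """Return bases [start, end) (0-based) of one record using its .fai fields."""
    handle = io.BytesIO(fasta_bytes)
    bases = []
    for pos in range(start, end):
        # Skip whole lines (including their newline), then move within the line.
        handle.seek(offset + (pos // linebases) * linewidth + pos % linebases)
        bases.append(handle.read(1))
    return b"".join(bases).decode("ascii")


if __name__ == "__main__":
    # Two toy 64 bp records, each on a single line, like the reference described above.
    fasta = b">1\n" + b"A" * 64 + b"\n>2\n" + b"C" * 32 + b"G" * 32 + b"\n"
    # .fai fields for record "2" would be: length 64, offset 71, 64 bases, 65 bytes per line.
    print(fetch(fasta, 71, 64, 65, 30, 34))
```

With records laid out like this the lookup works, which suggests the record length by itself is not what produces the Ns.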
Old 08-04-2013, 11:29 PM   #10
gaoch
Junior Member
 
Location: china

Join Date: Aug 2013
Posts: 4
Just use the latest version of samtools

Quote:
Originally Posted by Firebird
The problem is that I don't get a reference base, only N where I expect A, T, G or C.

I ran into the same problem. After I updated samtools, everything worked fine.
All times are GMT -8.

