
**swbarnes2** · 06-17-2011, 08:49 AM

You have to have a proper faidx file for pileup to recognize the equivalence between the reference sequence and the references named in the .bam file. The faidx file is supposed to be rather short, but if it's broken, that would explain what you are seeing.

Sometimes, I make reference files where there aren't line breaks in the middle of the sequence, but samtools faidx won't tolerate this. So if you did this, remake the reference file so there's a line break every 60 or 80 bases, or whatever, and rerun the faidx command.
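The re-wrapping step described above could be sketched roughly as follows. This is not a samtools feature: `rewrap_fasta` and the default width of 60 are names and values assumed here purely for illustration. The underlying constraint, as far as the faidx documentation goes, is that every line of a sequence except the last must have the same length.

```python
def rewrap_fasta(text, width=60):
    """Return FASTA text with every sequence re-wrapped to `width` columns.

    Hypothetical helper for illustration; header lines are kept unchanged.
    """
    out = []
    seq_parts = []

    def flush():
        # Join the pieces collected for the current record and re-split them.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            out.append(line)
        else:
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">contig1\n" + "ACGT" * 40 + "\n"
    print(rewrap_fasta(example, width=60), end="")
```

After writing the re-wrapped text to a new file, the faidx command would need to be rerun on that file so the index matches it.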

The second thing to check is whether the reference names in the .bam really match the reference names in your reference fasta. Spaces or special characters may be treated differently by your aligner and by samtools, so fixing the names might help.

**Firebird** · 06-20-2011, 12:32 AM

Hi,

thanks for the ideas.

I used the "view" command to have a look at the .bam file. It says that it was aligned against a file called "chr1.fa". Therefore I renamed my reference file to chr1.fa and also changed the fasta header to that name. Afterwards I ran faidx again.
But unfortunately it didn't work and I still get N.

**swbarnes2** · 06-20-2011, 11:16 AM

It's not about the name of the files, it's about the name in the .sam file, and the name of each sequence in the reference multi-fasta.

You need the text after the '>' to match the text in column 3 of the .sam file.
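A minimal sketch of that check, assuming the .bam has been dumped to SAM text (for example with the "view" command) and that the fasta is small enough to read into memory. The function names are made up for illustration. It also assumes that the name is the header text up to the first whitespace, which is how faidx names sequences as far as I know.

```python
def fasta_names(fasta_text):
    """Names as they appear after '>' (up to the first whitespace)."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0] if tokens else "")
    return names


def sam_reference_names(sam_text):
    """Distinct values of column 3 (RNAME) over all alignment lines."""
    names = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":  # '*' marks unmapped reads
            names.add(fields[2])
    return names


def unmatched_references(fasta_text, sam_text):
    """Reference names used in the SAM that have no matching fasta record."""
    return sorted(sam_reference_names(sam_text) - set(fasta_names(fasta_text)))


if __name__ == "__main__":
    fasta = ">chr1\nACGTACGT\n"
    sam = "read1\t0\tchr1.fa\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
    print(unmatched_references(fasta, sam))
```

An empty list means every name in column 3 has a fasta record with exactly the same name; anything printed is a name that pileup would not be able to look up.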

**SLB** · 07-19-2011, 07:25 AM

I am having this N problem also at the moment and it is driving me nuts. I have simplified the headers in my fasta file to just numbers from 1 to over 1 million (each record is only 64 bases in length). I then created a bowtie index and mapped short reads back to these 'consensus tags' with bowtie. I converted the generated sam file to bam and sorted it. Using mpileup results in SNPs called at every base position because the reference is being treated as N.

When I try to create an indexed file from my fasta using faidx I get a .fai file. It is quite small and doesn't contain any actual sequence data, just the following:

1 64 3 64 65
2 64 71 64 65
3 64 139 64 65
4 64 207 64 65
5 64 275 64 65
............................................

Anyone any ideas?

**swbarnes2** · 07-19-2011, 08:14 AM

That's a perfectly normal faidx file, if your chromosomes are named 1,2,3,4, and 5, and each is 64 letters long.
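To make the five columns concrete (name, sequence length, byte offset of the first base, bases per line, bytes per line including the newline), here is a rough re-implementation sketch. It is not samtools' own code, `build_fai` is a name invented for this example, and it assumes well-formed input with consistent line lengths.

```python
def build_fai(fasta_bytes):
    """Compute .fai-style entries from raw FASTA bytes.

    Returns tuples of (name, length, offset, linebases, linewidth).
    Illustrative sketch only; no validation of line lengths is done.
    """
    entries = []
    current = None
    offset = 0  # byte position of the start of the current line
    for raw in fasta_bytes.splitlines(keepends=True):
        body = raw.rstrip(b"\r\n")
        if raw.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            tokens = body[1:].split()
            name = tokens[0].decode() if tokens else ""
            # The sequence starts right after the header line.
            current = [name, 0, offset + len(raw), 0, 0]
        elif current is not None and body:
            if current[3] == 0:
                current[3] = len(body)  # bases on the first sequence line
                current[4] = len(raw)   # same line including its newline
            current[1] += len(body)
        offset += len(raw)
    if current is not None:
        entries.append(tuple(current))
    return entries


if __name__ == "__main__":
    # Five 64-base records named 1..5, one sequence line each.
    fasta = "".join(f">{i}\n{'A' * 64}\n" for i in range(1, 6)).encode()
    for entry in build_fai(fasta):
        print("\t".join(str(field) for field in entry))
```

For five 64-base records named 1 to 5 this gives the same numbers as the listing above: each header line takes 3 bytes and each sequence line 65, so the offsets run 3, 71, 139, 207, 275.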

**SLB** · 07-19-2011, 09:28 AM

OK, so I can generate a .fai file from the fasta file without trouble. Any ideas why my pileup is treating my reference as all Ns and therefore calling SNPs at every position?

Cheers

**swbarnes2** · 07-19-2011, 10:25 AM

Generally, it means there is a mismatch between what your .bam file says each reference sequence is named, and what your reference fasta says each reference contig is named. So check first to make sure that you really are using the same reference file in the mpileup that you used in aligning.

Second, I'd try simplifying the names of the reference sequences. Maybe there are spaces or special characters in the names of your reference sequences, and your aligner handles them differently than mpileup does.
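One way to test the "same reference file" point, sketched under some assumptions: the header text comes from `samtools view -H` on the .bam, the .fai text comes from running faidx on the fasta passed to mpileup, and the helper names below are invented for the example. It compares the @SQ names and lengths in the header with the names and lengths in the index.

```python
def sq_lengths(sam_header_text):
    """Map SN (name) to LN (length) for every @SQ line in a SAM header."""
    lengths = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def fai_lengths(fai_text):
    """Map sequence name to length from the first two .fai columns."""
    lengths = {}
    for line in fai_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def compare_references(sam_header_text, fai_text):
    """Return (names missing from the .fai, names whose lengths differ)."""
    bam = sq_lengths(sam_header_text)
    ref = fai_lengths(fai_text)
    missing = sorted(name for name in bam if name not in ref)
    wrong_length = sorted(
        name for name in bam if name in ref and bam[name] != ref[name]
    )
    return missing, wrong_length


if __name__ == "__main__":
    header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:64\n@SQ\tSN:2\tLN:64\n"
    fai = "1\t64\t3\t64\t65\n2\t64\t71\t64\t65\n"
    print(compare_references(header, fai))
```

Two empty lists suggest the names and lengths agree; a name in the first list points to a naming mismatch, and a name in the second list suggests a different reference file was used for the alignment.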

**SLB** · 07-19-2011, 10:55 AM

Thanks for the suggestions.

This is what I initially thought the problem was after searching around on this forum, so I renamed the records in my fasta reference with a simple numeric naming scheme. This is the file I used to build my index, and after mapping with bowtie and checking the 3rd column in the sam file I see the same simple numbering scheme.

I guess the fact that each fasta record in my reference file is 64bp and doesn't even fill a single line is hardly an issue.

**gaoch** · 08-04-2013, 10:29 PM

Just use the latest version of samtools.

Originally posted by Firebird:

> Hi,
>
> I have sequence information generated on an Illumina sequencer. The alignment was done with ELAND and the output is a .bam file.
>
> Now I wanted to generate a pileup file with the mpileup feature from samtools. I used this command to do that:
> samtools mpileup -f REFERENCEFILE.fa SEQUENCES.bam > pileup.tab
>
> The problem is that I don't get a reference base, but only N where I expect A, T, G or C.
>
> I also used the sort command before.
> Maybe something was wrong with the faidx, because I only have 1 line in the .fai file.
>
> Does anyone have an answer for that?
>
> Thanks a lot.

I had the same problem; after I updated samtools, everything worked.

N in pileup file
