#1 |
Member
Location: Germany Join Date: Jun 2010
Posts: 18
Hi,
I have sequence data generated on an Illumina sequencer. The alignment was done with ELAND and the output is a .bam file. Now I want to generate a pileup file with the mpileup feature of samtools. I used this command:
samtools mpileup -f REFERENCEFILE.fa SEQUENCES.bam > pileup.tab
The problem is that I don't get a reference base, only N where I expect A, T, G or C. I also ran the sort command beforehand. Maybe something went wrong with faidx, because I only have 1 line in the .fai file. Does anyone have an answer for that? Thanks a lot.
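To put a number on the symptom described in this post, here is a minimal Python sketch that tallies the reference base (column 3) in pileup-format text. The function name `reference_base_counts` and the example lines are made up for illustration; the only thing assumed about the format is that columns are tab-separated with the reference base in the third column.

```python
from collections import Counter

def reference_base_counts(pileup_lines):
    """Count the reference bases (column 3) seen in pileup-format lines."""
    counts = Counter()
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2].upper()] += 1
    return counts

# Made-up lines: chromosome, position, reference base, depth, read bases, qualities
example = [
    "chr1\t100\tN\t3\tAAA\tIII\n",
    "chr1\t101\tN\t2\tCC\tII\n",
]
print(reference_base_counts(example))
```

If nearly every position comes back as N, the reference is probably not being looked up at all, rather than the genome genuinely containing N at those positions.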
#2 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
You need a proper faidx index for pileup to match the reference sequence to the reference names in the .bam file. The .fai file is supposed to be rather short, but if it's broken, that would explain what you are seeing.
Sometimes I make reference files with no line breaks in the middle of the sequence, but samtools faidx won't tolerate this. If you did that, remake the reference file so there's a line break every 60 or 80 bases, or whatever, and rerun the faidx command. The second thing to check is whether the reference names in the .bam really match the reference names in your reference FASTA. Spaces or special characters may be treated differently by your aligner and by samtools, so fixing the names might help.
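Re-wrapping the reference as suggested here could look roughly like the Python sketch below. `rewrap_fasta` is a hypothetical helper, not part of samtools, and the default width of 60 is just one of the values mentioned in the post.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence wrapped at `width` bases."""
    chunks = []  # sequence pieces collected for the current record

    def flush():
        # Join the collected pieces and emit them as fixed-width lines.
        seq = "".join(chunks)
        chunks.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            yield line          # header lines are passed through unchanged
        else:
            chunks.append(line)
    yield from flush()

# Made-up record: one 160-base sequence on a single line
wrapped = list(rewrap_fasta([">chr1", "ACGT" * 40], width=60))
print("\n".join(wrapped))
```

The wrapped lines would then be written to a new file and `samtools faidx` rerun on that file.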
#3 |
Member
Location: Germany Join Date: Jun 2010
Posts: 18
Hi,
thanks for the ideas. I used the "view" command to have a look at the .bam file. It says the reads were aligned against a file called "chr1.fa". So I renamed my reference file to chr1.fa and changed the FASTA header to that name as well. Afterwards I ran faidx again. Unfortunately it didn't work and I still get N.
#4 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
It's not about the names of the files; it's about the reference names in the .sam file and the name of each sequence in the reference multi-FASTA.
You need the text after the '>' to match the text in column 3 of the .sam file.
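A quick way to test this is to collect both sets of names and look at the difference. The Python sketch below is only an illustration: the function names are made up, and it assumes the FASTA name is the header text up to the first whitespace, which is how samtools faidx indexes it as far as I know.

```python
def fasta_names(fasta_lines):
    """Sequence names from FASTA headers: text after '>' up to the first whitespace (assumption)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names

def sam_reference_names(sam_lines):
    """Reference names from column 3 of SAM alignment lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no alignments
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != "*":  # '*' marks an unmapped read
            names.add(fields[2])
    return names

def names_missing_from_fasta(fasta_lines, sam_lines):
    """Reference names used in the SAM file that have no FASTA record."""
    return sam_reference_names(sam_lines) - fasta_names(fasta_lines)

# Made-up data: the aligner recorded 'chr1.fa', the FASTA header says 'chr1'
fasta = [">chr1 some description\n", "ACGTACGT\n"]
sam = ["read1\t0\tchr1.fa\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n"]
print(names_missing_from_fasta(fasta, sam))
```

An empty result means every name in column 3 has a matching FASTA record; anything printed is a name mpileup would not find in the reference.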
#5 |
Member
Location: Ireland Join Date: Sep 2010
Posts: 21
I am having this N problem at the moment too, and it is driving me nuts. I have simplified the headers in my FASTA file to plain numbers from 1 to over 1 million (each record is only 64 bases long). I then built a bowtie index and used bowtie to map short reads back to these 'consensus tags'. I converted the resulting SAM file to BAM and sorted it. Running mpileup results in SNPs being called at every base position because the reference is treated as N.
When I index my FASTA file with faidx I get a .fai file. It is quite small and doesn't contain any actual sequence data, just the following:
1 64 3 64 65
2 64 71 64 65
3 64 139 64 65
4 64 207 64 65
5 64 275 64 65
............................................
Any ideas, anyone?
#6 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
That's a perfectly normal faidx file, if your chromosomes are named 1,2,3,4, and 5, and each is 64 letters long.
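For anyone puzzled by the numbers in the .fai listing quoted above: each line holds the sequence name, its length, the byte offset of its first base in the FASTA file, the number of bases per line, and the number of bytes per line including the newline. A small Python sketch of reading it follows; `FaiRecord` and `parse_fai` are illustrative names, not a samtools API.

```python
from typing import NamedTuple

class FaiRecord(NamedTuple):
    name: str       # sequence name
    length: int     # total number of bases
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, newline included

def parse_fai(lines):
    """Parse tab-separated .fai lines into FaiRecord tuples."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        records.append(FaiRecord(fields[0], *map(int, fields[1:5])))
    return records

# First two lines of the listing quoted in the thread
for rec in parse_fai(["1\t64\t3\t64\t65\n", "2\t64\t71\t64\t65\n"]):
    print(rec)
```

The offsets are consistent with 64-base records under one-character headers: the header `>1` plus newline takes 3 bytes, so the first sequence starts at byte 3, and the second starts at 3 + 65 + 3 = 71.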
#7 |
Member
Location: Ireland Join Date: Sep 2010
Posts: 21
OK, I can generate a .fai file from the FASTA file without trouble. Any ideas why my pileup is basically treating the reference as all Ns and therefore calling SNPs at every position?
Cheers
#8 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
Generally, it means there is a mismatch between what your .bam file says each reference sequence is named and what your reference FASTA says each reference contig is named. So check first that you really are using the same reference file in mpileup that you used for aligning.
Second, I'd try simplifying the names of the reference sequences. Maybe there are spaces or special characters in the names of your reference sequences, and your aligner handles them differently from mpileup.
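To find headers worth simplifying, something like the following Python sketch could flag any FASTA header containing whitespace or unusual characters. The set of "safe" characters in `SAFE_NAME` is a conservative choice made up for this example, not a rule taken from samtools or any aligner.

```python
import re

# Assumption: letters, digits, underscore, dot and hyphen are treated as safe.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")

def suspicious_headers(fasta_lines):
    """Return FASTA headers (without '>') that contain anything outside SAFE_NAME."""
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            if not SAFE_NAME.fullmatch(header):
                flagged.append(header)
    return flagged

# Made-up headers: one clean, one with spaces, one with a pipe character
example = [">chr1\n", "ACGT\n", ">chr 2 extra text\n", "ACGT\n", ">tag|3\n", "ACGT\n"]
print(suspicious_headers(example))
```

Headers flagged this way are the ones most likely to be written differently by the aligner and by samtools.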
#9 |
Member
Location: Ireland Join Date: Sep 2010
Posts: 21
Thanks for the suggestions.
This is what I initially thought the problem was after searching around on this forum, so I renamed the records in my FASTA reference with a simple numbering scheme. This is the file I used to build my index, and after mapping with bowtie and checking the 3rd column of the SAM file I see the same simple numbering scheme. I guess the fact that each FASTA record in my reference file is 64 bp and doesn't even fill a single line is hardly an issue.
#10 | |
Junior Member
Location: china Join Date: Aug 2013
Posts: 4
Quote: