| Thread | Thread Starter | Forum | Replies | Last Post |
| --- | --- | --- | --- | --- |
| RNA-seq SNP-calling without a complete reference | shoegame2001 | RNA Sequencing | 6 | 07-04-2012 01:55 AM |
| SNP base calling | shuang | Bioinformatics | 7 | 10-24-2011 12:50 PM |
| SNP base calling for multiple samples | shuang | Bioinformatics | 2 | 09-07-2011 03:06 PM |
| SNP calling from a reference sequence | blackrabite | Genomic Resequencing | 2 | 05-21-2011 09:48 PM |
| Hierarchical reference-free SNP calling | Marius | Bioinformatics | 1 | 12-27-2010 09:38 AM |
#1
Member
Location: Israel
Join Date: Dec 2010
Posts: 23
A couple of questions. First, I have a FASTA file of hg18 that is about 3 GB, and I'm afraid samtools pileup can't read it correctly because it contains both lowercase and uppercase letters. Is there a tool that can open such a large file so I can edit it, or is there something that can convert all lowercase letters to uppercase?
Second, a question about samtools pileup. I generated a raw.pileup file and filtered SNPs with the R/Bioconductor package Rsamtools using the following commands:

```r
library(Rsamtools)
system("samtools pileup -f genome.fasta output.sorted.bam > raw.pileup")
snps <- readPileup("raw.pileup", variant = "SNP")
write.table(as.data.frame(snps), "snps.xls")
```

I get an .xls file in which the "referenceBase" column is filled entirely with "*" instead of A/C/G/T. Does anyone know what the problem might be? Thanks in advance!
#2
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
Maybe sed can replace all the lowercase letters with uppercase?
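A minimal sketch of that conversion, assuming a standard FASTA file whose header lines start with `>`. It uses awk instead of sed so that only the sequence lines are uppercased: running `tr a-z A-Z` over the whole file would also turn a header such as `>chr1` into `>CHR1`, and the names would then no longer match those in the BAM. awk reads one line at a time, so the 3 GB file is streamed rather than loaded into memory. The function name `uppercase_fasta` and the file names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: uppercase the sequence lines of a FASTA file while leaving the
# ">" header lines exactly as they are, so sequence names still match the names
# stored in the BAM header. awk processes one line at a time, so a ~3 GB genome
# is streamed rather than loaded into memory.
uppercase_fasta() {
  awk '/^>/ { print; next }      # header line: copy unchanged
       { print toupper($0) }     # sequence line: convert a-z to A-Z
      ' "$1"
}

# Usage (hypothetical file names):
#   uppercase_fasta genome.fasta > genome.upper.fasta
#   samtools faidx genome.upper.fasta
```

The converted FASTA is a new file, so it needs its own index; `samtools faidx` writes it next to the file as `genome.upper.fasta.fai`.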
I haven't used the R package, but in general, if you are not getting the right reference letter in the pileup, it's because pileup didn't properly match the reference FASTA you gave it on the command line to the reference the reads were aligned to. Mismatched sequence names (or names containing spaces and other odd characters) can cause that, and so can a failed samtools indexing step (having the whole genome as one long string is one reason that step fails).
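One quick way to look for mismatched names, sketched under the assumption that the reference has been indexed with `samtools faidx` (which writes `genome.fasta.fai`, whose first tab-separated column is the sequence name) and that the BAM header lists its references on `@SQ` lines with an `SN:` tag, as the SAM format defines. Any name it prints, for example `1` in the BAM versus `chr1` in the FASTA, is a reference that would not be found in the FASTA. `refs_missing_from_fai` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: print every reference name declared in a SAM/BAM header
# (@SQ lines, SN: tag) that does not appear in a FASTA index (.fai).
# The header text is read from standard input; $1 is the path to the .fai file,
# whose first tab-separated column is the sequence name.
refs_missing_from_fai() {
  awk -F'\t' -v fai="$1" '
    BEGIN {
      # Load every sequence name listed in the .fai file.
      while ((getline line < fai) > 0) {
        split(line, f, "\t")
        known[f[1]] = 1
      }
      close(fai)
    }
    $1 == "@SQ" {
      # Find the SN: field on each @SQ line and report names missing from the index.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
          name = substr($i, 4)
          if (!(name in known)) print name
        }
      }
    }'
}

# Usage (hypothetical file names):
#   samtools faidx genome.fasta
#   samtools view -H output.sorted.bam | refs_missing_from_fai genome.fasta.fai
```

If the .fai path is wrong, nothing is loaded and every BAM reference is reported, which is itself a hint that the indexing step did not produce the expected file.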
#3
Member
Location: Israel
Join Date: Dec 2010
Posts: 23
Thanks for your answer!
The pileup file has the right reference letters, but when I use the R package to determine the SNPs I get "*" instead of the reference nucleotides. What other tool can extract the SNPs from the pileup file? Thanks!
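A minimal sketch of pulling candidate SNP positions straight out of the pileup without R, assuming the default 6-column output of `samtools pileup` (run without `-c`), in which column 5 holds the read bases. It counts reads whose base differs from the reference and keeps positions with at least a chosen number of them. `snp_candidates` and the threshold are hypothetical, and this is raw mismatch counting, not a genotype caller: it applies no base-quality or mapping-quality model.

```shell
#!/bin/sh
# Minimal sketch: keep only positions of a plain 6-column samtools pileup
# (chrom, pos, ref base, depth, read bases, base qualities) where at least
# $1 reads show a base different from the reference. $2 is the pileup file.
# In the read-base column "." and "," are matches, A/C/G/T (either case) are
# mismatches, "^" plus the next character marks a read start with its mapping
# quality, "$" marks a read end, and "+N<seq>" / "-N<seq>" are indels.
snp_candidates() {
  awk -F'\t' -v min_alt="$1" '
    function count_mismatches(s,    n, i, c, len, m) {
      n = 0; i = 1; len = length(s)
      while (i <= len) {
        c = substr(s, i, 1)
        if (c == "^") { i += 2; continue }   # skip read-start marker and its quality char
        if (c == "+" || c == "-") {          # skip indel length digits and indel bases
          i++; m = 0
          while (i <= len && substr(s, i, 1) ~ /[0-9]/) { m = m * 10 + substr(s, i, 1); i++ }
          i += m; continue
        }
        if (c ~ /[ACGTacgt]/) n++            # mismatching base on either strand
        i++
      }
      return n
    }
    # Skip lines whose reference column is "*" and require enough mismatching reads.
    $3 != "*" && count_mismatches($5) >= min_alt + 0 { print }
  ' "$2"
}
```

For example, `snp_candidates 2 raw.pileup > candidates.pileup` keeps positions where at least two reads disagree with the reference. The `^` and indel handling matters because the mapping-quality character after `^` and the inserted or deleted bases can themselves be letters that would otherwise be miscounted as mismatches.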