Hi All,
We want to extract sequence information (for various genes) from a number of genome assemblies and generate consensus sequences for comparison between genomes representing different experiments.
What we have been doing is using samtools to extract regions from the genomic BAM file, then trying to convert those into FASTA format using bam2fastq. Everything we've extracted so far has been groups of overlapping short reads; we have not been successful at obtaining consensus sequences.
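To make clear what we mean by a consensus sequence, here is a toy sketch in Python of the result we are after: a per-position majority vote over the overlapping reads in a region. The function name, the (start, sequence) read representation, and the `min_depth` parameter are made up for illustration. It assumes ungapped reads with 0-based start positions and ignores base qualities and indels, so it is not meant as a replacement for a proper variant-aware tool.

```python
from collections import Counter


def majority_consensus(reads, region_start, region_end, min_depth=1):
    """Toy consensus over [region_start, region_end) by per-column majority vote.

    reads: iterable of (start, sequence) pairs, with 0-based start positions
           on the reference and no gaps (a simplifying assumption).
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    # One base counter per reference position in the region.
    columns = [Counter() for _ in range(region_end - region_start)]

    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if region_start <= pos < region_end:
                columns[pos - region_start][base] += 1

    consensus = []
    for counts in columns:
        if sum(counts.values()) < min_depth:
            consensus.append("N")  # not enough coverage at this position
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    example_reads = [(0, "ACGT"), (2, "GTTA"), (3, "TTAC")]
    print(majority_consensus(example_reads, 0, 8))  # prints ACGTTACN
```

In other words, for each gene region we would like one sequence like the output above per genome, rather than the pile of individual reads we currently get.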
Is there an alternative workflow that would be more efficient/better? Are there suggestions for tools we should be using instead of/in addition to samtools and bam2fastq?
(Note: We have tried using mpileup, bcftools view, and vcfutils.pl from the samtools package to generate a consensus sequence. Unfortunately, running bcftools view, whether as part of a pipeline or on its own, produces the following error: [bcf_sync] incorrect number of fields (0 != 5) at 0:0)