  • post-assembly genome analysis workflow question

    Hi All,

    We want to extract sequence information (for various genes) from a number of genome assemblies and generate consensus sequences for comparison between genomes representing different experiments.

    What we have been doing is using samtools to extract regions from the genomic bam file, then trying to convert those into fasta format using bam2fastq. Everything we've extracted has been groups of overlapping short reads; we have not been successful at obtaining consensus sequences.
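To show what a consensus adds over a stack of overlapping reads, here is a minimal Python sketch of a naive majority-vote consensus. It is a simplification, not what samtools/bcftools do: it assumes each read is a hypothetical (0-based start, ungapped sequence) pair already placed on the reference, and it ignores indels, soft clipping, and base and mapping qualities, all of which real consensus callers account for.

```python
from collections import Counter


def majority_consensus(reads, region_start, region_end, min_depth=1):
    """Naive majority-vote consensus over [region_start, region_end).

    Assumption: each read is a (start, sequence) pair, where start is the
    0-based reference position of the first base and the sequence is
    ungapped (no indels, no clipping). Base qualities are ignored.
    Positions covered by fewer than min_depth reads become 'N'.
    """
    # One base counter per reference position in the region.
    columns = [Counter() for _ in range(region_end - region_start)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if region_start <= pos < region_end:
                columns[pos - region_start][base.upper()] += 1

    consensus = []
    for counts in columns:
        if sum(counts.values()) < min_depth:
            consensus.append("N")
        else:
            # Ties go to the base seen first at that position.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    reads = [(0, "ACGTAC"), (2, "GTTCGA"), (4, "ACGA")]
    print(majority_consensus(reads, 0, 10))
```

With those three toy reads, the disagreement at position 4 (two A, one T) resolves to A, and the uncovered positions 8 and 9 come out as N.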

    Is there an alternative workflow that would be more efficient/better? Are there suggestions for tools we should be using instead of/in addition to samtools and bam2fastq?

    (Note: We have tried using the samtools programs (mpileup, bcftools view, and vcfutils.pl) to generate a consensus sequence. Unfortunately, both the pipelined and non-pipelined use of ‘bcftools view’ generate the following error: [bcf_sync] incorrect number of fields (0 != 5) at 0:0.)
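For reference, the consensus recipe in the samtools 0.1.x documentation is `samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq`; the `-u` flag, which makes mpileup emit BCF for bcftools, is easy to drop. Below is a minimal Python sketch that assembles and runs that three-stage pipeline with `subprocess`. It assumes samtools, bcftools and vcfutils.pl from the same 0.1.x release are on PATH; the file names and the helper names (`consensus_commands`, `run_pipeline`, `as_shell`) are hypothetical. Note that the result is FASTQ, not FASTA.

```python
import shlex
import subprocess


def consensus_commands(reference_fa, bam, region=None):
    """Return the argv lists for the samtools 0.1.x consensus pipeline.

    -u makes mpileup write uncompressed BCF (what 'bcftools view' reads),
    -f gives the indexed reference, -r optionally restricts to a region.
    """
    mpileup = ["samtools", "mpileup", "-u", "-f", reference_fa]
    if region is not None:
        mpileup += ["-r", region]
    mpileup.append(bam)
    bcf_view = ["bcftools", "view", "-c", "-g", "-"]  # read BCF from stdin
    vcf2fq = ["vcfutils.pl", "vcf2fq"]  # VCF on stdin -> consensus FASTQ
    return [mpileup, bcf_view, vcf2fq]


def as_shell(stages):
    """Render the stages as a single shell pipeline string."""
    return " | ".join(shlex.join(cmd) for cmd in stages)


def run_pipeline(stages, out_path):
    """Chain the stages with pipes and write the final output to out_path."""
    with open(out_path, "wb") as out:
        procs = []
        prev_stdout = None
        for i, cmd in enumerate(stages):
            last = i == len(stages) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Drop the parent's copy so upstream sees SIGPIPE if needed.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        for proc in procs:
            if proc.wait() != 0:
                raise subprocess.CalledProcessError(proc.returncode, proc.args)


if __name__ == "__main__":
    stages = consensus_commands("chr12.fa", "chr12.bam")
    print(as_shell(stages))
    # run_pipeline(stages, "chr12_cns.fq")  # requires the tools on PATH
```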

  • #2
    You are trying to make a vcf with bcftools? That should work; there's probably something off about your input file.

    Do you mean genome assemblies, or alignments? I assume alignments, since you have bam files?

    I suppose you could try using samtools view to pull sections of your .bam, and then you could put those .bams through velvet, to assemble a consensus for that sample at that region.

    But I think getting the vcf files for those regions is the way to go. You just aren't doing it right.
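Working region by region mostly means passing a region string such as `12:1001-2000` to `samtools view` or to `samtools mpileup -r`. If the gene coordinates come from a BED file, the conventions differ: BED is 0-based and half-open, while samtools regions are 1-based and inclusive. A minimal sketch, assuming a tab-separated BED file with gene names in column 4 (the gene name `GENE_A` and the helper names are hypothetical):

```python
def bed_to_region(chrom, bed_start, bed_end):
    """Convert a BED interval (0-based, half-open) to a samtools region.

    samtools regions are 1-based and inclusive, so the start shifts by
    one while the end coordinate stays the same.
    """
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError(f"invalid BED interval: {bed_start}-{bed_end}")
    return f"{chrom}:{bed_start + 1}-{bed_end}"


def regions_from_bed(lines):
    """Yield (name, region) pairs from BED lines.

    Header lines ('#', 'track', 'browser') and blank lines are skipped.
    If there is no 4th (name) column, a name is built from the coordinates.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}_{start}_{end}"
        yield name, bed_to_region(chrom, start, end)


if __name__ == "__main__":
    bed = ["12\t1000\t2000\tGENE_A\n"]
    for name, region in regions_from_bed(bed):
        print(name, region)
```

Each region string can then be fed to the per-region consensus step instead of processing the whole chromosome at once.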



    • #3
      We've been trying to pull out a chromosome and go at it that way. Should we be doing it gene by gene instead, or is there something obviously wrong here:

      samtools view -b exome_input.bam 12 > chr12.bam
      samtools mpileup -f chr12.fa chr12.bam > chr12_pileup
      bcftools view -cg chr12_pileup > chr12_vcf
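The most likely cause of the bcf_sync error in these commands is that `samtools mpileup` without `-u` (or `-g`) writes a plain-text pileup, and `bcftools view` then tries to read that text as BCF. The corrected steps would be `samtools mpileup -uf chr12.fa chr12.bam > chr12.bcf` followed by `bcftools view -cg chr12.bcf > chr12.vcf`. As a quick diagnostic, here is a minimal heuristic sketch (the helper names are hypothetical) that checks whether a file's first lines look like text pileup: tab-separated chromosome, 1-based position, reference base, depth, read bases and base qualities.

```python
def lines_look_like_pileup(lines):
    """Heuristic: True if every non-blank line parses as text pileup.

    Expects bytes lines with at least 6 tab-separated columns, where
    column 2 (position) and column 4 (depth) are integers. Binary input
    such as BCF/BGZF usually fails to decode as ASCII.
    """
    lines = [line for line in lines if line.strip()]
    if not lines:
        return False
    for raw in lines:
        try:
            fields = raw.decode("ascii").rstrip("\r\n").split("\t")
        except UnicodeDecodeError:
            return False  # binary content, e.g. compressed BCF
        if len(fields) < 6 or not (fields[1].isdigit() and fields[3].isdigit()):
            return False
    return True


def looks_like_text_pileup(path, max_lines=5):
    """Read the first few lines of path in binary mode and check them."""
    with open(path, "rb") as fh:
        lines = [fh.readline() for _ in range(max_lines)]
    return lines_look_like_pileup(lines)


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        verdict = "text pileup, not BCF" if looks_like_text_pileup(path) else "not text pileup"
        print(f"{path}: {verdict}")
```

Run on `chr12_pileup` from the commands above, a "text pileup" verdict would point to the missing `-u` flag rather than to a problem with `bcftools` itself.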


      edit:
      It seems we're not the only ones to get the following error:
      [bcf_sync] incorrect number of fields (0 != 5) at 0:0

      Maybe the suggestions in another thread on this forum (using nohup) will fix it.

      Will update when we've tried.

      Thanks
      Last edited by tom_mlvs; 02-07-2012, 12:17 PM.
