  • consensus fasta and multifasta file of human genomes

    Dear All,
    I desperately need help turning an idea into a working pipeline.

    I have sequenced six human samples and created six alignment files (BAM) against the UCSC reference.
    I need to do a multiple alignment of all these samples. How can I do that?

    How can I convert each BAM to a consensus FASTA?
    How can I merge the six consensus FASTA files into a multi-FASTA?
    Any software or reference will be appreciated.
    Last edited by huma Asif; 09-08-2014, 05:22 PM.

  • #2
    From Samtools manual page:

    Generate the consensus sequence for one diploid individual:

    samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
    Use seqtk to convert from fastq to fasta.

    Concatenate (end to end, you are not "merging" the files this way) the fasta files with "cat" to generate multifasta.

    If you truly want to do whole genome alignments: http://genomewiki.ucsc.edu/index.php...lignment_howto
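The steps above can be sketched for several samples as follows. This is only a sketch under assumptions: it reuses the commands quoted in this post as they are (samtools 0.1.x-era mpileup, bcftools view and vcfutils.pl, plus seqtk), and the function names, file names and the sample-name prefix on the headers are hypothetical. The prefix is there because every per-sample consensus carries the same reference sequence names (chr1, chr2, ...), so a plain "cat" would produce a multi-FASTA with duplicate headers.

```shell
# Hypothetical helper: build a consensus FASTA for one BAM, using the
# commands quoted in the post above (assumes samtools 0.1.x-era tools).
consensus_fasta() {
    ref=$1; bam=$2; out=$3
    fq="${out%.fa}.fq"
    samtools mpileup -uf "$ref" "$bam" | bcftools view -cg - | vcfutils.pl vcf2fq > "$fq"
    seqtk seq -a "$fq" > "$out"
}

# Hypothetical helper: prefix every FASTA header with the sample name
# (taken from the file name) and write all records to standard output.
make_multifasta() {
    for fa in "$@"; do
        sample=$(basename "$fa" .fa)
        # Header lines start with ">"; sequence lines pass through unchanged.
        awk -v s="$sample" '/^>/ { print ">" s "_" substr($0, 2); next } { print }' "$fa"
    done
}
```

Possible usage, with hypothetical file names: run `consensus_fasta ref.fa sample1.bam sample1.fa` once per BAM, then `make_multifasta sample1.fa sample2.fa > all_samples.fa`.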

    • #3
      Thank you so much, I will try this now.

      • #4
        Well, I am done with the first step, but my FASTQ file contains many runs of NNNNNNNNNN.
        Is this because my data is targeted exome sequencing?
        How can I get rid of them?
        If I run this FASTQ through seqtk, it gives me empty output.

        • #5
          If your data is exome sequencing use the -L option in samtools and specify your capture regions file in the mpileup step.

          • #6
            It gives the same result with -L.

            See my code:
            samtools mpileup -uf human_hg19.fa reads.bam -L targetSeq_exome_target_regions_hg19.bed |bcftools view -cg - |/usr/local/genome/samtools-0.1.18/bcftools/vcfutils.pl vcf2fq > cns.fastq



            The FASTQ has many n's, and the FASTA is empty with:
            seqtk seq -a cns.fastq > cns.fa

            • #7
              That should have been a lower case "l".
              -l FILE list of positions (chr pos) or regions (BED) [null]
              -L means something else for samtools mpileup.

              • #8
                Thank you so much, I am going to try this now.

                • #9
                  Tried with lowercase l, but there are still n's in the final output. Please suggest where I am going wrong.

                  See my code:
                  samtools mpileup -uf human_hg19.fa reads.bam -l targetSeq_exome_target_regions_hg19.bed |bcftools view -cg - |/usr/local/genome/samtools-0.1.18/bcftools/vcfutils.pl vcf2fq > cns.fastq
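To see how much of each consensus record is actually n, a small check like the one below may help. It is a minimal sketch: the function name `count_n` is hypothetical, it expects a FASTA file (not the FASTQ), and it rests on the assumption that the n's mark positions with no usable coverage, which would be expected outside the capture targets in exome data.

```shell
# Hypothetical helper: for each FASTA record print
# name <TAB> total length <TAB> number of N/n bases.
count_n() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len "\t" n
            name = substr($0, 2); len = 0; n = 0
            next
        }
        {
            len += length($0)        # count bases before gsub edits the line
            n += gsub(/[Nn]/, "")    # gsub returns the number of replacements
        }
        END { if (name != "") print name "\t" len "\t" n }
    ' "$1"
}
```

Possible usage: `count_n cns.fa`. Records that are almost entirely n would be consistent with regions the exome capture never covered.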
