
  • #16
    I've filed a bug for this in github:

    Reporting on behalf of seqanswers user xmubingo, using VCF utils to generate a consensus assembly doesn't work properly because mpileup is not reporting all bases. The generated sequence from mpile...


    In the meantime, you'll either have to use my script with the reference fasta file (in which case INDELs will change the length of the generated sequence), or add a dummy SAM line for each sequence in the reference fasta file that covers the entire range of the sequence.
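The dummy-SAM-line workaround could be sketched roughly as follows. This is only an illustration, not gringer's script: the function name `dummy_sam_lines`, the `dummy_` read-name prefix, the MAPQ of 0, and the choice to reuse the reference bases as the read sequence are all assumptions, and whether a given mpileup version counts such a read has not been tested here.

```python
def dummy_sam_lines(sequences):
    """Yield one SAM record per reference sequence, spanning its full length.

    `sequences` maps a reference name to its sequence string.
    All field choices below are illustrative assumptions.
    """
    for name, seq in sequences.items():
        fields = [
            f"dummy_{name}",  # QNAME: hypothetical read name
            "0",              # FLAG: mapped, forward strand
            name,             # RNAME: reference sequence name
            "1",              # POS: 1-based leftmost position
            "0",              # MAPQ: assumed value
            f"{len(seq)}M",   # CIGAR: match across the whole sequence
            "*",              # RNEXT: no mate
            "0",              # PNEXT
            "0",              # TLEN
            seq,              # SEQ: reference bases reused as the read
            "*",              # QUAL: qualities not stored
        ]
        yield "\t".join(fields)


if __name__ == "__main__":
    for line in dummy_sam_lines({"seq1": "ACGTACGT"}):
        print(line)
```

Note that a read copying the reference adds one vote for the reference base at every position, so it may influence low-coverage calls.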



    • #17
      Originally posted by gringer
      I've filed a bug for this in github:

      Reporting on behalf of seqanswers user xmubingo, using VCF utils to generate a consensus assembly doesn't work properly because mpileup is not reporting all bases. The generated sequence from mpile...


      In the meantime, you'll either have to use my script with the reference fasta file (in which case INDELs will change the length of the generated sequence), or add a dummy SAM line for each sequence in the reference fasta file that covers the entire range of the sequence.
      Hi gringer, Thanks a lot!! I will try your code.



      • #18
        Originally posted by gringer
        I've filed a bug for this in github:

        Reporting on behalf of seqanswers user xmubingo, using VCF utils to generate a consensus assembly doesn't work properly because mpileup is not reporting all bases. The generated sequence from mpile...


        In the meantime, you'll either have to use my script with the reference fasta file (in which case INDELs will change the length of the generated sequence), or add a dummy SAM line for each sequence in the reference fasta file that covers the entire range of the sequence.
        Hi gringer, I think I've found a way to solve this problem. The consensus sequence generated by vcf2fq (cns.fq -> cns.fa) doesn't have the same length as the reference sequence, but I found that the sequence in cns.fa just lacks some 'n' characters at its tail. So we can append 'n' characters to each sequence to make its length equal to the original length. For example:

        cns.fa
        Code:
        >seq1
        nnnnnnnnnnnnnnnAAAATTTTTCCCCGGGGgggccccGGTTTg
        cns.fa.fixed
        Code:
        >seq1
        nnnnnnnnnnnnnnnAAAATTTTTCCCCGGGGgggccccGGTTTgnnnnnnnnnnnnn
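The padding step above could be sketched in Python along these lines. It is a minimal sketch, assuming the consensus and the reference lengths are already loaded into dictionaries keyed by sequence name; `pad_to_reference` is a hypothetical helper, not part of vcfutils. It only makes sense when the missing bases are all at the tail, so it will not restore reference coordinates if indels were applied.

```python
def pad_to_reference(consensus, ref_lengths, pad_char="n"):
    """Append `pad_char` to each consensus sequence up to the reference length.

    consensus:   dict of sequence name -> consensus sequence
    ref_lengths: dict of sequence name -> length of that reference sequence
    """
    padded = {}
    for name, seq in consensus.items():
        missing = ref_lengths[name] - len(seq)
        if missing < 0:
            # A longer consensus means insertions were applied; padding cannot fix that.
            raise ValueError(f"{name}: consensus is longer than the reference")
        padded[name] = seq + pad_char * missing
    return padded


if __name__ == "__main__":
    cns = {"seq1": "nnnAAAATTTTg"}
    fixed = pad_to_reference(cns, {"seq1": 16})
    print(">seq1")
    print(fixed["seq1"])
```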



        • #19
          It's amazing

          Originally posted by gringer
          I have a slightly different processing script, but the command you have seems to work for me on my mitochondrial data (i.e. it produces a fastq file at the end of it).

          I've attached my modified vcf2fq script to this post [also the fairly trivial fastq2fasta], maybe it will help to diagnose the problem.
          The original vcf2fq can't produce sequences containing indels, which has bothered me a lot, but this one can. That's amazing.



          • #20
            I have a problem: if the new genome.fa contains indels, the positions will shift. If I extract genes from the new genome using the GTF, the extracted genes will not cover the same regions as the original genes in the original genome.



            • #21
              ... so don't use a non-reference genome for defining the location of variants. Always link back to the reference genome when defining positions.
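To illustrate why positions drift once indels are applied, here is a small sketch that computes where a reference position ends up in the modified genome. The function name `shifted_position` and the tuple layout (1-based position, reference allele, alternate allele, as in VCF) are assumptions for illustration. Positions that fall inside a variant are not handled.

```python
def shifted_position(ref_pos, indels):
    """Return the position in the modified genome for a 1-based reference position.

    indels: iterable of (pos, ref_allele, alt_allele), VCF-style and 1-based.
    Only variants that end before `ref_pos` shift it.
    """
    offset = 0
    for pos, ref, alt in indels:
        variant_end = pos + len(ref) - 1
        if variant_end < ref_pos:
            # Insertions give a positive shift, deletions a negative one.
            offset += len(alt) - len(ref)
    return ref_pos + offset


if __name__ == "__main__":
    variants = [(10, "A", "ATT"), (20, "GCC", "G")]
    for p in (5, 15, 30):
        print(p, "->", shifted_position(p, variants))
```

Because every downstream coordinate depends on all upstream indels, annotation such as a GTF stays valid only against the genome it was defined on.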
