  • BFAST - Alignment for ABI or Illumina sequencing - with qualities

    I wanted to let the community know about a new release of BFAST that now supports reads with quality scores.



    It is designed to handle Illumina and ABI SOLiD data at the scale of human whole-genome resequencing (billions of reads). It is multi-threaded and can easily be parallelized on a cluster or on your local desktop. You can tune it to handle SNPs, large insertions or deletions (>10bp), and even ABI SOLiD color errors, all while maintaining speed and accuracy.
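    This thread does not describe how BFAST itself distributes work across a cluster. As a generic, hypothetical illustration of the idea, a common way to parallelize any read aligner is to split the input reads into contiguous chunks and align each chunk as a separate job. The function name below is made up for this sketch, and the `group_size` parameter assumes that mates are stored as consecutive records so that they are never split across chunks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(records: Sequence[T], n_chunks: int,
                      group_size: int = 1) -> List[List[T]]:
    """Split records into at most n_chunks contiguous, near-equal chunks.

    Records are handled in groups of `group_size` consecutive items
    (e.g. 2 if mates are stored next to each other), and a group is
    never split between two chunks. Empty chunks are dropped.
    """
    if n_chunks < 1 or group_size < 1:
        raise ValueError("n_chunks and group_size must be positive")
    if len(records) % group_size:
        raise ValueError("record count is not a multiple of group_size")
    n_groups = len(records) // group_size
    # The first `extra` chunks get one more group than the rest.
    base, extra = divmod(n_groups, n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        size = (base + (1 if i < extra else 0)) * group_size
        if size:
            chunks.append(list(records[start:start + size]))
        start += size
    return chunks
```

    Each chunk would then be written to its own file and aligned independently; the results can be merged afterwards because reads (or read pairs) are aligned independently of one another.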

  • #2
    Dear Nils,

    Can I run BFAST directly on GAII fastq data? They look like this:

    @HWUSI-EAS454:5:1:0:149
    ATTTCTCCACCTCCTCNCCCACCCCTTTTTTTTCCTTACTTCTTACTAAT
    +HWUSI-EAS454:5:1:0:149
    abaabaaaaab`baa]D\a]D]bbaaaZa][``aa_a]_ba_aa_aa``_
    @HWUSI-EAS454:5:1:0:314
    AGCCAGATCCTTACCCNCTCCACCTCTTTTTCTGTGTTTTATTTATGGTG
    +HWUSI-EAS454:5:1:0:314
    a_aaZS]Za__NDV_^DOYYMDZVYaY]Y___]ZWZZZ]QPNUNU^^QQR
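    These records look like Illumina 1.3+ FASTQ, whose quality characters use an ASCII offset of 64 (Phred+64) rather than the Sanger offset of 33: the lowest character in both quality strings is 'D' (ASCII 68), and most are 'a' (ASCII 97), far above the range normally seen in Phred+33 data. Below is a minimal sketch for reading plain 4-line FASTQ records and guessing the offset. The function names are hypothetical, and the heuristic (any character below ';' implies Phred+33) can misclassify a Phred+33 file in which every base has a high quality.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from plain 4-line FASTQ records."""
    # Blank lines are skipped and surrounding whitespace is stripped.
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        yield header[1:], seq, qual


def guess_phred_offset(qualities: Iterable[str]) -> int:
    """Heuristically guess the ASCII offset (33 or 64) of some quality strings."""
    lowest = min(ord(c) for q in qualities for c in q)
    # Characters below ';' (ASCII 59) only occur in Phred+33 (Sanger) data;
    # Illumina 1.3+ Phred+64 starts at '@' (64) and Solexa+64 at ';' (59).
    return 33 if lowest < 59 else 64


# The two reads from the post above.
EXAMPLE = r"""
@HWUSI-EAS454:5:1:0:149
ATTTCTCCACCTCCTCNCCCACCCCTTTTTTTTCCTTACTTCTTACTAAT
+HWUSI-EAS454:5:1:0:149
abaabaaaaab`baa]D\a]D]bbaaaZa][``aa_a]_ba_aa_aa``_
@HWUSI-EAS454:5:1:0:314
AGCCAGATCCTTACCCNCTCCACCTCTTTTTCTGTGTTTTATTTATGGTG
+HWUSI-EAS454:5:1:0:314
a_aaZS]Za__NDV_^DOYYMDZVYaY]Y___]ZWZZZ]QPNUNU^^QQR
"""

if __name__ == "__main__":
    recs = list(parse_fastq(EXAMPLE.splitlines()))
    print(guess_phred_offset(q for _, _, q in recs))  # prints 64 for these reads
```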

    Thanks in advance,

    Valentina



    • #3
      Dear Nils,

      While waiting for your answer, I ran BFAST on a very small subset of my mate-pair data without changing the format. I took only 42 reads (21 mate pairs).

      These are Illumina data, and they should align to the genome like this:
      <-(50bp)--(3000bp)--(50bp)->

      This means that the left read should align to the Crick strand and the right one to the Watson strand, with a 3000bp spacer between them.
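      The expected layout can be written as a small consistency check on a pair of alignments. This is a hypothetical sketch, not part of BFAST: it assumes the usual convention that Watson is the plus (+) strand and Crick the minus (-) strand, that `start` is the 0-based leftmost reference coordinate of each aligned read, and an arbitrary tolerance of 500bp around the 3000bp spacer.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chrom: str
    strand: str  # "+" (Watson) or "-" (Crick), by the convention assumed here
    start: int   # 0-based leftmost reference coordinate of the aligned read


def is_expected_mate_pair(a: Hit, b: Hit, read_len: int = 50,
                          spacer: int = 3000, tolerance: int = 500) -> bool:
    """Check the <-(read)--(spacer)--(read)-> layout for two aligned mates.

    The leftmost read must be on the Crick (-) strand, the rightmost on the
    Watson (+) strand, and the gap between them within `tolerance` of `spacer`.
    """
    if a.chrom != b.chrom:
        return False
    # Order the mates by position, independent of which one was read first.
    left, right = (a, b) if a.start <= b.start else (b, a)
    if left.strand != "-" or right.strand != "+":
        return False
    gap = right.start - (left.start + read_len)
    return abs(gap - spacer) <= tolerance
```

      For example, a left read at 1000 on "-" and a right read at 4050 on "+" (same chromosome) leave a 3000bp gap and pass; the same positions with the strands swapped do not.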

      So I ran "bfast match" and then "bfast localalign".

      The .bmf file was created successfully:

      In total, found matches for 20 out of 21 reads.
      ************************************************************
      Terminating successfully!
      ************************************************************

      but localalign reported an error with the parameters "-l 3000 -L 3":

      Performing alignment...
      Currently on:
      thread:1 [0]Assertion failed: 0 <= endRowStepOne && 0 <= endColStepOne, file AlignNTSpace.c, line 192
      Abort (core dumped)

      Without "-L" I don't get an error:

      Outputted alignments for 20 reads.
      Outputted 1 reads for which there were no alignments.
      Outputting complete.
      ************************************************************
      Terminating successfully!
      ************************************************************

      So my questions are: why do I get this error, and is "-L 3" the localalign parameter I need for my mate pairs?

      Also, is the "-f" option for mirroring or for the FASTA file?

      Thank you in advance,

      Valentina



      • #4
        Dear Nils,

        Sorry to bother you again,

        I have a question about the output format of BFAST (option "-O" in postprocess).

        -O 1 gives almost all the information I need, but it is not a standard format, so I would prefer output in GFF or SAM. However:
        -O 2 does not give me the chromosome
        -O 3 does not give me the strand

        It would also be great to have the positions of mismatches and indels, but right now I don't see how to get them.

        Thank you,

        Valentina



        • #5
          Originally posted by valeu (#2):
          Can I run BFAST directly on GAII fastq data?
          That should work, although the base qualities may not be scaled properly.
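          If BFAST expects Sanger-style Phred+33 qualities (an assumption here: the thread only says the GAII qualities "may not be scaled properly", so check the BFAST manual), one way to rescale these Phred+64 strings is to shift every character code down by 31. A minimal sketch with hypothetical function names:

```python
from typing import List


def phred64_to_phred33(qual: str) -> str:
    """Re-encode an Illumina 1.3+ (Phred+64) quality string as Sanger (Phred+33)."""
    out = []
    for c in qual:
        q = ord(c) - 64  # Phred score
        if q < 0:
            # Characters below '@' are not valid Phred+64 (they may be Solexa+64).
            raise ValueError(f"{c!r} is below '@' and cannot be Phred+64")
        out.append(chr(q + 33))
    return "".join(out)


def quality_scores(qual: str, offset: int) -> List[int]:
    """Decode a quality string into integer Phred scores for the given offset."""
    return [ord(c) - offset for c in qual]
```

          For example, the first two quality characters of read HWUSI-EAS454:5:1:0:149, "ab", are Q33 and Q34 and become "BC". Older Solexa-scaled (pre-1.3) qualities use a log-odds scale and need a non-linear conversion, which this sketch does not attempt.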

          Originally posted by valeu (#3):
          So my question is why I can get such an error, and if the parameter "-L 3" I use in localalign is what I need for my mate-pairs? And also "-f" option, is it for mirroring or for fasta file?
          What version are you using (hopefully bfast.0.6.1c)? The "-f" option is for the FASTA filename and the "-F" option is for mirroring (note the case!).

          Originally posted by valeu (#4):
          I have a question about the output format of BFAST (option "-O" in postprocess). [...] And also it would be great to have the information about positions of mismatches and indels!
          The SAM format does include strand information; see the SAM spec. The positions of mismatches and indels are implicit in the SAM format: to get them, use a variant caller or compare the alignment to the reference.
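          As a sketch of comparing one SAM record to the reference: the strand comes from bit 0x10 of the FLAG field, the chromosome from RNAME, and mismatches and indels can be recovered by walking the CIGAR string against the reference starting at POS. The function names are hypothetical, and the reference for the RNAME chromosome is assumed to be loaded as a plain string. SAM stores reverse-strand reads already reverse-complemented, so SEQ can be compared to the reference directly; an 'N' in the read will be reported as a mismatch.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strand_from_flag(flag: int) -> str:
    """FLAG bit 0x10 means SEQ is reverse-complemented (minus strand)."""
    return "-" if flag & 0x10 else "+"


def alignment_differences(pos: int, cigar: str, read: str,
                          ref: str) -> List[Tuple[str, int, str]]:
    """Return (kind, 1-based reference position, detail) for each difference.

    pos is the SAM POS field (1-based leftmost aligned base); ref is the
    sequence of the chromosome named in RNAME, indexed from 0.
    """
    events: List[Tuple[str, int, str]] = []
    r = pos - 1  # 0-based reference index
    q = 0        # 0-based read index
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: consume read and reference
            for k in range(n):
                ref_base, read_base = ref[r + k].upper(), read[q + k].upper()
                if ref_base != read_base:
                    events.append(("mismatch", r + k + 1, f"{ref_base}>{read_base}"))
            r += n
            q += n
        elif op == "I":  # inserted read bases, after 1-based reference position r
            events.append(("insertion", r, read[q:q + n]))
            q += n
        elif op == "D":  # deleted reference bases, starting at 1-based r + 1
            events.append(("deletion", r + 1, ref[r:r + n]))
            r += n
        elif op == "N":  # skipped reference region (e.g. intron), not a variant
            r += n
        elif op == "S":  # soft-clipped read bases are not aligned
            q += n
        # H and P consume neither the read nor the reference.
    return events
```

          For example, with the reference "ACGTACGTACGT", a read "ACGTTACG" at POS 1 with CIGAR "4M1I3M" yields a single insertion of "T" after reference position 4.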

          For more help, consider the BFAST help mailing list ([email protected]).
