  • how to merge paired ends provided in separate files during (bfast) alignment

    Hello,

    I want to align Illumina paired-end reads with bfast. Each end is provided in a separate .fastq file, and I am not sure (at all) what the best way is to 'join' them during the alignment process. I am using bfast match + bfast localalign + bfast postprocess. I've seen in the bfast manual that the localalign step allows the following:

    Code:
    bfast localalign -1 file_1.bmf -2 file_2.bmf -A 0 -U > sample.baf
    That applies when the .bmf files come from the bwaaln utility. However, when the .bmf files come from bfast match, the following does not seem to work (bfast+bwa-0.6.4e):

    Code:
    bfast localalign -f hg19.fa -m pair_1.bmf -m pair_2.bmf -A 0 -U > sample.baf
    Therefore, I do not know the best way to proceed. My best guess is to align each .fastq file separately and, once I have the two resulting .sam files (one per end), to join them with picard (or samtools) merge.

    Any help will be appreciated!

    thanks
    david

  • #2
    There is a script that comes with the BFAST distribution, scripts/ill2fastq.pl,
    that will convert your *sequence fastq files to bfastq format.

    • #3
      Thank you for your answer, brentp.

      I've tried ill2fastq.pl and, as far as I can tell, it just merges both fastq files into a single one in which the second end is reverse-complemented. For instance:

      pair_1:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0/1
      CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
      +HWUSI-EAS1692_0001:1:1:1050:4451#0/1
      Ybaac][T^YB[ZZ[SKVZT`bcYbccaccaaa_cZZ[ZB[Z[T_c`cYcc\bcccc^T\a`TcccbL\ac\^a\Ybb`^bY]bb_BBBBB
      pair_2:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0/2
      CATGATAATGCACTCCATCTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCACTAAAAAGCGGACCTTGGTGTGAAAACATAACACACAC
      +HWUSI-EAS1692_0001:1:1:1050:4451#0/2
      M_M^ZM\YL]U^L\^VQJIU\a__\``c\cW_aaaaa_R[_\_`W][__BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

      is converted to:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0
      CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
      +
      :CBBD><5?:#<;;<4,7;5ACD:CDDBDDBBB@D;;<;#<;<5@DADD=CDDDD?5=BA5DDDC-=BD=?B=:CCA?C:>CC@#########
      @HWUSI-EAS1692_0001:1:1:1050:4451#0
      GTGTGTGTTATGTTTTCACACCAAGGTCCGCTTTTTAGTGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAATGAGATGGAGTGCATTATCATG
      +
      ##############################################@@<>8A@=@<3@BBBBB@8D=DAA=@@B=6*+27?=-?6>-:=.;?.@.
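
The conversion shown above can be reproduced with a short sketch. It is inferred only from the example records, not from the ill2fastq.pl source: the /1 and /2 suffixes are dropped, quality characters move from Illumina Phred+64 to Sanger Phred+33 (each character shifts down by 31), and the second end is reverse-complemented with its quality string reversed. All function names below are hypothetical.

```python
# Illustrative sketch inferred from the example records above;
# NOT the actual ill2fastq.pl implementation. All names are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred64_to_phred33(qual):
    """Shift Illumina Phred+64 quality characters to Sanger Phred+33."""
    return "".join(chr(ord(c) - 31) for c in qual)


def strip_end_suffix(name):
    """Drop a trailing /1 or /2 so both ends share one read name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def merge_pair(read1, read2):
    """Turn two (name, seq, qual) tuples into two consecutive FASTQ records."""
    name1, seq1, qual1 = read1
    name2, seq2, qual2 = read2
    name = strip_end_suffix(name1)
    if name != strip_end_suffix(name2):
        raise ValueError("read names do not match: %s vs %s" % (name1, name2))
    first = (name, seq1, phred64_to_phred33(qual1))
    # Assumption from the example: end 2 is reverse-complemented and its
    # qualities are reversed so they stay aligned with the bases.
    second = (name, reverse_complement(seq2), phred64_to_phred33(qual2)[::-1])
    return "".join("@%s\n%s\n+\n%s\n" % rec for rec in (first, second))


if __name__ == "__main__":
    # Shortened versions of the example reads, for brevity.
    r1 = ("HWUSI-EAS1692_0001:1:1:1050:4451#0/1", "CAGAT", "Ybaac")
    r2 = ("HWUSI-EAS1692_0001:1:1:1050:4451#0/2", "CATGAT", "M_M^ZM")
    print(merge_pair(r1, r2), end="")
```

Running it on the first bases of the two example reads gives the same leading qualities (:CBBD) for end 1 and the same trailing bases and qualities (ATCATG, .;?.@.) for end 2 as the converted records above.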

      I'm still wondering whether it is OK to obtain the .sam file as follows:

      Code:
      bfast match end1.fastq > end1.bmf
      bfast match end2.fastq > end2.bmf
      
      bfast localalign end1.bmf > end1.baf
      bfast localalign end2.bmf > end2.baf
      
      bfast postprocess end1.baf > end1.sam
      bfast postprocess end2.baf > end2.sam
      
      samtools merge end1.sam end2.sam > sample.sam
      I hope that the subsequent programs in the pipeline will understand that the aligned reads in the .sam file are on one strand or the other, depending on the header info.

      (Note that the pipeline is intended for searching for SNPs)

      • #4
        In case anyone is interested in this thread: everything works fine when the two files containing the paired ends are merged by the ill2fastq.pl script and the merged file is then passed to the bfast commands.
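
The merged-file workflow described above could be laid out roughly as follows. This is only a sketch that builds the three command lines and prints them without running anything. The file names and the helper name are hypothetical, the -f (reference), -r (reads), -m (match file) and -i (align file) flags are assumed from typical BFAST usage and should be checked against `bfast match -h` and friends for your version, and options such as -A, -w and -R are left out on purpose. The reference is assumed to be already converted and indexed.

```python
# Sketch only: builds command strings for the three BFAST steps.
# Flag names are assumptions; verify them with each command's -h output.
import shlex


def bfast_commands(reference, merged_fastq, prefix):
    """Return shell command strings for match, localalign and postprocess.

    Nothing is executed; the caller decides how (and whether) to run them.
    """
    bmf, baf, sam = (prefix + ext for ext in (".bmf", ".baf", ".sam"))
    steps = [
        # Step 1: find candidate locations for the merged paired reads.
        (["bfast", "match", "-f", reference, "-r", merged_fastq], bmf),
        # Step 2: local alignment of the candidates from step 1.
        (["bfast", "localalign", "-f", reference, "-m", bmf], baf),
        # Step 3: pick final alignments and write SAM.
        (["bfast", "postprocess", "-f", reference, "-i", baf], sam),
    ]
    return ["%s > %s" % (shlex.join(args), shlex.quote(out)) for args, out in steps]


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for cmd in bfast_commands("hg19.fa", "merged.fastq", "sample"):
        print(cmd)
```

Because both ends travel through the same three steps in one file, no separate merge of per-end results is needed at the end.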

        I'm still wondering about the following, though:

        - What is the advantage of doing so, compared to aligning each paired end separately and then joining the two resulting sam files (with samtools merge, for instance)?

        - Since I've noticed that the ill2fastq.pl script reverse-complements the second paired end, I'm not sure what the correct values are for the -w argument of bfast match ('to find matches on the designed strands') and the -R argument of bfast postprocess ('specifies to expect paired reads to be on reverse strands').
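
On the first point, one concrete difference shows up in the SAM FLAG field: a pair-aware run can mark each record as paired and say which end it is, whereas two independent single-end runs have no mate information to write. A minimal sketch for inspecting those bits follows, using the bit values from the SAM specification. The function and label names are hypothetical, and whether a given BFAST version sets every bit is not verified here.

```python
# Decode the SAM FLAG bits that relate to pairing and strand.
# Bit values follow the SAM specification; the label names are hypothetical.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    for flag in (99, 147, 16, 0):
        print(flag, decode_flag(flag))
```

A record flagged 16 only says "reverse strand"; nothing in it ties the read to its mate, which is the information a downstream SNP caller may want when it evaluates pairs.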

        cheers,
        david
