
  • How to merge paired ends provided in separate files during (bfast) alignment

    Hello,

    I want to align Illumina paired-end reads using bfast. Each end is provided in a separate .fastq file, and I am not at all sure of the best way to 'join' them during the alignment process. I am using bfast match + bfast localalign + bfast postprocess. I've seen in the bfast manual that the localalign step allows the following:

    Code:
    bfast localalign -1 file_1.bmf -2 file_2.bmf -A 0 -U > sample.baf
    This applies when the .bmf files come from the bwaaln utility. However, when the .bmf files come from bfast match, the following does not seem to work (bfast+bwa-0.6.4e):

    Code:
    bfast localalign -f hg19.fa -m pair_1.bmf -m pair_2.bmf -A 0 -U > sample.baf
    Therefore, I do not know the best way to proceed. My best guess is to align each .fastq file separately and, once I have the two resulting .sam files (one per end), to join them with Picard or samtools merge.

    Any help will be appreciated!

    thanks
    david

  • #2
    There is a script that comes with the BFAST distribution, scripts/ill2fastq.pl,
    that will convert your *sequence fastq files to bfastq format.



    • #3
      Thank you for your answer, brentp.

      I've tried ill2fastq.pl, and as far as I can tell it just merges both fastq files into a single one in which the second end is reverse-complemented. For instance:

      pair_1:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0/1
      CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
      +HWUSI-EAS1692_0001:1:1:1050:4451#0/1
      Ybaac][T^YB[ZZ[SKVZT`bcYbccaccaaa_cZZ[ZB[Z[T_c`cYcc\bcccc^T\a`TcccbL\ac\^a\Ybb`^bY]bb_BBBBB
      pair_2:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0/2
      CATGATAATGCACTCCATCTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCACTAAAAAGCGGACCTTGGTGTGAAAACATAACACACAC
      +HWUSI-EAS1692_0001:1:1:1050:4451#0/2
      M_M^ZM\YL]U^L\^VQJIU\a__\``c\cW_aaaaa_R[_\_`W][__BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

      is converted to:
      @HWUSI-EAS1692_0001:1:1:1050:4451#0
      CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
      +
      :CBBD><5?:#<;;<4,7;5ACD:CDDBDDBBB@D;;<;#<;<5@DADD=CDDDD?5=BA5DDDC-=BD=?B=:CCA?C:>CC@#########
      @HWUSI-EAS1692_0001:1:1:1050:4451#0
      GTGTGTGTTATGTTTTCACACCAAGGTCCGCTTTTTAGTGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAATGAGATGGAGTGCATTATCATG
      +
      ##############################################@@<>8A@=@<3@BBBBB@8D=DAA=@@B=6*+27?=-?6>-:=.;?.@.
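
The transformation visible in the example above (the /1 and /2 suffixes dropped, the '+' line shortened, the second end reverse-complemented, and every quality character shifted down by 31, i.e. from a Phred+64 to a Phred+33 offset, with the second end's qualities also reversed) can be reproduced with a minimal Python sketch. This is not the ill2fastq.pl implementation; the function names are hypothetical, and the quality offset is inferred only from the records shown here.

```python
import io

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred64_to_phred33(qual):
    """Shift each quality character down by 31 (offset 64 -> offset 33)."""
    return "".join(chr(ord(c) - 31) for c in qual)


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield name, seq, qual


def interleave(handle1, handle2, out):
    """Write end 1 followed by the reverse-complemented end 2 for each pair.

    Assumes both inputs list the pairs in the same order; zip() stops at
    the shorter input, so unequal files are silently truncated.
    """
    for (n1, s1, q1), (n2, s2, q2) in zip(read_fastq(handle1), read_fastq(handle2)):
        base1 = n1.rsplit("/", 1)[0]  # drop the trailing /1
        base2 = n2.rsplit("/", 1)[0]  # drop the trailing /2
        if base1 != base2:
            raise ValueError(f"read names do not match: {n1} vs {n2}")
        out.write(f"{base1}\n{s1}\n+\n{phred64_to_phred33(q1)}\n")
        # The second end is reversed, so its qualities are reversed as well.
        out.write(f"{base1}\n{reverse_complement(s2)}\n+\n{phred64_to_phred33(q2)[::-1]}\n")


if __name__ == "__main__":
    # Tiny in-memory demo using the first five bases of each read above.
    end1 = io.StringIO("@r#0/1\nCAGAT\n+r#0/1\nYbaac\n")
    end2 = io.StringIO("@r#0/2\nCATGA\n+r#0/2\nM_M^Z\n")
    merged = io.StringIO()
    interleave(end1, end2, merged)
    print(merged.getvalue(), end="")
```

With real files, the same function could be called on three open file handles (two inputs, one output) instead of the in-memory buffers used in the demo.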

      I'm still wondering whether it is OK to obtain the .sam file as follows:

      Code:
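      # schematic: the reference (-f) and other required options are omitted
      bfast match end1.fastq > end1.bmf
      bfast match end2.fastq > end2.bmf
      
      bfast localalign end1.bmf > end1.baf
      bfast localalign end2.bmf > end2.baf
      
      bfast postprocess end1.baf > end1.sam
      bfast postprocess end2.baf > end2.sam
      
      # samtools merge takes the output file first and expects sorted inputs
      samtools sort -o end1.bam end1.sam
      samtools sort -o end2.bam end2.sam
      samtools merge sample.bam end1.bam end2.bam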
      I hope that the subsequent programs in the pipeline will be able to tell, from the header info, which strand each aligned read in the merged file is on.

      (Note that the pipeline is intended for searching for SNPs)
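
On where strand information lives: per the SAM specification, the strand of each alignment is recorded in the FLAG field (second column) of its own record, in bit 0x10, rather than in the file header. Pairing is recorded in bits 0x1, 0x40 and 0x80. A minimal Python sketch for decoding a FLAG value follows; the function and label names are hypothetical, chosen only for illustration.

```python
# FLAG bits as defined in the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


if __name__ == "__main__":
    for value in (0, 16, 99, 147):
        print(value, decode_flag(value))
```

Records produced by two independent single-end runs would typically not carry the pairing bits, so checking the FLAG column of a merged file is one way to see whether downstream tools can treat the reads as pairs.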



      • #4
        In case anyone is interested in this post: everything works nicely when the two files containing each paired end are merged by the ill2fastq.pl script and then passed to the bfast commands.

        I still have two concerns, though:

        - What is the advantage of doing so compared with aligning each paired end separately and then joining the two resulting sam files (with samtools merge, for instance)?

        - Since I've noticed that the ill2fastq.pl script reverse-complements the second paired end, I'm not sure what the correct values are for the -w argument in bfast match ('to find matches on the designed strands') and the -R argument in bfast postprocess ('specifies to expect paired reads to be on reverse strands').

        cheers,
        david
