  • Trouble with bwa sampe

    Hi guys,

    I am using bwa to align whole human genome data (paired-end data from Illumina). I have two datasets: one works, the other does not.

    I tried bwa 0.6.1 and 0.6.2, but both get stuck at the same point.

    Here is what I have done so far:

    Index:

    bwa index -a bwtsw -p hg19 hg19.fa

    Alignment:

    bwa aln -t 4 hg19 S2_R1.fastq > S2_R1.sai
    bwa aln -t 4 hg19 S2_R2.fastq > S2_R2.sai

    Then I wanted to use bwa sampe to generate a SAM file from the two alignments:

    bwa sampe hg19 S2_R1.sai S2_R2.sai S2_R1.fastq S2_R2.fastq > S2_alignment.sam

    And here is my problem:

    bwa runs for about 6 or 7 hours and then gets stuck at a certain point:

    Code:
    ...
    [bwa_paired_sw] 18708 out of 20824 Q17 singletons are mated.
    [bwa_paired_sw] 94 out of 2476 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.62 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.32 sec
    [bwa_sai2sam_pe_core] 38273024 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231530 pairs: 217.078 +/- 51.705
    [infer_isize] skewness: 0.648; kurtosis: 0.139; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 576 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.44 sec
    [bwa_sai2sam_pe_core] changing coordinates of 997 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17247 out of 19212 Q17 singletons are mated.
    [bwa_paired_sw] 97 out of 2585 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.30 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.47 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 38535168 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 233040 pairs: 217.053 +/- 51.725
    [infer_isize] skewness: 0.657; kurtosis: 0.157; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 577 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.24 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1043 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 15045 out of 17028 Q17 singletons are mated.
    [bwa_paired_sw] 111 out of 2590 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.85 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.32 sec
    [bwa_sai2sam_pe_core] 38797312 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 234891 pairs: 216.851 +/- 51.611
    [infer_isize] skewness: 0.650; kurtosis: 0.153; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 576 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.59 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1105 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 14812 out of 16736 Q17 singletons are mated.
    [bwa_paired_sw] 110 out of 2525 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.73 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39059456 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 234091 pairs: 216.650 +/- 51.588
    [infer_isize] skewness: 0.652; kurtosis: 0.144; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 575 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.37 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1071 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 15456 out of 17430 Q17 singletons are mated.
    [bwa_paired_sw] 108 out of 2569 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.92 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39321600 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231644 pairs: 216.603 +/- 51.586
    [infer_isize] skewness: 0.661; kurtosis: 0.171; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 575 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.49 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1105 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17707 out of 19720 Q17 singletons are mated.
    [bwa_paired_sw] 105 out of 2476 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.36 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39583744 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231864 pairs: 216.571 +/- 51.411
    [infer_isize] skewness: 0.658; kurtosis: 0.166; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 574 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.47 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1109 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17254 out of 19247 Q17 singletons are mated.
    [bwa_paired_sw] 104 out of 2568 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.32 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.35 sec
    [bwa_sai2sam_pe_core] 39845888 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 248)
    [infer_isize] low and high boundaries: 97 and 386 for estimating avg and std
    [infer_isize] inferred external isize from 134785 pairs: 216.212 +/- 51.171
    [infer_isize] skewness: 0.645; kurtosis: 0.146; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 572 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.53 sec
    [bwa_sai2sam_pe_core] changing coordinates of 574 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 11908 out of 21661 Q17 singletons are mated.
    [bwa_paired_sw] 71 out of 100413 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 32.24 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.26 sec
    [bwa_sai2sam_pe_core] 40108032 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (26201, 48608, 61867)
    [infer_isize] low and high boundaries: 97 and 133199 for estimating avg and std
    [infer_isize] inferred external isize from 48 pairs: 46035.354 +/- 27108.104
    [infer_isize] skewness: 0.172; kurtosis: -0.844; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 210582 (6.07 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.41 sec
    [bwa_sai2sam_pe_core] changing coordinates of 5 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    At this point I have 100% CPU usage on one core and 13.7 memory usage, and nothing happens anymore. The generated SAM file is about 27.9 GB. This output is from bwa-0.6.2, but I get the same with 0.6.1, and the SAM file has the same size.

    Any suggestions?

  • #2
    You are really very patient to wait 6 or 7 hours for an error.

    If you look at the last 8 lines, you'll notice the inferred insert size is wrong. I guess your fastq was combined from several smaller files, and somewhere after 40 million rows the pairs no longer match.
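
One way to find where the estimate goes wrong without reading the whole log by eye is to scan the `[infer_isize]` percentile lines for a jump in the median. The sketch below assumes the bwa progress messages (which go to stderr) were saved to a file, for example with `2> sampe.log`. The function name `flag_isize_jumps`, the log file name, and the factor-of-two threshold are illustrative choices, not part of bwa.

```shell
# Report log lines where the median insert size differs from the first
# batch by more than a factor of two (threshold is an arbitrary assumption).
flag_isize_jumps() {
    awk '
        /\[infer_isize\] \(25, 50, 75\) percentile/ {
            line = $0
            gsub(/.*\(/, "", line)    # keep the text after the last "("
            gsub(/[,)]/, "", line)    # drop commas and the closing ")"
            split(line, q, " ")       # q[1..3] = 25th, 50th, 75th percentile
            med = q[2]
            if (base == "") base = med
            if (med > 2 * base || med * 2 < base)
                print NR ": median " med " (baseline " base ")"
        }
    ' "$1"
}

# Usage (sampe.log is a hypothetical file holding the stderr of bwa sampe):
# flag_isize_jumps sampe.log
```

On the excerpt posted above, only the last batch would be flagged, where the median jumps from 210 to 48608.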

    If you can, open the fastq and have a look around that region to see if you can spot anything strange, such as an empty line or a corrupt line.
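
A quick way to look for such a break is to compare read names record by record. The following is a minimal sketch, assuming 4-line FASTQ records and mate names that are identical apart from a trailing `/1` or `/2` or a comment after the first space. `first_pair_mismatch` is a hypothetical helper name.

```shell
# Print the 1-based number of the first record whose read name differs
# between two FASTQ files; print nothing if all names agree.
first_pair_mismatch() {
    awk -v mate="$2" '
        # Reduce a header line to the bare read name.
        function name(h) {
            sub(/[ \t].*$/, "", h)    # drop comment after first space
            sub(/\/[12]$/, "", h)     # drop trailing /1 or /2
            return h
        }
        {
            got = (getline other < mate)   # read the same line from the mate file
            if (NR % 4 != 1) next          # only header lines carry names
            if (got <= 0 || name($0) != name(other)) {
                print (NR + 3) / 4
                found = 1
                exit
            }
        }
        END {
            # The mate file has extra records beyond the end of the first file.
            if (!found && (getline other < mate) > 0) print int(NR / 4) + 1
        }
    ' "$1"
}

# Usage with the file names from the first post:
# first_pair_mismatch S2_R1.fastq S2_R2.fastq
```

An empty or missing line in one file shifts every later header out of position, so it shows up here as a name mismatch at the record where the shift begins.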

    And if you want to test again, just cut 1 million rows from each fastq after 40108032 and run ALN and SAMPE on those again.
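
The re-test could be sketched as follows. `fastq_slice` is a hypothetical helper that copies a range of 4-line records, the `sub_*` file names are made up for illustration, and the start record assumes that the counter in the bwa log counts records per input file. Since the odd statistics already show up in the batch that ends at 40108032, starting one batch earlier (after record 39845888) may be the safer choice.

```shell
# Print n FASTQ records from a file, starting at record number start (1-based).
# Assumes every record is exactly 4 lines.
fastq_slice() {
    awk -v start="$2" -v n="$3" '
        BEGIN { first = (start - 1) * 4 + 1; last = (start - 1 + n) * 4 }
        NR > last { exit }      # stop reading once the range is done
        NR >= first             # print lines inside the range
    ' "$1"
}

# Example run (commented out; it needs bwa, the hg19 index, and the original files):
# fastq_slice S2_R1.fastq 39845889 1000000 > sub_R1.fastq
# fastq_slice S2_R2.fastq 39845889 1000000 > sub_R2.fastq
# bwa aln -t 4 hg19 sub_R1.fastq > sub_R1.sai
# bwa aln -t 4 hg19 sub_R2.fastq > sub_R2.sai
# bwa sampe hg19 sub_R1.sai sub_R2.sai sub_R1.fastq sub_R2.fastq > sub_alignment.sam
```

If the subset reproduces the hang within minutes, the problem is in that region of the input rather than in the run as a whole.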

    Best,

    dong

    • #3
      Thanks again xied75!

      You are really awesome!

      I made a mistake when combining my fastq files...

      I think this will solve my problems, for now.

      Thank you very much!

      Best,

      stylz2k
