  • Trouble with bwa sampe

    Hi guys,

    I use bwa to align whole human genome data (paired-end data from Illumina). I have 2 datasets. One is working, one isn't.

    I used bwa 0.6.1 and 0.6.2 but get stuck at the same point.

    Here is what I have done so far:

    Index:

    bwa index -a bwtsw -p hg19 hg19.fa

    Alignment:

    bwa aln -t 4 hg19 S2_R1.fastq > S2_R1.sai
    bwa aln -t 4 hg19 S2_R2.fastq > S2_R2.sai

    Now I wanted to use bwa sampe to generate a SAM file from the two alignments:

    bwa sampe hg19 S2_R1.sai S2_R2.sai S2_R1.fastq S2_R2.fastq > S2_alignment.sam

    And here is my problem:

    bwa runs for about 6 or 7 hours and then gets stuck at a certain point:

    Code:
    ...
    [bwa_paired_sw] 18708 out of 20824 Q17 singletons are mated.
    [bwa_paired_sw] 94 out of 2476 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.62 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.32 sec
    [bwa_sai2sam_pe_core] 38273024 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231530 pairs: 217.078 +/- 51.705
    [infer_isize] skewness: 0.648; kurtosis: 0.139; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 576 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.44 sec
    [bwa_sai2sam_pe_core] changing coordinates of 997 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17247 out of 19212 Q17 singletons are mated.
    [bwa_paired_sw] 97 out of 2585 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.30 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.47 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 38535168 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 233040 pairs: 217.053 +/- 51.725
    [infer_isize] skewness: 0.657; kurtosis: 0.157; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 577 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.24 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1043 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 15045 out of 17028 Q17 singletons are mated.
    [bwa_paired_sw] 111 out of 2590 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.85 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.32 sec
    [bwa_sai2sam_pe_core] 38797312 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 234891 pairs: 216.851 +/- 51.611
    [infer_isize] skewness: 0.650; kurtosis: 0.153; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 576 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.59 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1105 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 14812 out of 16736 Q17 singletons are mated.
    [bwa_paired_sw] 110 out of 2525 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.73 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39059456 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 234091 pairs: 216.650 +/- 51.588
    [infer_isize] skewness: 0.652; kurtosis: 0.144; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 575 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.37 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1071 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 15456 out of 17430 Q17 singletons are mated.
    [bwa_paired_sw] 108 out of 2569 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 4.92 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39321600 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231644 pairs: 216.603 +/- 51.586
    [infer_isize] skewness: 0.661; kurtosis: 0.171; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 575 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.49 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1105 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17707 out of 19720 Q17 singletons are mated.
    [bwa_paired_sw] 105 out of 2476 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.36 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.46 sec
    [bwa_sai2sam_pe_core] print alignments... 1.33 sec
    [bwa_sai2sam_pe_core] 39583744 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 249)
    [infer_isize] low and high boundaries: 97 and 389 for estimating avg and std
    [infer_isize] inferred external isize from 231864 pairs: 216.571 +/- 51.411
    [infer_isize] skewness: 0.658; kurtosis: 0.166; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 574 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.47 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1109 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 17254 out of 19247 Q17 singletons are mated.
    [bwa_paired_sw] 104 out of 2568 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 5.32 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.35 sec
    [bwa_sai2sam_pe_core] 39845888 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (179, 210, 248)
    [infer_isize] low and high boundaries: 97 and 386 for estimating avg and std
    [infer_isize] inferred external isize from 134785 pairs: 216.212 +/- 51.171
    [infer_isize] skewness: 0.645; kurtosis: 0.146; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 572 (6.95 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.53 sec
    [bwa_sai2sam_pe_core] changing coordinates of 574 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 11908 out of 21661 Q17 singletons are mated.
    [bwa_paired_sw] 71 out of 100413 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 32.24 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.45 sec
    [bwa_sai2sam_pe_core] print alignments... 1.26 sec
    [bwa_sai2sam_pe_core] 40108032 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] (25, 50, 75) percentile: (26201, 48608, 61867)
    [infer_isize] low and high boundaries: 97 and 133199 for estimating avg and std
    [infer_isize] inferred external isize from 48 pairs: 46035.354 +/- 27108.104
    [infer_isize] skewness: 0.172; kurtosis: -0.844; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 210582 (6.07 sigma)
    [bwa_sai2sam_pe_core] time elapses: 8.41 sec
    [bwa_sai2sam_pe_core] changing coordinates of 5 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    Here I have 100% CPU usage on one core and 13.7 memory usage, and nothing happens anymore. The generated SAM file is about 27.9 GB. This output is from bwa-0.6.2, but I get the same with 0.6.1, and the SAM file has the same size.

    Any suggestions?

  • #2
    You are really very patient to wait 6 or 7 hours for an error.

    If you look at the last 8 lines, you'll notice the inferred insert size is wrong. I guess your fastq is combined from several smaller files, and somewhere after 40 million rows the pairs no longer match.
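To spot the batch where the insert size estimate goes wrong without scrolling through hours of output, something like the sketch below may help. It assumes the messages of bwa sampe (which go to stderr) were saved to a file, and that the log lines look like the ones posted above. The function name `isize_summary` is made up for this example.

```shell
# Print one line per batch: reads processed before the batch,
# number of pairs used for the estimate, and the inferred mean insert size.
# Assumes the log format shown in the first post (bwa 0.6.x sampe).
isize_summary() {
    awk '
        # "[bwa_sai2sam_pe_core] N sequences have been processed."
        /sequences have been processed/ { seen = $2 }
        # "[infer_isize] inferred external isize from N pairs: MEAN +/- SD"
        /inferred external isize from/ { print seen + 0, $6, $8 }
    ' "$1"
}

# Example call (sampe.log is a hypothetical file name):
# isize_summary sampe.log
```

In the log above this would show the pair count falling from about 230000 to 134785 and then to 48, with the mean jumping from about 217 to 46035, which is where the two files probably stop matching.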

    If you can open the fastq and have a look around that region, see if you can spot anything strange, like an empty line, a corrupt line, etc.
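Since the files are too big to read by eye, a rough way to find the spot is to walk both files record by record and report the first read whose name differs. This is only a sketch: it assumes plain 4-line FASTQ records and read names that differ between mates only by a trailing /1 or /2 or by a comment after the first space. The function name `first_mismatch` is made up for this example.

```shell
# Print the 1-based record number of the first read whose name differs
# between two FASTQ files, or "OK" if no difference is found.
# Assumes 4 lines per record; a stray empty line shifts the records and
# will normally show up as a mismatch at that position.
first_mismatch() {
    awk -v r2="$2" '
        # strip the comment after the first blank and a trailing /1 or /2
        function id(s) { sub(/[ \t].*$/, "", s); sub(/\/[12]$/, "", s); return s }
        NR % 4 == 1 {
            rec = (NR + 3) / 4
            # read the header of the matching record from the second file
            if ((getline h2 < r2) <= 0) { print rec; bad = 1; exit }
            # skip its sequence, separator and quality lines
            getline skip < r2; getline skip < r2; getline skip < r2
            if (id($0) != id(h2)) { print rec; bad = 1; exit }
        }
        END {
            if (!bad) {
                # second file still has data: it is longer than the first
                if ((getline h2 < r2) > 0) print rec + 1
                else print "OK"
            }
        }
    ' "$1"
}

# Example call with the files from the first post:
# first_mismatch S2_R1.fastq S2_R2.fastq
```

If it prints a number, multiply by 4 to get the approximate line to inspect in both files.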

    And if you want to test again, just cut 1 million rows from each fastq after 40108032 and run ALN and SAMPE on those again.
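Cutting the test set could look roughly like this. It is a sketch that assumes the number in the log counts reads per file (so each read takes 4 lines) and that "rows" means reads. The function name `fastq_slice` and the `sub_*` file names are made up for this example.

```shell
# Print COUNT FASTQ records from FILE, starting at 1-based record START.
# Usage: fastq_slice FILE START COUNT
# Assumes 4 lines per record.
fastq_slice() {
    awk -v s="$2" -v n="$3" '
        BEGIN { first = 4 * (s - 1) + 1; last = 4 * (s - 1 + n) }
        NR > last  { exit }      # stop reading once the window is done
        NR >= first              # print lines inside the window
    ' "$1"
}

# Example with the files and index from the first post:
# fastq_slice S2_R1.fastq 40108033 1000000 > sub_R1.fastq
# fastq_slice S2_R2.fastq 40108033 1000000 > sub_R2.fastq
# bwa aln -t 4 hg19 sub_R1.fastq > sub_R1.sai
# bwa aln -t 4 hg19 sub_R2.fastq > sub_R2.sai
# bwa sampe hg19 sub_R1.sai sub_R2.sai sub_R1.fastq sub_R2.fastq > sub.sam
```

The pair counts in the log already drop in the batch that ends at 40108032, so starting one batch earlier (record 39845889) may also catch the point where the files first diverge.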

    Best,

    dong


    • #3
      Thanks again xied75!

      You are really awesome

      I had a mistake in combining my fastq files...

      I think this will solve my problems, for now.

      Thank you very much!

      Best,

      stylz2k
