  • BWA sampe problem

    Hi,

    I'm having trouble figuring out what I have done wrong to make BWA sampe give me a mess instead of a functional SAM file. The raw data are from a paired-end Illumina sequencing run. The .sai files were made by using BWA aln to align the reads for each end of the pairs against hg19.

    I can generate SAM files for end1 and end2 separately using samse. But when I use sampe, it fails to write a complete file. Instead, it gets to the first pair where both ends have mapped and spits out a mess (see below).

    Meanwhile, when I compare the SAM files generated for the ends separately, the results make sense. There are an equal number of reads in both SAM files. Not all the pairs mapped on both ends, but those that did are separated by a distance consistent with the insert size of the library sequenced.

    Wondering if anyone has experienced this problem and can make suggestions regarding what to do next.

    -Patel-

    From SAM file generated by SAMPE:
    [lalitp@ccmb-comp2 FASTQ]$ head -n 125 PEaln_4.sam
    @SQ SN:chr1 LN:249250621
    @SQ SN:chr2 LN:243199373
    @SQ SN:chr3 LN:198022430
    @SQ SN:chr4 LN:191154276
    @SQ SN:chr5 LN:180915260
    @SQ SN:chr6 LN:171115067
    @SQ SN:chr7 LN:159138663
    @SQ SN:chrX LN:155270560
    @SQ SN:chr8 LN:146364022
    @SQ SN:chr9 LN:141213431
    @SQ SN:chr10 LN:135534747
    @SQ SN:chr11 LN:135006516
    @SQ SN:chr12 LN:133851895
    @SQ SN:chr13 LN:115169878
    @SQ SN:chr14 LN:107349540
    @SQ SN:chr15 LN:102531392
    @SQ SN:chr16 LN:90354753
    @SQ SN:chr17 LN:81195210
    @SQ SN:chr18 LN:78077248
    @SQ SN:chr20 LN:63025520
    @SQ SN:chrY LN:59373566
    @SQ SN:chr19 LN:59128983
    @SQ SN:chr22 LN:51304566
    @SQ SN:chr21 LN:48129895
    @SQ SN:chr6_ssto_hap7 LN:4928567
    @SQ SN:chr6_mcf_hap5 LN:4833398
    @SQ SN:chr6_cox_hap2 LN:4795371
    @SQ SN:chr6_mann_hap4 LN:4683263
    @SQ SN:chr6_apd_hap1 LN:4622290
    @SQ SN:chr6_qbl_hap6 LN:4611984
    @SQ SN:chr6_dbb_hap3 LN:4610396
    @SQ SN:chr17_ctg5_hap1 LN:1680828
    @SQ SN:chr4_ctg9_hap1 LN:590426
    @SQ SN:chr1_gl000192_random LN:547496
    @SQ SN:chrUn_gl000225 LN:211173
    @SQ SN:chr4_gl000194_random LN:191469
    @SQ SN:chr4_gl000193_random LN:189789
    @SQ SN:chr9_gl000200_random LN:187035
    @SQ SN:chrUn_gl000222 LN:186861
    @SQ SN:chrUn_gl000212 LN:186858
    @SQ SN:chr7_gl000195_random LN:182896
    @SQ SN:chrUn_gl000223 LN:180455
    @SQ SN:chrUn_gl000224 LN:179693
    @SQ SN:chrUn_gl000219 LN:179198
    @SQ SN:chr17_gl000205_random LN:174588
    @SQ SN:chrUn_gl000215 LN:172545
    @SQ SN:chrUn_gl000216 LN:172294
    @SQ SN:chrUn_gl000217 LN:172149
    @SQ SN:chr9_gl000199_random LN:169874
    @SQ SN:chrUn_gl000211 LN:166566
    @SQ SN:chrUn_gl000213 LN:164239
    @SQ SN:chrUn_gl000220 LN:161802
    @SQ SN:chrUn_gl000218 LN:161147
    @SQ SN:chr19_gl000209_random LN:159169
    @SQ SN:chrUn_gl000221 LN:155397
    @SQ SN:chrUn_gl000214 LN:137718
    @SQ SN:chrUn_gl000228 LN:129120
    @SQ SN:chrUn_gl000227 LN:128374
    @SQ SN:chr1_gl000191_random LN:106433
    @SQ SN:chr19_gl000208_random LN:92689
    @SQ SN:chr9_gl000198_random LN:90085
    @SQ SN:chr17_gl000204_random LN:81310
    @SQ SN:chrUn_gl000233 LN:45941
    @SQ SN:chrUn_gl000237 LN:45867
    @SQ SN:chrUn_gl000230 LN:43691
    @SQ SN:chrUn_gl000242 LN:43523
    @SQ SN:chrUn_gl000243 LN:43341
    @SQ SN:chrUn_gl000241 LN:42152
    @SQ SN:chrUn_gl000236 LN:41934
    @SQ SN:chrUn_gl000240 LN:41933
    @SQ SN:chr17_gl000206_random LN:41001
    @SQ SN:chrUn_gl000232 LN:40652
    @SQ SN:chrUn_gl000234 LN:40531
    @SQ SN:chr11_gl000202_random LN:40103
    @SQ SN:chrUn_gl000238 LN:39939
    @SQ SN:chrUn_gl000244 LN:39929
    @SQ SN:chrUn_gl000248 LN:39786
    @SQ SN:chr8_gl000196_random LN:38914
    @SQ SN:chrUn_gl000249 LN:38502
    @SQ SN:chrUn_gl000246 LN:38154
    @SQ SN:chr17_gl000203_random LN:37498
    @SQ SN:chr8_gl000197_random LN:37175
    @SQ SN:chrUn_gl000245 LN:36651
    @SQ SN:chrUn_gl000247 LN:36422
    @SQ SN:chr9_gl000201_random LN:36148
    @SQ SN:chrUn_gl000235 LN:34474
    @SQ SN:chrUn_gl000239 LN:33824
    @SQ SN:chr21_gl000210_random LN:27682
    @SQ SN:chrUn_gl000231 LN:27386
    @SQ SN:chrUn_gl000229 LN:19913
    @SQ SN:chrM LN:16571
    @SQ SN:chrUn_gl000226 LN:15008
    @SQ SN:chr18_gl000207_random LN:4262
    @PG ID:bwa PN:bwa VN:0.5.9-r16
    D7DHSVN1_0087:4:1:1238:1975#0 77 * 0 0 * * 0 0 TATGCCTTTAAGTTAACTGACATTTTCTTCTGCAATGTCTTTCTCTTCTGTTAATGCCACTCAGTGACAATTTTATTTTATATTATGTTCC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTTTGGTTGATTTTTTTTGTATT XC:i:91
    D7DHSVN1_0087:4:1:1238:1975#0 141 * 0 0 * * 0 0 CTTTCTGATTTTCTGTATATGTGTAATCATATTCATTTGCAAATAAAGAGATATTTTATAATTAGTGGTGTTTGTTATACTAGCCATGAGT cbceSd\bbcc_Vaddd^c\WXc`c`cdd\cN^XbM]aaV_VY^_Y]d_adBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTTTGGTTGATTTTTTTTGTATT XC:i:91
    D7DHSVN1_0087:4:1:1304:1903#0 77 * 0 0 * * 0 0 ATGTATGATGGAGCTTCTGCCAACCATGGTGGGAAGAACTGATGCACAACTCATTCGCACCAAGGTATACGACTGAGACACTAGAATGGAG [V^^U^^^V^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGTGGTGGGGATTTTGGTGGGCAG XC:i:91
    D7DHSVN1_0087:4:1:1304:1903#0 141 * 0 0 * * 0 0 TAAAATTACTGTCTGATCCCTGTGGACTTATGTGGACTGGTGTATGTTTTATTACAAAGGATTAATGTAAAGACAAACACAATGCACAACA bdfaNddcN\c_[\aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGTGGTGGGGATTTTGGTGGGCAG XC:i:91
    D7DHSVN1_0087:4:1:1492:1904#0 77 * 0 0 * * 0 0 GTTTACGACATAAGGGAAGAGCCTTATGCTCCTCTTGTGGACTCCTGCCATGCAAAGAGCCAGCGCTTCTATACTCTGGCAGCTTCAGCTG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGGTGGTGTGTTTGTGGGAA XC:i:91
    D7DHSVN1_0087:4:1:1492:1904#0 141 * 0 0 * * 0 0 GGAAAGAGAAAATGGATTTAGACTTCTCCAGGATTTTCTGATTACAAAAAAAATTGAGCATGCTCTCACACAGGAGCACAGAGTGCACACA `cb`cZWcdb\aa^WZcacWbN^U_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGGTGGTGTGTTTGTGGGAA XC:i:91
    D7DHSVN1_0087:4:1:1362:1923#0 77 * 0 0 * * 0 0 TTTTGGGGAGGACCGTCACTCTCCCTTGGGATACTGATATCCAAATTGTAACTTCTCAGTAGGGTTTGTATGAACTGTACTTCACACACAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGGGTGGTGTGGGGGTTGGGTGTTG XC:i:91
    D7DHSVN1_0087:4:1:1362:1923#0 141 * 0 0 * * 0 0 ATTTATAAGAAAACTGTGCAAATTAAGGCAAGTGGATCACTGTTCTACGAATACCGGTGTATCAGACTGCGACAAATGTTGGACATATTAC `_c_cKaV[]PR_]U_^c\\cWbM__aUM^^]XU]UTbM_JZWOK^U_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGGGTGGTGTGGGGGTTGGGTGTTG XC:i:91
    D7DHSVN1_0087:4:1:1417:1933#0 77 * 0 0 * * 0 0 TGCAGACTTCGGTAAAATCACATGGGGTACAAGTAAACAGTGCCAGTACAAGGATAAGCTGACAAAACTCTATACCCTGGCTAACACTAGG _____V_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTGGGTGTGTGTTTTTTGTGCGT XC:i:91
    D7DHSVN1_0087:4:1:1417:1933#0 141 * 0 0 * * 0 0 AAAAGGTAGAAGGTGATGAGGGGGAACCATATGGCCCTGTGTCTGTCAACACAAATCCATAACGGCTGAAGTACTCCCGCCAATGTCGTGA aefccdNcccabcXcbcbccb`ec^d^^__\PY`]Kb^_\_Q^Q]PXZ]][[ODUFPUTXMSJSSLV^[_^BBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTGGGTGTGTGTTTTTTGTGCGT XC:i:91
    D7DHSVN1_0087:4:1:1472:1944#0 77 * 0 0 * * 0 0 GGAGAAGACAGGATGAGATCAGTTCAAAGCAGATGCCTGGACAGTGGCGGCAGTGTTGTACACGATCCACATATCGGCACAATGTATATGG ^UUY[ZYM[][V^^UX\Z^^X^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTTTGTGTGTCCAGTGGTGGGGGTAA XC:i:91
    D7DHSVN1_0087:4:1:1472:1944#0 141 * 0 0 * * 0 0 TATTTGTGTGGTTAAAAATGAAAAATGAGCACTATTTGTACACCCAACATTCATACAGTGAACATGATATACAGCGCACAGCACACTCACA da_^N]\IJZZNMZ^TXf_d_d_a`KY`SL[_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTTTGTGTGTCCAGTGGTGGGGGTAA XC:i:91
    D7DHSVN1_0087:4:1:1449:1965#0 77 * 0 0 * * 0 0 ACTTTACCAAAAGTCATTCATTTCAGAAAGTCACATAATAACATTACAACATATATCAGTCTGCAGTGTAACTGTCACCTGCATAATACAA __UX\]V^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGTGGGACATTTTGTGGCAC XC:i:91
    D7DHSVN1_0087:4:1:1449:1965#0 141 * 0 0 * * 0 0 TTTTTTGGCTATCTTGAAGTGGCATTCATAGATAAGTTAAATGGCCTCACATGTAACTTTTAAAAAGTTGATATCATGCTCAAGCGCTATA ^^V__BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGTGGGACATTTTGTGGCAC XC:i:91
    D7DHSVN1_0087:4:1:1317:1991#0 77 * 0 0 * * 0 0 GGGATCTCGCCATGTTGCCCAGACTAGGTATTCTATTTTAGATCCATGTACTTCTTGACGCAATGCTGTGTGAGGAGCAGCTGGCACTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGTTTTTTGGGATTGGGGGGGTTGCA XC:i:91
    D7DHSVN1_0087:4:1:1317:1991#0 141 * 0 0 * * 0 0 TACCTGGGGTCCTAGCTATTTGAGAGGCTCAGATGCGCGGGTTAATTGAGGCGCGTGGGACAACGCGTGGACGTAGCAGAGTGTGTCGACA b][`_VM_L]Y]a\d\d`aN[bBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGTTTTTTGGGATTGGGGGGGTTGCA XC:i:91
    D7DHSVN1_0087:4:1:1637:1912#0 77 * 0 0 * * 0 0 ACGCGAGAAGAAAGTGAGGACACGGAGAGCGATGGAGATGATGACGATCTTGTTTGCAAAGGGTCAACAGGAGGACAGACGCGCAAGACAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTGGTGGTGAAGGTTTTGGGACTT XC:i:91
    D7DHSVN1_0087:4:1:1637:1912#0 141 * 0 0 * * 0 0 GCGCCGACCCGTGACTGGTAGGTGTCTAGTAGTTTTGGGGGGGCTCTCGTATTGAGTGTGAGACCGGCTGCGCGCGGCACACAGCACGACG \\cb`Na\VTOM_^^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTGGTGGTGAAGGTTTTGGGACTT XC:i:91
    D7DHSVN1_0087:4:1:1728:1914#0 99 chr2 140326609 29 1S73M17S = 140326814 296 ACAGGAGAAAAAAAAAAAGCATATAAAGTGCCTCAACCAACTAGACATTGCTATGAAAAAAAACAAACACACAATGACAATTATCACGACGAAAAAAAAAAAAAqAAAAAAAoe n oeAAAAAAAAAAAAAqAAAAAAACGTCGTGATAATTGTCATTGTGTGTTTGTTTTTTTTCATAGCAATGTCTAGTTGGTTGAGGCACTTTATATGCTTTTTTTTTTTCTCCTGTAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:inn::ifi
    flAiAAAAAAAAAqAAAAAAATTTACTGAACTATGAAAACAGCAGTGCCAATGGGGAAAATGTGATTGCTGCATTAAGCAAAAATAAATAAATAAAAGACAGAAACCCCAACAAAAAAAAAAAAAqAAAAAAAb??bcAAAAAAAAAAAAAqAAAAAAAGTTGGGGTTTCTGTCTTTTATTTATTTATTTTTGCTTAATGCAGCAATCACATTTTCCCCATTGGCACTGCTGTTTTCATAGTTCAGTAAAAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:ih:
    :ifehflAiAAAAAAAAAqAAAAAAAAAATGCCCTGAGGTAGTCTTATTTGTGTTAAATCTGCTTGGTGCTCTATCACATTCTTGTACTTCAATGTTTATCTTTCCTTATGTTTGGAAAAAAAAAAAAAAqAAAAAAAlonooonoaoooa__awbbbblààacAAAAAAAAAAAAAqAAAAAAATCCAAACATAAGGAAAGATAAACATTGAAGTACAAGAATGTGATAGAGCACCAAGCAGATTTAACACAAATAAGACTACCTCAGGGCATTTAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:ih
    :if
    fflAiAAAAAAAAAqAAAAAAAGTAGATTACCGTCTCTCATGGGGTTAGAAGTTGATCTACCCTAGGGTTTTCTATCCTTGCTTTCACTGATATATATTTCTCCTGGTAGTTTAAAAAAAAAAAAAqAAAAAAAaacclccaooooooo oe AAAAAAAAAAAAAqAAAAAAAAAACTACCAGGAGAAATATATATCAGTGAAAGCAAGGATAGAAAACCCTAGGGTAGATCAACTTCTAACCCCATGAGAGACGGTAATCTACAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:inen:if
    fflAiAAAAAAAAAqAAAAAAATGACGGTAGAACAACTTCAAATGAGAAAGCATCTGTTGGGGTTTGCTTATTTTGCAATGTAGGAATAAACAGACTTCAAACCTGTGACACAAAAAAAAAAAAAAqAAAAAAAaaccnbaccaccAAAAAAAAAAAAAqAAAAAAATGTGTCACAGGTTTGAAGTCTGTTTATTCCTACATTGCAAAATAAGCAAACCCCAACAGATGCTTTCTCATTTGAAGTTGTTCTACCGTCAAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:ih:n:ifhnflAiAAAAAAAAAqAAAAAAAAGTAGTTAGTATTGGTAGCATAATGATGACCTGAATCCCGGCTCATTAATTCAGTGGTCTACTTCACCTTTTCAATGTCACTCACCAAATAAAAAAAAAAAAAAqAAAAAAAAAAAAAAAAAAAAqAAAAAAATATTTGGTGAGTGACATTGAAAAGGTGAAGTAGACCACTGAATTAATGAGCCGGGATTCAGGTCATCATTATGCTACCAATACTAACTACTAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
    :i:ihnl:iff
    flAiAAAAAAAAAqAAAAAAAAGCGGTGTAGGAGTACAATAAGAAATAGATACTTGGGAAATAAAAAATTATATTCCATACACATGAATGACTTTCATTGTGCTGATACTCTAAAAAAAAAAAAAqAAAAAAAan__aZalllllalaao loaac??c n AAAAAAAAAAAAAqAAAAAAAAGAGTATCAGCACAATGAAAGTCATTCATGTGTATGGAATATAATTTTTTATTTCCCAAGTATCTATTTCTTATTGTACTCCTACACCGCTAAAAAAAAAAAAAiAAAAAAAndàbioll:n:
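
Editorial sketch (not part of the original post): one way to locate the first malformed record in a file like PEaln_4.sam. It assumes only the basic SAM layout: header lines start with `@`, and every alignment line has at least 11 tab-separated fields with SEQ and QUAL of equal length. The function names are hypothetical, and the SEQ alphabet check is a deliberately strict heuristic, stricter than the SAM specification.

```python
def check_sam_line(line):
    """Return None if the alignment line looks well formed, else a reason."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return "fewer than 11 tab-separated fields"
    flag, pos, mapq = fields[1], fields[3], fields[4]
    if not (flag.isdigit() and pos.isdigit() and mapq.isdigit()):
        return "FLAG, POS or MAPQ is not a non-negative integer"
    seq, qual = fields[9], fields[10]
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        return "SEQ and QUAL lengths differ"
    # Heuristic only: the SAM spec allows more letters than this.
    if seq != "*" and not all(c in "ACGTNacgtn=." for c in seq):
        return "SEQ contains unexpected characters"
    return None


def first_bad_sam_line(lines):
    """Return (line_number, reason) for the first bad record, or None."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        reason = check_sam_line(line)
        if reason is not None:
            return number, reason
    return None
```

To run it on a real file, pass an open file handle, for example `first_bad_sam_line(open("PEaln_4.sam", errors="replace"))`; `errors="replace"` keeps the scan going when the corrupted region contains bytes that are not valid text.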

  • #2
    Check to make sure you didn't accidentally send stderr to stdout in your previous steps.
    This is a common problem and may be what's biting you.
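
Editorial sketch: bwa prints its progress messages to stderr and its results to stdout, so a plain `>` redirect should capture only the results. If the two streams were merged, bracketed log lines would show up inside the output file. The snippet below scans text lines for anything shaped like such a message. The pattern is an assumption based on the log lines quoted in this thread, and `find_log_lines` is a hypothetical name.

```python
import re

# Assumed shape of a bwa progress message, e.g. "[infer_isize] ...".
LOG_LINE = re.compile(r"^\[[A-Za-z0-9_]+\] ")


def find_log_lines(lines):
    """Return (line_number, text) for every line that looks like a log message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if LOG_LINE.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

An empty result for a SAM file suggests that stderr was not mixed into it. The check only applies to text output; the binary .sai files need a different kind of scan.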



    • #3
      Hi Richard,

      Thank you for your reply. I'm a bit of a newbie at this and don't actually know which step(s) could have directed stderr to stdout. The commands I used are given below. Did I mess something up?
      $ bwa aln -B 13 -t 1 hg19.fa s_4_1_sequence.txt > BWA_4_1.sai
      $ bwa aln -B 13 -t 1 hg19.fa s_4_2_sequence.txt > BWA_4_2.sai

      $ bwa samse hg19.fa BWA_4_1.sai s_4_1_sequence.txt > SEaln_4_1.sam
      $ bwa samse hg19.fa BWA_4_2.sai s_4_2_sequence.txt > SEaln_4_2.sam

      $ bwa sampe hg19.fa BWA_4_1.sai BWA_4_2.sai s_4_1_sequence.txt s_4_2_sequence.txt > PEaln_4.sam
      Also, the following output was generated by the sampe run. Any idea what "segmentation fault" means, or what would have caused it?
      [bwa_sai2sam_pe_core] convert to sequence coordinate...
      [infer_isize] (25, 50, 75) percentile: (236, 281, 305)
      [infer_isize] low and high boundaries: 98 and 443 for estimating avg and std
      [infer_isize] inferred external isize from 150034 pairs: 263.375 +/- 64.224
      [infer_isize] skewness: -1.005; kurtosis: 0.302; ap_prior: 1.19e-05
      [infer_isize] inferred maximum insert size: 708 (6.93 sigma)
      [bwa_sai2sam_pe_core] time elapses: 27.33 sec
      [bwa_sai2sam_pe_core] changing coordinates of 6985 alignments.
      [bwa_sai2sam_pe_core] align unmapped mate...
      [bwa_paired_sw] 41375 out of 44459 Q17 singletons are mated.
      [bwa_paired_sw] 817 out of 4055 Q17 discordant pairs are fixed.
      [bwa_sai2sam_pe_core] time elapses: 23.21 sec
      [bwa_sai2sam_pe_core] refine gapped alignments... 2.38 sec
      [bwa_sai2sam_pe_core] print alignments... Segmentation fault
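
Editorial sketch: the `[infer_isize]` numbers in the log above can be cross-checked by hand, which helps confirm that the insert-size estimate itself was sane before the crash. The snippet rests on one assumption that the thread does not state: that the low and high boundaries are the quartiles pushed outward by twice the interquartile range. The logged values are consistent with that reading.

```python
# Values copied from the [infer_isize] log lines above.
p25, p50, p75 = 236, 281, 305
mean, std = 263.375, 64.224
sigmas = 6.93

# Assumption: boundaries sit 2 interquartile ranges outside the quartiles.
iqr = p75 - p25
low = p25 - 2 * iqr
high = p75 + 2 * iqr

# Maximum insert size as the mean plus the logged number of sigmas.
max_isize = round(mean + sigmas * std)

print(low, high, max_isize)  # prints: 98 443 708
```

All three values match the log (98, 443 and 708), so the failure is unlikely to come from the insert-size inference step.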



      • #4
        It looks good. I had to look up the barcode parameter [-B]; I've never used it.
        If that's your script or what you typed, it looks fine.

        If samse worked but sampe failed ... hmmmm. I'd start by looking at the executable. A segfault just means the program tried to access memory it shouldn't have: somewhere the code used a bad address. This is common in errant Unix programs. The segfault message and the abrupt end of your output are consistent with each other.

        Did you download the BWA source and type "make", or did someone play with the makefile first and add "-O3" [the super-optimize parameter]? Are you on Linux? Do you have 4+ GB of memory? Not having enough memory is very bad. The BWA source doesn't check return values [not necessarily a bad thing: it runs faster, but it assumes there's no problem with input or memory].

        I have two desperation suggestions: 1) run with default parameters [I doubt this is the problem] and 2) run the "strings" command on your sai input files, "strings BWA_4_2.sai" for instance. See any inconsistent junk that snuck in there?



        • #5
          All I have to say is.... good job! In all my time developing my parallel version of BWA I never once encountered an error like this.

          I doubt it's a lack of memory - lack of memory would cause a segfault during the "convert to sequence co-ordinate" or "align unmapped mate" stages... you wouldn't even get to the print stage.

          Can you provide the first 50 or so lines of one of your successfully generated .sam files using samse? Just enough to get to the first 5 sequences or so.



          • #6
            Thank you for the replies.

            Richard - I use a server the university has set up for bioinformatics work. It has 100 GB of RAM and runs Unix. BWA was installed by the server admin. I don't think he modified the makefile, but I will check. Thanks! I also ran strings on the SAI file. This gave a lot of output, but I don't know what inconsistencies to look for in it. Can you advise?

            dp05yk - Not quite the accomplishment I was going for... The first lines including the first ~dozen reads in the samse file are provided below. Sorry for length (long header) and thank you for your help.
            $ head -n 115 SEaln_4_1.sam
            @SQ SN:chr1 LN:249250621
            @SQ SN:chr2 LN:243199373
            @SQ SN:chr3 LN:198022430
            @SQ SN:chr4 LN:191154276
            @SQ SN:chr5 LN:180915260
            @SQ SN:chr6 LN:171115067
            @SQ SN:chr7 LN:159138663
            @SQ SN:chrX LN:155270560
            @SQ SN:chr8 LN:146364022
            @SQ SN:chr9 LN:141213431
            @SQ SN:chr10 LN:135534747
            @SQ SN:chr11 LN:135006516
            @SQ SN:chr12 LN:133851895
            @SQ SN:chr13 LN:115169878
            @SQ SN:chr14 LN:107349540
            @SQ SN:chr15 LN:102531392
            @SQ SN:chr16 LN:90354753
            @SQ SN:chr17 LN:81195210
            @SQ SN:chr18 LN:78077248
            @SQ SN:chr20 LN:63025520
            @SQ SN:chrY LN:59373566
            @SQ SN:chr19 LN:59128983
            @SQ SN:chr22 LN:51304566
            @SQ SN:chr21 LN:48129895
            @SQ SN:chr6_ssto_hap7 LN:4928567
            @SQ SN:chr6_mcf_hap5 LN:4833398
            @SQ SN:chr6_cox_hap2 LN:4795371
            @SQ SN:chr6_mann_hap4 LN:4683263
            @SQ SN:chr6_apd_hap1 LN:4622290
            @SQ SN:chr6_qbl_hap6 LN:4611984
            @SQ SN:chr6_dbb_hap3 LN:4610396
            @SQ SN:chr17_ctg5_hap1 LN:1680828
            @SQ SN:chr4_ctg9_hap1 LN:590426
            @SQ SN:chr1_gl000192_random LN:547496
            @SQ SN:chrUn_gl000225 LN:211173
            @SQ SN:chr4_gl000194_random LN:191469
            @SQ SN:chr4_gl000193_random LN:189789
            @SQ SN:chr9_gl000200_random LN:187035
            @SQ SN:chrUn_gl000222 LN:186861
            @SQ SN:chrUn_gl000212 LN:186858
            @SQ SN:chr7_gl000195_random LN:182896
            @SQ SN:chrUn_gl000223 LN:180455
            @SQ SN:chrUn_gl000224 LN:179693
            @SQ SN:chrUn_gl000219 LN:179198
            @SQ SN:chr17_gl000205_random LN:174588
            @SQ SN:chrUn_gl000215 LN:172545
            @SQ SN:chrUn_gl000216 LN:172294
            @SQ SN:chrUn_gl000217 LN:172149
            @SQ SN:chr9_gl000199_random LN:169874
            @SQ SN:chrUn_gl000211 LN:166566
            @SQ SN:chrUn_gl000213 LN:164239
            @SQ SN:chrUn_gl000220 LN:161802
            @SQ SN:chrUn_gl000218 LN:161147
            @SQ SN:chr19_gl000209_random LN:159169
            @SQ SN:chrUn_gl000221 LN:155397
            @SQ SN:chrUn_gl000214 LN:137718
            @SQ SN:chrUn_gl000228 LN:129120
            @SQ SN:chrUn_gl000227 LN:128374
            @SQ SN:chr1_gl000191_random LN:106433
            @SQ SN:chr19_gl000208_random LN:92689
            @SQ SN:chr9_gl000198_random LN:90085
            @SQ SN:chr17_gl000204_random LN:81310
            @SQ SN:chrUn_gl000233 LN:45941
            @SQ SN:chrUn_gl000237 LN:45867
            @SQ SN:chrUn_gl000230 LN:43691
            @SQ SN:chrUn_gl000242 LN:43523
            @SQ SN:chrUn_gl000243 LN:43341
            @SQ SN:chrUn_gl000241 LN:42152
            @SQ SN:chrUn_gl000236 LN:41934
            @SQ SN:chrUn_gl000240 LN:41933
            @SQ SN:chr17_gl000206_random LN:41001
            @SQ SN:chrUn_gl000232 LN:40652
            @SQ SN:chrUn_gl000234 LN:40531
            @SQ SN:chr11_gl000202_random LN:40103
            @SQ SN:chrUn_gl000238 LN:39939
            @SQ SN:chrUn_gl000244 LN:39929
            @SQ SN:chrUn_gl000248 LN:39786
            @SQ SN:chr8_gl000196_random LN:38914
            @SQ SN:chrUn_gl000249 LN:38502
            @SQ SN:chrUn_gl000246 LN:38154
            @SQ SN:chr17_gl000203_random LN:37498
            @SQ SN:chr8_gl000197_random LN:37175
            @SQ SN:chrUn_gl000245 LN:36651
            @SQ SN:chrUn_gl000247 LN:36422
            @SQ SN:chr9_gl000201_random LN:36148
            @SQ SN:chrUn_gl000235 LN:34474
            @SQ SN:chrUn_gl000239 LN:33824
            @SQ SN:chr21_gl000210_random LN:27682
            @SQ SN:chrUn_gl000231 LN:27386
            @SQ SN:chrUn_gl000229 LN:19913
            @SQ SN:chrM LN:16571
            @SQ SN:chrUn_gl000226 LN:15008
            @SQ SN:chr18_gl000207_random LN:4262
            @PG ID:bwa PN:bwa VN:0.5.9-r16
            D7DHSVN1_0087:4:1:1238:1975#0 4 * 0 0 * * 0 0 TATGCCTTTAAGTTAACTGACATTTTCTTCTGCAATGTCTTTCTCTTCTGTTAATGCCACTCAGTGACAATTTTATTTTATATTATGTTCC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTTTGGTTGA
            D7DHSVN1_0087:4:1:1304:1903#0 4 * 0 0 * * 0 0 ATGTATGATGGAGCTTCTGCCAACCATGGTGGGAAGAACTGATGCACAACTCATTCGCACCAAGGTATACGACTGAGACACTAGAATGGAG [V^^U^^^V^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGTGGTGGGGA
            D7DHSVN1_0087:4:1:1492:1904#0 4 * 0 0 * * 0 0 GTTTACGACATAAGGGAAGAGCCTTATGCTCCTCTTGTGGACTCCTGCCATGCAAAGAGCCAGCGCTTCTATACTCTGGCAGCTTCAGCTG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGGTGGT
            D7DHSVN1_0087:4:1:1362:1923#0 4 * 0 0 * * 0 0 TTTTGGGGAGGACCGTCACTCTCCCTTGGGATACTGATATCCAAATTGTAACTTCTCAGTAGGGTTTGTATGAACTGTACTTCACACACAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGGGTGGTGTGG
            D7DHSVN1_0087:4:1:1417:1933#0 4 * 0 0 * * 0 0 TGCAGACTTCGGTAAAATCACATGGGGTACAAGTAAACAGTGCCAGTACAAGGATAAGCTGACAAAACTCTATACCCTGGCTAACACTAGG _____V_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTGGGTGTGT
            D7DHSVN1_0087:4:1:1472:1944#0 4 * 0 0 * * 0 0 GGAGAAGACAGGATGAGATCAGTTCAAAGCAGATGCCTGGACAGTGGCGGCAGTGTTGTACACGATCCACATATCGGCACAATGTATATGG ^UUY[ZYM[][V^^UX\Z^^X^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTTTGTGTGTCCA
            D7DHSVN1_0087:4:1:1449:1965#0 4 * 0 0 * * 0 0 ACTTTACCAAAAGTCATTCATTTCAGAAAGTCACATAATAACATTACAACATATATCAGTCTGCAGTGTAACTGTCACCTGCATAATACAA __UX\]V^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGGTGGTGGGA
            D7DHSVN1_0087:4:1:1317:1991#0 4 * 0 0 * * 0 0 GGGATCTCGCCATGTTGCCCAGACTAGGTATTCTATTTTAGATCCATGTACTTCTTGACGCAATGCTGTGTGAGGAGCAGCTGGCACTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGTTTTTTGGGAT
            D7DHSVN1_0087:4:1:1637:1912#0 4 * 0 0 * * 0 0 ACGCGAGAAGAAAGTGAGGACACGGAGAGCGATGGAGATGATGACGATCTTGTTTGCAAAGGGTCAACAGGAGGACAGACGCGCAAGACAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTGGTGGTGAA
            D7DHSVN1_0087:4:1:1728:1914#0 4 * 0 0 * * 0 0 ACAGGAGAAAAAAAAAAAGCATATAAAGTGCCTCAACCAACTAGACATTGCTATGAAAAAAAACAAACACACAATGACAATTATCACGACG _ac\c_aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGGTGTTGGCTT
            D7DHSVN1_0087:4:1:1684:1936#0 16 chr15 76521931 37 91M * 0 0 GTTGGGGTTTCTGTCTTTTATTTATTTATTTTTGCTTAATGCAGCAATCACATTTTCCCCATTGGCACTGCTGTTTTCATAGTTCAGTAAA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB^VTTV BC:Z:NGGGTTGGGTTTC XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:5A6T20T57
            D7DHSVN1_0087:4:1:1564:1949#0 4 * 0 0 * * 0 0 AAATGCCCTGAGGTAGTCTTATTTGTGTTAAATCTGCTTGGTGCTCTATCACATTCTTGTACTTCAATGTTTATCTTTCCTTATGTTTGGA [_\___\_X___ZMYYXWVVVV[SSZ^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGTTGGGTTATT
            D7DHSVN1_0087:4:1:1737:1949#0 16 chr1 57881606 37 91M * 0 0 AAACTACCAGGAGAAATATATATCAGTGAAAGCAAGGATAGAAAACCCTAGGGTAGATCAACTTCTAACCCCATGAGAGACGGTAATCTAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcccca_c_______Z^^[^^ZZ BC:Z:NGGGTTTTGTACA XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:80A10
            D7DHSVN1_0087:4:1:1682:1962#0 4 * 0 0 * * 0 0 TGACGGTAGAACAACTTCAAATGAGAAAGCATCTGTTGGGGTTTGCTTATTTTGCAATGTAGGAATAAACAGACTTCAAACCTGTGACACA ZX^^\VX^^Z^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGTGTGGTGGAAT
            D7DHSVN1_0087:4:1:1620:1994#0 4 * 0 0 * * 0 0 AGTAGTTAGTATTGGTAGCATAATGATGACCTGAATCCCGGCTCATTAATTCAGTGGTCTACTTCACCTTTTCAATGTCACTCACCAAATA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGTTGGGGTCTC
            D7DHSVN1_0087:4:1:1895:1905#0 16 chr2 193690823 37 91M * 0 0 AGAGTATCAGCACAATGAAAGTCATTCATGTGTATGGAATATAATTTTTTATTTCCCAAGTATCTATTTCTTATTGTACTCCTACACCGCT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBc\cc^TT^ZX_[c_XZ[RX[[[[[ZGZYY\Z BC:Z:NGTGTGTGGTTCA XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:3T84A2
            D7DHSVN1_0087:4:1:1820:1909#0 4 * 0 0 * * 0 0 ATATTCAACATTCTTAGACAAAAAATCGTCAACCAATAAATTCATATTCAGACAACATAAGCTTCCTAAGTGAACATGAAATAACTTTGTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGGGTGGGGTCTT
            D7DHSVN1_0087:4:1:1979:1915#0 0 chr8 129688366 25 91M * 0 0 TTCTGTTCTGTTGGTCTATGTGTCTGTTTTTGTGCCAGTACCATACTGTTTTGGTTAACGTACTCTTCTATTATAGTTTGAATTCAGGGAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NGTGTGTTGTCTA XT:A:U NM:i:4 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:58T8A4C15T2
            D7DHSVN1_0087:4:1:1786:1927#0 16 chr18 49501580 37 91M * 0 0 TGGCTTTATTTCTGGATTCTCTATTCTGTTCCATTGGTCTATGTGCCCATTTGTATACCAGTGCCATGCGAATAATCAGAGAGATGTAAAT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB____Z______cMac_______cXc__ BC:Z:NGGTGTGGTTTTG XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:62A28
            D7DHSVN1_0087:4:1:1916:1924#0 16 chr2 158251005 25 91M * 0 0 TGAAAAACTTAGAGATGAAGAGAAGGGGATCTGTAGACTGATAGAACAGCATAAATAAAACCCTAATGCCAAGGAAGCTTGGGAGTGAGAT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_U_\_VVQOMFPERTGUQUU_\XacV\^^Vcc_______ZX\ZU\__ BC:Z:NTTTTGTGTTCTC XT:A:U NM:i:4 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:1C5T5A74A2
            D7DHSVN1_0087:4:1:1759:1968#0 4 * 0 0 * * 0 0 ATGAGATCTGCTGAGGAAAGAGGGCCACCCCATGAGAATCTCCAAAATGCGAGGCTACGCAGACACAGGGCAAAGCTGAAAAGGAGACATG _SYXYMY^\Z\XXPXZVZ^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BC:Z:NTGTTGGGTGTGG



            • #7
              You'd see a text error message or some other nonsense. sai files are binary, so you should NOT see anything human-readable.
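
Editorial sketch: a rough Python stand-in for the `strings` check discussed here. It returns runs of printable ASCII from a block of bytes. Short printable runs occur by chance in any binary file, which is probably why `strings` produced a lot of output, so the sketch only reports long runs. The name `printable_runs` and the default threshold of 16 bytes are assumptions chosen for illustration.

```python
import re


def printable_runs(data, min_length=16):
    """Return runs of printable ASCII of at least min_length bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

For a large .sai file it is enough to test a prefix, for example `printable_runs(open("BWA_4_2.sai", "rb").read(10_000_000))`. Anything that reads like a sentence or a bracketed log message is the kind of junk to worry about.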

              Are other people using the 100GB? It's free memory that's important here.

              I need me one of them 100GB machines. *sigh*

              Other than that, I'm truly stumped.



              • #8
                I've generated a similar error, if anyone is interested in debugging.

                I used bwa aln to align read 1 and read 3 separately (read 2 is an unorthodox index read), then bwa sampe to try to do the paired-end mapping.

                First time around, I got:
                ...
                [bwa_sai2sam_pe_core] print alignments... 1289.32 sec
                *** glibc detected *** bwa: double free or corruption (out): 0x0000000014075600 ***

                ...

                so (just for fun) I tried
                export MALLOC_CHECK_=0
                before running bwa sampe.

                That ended with
                ...
                [bwa_sai2sam_pe_core] print alignments... 1816.72 sec
                Segmentation fault


                bwa sampe is writing .sam files of >70Gb, and the mapping info is not properly formed, e.g.:
                ...
                @SQ SN:chrY LN:59373566
                @PG ID:bwa PN:bwa VN:0.5.9-r16
                HWUSI-EAS1795_0000:6:1:2046:1016#0 73 chr19 57111889 37 40M = 57111889 0 AGCCATCCCCACATTCCCCCCACCCTTCCACTACCCTTCC^@AAAAAAA^@
                ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@AAAAAAAAAAAAAAAAiAAAAAAAGGAAGGGTAGTGGAAGGGTGGGGGGAATGTGGGGATGGCTiAAAAAAAdw[<E0>;R<AC>^@<E0>inf^@ollll:h:i:nl

                ...

                125 Gb RAM, ~80 Gb free at the moment, bwa doesn't seem to be using much memory, plenty of disk space available.

                I'm also using the -B option on the first read ... I betcha ... yep.

                So instead of using -B to trim the barcode off of the first read on the fly, I trimmed it with a perl script, re-ran bwa aln on the newly-trimmed read, then ran bwa sampe on the .sai files and the trimmed .fastq file instead of the original one. The output .sam file is now properly formed (which is not to say the data are any good, but that's a different question.)
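
Editorial sketch: the Perl script mentioned in this post is not shown, so the following is only a Python illustration of the same workaround, not the author's code. It removes the first `n` bases and qualities from each 4-line FASTQ record, so that bwa aln can be run without -B. The barcode length `n` is a parameter (the original poster's commands used -B 13), and `trim_fastq` is a hypothetical name.

```python
from itertools import islice


def trim_fastq(lines, n):
    """Yield 4-line FASTQ records with the first n bases and qualities removed."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if len(seq) != len(qual):
            raise ValueError("SEQ/QUAL length mismatch in " + header)
        yield header, seq[n:], plus, qual[n:]
```

Each yielded tuple can be written back out with `"\n".join(record) + "\n"`. The trimmed file then has to be used for both bwa aln and bwa sampe, as described above.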

                HTH.
                Last edited by David Witherspoon; 07-09-2011, 02:38 PM.



                • #9
                  Today I ran into the same problem you described.
                  I am aligning paired-end whole-genome sequencing data (r1.fastq is 140G, r2.fastq is 130G)
                  on CentOS 6 with 64G of memory.
                  When I run BWA sampe, I get an incomplete SAM file: the number of reads in the SAM file is smaller than the number of reads in the FASTQ files. Normally, the number of reads in the SAM file should equal the total in both FASTQ files (r1.fastq + r2.fastq).
                  However, the bwa sampe command reports no error.
                  I have run it several times and got the same result each time.
                  What's the problem? Not enough memory?
                  The last line of my SAM file:
                  HWI-ST966:590B5UACXX:1:1302:17663:189522 163 chr1 179091721 60 100M= 179091948 327 AAATATATATATATTTTCAAATATATATATACTCAAATTATGTATTTTTTCAAATATATATTTCTTTTTCTTTTTTTCTTTTTTTTGAGACAGAGTCTCA <@@FFFDDBHFBHDHIIJJIGGCAIIIA<FEHGHHJII
                  incomplete!!!
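
Editorial sketch: the completeness rule stated in this post (alignment records in the SAM file should equal the reads in r1.fastq plus r2.fastq) can be checked mechanically. The snippet assumes plain 4-line FASTQ records and treats every SAM line that does not start with `@` as one alignment record. All function names are hypothetical.

```python
def count_fastq_records(lines):
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    total = sum(1 for _ in lines)
    if total % 4:
        raise ValueError("line count is not a multiple of 4")
    return total // 4


def count_sam_records(lines):
    """Count alignment lines (everything that is not an @ header line)."""
    return sum(1 for line in lines if not line.startswith("@"))


def sam_is_complete(fastq1_lines, fastq2_lines, sam_lines):
    """True if the SAM holds one record per read from both FASTQ inputs."""
    expected = count_fastq_records(fastq1_lines) + count_fastq_records(fastq2_lines)
    return count_sam_records(sam_lines) == expected
```

Counting FASTQ lines and dividing by four avoids miscounting quality strings that happen to begin with `@`.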
                  Last edited by wanguan2000; 10-24-2011, 04:44 AM.



                  • #10
                    Please post the first few lines of both input FASTQ files.
                    In cases like this, the problem is usually that the FASTQ is not pristine.
                    BWA doesn't do a great job of checking for good input; it just goes at it. Skipping those checks makes BWA considerably faster, though, which is a good thing.
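
Editorial sketch: since BWA does little input checking, a pre-flight check of the two FASTQ files can catch the usual problems before alignment. The snippet below is a minimal illustration, not a full validator. It assumes 4-line records and mate names that differ at most by a trailing `/1` or `/2`. All function names are hypothetical.

```python
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield (name, seq, qual) tuples, raising ValueError on malformed records."""
    it = iter(lines)
    number = 0
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return
        number += 1
        if len(record) != 4:
            raise ValueError(f"record {number}: truncated")
        header, seq, plus, qual = record
        if not header.startswith("@"):
            raise ValueError(f"record {number}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {number}: third line does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {number}: SEQ and QUAL lengths differ")
        yield header[1:].split(" ")[0], seq, qual


def strip_mate_suffix(name):
    """Remove a trailing /1 or /2 mate marker, if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs(lines1, lines2):
    """Return the number of pairs, raising ValueError if the files disagree."""
    count = 0
    for a, b in zip_longest(read_fastq(lines1), read_fastq(lines2)):
        if a is None or b is None:
            raise ValueError(f"files differ in length after {count} pairs")
        if strip_mate_suffix(a[0]) != strip_mate_suffix(b[0]):
            raise ValueError(f"pair {count + 1}: names differ: {a[0]} vs {b[0]}")
        count += 1
    return count
```

Calling `check_pairs(open("r1.fastq"), open("r2.fastq"))` either returns the number of pairs or stops at the first record that is truncated, malformed, or out of sync between the two files.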
