Hello all,
I am having some difficulty predicting where the breakpoints of a known large-scale inversion (~4-10 Mb, inferred from population genetics and QTL mapping) lie, using part of a reference genome and single-end 100 bp Illumina reads from a population sample. One complication is that the reference scaffold I am using is incomplete and may not contain both inversion breakpoints. My questions are:
1) I am currently using the Breakpointer script (https://github.com/ruping/Breakpointer). Is there a better program for predicting structural variants from single-end reads?
2) Does anyone have experience with the output file produced by Breakpointer (https://github.com/ruping/Breakpointer)? I have 17,468 candidate regions, so are there particular fields in this output that would be most informative for identifying inversion breakpoints (e.g. fewer reads mapping to a particular location, or reads that are split there)? The first few lines of the output look like this:
chrscaffold935|size76515 Breakpointer Depth-Skewed 63195 63245 56.233 + . ID=chrscaffold935|size76515:63195;SIZE=51;DEPTH=29;EndsRatio=0.662;StartsRatio=1;BinomialScore=3.004;MIS=7;realMIS=2;MISRATE=0.364621;seedseq=AAAAGTGTTACCTTTAAACCCCCCT;MismatchScore=0.0161;SU=13;rank_SB=62.810;rank_SM=49.657
chrscaffold348|size845032 Breakpointer Depth-Skewed 126749 126805 26.985 + . ID=chrscaffold348|size845032:126749;SIZE=57;DEPTH=24;EndsRatio=0.577;StartsRatio=0.448;BinomialScore=1.903;MIS=5;realMIS=1;MISRATE=0.361063;seedseq=TCAGTTGATGGACGAAACCAATTTA;MismatchScore=0.00158;SU=9;rank_SB=35.475;rank_SM=18.494
chrscaffold600|size417131 Breakpointer Depth-Skewed 146682 146753 77.008 + . ID=chrscaffold600|size417131:146682;SIZE=72;DEPTH=117;EndsRatio=0.68;StartsRatio=0.235;BinomialScore=4.775;MIS=20;realMIS=0;MISRATE=0.251383;seedseq=CCACTTGATTTTAGCGATTCTGCGG;MismatchScore=0.097;SU=24;rank_SB=82.567;rank_SM=71.449
chrscaffold641|size112415 Breakpointer Depth-Skewed 10518 10545 49.758 + . ID=chrscaffold641|size112415:10518;SIZE=28;DEPTH=6;EndsRatio=1;StartsRatio=0.238;BinomialScore=9.119;MIS=2;realMIS=0;MISRATE=0.333333;seedseq=TGTTGTGACGTGTTGTTTCCGCGGC;MismatchScore=2e-09;SU=1;rank_SB=99.474;rank_SM=0.041
chrscaffold150|size493303 Breakpointer Depth-Skewed 179029 179104 60.675 + . ID=chrscaffold150|size493303:179029;SIZE=76;DEPTH=595;EndsRatio=0.499;StartsRatio=0.471;BinomialScore=1.611;MIS=90;realMIS=3;MISRATE=0.303127;seedseq=TTATTGCTAATTTAAATAAGGTTTT;MismatchScore=2.67;SU=1;rank_SB=22.620;rank_SM=98.729
chrscaffold218|size453784 Breakpointer Depth-Skewed 285154 285184 26.029 + . ID=chrscaffold218|size453784:285154;SIZE=31;DEPTH=40;EndsRatio=0.475;StartsRatio=0.476;BinomialScore=1.146;MIS=3;realMIS=0;MISRATE=0.157895;seedseq=ACCTTTAAAACTGTTTTTCTCTTAA;MismatchScore=0.0158;SU=13;rank_SB=2.611;rank_SM=49.447
chrscaffold35|size722381 Breakpointer Depth-Skewed 659136 659214 63.288 + . ID=chrscaffold35|size722381:659136;SIZE=79;DEPTH=26;EndsRatio=0.736;StartsRatio=0.195;BinomialScore=2.9;MIS=9;realMIS=0;MISRATE=0.470318;seedseq=TACCTATACATTTCCTAGGATATGT;MismatchScore=0.0567;SU=4;rank_SB=61.264;rank_SM=65.311
chrscaffold621|size428421 Breakpointer Depth-Skewed 119257 119308 24.223 + . ID=chrscaffold621|size428421:119257;SIZE=52;DEPTH=34;EndsRatio=0.487;StartsRatio=0.546;BinomialScore=1.381;MIS=5;realMIS=1;MISRATE=0.301969;seedseq=ATCGAAAAAGCTAAGGCTAAAAACC;MismatchScore=0.00676;SU=5;rank_SB=11.607;rank_SM=36.838
chrscaffold48|size965936 Breakpointer Depth-Skewed 18878 18964 22.985 + . ID=chrscaffold48|size965936:18878;SIZE=87;DEPTH=29;EndsRatio=0.662;StartsRatio=0.429;BinomialScore=2.195;MIS=4;realMIS=1;MISRATE=0.208355;seedseq=CAATTATTTTGTAAATGTTTACACG;MismatchScore=2e-09;SU=4;rank_SB=45.925;rank_SM=0.046
chrscaffold621|size428421 Breakpointer Depth-Skewed 143393 143427 70.720 + . ID=chrscaffold621|size428421:143393;SIZE=35;DEPTH=20;EndsRatio=0.852;StartsRatio=0.908;BinomialScore=5.829;MIS=3;realMIS=1;MISRATE=0.176056;seedseq=AATTTAATACAGGTACGACTGTACC;MismatchScore=0.0177;SU=24;rank_SB=90.552;rank_SM=50.887
chrscaffold348|size845032 Breakpointer Depth-Skewed 24245 24294 78.876 + . ID=chrscaffold348|size845032:24245;SIZE=50;DEPTH=29;EndsRatio=0.818;StartsRatio=0.466;BinomialScore=5.002;MIS=14;realMIS=3;MISRATE=0.590169;seedseq=TCTATATTTTGGTGCAGTCCTGTTG;MismatchScore=0.113;SU=1;rank_SB=84.565;rank_SM=73.187
chrscaffold50|size611115 Breakpointer Depth-Skewed 570718 570765 46.314 + . ID=chrscaffold50|size611115:570718;SIZE=48;DEPTH=9;EndsRatio=0.739;StartsRatio=0.805;BinomialScore=3.418;MIS=3;realMIS=0;MISRATE=0.45106;seedseq=CCTAATCCTATGTCCTTCTCCTGGC;MismatchScore=0.00275;SU=5;rank_SB=68.357;rank_SM=24.271
chrscaffold502|size248588 Breakpointer Depth-Skewed 100735 100761 54.273 + . ID=chrscaffold502|size248588:100735;SIZE=27;DEPTH=8;EndsRatio=0.785;StartsRatio=1;BinomialScore=3.581;MIS=3;realMIS=1;MISRATE=0.477707;seedseq=TGGTTCTAGGCCCTAAATCGTTAAT;MismatchScore=0.00748;SU=4;rank_SB=70.241;rank_SM=38.306
chrscaffold546|size598492 Breakpointer Depth-Skewed 475284 475375 30.852 + . ID=chrscaffold546|size598492:475284;SIZE=92;DEPTH=44;EndsRatio=0.549;StartsRatio=0.414;BinomialScore=1.776;MIS=6;realMIS=1;MISRATE=0.248385;seedseq=AAACATGTTTACATTATTATGGTAC;MismatchScore=0.00474;SU=5;rank_SB=29.983;rank_SM=31.720
chrscaffold15|size673834 Breakpointer Depth-Skewed 313308 313396 73.223 + . ID=chrscaffold15|size673834:313308;SIZE=89;DEPTH=112;EndsRatio=0.688;StartsRatio=0.275;BinomialScore=4.532;MIS=11;realMIS=2;MISRATE=0.142753;seedseq=GCCTTAATCCACGCGAATTCGATGG;MismatchScore=0.0603;SU=19;rank_SB=80.435;rank_SM=66.011
chrscaffold621|size428421 Breakpointer Depth-Skewed 380187 380235 36.980 + . ID=chrscaffold621|size428421:380187;SIZE=49;DEPTH=64;EndsRatio=0.5;StartsRatio=0.574;BinomialScore=1.609;MIS=3;realMIS=0;MISRATE=0.09375;seedseq=AATGGTTTAATGCCCGTTTTCACCA;MismatchScore=0.0185;SU=3;rank_SB=22.510;rank_SM=51.450
chrscaffold189|size410278 Breakpointer Depth-Skewed 264356 264391 83.797 + . ID=chrscaffold189|size410278:264356;SIZE=36;DEPTH=52;EndsRatio=0.666;StartsRatio=0.958;BinomialScore=3.636;MIS=22;realMIS=5;MISRATE=0.635251;seedseq=TGTAAGACTAGCGGCCGCCCGCGAC;MismatchScore=1.5;SU=14;rank_SB=70.978;rank_SM=96.616
chrscaffold155|size350779 Breakpointer Depth-Skewed 191704 191757 11.213 + . ID=chrscaffold155|size350779:191704;SIZE=54;DEPTH=18;EndsRatio=0.533;StartsRatio=0.433;BinomialScore=1.285;MIS=2;realMIS=0;MISRATE=0.208464;seedseq=GCGTAAGTCCGTTGATTGGGATCAT;MismatchScore=0.00115;SU=3;rank_SB=7.363;rank_SM=15.064
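For anyone who wants to work with these records, here is a minimal sketch (not part of Breakpointer) of how the lines above could be parsed and ranked. It assumes each line has nine whitespace-separated GFF-like columns, with the sixth column being a score and the ninth a semicolon-separated list of key=value pairs. It makes no claim about what the individual fields mean. The names parse_record and top_candidates are hypothetical.

```python
# Sketch only: parse GFF-like Breakpointer output lines and rank candidates.
# Assumption: nine whitespace-separated columns, attributes as key=value;...
# The function names here are hypothetical, not part of Breakpointer.

TEXT_KEYS = {"ID", "seedseq"}  # attributes kept as strings


def parse_record(line):
    """Turn one output line into a dict of typed fields."""
    cols = line.split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, _source, kind, start, end, score, strand, _phase, attrs = cols
    rec = {
        "seqid": seqid,
        "type": kind,
        "start": int(start),
        "end": int(end),
        "score": float(score),
        "strand": strand,
    }
    for pair in attrs.strip().split(";"):
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # skip anything that is not key=value
        if key in TEXT_KEYS:
            rec[key] = value
        else:
            try:
                rec[key] = float(value)
            except ValueError:
                rec[key] = value  # leave unknown non-numeric values as text
    return rec


def top_candidates(lines, seqid=None, key="score", n=10):
    """Return the n records with the highest value of `key`,
    optionally restricted to a single scaffold."""
    recs = [parse_record(l) for l in lines
            if l.strip() and not l.startswith("#")]
    if seqid is not None:
        recs = [r for r in recs if r["seqid"] == seqid]
    return sorted(recs, key=lambda r: r[key], reverse=True)[:n]


# Four of the lines shown above, used as example input.
EXAMPLE = [
    "chrscaffold935|size76515 Breakpointer Depth-Skewed 63195 63245 56.233 + . ID=chrscaffold935|size76515:63195;SIZE=51;DEPTH=29;EndsRatio=0.662;StartsRatio=1;BinomialScore=3.004;MIS=7;realMIS=2;MISRATE=0.364621;seedseq=AAAAGTGTTACCTTTAAACCCCCCT;MismatchScore=0.0161;SU=13;rank_SB=62.810;rank_SM=49.657",
    "chrscaffold621|size428421 Breakpointer Depth-Skewed 119257 119308 24.223 + . ID=chrscaffold621|size428421:119257;SIZE=52;DEPTH=34;EndsRatio=0.487;StartsRatio=0.546;BinomialScore=1.381;MIS=5;realMIS=1;MISRATE=0.301969;seedseq=ATCGAAAAAGCTAAGGCTAAAAACC;MismatchScore=0.00676;SU=5;rank_SB=11.607;rank_SM=36.838",
    "chrscaffold621|size428421 Breakpointer Depth-Skewed 143393 143427 70.720 + . ID=chrscaffold621|size428421:143393;SIZE=35;DEPTH=20;EndsRatio=0.852;StartsRatio=0.908;BinomialScore=5.829;MIS=3;realMIS=1;MISRATE=0.176056;seedseq=AATTTAATACAGGTACGACTGTACC;MismatchScore=0.0177;SU=24;rank_SB=90.552;rank_SM=50.887",
    "chrscaffold621|size428421 Breakpointer Depth-Skewed 380187 380235 36.980 + . ID=chrscaffold621|size428421:380187;SIZE=49;DEPTH=64;EndsRatio=0.5;StartsRatio=0.574;BinomialScore=1.609;MIS=3;realMIS=0;MISRATE=0.09375;seedseq=AATGGTTTAATGCCCGTTTTCACCA;MismatchScore=0.0185;SU=3;rank_SB=22.510;rank_SM=51.450",
]

for rec in top_candidates(EXAMPLE, seqid="chrscaffold621|size428421", n=2):
    print(rec["seqid"], rec["start"], rec["end"], rec["score"])
```

Restricting to the scaffold thought to carry the inversion and sorting on one field at a time is only one way to cut 17,468 candidates down to a list short enough to inspect by eye. Which field is the right one to sort on is exactly what I am unsure about.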
Any help would be greatly appreciated!