#1
Member
Location: FRANCE / Caen
Join Date: Apr 2010
Posts: 25
Hello,
I'm looking for a tool (or command line) to determine the % on/off target, within + or - 50 bp of each exon, from my capture file, which is not annotated (!!!). Capture: Agilent SureSelect, in-house design. Current pipeline: GAIIx Illumina, CASAVA 1.8, IGV, CNV-seq, SAMtools, BEDtools, Galaxy, NextGENe. Please HELP

#2
Member
Location: St. Louis, MO - USA
Join Date: Dec 2011
Posts: 14
Create a BED file of your Agilent SureSelect targets, use BEDtools to merge adjacent targets, and then use slopBed to add 50 bp to either side of the merged targets. Then use BEDtools bamToBed to convert your BAM file to a BED file, and use intersectBed to intersect your bam.bed with the target.bed. This will create a BED file of the target regions covered by your BAM file, which you can then parse for percent on and off target. I think there may be examples of this workflow in the BEDtools manual available online.
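For readers who want to see what the pad-and-merge step does to the coordinates, here is a minimal sketch in plain awk and sort. It is not the BEDtools implementation: the function name `pad_and_merge` is made up for illustration, it pads first and merges afterwards (padded intervals can start to overlap), and unlike slopBed it does not clamp at chromosome ends because it has no genome file.

```shell
# pad_and_merge [PAD]: read a BED file on stdin (chrom, start, end; tab-separated,
# 0-based half-open), widen every interval by PAD bp on each side (default 50),
# then merge intervals that overlap or touch. Hypothetical helper, not a BEDtools command.
pad_and_merge() {
    pad="${1:-50}"
    awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
        {
            start = $2 - pad
            if (start < 0) start = 0          # never go below coordinate 0
            print $1, start, $3 + pad         # no clamping at the chromosome end
        }' |
    sort -k1,1 -k2,2n |
    awk 'BEGIN { FS = OFS = "\t" }
        NR == 1 { chrom = $1; start = $2 + 0; stop = $3 + 0; next }
        $1 == chrom && $2 + 0 <= stop { if ($3 + 0 > stop) stop = $3 + 0; next }
        { print chrom, start, stop; chrom = $1; start = $2 + 0; stop = $3 + 0 }
        END { if (NR > 0) print chrom, start, stop }'
}

# Two exons 60 bp apart become one padded target: chr1 50 350 (tab-separated)
printf 'chr1\t100\t200\nchr1\t260\t300\n' | pad_and_merge 50
```

The resulting file would then play the role of target.bed in the intersectBed step described above.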

#3
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151
You may find Picard's CalculateHsMetrics useful:
http://picard.sourceforge.net/comman...ulateHsMetrics

#4
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
BEDTools intersectBed will work with .bam files and output .bam files. I use it that way all the time. Your command line would look something like:
Quote:
Then use samtools flagstat to count the number of mapped reads in the original .bam, and then in the intersect.bam.
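Once the two mapped-read counts are in hand, the percentage is simple arithmetic. The sketch below assumes the counts were read off the "mapped" line of the two samtools flagstat reports by hand; the function name `pct_on_target` and the example numbers are made up for illustration.

```shell
# pct_on_target ON_TARGET_MAPPED TOTAL_MAPPED
# Print the percentage of mapped reads that fall on target, to two decimals.
# Prints NA when the total is zero or negative, to avoid dividing by zero.
pct_on_target() {
    awk -v on="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "NA"; exit }
        printf "%.2f\n", 100 * on / total
    }'
}

# Hypothetical counts: 750000 mapped reads in intersect.bam, 1000000 in the original .bam
pct_on_target 750000 1000000   # prints 75.00
```

The off-target percentage is 100 minus this value.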

#5
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Quote:

#6
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

#7
Senior Member
Location: Phoenix, AZ
Join Date: Mar 2010
Posts: 279
Picard HsMetrics was designed by the Broad, which helped develop the Agilent in-solution capture method. Agilent provides positions for both the baits and the target regions.
Think:
Code:
1---------Target------------1
------IIIIIIIIIIII exon IIIIIIIIIIIIII--------
xbaitx ybaity zbaitz
Unfortunately, Illumina only provides the target regions, not the actual bait locations. So it is harder to decide whether some of the non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. Clearly, in my opinion, Illumina has captured entire 5 kb regions that include 3 exons totaling 1 kb, resulting in 4 kb of high-coverage intronic region. Not sure whether the designer was just lazy, has ulterior motives, or has some internal data showing that this gives the best target recovery.
Last edited by Jon_Keats; 01-20-2012 at 08:18 PM.

#8
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Quote:

#9
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Quote:

#10
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Quote:
(I am using samtools in Cygwin on a Windows 7 system)

#11
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
Quote:

#12
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358
Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").

#13
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535
Picard by default uses +/- 250 bp. I recommend using the same file for baits and targets. Baits can extend past the targets, so even if all of your target sequence is covered by baits, you can get more coverage for baits than for targets. If, say, only 80% of your targets are covered by baits, then you already know this, and it just complicates things to account for it again. So, again, I recommend using the actual bait intervals if possible.
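To make the "same file for baits and targets" suggestion concrete, here is a dry-run sketch that only prints the command rather than running it. INPUT, OUTPUT, BAIT_INTERVALS and TARGET_INTERVALS are CalculateHsMetrics options; the function name `hs_metrics_cmd`, the `PICARD_JAR` variable, the jar location, and the file names are assumptions for illustration.

```shell
# hs_metrics_cmd BAM INTERVALS OUT
# Print (do not execute) a CalculateHsMetrics call that passes one interval
# file as both the bait and the target intervals.
# PICARD_JAR is a hypothetical variable pointing at your CalculateHsMetrics.jar.
hs_metrics_cmd() {
    printf 'java -jar %s INPUT=%s OUTPUT=%s BAIT_INTERVALS=%s TARGET_INTERVALS=%s\n' \
        "${PICARD_JAR:-CalculateHsMetrics.jar}" "$1" "$3" "$2" "$2"
}

# Example with made-up file names; copy the printed line to run it.
hs_metrics_cmd aligned_reads.bam baits.txt hs_metrics.txt
```

Printing first makes it easy to check the paths before starting a long Java run.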

#14
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Quote:
Also, can the interval be modified (to, say, +/- 300 bp)?

#15
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535
Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.

#16
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72

#17
Member
Location: Philadelphia
Join Date: Jan 2012
Posts: 58
Does anyone know what format the INTERVALS files need to be in? I'm using simple BED files, but I keep getting this error:
Exception in thread "main" java.lang.IllegalStateException: Interval list file must contain header.
I tried adding a Chr Start End header, but it doesn't like this either. The simplicity of this is confusing me, I guess. Thanks.

#18
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358
That intervals file is annoying to make... here's how I do it (basically you need to add a SAM header and rearrange some columns):
Code:
#Input files to CalculateHsMetrics need SAM header on an interval file ("picard interval file")
#example here: ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/thousand_genomes_alpha_redesign.targets.interval_list

#put header from bam file at the top of the BI file above "baits.txt"
samtools view -H aligned_reads.bam > header.txt

#interval file needs to look like this:
#1 1104841 1104940 + target_1
#1 1105283 1105599 + target_2
#1 1105712 1105860 + target_3

#rearrange columns of baits bed file, and add SAM header
awk '{print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
cat header.txt bi.txt > baits.txt
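A slightly more defensive take on the column rearrangement, as a sketch only. It writes tab-separated output, shifts the start by one because BED starts are 0-based while Picard interval lists are 1-based, and fills in a strand and a name when the BED file has fewer than six columns. The function name `bed_to_interval_body` and the `target_N` default names are made up for illustration.

```shell
# bed_to_interval_body: read a BED file on stdin and print the body lines of a
# Picard-style interval list (chrom, 1-based start, end, strand, name; tab-separated).
# Hypothetical helper; missing strand defaults to "+", missing name to target_N.
bed_to_interval_body() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^(track|browser|#)/ { next }          # skip BED header and comment lines
        {
            n++
            strand = ($6 == "+" || $6 == "-") ? $6 : "+"
            name = ($4 != "") ? $4 : "target_" n
            print $1, $2 + 1, $3, strand, name  # BED start is 0-based, so add 1
        }'
}

# Usage, with the file names from the post above:
#   bed_to_interval_body < SureSelect_baits.bed > bi.txt
#   cat header.txt bi.txt > baits.txt
printf '1\t1104840\t1104940\n' | bed_to_interval_body
```

Whether the start needs shifting depends on the BED file really being 0-based, so it is worth checking one known exon by eye.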

#19
Member
Location: Philadelphia
Join Date: Jan 2012
Posts: 58
Oh wonderful. Thank you so much.

#20
Member
Location: Philadelphia
Join Date: Jan 2012
Posts: 58
Quote:
awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
Figured I'd share. I still had a question about the header, though, and I expect this is something I just don't understand about the conversion to BAM or something with Picard. CalculateHsMetrics still complains that the interval file needs a header. It seems my aligned_reads.bam files are lacking "@HD VN:1.0 SO:coordinate" at the very top. Is this abnormal? If I use a BAM file that has gone through Picard read group assignment, it does have the @HD line, but it also has an @RG line. So do these interval files need to be made for each sample after RG assignment?
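One possible way around the per-sample question, offered as a sketch and not as a confirmed answer from this thread: keep only the @HD and @SQ lines of the SAM header, so the interval file depends on the reference and not on the sample. It assumes the @RG lines are not needed in an interval list, and it adds the @HD line quoted above only when the BAM header has none. The function name `interval_header` is made up for illustration.

```shell
# interval_header: read a SAM header on stdin and print only the @HD line and
# the @SQ lines. If no @HD line is present, add the one quoted in the post
# (VN:1.0, SO:coordinate). Hypothetical helper; assumes @RG/@PG lines are not needed.
interval_header() {
    awk 'BEGIN { OFS = "\t" }
        /^@HD/ { hd = $0; next }
        /^@SQ/ { sq[++n] = $0 }
        END {
            if (hd == "") hd = "@HD" OFS "VN:1.0" OFS "SO:coordinate"
            print hd
            for (i = 1; i <= n; i++) print sq[i]
        }'
}

# Usage with the file names from the earlier post:
#   samtools view -H aligned_reads.bam | interval_header > header.txt
printf '@SQ\tSN:chr1\tLN:1000\n@RG\tID:s1\tSM:s1\n' | interval_header
```

If this assumption holds, one header.txt per reference genome would be enough for all samples aligned to it.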