SEQanswers

Old 03-11-2011, 04:50 AM   #1
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default Target enrichment performance

Dear All,

I would like to announce NGSrich, a software tool for evaluating target enrichment performance in Illumina next-generation sequencing. An early release of the code has been uploaded to SourceForge at

http://sourceforge.net/projects/ngsrich/files/

but we're still working on a Java version. Regards,

Peter Frommolt
University of Cologne
Old 05-16-2011, 07:40 AM   #2
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default Java version of NGSrich

Dear All,

We have now prepared a fully functional Java version of NGSrich, which allows you to run a quick and detailed performance check on your target-enriched resequencing projects. We are using this as part of an exome analysis pipeline in our medium-sized genome center.

The reports can be integrated into a webserver in a very efficient and user-friendly way. You should definitely download this and give it a try!

Best,
Peter and Ali
Old 05-18-2011, 01:04 AM   #3
bpetersen
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 20
Default

Dear Peter and Ali,
Your tool sounds great! I just have a question about its usage.
I get an error when trying it out, but this may be because I specified the wrong file for the -a or -g parameter. What kind of file is the "genome annotation" supposed to be, and where can I get it? I specified the genome as a FASTA file, but I am pretty sure that's not what is needed, right?
Can't wait to get this working, thanks for your help!
Old 05-18-2011, 01:08 AM   #4
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

The genome annotation parameter is not supposed to be a file. You just need to specify the UCSC genome version, e.g. 'hg19', and the software will download the matching annotation from the internet. Are you using data from a human sample?

Best,
Peter
Old 05-18-2011, 01:15 AM   #5
bpetersen
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 20
Default

Yes, my data are from human samples. But I get the following error:
=======================1=======================
>>> STEP 1: reducing files

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
at java.util.Vector.get(Vector.java:721)
at adapter.Adapter.field(Adapter.java:90)
at adapter.readAdapter.SamAdapter.pos(SamAdapter.java:184)
at adapter.readAdapter.SamAdapter.adapt(SamAdapter.java:131)
at Enrichment.reduceFiles(Enrichment.java:177)
at NGSrich.main(NGSrich.java:91)
Old 05-18-2011, 01:58 AM   #6
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

Okay, which syntax are you using for the analysis? Is your read alignment file in SAM format?

Regards, Peter
Old 05-18-2011, 02:04 AM   #7
bpetersen
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 20
Default

Thanks for the quick reply!
I am using the following syntax:
java NGSrich -r /path/to/file.sam -a hg18 -T /path/to/temp_folder -t /path/to/sure_select_targets.bed
The SAM file was generated with bwa, and I am running this from the bin folder of NGSrich.
Regards,
BP
Old 05-18-2011, 05:49 AM   #8
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

Do you have a header section in your SAM file? Could you please check whether it works after removing those lines?

Best,
Peter
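
Removing the header section can be done with a one-line filter. The following is a minimal sketch, assuming a POSIX shell with grep; the function name strip_sam_header is hypothetical and not part of NGSrich. It relies on SAM header lines starting with '@', which alignment records do not.

```shell
# Hypothetical helper (not part of NGSrich): drop the SAM header section.
# Header lines start with '@'; alignment records do not.
strip_sam_header() {
    # Reads SAM from stdin and writes the header-free records to stdout.
    # grep exits with status 1 when no line is left; treat that as success.
    grep -v '^@' || [ $? -eq 1 ]
}

# Usage (paths are placeholders):
#   strip_sam_header < /path/to/file.sam > /path/to/file.noheader.sam
```

The header-free copy can then be passed to the -r option in place of the original file.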
Old 05-18-2011, 06:31 AM   #9
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

What is the difference between the -a and -g options for the annotation?

And where does the annotation come from, the UCSC track?

I ask because I have already downloaded the UCSC track for hg18.

Thanks for sharing your program. I am looking forward to trying it out.
Old 05-18-2011, 07:15 AM   #10
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

The -a and -g flags are equivalent, so you can use either one. The annotation comes from the UCSC track, but the download usually finishes within seconds, so you do not need to worry about this.
Old 05-18-2011, 07:31 AM   #11
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Quote:
Originally Posted by pfrommolt
The -a and -g flags are equivalent, so you can use either one. The annotation comes from the UCSC track, but the download usually finishes within seconds, so you do not need to worry about this.

It is a shame that I am behind a firewall - would it be possible to have the UCSC track file stored locally and accessed by NGSrich instead of downloaded?
Old 05-19-2011, 12:18 AM   #12
bpetersen
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 20
Default

I tried it again after removing the header of my SAM file and the previous error is gone, but now I get the following:

=======================1=======================
>>> STEP 1: reducing files

READS FILE:
/home/bpetersen/exome_9_A0019.sam was reduced to /home/bpetersen/temp/1305783668487/NGSrich_exome_9_A0019_cl008.27100.txt
Reduced file /home/bpetersen/temp/1305783668487/NGSrich_exome_9_A0019_cl008.27100.txt sorted

GENOME ANNOTATION FILE:
/home/bpetersen/temp/1305783668487/refGene.genome reduced to /home/bpetersen/temp/1305783668487/NGSrich_genome_cl008.27100.txt

TARGET REGIONS FILE:
/home/bpetersen/temp/1305783668487/TruSeq_exome_targeted_regions.converted.hg18.target reduced to /home/bpetersen/temp/1305783668487/NGSrich_target_cl008.27100.txt
Reduced file /home/bpetersen/temp/1305783668487/NGSrich_target_cl008.27100.txt sorted

STEP 1 successfully completed

=====================2=========================
>>> STEP 2: computing target coverage data

Starting computing target coverage files.
Mean target coverage data computed (/home/bpetersen/NGSricCreating coverage barplot for chr21 ... ready.Creating coverage barplot for chr21_random ...
HTML FILE /home/bpetersen/NGSrich_test/exome_9_A0019_enrichment.html not founddy.Creating coverage barplot for chr21 ...
Coverage summary computed (/hCreating coverage barplot for chr2 ... ready.Creating coverage barplot for chr20 ...
STEP 3 unsuccessfulting coverage barplot for chr19 ... ready.Creating coverage barplot for chr2 ...
Creating coverage barplot for chr18 ... ready.Creating coverage barplot for chr19 ... Creating coverage barplot for chr17_random ... ready.Creating coverage barplot for chr18 .=====================4========================= Creating coverage barplot for chr17 ... ready.Creating coverage barplot for chr17_random ...
>>> STEP 4: computing targets wiggle data====== Creating coverage barplot for chr16 ... ready.Creating coverage barplot for chr17 ...
>>> STEP 3: evaluating enrichment files Creating coverage barplot for chr15 ... ready.Creating coverage barplot for chr16 ...
Start computing target-based wiggle data Creating coverage barplot for chr14 ... ready.Creating coverage barplot for chr15 ...
Details File Name: /home/bpetersen/temp/1305783668487/coverage_cl008.27100.txtlot for chr13 ... ready.Creating coverage barplot for chr14 ...
Output Dir: /home/bpetersen/NGSrich_test/data xml Creating coverage barplot for chr12 ... ready.Creating coverage barplot for chr13 ...
End of computing target-based wiggle datang coverage barplot for chr11 ... ready.Creating coverage barplot for chr12 ...
XML summary file: /hCreating coverage barplot for chr10 ... ready.Creating coverage barplot for chr11 ...
STEP 4 successfully completed.t for chr1 ... ready.Creating coverage barplot for chr10 ...
Preparing coverage barplots ... ready.Creating coverage barplot for chr1 ...
=====================5=========================overage barplots ...
>>> STEP 5: computing overall wiggle dataage pieplot ...
Reading XML file ... ready.Reading BED file ...
Start computing overall wiggle data
Align File Name: /home/bpetersen/temp/1305783668487/NGSrich_exome_9_A0019_cl008.27100.txt
Output Dir: /home/bpetersen/NGSrich_test/data
/home/bpetersen/NGSrich_test/exome_9_A0019_enrichment.wignot found

STEP 5 unsuccessful

===============================================

The plots were generated successfully, but the HTML file was not.
Any idea why this might be?
I think it would be great if you could get NGSrich to work with SAM files that have a header or, even better, with BAM files. Is this planned anytime soon?

Last edited by bpetersen; 05-19-2011 at 12:21 AM.
Old 05-19-2011, 12:35 AM   #13
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

Yes, we have already prepared a version which can handle the header section adequately. This is coming very soon.

Did your run create a BED and XML file in the 'data' directory? If so, could you email these to my address given in the README file?

Does the SureSelect target file have more than one BED track? You should provide a BED file with only one track.

Regards,
Peter
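
To check whether a target file holds more than one track, and to cut it down to the first one, something like the following may help. It is a rough sketch assuming a POSIX shell with awk and that each track begins with a line whose first field starts with "track"; the function names are hypothetical and not part of NGSrich.

```shell
# Hypothetical helpers (not part of NGSrich) for multi-track BED files.

# Print the number of track definition lines found on stdin.
count_bed_tracks() {
    awk '$1 ~ /^track/ { n++ } END { print n + 0 }'
}

# Print everything up to, but not including, the second track line.
first_bed_track() {
    awk '$1 ~ /^track/ { if (++seen > 1) exit } { print }'
}

# Usage (paths are placeholders):
#   count_bed_tracks < /path/to/sure_select_targets.bed
#   first_bed_track  < /path/to/sure_select_targets.bed > /path/to/single_track.bed
```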
Old 05-20-2011, 06:48 AM   #14
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

Version 0.4.2 uploaded to SourceForge!

Recent changes:
-> SAM header sections are skipped
-> unassembled contigs with fewer than 3 genes are ignored
-> A bug in the computation of coverage statistics was fixed
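
To see which contigs the second change would affect, the genes per chromosome can be counted directly from the annotation. This is only an illustration, assuming the UCSC refGene.txt layout (tab-separated, chromosome in column 3, gene symbol in column 13); how NGSrich itself counts genes is not stated here, and the function name sparse_contigs is hypothetical.

```shell
# Hypothetical helper (not part of NGSrich): list chromosomes/contigs whose
# number of distinct gene symbols is below a minimum (default 3).
# Assumes refGene.txt columns: 3 = chrom, 13 = name2 (gene symbol).
sparse_contigs() {
    awk -F '\t' -v min="${1:-3}" '
        !seen[$3 SUBSEP $13]++ { genes[$3]++ }   # count each symbol once per contig
        END {
            for (c in genes)
                if (genes[c] < min) print c "\t" genes[c]
        }
    ' | sort
}

# Usage (path is a placeholder):
#   sparse_contigs < /path/to/refGene.txt
```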
Old 05-20-2011, 08:22 AM   #15
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Hi,

Could you add a feature that lets the user point to a local annotation file instead of downloading it every time?
Old 05-20-2011, 08:26 AM   #16
pfrommolt
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 14
Default

If you have downloaded and unzipped the annotation from
http://hgdownload.cse.ucsc.edu/golde...refGene.txt.gz
(example for hg19), you can just specify the local path to this file instead of 'hg19'.
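
As a sketch of that workflow, assuming a POSIX shell with gunzip and an already downloaded refGene.txt.gz: the helper below unzips the file next to itself and prints the resulting path, which can then be passed to -a. The function name prepare_local_annotation and all paths are placeholders, not part of NGSrich.

```shell
# Hypothetical helper (not part of NGSrich): unzip a downloaded refGene.txt.gz
# and print the path of the unzipped file.
prepare_local_annotation() {
    _gz=$1
    _txt=${_gz%.gz}
    # Refuse to continue if the download is missing or empty.
    [ -s "$_gz" ] || { echo "missing or empty: $_gz" >&2; return 1; }
    gunzip -c "$_gz" > "$_txt" || return 1
    printf '%s\n' "$_txt"
}

# Usage (paths are placeholders):
#   ann=$(prepare_local_annotation /path/to/refGene.txt.gz)
#   java NGSrich -r /path/to/file.sam -a "$ann" -t /path/to/targets.bed -T /path/to/temp_folder
```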
Old 05-21-2011, 12:48 PM   #17
husamia
Member
 
Location: cinci

Join Date: Apr 2010
Posts: 66
Default

It would be nice to include an example report, to see what types of metrics it can calculate. I am curious about this.
Old 05-24-2011, 01:51 AM   #18
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Hi! I got it to work, and the output looks correct! It's strange, though: Step 3 and Step 5 are reported as unsuccessful, but the output looks complete (it includes the list of poorly and highly covered genes).

I really like the summary.
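
The poorly/highly covered classification can be mimicked on any per-gene coverage table. A minimal sketch, assuming a hypothetical two-column, tab-separated input (gene name, mean coverage) and using the <2x and >200x cut-offs that appear in the log below as defaults; the layout of NGSrich's own output files is not described in this thread, and the function name is made up.

```shell
# Hypothetical helper (not part of NGSrich): flag genes with extreme coverage.
# Input on stdin: gene<TAB>mean_coverage (an assumed layout).
# $1 = low cut-off (default 2), $2 = high cut-off (default 200).
flag_extreme_coverage() {
    awk -F '\t' -v low="${1:-2}" -v high="${2:-200}" '
        $2 < low  { print "poor\t" $0 }
        $2 > high { print "high\t" $0 }
    '
}

# Usage (path is a placeholder):
#   flag_extreme_coverage < /path/to/gene_coverage.tsv
#   flag_extreme_coverage 10 500 < /path/to/gene_coverage.tsv
```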



java NGSrich -r /net/ngs/NGS.analysis/data/091204_ILLUMINA-075005_0000_42GHRAAXX/s_1.42GHRAAXX_LIB001_MIAPACA2_24877_FID000.J00001.alnRecal.sam -a /net/ngs/NGS.analysis/bin/annovar/humandb/hg18_refGene.txt -t /net/ngs/NGS.analysis/reference/500_Target_Regions.bed -T /dev/shm/
test: /net/ngs/NGS.analysis/data/091204_ILLUMINA-075005_0000_42GHRAAXX/enrichment
=======================1=======================
>>> STEP 1: reducing files
READS FILE:
/net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast.sam was reduced to /dev/shm/1305908202090/NGSrich_s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_vieasncl6.20049.txt
Reduced file /dev/shm/1305908202090/NGSrich_s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_vieasncl6.20049.txt sorted

GENOME ANNOTATION FILE:
/dev/shm/1305908202090/hg18_refGene.genome reduced to /dev/shm/1305908202090/NGSrich_genome_vieasncl6.20049.txt

TARGET REGIONS FILE:
/dev/shm/1305908202090/1000plus_Target_Regions.target reduced to /dev/shm/1305908202090/NGSrich_target_vieasncl6.20049.txt
Reduced file /dev/shm/1305908202090/NGSrich_target_vieasncl6.20049.txt sorted

STEP 1 successfully completed

=====================2=========================
>>> STEP 2: computing target coverage data

Starting computing target coverage files.
Mean target coverage data computed (/net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_enrichment.bed).
Detailed base coverage computed (/dev/shm/1305908202090/coverage_vieasncl6.20049.txt).
Coverage summary computed (/net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_enrichment.xml).

STEP 2 successfully completed

=====================3=========================
>>> STEP 3: evaluating enrichment files

Detecting evaluation program
"data" and "plots" subdirectories created and xml and bed file moved to the data subdirectory!
Run evaluation program on the following arguments Writing HTML output ... ready.Finished.
HTML FILE /net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_enrichment.html not found378_0070_B81MCAABXX/enrichment/110415_enrichment.html
Creating coverage barplot ... ready.Searching for poorly (<2x) and highly (>200x) covered genes ...AABXX_LIB160_HT_26828_FID047_J00229.bfast_enrichment.bed
STEP 3 unsuccessfulbarplots ... ready.Creating coverage barplot ...CAABXX/enrichment
Creating coverage pieplot ... ready.Preparing coverage barplots ...efGene.txt
=====================4=========================eplot ...
>>> STEP 4: computing targets wiggle datale ...
R Ausgabe: Reading XML file ...
Start computing target-based wiggle data
Details File Name: /dev/shm/1305908202090/coverage_vieasncl6.20049.txt
Output Dir: /net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/data
End of computing target-based wiggle data

STEP 4 successfully completed.

=====================5=========================
>>> STEP 5: computing overall wiggle data

Start computing overall wiggle data
Align File Name: /dev/shm/1305908202090/NGSrich_s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_vieasncl6.20049.txt
Output Dir: /net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/data
/net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/enrichment/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast_enrichment.wignot found

STEP 5 unsuccessful

===============================================

[1]+ Done java NGSrich -r /net/ngs/NGS.analysis/data/110415_SN378_0070_B81MCAABXX/s_1.B81MCAABXX_LIB160_HT_26828_FID047_J00229.bfast.sam -a /net/ngs/NGS.analysis/bin/annovar/humandb/hg18_refGene.txt -t /net/ngs/NGS.analysis/reference/1000plus_Target_Regions.bed -T /dev/shm/
Old 05-26-2011, 09:08 AM   #19
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133
Default

Hi,

I am having a similar problem. It seems that the SAM file and the database are fine, but then I get this error:

java NGSrich -r /Data/Exp1/s_3_QC_remdup.sam -a hg19 -T /Data/Exp1/ -t /Data/Exp1/exomeplusv2.bed
test: /Data/Exp1/enrichment
=======================1=======================
>>> STEP 1: reducing files

READS FILE:
/Data/Exp1/s_3_QC_remdup.sam was reduced to /Data/Exp1/1306423511007/NGSrich_s_3_QC_remdup_KDavid.26819.txt
Reduced file /Data/Exp1/1306423511007/NGSrich_s_3_QC_remdup_KDavid.26819.txt sorted

GENOME ANNOTATION FILE:
/Data/Exp1/1306423511007/refGene.genome reduced to /Data/Exp1/1306423511007/NGSrich_genome_KDavid.26819.txt
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1
at java.util.Vector.get(Vector.java:721)
at adapter.Adapter.field(Adapter.java:90)
at adapter.TargetAdapter.start(TargetAdapter.java:168)
at adapter.TargetAdapter.adapt(TargetAdapter.java:56)
at Enrichment.reduceFiles(Enrichment.java:185)
at NGSrich.main(NGSrich.java:91)

This is the output of head on my .bed file:

head /Data/Exp1/exomeplusv2.bed
track:exome
chr1 30275 30431
chr1 69069 70029
chr1 228233 228354
chr1 228471 228711
chr1 367647 368608
chr1 470971 471330
chr1 621084 622045
chr1 741165 741285
chr1 745438 745558

Any ideas?

Thanks,

Dave
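
One possible cause, not confirmed in this thread: the first line of the file, track:exome, has a single field, whereas a BED record needs at least chromosome, start and end. That would be consistent with index 1 being out of range in TargetAdapter.start. A rough sketch for keeping only record lines, assuming a POSIX shell with awk (the function name bed_records_only is hypothetical and not part of NGSrich):

```shell
# Hypothetical helper (not part of NGSrich): keep only BED record lines,
# i.e. lines with at least three fields whose 2nd and 3rd fields are integers.
# Track, browser and comment lines are dropped as a side effect.
bed_records_only() {
    awk 'NF >= 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/'
}

# Usage (output path is a placeholder):
#   bed_records_only < /Data/Exp1/exomeplusv2.bed > /path/to/exomeplusv2.records.bed
```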
Old 05-26-2011, 04:11 PM   #20
jtjli
Member
 
Location: australia

Join Date: Nov 2008
Posts: 21
Default

I keep getting this error after ~20 minutes. Any ideas?

Thanks,
Jason


java NGSrich -r 32815T_GATKrealigned_duplicates_marked.sam -g hg19 -t 0247401_D_BED_20090724_hg19.bed -T ngsrich_tmp -o ngsrich_out


=======================1=======================
>>> STEP 1: reducing files

READS FILE:
/mnt/Storage/All_Users/lij/testNGSrich/32815T_GATKrealigned_duplicates_marked.sam was reduced to /mnt/Storage/All_Users/lij/testNGSrich/ngsrich_tmp/1306393150286/NGSrich_32815T_GATKrealigned_duplicates_marked_pmc-bioinf02.14020.txt
Reduced file /mnt/Storage/All_Users/lij/testNGSrich/ngsrich_tmp/1306393150286/NGSrich_32815T_GATKrealigned_duplicates_marked_pmc-bioinf02.14020.txt sorted


java.io.FileNotFoundException: /mnt/Storage/All_Users/lij/testNGSrich/ngsrich_tmp/1306393150286/refGene.genome (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.util.Scanner.<init>(Scanner.java:636)
at adapter.GenomeAdapter.adapt(GenomeAdapter.java:22)
at Enrichment.reduceFiles(Enrichment.java:181)
at NGSrich.main(NGSrich.java:91)
Exception in thread "main" java.lang.NullPointerException
at adapter.GenomeAdapter.adapt(GenomeAdapter.java:33)
at Enrichment.reduceFiles(Enrichment.java:181)
at NGSrich.main(NGSrich.java:91)