SEQanswers > Bioinformatics

02-06-2013, 09:07 AM   #1
Junior Member
Location: Germany

Join Date: Mar 2012
Posts: 2
CLC Genomics Workbench - exclude certain genes from reference sequence

Hi all,

I'm having a little trouble with my analysis using CLC tools.
I'm working with NGS data from an 11-gene panel. These reads should be mapped against the human genome and compared to SNP data from several databases (dbSNP, COSMIC, etc.). However, the mapping and SNP calling results must include only hits within the 11 genes in the panel.
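To illustrate the "only hits within the 11 genes" requirement: if calls are made in full chromosomal coordinates, restricting them to the panel is a simple interval check. This is a minimal plain-Python sketch, not a CLC feature; the `Region`/`Variant` types, gene names and coordinates are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A panel gene on the full reference (1-based, inclusive). Hypothetical type."""
    name: str
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """A called variant in chromosomal coordinates. Hypothetical type."""
    chrom: str
    pos: int  # 1-based chromosomal position
    ref: str
    alt: str


def in_panel(variant: Variant, regions: list[Region]) -> bool:
    """True if the variant falls inside any panel region."""
    return any(r.chrom == variant.chrom and r.start <= variant.pos <= r.end
               for r in regions)


def filter_to_panel(variants: list[Variant], regions: list[Region]) -> list[Variant]:
    """Keep only variants inside the panel; input order is preserved."""
    return [v for v in variants if in_panel(v, regions)]


# Hypothetical panel and calls, for illustration only.
panel = [Region("GENE_A", "chr1", 1_000, 2_000),
         Region("GENE_B", "chr7", 5_000, 6_500)]
calls = [Variant("chr1", 1_500, "A", "G"),
         Variant("chr1", 2_500, "C", "T"),
         Variant("chr7", 5_000, "G", "A")]
print([(v.chrom, v.pos) for v in filter_to_panel(calls, panel)])
# [('chr1', 1500), ('chr7', 5000)]
```

Because the positions stay chromosomal throughout, the kept calls can be compared directly with dbSNP/COSMIC positions.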

My initial approach was to prepare a reference FASTA file consisting only of the 11 genes. But then the positions of the called variants do not correspond to the chromosomal positions of the SNP data from the databases.
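The mismatch arises because a variant called on an extracted gene sequence is reported relative to the first base of that extract, not the chromosome. If the chromosomal start, end and strand of each extracted gene are recorded, the offset can be added back. A minimal sketch, assuming 1-based inclusive coordinates and that minus-strand genes were extracted as their reverse complement; `GeneRecord`, `local_to_chrom` and the coordinates are hypothetical.

```python
from typing import NamedTuple


class GeneRecord(NamedTuple):
    """Where an extracted gene sequence came from (hypothetical record type)."""
    name: str
    chrom: str
    start: int   # 1-based chromosomal position of the extract's lowest coordinate
    end: int     # 1-based inclusive chromosomal end
    strand: str  # "+" if extracted as-is, "-" if reverse-complemented


def local_to_chrom(gene: GeneRecord, local_pos: int) -> tuple[str, int]:
    """Convert a 1-based position within the extracted sequence to chromosome coordinates."""
    length = gene.end - gene.start + 1
    if not 1 <= local_pos <= length:
        raise ValueError(f"{local_pos} is outside {gene.name} (length {length})")
    if gene.strand == "+":
        # Position 1 of the extract is gene.start on the chromosome.
        return gene.chrom, gene.start + local_pos - 1
    # Reverse-complemented extract: position 1 is gene.end, counting downwards.
    # Called alleles would also need complementing; that is not handled here.
    return gene.chrom, gene.end - local_pos + 1


# Hypothetical genes, for illustration only.
gene_a = GeneRecord("GENE_A", "chr1", 1_000, 2_000, "+")
gene_b = GeneRecord("GENE_B", "chr7", 5_000, 6_500, "-")
print(local_to_chrom(gene_a, 501))  # ('chr1', 1500)
print(local_to_chrom(gene_b, 1))    # ('chr7', 6500)
```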

In the CLC manual, I found something about masking the reference sequence. I tried to convert my prepared FASTA file to a track using the Track Tools, but I couldn't select a FASTA file for the conversion.

So, how do I mask my reference sequence so that just a certain group of genes is considered in the mapping and the output is still based on the full reference (chromosomal positions)?
I'm open to, and grateful for, other suggestions.

Thank you in advance for your help.

Best regards!
mrs-sir
02-06-2013, 09:21 AM   #2
Location: Rockville, MD

Join Date: Apr 2011
Posts: 23

Are you using GW6 (Genomics Workbench 6)?

Tracks are not the way to do this; you should use a Read Mapping file, not a Track file.

This might be a hack, but you could try annotating the genome with a custom annotation called something like "Excluded Regions" and applying it to the entire genome EXCEPT the 11 genomic regions. Then, when you map your NGS reads, use the Excluded Regions annotation as a mask; that way, only reads mapping to your genes of interest will be included.
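The "Excluded Regions" annotation is simply the complement of the 11 gene intervals over each chromosome. A minimal sketch of computing that complement, assuming 1-based inclusive intervals and a dict of chromosome lengths; the function name, lengths and gene coordinates are hypothetical, and turning the result into an annotation your Workbench version can import is not shown.

```python
def excluded_regions(chrom_lengths: dict[str, int],
                     keep: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Return every interval of each chromosome NOT covered by `keep`.

    All intervals are (chrom, start, end), 1-based and inclusive.
    Unsorted or overlapping `keep` intervals are handled.
    """
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in keep:
        by_chrom.setdefault(chrom, []).append((start, end))

    excluded = []
    for chrom, length in chrom_lengths.items():
        cursor = 1  # first base not yet accounted for
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                excluded.append((chrom, cursor, start - 1))
            cursor = max(cursor, end + 1)
        if cursor <= length:
            excluded.append((chrom, cursor, length))
    return excluded


# Hypothetical chromosome lengths and panel genes, for illustration only.
lengths = {"chr1": 10_000, "chr2": 3_000}
genes = [("chr1", 4_000, 4_500), ("chr1", 1_000, 2_000)]
for interval in excluded_regions(lengths, genes):
    print(interval)
# ('chr1', 1, 999)
# ('chr1', 2001, 3999)
# ('chr1', 4501, 10000)
# ('chr2', 1, 3000)
```

Chromosomes without any panel gene (chr2 here) end up excluded in full, which is what the masking approach intends.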

Of course, you run the risk of reads being mapped to your genes when they would map much better somewhere else in the genome, so I would consider raising the read scoring thresholds quite a bit to compensate.
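As an illustration of stricter scoring: accept an alignment only if a large fraction of the read aligns and the aligned part is nearly identical to the reference (CLC's read mapper exposes similar length-fraction and similarity-fraction settings). This sketch shows only that acceptance rule; the `Alignment` type and threshold values are hypothetical, not CLC's implementation or defaults. It cannot detect a better hit elsewhere in the genome; it only rejects weaker matches to the panel genes.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of one read alignment (hypothetical type)."""
    read_length: int     # total read length in bases
    aligned_length: int  # read bases covered by the alignment
    matches: int         # identical bases within the aligned part


def passes(aln: Alignment,
           min_length_fraction: float = 0.9,
           min_similarity_fraction: float = 0.95) -> bool:
    """Accept only alignments that are both long and near-identical."""
    if aln.read_length <= 0 or aln.aligned_length <= 0:
        return False
    length_fraction = aln.aligned_length / aln.read_length
    similarity_fraction = aln.matches / aln.aligned_length
    return (length_fraction >= min_length_fraction
            and similarity_fraction >= min_similarity_fraction)


print(passes(Alignment(100, 100, 98)))  # True
print(passes(Alignment(100, 80, 80)))   # False: only 80% of the read aligns
print(passes(Alignment(100, 100, 90)))  # False: 90% identity is below 0.95
```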

I may not understand exactly what the issue is, but maybe that helps a bit.
jonathanjacobs
02-07-2013, 07:48 AM   #3
CLC bio - Anja
Junior Member
Location: Aarhus

Join Date: Feb 2013
Posts: 2

Hi mrs-sir
I asked the guys in support about your issue, and they can definitely help. Just shoot them an email describing your issue at - if you are still struggling with it.
Anja from CLC bio


All times are GMT -8.
