SEQanswers

01-29-2013, 12:16 PM   #1
bye
Junior Member
 
Location: NJ

Join Date: Sep 2010
Posts: 8
NGS reads condensation

Hi,

We are trying to find new transposon insertion sites using NGS data. Our candidate reads should contain part of the transposon sequence and part of the genomic sequence flanking the insertion site. In other words, the reads we are interested in are those that cannot be aligned perfectly to the genome, so duplicate-removal tools that rely on alignment to a reference are not suitable for our project.

Does anyone know of a tool that can remove duplicate sequences, and also condense shorter reads into longer ones, for reads that cannot be aligned to the reference?

Thanks in advance!

bin
01-29-2013, 12:20 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

It's not computationally pretty, but you could try extracting all your unaligned reads and piping them through cut | sort | uniq -c to get a list of all the distinct sequences and how often each one comes up.

Maybe start with a grep to pull out the reads that begin with the edge of the transposon sequence; that will make the list more manageable.
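
For anyone who would rather do this in a script than a pipeline, here is a minimal Python sketch of the same idea: keep only reads that start with the transposon edge, then count identical sequences. It assumes plain 4-line FASTQ records (no wrapped sequence lines). The EDGE sequence and the toy reads are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) from 4-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()

def count_unique(seqs, edge=None):
    """Count identical sequences; if `edge` is given, keep only reads starting with it."""
    counts = Counter()
    for s in seqs:
        if edge is None or s.startswith(edge):
            counts[s] += 1
    return counts

# Toy input. EDGE is a made-up transposon end, not a real one.
EDGE = "TGTTA"
fastq = """@r1
TGTTAACGTACGT
+
IIIIIIIIIIIII
@r2
TGTTAACGTACGT
+
IIIIIIIIIIIII
@r3
GGGGCCCCAAAAT
+
IIIIIIIIIIIII
@r4
TGTTAACGTTTTT
+
IIIIIIIIIIIII
""".splitlines()

counts = count_unique(read_sequences(fastq), edge=EDGE)
for seq, n in counts.most_common():
    print(n, seq)  # count first, like `uniq -c`
```

Note that this only catches reads where the transposon edge is at the start of the read in the orientation given; reads covering the junction from the other strand would need the reverse complement checked as well.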
01-29-2013, 12:36 PM   #3
bye
Junior Member
 
Location: NJ

Join Date: Sep 2010
Posts: 8

Thank you! This surely is a good starting point!
01-30-2013, 03:36 AM   #4
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

We've used split-end alignment (described here) to map transposon insertions.
01-30-2013, 05:24 AM   #5
bye
Junior Member
 
Location: NJ

Join Date: Sep 2010
Posts: 8

This is a great idea! Have you ever applied this method to human data? May I ask which transposon reference databases you used?
01-30-2013, 07:48 AM   #6
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

No, we have not screened human data, so I have no advice regarding reference databases.
01-31-2013, 06:35 AM   #7
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

I would use SMALT to align your reads to the transposon. SMALT is quite quick, particularly against such a tiny reference. You can use the output from this to:
(a) filter for the reads containing transposon ends and useful other sequence
(b) orient your reads relative to the transposon end
(c) extract the non-transposon portions of the reads

The data from (c) is what is then aligned to your genome of interest.
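
To make steps (a) to (c) concrete, here is a rough Python sketch. It is not SMALT: it swaps in exact string matching for the alignment step, so it only finds reads that contain the transposon end without mismatches. The TN_END sequence, the reads, the `min_flank` threshold, and the function names are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse-complement a DNA sequence (uppercase A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_flank(read, tn_end, min_flank=10):
    """Return the genomic flank that follows the transposon end, or None.

    `tn_end` is the last bases of the transposon, written so that genomic
    sequence follows it. Exact matching stands in for a real aligner here.
    """
    for candidate in (read, revcomp(read)):        # (b) try both orientations
        pos = candidate.find(tn_end)               # (a) does the read contain the end?
        if pos != -1:
            flank = candidate[pos + len(tn_end):]  # (c) non-transposon portion
            if len(flank) >= min_flank:            # keep only flanks long enough to map
                return flank
    return None

# Toy data; TN_END is a made-up transposon end.
TN_END = "CATGTTA"
read_fwd = "GGCATGTTA" + "ACGTACGTACGTAA"   # transposon end followed by 14 bp of "genome"
read_rev = revcomp(read_fwd)                # same junction seen from the other strand
read_short = "GGCATGTTAACG"                 # flank too short to be useful
read_none = "ACGTACGTACGTACGTACGT"          # no transposon end at all

for r in (read_fwd, read_rev, read_short, read_none):
    print(extract_flank(r, TN_END))
```

The flanks that come back are all in the same orientation relative to the transposon end, which is what you want before aligning them to the genome.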

With paired-end Illumina data, life gets a little more interesting, as you will want to find cases in which one read maps entirely to the genome of interest and the other entirely (or nearly so) to the transposon. Merging reads with FLASH or similar will reduce many of these to the single-read case, but you'll need to make sure you still identify the rest.
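
As a sketch of the paired-end bookkeeping, the snippet below sorts pairs into "split read" versus "one mate in the transposon, one in the genome". It assumes you have already computed, for each mate, the fraction of the read aligned to the transposon and to the genome (for example from your aligner's output). The `Mate` record, the thresholds, and the category names are hypothetical choices, not anything SMALT or FLASH produces.

```python
from typing import NamedTuple

class Mate(NamedTuple):
    tn_frac: float      # fraction of the read aligned to the transposon
    genome_frac: float  # fraction of the read aligned to the genome

def classify_pair(m1, m2, full=0.9, partial=0.2):
    """Classify a read pair by where its mates align.

    `full` and `partial` are illustrative thresholds, not tuned values.
    """
    def is_tn(m):
        return m.tn_frac >= full

    def is_genome(m):
        return m.genome_frac >= full

    def is_split(m):
        # Part transposon, part genome: the junction lies inside this read.
        return partial <= m.tn_frac < full and m.genome_frac >= partial

    if is_split(m1) or is_split(m2):
        return "split_read"
    if (is_tn(m1) and is_genome(m2)) or (is_genome(m1) and is_tn(m2)):
        return "tn_genome_pair"
    return "other"

# Toy pairs
print(classify_pair(Mate(0.5, 0.5), Mate(0.0, 1.0)))  # junction inside mate 1
print(classify_pair(Mate(1.0, 0.0), Mate(0.0, 1.0)))  # junction between the mates
print(classify_pair(Mate(0.0, 1.0), Mate(0.0, 1.0)))  # ordinary genomic pair
```

Pairs in the "tn_genome_pair" class only bracket the insertion site to within the insert size, so they are supporting evidence rather than an exact breakpoint.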