SEQanswers





05-09-2008, 07:00 AM #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Short reads missed by aligners

Is anyone looking into the No Match (NM) ELAND reads, i.e. the reads that come off the Solexa but are not mapped to the reference?
Any other kind of contamination control, e.g. for E. coli?

I was looking into running BLAT against the entire nt database, but would love to hear what people are using.
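A minimal Python sketch of pulling the NM reads out of ELAND output into FASTA, ready for BLAT or BLAST. It assumes the tab-separated eland_result.txt layout with the read ID in column 1 (possibly prefixed with '>'), the sequence in column 2 and the match code (U0/U1/U2, R0/R1/R2, NM, QC, ...) in column 3; check this against your pipeline version. The script and function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def no_match_reads(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) for reads whose ELAND match code is NM.

    Assumed layout (tab-separated eland_result.txt): column 1 = read ID,
    possibly prefixed with '>', column 2 = sequence, column 3 = match code.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        read_id, seq, code = fields[0], fields[1], fields[2]
        if code == "NM":
            yield read_id.lstrip(">"), seq


def write_fasta(records: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (id, sequence) records as FASTA and return how many were written."""
    n = 0
    for read_id, seq in records:
        out.write(f">{read_id}\n{seq}\n")
        n += 1
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python extract_nm.py eland_result.txt > nm_reads.fa
    with open(sys.argv[1]) as fh:
        count = write_fasta(no_match_reads(fh), sys.stdout)
    print(f"wrote {count} NM reads", file=sys.stderr)
```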

sm
05-12-2008, 03:05 PM #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Try using something like Velvet to assemble all the unaligned reads into contigs, then BLAST those contigs against nr. If they are crummy reads, they won't assemble with each other.

We tested an in-house clone collection, and I found a fair bit of E. coli contamination. I've also found vector-like sequences in microbial samples, stuff like that. If your reference has a biggish deletion compared to what you actually sequenced, you might find it this way.
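Before sending assembled contigs to BLAST, short contigs can be dropped with a simple length filter. A minimal sketch; the 100 bp cutoff and the script name are arbitrary assumptions, and the length is measured from the sequence itself rather than parsed from the Velvet header.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA parser yielding (header, sequence); handles wrapped sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def long_contigs(lines: Iterable[str], min_len: int = 100) -> Iterator[Tuple[str, str]]:
    """Keep only contigs whose sequence is at least min_len bases long."""
    for header, seq in read_fasta(lines):
        if len(seq) >= min_len:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python filter_contigs.py contigs.fa 100 > long_contigs.fa
    min_len = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    with open(sys.argv[1]) as fh:
        for header, seq in long_contigs(fh, min_len):
            print(f">{header}\n{seq}")
```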
05-15-2008, 01:26 PM #3
acnoll
Member
 
Location: Kansas City

Join Date: Mar 2008
Posts: 14

Quote:
Originally Posted by bioinfosm
Is anyone looking into the No Match (NM) ELAND reads, i.e. the reads that come off the Solexa but are not mapped to the reference?
Any other kind of contamination control, e.g. for E. coli?

I was looking into running BLAT against the entire nt database, but would love to hear what people are using.

sm
One approach that takes a while but exhaustively looks at all the NMs is to BLAT them against the genome of interest to kick out the gapped hits, then BLAST whatever is left against nr to find contaminants. I was then thinking of taking the top couple of contaminants and looking at their matching hits for any overlap, since reads from a contaminant may intersect with those mapped to the genome of interest. This might matter most for SNP calling.
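A rough sketch of the two bookkeeping steps in this approach, assuming BLAT output in PSL format (query name in the 10th column) and BLAST output in tabular format (-outfmt 6, query and subject IDs in the first two columns). The function names are hypothetical, and taking the first listed hit per read as its best hit assumes BLAST's usual ordering of hits by significance.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def psl_hit_names(psl_lines: Iterable[str]) -> Set[str]:
    """Query names (qName, 10th PSL column) with at least one BLAT hit.

    Header lines are skipped: PSL data lines have 21 tab-separated fields
    and start with the integer 'matches' count.
    """
    names: Set[str] = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 21 and fields[0].isdigit():
            names.add(fields[9])
    return names


def top_contaminants(blast_lines: Iterable[str], exclude: Set[str],
                     n: int = 5) -> List[Tuple[str, int]]:
    """Tally the first-listed subject per query in BLAST tabular output.

    Assumes hits for each query appear best-first; queries in `exclude`
    (e.g. reads BLAT already placed on the genome) are ignored.
    """
    best: Dict[str, str] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query in exclude or query in best:
            continue
        best[query] = subject
    return Counter(best.values()).most_common(n)
```

Reads that BLAT placed on the genome are passed as `exclude`, so only the leftover NM reads contribute to the contaminant tally.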
05-16-2008, 03:11 PM #4
Mr. Gunn
Member
 
Location: USA

Join Date: Dec 2007
Posts: 10
Here's a nice comparison of the various short-read aligners, including ELAND.

http://massgenomics.wordpress.com/20...nd-and-others/
05-21-2008, 01:18 PM #5
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Thanks for your input.

Edena and Velvet, two de novo assemblers for short-read data, gave very different outputs!

Velvet gave two contigs that pointed to a fragment that was supposedly deleted and should not have been sequenced.

Edena, on the other hand, gave 10 or so contigs, 100-120 bp long, that align perfectly to E. coli K-12!
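When two assemblers disagree this much, a quick first comparison is contig count, total length and N50 for each output. A minimal sketch; the function name is hypothetical, and the contig lengths can come from any FASTA parser.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Contig count, total length, longest contig and N50 for a list of contig lengths.

    N50 is the length L such that contigs of length >= L cover at least
    half of the total assembled length.
    """
    if not lengths:
        return {"contigs": 0, "total": 0, "longest": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the total length
            n50 = length
            break
    return {"contigs": len(ordered), "total": total, "longest": ordered[0], "n50": n50}
```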
08-06-2008, 08:09 AM #6
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

Reads that aren't matched by ELAND are interesting because we can assume they are not repeats: ELAND reports reads that match multiple locations as repeats rather than as no match.
I would say that reads containing gaps are probably missed by ELAND, so use a short-read aligner that can find gapped alignments for these reads. I've been using Novoalign (www.novocraft.com); it can find up to 7/8 gaps in a 36 bp read matched against a reference sequence, and it is fast on large references. I've even tested it on simulated data with mutation rates in excess of 15%, and it still finds them. Use a very high threshold, e.g. -t 200, to find potentially all permutations for your read.
I'd be interested to know how many more of your ELAND NM reads you are able to match.
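To illustrate why a single indel can push a read into NM for an ungapped aligner that allows at most two mismatches, the sketch below compares the best ungapped mismatch count with a semi-global edit distance (whole read against any stretch of the reference, indels allowed). This is a toy illustration, not how ELAND or Novoalign are implemented; the sequences are made up.

```python
def best_ungapped_mismatches(read: str, ref: str) -> int:
    """Fewest mismatches over all ungapped placements of read within ref."""
    best = len(read)
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        best = min(best, mismatches)
    return best


def best_semiglobal_edit_distance(read: str, ref: str) -> int:
    """Edit distance of the whole read against its best-matching stretch of ref.

    Dynamic programming with a free start and end in ref; mismatches,
    insertions and deletions each cost 1.
    """
    prev = [0] * (len(ref) + 1)  # empty read prefix: may start anywhere in ref for free
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)  # read[:i] against an empty ref costs i
        for j in range(1, len(ref) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or mismatch
                         prev[j] + 1,         # extra base in the read (insertion)
                         cur[j - 1] + 1)      # base missing from the read (deletion)
        prev = cur
    return min(prev)  # the read may end anywhere in ref


if __name__ == "__main__":
    ref = "GATTACAGCTTCG"
    read = "GATTACGCTTCG"  # ref with the A at position 7 deleted
    print(best_ungapped_mismatches(read, ref))       # 5: beyond a 2-mismatch limit
    print(best_semiglobal_edit_distance(read, ref))  # 1: a single gap
```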
08-06-2008, 09:10 AM #7
kmay
Member
 
Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

Just a note from my side:

As you know from other threads, we can map from 10 bp onwards, with gaps and PMs. However, before forcing the unmapped reads onto the reference genome, check them against viral genomes, vectors, etc.
We found numerous perfect matches there. Especially when working on specific cell lines, check the history of the line, how it was immortalized, etc. You'll be surprised how many good old retroviral friends you find!
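A minimal sketch of such a perfect-match screen: index every k-mer of a few viral or vector sequences on both strands, with k equal to the read length, and look each unmapped read up exactly. The function names are hypothetical and the sequences would in practice be loaded from FASTA; this naive index is only practical for small references such as viral genomes and vectors.

```python
from typing import Dict, Iterable, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_index(refs: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer on both strands of each reference to the names containing it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in refs.items():
        seq = seq.upper()
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                index.setdefault(strand[i:i + k], set()).add(name)
    return index


def screen_reads(reads: Iterable[Tuple[str, str]],
                 refs: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {read_id: sorted reference names} for reads matching a reference perfectly.

    One index is built per distinct read length, so k always equals the read length.
    """
    indexes: Dict[int, Dict[str, Set[str]]] = {}
    hits: Dict[str, List[str]] = {}
    for read_id, seq in reads:
        k = len(seq)
        if k not in indexes:
            indexes[k] = build_index(refs, k)
        names = indexes[k].get(seq.upper())
        if names:
            hits[read_id] = sorted(names)
    return hits
```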

Cheers

Klaus
08-06-2008, 10:19 AM #8
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

Interesting note. Have you also looked at whether the retroviral sequences can be remapped to human with mismatches, and whether they seem to be a source of background in alignments?
08-07-2008, 06:33 AM #9
kmay
Member
 
Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

Chipper,

More on that, with HEK cells, SV40 and adenovirus, is described in our paper.

Klaus
08-07-2008, 08:25 PM #10
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

I just read the Sultan paper, kmay. Nice work.

However, I am a little confused, because it says that reads were mapped with ELAND: "Illumina deep sequencing was used to generate 27-bp reads from replicate samples for each cell line. Reads were mapped to the human genome (hg18, NCBI build 36.1) using the Eland software, allowing up to two mismatches (see SOM). Of the total reads, 50% matched to unique genomic locations," (http://www.sciencemag.org/cgi/content/full/1160342/DC1)

And the actual read data are unavailable. So I'm assuming that you used the proprietary Genomatix mapper in a separate study? Where can we get this read data?
08-08-2008, 02:50 AM #11
kmay
Member
 
Location: Munich, Germany

Join Date: Aug 2008
Posts: 29

zee,

You are right. The original data were mapped with ELAND. At that time our GMS was still under development. Later we took the ELAND non-mapped reads and ran them against the viral genomes with our GMS. The actual read data are deposited at GEO.

All times are GMT -8.