SEQanswers

Old 10-05-2009, 10:24 AM   #22
inesdesantiago
Member
 
Location: LONDON, UNITED KINGDOM

Join Date: Jan 2009
Posts: 44
Default

Dear westerman,
Thanks for the reply

I have a collection of reads that are 35 nt long.
In all of them there is a '.' in the same position, so when I translate from
colorspace to base space all of my reads become only 23 nucleotides long plus a tail of 12 N's:

TCGAATGACTGTGACGTGCAGTCNNNNNNNNNNNN

This is happening to all reads in the file. Maybe something went wrong with the sequencing?

For mapping purposes, do you think it's better to use the 23-nt reads than the ones with the N's? I guess if I use the reads with so many N's they could map to the wrong positions.
Is this right?

Thank you
Ines
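
For readers wondering why a single '.' wipes out the rest of the read: each SOLiD color encodes the transition between two adjacent bases, so every decoded base depends on the one before it. Below is a minimal Python sketch of that decoding, assuming the usual csfasta layout (one primer base followed by color digits). The function name decode_colorspace is made up for illustration, and this is not the encodeFasta.py implementation.

```python
BASES = "ACGT"  # index of each base doubles as its 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Decode a color-space read (primer base + colors) into base space.

    Each color is the XOR of the 2-bit codes of two adjacent bases, so the
    next base is recovered as previous_base XOR color. Once a color is
    missing ('.'), no later base can be recovered, so the remainder of the
    read is filled with N.
    """
    prev = BASES.index(read[0])  # primer base: used for decoding, not emitted
    colors = read[1:]
    out = []
    for i, color in enumerate(colors):
        if color not in "0123":
            # Unknown color: this base and everything after it is undetermined.
            out.append("N" * (len(colors) - i))
            break
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical 35-color read with a '.' at color 24.
    example = "T" + "0" * 23 + "." + "0" * 11
    print(decode_colorspace(example))
```

With a '.' at color 24 of 35, this gives 23 bases followed by 12 N's, the same shape as the read above.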
Old 10-05-2009, 10:25 AM   #23
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Using the 23nuc reads would be good. Even better is to do your mapping in colorspace without doing translation first. That way sequencer errors should be taken care of.
Old 10-05-2009, 10:28 AM   #24
inesdesantiago
Member
 
Location: LONDON, UNITED KINGDOM

Join Date: Jan 2009
Posts: 44
Default

That's a good idea, I haven't thought about mapping using colorspace...
Too bad bowtie doesn't map in colorspace yet.
Regards,
Ines
Old 10-05-2009, 01:31 PM   #25
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Quote:
Originally Posted by inesdesantiago View Post
That's a good idea, I haven't thought about mapping using colorspace...
Too bad bowtie doesn't map in colorspace yet.
Regards,
Ines
Try bwa (in colorspace) with a seed length of <=22, or better yet a program that allows masking of the position with dots (I think mapreads can do it, maybe others can as well).
Old 10-06-2009, 07:02 AM   #26
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

You can mask using mapreads via the '-p' parameter. Usually this is done via the matching_large_genomes_cmap_save_script.pl program, although other SOLiD routines also call mapreads.

E.g., try '-p 1111111111111111111100000000000000000000' or whatever fits your tag length and desired pattern.

mapreads will still try to map the full-length tag and thus will have problems when the masked part seemingly overhangs the ends. That is, mapreads does not chop off the masked part to make a shorter read but rather keeps the read full length.
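
A pattern like the one above can be generated rather than typed by hand. The sketch below is only an illustration: it assumes, based on the example pattern, that '1' means "use this position" and '0' means "ignore it", with one flag per position of the tag. That reading has not been checked against the mapreads documentation, and the helper names build_mask and dot_positions are hypothetical.

```python
def dot_positions(colors):
    """Return the 1-based positions of '.' (missing calls) in a color string."""
    return [i for i, c in enumerate(colors, start=1) if c == "."]


def build_mask(tag_length, masked_positions):
    """Build a string of '1'/'0' flags, one per tag position.

    Assumption (inferred from the example pattern in this thread):
    '1' = position is used for matching, '0' = position is ignored.
    """
    masked = set(masked_positions)
    return "".join(
        "0" if pos in masked else "1" for pos in range(1, tag_length + 1)
    )


if __name__ == "__main__":
    # Hypothetical 35-color tag with a missing call at position 24.
    colors = "0" * 23 + "." + "0" * 11
    mask = build_mask(len(colors), dot_positions(colors))
    print("-p " + mask)
```

Masking only the position with the dot keeps the colors after it available for matching, whereas zeroing out the whole tail throws them away.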
Old 03-08-2012, 05:32 PM   #27
gladexp
Junior Member
 
Location: Ohio

Join Date: Mar 2012
Posts: 3
Default

Quote:
Originally Posted by westerman View Post
The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
Hi all,

Does anyone have a link to, or a zipped file of, the ABI 'corona lite' package?

Many thanks.
Old 03-09-2012, 03:33 AM   #28
idonaldson
Member
 
Location: Manchester, UK

Join Date: Oct 2009
Posts: 37
Default

Try this for Corona-Lite. I couldn't find it on Life Tech's site:
http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/
Old 03-09-2012, 06:02 AM   #29
gladexp
Junior Member
 
Location: Ohio

Join Date: Mar 2012
Posts: 3
Default

Quote:
Originally Posted by idonaldson View Post
Try this for Corona-Lite. I couldn't find it on Life Tech's site:
http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/
Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?
Old 03-12-2012, 09:06 AM   #30
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by gladexp View Post
Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?
Probably. Corona lite is rather old. Since CL was released, we've gone through Bioscope and are now using LifeScope.
Old 03-12-2012, 09:35 AM   #31
idonaldson
Member
 
Location: Manchester, UK

Join Date: Oct 2009
Posts: 37
Default

I think I used to use 4.2.2, but I don't have the archive to install it (I wasn't the one who installed it on our cluster).
Old 04-04-2012, 01:58 AM   #32
flashton
Member
 
Location: london, uk

Join Date: Feb 2011
Posts: 10
Default

Hi,

I just want to reiterate how crazy double encoding is! Thought we were having problems with our aligner as the 'reads' weren't mapping to the reference. Why on earth did ABI pick those 4 letters? Why even double encode in the first place?!

Thanks Rick!
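
For anyone who hits the same confusion: as far as I understand it, double encoding just relabels the colors 0, 1, 2, 3 as the letters A, C, G, T (and '.' as N) so that tools expecting base-space input will accept the file. The letters are still colors, which is why such 'reads' will not align to a base-space reference. A minimal sketch, assuming that relabeling convention; the function names are hypothetical:

```python
# Assumed convention: colors 0/1/2/3 are relabeled as A/C/G/T, and '.' as N.
TO_DOUBLE = str.maketrans("0123.", "ACGTN")
FROM_DOUBLE = str.maketrans("ACGTN", "0123.")


def double_encode(colors):
    """Relabel a color string (no primer base) with base letters."""
    return colors.translate(TO_DOUBLE)


def double_decode(letters):
    """Turn a double-encoded string back into color digits."""
    return letters.translate(FROM_DOUBLE)


if __name__ == "__main__":
    colors = "0123"
    encoded = double_encode(colors)
    print(encoded)                 # letters that are really colors
    print(double_decode(encoded))  # round-trips to the original colors
```

For example, with primer base T the colors 0123 decode to the bases TGAT, but their double-encoded form is ACGT, so the two strings have nothing to do with each other as sequences.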
Old 04-05-2012, 10:27 AM   #33
SeqAA
Guest
 

Posts: n/a
Default

I hope people realize that converting to fastq is probably one of the worst ways to analyze cs data.
Old 04-05-2012, 10:38 AM   #34
flashton
Member
 
Location: london, uk

Join Date: Feb 2011
Posts: 10
Default

SeqAA, could you describe your workflow?
Old 04-10-2012, 03:47 AM   #35
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415
Default

SeqAA - agree.

People, if you want to analyse SOLiD data properly, use colour space. If you're forced into the dark arts of base space conversion, e.g. for de novo assembly, I would strongly recommend reading the supplements of this paper:

Iverson et al. 2012, Science : Untangling genomes from metagenomes ....
Old 05-15-2012, 10:27 AM   #36
vswilliamson
Junior Member
 
Location: Virginia

Join Date: Jun 2010
Posts: 2
Default

I generally use Galaxy to do most of my conversions.

All times are GMT -8.