**mbreese** · 09-23-2009, 07:38 AM

I actually just pulled the table from ABI. The attached perl scripts didn't handle N's at all.

I'm very new to this, and haven't run across 5 or 6 in our data. What do they stand for? (Ambiguity codes?)

**westerman** · 09-23-2009, 07:51 AM

There are 3 transitions: N to N, N to known (ACGT), and known to N. These transitions can be represented by 3 different color-space numbers, in this case '4', '5', and '6'. Off the top of my head I do not remember which is which. Also, only some of the ABI programs actually work with this concept. encodeFasta.py, which the ABI SNP-calling manual says to use, does not handle any of these cases. It makes me wonder at times if ABI even uses their own programs on any real-life data. :-(

**westerman** · 09-23-2009, 09:47 AM

I was sitting here avoiding work (I have an intractable problem, ugh!) wondering where I had seen that 4,5,6 color-space encoding. So I looked it up. The 'dna_subroutines.pm' (Perl library, obviously) has the following, which is used by the 'convert_to_dibase' subroutine. At least of the 26 programs in the 'bin' directory use the 'dna_subroutines.pm' module, although I am not certain if any use the convert_to_dibase routine. None seem to do so directly. None of the Python routines use the 4,5,6 color-space encoding.

So ... using a '4' is probably good enough.

$color{AN} = 4;
$color{CN} = 4;
$color{GN} = 4;
$color{TN} = 4;
$color{NA} = 5;
$color{NC} = 5;
$color{NG} = 5;
$color{NT} = 5;
$color{NN} = 6;
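
For readers who prefer Python, here is a minimal sketch of a base-space to color-space encoder that applies the same 4/5/6 codes as the table above. The colors 0 through 3 are the standard SOLiD di-base code; the names `dibase_color` and `encode_to_colorspace` are made up for this illustration and are not part of Corona Lite.

```python
# Standard SOLiD di-base colors for pairs of known bases.
KNOWN_COLORS = {
    "AA": 0, "CC": 0, "GG": 0, "TT": 0,
    "AC": 1, "CA": 1, "GT": 1, "TG": 1,
    "AG": 2, "GA": 2, "CT": 2, "TC": 2,
    "AT": 3, "TA": 3, "CG": 3, "GC": 3,
}


def dibase_color(first, second):
    """Return the color for one di-base, using 4/5/6 when an N is involved."""
    if first == "N" and second == "N":
        return 6  # N to N
    if second == "N":
        return 4  # known to N
    if first == "N":
        return 5  # N to known
    return KNOWN_COLORS[first + second]  # KeyError for anything but A, C, G, T


def encode_to_colorspace(seq):
    """Encode a base-space string as its first base followed by one color per di-base."""
    seq = seq.upper()
    if not seq:
        return ""
    colors = [str(dibase_color(a, b)) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


if __name__ == "__main__":
    print(encode_to_colorspace("ACGNNT"))  # A13465
```

The sketch keeps the first base as the leading character, which is the usual convention when converting a reference sequence.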

**inesdesantiago** · 10-05-2009, 06:38 AM

"N" in basespace

How come NA, NC, NT, NG all have the same colorspace code '5'?
Does this mean that once you have an N for a given base you never know what the next base is? You don't know if it is A, G, C, or T ...
Right?
Ines

**westerman** · 10-05-2009, 07:26 AM

Originally posted by inesdesantiago

How come NA, NC, NT, NG all have the same code '5'?
Does this mean that once you have an N for a given base you never know what the next base is? You don't know if it is A, G, C, or T ...
Right?
Ines

That is only partially correct, but to a first approximation it is correct. You certainly cannot properly decode from colorspace (CS) into basespace (BS) if there are 4s, 5s, or 6s in the CS. However, this does not keep you from using the information in matching.

[Note: CS reads off of the sequencer will have a simple period (.) when there is an unknown and 0 through 3 for known ... 4,5,6s are only used when computationally processing BS->CS->BS translations]

Let's go for an example.

Say we have a (poor) reference sequence that in BS is:

TCACGNGTCAAC

Translating this into CS so that it can be mapped:

T21134412101

Computationally if we tried to convert this CS back to BS we would get:

TCACGNNNNNNN

On the other hand, if we had an actual CS read from the sequencer such as:

T21130012101

We can certainly map, allowing for mismatches, that actual read to our reference. If we had enough reads coming off the sequencer that were all the same as the above (or, better, had slightly different start points and also overlapped the region in question), then we could say with confidence that while our reference sequence indicated an 'N' in the position, our actual sequenced organism has a 'G'.

Note that you can get into trouble with the above if your reads could potentially map to other parts of your organism and those parts are not part of your reference. This is a major reason for wanting different start sites and long reads. So tread with care.
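
The round trip in the example above can be sketched in Python. This is only an illustration: `decode_from_colorspace` and `color_mismatches` are hypothetical helper names, and the decoder follows the convention used in the example, where the leading letter is kept as the first base of the output.

```python
# For each previous base, the next base indexed by color 0..3
# (standard SOLiD di-base code).
NEXT_BASE = {"A": "ACGT", "C": "CATG", "G": "GTAC", "T": "TGCA"}


def decode_from_colorspace(cs):
    """Decode a color-space string; everything after an unknown color becomes N."""
    if not cs:
        return ""
    prev = cs[0].upper()
    out = [prev]
    for color in cs[1:]:
        if prev in NEXT_BASE and color in "0123":
            prev = NEXT_BASE[prev][int(color)]
        else:
            # 4, 5, 6, '.', or an already unknown previous base
            prev = "N"
        out.append(prev)
    return "".join(out)


def color_mismatches(ref_cs, read_cs):
    """Count positions where the colors differ, ignoring the leading base."""
    return sum(1 for r, q in zip(ref_cs[1:], read_cs[1:]) if r != q)


if __name__ == "__main__":
    reference_cs = "T21134412101"  # reference from the example, with the N encoded
    read_cs = "T21130012101"       # read from the example
    print(decode_from_colorspace(reference_cs))     # TCACGNNNNNNN
    print(decode_from_colorspace(read_cs))          # TCACGGGTCAAC
    print(color_mismatches(reference_cs, read_cs))  # 2
```

The read differs from the encoded reference at only two colors, so a mapper that tolerates a couple of color mismatches can still place it, and decoding the read shows a G where the reference has an N.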

**inesdesantiago** · 10-05-2009, 09:22 AM

Thanks for the reply

I have a collection of reads that are 35 nuc long.
In all of them there is a '.' in the same position, so when I translate from
colorspace to basespace all of my reads become only 23 nucleotides long plus a tail of 12 N's:

TCGAATGACTGTGACGTGCAGTCNNNNNNNNNNNN

This is happening to all reads in the file. Maybe something went wrong with the sequencing?

For mapping purposes, do you think that it's better to use the 23nuc reads than the ones with the N's? I guess if I use the reads with so many N's they can actually map to wrong positions.
Is this right?

Thank you
Ines

**westerman** · 10-05-2009, 09:25 AM

Using the 23nuc reads would be good. Even better is to do your mapping in colorspace without doing translation first. That way sequencer errors should be taken care of.
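
If the reads are trimmed rather than masked, the trimming can be done in colorspace before any translation. A minimal sketch, assuming each read is a plain string with a leading primer base followed by colors; `trim_at_first_dot` is a hypothetical helper and the read below is made up for illustration.

```python
def trim_at_first_dot(cs_read):
    """Return the part of a color-space read that precedes the first '.'."""
    dot = cs_read.find(".")
    return cs_read if dot == -1 else cs_read[:dot]


if __name__ == "__main__":
    read = "T0120.3321"  # made-up read with an unknown color at the fifth call
    trimmed = trim_at_first_dot(read)
    print(trimmed)           # T0120
    print(len(trimmed) - 1)  # 4 usable colors after the primer base
```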

**inesdesantiago** · 10-05-2009, 09:28 AM

That's a good idea, I hadn't thought about mapping in colorspace...
Too bad bowtie doesn't map in colorspace yet.
Regards,
Ines

**Chipper** · 10-05-2009, 12:31 PM

Originally posted by inesdesantiago

That's a good idea, I hadn't thought about mapping in colorspace...
Too bad bowtie doesn't map in colorspace yet.
Regards,
Ines

Try bwa (in colorspace) with a seed length of <=22, or better yet a program that allows masking of the position with dots (I think mapreads can do it, maybe others can as well).

**westerman** · 10-06-2009, 06:02 AM

You can mask using mapreads via the '-p' parameter. Usually this is done via the matching_large_genomes_cmap_save_script.pl program, although other SOLiD routines also call mapreads.

E.g., try '-p 1111111111111111111100000000000000000000' or whatever fits your tag length and desired pattern.

mapreads will still try to map the full-length tag and thus will have problems when the masked part seemingly overhangs the ends. That is, mapreads does not chop off the masked part to make a shorter read but rather keeps the read full length.
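
A pattern string like the one above is tedious to type by hand. Here is a small sketch that builds one, assuming (as the example pattern suggests, but not checked against the mapreads documentation) that '1' marks a position to use and '0' a position to mask. The helper name `build_mask` and its 1-based position convention are inventions for this illustration.

```python
def build_mask(tag_length, masked_positions):
    """Build a string of '1' (use) and '0' (mask) for 1-based tag positions."""
    masked = set(masked_positions)
    for pos in masked:
        if not 1 <= pos <= tag_length:
            raise ValueError(f"position {pos} is outside 1..{tag_length}")
    return "".join("0" if i in masked else "1" for i in range(1, tag_length + 1))


if __name__ == "__main__":
    # Mask only position 24 of a 35-color tag.
    print(build_mask(35, [24]))
    # Mask the whole tail from position 24 onward.
    print(build_mask(35, range(24, 36)))
```

The resulting string would then be passed as the value of '-p'.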

**gladexp** · 03-08-2012, 05:32 PM

Originally posted by westerman

The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.

Hi all,

Does anyone have the link or zipped file for the ABI 'corona lite'?

Many thanks.

**idonaldson** · 03-09-2012, 03:33 AM

Try this for Corona-Lite; I couldn't seem to find it on Life Tech's site:

http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/

**gladexp** · 03-09-2012, 06:02 AM

Originally posted by idonaldson

Try this for Corona-Lite; I couldn't seem to find it on Life Tech's site:
http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/

Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?

**westerman** · 03-12-2012, 08:06 AM

Originally posted by gladexp

Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?

Probably. Corona lite is rather old. We've gone through Bioscope and are now using LifeScope since CL was released.
