SEQanswers

07-23-2014, 12:40 AM #1
Lv Ray
Member
 
Location: GZ,China

Join Date: Jun 2014
Posts: 42
Puzzled by SOLiD data analysis

Can anyone help me with SOLiD 4 sequencing data? I got it from the SRA database, and it seems to have been processed already. It is mate-paired data.
mate1:
@SRR586064.1 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/1
T30..01121.12.1032100213131122200031222022101302313
+
!AB!!?:>@<!@B!;AB?AA@@<2<@?@@<?AB>A?9?:;@@>;-=>>7=@
@SRR586064.2 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_137/1
T00..00000.00.0000000002220301333000020000303000000
+
!<B!!97=<A!<@!?>7?@;9+%+2%%-&-%%+6(,0*)3((%<%4.+8.1

mate2:
@SRR586064.1 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/2
G10330000122222033220201000000220002000000000000000
+
!@BBABBBB@>?@@(.))35.%-.3((%1+%((82-'.*-*/3*14*'696
@SRR586064.2 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_137/2
G01002233300021321333003300011021333000110330003000
+
!AA>>@A@?3=?:1+6@:?+;A+0+921:-:B78'6?(/5/?5:26=&*

My questions are:
1) What is the first read in mate1? (Some reads, like the first one, contain the symbol '.'; what does it mean?) Also, many reads in mate1 look like the mate2 reads.
2) Can I convert the SOLiD FASTQ (color space) to nucleotide space? Is anything lost?
3) If I convert the color-space field, what happens to the QV lines?
4) I am familiar with Bowtie2; can I use it for mapping?
5) Lastly, should I do QC before or after conversion?
07-23-2014, 09:07 AM #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

1a) The first read is "T30..01121.12.1032100213131122200031222022101302313". The reads are a single letter followed by numbers.
1b) '.' means 'N' (a no-call).
2) Not without mapping. Only mapped reads can be correctly translated to base-space.
3) The QV lines can be sort of interpolated by averaging adjacent values.
4) You need to map with a colorspace-aware aligner. I'm pretty sure BWA allows this.
5) QC must be done before mapping.
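Point 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a standard method: qualities are Phred+33 (the `!` sitting under every `.` no-call in the reads above is consistent with QV 0 there), the first quality character belongs to the leading primer letter and is dropped, and decoded base k takes the floored mean of the QVs of the two colors on either side of it (the last base has only one neighboring color). `interpolate_base_qvs` is a hypothetical name.

```python
def interpolate_base_qvs(color_qual: str, offset: int = 33) -> list[int]:
    """Estimate one QV per decoded base from per-color QVs (hypothetical helper).

    Assumes `color_qual` holds only the color-call qualities (primer-letter
    QV already removed), encoded as ASCII with a Phred+`offset` scheme.
    Base k sits between color k and color k+1, so it gets the floored mean
    of those two QVs; the final base has only one neighboring color.
    """
    qvs = [ord(ch) - offset for ch in color_qual]
    base_qvs = []
    for k, q in enumerate(qvs):
        if k + 1 < len(qvs):
            base_qvs.append((q + qvs[k + 1]) // 2)
        else:
            base_qvs.append(q)
    return base_qvs


if __name__ == "__main__":
    # Quality line of the first mate2 read above; drop the primer-letter QV.
    qual = "!@BBABBBB@>?@@(.))35.%-.3((%1+%((82-'.*-*/3*14*'696"
    base_qvs = interpolate_base_qvs(qual[1:])
    print(len(base_qvs), base_qvs[:4])  # 50 [32, 33, 32, 32]
```

Averaging is only one choice; taking the minimum of the two neighboring QVs would be a more conservative alternative under the same assumptions.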
07-23-2014, 11:38 AM #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Actually Brian is wrong about #2. It is quite trivial to convert from color-space to base-space without mapping. I used to be able to do so by hand (just as a mental exercise) when we owned a SOLiD. For example, your read:

    G01002233300021321333003300011021333000110330003000

converts into:

    GGTTTCTATAAAAGTAGTATAAATAAAACAAGTATAAAACAATAAAATTTT
[BTW: I see that Wikipedia says "... There is one unambiguous conversion of a base reference sequence into color-space, but there are four possible conversions of a color string into base strings" which is wrong. I should pull out my old conversion table and correct this.]
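The hand conversion above follows SOLiD's two-base encoding. Here is a minimal Python sketch, assuming the usual 2-bit codes A=0, C=1, G=2, T=3, under which each color is the XOR of two adjacent bases, so every next base is the previous base XOR the current color. It keeps the leading letter as the first output base, as the conversion above does, and reproduces that output exactly. It refuses to go past a '.' no-call, because every downstream base would depend on the missing color. `decode_colorspace` is a hypothetical helper, not a function from any SOLiD tool.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read: str) -> str:
    """Translate a color-space read such as 'G0100...' into bases (hypothetical helper).

    The leading letter is kept as the first output base. Each color is the
    XOR of two adjacent 2-bit base codes (A=0, C=1, G=2, T=3), so
    next base = previous base XOR color.
    """
    prev = BASE_TO_CODE[read[0]]
    bases = [read[0]]
    for pos, color in enumerate(read[1:], start=1):
        if color == ".":
            # A no-call leaves this and every later base undetermined.
            raise ValueError(f"no-call at color position {pos}; cannot decode past it")
        if color not in "0123":
            raise ValueError(f"unexpected color {color!r} at position {pos}")
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


if __name__ == "__main__":
    print(decode_colorspace("G01002233300021321333003300011021333000110330003000"))
    try:
        decode_colorspace("T30..01121.12.1032100213131122200031222022101302313")
    except ValueError as err:
        print("first mate1 read:", err)
```

The first call prints the same base string as the conversion above; the first mate1 read stops at its first '.', which is Brian's point about no-calls.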

That said, it is quite dangerous to rely on the conversion from color-space to base-space. If there is any uncertainty in the color-space read, then the probability of going off-track in base-space is high, with consequent corruption of your base-space data. Mapping the (older, SOLiD-generated) read to a known reference is the way to keep that corruption from occurring.
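The "going off-track" problem is easy to demonstrate. In this sketch (same assumed A=0, C=1, G=2, T=3 XOR encoding; `decode` and `with_color_error` are hypothetical helpers), a single color in the example read is changed, and every base from that point to the end of the read comes out different, because each later base is XORed with the same wrong difference.

```python
def decode(read: str) -> str:
    """Minimal color-space decoder: next base = previous base XOR color.

    Assumes A=0, C=1, G=2, T=3 and a read without '.' no-calls.
    """
    code = "ACGT".index(read[0])
    out = [read[0]]
    for color in read[1:]:
        code ^= int(color)
        out.append("ACGT"[code])
    return "".join(out)


def with_color_error(read: str, pos: int, new_color: str) -> str:
    """Return a copy of `read` with the color at index `pos` replaced (hypothetical helper)."""
    return read[:pos] + new_color + read[pos + 1:]


if __name__ == "__main__":
    read = "G01002233300021321333003300011021333000110330003000"
    bad = with_color_error(read, 10, "1")  # one mis-called color (0 -> 1)
    good_bases, bad_bases = decode(read), decode(bad)
    diffs = [i for i, (a, b) in enumerate(zip(good_bases, bad_bases)) if a != b]
    print(f"{len(diffs)} of {len(good_bases)} bases differ, starting at index {diffs[0]}")
```

With the change at color 10, the script reports "41 of 51 bases differ, starting at index 10": one wrong color corrupts the rest of the read in base-space.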

Color-space can be a very powerful self-correcting format as long as you remain in color space. Do not convert from color-space into base-space until you are completely done with your analysis.
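One reason color space is self-correcting while you stay in it: under the same assumed XOR encoding, a real single-base change (e.g. a SNP) alters two adjacent colors, whereas a single measurement error alters only one color. A colorspace-aware aligner comparing a read against the color-encoded reference can use that signature to tell the two apart. The sketch below (hypothetical helpers `encode_colorspace` and `differing_positions`) encodes the first 17 bases of the decoded example read, which reproduces the original colors, and then introduces a hypothetical T->G change.

```python
def encode_colorspace(seq: str) -> str:
    """Encode bases into colors: color_i = code(b_{i-1}) XOR code(b_i).

    Assumes A=0, C=1, G=2, T=3. The first base is kept as a letter so the
    result has the same layout as the reads in this thread.
    """
    codes = ["ACGT".index(b) for b in seq]
    colors = [str(a ^ b) for a, b in zip(codes, codes[1:])]
    return seq[0] + "".join(colors)


def differing_positions(a: str, b: str) -> list[int]:
    """Indices where two equal-length strings differ (hypothetical helper)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    ref = "GGTTTCTATAAAAGTAG"  # first 17 bases of the decoded example read
    snp = ref[:8] + "G" + ref[9:]  # hypothetical SNP: T -> G at index 8
    ref_cs, snp_cs = encode_colorspace(ref), encode_colorspace(snp)
    print(ref_cs)  # same as the first 17 characters of the color read
    print(snp_cs)
    print(differing_positions(ref_cs, snp_cs))  # two adjacent colors: [8, 9]
```

A lone mismatching color, with no matching change in its neighbor, is therefore more likely a sequencing error than a true variant; decoding to bases first throws that distinction away.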

BWA dropped color-space support as of version 0.6, so you will have to use an older version.

Bowtie2 never supported color-space, although its predecessor Bowtie does.

BTW: Not that this helps you, but the new SOLiD has a self-correcting method which re-aligns the read properly every fifth base.
07-23-2014, 12:51 PM #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by westerman
Actually Brian is wrong about #2. It is quite trivial to convert from color-space to base-space without mapping.

I wouldn't say "wrong"... though maybe misleading. It is impossible to correctly translate an imperfect colorspace read to base-space without mapping. And since SOLiD has an extremely high error rate (only a minority of reads, maybe around 20%, were error-free in the SOLiD 4 data I've processed), in practice it's not a useful thing to do. And indeed it is completely impossible to translate a read with a no-called base before mapping.

It's also not possible to unambiguously translate a colorspace read to base-space even AFTER mapping; the translation requires assumptions about the probability of errors versus SNPs versus indels and the results are highly subjective.
All times are GMT -8.

