  • Puzzled by SOLiD data analysis

    Can anyone help me with some SOLiD 4 sequencing data? I got them from the SRA database, and they seem to have been processed already. They are mate-paired data.
    mate1:
    @SRR586064.1 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/1
    T30..01121.12.1032100213131122200031222022101302313
    +
    !AB!!?:>@<!@B!;AB?AA@@<2<@?@@<?AB>A?9?:;@@>;-=>>7=@
    @SRR586064.2 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_137/1
    T00..00000.00.0000000002220301333000020000303000000
    +
    !<B!!97=<A!<@!?>7?@;9+%+2%%-&-%%+6(,0*)3((%<%4.+8.1

    mate2:
    @SRR586064.1 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/2
    G10330000122222033220201000000220002000000000000000
    +
    !@BBABBBB@>?@@(.))35.%-.3((%1+%((82-'.*-*/3*14*'696
    @SRR586064.2 ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_137/2
    G01002233300021321333003300011021333000110330003000
    +
    !AA>>@A@?3=?:1+6@:?+;A+0+921:-:B78'6?(/5/?5:26=&*

    My questions are:
    1) What is the first read in mate1? (Some reads, like the first one, contain the symbol '.'; what does it mean?) But many other reads in mate1 look like those in mate2.
    2) Can I convert the SOLiD FASTQ (color space) to nucleotide space? Is anything lost?
    3) If I convert the color-space sequences, what happens to the QV lines?
    4) I am familiar with Bowtie2; can I use it for mapping?
    5) Lastly, should I do QC before or after the conversion?

  • #2
    1a) The first read is "T30..01121.12.1032100213131122200031222022101302313". Each read is a single letter (a base) followed by numbers (color calls, 0-3).
    1b) '.' means 'N' (a no-call).
    2) Not without mapping. Only mapped reads can be correctly translated to base-space.
    3) The QV lines can be sort of interpolated by averaging adjacent values.
    4) You need to map with a colorspace-aware aligner. I'm pretty sure BWA allows this.
    5) QC must be done before mapping.
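For points 1 and 3, here is a minimal Python sketch of what the first mate1 record contains: it splits the read into its leading base and color calls, lists the '.' no-call positions, and approximates per-base QVs by averaging the two color QVs that touch each base. It assumes Phred+33 qualities and treats the leading '!' as a placeholder QV for the leading base (every '.' position also carries '!', i.e. QV 0, which fits that reading). The helper names are hypothetical, and averaging is just the heuristic suggested in point 3; taking the minimum of the two values would be a more conservative alternative.

```python
# Minimal sketch: inspect one SOLiD color-space FASTQ record (mate1, read 1).
# Assumptions: Phred+33 qualities; the first QV is a placeholder for the
# leading base. Helper names are hypothetical.

SEQ = "T30..01121.12.1032100213131122200031222022101302313"
QUAL = "!AB!!?:>@<!@B!;AB?AA@@<2<@?@@<?AB>A?9?:;@@>;-=>>7=@"


def split_record(seq, qual):
    """Return (leading base, color calls, color QVs), dropping the leading QV."""
    leading, colors = seq[0], seq[1:]
    color_qvs = [ord(ch) - 33 for ch in qual[1:]]
    return leading, colors, color_qvs


def no_call_positions(colors):
    """0-based indices of '.' (no-call) colors."""
    return [i for i, c in enumerate(colors) if c == "."]


def approx_base_qvs(color_qvs):
    """Heuristic: decoded base i is touched by colors i and i+1, so average them.

    The last base is touched by a single color and keeps that QV.
    """
    n = len(color_qvs)
    return [
        (color_qvs[i] + color_qvs[i + 1]) // 2 if i + 1 < n else color_qvs[i]
        for i in range(n)
    ]


leading, colors, color_qvs = split_record(SEQ, QUAL)
print(leading)                          # T
print(no_call_positions(colors))        # [2, 3, 9, 12]
print(color_qvs[:4])                    # [32, 33, 0, 0]
print(approx_base_qvs(color_qvs)[:4])   # [32, 16, 0, 15]
```

Note that base-level QVs only mean something for reads that can be decoded at all; this particular read has no-calls, so in practice it would be handled in color space.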



    • #3
      Actually Brian is wrong about #2. It is quite trivial to convert from color space to base space without mapping. I used to be able to do it by hand, just as a mental exercise, when we owned a SOLiD. For example, your read:
      G01002233300021321333003300011021333000110330003000
      converts into:
      GGTTTCTATAAAAGTAGTATAAATAAAACAAGTATAAAACAATAAAATTTT
      [BTW: I see that Wikipedia says "... There is one unambiguous conversion of a base reference sequence into color-space, but there are four possible conversions of a color string into base strings" which is wrong. I should pull out my old conversion table and correct this.]
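The by-hand conversion can be reproduced with a short sketch. It relies on the standard SOLiD two-base encoding: with A=0, C=1, G=2, T=3, each color is the XOR of the two adjacent base codes, so each next base is the previous base XOR the color. The output keeps the leading G, which comes from the adapter/primer rather than from the sequenced fragment. The last few lines show the case the Wikipedia sentence presumably has in mind: the leading base is what makes the decoding unique, and a bare color string without it has four equally valid decodings, one per starting base.

```python
# Sketch of SOLiD color-space -> base-space decoding.
# Assumes the two-base encoding A=0, C=1, G=2, T=3, where
# color = code(base_i) XOR code(base_i+1).

BASES = "ACGT"


def decode(read):
    """Decode a read such as 'G0100...' (leading base + colors).

    Returns the leading base followed by the decoded bases.
    """
    current = BASES.index(read[0])
    out = [read[0]]
    for color in read[1:]:
        if color not in "0123":
            # e.g. a '.' no-call: there is nothing to continue from
            raise ValueError(f"cannot decode past non-color call {color!r}")
        current ^= int(color)
        out.append(BASES[current])
    return "".join(out)


read = "G01002233300021321333003300011021333000110330003000"
print(decode(read))
# GGTTTCTATAAAAGTAGTATAAATAAAACAAGTATAAAACAATAAAATTTT

# Without the leading base, each start base yields a different decoding.
colors = read[1:]
for start in BASES:
    print(start, decode(start + colors)[1:])
```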

      That said, it is quite dangerous to rely on the conversion from color space to base space. If there is any uncertainty in the color calls, then the probability of going off-track in base space is high, with consequent corruption of your base-space data. Mapping the (older SOLiD-generated) read to a known reference is the way to keep this corruption from occurring.
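To see why a single miscalled color is so damaging after conversion, the sketch below decodes the same read twice: once as given, and once with one color (index 10, chosen arbitrarily) altered. The decode helper uses the standard two-base encoding (A=0, C=1, G=2, T=3, each color being the XOR of adjacent base codes). Because each base is derived from the previous one, every base after the altered color comes out wrong.

```python
# Sketch: one miscalled color corrupts every downstream base after decoding.
# Assumes the two-base encoding A=0, C=1, G=2, T=3 (color = XOR of neighbors).

BASES = "ACGT"


def decode(read):
    """Leading base + colors -> leading base + decoded bases."""
    current = BASES.index(read[0])
    out = [read[0]]
    for color in read[1:]:
        current ^= int(color)
        out.append(BASES[current])
    return "".join(out)


read = "G01002233300021321333003300011021333000110330003000"
i = 10  # 0-based index into the color calls, i.e. read[1:]
bad_color = str(int(read[1 + i]) ^ 1)  # simulate a single miscall
corrupted = read[: 1 + i] + bad_color + read[2 + i :]

good, bad = decode(read), decode(corrupted)
diffs = [k for k, (a, b) in enumerate(zip(good, bad)) if a != b]
print(len(diffs), diffs[0])  # 40 11
```

All 40 bases from position 11 onward (counting the leading base as position 0) differ, even though only one of the 50 colors was changed.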

      Color space can be a very powerful self-correcting format as long as you remain in color space. Do not convert from color space into base space until you are completely done with your analysis.
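One way to see the self-correcting property: a true single-base difference changes the two adjacent colors it touches, and (in the XOR encoding, A=0, C=1, G=2, T=3) both change by the same amount, whereas a single miscalled color shows up as one isolated mismatch. The sketch below encodes a reference fragment and a copy with one substituted base, then applies a deliberately simplified classifier. `classify_mismatches` is a hypothetical helper, not the logic of any particular aligner, and it ignores read ends and clusters of errors.

```python
# Simplified sketch: telling a SNP from a color miscall in color space.
# Assumes the two-base encoding A=0, C=1, G=2, T=3 (color = XOR of neighbors).

BASES = "ACGT"


def encode(seq):
    """Base-space sequence -> list of color calls (ints 0-3)."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def classify_mismatches(read_colors, ref_colors):
    """Hypothetical helper: label the mismatch pattern between two color lists."""
    diffs = [i for i, (r, f) in enumerate(zip(read_colors, ref_colors)) if r != f]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        return "isolated mismatch: likely a color miscall"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        # A substituted base shifts both colors it touches by the same XOR delta.
        if read_colors[i] ^ ref_colors[i] == read_colors[j] ^ ref_colors[j]:
            return "adjacent consistent pair: SNP-like"
    return "other pattern (e.g. several errors or an indel)"


ref = "GGTTTCTATAAAAGTAG"
alt = ref[:8] + "C" + ref[9:]  # one substituted base (T -> C at index 8)
ref_colors = encode(ref)

print(classify_mismatches(encode(alt), ref_colors))
# adjacent consistent pair: SNP-like

miscalled = list(ref_colors)
miscalled[3] ^= 1  # a single wrong color
print(classify_mismatches(miscalled, ref_colors))
# isolated mismatch: likely a color miscall
```

Converting to base space first throws this distinction away: the miscall would instead look like a long run of mismatches.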

      BWA dropped support for color space as of version 0.6, so you will have to use an older version.

      Bowtie2 never supported color space, although the earlier program Bowtie does.

      BTW: Not that this helps you, but the newer SOLiD has a self-correcting method that re-aligns the read properly every fifth base.



      • #4
        Originally posted by westerman
        Actually Brian is wrong about #2. It is quite trivial to convert from color-space to base-space without mapping.
        I wouldn't say "wrong"... though maybe misleading. It is impossible to correctly translate an imperfect colorspace read to base space without mapping. And since SOLiD has an extremely high error rate (only a minority, maybe around 20%, of reads were error-free in the SOLiD 4 data I've processed), in practice it's not a useful thing to do. And indeed it is completely impossible to translate a read with a no-called base before mapping.
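As a rough back-of-the-envelope reading of the 20% figure: if color errors were independent and equally likely at every position (a simplifying assumption), a 50-color read is error-free with probability (1 - p)^50. Solving for p shows that 20% error-free reads corresponds to a per-color error rate of roughly 3%, which is small per call but adds up quickly over a whole read.

```python
# Back-of-the-envelope: per-color error rate implied by a fraction of
# error-free reads, assuming independent, position-uniform errors.


def per_color_error_rate(error_free, n_colors):
    """Solve (1 - p) ** n_colors == error_free for p."""
    return 1 - error_free ** (1 / n_colors)


def frac_error_free(p, n_colors):
    """Probability that none of n_colors calls is wrong."""
    return (1 - p) ** n_colors


p = per_color_error_rate(0.20, 50)
print(f"{p:.4f}")                        # 0.0317
print(f"{frac_error_free(p, 50):.2f}")   # 0.20
```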

        It's also not possible to unambiguously translate a colorspace read to base-space even AFTER mapping; the translation requires assumptions about the probability of errors versus SNPs versus indels and the results are highly subjective.
