
  • #16
    Originally posted by pliang
    Hi BENM:

    Thank you for the response and the new information. It happens that I need to convert SOLiD color space sequence in fastq to Solexa format, for both the sequence and the quality. I believe the quality scores are already in the ASCII scheme (see the sequence entry copied in my first email), which is why I thought the quality line could be kept unchanged for my use. Am I right about this? In any case, I think a tool for converting among the data formats of the different platforms would be useful for us. Thanks again.
    Re pliang:

    Why would you want to convert color space to sequence space before alignment? Basically, why do you want SOLiD color space data in Illumina format? Bowtie does not work with color space (yet) and no amount of "input hacking" will get it to work right now.



    • #17
      Editing SOLiD2Std.pl to include more colorspace->basespace

      Hi BENM

      I edited your SOLid2Std.pl script to include some extra colorspace mappings that were not considered originally.

      I wanted to include the following basespace mapping:
      Basically, any base (A, T, C, G) followed by a '4', '5' or '.' becomes 'N'. An 'N' to 'N' transition is also represented by different color space numbers (0, 1, 2, 3, 6, '.').

      Code:
      A4: N
      A.: N
      A5: N
      C4: N
      C.: N
      C5: N
      G4: N
      G.: N
      G5: N
      T4: N
      T.: N
      T5: N
      N5: A C T or G
      N.: N
      N6: N
      N0: N
      N1: N
      N2: N
      N3: N
      I am very new to Perl, but I tried to change your script to include this conversion.
      I edited the following part:

      Code:
      # SOLiD color code
      my @code = ([0,1,2,3,'.',4,5],[1,0,3,2,'.',4,5],[2,3,0,1,'.',4,5],[3,2,1,0,'.',4,5],[5,5,5,5,'.',6,0],[5,5,5,5,1,2,3],[5,5,5,5,1,2,3]);
      my @bases = qw(A C G T N N N);
      my %decode = ();
      foreach my $i(0..7)
      {
      	foreach my $j(0..7)
      	{
      		$decode{$code[$i]->[$j]} -> {$bases[$i]} = $bases[$j];
      	}
      }
      It works!
      However, Perl prints warnings when I run the script, although they do not prevent it from working, which is good.
      Perl gives me the following messages:

      Code:
      Use of uninitialized value in hash element at SOLid2Std.pl line 49.
      Use of uninitialized value within @bases in hash element at SOLid2Std.pl line 49.
      What am I doing wrong?
      line 49 is:
      Code:
      $decode{$code[$i]->[$j]} -> {$bases[$i]} = $bases[$j];
      Thank You
      Last edited by inesdesantiago; 10-05-2009, 06:57 AM. Reason: to mention line 49
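A note on the warnings above: `@code` and `@bases` each hold seven entries, so their valid indices are 0..6, while both loops run over 0..7. The extra pass reads one element past the end, which is undefined in Perl, and that is what the "uninitialized value" messages point at; looping over `0..$#bases` would avoid it. As a rough illustration, here is the same kind of lookup table sketched in Python (not the language of SOLid2Std.pl), with the loop bounds taken from the data. The names `BASES`, `COLOR_CODE` and `build_decode` are made up for this sketch, and only the standard four-base color code is included.

```python
# Standard SOLiD color code: COLOR_CODE[i][j] is the color observed
# when base BASES[i] is followed by base BASES[j].
BASES = ["A", "C", "G", "T"]
COLOR_CODE = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]


def build_decode(bases, color_code):
    """Return {(color, previous_base): next_base} for every table entry."""
    decode = {}
    # enumerate() takes the bounds from the lists themselves, so the
    # loops cannot run past the last element.
    for i, prev_base in enumerate(bases):
        for j, next_base in enumerate(bases):
            decode[(str(color_code[i][j]), prev_base)] = next_base
    return decode


DECODE = build_decode(BASES, COLOR_CODE)
```

With this table, `DECODE[("2", "G")]` looks up the base that follows a G when color 2 is read.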



      • #18
        Hi inesdesantiago

        Thank you for your suggestions. In color space, if one color call cannot be recognized by the SOLiD™ System, the bases after it become uncertain too. So when converting color space to nucleotide bases, those positions decode to "N" instead of a real base. For example:

        Code:
        @example1
        G2203012023131303312303100
        +
        !611%%(-+%*.&*.,&2,,'%()31
        @example2
        G220301.023131303312303100
        +
        !611%%(-+%*.&*.,&2,,'%()31
        @example3
        G2203012023141303312303100
        +
        !611%%(-+%*.&*.,&2,,'%()31
        @example4
        G2203012023151303512303100
        +
        !611%%(-+%*.&*.,&2,,'%()31
        There is a dot in the example2 read, a "4" in the example3 read, and a "5" in the example4 read, so from that position on the bases are converted to "N" according to your rules:
        Code:
        A4: N
        A.: N
        A5: N
        C4: N
        C.: N
        C5: N
        G4: N
        G.: N
        G5: N
        T4: N
        T.: N
        T5: N
        N5: A C T or G
        N.: N
        N6: N
        N0: N
        N1: N
        N2: N
        N3: N
        like this:
        Code:
        @example1
        AGGCCAGGATGCATTATGATTACCC
        +
        611%%(-+%*.&*.,&2,,'%()31
        @example2
        AGGCCANNNNNNNNNNNNNNNNNNN
        +
        611%%(-+%*.&*.,&2,,'%()31
        @example3
        AGGCCAGGATGNNNNNNNNNNNNNN
        +
        611%%(-+%*.&*.,&2,,'%()31
        @example4
        AGGCCAGGATGNNNNNGTCGGCAAA
        +
        611%%(-+%*.&*.,&2,,'%()31
        So you don't need to change the "# SOLiD color code" part of the script. You only need to modify line 169:
        Code:
        	$current_base = $decode{$colors[$i]}->{$last_base};
        change it to:
        Code:
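        # After an "N", color 5 stands for "any base", so pick one at random;
        # otherwise look the color up and fall back to "N" if there is no entry.
        # The color is compared as a string because it may be '.', not a number.
        if (($last_base=~/N/i)&&($colors[$i] eq '5'))
        {
        	$current_base = $bases[int(rand(@bases))];
        }
        else
        {
        	$current_base = (exists $decode{$colors[$i]}->{$last_base}) ? $decode{$colors[$i]}->{$last_base} : "N";
        }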
        This is easier than your approach.
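The behaviour shown in example1 to example3 can be sketched in Python (again, not the script's own Perl). This is only an illustration under stated assumptions: it uses the standard color code, treats every color other than 0-3 as unreadable, and leaves out the random-base rule for "N5", so example4 is not reproduced. The function names `decode_colors` and `decode_record` are hypothetical.

```python
BASES = "ACGT"


def decode_colors(read):
    """Decode a color-space read such as 'G2203...' into bases.

    The first character is the primer base; it starts the decoding and
    is not part of the output. Any color other than 0-3 yields 'N', and
    so does every position after it.
    """
    last_base = read[0].upper()
    decoded = []
    for color in read[1:]:
        if last_base in BASES and color in "0123":
            # The color code is an XOR on base indices (A=0, C=1, G=2, T=3).
            last_base = BASES[BASES.index(last_base) ^ int(color)]
        else:
            last_base = "N"
        decoded.append(last_base)
    return "".join(decoded)


def decode_record(name, read, quality):
    """Convert one FASTQ-like record; the primer's quality character is dropped."""
    return "@{}\n{}\n+\n{}\n".format(name, decode_colors(read), quality[1:])


print(decode_record("example2", "G220301.023131303312303100",
                    "!611%%(-+%*.&*.,&2,,'%()31"))
```

Dropping the first quality character mirrors the converted examples above, where the leading "!" that belongs to the primer base no longer appears.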

        BTW, because SOLiD reads are ultra short, most people discard reads that contain ".", "4", "5" or "6" in color space. I think that is acceptable given the ultra-high throughput of the SOLiD™ System; we don't need these uncertain or low-quality reads.
        Last edited by BENM; 10-06-2009, 09:34 PM.



        • #19
          Hi BENM

          Thank you for your script. I tried your SOLiD2Std.pl on the following data:

          @exa1
          T1011122220100230032132.2111111002.1
          +
          !)+%.*%*+2'0%%%-%+%*5'%!%9+'%+<+0%!%
          @exa2
          T0101233211103200232333.2111211002.1
          +
          !,.+'+')'390%%%%%%%'%%%!-<++++<99%!%
          @exa3
          T0312202213101213131111.1110131102.1
          +
          !93<*/18+%:9%+075*%:;+6!3<26%/<%-%!%


          and the result is like this:

          @exa1
          GGTGTCTCTTGGGATTTAGTAGNNNNNNNNNNNNN
          +
          )+%.*%*+2'0%%%-%+%*5'%!%9+'%+<+0%!%
          @exa2
          TGGTCGCTGTGGCTTTCGATATNNNNNNNNNNNNN
          +
          ,.+'+')'390%%%%%%%'%%%!-<++++<99%!%
          @exa3
          TACTCCTCATGGTCATGCACACNNNNNNNNNNNNN
          +
          93<*/18+%:9%+075*%:;+6!3<26%/<%-%!%

          It seems that every base from the first dot "." onward is converted into "N".
          Is that correct?

          Thank you.



          • #20
            Without aligning (i.e. knowing the decoded DNA sequence), a missing color call will not allow for a deterministic decoding (there are actually four possible sequences after a missing base).
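To make the "four possible sequences" point concrete, here is a small Python sketch (not part of SOLiD2Std.pl). It decodes up to the first missing color, then tries each of A, C, G and T as the unknown base and continues from there. The function names are hypothetical, and any later unreadable color is simply turned into "N" from that point on.

```python
BASES = "ACGT"


def decode_from(start_base, colors):
    """Decode colors after a known base; positions after a bad color become 'N'."""
    last_base, out = start_base, []
    for color in colors:
        if last_base in BASES and color in "0123":
            # XOR on base indices (A=0, C=1, G=2, T=3) gives the next base.
            last_base = BASES[BASES.index(last_base) ^ int(color)]
        else:
            last_base = "N"
        out.append(last_base)
    return "".join(out)


def candidate_decodings(read):
    """Return the possible decodings around the first missing color.

    `read` is a primer base followed by colors. With no '.' in the read
    there is nothing to guess and a single decoding is returned.
    """
    primer, colors = read[0], read[1:]
    gap = colors.find(".")
    if gap == -1:
        return [decode_from(primer, colors)]
    prefix = decode_from(primer, colors[:gap])
    # The base at the gap is unknown: try each one and decode onwards.
    return [prefix + guess + decode_from(guess, colors[gap + 1:])
            for guess in BASES]


for candidate in candidate_decodings("T1011122220100230032132.2111111002.1"):
    print(candidate)
```

All four candidates share the same prefix and differ from the gap onward; only an alignment against a reference can tell which one is right.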



            • #21
              I am looking at a lot of SOLiD data that we received from collaborators. I don't see any fastq files; all the read and qual data are in separate files. I don't see anything in the SOLiD manuals that indicates that their tools make fastq files. Might I ask: did you make these fastq files yourselves by collating read and qual data? Is there a utility that does this?
              Thanks
              Mike



              • #22
                Hi mmuratet,

                I collected the raw datasets from the SRA on the NCBI website. All the raw reads were generated on the ABI SOLiD platform and are in color space, in a fastq-like format. I just downloaded them and never processed them myself.
                You can try searching there.

                Best,
                lix



                • #23
                  Thanks for the reply. In the meantime, I found that the bfast suite has a tool solid2fastq. The ABI manual says that their quality scores are phred values.
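For readers who want to see what collating the read and qual data involves, here is a minimal Python sketch. It is not the bfast solid2fastq implementation. It assumes the usual layout of the two files (a ">name" header line followed by one data line, with "#" comment lines allowed), assumes the qual values are space-separated Phred integers written as Sanger characters (Phred + 33), and prepends a "!" for the primer base to match the fastq-like examples earlier in this thread. The function names are made up for the illustration.

```python
def read_records(lines):
    """Yield (name, data) pairs from '>name' / data line pairs, skipping comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def to_fastq(csfasta_lines, qual_lines):
    """Yield FASTQ-like records built from matching csfasta and qual entries."""
    pairs = zip(read_records(csfasta_lines), read_records(qual_lines))
    for (name, read), (qual_name, quals) in pairs:
        if name != qual_name:
            raise ValueError("read %r does not match qual entry %r" % (name, qual_name))
        # Negative values (uncalled colors) are clamped to 0 before encoding.
        scores = [max(0, int(q)) for q in quals.split()]
        # Assumption: a dummy '!' stands in for the primer base's quality.
        quality = "!" + "".join(chr(33 + min(s, 93)) for s in scores)
        yield "@%s\n%s\n+\n%s\n" % (name, read, quality)


reads = [">1_2_3_F3", "T0120"]
quals = [">1_2_3_F3", "20 -1 5 40"]
for record in to_fastq(reads, quals):
    print(record, end="")
```

In practice the two arguments would be open file handles for the read file and the matching qual file; lists are used here only to keep the example self-contained.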



                  • #24
                    I am trying to analyse an SRA file from SOLiD through Galaxy. The file is recognised by Galaxy as a FASTQ file, but the groomer does not accept it for conversion into Sanger or other formats. However, the same pipeline works fine for GA-II data. Can you help?



                    • #25
                      Converting solid to fastq in Galaxy

                      Galaxy has a new tool called solid2fastq that converts fragment and mate-pair runs into fastq files that can be mapped by bowtie. The tool takes care of the "orphaned" mates and makes sure that in the case of mate pair run the resulting fastq files have exactly the same number of reads. A video explaining how to use this for fragment runs is here:



                      and for mate pairs it is here:



                      These can also be accessed from the Galaxy site (http://usegalaxy.org) as quickie 8 and 9.

                      Let us ([email protected]) know if you have issues.



                      • #26
                        RNA-Seq

                        I am trying to retrieve data for RNA-Seq experiments, preferably human. I have tried the UCSC browser and EMBL, but I cannot figure out where to find it. Can anyone suggest a database for this, or any other link?



                        • #27
                          Originally posted by BENM
                          Hi, pliang

                          Because samt's question is "Convert SOLiD fastq to Illumina fastq", note that Illumina FASTQ differs from standard (Sanger) FASTQ in its quality format.

                          The syntax of Solexa/Illumina read format is almost identical to the FASTQ format, but the qualities are scaled differently. Given a character $sq, the following Perl code gives the Phred quality $Q:

                          $Q = 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);

                          The ASCII characters in Solexa FASTQ mean:
                          Code:
                          CHAR	DEC	QUALITY
                          A	65	1
                          B	66	2
                          C	67	3
                          D	68	4
                          E	69	5
                          F	70	6
                          G	71	7
                          H	72	8
                          I	73	9
                          J	74	10
                          K	75	11
                          L	76	12
                          M	77	13
                          N	78	14
                          O	79	15
                          P	80	16
                          Q	81	17
                          R	82	18
                          S	83	19
                          T	84	20
                          U	85	21
                          V	86	22
                          W	87	23
                          X	88	24
                          Y	89	25
                          Z	90	26
                          [	91	27
                          \	92	28
                          ]	93	29
                          ^	94	30
                          _	95	31
                          `	96	32
                          a	97	33
                          b	98	34
                          c	99	35
                          d	100	36
                          e	101	37
                          f	102	38
                          g	103	39
                          h	104	40
                          ;	59	-5
                          <	60	-4
                          =	61	-3
                          >	62	-2
                          ?	63	-1
                          @	64	0
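As a quick check of the formula and table above, the same calculation can be written in Python (the thread's own code is Perl). This is only a sketch; `solexa_char_to_phred` is a hypothetical helper name, and it assumes the Solexa quality is stored as the character code minus 64, as in the table.

```python
import math


def solexa_char_to_phred(ch):
    """Return the Phred quality for one Solexa-scaled FASTQ quality character."""
    solexa_q = ord(ch) - 64  # QUALITY column of the table above
    return 10 * math.log10(1 + 10 ** (solexa_q / 10.0))


for ch in ";@Th":
    print(ch, ord(ch) - 64, round(solexa_char_to_phred(ch), 2))
```

For high qualities the two scales nearly coincide; they only diverge noticeably for low and negative Solexa values.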
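                          Standard (Sanger) FASTQ, in contrast, stores a Phred quality Q as the character chr(Q + 33). The table below gives, for each Solexa quality from -64 to 64 (QUALITY) and its index in the conversion table (DEC = QUALITY + 64), the corresponding Sanger character (CHAR):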
                          Code:
                          CHAR	DEC	QUALITY
                          !       0       -64
                          !       1       -63
                          !       2       -62
                          !       3       -61
                          !       4       -60
                          !       5       -59
                          !       6       -58
                          !       7       -57
                          !       8       -56
                          !       9       -55
                          !       10      -54
                          !       11      -53
                          !       12      -52
                          !       13      -51
                          !       14      -50
                          !       15      -49
                          !       16      -48
                          !       17      -47
                          !       18      -46
                          !       19      -45
                          !       20      -44
                          !       21      -43
                          !       22      -42
                          !       23      -41
                          !       24      -40
                          !       25      -39
                          !       26      -38
                          !       27      -37
                          !       28      -36
                          !       29      -35
                          !       30      -34
                          !       31      -33
                          !       32      -32
                          !       33      -31
                          !       34      -30
                          !       35      -29
                          !       36      -28
                          !       37      -27
                          !       38      -26
                          !       39      -25
                          !       40      -24
                          !       41      -23
                          !       42      -22
                          !       43      -21
                          !       44      -20
                          !       45      -19
                          !       46      -18
                          !       47      -17
                          !       48      -16
                          !       49      -15
                          !       50      -14
                          !       51      -13
                          !       52      -12
                          !       53      -11
                          !       54      -10
                          "       55      -9
                          "       56      -8
                          "       57      -7
                          "       58      -6
                          "       59      -5
                          "       60      -4
                          #       61      -3
                          #       62      -2
                          $       63      -1
                          $       64      0
                          %       65      1
                          %       66      2
                          &       67      3
                          &       68      4
                          '       69      5
                          (       70      6
                          )       71      7
                          *       72      8
                          +       73      9
                          +       74      10
                          ,       75      11
                          -       76      12
                          .       77      13
                          /       78      14
                          0       79      15
                          1       80      16
                          2       81      17
                          3       82      18
                          4       83      19
                          5       84      20
                          6       85      21
                          7       86      22
                          8       87      23
                          9       88      24
                          :       89      25
                          ;       90      26
                          <       91      27
                          =       92      28
                          >       93      29
                          ?       94      30
                          @       95      31
                          A       96      32
                          B       97      33
                          C       98      34
                          D       99      35
                          E       100     36
                          F       101     37
                          G       102     38
                          H       103     39
                          I       104     40
                          J       105     41
                          K       106     42
                          L       107     43
                          M       108     44
                          N       109     45
                          O       110     46
                          P       111     47
                          Q       112     48
                          R       113     49
                          S       114     50
                          T       115     51
                          U       116     52
                          V       117     53
                          W       118     54
                          X       119     55
                          Y       120     56
                          Z       121     57
                          [       122     58
                          \       123     59
                          ]       124     60
                          ^       125     61
                          _       126     62
                          `       127     63
                          a       128     64
                          So it is easy to convert Solexa->Sanger quality: you just need to build a conversion table in the Perl script, like this:
                          Code:
                          # Solexa->Sanger quality conversion table, indexed by Solexa quality + 64
                          my @conv_table;
                          for (-64..64) {
                          	$conv_table[$_+64] = chr(int(33 + 10*log(1+10**($_/10.0))/log(10)+.499));
                          }
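The same conversion table can be sketched in Python (not the script's own Perl) and applied to a whole quality string at once. The names `solexa_to_sanger_table` and `solexa_to_sanger` are made up for this illustration; the rounding constant 0.499 and the -64..64 range are taken from the Perl snippet above.

```python
import math


def solexa_to_sanger_table(lowest=-64, highest=64):
    """Build a translation table from Solexa to Sanger quality characters."""
    table = {}
    for q in range(lowest, highest + 1):
        phred = 10 * math.log10(1 + 10 ** (q / 10.0))
        # Solexa stores chr(q + 64); Sanger stores chr(phred + 33), rounded.
        table[chr(q + 64)] = chr(int(33 + phred + 0.499))
    return str.maketrans(table)


SOLEXA_TO_SANGER = solexa_to_sanger_table()


def solexa_to_sanger(quality):
    """Convert one Solexa-scaled quality line to Sanger encoding."""
    return quality.translate(SOLEXA_TO_SANGER)


print(solexa_to_sanger("hhhT@;"))
```

Building the table once and translating each line keeps the per-read work to a single lookup per character, which is the same idea as the `@conv_table` array.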

                          I am trying to write a universal script for converting between the Solexa/Illumina, SOLiD/ABI, 454/Roche, 3730/Sanger, ... formats for different purposes, but I need to know your requirements first. After that, I will share it with you all.

                          I hope this answers your question.
                          BTW, I attach SOLiD2std.pl for your question; it is just SOLiD2Solexa.pl with a small change.
                          Hi Pliang, I am using your script SOLiD2std.pl. At the beginning the file looks fine, but then some reads look weird, without quality data. Do you know how I can solve that?

                          @373_15_180_F3
                          CTCATAGCCCTCCGGCAGAATGAACGGACATGTACGACCATAACATAACA
                          +
                          ?B=@BBB@>?2A=?BA8;;>52>72%>?=>/=:;?<=@9><?B1<@%?85
                          @373_15_216_F3
                          TCGAGCGGCCCCCATCTCCTAATAGTTATACGCCGCACATAACATTATCA
                          +
                          (
                          @373_15_605_F3
                          ACGATCTTGCCGGCACCGCGCCGTATTAGCGCGTATATATAGCGCGCGCG
                          +

                          @373_15_663_F3
                          TTCCTCATGGCCCGGGCGTTGTCCCATGCCGCACAATCGAGACGTCACTC
                          +
                          BBAB@BBA;9=BB-B@BA7B<?+6B:@29'BB<%;7B6C<)7?&-?%6+:

                          thanks



                          • #28
                            Originally posted by pepperoni
                            Hi Pliang, I am using your script SOLiD2std.pl. At the beginning the file looks fine, but then some reads look weird, without quality data. Do you know how I can solve that?



                            thanks
                            Is there a way to keep my quality data as it is and only use your script to do the number to base translation?
                            thanks



                            • #29
                              Originally posted by nekrut
                              Galaxy has a new tool called solid2fastq that converts fragment and mate-pair runs into fastq files that can be mapped by bowtie. The tool takes care of the "orphaned" mates and makes sure that in the case of mate pair run the resulting fastq files have exactly the same number of reads.
                              I have had issues: it does not give me the correct conversion, it gives me a frameshift!



                              • #30
                                Just in case you weren't aware, bowtie has changed a bit over the years. It is now able to quite easily handle SOLiD data as colour-space FASTA files and quality files (use options '-C -f' and '-Q' or '--Q1/--Q2' depending on whether it's paired end or not). Note that the colour-space switch changes the default read orientation to '--ff', so you may need to add in a '--fr' option for paired-end matching (I needed to do this for SOLiD4 data).

                                Bowtie2 (which can handle gaps) will handle colour-space input, but it will (in the beta3 version) only record as a match if the base-space conversion is perfect (no SNPs, no sequencer read errors). I assume this will only get better in the future.

