
  • #16
    I'm getting the following using your code:

    1206_912_423 16 chrX 148852770 255 10H10M101N30M * 0 0 CTCCCGTAGCCTTGATGGTCTGCTGCTTCCGTCTGTCACT ,GA%%:IIIIIIIIIIIIIIIIIIIIIIIIIIII
    IIIIII CS:Z:T32112112213020231231221013210203231320221310310031 XJ:Z:K CQ:Z:<<::9@9=:?==;:=>>>=:>9>695;;773:885&%*80,/&7&())6( XL:Z:39,39 XU:Z:3,1 IH:i:2 HI
    :i:2 MD:Z:40 XS:A:-
    922_1240_1515 16 chrX 119563391 255 10H10M1029N30M * 0 0 TGATCATGATCATTTGTCTGCAATGGTTTTGCCAGCATCT "C?H?'';?&&A?"""IIIIIIIIIIIIIIIIII
    IIIIII CS:Z:T32231321031000101301312213103133211123213222112001 XJ:Z:K CQ:Z::>>:>:?<>;==::<9=;>>9><:&4,6&.2*',45+9()50)'&*&2 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
    :A:-
    1297_662_654 0 chrX 153279920 255 10H10M102N26M4H * 0 0 CTTCGGTGTGCCACTGAAGATCCTGGTGTCGCCATG 1IIEIIIIC?III&&III&&4?I:;BDI=+.;I=3% CS
    :Z:T20331231203202301111301111202132021011123301313032 XJ:Z:K CQ:Z:@@96564=5/919428;7>&:78=&:585&+*66%7,98&&)38&.%8,+ XL:Z:30,35 XU:Z:2,2 IH:i:2 HI:i:2 MD:Z:36 XS
    :A:+
    1289_854_1683 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCGCCATTCCTGCAGCTCAGGGGAAGGGATCAAT '<A;5<IB9;@IDH((IIHIIIIGGIIIIIIIII
    IIIIII CS:Z:T33012320020200021223213122203103322110313223332222 XJ:Z:K CQ:Z:AA;A>;9>>?;?6:3:.:;4872:7(=,98)3'<7&0,6')1'5/1.)4/ XL:Z:39 XU:Z:1 IH:i:1 HI:i:1 MD:Z:40 XS
    :A:-
    1409_132_757 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCTACATTCCTGCAGCTCAGGGGAAGGGATCAAT "9:<?;G%###"IF##GFIIIAGIECIIIIIIII
    IIIIII CS:Z:T33012320020200021223213121203213222110311113332022 XJ:Z:K CQ:Z:?><@;9<>?8:>5<31553/<7526#619&#/%71+5(3'$&%:4&&-44 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:8TA30
    XS:A:-
    1125_1188_1449 16 chrX 53458535 255 10H10M110N30M * 0 0 GAAGAACCTCCTACAATGACACGGGCAAAGGTACGGTCCT &-<I<?##E@)/<>"""/:?IIIIIIIIIIIIII
    IIIIII CS:Z:T32021031310200130031112113113102231022021112101031 XJ:Z:K CQ:Z:;=:<=?A:?>@<=8=5:==<.2)'/*7(5/)8.#:&75(&*6#)9$8$#8 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
    :A:-
    .
    .
    .



    • #17
      It seems this problem arises because Cufflinks can't handle the hard clip (H) operation.
      Also, since your CIGAR strings are all valid, the error message "CIGAR op has zero length" is misleading.
      Last edited by Xi Wang; 03-18-2010, 09:20 PM.
      Xi Wang
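
To check a CIGAR string by hand before blaming the assembler, something like the following may help. It is only a sketch: the function names `parse_cigar` and `describe_cigar` are made up for illustration, and the checks are based on the op codes listed in the SAM format, not on how Cufflinks parses CIGAR strings internally.

```python
import re

# A whole CIGAR string: one or more <length><op> pairs (op codes from the SAM format).
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '10H10M101N30M' into (length, op) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def describe_cigar(cigar):
    """Report the properties discussed in this thread for one CIGAR string."""
    ops = parse_cigar(cigar)
    return {
        # Ops whose length is literally zero, e.g. the '0M' in '0M10M'.
        "zero_length_ops": [op for n, op in ops if n == 0],
        # True if the alignment contains a hard clip (H).
        "has_hard_clip": any(op == "H" for _, op in ops),
        # Bases that should be present in SEQ (hard-clipped bases are not stored).
        "seq_length": sum(n for n, op in ops if op in "MIS=X"),
    }

print(describe_cigar("10H10M101N30M"))
```

For the first record posted above, this reports no zero-length ops, a hard clip, and a SEQ length of 40, which matches the 40-base sequence in that record.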



      • #18
        My sam files have H in the CIGAR string and it worked fine.

        Is what you copied and pasted into your post exactly as it looks in the SAM file? I noticed some line breaks in the attribute column that don't look like natural line breaks imposed by the forum.
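
One quick way to rule out stray line breaks in the file itself is to count tab-separated fields, since every SAM alignment record has at least 11 mandatory columns. A minimal sketch (the function name `find_short_records` is hypothetical):

```python
import re

# SAM header lines start with '@' plus a two-letter record type and a tab.
HEADER = re.compile(r"@[A-Z]{2}\t")

def find_short_records(lines, min_fields=11):
    """Return 1-based numbers of non-header lines with fewer than min_fields columns."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or HEADER.match(line):
            continue
        if len(line.split("\t")) < min_fields:
            bad.append(lineno)
    return bad
```

Usage would be something like `find_short_records(open("reads.sam"))`, where `reads.sam` is a placeholder file name. Note that if a record is broken inside the optional tags, the first half still has 11 fields, so only the continuation line is flagged.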



        • #19
          No, actually. I copied it directly from my SSH window, which wrapped the lines. There aren't any line breaks in the file itself. Sorry for the confusion.



          • #20
            Originally posted by damiankao View Post
            I am using Bioscope mapping output .bam files as input into cufflinks. You have to first convert to .sam file, clean it up, and added the strand information by parsing the bitwise flag.

            I was able to run this cleaned up version of .sam file through cufflinks with pretty good results. The only problem I am having is that most of the output is not showing any strand information.

            I think cufflink is only using strand information for spliced reads and ignoring unspliced read strand? So all the genes assembled with spliced read has strand information, but others don't?
            Bioscope output includes two separate records in the SAM file if a read aligns to a splice junction. I suppose what Cufflinks expects for a spliced read is one SAM record per junction read, with a CIGAR string of the form ##M###N##M. If so, reads mapped to splice junctions in Bioscope output are not treated as spliced reads by Cufflinks, and their strand information is not used at all. As Cufflinks author Cole stated, "You should do your best to feed Cufflinks spliced alignments that are stranded with the XS". It may be necessary to merge the two junction read records in Bioscope output into one with the CIGAR string "##M###N##M".

            Any thoughts on this issue?
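
For anyone wanting to try the "add strand information by parsing the bitwise flag" step from the quoted post, here is a rough sketch. It assumes a strand-specific library in which the read's alignment strand equals the transcript strand (the records posted earlier in this thread are consistent with that: flag 16 carries XS:A:- and flag 0 carries XS:A:+). The function name `add_xs_tag` is made up; this is not damiankao's actual script.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit 16: read aligned to the reverse strand

def add_xs_tag(sam_line):
    """Append an XS:A:+/- tag derived from the FLAG, unless one is already present."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at column 12 (index 11); leave existing XS tags alone.
    if any(f.startswith("XS:A:") for f in fields[11:]):
        return "\t".join(fields)
    # Assumption: stranded protocol, so read strand == transcript strand.
    strand = "-" if int(fields[1]) & REVERSE_STRAND else "+"
    fields.append(f"XS:A:{strand}")
    return "\t".join(fields)
```

Header lines (starting with "@") should be passed through unchanged before calling this on alignment records.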



            • #21
              Hi,

              I don't think BioScope outputs 2 entries; it should output only 1 entry for each alignment (continuous or spliced), unless there is more than one alignment for that read. It shouldn't be necessary to merge any 2 lines.

              Did you find any such cases in your data?



              • #22
                My mistake. Bioscope does output a spliced alignment as one record. It is the pre-Bioscope WT pipeline that generates two records in the GFF file for reads mapped to a splice junction.

                Thanks for the reply.



                • #23
                  From the lines below, the scores of these alignments are all the same: 255. But in my Bioscope output, most of the alignments have a score below 10. Should I filter out these alignments?


                  [QUOTE=Haneko;15679]I'm getting the following using your code:

                  1206_912_423 16 chrX 148852770 255 10H10M101N30M * 0 0 CTCCCGTAGCCTTGATGGTCTGCTGCTTCCGTCTGTCACT ,GA%%:IIIIIIIIIIIIIIIIIIIIIIIIIIII
                  IIIIII CS:Z:T32112112213020231231221013210203231320221310310031 XJ:Z:K CQ:Z:<<::9@9=:?==;:=>>>=:>9>695;;773:885&%*80,/&7&())6( XL:Z:39,39 XU:Z:3,1 IH:i:2 HI
                  :i:2 MD:Z:40 XS:A:-
                  922_1240_1515 16 chrX 119563391 255 10H10M1029N30M * 0 0 TGATCATGATCATTTGTCTGCAATGGTTTTGCCAGCATCT "C?H?'';?&&A?"""IIIIIIIIIIIIIIIIII
                  IIIIII CS:Z:T32231321031000101301312213103133211123213222112001 XJ:Z:K CQ:Z::>>:>:?<>;==::<9=;>>9><:&4,6&.2*',45+9()50)'&*&2 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
                  :A:-
                  1297_662_654 0 chrX 153279920 255 10H10M102N26M4H * 0 0 CTTCGGTGTGCCACTGAAGATCCTGGTGTCGCCATG 1IIEIIIIC?III&&III&&4?I:;BDI=+.;I=3% CS
                  :Z:T20331231203202301111301111202132021011123301313032 XJ:Z:K CQ:Z:@@96564=5/919428;7>&:78=&:585&+*66%7,98&&)38&.%8,+ XL:Z:30,35 XU:Z:2,2 IH:i:2 HI:i:2 MD:Z:36 XS
                  :A:+
                  1289_854_1683 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCGCCATTCCTGCAGCTCAGGGGAAGGGATCAAT '<A;5<IB9;@IDH((IIHIIIIGGIIIIIIIII
                  IIIIII CS:Z:T33012320020200021223213122203103322110313223332222 XJ:Z:K CQ:Z:AA;A>;9>>?;?6:3:.:;4872:7(=,98)3'<7&0,6')1'5/1.)4/ XL:Z:39 XU:Z:1 IH:i:1 HI:i:1 MD:Z:40 XS
                  :A:-
                  1409_132_757 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCTACATTCCTGCAGCTCAGGGGAAGGGATCAAT "9:<?;G%###"IF##GFIIIAGIECIIIIIIII
                  IIIIII CS:Z:T33012320020200021223213121203213222110311113332022 XJ:Z:K CQ:Z:?><@;9<>?8:>5<31553/<7526#619&#/%71+5(3'$&%:4&&-44 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:8TA30
                  XS:A:-
                  1125_1188_1449 16 chrX 53458535 255 10H10M110N30M * 0 0 GAAGAACCTCCTACAATGACACGGGCAAAGGTACGGTCCT &-<I<?##E@)/<>"""/:?IIIIIIIIIIIIII
                  IIIIII CS:Z:T32021031310200130031112113113102231022021112101031 XJ:Z:K CQ:Z:;=:<=?A:?>@<=8=5:==<.2)'/*7(5/)8.#:&75(&*6#)9$8$#8 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
                  :A:-
                  .
                  .
                  .[/QUOTE]



                  • #24
                    Hi there,

                    That is actually not the score but the mapping quality (I'm assuming the 255 "score" you mention is column 5). To calculate the score, you have to take the alignment length in colorspace (XL:Z) and the number of colorspace mismatches (XU:Z), then use SOLiD's formula.

                    A score of 10 shouldn't appear in the output. A 25 bp seed mapping with at most 2 mismatches gives the lowest possible score for an alignment to be reported, which in my case is 18.



                    • #25
                      Originally posted by Haneko View Post
                      Hi there,

                      A score of 10 shouldn't be appearing in the output. The seed of 25bp mapping with at most 2 mismatches will give u the lowest possible score for an alignment to be reported, which in my case is 18.
                      I would think that if a read hits multiple locations in the genome, its mapping quality should also be low, even with few or no mismatches. Is that right?
                      Xi Wang
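
To see which reads in a file were reported at more than one location, the IH:i tag present in the records above can be read directly. This sketch assumes IH holds the number of alignments stored for the read, which is how the records posted in this thread appear to use it (IH:i:2 alongside XL:Z:39,39). The function name `hit_count` is hypothetical.

```python
def hit_count(sam_line, tag="IH"):
    """Return the integer value of an optional TAG:i:VALUE field, or None if absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return None

def is_multi_hit(sam_line):
    """True if the record reports more than one alignment for the read."""
    count = hit_count(sam_line)
    return count is not None and count > 1
```

This only reports what the aligner wrote; it says nothing about how a mapping quality would be derived from the hit count.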



                      • #26
                        I'm not sure about that. We've never really looked into mapping quality. The score of 10 I was referring to was actually the alignment score.



                        • #27
                          Originally posted by Haneko View Post
                          I'm not sure about that. We've never really looked into mapping quality. The score of 10 i was referring to was the alignment score actually.
                          According to the SAM manual, column 5 is the mapping quality. I simply refer to this column as mapping quality, and I think the two terms, "alignment score" and "mapping quality", are the same.
                          Xi Wang



                          • #28
                            Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.



                            • #29
                              Originally posted by Haneko View Post
                              Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.
                              OK, some things got mixed up here. The alignment score you mentioned is not the 5th column of the SAM file, right?
                              But I do want to point out that 255 means the mapping quality is not available (see the SAM manual: http://samtools.sourceforge.net/SAM1.pdf).
                              Xi Wang
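
In practice this means a filter on column 5 should treat 255 as "unknown" rather than as a very high quality. A minimal sketch, with hypothetical function names and a caller-chosen threshold:

```python
MAPQ_UNAVAILABLE = 255  # per the SAM manual: mapping quality not available

def mapping_quality(sam_line):
    """Return column 5 (MAPQ) as an int, or None when it is marked unavailable."""
    mapq = int(sam_line.split("\t")[4])
    return None if mapq == MAPQ_UNAVAILABLE else mapq

def passes_mapq(sam_line, threshold, keep_unavailable=True):
    """Decide whether to keep a record; 255 is handled separately from real values."""
    mapq = mapping_quality(sam_line)
    if mapq is None:
        # No information either way, so the caller decides what to do.
        return keep_unavailable
    return mapq >= threshold
```

With the spliced records posted above (all 255), `passes_mapq` keeps or drops them purely according to `keep_unavailable`, regardless of the threshold.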



                              • #30
                                Sorry, I made it a bit confusing.

                                Yes, the alignment score I'm referring to is self-calculated, not the mapping quality in column 5.
