**Haneko** · 03-18-2010, 05:17 PM

I'm getting the following using your code:

1206_912_423 16 chrX 148852770 255 10H10M101N30M * 0 0 CTCCCGTAGCCTTGATGGTCTGCTGCTTCCGTCTGTCACT ,GA%%:IIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIII CS:Z:T32112112213020231231221013210203231320221310310031 XJ:Z:K CQ:Z:<<::9@9=:?==;:=>>>=:>9>695;;773:885&%*80,/&7&())6( XL:Z:39,39 XU:Z:3,1 IH:i:2 HI
:i:2 MD:Z:40 XS:A:-
922_1240_1515 16 chrX 119563391 255 10H10M1029N30M * 0 0 TGATCATGATCATTTGTCTGCAATGGTTTTGCCAGCATCT "C?H?'';?&&A?"""IIIIIIIIIIIIIIIIII
IIIIII CS:Z:T32231321031000101301312213103133211123213222112001 XJ:Z:K CQ:Z::>>:>:?<>;==::<9=;>>9><:

&4,6&.2*',45+9()50)'&*&2 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
:A:-
1297_662_654 0 chrX 153279920 255 10H10M102N26M4H * 0 0 CTTCGGTGTGCCACTGAAGATCCTGGTGTCGCCATG 1IIEIIIIC?III&&III&&4?I:;BDI=+.;I=3% CS
:Z:T20331231203202301111301111202132021011123301313032 XJ:Z:K CQ:Z:@@96564=5/919428;7>&:78=&:585&+*66%7,98&&)38&.%8,+ XL:Z:30,35 XU:Z:2,2 IH:i:2 HI:i:2 MD:Z:36 XS
:A:+
1289_854_1683 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCGCCATTCCTGCAGCTCAGGGGAAGGGATCAAT '<A;5<IB9;@IDH((IIHIIIIGGIIIIIIIII
IIIIII CS:Z:T33012320020200021223213122203103322110313223332222 XJ:Z:K CQ:Z:AA;A>;9>>?;?6:3:.:;4872:7(=,98)3'<7&0,6')1'5/1.)4/ XL:Z:39 XU:Z:1 IH:i:1 HI:i:1 MD:Z:40 XS
:A:-
1409_132_757 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCTACATTCCTGCAGCTCAGGGGAAGGGATCAAT "9:<?;G%###"IF##GFIIIAGIECIIIIIIII
IIIIII CS:Z:T33012320020200021223213121203213222110311113332022 XJ:Z:K CQ:Z:?><@;9<>?8:>5<31553/<7526#619&#/%71+5(3'$&%:4&&-44 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:8TA30
XS:A:-
1125_1188_1449 16 chrX 53458535 255 10H10M110N30M * 0 0 GAAGAACCTCCTACAATGACACGGGCAAAGGTACGGTCCT &-<I<?##E@)/<>"""/:?IIIIIIIIIIIIII
IIIIII CS:Z:T32021031310200130031112113113102231022021112101031 XJ:Z:K CQ:Z:;=:<=?A:?>@<=8=5:==<.2)'/*7(5/)8.#:&75(&*6#)9$8$#8 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS
:A:-
.
.
.

**Xi Wang** · 03-18-2010, 09:17 PM

It seems this problem occurs because Cufflinks can't handle hard clipping (H) in the CIGAR string.
Also, since your CIGAR strings all look valid, the error message "CIGAR op has zero length" is not very informative.
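
To check whether any record really contains a zero-length CIGAR operation, or a hard clip, something like the following could help. This is only a minimal sketch: `parse_cigar`, `zero_length_ops`, and `has_hard_clip` are hypothetical helper names, not part of Cufflinks or samtools.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def zero_length_ops(cigar):
    """Return the operation letters whose run length is zero."""
    return [op for length, op in parse_cigar(cigar) if length == 0]


def has_hard_clip(cigar):
    """True if the CIGAR string contains a hard clip (H)."""
    return any(op == "H" for _, op in parse_cigar(cigar))


# CIGAR strings taken from the records posted above.
for cigar in ("10H10M101N30M", "10H10M102N26M4H"):
    print(cigar, parse_cigar(cigar), zero_length_ops(cigar), has_hard_clip(cigar))
```

If `zero_length_ops` returns an empty list for every record, the CIGAR strings themselves are probably not the cause of the error.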

**damiankao** · 03-19-2010, 02:01 AM

My sam files have H in the CIGAR string and it worked fine.

Is what you copied and pasted into your post exactly as it appears in the SAM file? I noticed some line breaks in the attribute columns that don't look like natural line breaks imposed by the forum.

**Haneko** · 03-22-2010, 07:10 PM

No, actually. I copied it directly from my SSH window, so those line breaks aren't in the file itself. Sorry for the confusion.

**xguo** · 04-26-2010, 08:04 AM

Originally posted by damiankao:

I am using BioScope mapping output .bam files as input to Cufflinks. You first have to convert them to .sam files, clean them up, and add the strand information by parsing the bitwise flag.

I was able to run this cleaned-up version of the .sam file through Cufflinks with pretty good results. The only problem I am having is that most of the output is not showing any strand information.

I think Cufflinks is only using strand information for spliced reads and ignoring the strand of unspliced reads? So all the genes assembled with spliced reads have strand information, but the others don't?

BioScope output includes two separate records in the SAM file if a read aligns to a splice junction. I suppose what Cufflinks expects for spliced reads is one SAM record per junction read, with a CIGAR string of the form ##M###N##M. Thus, reads mapped to a splice junction in BioScope output are not treated as spliced reads by Cufflinks, and their strand information is not used at all. As the Cufflinks author Cole stated, "You should do your best to feed Cufflinks spliced alignments that are stranded with the XS". It may be necessary to merge the two junction read records in BioScope output into one with the CIGAR string "##M###N##M".

Any thought on this issue?
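
For reference, the "add the strand information by parsing the bitwise flag" step from the quoted post could look roughly like this. It is a sketch under assumptions: `add_strand_tag` is a hypothetical helper, and it assumes the read's alignment strand (flag bit 0x10) is the strand that should go into XS, which depends on the library protocol.

```python
def add_strand_tag(sam_line):
    """Append an XS:A strand tag derived from the SAM bitwise flag.

    Assumption: the alignment strand equals the transcript strand.
    Header lines and records that already carry XS:A are returned unchanged.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header line, nothing to add
    fields = line.split("\t")
    if any(field.startswith("XS:A:") for field in fields[11:]):
        return line  # strand tag already present
    flag = int(fields[1])
    strand = "-" if flag & 0x10 else "+"  # 0x10 = read is reverse-complemented
    fields.append(f"XS:A:{strand}")
    return "\t".join(fields)


# Made-up record with only the 11 mandatory SAM columns.
record = "\t".join(
    ["read1", "16", "chrX", "100", "255", "10M100N30M", "*", "0", "0", "*", "*"]
)
print(add_strand_tag(record))
```

Unspliced reads get an XS tag this way too, which may or may not be what you want for an unstranded library.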

**Haneko** · 04-26-2010, 05:13 PM

Hi,

I don't think BioScope outputs 2 entries. It should output only 1 entry for each alignment (continuous or spliced), unless there is more than one alignment for that read. It shouldn't be necessary to merge any 2 lines.

Did you find any such cases in your data?

**xguo** · 04-26-2010, 05:35 PM

My mistake. BioScope does output a spliced alignment as one record. It is the pre-BioScope WT pipeline that generates two records in the GFF file for reads mapped to a splice junction.

Thanks for the reply.

**clariet** · 04-27-2010, 10:32 AM

In the SAM records Haneko posted above (03-18-2010), the score of every alignment is the same: 255. In my BioScope output, however, most alignments have a score below 10. Should I filter those alignments out?

**Haneko** · 04-27-2010, 05:18 PM

Hi there,

That is actually not the score but the mapping quality (I'm assuming the "score" of 255 you mean is column 5). To calculate the score, you have to take the alignment length in colorspace (XL:Z) and the number of colorspace mismatches (XU:Z), then apply SOLiD's formula.

A score of 10 shouldn't appear in the output. A 25 bp seed mapping with at most 2 mismatches gives you the lowest possible score for an alignment to be reported, which in my case is 18.
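
To pull out the two inputs mentioned above, a small tag parser may be enough. This is a rough sketch: `optional_tags` and `colorspace_inputs` are hypothetical names, the comma-separated XL/XU values are read as integer lists based on the records posted earlier in this thread, and SOLiD's scoring formula itself is deliberately not reproduced here.

```python
def optional_tags(sam_line):
    """Return the optional SAM fields (column 12 onward) as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split only twice: values such as CQ:Z:... may contain ':' themselves.
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def colorspace_inputs(sam_line):
    """Return (alignment lengths, mismatch counts) from the XL:Z and XU:Z tags."""
    tags = optional_tags(sam_line)
    lengths = [int(x) for x in tags["XL"].split(",")]
    mismatches = [int(x) for x in tags["XU"].split(",")]
    return lengths, mismatches


# Shortened record modeled on the first one posted in this thread.
record = "\t".join(
    ["1206_912_423", "16", "chrX", "148852770", "255", "10H10M101N30M",
     "*", "0", "0", "*", "*", "XL:Z:39,39", "XU:Z:3,1", "IH:i:2", "XS:A:-"]
)
print(colorspace_inputs(record))
```

The resulting lists can then be fed into whatever score calculation your SOLiD documentation specifies.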

**Xi Wang** · 04-27-2010, 09:53 PM

Originally posted by Haneko:

Hi there,

A score of 10 shouldn't appear in the output. A 25 bp seed mapping with at most 2 mismatches gives you the lowest possible score for an alignment to be reported, which in my case is 18.

I would expect that if a read hits multiple locations in the genome, the mapping quality should also be low, even with few or no mismatches. Is that right?

**Haneko** · 04-27-2010, 09:59 PM

I'm not sure about that. We've never really looked into mapping quality. The score of 10 I was referring to was actually the alignment score.

**Xi Wang** · 04-27-2010, 10:03 PM

Originally posted by Haneko:

I'm not sure about that. We've never really looked into mapping quality. The score of 10 I was referring to was actually the alignment score.

According to the SAM manual, column 5 is the mapping quality. I simply refer to this column as mapping quality, and I think the two terms, "alignment score" and "mapping quality", mean the same thing.

**Haneko** · 04-27-2010, 10:33 PM

Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.

**Xi Wang** · 04-27-2010, 10:43 PM

Originally posted by Haneko:

Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.

Ok, some things were mixed up here. The alignment score you mentioned is not the 5th column of SAM files, right?
But I really want to point out that the 255 means the mapping quality is not available. (Refer to the SAM manual: http://samtools.sourceforge.net/SAM1.pdf)
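
When filtering on column 5, it may be worth treating 255 as a missing value rather than as a high quality. A minimal sketch (`mapping_quality` is a hypothetical helper name):

```python
MAPQ_UNAVAILABLE = 255  # per the SAM manual: mapping quality not available


def mapping_quality(sam_line):
    """Return MAPQ (column 5) as an int, or None when it is 255 (unavailable)."""
    mapq = int(sam_line.split("\t")[4])
    return None if mapq == MAPQ_UNAVAILABLE else mapq


# Made-up records containing only the mandatory SAM columns.
spliced = "\t".join(["r1", "16", "chrX", "100", "255", "10M100N30M", "*", "0", "0", "*", "*"])
unspliced = "\t".join(["r2", "0", "chrX", "500", "9", "40M", "*", "0", "0", "*", "*"])
print(mapping_quality(spliced), mapping_quality(unspliced))
```

A filter such as "keep MAPQ >= 10" would then need an explicit decision about what to do with the `None` cases.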

**Haneko** · 04-27-2010, 11:02 PM

Sorry, I made it a bit confusing.

Yes, the alignment score I'm referring to is one we calculate ourselves. It is not the mapping quality in column 5.
