SEQanswers

07-20-2015, 08:44 AM   #1
runxuan
Junior Member
 
Location: United Kingdom

Join Date: Sep 2011
Posts: 3
Using STAR+Cufflinks for transcript assembly turns unstranded RNA-seq to stranded?

I am trying to use STAR+Cufflinks to do a reference-based transcript assembly using unstranded RNA-seq data.

As mentioned in the STAR manual: "If you have un-stranded RNA-seq data, and wish to run Cufflinks/Cuffdiff on STAR alignments, you will need to run STAR with --outSAMstrandField intronMotif option, which will generate the XS strand attribute for all alignments that contain splice junctions"

Thus, in the generated SAM file, the strand will be derived from the intron motif. Unstranded RNA-seq data will be assigned a strand, which results in a lot of genes having both sense and antisense transcripts in the merged transcript assembly.
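For reference, the STAR invocation described in the manual excerpt could be assembled roughly like this. It is only a minimal sketch: the index directory, FASTQ names, output prefix and thread count are hypothetical placeholders, and the helper `star_command` is made up for illustration. Only the STAR options themselves come from the STAR manual.

```python
import shlex


def star_command(genome_dir, fastq_files, out_prefix, threads=4):
    """Build a STAR alignment command for unstranded reads destined for Cufflinks.

    All paths passed in are placeholders supplied by the caller.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        "--outFileNamePrefix", out_prefix,
        # Adds the XS strand attribute to spliced alignments,
        # derived from the intron motif.
        "--outSAMstrandField", "intronMotif",
    ]


# Hypothetical inputs, for illustration only.
cmd = star_command("star_index", ["reads_1.fastq", "reads_2.fastq"], "sample1_")
print(shlex.join(cmd))
# To actually run the alignment: subprocess.run(cmd, check=True)
```

The command is built as a list so it can be handed to `subprocess.run` without shell quoting problems.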

My questions are:

1) How reliable is the strand information derived from the intron motif?
2) Are the assembled transcripts affected by this?

Thank you very much!

Runxuan
07-21-2015, 07:17 AM   #2
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52

hi,
Your unstranded data doesn't get 'converted to stranded'. Unstranded data contains reads from both strands, because PCR amplification during library prep amplifies both strands of the DNA.

The strand that STAR derives is based on the alignment of each individual read and, for the reason above, does not necessarily reflect the true strand.

Regarding whether the assembly would be affected or not: Cufflinks won't run without the XS attribute in the SAM/BAM file.
07-21-2015, 07:55 AM   #3
runxuan
Junior Member
 
Location: United Kingdom

Join Date: Sep 2011
Posts: 3

Thanks a lot, amitm. If the strand attribute that STAR feeds into Cufflinks is not really the strand information, will that affect how Cufflinks uses it to assemble the transcripts? How should I deal with the sense and antisense assembled transcripts to reduce false positives?
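As a first diagnostic, the loci in question can at least be listed. Below is a minimal sketch that reads GTF lines, builds a span per transcript from its exon records, and reports pairs of transcripts that overlap on opposite strands. The function names and the example records are hypothetical, and the all-against-all comparison per chromosome is only suitable for small inputs. It flags candidates; it does not decide which strand is correct.

```python
import re
from collections import defaultdict
from itertools import combinations


def transcript_spans(gtf_lines):
    """Map transcript_id -> (chrom, start, end, strand) using exon records."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        if tid in spans:
            # Extend the span to cover every exon of this transcript.
            _, old_start, old_end, _ = spans[tid]
            start, end = min(start, old_start), max(end, old_end)
        spans[tid] = (chrom, start, end, strand)
    return spans


def antisense_pairs(spans):
    """Return sorted pairs of transcript IDs overlapping on opposite strands."""
    by_chrom = defaultdict(list)
    for tid, (chrom, start, end, strand) in spans.items():
        by_chrom[chrom].append((tid, start, end, strand))
    pairs = []
    for records in by_chrom.values():
        for a, b in combinations(records, 2):
            opposite = {a[3], b[3]} == {"+", "-"}
            # GTF coordinates are 1-based and inclusive.
            overlap = a[1] <= b[2] and b[1] <= a[2]
            if opposite and overlap:
                pairs.append(tuple(sorted((a[0], b[0]))))
    return sorted(pairs)


# Made-up records for illustration only.
example = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t350\t500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\tCufflinks\texon\t900\t950\t.\t.\t.\tgene_id "G3"; transcript_id "T3";',
]
print(antisense_pairs(transcript_spans(example)))  # [('T1', 'T2')]
```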
07-21-2015, 08:44 AM   #4
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52

hi,
If you are worried about a scenario where a gene locus has no or minimal sense transcription but very high antisense transcription, and Cufflinks is not able to tell the two apart, then you might need to prepare a stranded library before sequencing.

If not, then there is very little you can do at the data analysis step:
1) Do you know the sequence of these antisense transcripts? Do they maintain the exon-intron boundaries (introns spliced out), just on the complementary strand? Or do they read through introns? If they read through introns, you can set an arbitrary threshold (depending on your read length), along these lines:
If a read extends beyond the exon boundary into the intron sequence for at least 'n' bases, it might come from an unspliced or antisense transcript, so discard the read. Then use only the filtered reads for transcript assembly.

Doing so genome-wide would be very tricky, as there might be genuine transcripts with alternative exon starts and ends.
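The threshold rule above could be sketched as follows. This is an illustration under stated assumptions, not a tested pipeline step: reads are represented as lists of aligned blocks, exons as non-overlapping intervals from a single transcript model, all coordinates are 0-based and end-exclusive, and the names `bases_outside_exons` and `keep_read` are made up here. The overhang is counted as the total number of aligned bases outside exons, which is one possible reading of "extends into the intron for at least n bases".

```python
def bases_outside_exons(read_blocks, exons):
    """Count aligned read bases that fall outside every exon.

    read_blocks, exons: lists of (start, end), 0-based, end-exclusive.
    Assumes the exons do not overlap each other.
    """
    outside = 0
    for b_start, b_end in read_blocks:
        covered = 0
        for e_start, e_end in exons:
            covered += max(0, min(b_end, e_end) - max(b_start, e_start))
        outside += (b_end - b_start) - covered
    return outside


def keep_read(read_blocks, exons, n):
    """Keep a read unless at least n of its aligned bases lie outside exons."""
    return bases_outside_exons(read_blocks, exons) < n


# Made-up coordinates for illustration only.
exons = [(100, 200), (300, 400)]
spliced_read = [(180, 200), (300, 330)]   # respects the exon-intron boundary
read_through = [(180, 230)]               # runs 30 bases into the intron
small_overhang = [(195, 205)]             # only 5 bases past the exon end

print(keep_read(spliced_read, exons, n=10))    # True
print(keep_read(read_through, exons, n=10))    # False
print(keep_read(small_overhang, exons, n=10))  # True
```

With real data the aligned blocks would come from the CIGAR string of each alignment, and the choice of n would need tuning against the read length.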

I don't know which organism you are working with, but if it is a widely studied one, there should be datasets and PCR validations available to cross-check your results against.

Last edited by amitm; 07-21-2015 at 08:45 AM. Reason: Corrected typo
07-21-2015, 10:57 AM   #5
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Since it wasn't mentioned yet, I'll add that Cufflinks determines the strand of the assembled isoforms from the value of the XS attribute in the alignments (generated by STAR when --outSAMstrandField intronMotif is set at runtime). The XS attribute is only populated with strand information for spliced reads. The 4-bp motif at the splice site (two bases at each end of the intron) tells STAR what the strand is, provided the motif is a known one. If the motif is unknown, there is no strand information. In mammalian genomes, 90+% of splices will have those known motifs. The only other way Cufflinks can determine strand is if you provide a reference GTF for assembly, in which case it will use the strand information from that GTF when matching assembled isoforms from the data.
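To make the motif-to-strand step concrete, here is a minimal sketch of the lookup. The motif table follows the six intron motif codes listed for SJ.out.tab in the STAR manual; that STAR's internal strand assignment behaves exactly like this lookup is an assumption, and the function name and example sequences are made up for illustration.

```python
# Dinucleotides at the two ends of an intron, read on the genome's + strand,
# and the transcript strand each pair implies.
MOTIF_STRAND = {
    ("GT", "AG"): "+",
    ("CT", "AC"): "-",
    ("GC", "AG"): "+",
    ("CT", "GC"): "-",
    ("AT", "AC"): "+",
    ("GT", "AT"): "-",
}


def strand_from_intron(genome_seq, intron_start, intron_end):
    """Return '+', '-' or None for an intron on the + strand sequence.

    Coordinates are 0-based and end-exclusive. None means the motif is
    not in the table, so no strand can be inferred.
    """
    left = genome_seq[intron_start:intron_start + 2].upper()
    right = genome_seq[intron_end - 2:intron_end].upper()
    return MOTIF_STRAND.get((left, right))


# Toy sequences: 4 exon bases, an 8-base "intron", 4 exon bases.
print(strand_from_intron("AAAAGTCCCCAGTTTT", 4, 12))  # +
print(strand_from_intron("AAAACTGGGGACTTTT", 4, 12))  # -
print(strand_from_intron("AAAATTGGGGTTTTTT", 4, 12))  # None
```

The second example is the reverse complement view of a GT/AG intron, which is why CT/AC maps to the minus strand.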
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
07-22-2015, 03:24 AM   #6
runxuan
Junior Member
 
Location: United Kingdom

Join Date: Sep 2011
Posts: 3

Quote:
Originally Posted by sdriscoll
The only other way cufflinks can determine strand is if you provide a reference GTF for assembly in which case it will use the strand information from that for matching assembled isoforms from the data.

But this is not necessarily the correct strand information if I use unstranded RNA-seq data, is it?

Tags
cufflink, star, transcript assembly
