SEQanswers

03-05-2012, 11:06 AM   #1
Annibal
How to obtain full length RNA transcript sequence

Hi everyone,
I'm new to this kind of task, so please be patient!
I'm trying to create a BLAST database from the ENCODE RNA-seq data.
I've downloaded both the FASTQ reads and the .bam/.bai files.
I need the FASTA sequences of all the full-length transcripts: is it possible to extract them from the BAM file?
Alternatively, should I try a de novo assembly using Trinity?
Thanks a lot.
Regards,

Davide

03-09-2012, 12:01 AM   #2
Annibal

I thought this task would be easy, or at least possible, since I have the reads aligned to the reference genome (Homo sapiens).
Can anyone help?
Thanks

03-09-2012, 01:49 AM   #3
Simon Anders

Which BAM files are you talking about? ENCODE has many.

Why do you want to make your BLAST database from RNA-Seq reads rather than simply from, say, the cDNA FASTA file from Ensembl?

03-09-2012, 02:55 AM   #4
Annibal

Quote:
Originally Posted by Simon Anders View Post
Which BAM files are you talking about? ENCODE has many.

Why do you want to make your BLAST database from RNA-Seq reads rather than simply from, say, the cDNA FASTA file from Ensembl?

I'm talking about the BAM file of the human total RNA extract from the CSHL Long RNA-seq data.

I don't use the Ensembl data because, I guess, the Ensembl cDNA FASTA file does not contain all transcripts, only "known, novel and pseudogenes", as stated on their website.

Moreover, I will probably repeat this task using RNA-seq data from cells under particular conditions.

Thanks a lot.

Davide

03-09-2012, 03:03 AM   #5
Simon Anders

What you want to do is called reference-based (as opposed to de novo) transcript assembly. A tool commonly used for this purpose is Cufflinks:

Roberts, Pimentel, Trapnell, and Pachter:
Identification of novel transcripts in annotated genomes using RNA-Seq
Bioinformatics (2011) 27 (17): 2325-2329.
doi:10.1093/bioinformatics/btr355

However, before doing this yourself, you may want to check whether the ENCODE people have not already done this analysis. It seems obvious that they would do this.

I still wonder what you would need a database of all transcripts for. Instead of blasting against it, you can always blast against the genome.

03-09-2012, 03:14 AM   #6
Annibal

Quote:
Originally Posted by Simon Anders View Post
What you want to do is called reference-based (as opposed to de novo) transcript assembly. A tool commonly used for this purpose is Cufflinks:

Roberts, Pimentel, Trapnell, and Pachter:
Identification of novel transcripts in annotated genomes using RNA-Seq
Bioinformatics (2011) 27 (17): 2325-2329.
doi:10.1093/bioinformatics/btr355

However, before doing this yourself, you may want to check whether the ENCODE people have not already done this analysis. It seems obvious that they would do this.

I still wonder what you would need a database of all transcripts for. Instead of blasting against it, you can always blast against the genome.

Thank you.
I've taken a look at Cufflinks (just the manual), but I did not find a transcript FASTA file listed among its outputs. Instead, Cufflinks produces a GTF file, which does not contain the transcript sequences. I'll take a closer look at the program.
Just yesterday I also read this interesting article:
"Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks", Nature Protocols

If I BLAST against the genome, I lose information that is in the RNA sequence but not in the genome (e.g. sequences from transposable elements that are not integrated in the genome...)

Thanks again.

03-09-2012, 03:20 AM   #7
Simon Anders

You use the GTF file to produce the cDNA FASTA file from the reference FASTA file. This is a simple exercise in script programming.
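
A minimal sketch of such a script, in Python with only the standard library. It assumes the GTF has nine tab-separated columns, that every exon record carries a transcript_id attribute, and that the chromosome names in the GTF match the FASTA headers. The function names (read_fasta, read_exons, transcript_sequences) and the toy data are made up for illustration; none of this comes from Cufflinks itself.

```python
# Toy complement table; ambiguity codes other than N are left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def read_exons(gtf_lines):
    """Collect exon records as {transcript_id: (chrom, strand, [(start, end), ...])}."""
    transcripts = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t", 8)
        chrom, feature, start, end, strand, attrs = (
            fields[0], fields[2], fields[3], fields[4], fields[6], fields[8])
        if feature != "exon":
            continue
        # Attributes look like: gene_id "G1"; transcript_id "T1";
        tid = attrs.split('transcript_id "')[1].split('"')[0]
        entry = transcripts.setdefault(tid, (chrom, strand, []))
        entry[2].append((int(start), int(end)))
    return transcripts


def transcript_sequences(genome, transcripts):
    """Yield (transcript_id, spliced sequence) pairs."""
    for tid, (chrom, strand, exons) in transcripts.items():
        # GTF coordinates are 1-based and inclusive, hence start - 1.
        seq = "".join(genome[chrom][start - 1:end] for start, end in sorted(exons))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        # A strand of "." (unknown) is left in forward orientation here.
        yield tid, seq


# Toy input standing in for the reference FASTA and the assembler's GTF.
fasta = [">chr1 toy chromosome", "AAACCCGGG", "TTTACGT"]
gtf = [
    'chr1\ttoy\ttranscript\t1\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\ttoy\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\ttoy\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\ttoy\texon\t13\t16\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
    'chr1\ttoy\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]

genome = read_fasta(fasta)
transcripts = read_exons(gtf)
result = dict(transcript_sequences(genome, transcripts))
for tid, seq in result.items():
    print(f">{tid}\n{seq}")
```

With real data, open file handles can be passed in place of the lists, for example read_fasta(open("genome.fa")) and read_exons(open("transcripts.gtf")), where both file names are placeholders. Note that this sketch holds the whole reference in memory, which for a human genome means a few gigabytes.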

03-09-2012, 05:48 AM   #8
Annibal

I've taken a look at the GTF specs.
Yes, it is.
Thanks

03-14-2012, 12:51 AM   #9
oxydeepu

Hi, I saw the thread.
Can I get the logic for the program that creates transcripts from the genome file?
How does it differ based on orientation? I mean reads which have positive and negative orientation.
Thank you.
Deepak
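
On the orientation question, a small sketch of the usual logic, under the assumption that the strand comes from column 7 of the GTF record for the assembled transcript rather than from individual reads. GTF coordinates always refer to the forward strand of the reference, so the exons are joined in ascending coordinate order for both strands, and a minus-strand transcript is then the reverse complement of that joined sequence. The helper names (reverse_complement, spliced_sequence) and the toy chromosome are invented for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def spliced_sequence(chrom_seq, exons, strand):
    """Join exons given as 1-based inclusive (start, end) pairs on the forward strand."""
    joined = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    # Only the final orientation depends on the strand.
    return reverse_complement(joined) if strand == "-" else joined


chrom = "AAACCCGGGTTTACGT"
exons = [(13, 16), (4, 6)]  # order in the file does not matter; they are sorted

plus = spliced_sequence(chrom, exons, "+")
minus = spliced_sequence(chrom, exons, "-")
print(plus)   # CCCACGT
print(minus)  # ACGTGGG
```

The same two exons give different transcript sequences depending only on the strand column, which is why a record whose strand is unknown (".") cannot be oriented without extra information.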

03-14-2012, 06:51 AM   #10
swaraj

Refer to my earlier post on getting FASTA from a Cufflinks GTF:
http://seqanswers.com/forums/showthread.php?t=18369