
**kmcarr** · 05-17-2013, 09:16 AM

What you are describing is exactly what the Cuffmerge program from the Cufflinks suite does. It takes the .gtf files from a set of cufflinks runs and merges them together into a single, non-redundant set of transfrags. You may optionally provide a GTF file containing a set of already known transcripts, and the output mapping file will classify the transfrags as known, novel, etc. See the class code documentation in the manual.

Now if I may offer a perspective as one who has done a lot of RNA-Seq analysis in Arabidopsis, the A. thaliana genome has been analyzed and annotated to death. The odds of finding a new gene are very, very slim and probably not worth spending time on unless you have time to waste. I can guarantee you that you will find groups of reads mapping to intergenic space, but I will also bet large amounts of money that these will not be new genes. They most likely arise from mis-mapping or spurious transcriptional events.
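For readers who do want the merge step described above, here is a minimal sketch of a Cuffmerge call. The file names (`genes.gtf`, `genome.fa`, `assemblies.txt`, the `*.cufflinks_out` directories) are hypothetical placeholders, not paths from this thread. The script only writes the manifest and prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
REF_GTF=genes.gtf            # known annotation (optional for cuffmerge)
GENOME_FA=genome.fa          # reference sequence (optional for cuffmerge)
ASSEMBLY_LIST=assemblies.txt # manifest of Cufflinks assemblies

# cuffmerge takes a plain-text manifest with one transcripts.gtf path per line.
printf '%s\n' \
    ./C_ctrl_rep1.cufflinks_out/transcripts.gtf \
    ./C_ABA_rep1.cufflinks_out/transcripts.gtf \
    > "$ASSEMBLY_LIST"

# Print the command for review; remove "echo" to actually run it.
echo cuffmerge -g "$REF_GTF" -s "$GENOME_FA" -p 4 -o merged_asm "$ASSEMBLY_LIST"
```

With these options the merged annotation would be written under the `merged_asm` directory given to `-o`.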

**colaneri** · 05-20-2013, 10:33 AM

No gene/transcript discovery

Assuming that, as you said, the Arabidopsis genome is very well annotated, do I need to run cufflinks?
Is it better in any way to perform the analysis by just combining tophat, the reference transcriptome and cuffdiff?
Do you suggest running TopHat with "no gene/transcript discovery"?


**kmcarr** · 05-20-2013, 10:54 AM

Originally posted by colaneri

Assuming that, as you said, the Arabidopsis genome is very well annotated, do I need to run cufflinks?

No

Is it better in any way to perform the analysis by just combining tophat, the reference transcriptome and cuffdiff?

Yes. It's faster because you are skipping an unnecessary step, and the IDs used for cuffdiff analysis will be the normal TAIR AT IDs instead of cufflinks transfrag IDs (XLOCs), which you would then need to correlate to their TAIR IDs.

Do you suggest running TopHat with "no gene/transcript discovery"?

That's what I normally do.

**kmcarr** · 05-20-2013, 02:34 PM

Originally posted by colaneri

I want to use tophat in galaxy with the parameter --no-novel-juncs genome

How can I implement the parameter?

Sorry, I don't use Galaxy so can't help you there.

**dnusol** · 05-23-2013, 07:41 AM

I think this was solved elsewhere

Tophat 1.4.0 RNA seq mapping - SEQanswers

http://seqanswers.com/forums/showthread.php?t=16965


HTH

**colaneri** · 05-23-2013, 01:51 PM

Cuffdiff did not perform promoter preference tests

I have run cuffdiff 2 using this command

cuffdiff -p 4 -c 4 --no-update-check /proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf -o cuffdiff_results ./C_ctrl_rep1_Trim37.tophat_out/accepted_hits.bam ./C_ABA_rep1_Trim40.tophat_out/accepted_hits.bam

Even though the run completed successfully, I received the output below. My question is: why were no tests performed for promoter preference or splicing? Is that the default behavior?

Performed 27529 isoform-level transcription difference tests
Performed 25428 tss-level transcription difference tests
Performed 21976 gene-level transcription difference tests
Performed 24788 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
Writing isoform-level count tracking
Writing TSS group-level count tracking
Writing gene-level count tracking
Writing CDS-level count tracking
Writing isoform-level read group tracking
Writing TSS group-level read group tracking
Writing gene-level read group tracking
Writing CDS-level read group tracking
Writing read group info
Writing run info

**colaneri** · 05-29-2013, 10:24 AM

concatenating files before tophat?

Hi
I have an RNA-seq library that has been sequenced multiple times, so I have four fastq files.
Do I need to concatenate them before alignment in tophat?
Can I just list the four files at the end of the tophat command, like this?

If my files are fastq1, fastq2, fastq3 and fastq4,

and I do:

tophat -p 4 --segment-length 20 --no-novel-juncs -G /proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf -o C_ctrl_rep1_THout_6 /proj/seq/data/TAIR10_Ensembl/Sequence/Bowtie2Index/genome fastq1 fastq2 fastq3 fastq4
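A hedged note on the command above: as far as I know, the TopHat usage line takes the reads for one sample as a single comma-separated list (with a second list reserved for the mates of paired-end data), so four space-separated names would probably not be read as four files of the same library. The sketch below joins the four file names with commas and prints the resulting command. `join_by_comma` is a hypothetical helper, and `fastq1` to `fastq4` are the placeholder names from the post.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas and no spaces.
join_by_comma() {
    # Subshell keeps the IFS change from leaking into the rest of the script.
    ( IFS=','; printf '%s\n' "$*" )
}

GTF=/proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf
INDEX=/proj/seq/data/TAIR10_Ensembl/Sequence/Bowtie2Index/genome

# All four runs of the same single-end library become one comma-separated argument.
READS=$(join_by_comma fastq1 fastq2 fastq3 fastq4)

# Print the command for review; remove "echo" to actually run it.
echo tophat -p 4 --segment-length 20 --no-novel-juncs -G "$GTF" \
    -o C_ctrl_rep1_THout_6 "$INDEX" "$READS"
```

For uncompressed single-end files, concatenating first (`cat fastq1 fastq2 fastq3 fastq4 > all.fastq`) should give an equivalent input.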

**colaneri** · 05-29-2013, 10:36 AM

Selecting the appropriate range to trim

Most of my sequenced RNA-seq libraries look like this in the fastqc report (please see the images below).
Do I need to trim the first 10 bases?
Only the first 10?
Is it going to improve the results?

[FastQC report images not preserved]

**colaneri** · 06-10-2013, 01:07 PM

Defining replicates and different conditions in Cuffdiff2

Hi people,
I'm trying to get cuffdiff 2 to compare RNA-seq data from 2 different genotypes in two different conditions, with 3 biological replicates for each genotype in each condition.
So I have 12 different libraries, which I aligned separately with tophat.
My problem is running cuffdiff from the command line: I cannot get it to work the way I would like, and I do not know what I'm doing wrong. PLEASE, SOME HELP HERE GUYS!!!

I ran cuffdiff with this command:
cuffdiff -p 8 -c 20 --no-update-check /proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf -o cuffdiff_ABA_whole_set_results_week \
./C_ctrl_rep1_abaexp.tophat_out/accepted_hits.bam, ./C_ctrl_rep2_abaexp.tophat_out/accepted_hits.bam, ./C_ctrl_rep3_abaexp.tophat_out/accepted_hits.bam \
./C_ABA_rep1_abaexp.tophat_out/accepted_hits.bam, ./C_ABA_rep2_abaexp.tophat_out/accepted_hits.bam, ./C_ABA_rep3_abaexp.tophat_out/accepted_hits.bam \
./B_ctrl_rep1_abaexp.tophat_out/accepted_hits.bam, ./B_ctrl_rep2_abaexp.tophat_out/accepted_hits.bam, ./B_ctrl_rep3_abaexp.tophat_out/accepted_hits.bam \
./B_ABA_rep1_abaexp.tophat_out/accepted_hits.bam, ./B_ABA_rep2_abaexp.tophat_out/accepted_hits.bam, ./B_ABA_rep3_abaexp.tophat_out/accepted_hits.bam

BUT THE RESULT IS THAT ALL THE FILES ARE COMPARED AGAINST EACH OTHER, so all samples are treated as different conditions instead of 4 groups with triplicates.

CAN SOMEONE TELL ME WHAT IS WRONG WITH MY COMMAND LINE?

**kmcarr** · 06-10-2013, 07:47 PM

Originally posted by colaneri


I did run cuffdiff with this command

Code:

cuffdiff -p 8 -c 20 --no-update-check /proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf -o cuffdiff_ABA_whole_set_results_week \
./C_ctrl_rep1_abaexp.tophat_out/accepted_hits.bam, ./C_ctrl_rep2_abaexp.tophat_out/accepted_hits.bam, ./C_ctrl_rep3_abaexp.tophat_out/accepted_hits.bam \
./C_ABA_rep1_abaexp.tophat_out/accepted_hits.bam, ./C_ABA_rep2_abaexp.tophat_out/accepted_hits.bam, ./C_ABA_rep3_abaexp.tophat_out/accepted_hits.bam \
./B_ctrl_rep1_abaexp.tophat_out/accepted_hits.bam, ./B_ctrl_rep2_abaexp.tophat_out/accepted_hits.bam, ./B_ctrl_rep3_abaexp.tophat_out/accepted_hits.bam \
./B_ABA_rep1_abaexp.tophat_out/accepted_hits.bam, ./B_ABA_rep2_abaexp.tophat_out/accepted_hits.bam, ./B_ABA_rep3_abaexp.tophat_out/accepted_hits.bam


You have spaces after the commas in your command line. The list of BAM files for your bio reps should be separated by commas WITHOUT SPACES, then spaces between the different condition groups. Since you put a space after every BAM file name cuffdiff interpreted them as twelve conditions.

BTW if you are posting blocks of command text or code please use the CODE tag formatting as I have done with your text above. It makes reading lines of code or output much easier and thus easier to spot the errors.
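The fix described above can be sketched as follows: each condition becomes one argument made of its replicate BAM paths joined by commas with no spaces, and the four conditions are separated by spaces. `join_by_comma` and `replicates_for` are hypothetical helpers introduced for illustration. The directory naming is copied from the command in the question, and the script only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas and no spaces.
join_by_comma() {
    ( IFS=','; printf '%s\n' "$*" )
}

# Hypothetical helper: build the replicate list for one condition prefix,
# following the directory naming used in the original command.
replicates_for() {
    join_by_comma \
        "./${1}_rep1_abaexp.tophat_out/accepted_hits.bam" \
        "./${1}_rep2_abaexp.tophat_out/accepted_hits.bam" \
        "./${1}_rep3_abaexp.tophat_out/accepted_hits.bam"
}

GTF=/proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf

C_CTRL=$(replicates_for C_ctrl)
C_ABA=$(replicates_for C_ABA)
B_CTRL=$(replicates_for B_ctrl)
B_ABA=$(replicates_for B_ABA)

# Four space-separated arguments = four conditions, three replicates each.
# Print the command for review; remove "echo" to actually run it.
echo cuffdiff -p 8 -c 20 --no-update-check \
    -o cuffdiff_ABA_whole_set_results_week "$GTF" \
    "$C_CTRL" "$C_ABA" "$B_CTRL" "$B_ABA"
```

Building the lists in variables makes it harder for a stray space to sneak in after a comma.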

**colaneri** · 06-11-2013, 05:35 AM

naming samples in cuffdiff

Thank you very much, kmcarr!

By the way, when I use the -L option to name the different samples,
do I also have to separate them with commas without spaces?
As for the names, do I need to use exactly the same name as the one in the path to the BAM file? Or is it just the order of the names after the -L option that matters?

**sdriscoll** · 06-11-2013, 12:11 PM

Originally posted by colaneri

when I use the -L option to name the different samples, do I also have to separate them with commas without spaces?

Yes.

Originally posted by colaneri

As for the names, do I need to use exactly the same name as the one in the path to the BAM file? Or is it just the order of the names after the -L option that matters?

It's only the order that matters.
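To illustrate the two answers above, here is a small sketch that pairs a comma-separated `-L` label list with the condition groups by position and checks that the counts match before printing the command. The label names and the short BAM file names are hypothetical; only the order ties a label to a group.

```shell
#!/bin/sh
# Labels are free-form names; the first label goes with the first condition group, and so on.
LABELS="C_ctrl,C_ABA,B_ctrl,B_ABA"

# Hypothetical short BAM names: one positional argument per condition,
# replicates joined by commas without spaces.
set -- \
    "c_ctrl_1.bam,c_ctrl_2.bam,c_ctrl_3.bam" \
    "c_aba_1.bam,c_aba_2.bam,c_aba_3.bam" \
    "b_ctrl_1.bam,b_ctrl_2.bam,b_ctrl_3.bam" \
    "b_aba_1.bam,b_aba_2.bam,b_aba_3.bam"

# Count comma-separated labels and compare with the number of condition groups.
n_labels=$(printf '%s\n' "$LABELS" | awk -F',' '{ print NF }')
n_groups=$#

if [ "$n_labels" -ne "$n_groups" ]; then
    echo "label count ($n_labels) does not match condition count ($n_groups)" >&2
    exit 1
fi

# Print the command for review; remove "echo" to actually run it.
echo cuffdiff -L "$LABELS" genes.gtf "$@"
```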

**colaneri** · 06-13-2013, 06:49 AM

I do not understand this tophat error

I have a fastq file that I aligned with tophat v1.3 (on a Galaxy server) without any problem, but when I align the same fastq file with tophat 2 on the command line I get this error.

Can you please explain why this happens and what it means?

This is the output (the error is at the bottom):

[2013-06-13 00:43:09] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-06-13 00:43:09] Checking for Samtools
Samtools version: 0.1.19.0
[2013-06-13 00:43:09] Checking for Bowtie index files
[2013-06-13 00:43:09] Checking for reference FASTA file
[2013-06-13 00:43:09] Generating SAM header for /proj/seq/data/TAIR10_Ensembl/Sequence/Bowtie2Index/genome
format: fastq
quality scale: phred33 (default)
[2013-06-13 00:43:12] Reading known junctions from GTF file
[2013-06-13 00:43:16] Preparing reads
[FAILED]
Error running 'prep_reads'
Error: qual length (19) differs from seq length (41) for fastq record !

**sdriscoll** · 06-13-2013, 10:41 PM

What it means to me is that the qualities string and the read string for at least one of the reads in your fastq file are not the same length. It doesn't explain why it worked on galaxy though.
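One way to locate such records is sketched below. It assumes a plain, uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines). `check_fastq` is a hypothetical helper name, and the two-record demo file is invented purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report FASTQ records whose quality string length
# differs from the sequence length. Assumes strict 4-line records.
check_fastq() {
    awk '
        NR % 4 == 1 { header = $0 }            # line 1 of a record: @name
        NR % 4 == 2 { seq_len = length($0) }   # line 2: sequence
        NR % 4 == 0 {                          # line 4: qualities
            if (length($0) != seq_len) {
                printf "record %d (%s): seq length %d, qual length %d\n",
                       NR / 4, header, seq_len, length($0)
                bad++
            }
        }
        END {
            if (NR % 4 != 0) { print "file ends in the middle of a record"; bad++ }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}

# Demo on an invented file: read1 is fine, read2 has a short quality string.
tmp=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'ACGTAC' '+' 'III' > "$tmp"

status=0
report=$(check_fastq "$tmp") || status=$?
printf '%s\n' "$report"
rm -f "$tmp"
```

The check goes by line position rather than by a leading `@`, because a quality string can itself begin with `@`.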


Differential gene expression analysis
