Two questions

**cjp** · 10-24-2011, 08:56 AM

Originally posted by camelbbs

1. I have a question about BAM files.

I have two sequencing libraries from the same sample, which gives me two FASTQ files; the read lengths are 50 bp and 36 bp respectively.
Because TopHat needs a library-specific -r value, I cannot merge the two FASTQ files before alignment. Once I have the accepted_hits.bam files, can I merge the BAM files with samtools merge?

I need to run cufflinks and cuffdiff on the merged BAM files.

2. I see that the cuffdiff usage is
cuffdiff transcripts.gtf 1.bam 2.bam

Is transcripts.gtf the output of cufflinks, or just the reference transcript annotation?

Thanks, everyone.

I guess the sequences are not paired-end, so you can't align the FASTQ files in the same TopHat command. In that case, you can always merge the two BAM files with 'samtools merge' or Picard 'MergeSamFiles':

http://picard.sourceforge.net/command-line-overview.shtml#MergeSamFiles

You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

Chris

**camelbbs** · 10-24-2011, 11:57 AM

Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between the libraries. So we first get the BAM files with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

and then merge all the BAM files that come from different libraries but belong to the same sample. Is that right? Thanks

**camelbbs** · 10-24-2011, 12:05 PM

And if we use the output from cufflinks, there will be two GTF files when we work on two samples. How do we pass these two files to cuffdiff? Thanks very much for your help.

**cjp** · 10-25-2011, 01:33 AM

Originally posted by camelbbs

Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between the libraries. So we first get the BAM files with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

and then merge all the BAM files that come from different libraries but belong to the same sample. Is that right? Thanks

Yes, you can merge BAM files from multiple sequencing runs if they come from the same sample, even if they have different read lengths.
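For anyone following along, the align-then-merge approach can be sketched roughly as below. This is a dry run that only prints the commands. The Bowtie index name (hg19), the 200 bp fragment size, and all file and directory names are assumptions for illustration and do not come from the thread. The -r value is the mate inner distance, that is, the fragment length minus the combined length of both reads, which is why each library needs its own TopHat run.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# FRAGMENT_SIZE, INDEX and every file/directory name are hypothetical.

FRAGMENT_SIZE=200      # assumed mean fragment length (without adapters)
INDEX=hg19             # assumed Bowtie index basename
GTF=hg19_ucsc.gtf

# Mate inner distance for tophat -r: fragment length minus both reads.
inner_dist() {
    echo $(( FRAGMENT_SIZE - 2 * $1 ))
}

# One TopHat run per library, each with its own -r and output directory.
echo "tophat -r $(inner_dist 50) -o lib50_out -G $GTF $INDEX lib50_1.fastq lib50_2.fastq"
echo "tophat -r $(inner_dist 36) -o lib36_out -G $GTF $INDEX lib36_1.fastq lib36_2.fastq"

# Merge the per-library alignments into one BAM file for the sample.
echo "samtools merge sample1.bam lib50_out/accepted_hits.bam lib36_out/accepted_hits.bam"
```

Remove the echo wrappers to actually run the commands once the names match your own data.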

**cjp** · 10-25-2011, 01:43 AM

Originally Posted by camelbbs

And if we use the output from cufflinks, there will be two GTF files when we work on two samples. How do we pass these two files to cuffdiff? Thanks very much for your help.

Cufflinks ships with a utility called gffread. According to gffread -h, it has these options:

-M/--merge : cluster the input transcripts into loci, collapsing matching
      transcripts (those with the same exact introns and fully contained)
--cluster-only: same as --merge but without collapsing matching transcripts
-K  for -M option: also collapse shorter, fully contained transcripts
      with fewer introns than the container
-Q  for -M option, remove the containment restriction:
      (multi-exon transcripts will be collapsed if just their introns match,
      while single-exon transcripts can partially overlap (80%))

I've never used it myself, so I'm not sure whether it does what you want. You could also convert to BED format and then use BEDTools, which has a tool called intersectBed that produces one BED file from the overlaps between two input BED files. To get a final GTF file from this BED file, I found this thread on SEQanswers:

transformatin from bed to gtf - SEQanswers

http://seqanswers.com/forums/showthread.php?t=13368

But converting between GTF and BED is not always easy, as you can lose data along the way.

Chris

**camelbbs** · 10-25-2011, 11:34 AM

Thanks a lot, Chris.
Actually, my goal is to find and compare alternative splicing events between two samples.

My workflow is like this:

First I got the two merged BAM files for the two samples from TopHat. Then I ran

cuffdiff hg19_ucsc.gtf sample1.bam sample2.bam

I got some results, but they don't contain the novel transcripts that cufflinks would assemble.

So I ran cufflinks to get the novel transcripts:

cufflinks -g hg19_ucsc.gtf sample1.bam
cufflinks -g hg19_ucsc.gtf sample2.bam

That gave me a transcripts.gtf file for each sample.

Then I merged the two files, transcript1.gtf and transcript2.gtf, with the reference annotation:

cuffmerge -o merged gtf_list

where gtf_list is a text file listing hg19_ucsc.gtf, transcript1.gtf, and transcript2.gtf.

Then I ran cuffdiff:

cuffdiff merged.gtf sample1.bam sample2.bam

Is that the right workflow for comparing novel alternatively spliced transcripts and their expression between the two samples?

But I see there is also a program called cuffcompare. If I run

cuffcompare hg19_ucsc.gtf transcript1.gtf transcript2.gtf

I can also get the alternatively spliced transcripts that differ between the samples. So does that mean

cufflinks + cuffcompare == cuffdiff ?

Thanks a lot!!!
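For reference, the assemble, merge, and compare workflow described above might be laid out roughly as follows. This is a sketch, not a tested pipeline: it writes the manifest file and prints the commands rather than running them, and the output directory names (sample1_cl, sample2_cl, merged, diff_out) are made up for illustration. cuffmerge expects a plain-text file with one assembly GTF path per line and writes merged.gtf inside the directory given with -o. It also accepts the reference annotation through -g, which is the approach sketched here, rather than listing it in the manifest.

```shell
#!/bin/sh
# Sketch only: writes the manifest, then prints the commands (dry run).
# All directory and file names are hypothetical.

REF_GTF=hg19_ucsc.gtf
MANIFEST=assemblies.txt

# Assemble each sample separately, guided by the reference annotation.
echo "cufflinks -g $REF_GTF -o sample1_cl sample1.bam"
echo "cufflinks -g $REF_GTF -o sample2_cl sample2.bam"

# cuffmerge reads a text file listing one assembly GTF per line.
printf '%s\n' \
    sample1_cl/transcripts.gtf \
    sample2_cl/transcripts.gtf > "$MANIFEST"

# Merge the assemblies; the reference is supplied with -g.
echo "cuffmerge -g $REF_GTF -o merged $MANIFEST"

# Quantify and test both samples against the merged annotation.
echo "cuffdiff -o diff_out merged/merged.gtf sample1.bam sample2.bam"
```

Because both samples are quantified against the same merged annotation, novel transcripts found in either sample are compared under a single set of transcript IDs.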

**cjp** · 10-25-2011, 02:10 PM

Sounds like you've got a better method than the one I suggested; I have never used cuffcompare or cuffmerge before.

cuffdiff always seems to be the last program to run, whether you want FPKMs (expression levels) for known or novel transcripts. It writes its results as tab-delimited text files that open easily in a spreadsheet, and it runs some useful statistical tests as well.

Chris

**tiffany081126** · 10-27-2011, 07:24 PM

I did the same a few days ago. In my project I used only merged.gtf for cuffdiff, and it went well (there are "u" entries among the class codes). My colleague, however, found no "u" class codes in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human), and then used the resulting combined.gtf for cuffdiff as well.

So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.
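One way to check which class codes a file actually contains is to tally them per transcript. The sketch below assumes that each exon line carries transcript_id and class_code attributes, as in the merged.gtf and combined.gtf files discussed here (combined.gtf being the file cuffcompare writes as <outprefix>.combined.gtf). The demo file and its transcript IDs are invented for illustration. In the Cufflinks class codes, "u" marks an unknown intergenic transcript, "=" a full match to a reference transcript, and "j" a potentially novel isoform.

```shell
#!/bin/sh
# Tally transcripts per class code in a GTF file.
# demo.gtf and its contents are made up for illustration.

count_class_codes() {
    # Extract one "transcript_id class_code" pair per exon line,
    # collapse to one pair per transcript, then count per class code.
    sed -n 's/.*transcript_id "\([^"]*\)".*class_code "\([^"]*\)".*/\1 \2/p' "$1" |
        sort -u |
        awk '{ n[$2]++ } END { for (c in n) print c, n[c] }' |
        sort
}

# Helper that writes one tab-separated exon line: id, class code, start, end.
gtf_line() {
    printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "%s"; class_code "%s";\n' \
        "$3" "$4" "$1" "$2"
}

{
    gtf_line TCONS_00000001 '=' 100 200
    gtf_line TCONS_00000001 '=' 300 400
    gtf_line TCONS_00000002 'u' 900 990
    gtf_line TCONS_00000003 'j' 100 250
} > demo.gtf

# Prints one line per class code followed by its transcript count.
count_class_codes demo.gtf
```

Running count_class_codes on both merged.gtf and combined.gtf shows directly whether "u" transcripts are present in each.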

**camelbbs** · 10-30-2011, 02:33 PM

Hi, I just want to know what you mean by combined.gtf.

**camelbbs** · 11-02-2011, 09:06 AM

Originally posted by tiffany081126

I did the same a few days ago. In my project I used only merged.gtf for cuffdiff, and it went well (there are "u" entries among the class codes). My colleague, however, found no "u" class codes in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human), and then used the resulting combined.gtf for cuffdiff as well.

So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.

I want to ask what you mean by combined.gtf.
