SEQanswers

10-10-2013, 04:47 AM   #1
chickenmcfu
Junior Member
 
Location: Hamburg

Join Date: Sep 2013
Posts: 4
Tophat2 Bowtie2 Htseq-count for bacteria

Hey, this is my first attempt at analysing an RNA-seq project, since the company we worked with is not able to give me a usefully annotated differential expression table.

I just want to know for sure whether what I am doing is at least halfway right.

My samples: 2 conditions, 2 replicates per condition, 50 bp single-end, not strand-specific. I got 23-50 million reads per library.

I want to find differentially expressed genes between the conditions.

I first wanted to use bowtie2 for the alignment, and that worked pretty well until I noticed that it writes no NH tag for htseq-count.
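
For readers hitting the same NH-tag problem: one possible workaround is to pre-filter the bowtie2 SAM output instead of switching aligners. The sketch below is only an illustration under stated assumptions. It relies on the bowtie2 manual's description of the optional fields `AS:i` (score of the reported alignment) and `XS:i` (score of the best other alignment, only present when more than one alignment was found). The function names `tag_value`, `looks_unique` and `keep_unique` are made up for this example, and the SAM records are toy data.

```python
def tag_value(fields, tag):
    """Return the integer value of an optional SAM tag such as AS:i:-5, or None."""
    prefix = tag + ":i:"
    for field in fields[11:]:  # optional fields start at column 12
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def looks_unique(sam_line):
    """Heuristic: mapped, and either no XS:i tag or AS:i strictly better than XS:i."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = read unmapped
        return False
    best = tag_value(fields, "AS")
    second = tag_value(fields, "XS")
    return second is None or (best is not None and best > second)


def keep_unique(sam_lines):
    """Yield header lines and alignment lines that pass the heuristic."""
    for line in sam_lines:
        if line.startswith("@") or looks_unique(line):
            yield line


# Toy records (hypothetical read names and reference name).
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "r1\t0\tchr\t100\t42\t50M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXN:i:0\n",
    "r2\t0\tchr\t200\t1\t50M\t*\t0\t0\tACGT\tIIII\tAS:i:-5\tXS:i:-5\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n",
]
kept = list(keep_unique(example))
print(len(kept), "lines kept")
```

A simpler alternative is htseq-count's own `-a` option, which skips reads below a minimum alignment quality; whether a MAPQ cutoff or the AS/XS comparison is the better filter for a given data set is a judgment call.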

So I switched to tophat, and there it got complicated:

With default settings tophat2 finds fewer alignments than bowtie2. Why? As I understand it, tophat2 uses bowtie2 for alignment.

Since there is no splicing in my bacterium, I don't want to find novel junctions. The final command I used after several attempts is:

tophat2 -G file.gtf --no-novel-juncs --no-coverage-search --library-type fr-unstranded index file.fastq

With every attempt (first: --no-coverage-search; second: added -G; third: added --no-novel-juncs) the number of aligned reads dropped a little. Why did it drop between these 3 modes?

With the last mode I got an alignment rate of 65-75%.
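
When comparing alignment rates between bowtie2 and tophat2 runs, it helps to count the same thing in both. A minimal sketch, assuming the alignments are available as SAM text (for example from `samtools view`) and that the total number of input reads is known from the FASTQ file: it counts distinct mapped read names, so a read that appears on more than one line is counted once. The function name `mapped_fraction` and the toy records are invented for this example, and holding all read names in a set is only practical for small files.

```python
def mapped_fraction(sam_lines, total_input_reads):
    """Fraction of input reads with at least one mapped alignment record."""
    if total_input_reads <= 0:
        raise ValueError("total_input_reads must be positive")
    mapped_names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        mapped_names.add(fields[0])
    return len(mapped_names) / total_input_reads


# Toy records: r1 has two hits, r2 is unmapped, r3 has one hit.
records = [
    "r1\t0\tchr\t100\t3\t50M\t*\t0\t0\tACGT\tIIII\tNH:i:2\n",
    "r1\t256\tchr\t900\t3\t50M\t*\t0\t0\tACGT\tIIII\tNH:i:2\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t16\tchr\t300\t50\t50M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
]
rate = mapped_fraction(records, total_input_reads=4)
print(f"alignment rate: {rate:.0%}")
```

Counting alignment lines instead of reads would make an aligner that reports several hits per read look better than one that reports a single hit, which is one way two tools can appear to disagree.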

I finally got my count tables with
samtools view file.bam | htseq-count -t gene -s no - file.gtf > counts.txt
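
Since `-t gene` tells htseq-count to use only annotation lines whose third column is `gene`, and the ID is taken from the `gene_id` attribute unless `-i` says otherwise, a quick check of the GTF can rule out an annotation mismatch. The following is a rough sketch, not part of htseq-count; `summarize_gtf` and the example lines (including the locus names) are hypothetical.

```python
from collections import Counter


def summarize_gtf(gtf_lines, feature_type="gene", id_attribute="gene_id"):
    """Count feature types and the lines of feature_type that lack id_attribute."""
    types = Counter()
    missing_id = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a valid GTF record
            continue
        types[fields[2]] += 1
        if fields[2] == feature_type:
            # Attribute column looks like: key "value"; key "value";
            keys = {part.strip().split(" ", 1)[0]
                    for part in fields[8].split(";") if part.strip()}
            if id_attribute not in keys:
                missing_id += 1
    return types, missing_id


# Toy annotation with made-up locus names.
gtf = [
    "# example annotation\n",
    'chr\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "locus_0001";\n',
    'chr\tsrc\tCDS\t1\t900\t.\t+\t0\tgene_id "locus_0001"; transcript_id "t1";\n',
    'chr\tsrc\tgene\t1200\t1500\t.\t-\t.\tlocus_tag "locus_0002";\n',
]
types, missing = summarize_gtf(gtf)
print(dict(types), "gene lines without gene_id:", missing)
```

If the file has no `gene` lines at all, or the `gene` lines carry a different ID attribute, the `-t` and `-i` options need to be adjusted to match.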

Now I will use DESeq for differential expression.
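
DESeq expects one table of raw integer counts with a column per sample, while htseq-count writes one two-column file per sample. A small sketch of the merge step, under the assumption that each file is tab-separated `gene<TAB>count` and ends with htseq-count's summary rows (`no_feature`, `ambiguous`, and so on, prefixed with `__` in newer versions). The function names and sample labels are invented for this example.

```python
import csv
import io

# Summary rows that htseq-count appends after the per-gene counts.
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique"}


def read_counts(handle):
    """Read one htseq-count table into a dict, skipping the summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue
        gene, value = row
        if gene.lstrip("_") in SPECIAL:
            continue
        counts[gene] = int(value)
    return counts


def merge_counts(tables):
    """Combine {sample: {gene: count}} into a header and sorted rows."""
    samples = list(tables)
    genes = sorted(set().union(*(t.keys() for t in tables.values())))
    header = ["gene"] + samples
    rows = [[g] + [tables[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


# In-memory stand-ins for two counts.txt files.
file_a = io.StringIO("geneA\t10\ngeneB\t0\nno_feature\t5\n")
file_b = io.StringIO("geneA\t7\ngeneB\t3\n__no_feature\t2\n")
tables = {"cond1_rep1": read_counts(file_a), "cond2_rep1": read_counts(file_b)}
header, rows = merge_counts(tables)
print(header)
for row in rows:
    print(row)
```

Dropping the summary rows before the merge matters, because otherwise they would be treated as genes in the differential expression test.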

Is everything OK so far?
10-10-2013, 08:24 AM   #2
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140

I just went through the same process as you did; see my post here: http://crazyhottommy.blogspot.com/20...-sort-and.html

I am following the protocol from Simon Anders, "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor": http://www.nature.com/nprot/journal/....2013.099.html






10-16-2013, 06:31 AM   #3
chickenmcfu
Junior Member
 
Location: Hamburg

Join Date: Sep 2013
Posts: 4

Yes, the DESeq description is easy to follow, even with no R experience.

Unfortunately I have run into other problems. The DESeq analysis gives me 125 differentially regulated genes. The commercial sequencing service gave me a cuffdiff result with 200 genes (with mixed-up annotation, though). So I went through the cuffdiff process myself and got 129 regulated genes.
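
Comparing only the totals (125, 200, 129) hides how much the gene lists actually overlap. A minimal sketch for comparing two sets of significant gene IDs, assuming both lists use the same identifiers; `compare_calls` and the IDs are placeholders for illustration, not values from this analysis.

```python
def compare_calls(calls_a, calls_b):
    """Summarize the overlap between two collections of significant gene IDs."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared genes divided by all genes called by either tool.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder gene IDs standing in for the two result lists.
deseq_calls = ["g1", "g2", "g3"]
cuffdiff_calls = ["g2", "g3", "g4"]
overlap = compare_calls(deseq_calls, cuffdiff_calls)
print(overlap)
```

If most genes in the shorter list are also in the longer one, the tools largely agree and differ mainly in how many borderline genes pass the threshold.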

I have now tried nearly every possible mapping (bowtie2, tophat2; whole genome, or tophat with the -G and -T options) and then cuffdiff with -M and the rtRNA.gtf, and could bring it up to 136.

The only thing in which my analysis differs from theirs is that they align directly to only the CDS and ncRNAs, completely disregarding the rtRNAs, but then give cuffdiff a full GTF. Afterwards they did a second mapping of the FASTQs to the rtRNAs, so I know that up to 4.5% of the reads map to them.

Is this right? Can this difference in the alignment cause such a large difference in the results?