SEQanswers

Old 03-25-2014, 05:06 AM   #41
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206

Quote:
Originally Posted by dpryan View Post
(1) Yeah, that's a good idea when you're getting started.

(2) Cuffdiff2 is just the most recent version of cuffdiff (unlike DESeq/DESeq2).
Thank you, dpryan. You really are an expert.
Do you have any thoughts on preprocessing?
This seems to be controversial: once we have the reads (single-end or paired-end), some people suggest that no preprocessing (such as trimming or adapter removal) is needed, while others say it is essential before TopHat. I am still confused. What is your opinion, or could you explain in more detail? Thank you!

Last edited by super0925; 03-25-2014 at 05:13 AM.
Old 03-25-2014, 09:38 AM   #42
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

It's mostly a question of how aggressively to quality trim after removing adapter sequences. I fall into the "trim gently" camp, so I trim off adapters and bases with a phred score of 5 or below. While more aggressive trimming can probably increase the mapping rate, it will generally drastically decrease the overall number of mapped reads (and generally not improve accuracy that much). This is at least the case for RNAseq; other experiment types may be different.
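
To make the threshold concrete: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so a cutoff of 5 only removes bases that have roughly a one-in-three or worse chance of being wrong. A minimal Python sketch of that conversion (the function name is just illustrative):

```python
def phred_to_error_probability(q):
    """Return the base-call error probability for Phred score q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


# Compare the gentle cutoff from the post (5) with two stricter cutoffs.
for q in (5, 20, 30):
    print(f"Q{q}: error probability ~ {phred_to_error_probability(q):.3f}")
# Q5  -> ~0.316 (roughly one call in three is wrong)
# Q20 -> 0.010
# Q30 -> 0.001
```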

Last edited by dpryan; 03-25-2014 at 09:39 AM. Reason: Someday I'll actually start proof reading these things before hitting submit...
Old 03-25-2014, 09:54 AM   #43
super0925

Quote:
Originally Posted by dpryan View Post
It's mostly a question of how aggressively to quality trim after removing adapter sequences. I fall into the "trim gently" camp, so I trim off adapters and bases with a phred score of 5 or below. While more aggressive trimming can probably increase the mapping rate, it will generally drastically decrease the overall number of mapped reads (and generally not improve accuracy that much). This is at least the case for RNAseq; other experiment types may be different.
Thank you. I have also read a blog post about "gentle trimming" before, but I don't remember its address.
I have trimmed Ion Proton data with cutadapt by removing the adapter "GGCCAAGGCG", following the Ion Community's recommendation.
But I am not sure whether the same procedure applies to Illumina data. Also, how do I remove bases with a phred score < 5, and with which software? I am sorry to ask such a naive question.
Old 03-25-2014, 10:04 AM   #44
dpryan

I think cutadapt can do quality trimming too, but if not then trim_galore can do both (it's a wrapper around cutadapt, in fact). There's also Trimmomatic, which is relatively popular. I haven't used Ion Proton reads myself so I can't make any specific recommendations there.
Old 03-25-2014, 11:54 AM   #45
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 97

I would recommend Trimmomatic as well.

It's very good, from what I have used it for. It can also take paired reads into account, unlike some other trimmers.

Other trimmers I have seen and used are Scythe/Sickle and FastX Toolkit.

Also trim the first 10 to 14 bp from the start of the reads, depending on the primer length.

Personally, I trim adapters, primers and any over-represented sequences based on FastQC, as best I can. I take particular care with over-represented Ns.

All the best with your project.
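
For illustration, cropping a fixed number of bases from the start of a read (what Trimmomatic's HEADCROP step does) amounts to slicing the sequence and its quality string together. This is a minimal Python sketch; the function name and the example read are made up:

```python
def head_crop(sequence, qualities, n_bases):
    """Drop the first n_bases from a read and from its quality string."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality string must have the same length")
    return sequence[n_bases:], qualities[n_bases:]


# Hypothetical 16 bp read; crop the first 10 bp as suggested above.
seq, qual = head_crop("ACGTACGTACGTACGT", "IIIIIIIIIIIIIIII", 10)
print(seq, qual)  # GTACGT IIIIII
```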
Old 03-25-2014, 03:14 PM   #46
super0925

Quote:
Originally Posted by dpryan View Post
I think cutadapt can do quality trimming too, but if not then trim_galore can do both (it's a wrapper around cutadapt, in fact). There's also Trimmomatic, which is relatively popular. I haven't used Ion Proton reads myself so I can't make any specific recommendations there.
Thank you. Sorry, what I meant was: do I need to remove the same adapter, "GGCCAAGGCG", as I did for the Proton data?

Besides, I think cutadapt can do quality trimming; please see its help:

Additional modifications to the reads:
  -q CUTOFF, --quality-cutoff=CUTOFF
                        Trim low-quality ends from reads before adapter
                        removal. The algorithm is the same as the one used by
                        BWA (Subtract CUTOFF from all qualities; compute
                        partial sums from all indices to the end of the
                        sequence; cut sequence at the index at which the sum
                        is minimal) (default: 0)
  --quality-base=QUALITY_BASE
                        Assume that quality values are encoded as
                        ascii(quality + QUALITY_BASE). The default (33) is
                        usually correct, except for reads produced by some
                        versions of the Illumina pipeline, where this should
                        be set to 64. (default: 33)
  -x PREFIX, --prefix=PREFIX
                        Add this prefix to read names
  -y SUFFIX, --suffix=SUFFIX
                        Add this suffix to read names
  --strip-suffix=STRIP_SUFFIX
                        Remove this suffix from read names if present. Can be
                        given multiple times.
  -c, --colorspace      Colorspace mode: Also trim the color that is adjacent
                        to the found adapter.
  -d, --double-encode
                        When in color space, double-encode colors (map
                        0,1,2,3,4 to A,C,G,T,N).
  -t, --trim-primer     When in color space, trim primer base and the first
                        color (which is the transition to the first
                        nucleotide)
  --strip-f3            For color space: Strip the _F3 suffix of read names
  --maq, --bwa          MAQ- and BWA-compatible color space output. This
                        enables -c, -d, -t, --strip-f3, -y '/1' and -z.
  --length-tag=TAG      Search for TAG followed by a decimal number in the
                        name of the read (description/comment field of the
                        FASTA or FASTQ file). Replace the decimal number with
                        the correct length of the trimmed read. For example,
                        use --length-tag 'length=' to correct fields like
                        'length=123'.
  -z, --zero-cap        Change negative quality values to zero (workaround to
                        avoid segmentation faults in old BWA versions)


So should -q be set to 5?
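
The trimming rule in the help text for -q can be sketched in a few lines of Python. This follows the wording of the help literally (subtract the cutoff, take partial sums from each index to the end, cut where the sum is smallest). It is not cutadapt's actual code, and as far as I know the real implementations also stop scanning early, which this sketch leaves out. The function names and the example read are made up:

```python
def decode_phred(quality_string, quality_base=33):
    """Decode a FASTQ quality string, assuming ascii(quality + quality_base)."""
    return [ord(ch) - quality_base for ch in quality_string]


def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Walks from the last base towards the first, accumulating
    (quality - cutoff), and remembers the index where that partial sum
    is smallest. If no partial sum is negative, nothing is trimmed.
    """
    best_sum = 0
    best_index = len(qualities)
    running = 0
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running < best_sum:
            best_sum = running
            best_index = i
    return best_index


# Hypothetical read: eight Q40 bases ('I') followed by two Q2 bases ('#').
sequence = "ACGTACGTAC"
quality_string = "IIIIIIII##"
cut = quality_trim_index(decode_phred(quality_string), cutoff=5)
print(sequence[:cut], quality_string[:cut])  # keeps the first 8 bases
```

Passing quality_base=64 to decode_phred covers the older Illumina encoding mentioned under --quality-base.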
Old 03-25-2014, 03:33 PM   #47
dpryan

Ah no, the adapter is different. The quality thresholding works the same though.
Old 03-26-2014, 03:46 AM   #48
super0925

Quote:
Originally Posted by dpryan View Post
Ah no, the adapter is different. The quality thresholding works the same though.
But which adapter do I need to remove?
Sorry, I haven't received the raw Illumina data yet.
How can I find out?
The adapter for the Proton data, 'GC....', was recommended by the Ion Community.
Old 03-26-2014, 03:53 AM   #49
dpryan

Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.
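
As a rough picture of what 3' adapter removal does with that sequence, here is a deliberately simplified Python sketch. It only handles exact matches: a full copy of the adapter anywhere in the read, or a prefix of the adapter at the very end of the read. Real trimmers such as cutadapt also tolerate mismatches, so treat this as an illustration only. The function name, the min_overlap parameter and the example reads are made up:

```python
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter_3prime(read, adapter=ILLUMINA_ADAPTER, min_overlap=5):
    """Remove an exact 3' adapter match from a read (no mismatches allowed)."""
    # Case 1: the whole adapter occurs in the read; drop it and everything after.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends partway into the adapter. Try the longest
    # adapter prefix first, and require at least min_overlap matching bases.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


print(trim_adapter_3prime("ACGTACGTAC" + ILLUMINA_ADAPTER + "TTTT"))  # ACGTACGTAC
print(trim_adapter_3prime("ACGTACGTAC" + "AGATCG"))                   # ACGTACGTAC
print(trim_adapter_3prime("ACGTACGTAC" + "AGAT"))                     # unchanged: overlap < 5
```

The minimum overlap matters because a short adapter prefix such as "AGAT" will also occur by chance at the end of genuine reads.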
Old 03-26-2014, 05:04 AM   #50
super0925

Quote:
Originally Posted by dpryan View Post
Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.
Thank you. I also found it in trim_galore --help.
You said: "I trim off adapters and bases with a phred score of 5 or below".
Would this command line work?
trim_galore --length 100 --quality 20 --stringency 5 SampleSeq1.fastq
Old 03-26-2014, 05:20 AM   #51
dpryan

You'll want something like:

Code:
trim_galore --length 20 --quality 5 -s 5 sample.fastq
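
For anyone scripting this over many samples, the same call can be assembled in Python. This is only a sketch: the helper name is made up, "sample.fastq" is the placeholder file name from the command above, and the tool is run only if trim_galore is on the PATH and the file exists. The comments reflect my reading of the trim_galore help for these three options.

```python
import os
import shutil
import subprocess


def build_trim_galore_command(fastq_path, min_length=20, quality=5, stringency=5):
    """Assemble the 'gentle trimming' trim_galore call as an argument list."""
    return [
        "trim_galore",
        "--length", str(min_length),      # discard reads shorter than this after trimming
        "--quality", str(quality),        # Phred cutoff for trimming low-quality ends
        "--stringency", str(stringency),  # minimum overlap with the adapter before trimming
        fastq_path,
    ]


command = build_trim_galore_command("sample.fastq")
print(" ".join(command))

# Only run the tool when it is installed and the input file is present.
if shutil.which("trim_galore") is not None and os.path.isfile("sample.fastq"):
    subprocess.run(command, check=True)
```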
Old 03-26-2014, 05:21 AM   #52
super0925

Quote:
Originally Posted by dpryan View Post
You'll want something like:

Code:
trim_galore --length 20 --quality 5 -s 5 sample.fastq
Thank you! So that is the so-called 'gentle trimming'.
Old 03-26-2014, 11:59 AM   #53
super0925

Quote:
Originally Posted by dpryan View Post
Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.

If my mapping rate with TopHat is ~80-85%, is that high, normal, or low?
For the Ion Proton data, the Community recommended a two-step procedure: the TopHat mapping rate is ~50%, and if we use bowtie2 to map the unmapped reads and merge the results, the rate increases to ~90%.
Do I need to realign the unmapped reads and merge them for Illumina as well? Or should I just leave it and go on to Cufflinks and DE analysis?
Old 03-26-2014, 01:01 PM   #54
dpryan

Depending on the species that's not unreasonable. You might also try STAR, though that requires more memory. Tophat2 can use bowtie2 already, so just don't give it the --no-mixed option and it will try to map unmapped paired-end reads as single-end for you.
Old 03-26-2014, 02:19 PM   #55
super0925

Quote:
Originally Posted by dpryan View Post
Depending on the species that's not unreasonable. You might also try STAR, though that requires more memory. Tophat2 can use bowtie2 already, so just don't give it the --no-mixed option and it will try to map unmapped paired-end reads as single-end for you.
Hi, I am talking about bovine cells and single-end reads; my mapping rate with TopHat is ~80-85%.
Thank you!
My question is whether mapping the unmapped reads is meaningful or essential.

Last edited by super0925; 03-26-2014 at 02:32 PM.
Old 03-26-2014, 02:39 PM   #56
dpryan

Ah, then you're unlikely to gain much by remapping unmapped reads with bowtie2 (while it does allow more mismatches by default, if you're getting up to 85% alignment already then we're looking at seriously diminishing returns). You could try on one sample and see how much of a difference it makes.
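
To compare one sample before and after such a remapping, the mapping rate can be computed from SAM records by checking the FLAG field: bit 0x4 marks an unmapped read, and bits 0x100 and 0x800 mark secondary and supplementary alignments, which should not be counted as extra reads. The sketch below assumes single-end reads and one SAM stream that contains both mapped and unmapped reads (for instance after merging the aligner's outputs); the function name and the example records are made up.

```python
def mapping_rate(sam_lines):
    """Fraction of reads with a primary alignment, from SAM text lines."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: same read, extra alignment
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: four mapped reads (one with a secondary hit), one unmapped.
example_records = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t272\tchr2\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr3\t400\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r5\t0\tchr5\t500\t50\t4M\t*\t0\t0\tACGT\tIIII",
]
print(f"{mapping_rate(example_records):.0%}")  # 80%
```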
Old 03-26-2014, 03:07 PM   #57
super0925

Quote:
Originally Posted by dpryan View Post
Ah, then you're unlikely to gain much by remapping unmapped reads with bowtie2 (while it does allow more mismatches by default, if you're getting up to 85% alignment already then we're looking at seriously diminishing returns). You could try on one sample and see how much of a difference it makes.
OK, I will probably leave it at 85%.
I think it is high enough.
Old 03-27-2014, 02:47 AM   #58
dpryan

Don't let the perfect be the enemy of the "good-enough, let's finish analysing the data"
Old 03-27-2014, 06:45 AM   #59
super0925

Quote:
Originally Posted by dpryan View Post
Don't let the perfect be the enemy of the "good-enough, let's finish analysing the data"
Have you used STAR before? Is it better than TopHat? I have read some blogs saying that STAR's mapping rate is higher than TopHat's, but I don't know whether a higher rate really matters for downstream analysis.
Old 03-27-2014, 06:52 AM   #60
dpryan

Have a look at my answer over on biostars to a similar question. That should tell you most of what you want to know (particularly given the included links and replies from others).