SEQanswers

Old 01-27-2013, 02:54 AM   #1
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28
Several questions about our RNA-seq results

Hello,

I have several questions;

1-) We did single-end 50 bp sequencing on an Illumina platform, and I am trying to analyze the data on the Galaxy server. After uploading the FASTQ files, I saw dots in some of the reads, and they follow a pattern: one set of reads has dots at the 33rd and 34th positions, another set has them somewhere else but again at the same positions within that set. Reads containing dots seem to make up about 10% of all reads.

After grooming, those didn't change, but I still ran TopHat. I am not sure whether I need to replace the dots with "N"s to be able to use those reads (do I?). If so, how should I do that? (I am not using Linux, so I would be happy with a solution that uses Galaxy.)

2-) One of the FASTQ files gives an overrepresented sequence, like this:
Sequence Count Percentage Possible Source
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTAT 34465 0.11089473793609078 TruSeq Adapter, Index 8 (97% over 37bp)

Do I need to remove those reads? They won't map in the end anyway, so they shouldn't be a problem.

3-) I guess a per-base quality graph like the one below is good, and I don't need any trimming or quality cutoff?

https://main.g2.bx.psu.edu/datasets/...se_quality.png

Last edited by sazz; 01-27-2013 at 02:57 AM.
Old 01-27-2013, 05:17 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

1) I haven't encountered dots in Illumina fastq files (only in SOLiD output) but I suppose you may need to convert them to N, yes.

2) You are right, you probably don't need to remove them. (You would need to if you were doing de novo assembly).

3) It is great.
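
If you want to check how widespread the adapter is before deciding, a minimal sketch like the one below counts reads that contain the start of the adapter sequence reported by FastQC. It assumes plain four-line FASTQ records; the function names and the choice of a 20-base prefix are only illustrative, and it looks for exact matches only, so it will undercount reads with sequencing errors inside the adapter.

```python
import itertools

# First 20 bases of the overrepresented sequence quoted in post #1.
# The prefix length is an arbitrary choice for this sketch.
ADAPTER_PREFIX = "GATCGGAAGAGCACACGTCT"


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(stripped, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield record


def adapter_fraction(lines, adapter=ADAPTER_PREFIX):
    """Return (reads containing the adapter, total reads)."""
    hits = total = 0
    for _header, seq, _plus, _qual in read_fastq(lines):
        total += 1
        if adapter in seq:  # exact substring match only
            hits += 1
    return hits, total
```

Called on an open file handle, for example `adapter_fraction(open("reads.fastq"))` with a hypothetical file name, it returns the two counts so you can compute the percentage yourself.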
Old 01-29-2013, 02:31 AM   #3
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28

Thanks for the answers kopi-o,

I have converted the dots into "N" and am now running the alignment on both versions to see if there is any difference between them.

I have some other questions now. For SE alignments with TopHat, I can't get detailed statistics at the end (with flagstat). I can only see the number of mapped reads, and it reports 100% of the reads as mapped. But I also want to know how many reads were discarded, and how many mapped uniquely, twice, and so on (I used the default settings, which let a read align up to 20 times). Is there a program that shows that? Also, why 20? Isn't that a bit high?

Additionally, the default maximum number of mismatches in TopHat is 2. As I am doing expression analysis (comparing 2 samples), should I allow more than 2, or is that fine?
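
For anyone who has to do the same conversion outside Galaxy, it can be sketched roughly as below. This assumes four-line FASTQ records and touches only the sequence line, because "." is also a valid quality character (Phred+33 score 13) and must not be changed there. The function name is made up for illustration.

```python
def dots_to_n(lines):
    """Replace '.' with 'N' on the sequence line of each 4-line FASTQ record.

    Header, '+' and quality lines are passed through unchanged.
    """
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of every record is the sequence
            yield line.replace(".", "N")
        else:
            yield line
```

The generator can be written straight back out, for example with `out.writelines(dots_to_n(inp))` on two open file handles.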

I will be happy if you can give any other suggestions about TopHat parameters.
Old 01-29-2013, 01:25 PM   #4
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I have found it can be more informative to use bam_stat.py from the RSeQC package or Picard's CollectAlignmentSummaryMetrics to get detailed information about the alignment statistics.
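
If neither tool is at hand, a rough breakdown can also be made from the SAM text itself. The sketch below assumes single-end reads and that the aligner writes the `NH:i:` tag (number of reported alignments for the read), which TopHat does as far as I know; the function name is hypothetical. Also, as far as I know, accepted_hits.bam contains only aligned reads, which would explain why flagstat shows 100% mapped.

```python
from collections import Counter


def hits_per_read(sam_lines):
    """Count single-end reads by their number of reported alignments (NH tag).

    Returns a Counter mapping NH value -> number of distinct read names.
    Header lines, unmapped records, and records without an NH tag are skipped.
    """
    nh_by_read = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4 marks an unmapped read
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh_by_read[fields[0]] = int(tag[5:])
                break
    return Counter(nh_by_read.values())
```

Fed the output of `samtools view -h accepted_hits.bam`, `counts[1]` would be the uniquely mapped reads, `counts[2]` the reads mapped twice, and so on.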

Why it is 20 is anyone's guess - you could always change it :-)

I would think the max mismatch can be increased considerably if you have reads of length 100 or similar, but in fact I have never touched this parameter myself. I don't think it existed in many older versions of TopHat. The versions I have used had a max mismatch parameter for each segment (sub-read) but not for the whole read.