**kopi-o** · 01-27-2013, 05:17 AM

1) I haven't encountered dots in Illumina fastq files (only in SOLiD output) but I suppose you may need to convert them to N, yes.

2) You are right, you probably don't need to remove them. (You would need to if you were doing de novo assembly).

3) It is great.

**sazz** · 01-29-2013, 02:31 AM

Thanks for the answers kopi-o,

I have converted the dots to "N" and am now aligning both versions to see whether there is any difference between their alignments.
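For anyone wanting to do the same conversion, here is a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequences). The function name `dots_to_n` is made up for illustration. It only touches the sequence line of each record, because "." can be a legitimate character on the quality line.

```python
def dots_to_n(lines):
    """Yield FASTQ lines with '.' replaced by 'N' on sequence lines only.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:
            # Second line of each record is the sequence.
            yield line.replace(".", "N")
        else:
            # Header, separator and quality lines are passed through unchanged.
            yield line


# Small in-memory example record (hypothetical data).
record = ["@read1\n", "AC.GT.\n", "+\n", "II.II.\n"]
converted = list(dots_to_n(record))
```

To run it on a real file, pass an open file handle as `lines` and write the yielded lines to a new file.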

I have some other questions now. For SE alignments with TopHat, I can't get detailed statistics at the end (with flagstat). I can only see the number of mapped reads, and it reports 100% of the reads as mapped. But I also want to know how many reads were discarded, how many mapped uniquely, how many mapped twice, etc. (I used the default settings, which let a read align up to 20 times.) Is there any program that shows that? Also, why 20? Isn't that a little high?

Additionally, the default max mismatch in TopHat is 2. As I am doing expression analysis (comparing 2 samples), should I allow more than 2, or is 2 fine?

I would be happy to hear any other suggestions about TopHat parameters.

**kopi-o** · 01-29-2013, 01:25 PM

I have found it can be more informative to use bam_stat.py from the RSeQC package or Picard's CollectAlignmentSummaryMetrics to get detailed information about the alignment statistics.
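If you just want a quick breakdown of unique versus multi-mapped reads, the NH tag that TopHat writes on each alignment (the number of reported hits for that read) can be tallied directly. Below is a rough sketch, not a replacement for the tools above. It assumes single-end reads with unique read names and SAM text input, for example the output of `samtools view` on TopHat's accepted_hits.bam. The function name `count_reads_by_hits` is made up. Note that accepted_hits.bam typically contains only aligned reads, which would explain flagstat reporting 100% mapped.

```python
from collections import Counter


def count_reads_by_hits(sam_lines):
    """Return a Counter mapping NH value -> number of distinct reads.

    Assumes single-end data, so each read name identifies one read.
    """
    hits_per_read = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue  # skip unmapped records, if any are present
        # Optional tags start at the 12th column.
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                hits_per_read[fields[0]] = int(tag[5:])
                break
    return Counter(hits_per_read.values())


# Hypothetical example: r1 maps once, r2 maps to two places.
example = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2\n",
    "r2\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2\n",
]
summary = count_reads_by_hits(example)
```

Here `summary[1]` is the number of uniquely mapped reads and `summary[2]` the number mapped to exactly two places. Reads that were discarded never appear in the alignments, so you would get that number by subtracting the total from your input read count.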

Why it is 20 is anyone's guess - you could always change it :-)

I would think the max mismatch can be increased considerably if you have reads of length 100 or similar, but in fact I have never touched this parameter myself. I don't think it existed in many older versions of TopHat. The versions I have used had a max mismatch parameter for each segment (sub-read) but not for the whole read.

Several questions about our RNA-seq results