Hi SEQanswers community!
So I'm not sure if this has been posted - I poked around, but didn't find anything specific enough so I'll just ask:
I realize that specifying the mean insert size and std. dev. when mapping RNA-seq data with TopHat is important because it can improve mapping results. I also know that the best way to get these values is to map your reads first and then empirically determine them with a program such as 'getinsertsize.py' or the Picard tools (?).
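For what it's worth, the calculation itself is simple: as I understand it, tools like getinsertsize.py essentially summarize the TLEN field (column 9) of properly paired alignments. A rough sketch with samtools + awk (assuming samtools is installed; `aligned.bam` is a placeholder name):

```shell
# Sketch: estimate mean insert size and std. dev. from a BAM file.
# TLEN (field 9) is positive for the leftmost mate of each pair, so
# filtering on $9 > 0 counts each pair once.
samtools view aligned.bam \
  | awk '$9 > 0 { s += $9; ss += $9*$9; n++ }
         END { m = s/n; printf "mean=%.1f sd=%.1f\n", m, sqrt(ss/n - m*m) }'
```

Note that for RNA-seq, pairs spanning introns inflate TLEN, so this is only a rough estimate.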
Before mapping my RNA-seq reads, I performed quality trimming using Trimmomatic-0.32.
So my question is: in what order am I supposed to perform these steps to determine the insert size and std. dev.? Will quality trimming skew the results of the insert size calculation? I've got 14 different read sets (each from a different time point, control vs. KO tissue), and I'm also not sure whether I need to perform this estimation for every single read set (i.e., do I need to find the mean insert size separately for E11.5KO, E11.5Het, E12.5KO, E12.5Het, ... E17.5KO, E17.5Het?).
By read set, I mean E11.5KO is one read set, and E11.5Het is another read set, etc.
What I've done is:
1. Quality trim the reads
2. Map E11.5Het (an arbitrary choice)
3. Run 'getinsertsize.py' on the output (the BAM output converted to SAM)
4. Remap all of the reads with the new parameters
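In case it's unclear, here's a command-level sketch of the steps above. All file names and index names are placeholders, and the -r/--mate-std-dev values are made-up examples — check each tool's docs for the exact flags:

```shell
# 1. Quality trim (Trimmomatic PE: paired/unpaired outputs for each mate)
java -jar trimmomatic-0.32.jar PE R1.fastq R2.fastq \
    R1.paired.fq R1.unpaired.fq R2.paired.fq R2.unpaired.fq \
    SLIDINGWINDOW:4:20

# 2. Preliminary map with TopHat defaults
tophat -o prelim_out genome_index R1.paired.fq R2.paired.fq

# 3. Estimate insert size from the preliminary alignment
samtools view prelim_out/accepted_hits.bam > prelim.sam
getinsertsize.py prelim.sam

# 4. Remap with the estimated parameters. Note: TopHat's -r expects the
#    mate *inner* distance (mean insert size minus both read lengths),
#    not the full insert size, so subtract 2x the read length.
tophat -r 100 --mate-std-dev 40 -o final_out genome_index \
    R1.paired.fq R2.paired.fq
```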
Does this seem legitimate? Or should I go back and map everything with default parameters first, calculate the insert size for each read set, and then remap everything using the calculated mean insert sizes/std. devs.?
I'm happy to clarify anything that doesn't make sense!
Cheers,
Paul