SEQanswers

Old 01-10-2011, 02:42 PM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 150
trimming in tophat

Hi all,

I am trying to analyse my PE Illumina data using tophat.

First I ran FastQC. Checking the raw data, I discovered that the beginnings (and presumably the ends) of my reads contain some contamination from the sequencing adapters.
I then ran bowtie on both the full-length and the trimmed sequences and got better results with the trimmed sequences.

Do I need to trim the data before running tophat?

Does anyone know how to do it? Do I need to convert my trimmed SAM files (bowtie output) back into FASTQ files?
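One way to approach this, sketched here under stated assumptions: since tophat takes FASTQ files as input, the reads can be trimmed in the FASTQ files directly, with no round trip through SAM. The Python sketch below clips a fixed number of bases from each end of every read. The function names and the trim lengths are hypothetical; the actual lengths would have to come from the FastQC report. Because no read is dropped, applying the same function to both mate files keeps a paired-end data set in sync.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, quality); the '+' line is not stored.
FastqRecord = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed, 4-lines-per-record FASTQ input."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_record(rec: FastqRecord, trim5: int, trim3: int) -> FastqRecord:
    """Remove trim5 bases from the start and trim3 bases from the end."""
    header, seq, qual = rec
    end = len(seq) - trim3
    # Sequence and quality strings must be clipped identically.
    return header, seq[trim5:end], qual[trim5:end]


def format_record(rec: FastqRecord) -> str:
    """Serialize a record back to 4-line FASTQ text."""
    header, seq, qual = rec
    return f"{header}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIHHHH\n"]
    for record in read_fastq(example):
        # Hypothetical trim lengths: 2 bases at the 5' end, 1 at the 3' end.
        print(format_record(trim_record(record, 2, 1)), end="")
```

For real data the same loop would read from and write to files (one pass per mate file) instead of the in-memory example.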

Thanks for any help
Assa
Old 01-26-2011, 06:52 AM   #2
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133

Hi, I found this useful page about this issue.

http://bioinfo-core.org/index.php/9t...8_October_2010

HTH

Dave
Old 01-26-2011, 10:35 PM   #3
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 150

Thanks for the tip.
It is a good page that summarizes the different FastQC plots, something a lot of people were looking for in a different thread.

BTW, can anyone tell me a good way to remove the duplicate reads from the equation?
Running FastQC, I get a lot of duplicated reads (see attachment).

As I am looking for differentially regulated genes, I am not sure whether I should exclude the duplicated reads or not, but I would like to try and see what I get when doing so.

Q: Can anyone tell me how to filter duplicated reads from the SAM files or, before the bowtie run, from the FASTQ files?
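For the FASTQ side of this question, a minimal sketch, assuming that "duplicate" means an exact sequence match on both mates of a pair. This is plain sequence-level filtering, not the position-based duplicate detection that alignment tools perform, and the function name and record layout are hypothetical. Each record is a (header, sequence, quality) tuple, and the two mates are filtered together so the paired files stay in sync.

```python
from typing import Iterable, Iterator, Set, Tuple

# A FASTQ record as (header, sequence, quality).
FastqRecord = Tuple[str, str, str]
ReadPair = Tuple[FastqRecord, FastqRecord]


def dedup_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Keep only the first pair seen for each (mate1, mate2) sequence combination."""
    seen: Set[Tuple[str, str]] = set()
    for mate1, mate2 in pairs:
        key = (mate1[1], mate2[1])  # compare sequences only, not headers or qualities
        if key in seen:
            continue  # exact duplicate of an earlier pair, drop both mates
        seen.add(key)
        yield mate1, mate2


if __name__ == "__main__":
    pairs = [
        (("@a/1", "ACGT", "IIII"), ("@a/2", "TTGA", "IIII")),
        (("@b/1", "ACGT", "HHHH"), ("@b/2", "TTGA", "HHHH")),  # duplicate of pair a
        (("@c/1", "ACGT", "IIII"), ("@c/2", "GGCA", "IIII")),  # mate 2 differs, kept
    ]
    for mate1, mate2 in dedup_pairs(pairs):
        print(mate1[0], mate2[0])
```

The set of seen sequences is held in memory, so this only suits data sets whose unique sequences fit in RAM.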

Q: When looking for differential expression, is it right to also exclude the duplicates, or do I need to keep them?

Thanks

Assa
Attached Files
File Type: pdf duplication_levels.pdf (42.6 KB, 48 views)
Old 12-06-2013, 12:18 AM   #4
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142

Did you get an answer?
If so, would you share it here?
Thank you.

Old 12-06-2013, 01:00 AM   #5
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 150

No, I didn't get any response to the questions I posted.

I am not sure, though, how important the duplication rate is at this step. I'm using tophat2 with the option to exclude all duplicated reads, so I am not worried about the duplication in the original FASTQ file.

I hope I am thinking in the right direction.
Old 12-06-2013, 01:42 AM   #6
anamika
Junior Member
 
Location: india

Join Date: Apr 2013
Posts: 5
Sangenix

SangeniX: a comprehensive, automated, scalable and user-friendly NGS data analysis suite

Sangenix has a module for duplicate removal.

Give it a try: http://www.sangenix.com/
Old 12-06-2013, 01:44 AM   #7
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 150

Let me know when it is freeware.
Old 12-06-2013, 02:00 AM   #8
vineet jha
Junior Member
 
Location: pune india

Join Date: Feb 2013
Posts: 4
Sangenix

A beta version is available. You can contact us via the contact page at http://www.sangenix.com/contactus.aspx
Old 12-06-2013, 02:03 AM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Duplicates can be removed with the samtools rmdup command (alternatively, you could use MarkDuplicates from Picard). This is generally not needed for RNA-seq, since a certain amount of duplication is both expected and desired for highly expressed genes (i.e., many/most of these probably aren't PCR duplicates).
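For anyone who still wants to try it, here is a rough sketch of how the two commands could be assembled from Python. The helper names, the file names and the picard.jar path are placeholders, not part of either tool. Both tools expect a coordinate-sorted BAM file, and the Picard arguments use the older KEY=value style. The script only prints the commands; passing one of the lists to subprocess.run would execute it.

```python
from typing import List


def rmdup_command(sorted_bam: str, out_bam: str, single_end: bool = False) -> List[str]:
    """Build a 'samtools rmdup' call; -s switches to single-end mode."""
    cmd = ["samtools", "rmdup"]
    if single_end:
        cmd.append("-s")
    cmd += [sorted_bam, out_bam]
    return cmd


def picard_command(sorted_bam: str, out_bam: str, metrics: str,
                   picard_jar: str = "picard.jar") -> List[str]:
    """Build a Picard MarkDuplicates call that removes rather than only flags duplicates."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={sorted_bam}",
        f"O={out_bam}",
        f"M={metrics}",  # duplication metrics report written by Picard
        "REMOVE_DUPLICATES=true",
    ]


if __name__ == "__main__":
    # Placeholder file names, assumed for illustration only.
    print(" ".join(rmdup_command("accepted_hits.bam", "accepted_hits.rmdup.bam")))
    print(" ".join(picard_command("accepted_hits.bam", "accepted_hits.dedup.bam",
                                  "dup_metrics.txt")))
```

Leaving out REMOVE_DUPLICATES=true makes Picard flag the duplicates instead of deleting them, which keeps the original counts available for comparison.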

Tags
paired end reads, tophat bowtie, trimmed reads
