SEQanswers

Old 09-16-2013, 05:13 AM   #1
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Trim the paired-end data

I am a beginner in bioinformatics, so please forgive me if I ask silly questions.

I am trying to align some paired-end Illumina data. I used the FASTX-Toolkit to trim the reads and then tried to align them with bowtie. After trimming, the number of reads in the Read 1 file differs from the number in the Read 2 file, so bowtie cannot find any matches. Bowtie 2 gives the error message "Error, fewer reads in file specified with -2 than in file specified with -1".
I think I have the following options:
1. Skip trimming and align the raw data with bowtie. (I tried this and it worked, because the raw Read 1 and Read 2 files contain the same number of reads. I got an alignment rate of around 70%, which is not very satisfactory.)
2. Use some software to re-match the Read 1 and Read 2 files after trimming. Can anyone suggest a tool?
3. Maybe there is a better way to align this kind of paired-end data?
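
For option 2, one possible approach is to re-synchronise the two files by read ID. The following is only a minimal Python sketch, not a replacement for a dedicated tool. It assumes four-line FASTQ records with no empty lines, that mate IDs match once an optional trailing /1 or /2 is removed, and that the Read 2 file fits in memory. The function names are made up for illustration.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, record) for each 4-line FASTQ record in `lines`."""
    # Assumption: no record contains an empty line, so blank lines can be skipped.
    it = (line.rstrip("\n") for line in lines if line.strip())
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        # Assumption: the ID is the header text up to the first whitespace,
        # with an optional /1 or /2 mate suffix removed.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, record


def repair_pairs(r1_lines, r2_lines):
    """Split two independently trimmed FASTQ inputs into paired and orphan reads."""
    r2_records = dict(read_fastq(r2_lines))  # whole R2 file held in memory
    paired1, paired2, orphans1 = [], [], []
    matched = set()
    for read_id, record in read_fastq(r1_lines):
        if read_id in r2_records:
            paired1.append(record)
            paired2.append(r2_records[read_id])
            matched.add(read_id)
        else:
            orphans1.append(record)
    orphans2 = [rec for rid, rec in r2_records.items() if rid not in matched]
    return paired1, paired2, orphans1, orphans2


# Tiny in-memory example: read "b" lost its mate in R2, read "d" lost its mate in R1.
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "AC", "+", "II", "@c/1", "GG", "+", "II"]
r2 = ["@a/2", "TTTT", "+", "IIII", "@c/2", "CC", "+", "II", "@d/2", "AA", "+", "II"]
p1, p2, s1, s2 = repair_pairs(r1, r2)
print(len(p1), len(p2), len(s1), len(s2))
```

With real files, the two inputs would be open file handles, and each output list would be written back out as FASTQ, four lines per record.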
Old 09-16-2013, 05:29 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

If you use trimmomatic to trim your paired-end data, it will give you
separate files with the reads that end up unpaired after trimming.

http://www.usadellab.org/cms/?page=trimmomatic
Old 09-16-2013, 05:40 AM   #3
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12

So you mean I can use trimmomatic to trim my paired-end Read 1 and Read 2 files, and I will then get two trimmed files that are no longer paired, so I can run the bowtie alignment on each of them separately?

By the way, trimmomatic can also output paired files after trimming. Is the number of reads the same in the two paired output files?
Thank you.

Last edited by zhoujiayi; 09-16-2013 at 05:50 AM.
Old 09-16-2013, 06:02 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

trimmomatic will give you 4 output files, in fastq format:
R1_paired, R1_unpaired, R2_paired, and R2_unpaired.

R1_paired and R2_paired will have the same number of reads,
in the same order, just like the untrimmed Illumina data, except that
the reads where R1 or R2 was removed by the trimming process will be removed from both files.
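
The routing into four files can be pictured with a small Python sketch. This is not Trimmomatic's code, only an illustration of the behaviour described above. The `keep` test stands in for whatever decides that a trimmed read survives (for example a minimum length, like Trimmomatic's MINLEN step), and the function name and threshold are made up.

```python
def split_pairs(trimmed_pairs, keep):
    """Route already-trimmed (r1, r2) mate pairs into four output groups."""
    out = {"R1_paired": [], "R2_paired": [], "R1_unpaired": [], "R2_unpaired": []}
    for r1, r2 in trimmed_pairs:
        keep1, keep2 = keep(r1), keep(r2)
        if keep1 and keep2:
            # Both mates survive: they stay together and in the same order.
            out["R1_paired"].append(r1)
            out["R2_paired"].append(r2)
        elif keep1:
            out["R1_unpaired"].append(r1)  # R2 was lost, R1 becomes an orphan
        elif keep2:
            out["R2_unpaired"].append(r2)  # R1 was lost, R2 becomes an orphan
        # If neither mate survives, the pair is dropped entirely.
    return out


# Hypothetical survival rule: a read must still be at least 4 bases long.
pairs = [("ACGTAC", "TTGGCC"), ("AC", "TTGGCC"), ("ACGTAC", "T"), ("A", "T")]
result = split_pairs(pairs, keep=lambda seq: len(seq) >= 4)
for name, reads in result.items():
    print(name, len(reads))
```

The two paired groups always end up the same length, which is why the aligner accepts them as a matched pair of files.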

Bowtie doesn't do mixtures of paired and unpaired reads, so you will
have to run the R1_paired, R2_paired as one run, and the unpaired files as a separate run.
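
As a rough sketch of the two runs, the Python snippet below only builds and prints the command lines; it does not execute anything. The jar path, index name, file names, and trimming steps are placeholders chosen for illustration, so check the Trimmomatic and bowtie manuals for the options that suit your data.

```python
import shlex


def trimmomatic_pe_cmd(jar, r1, r2, prefix, steps):
    """Trimmomatic PE: two inputs, then paired/unpaired outputs for R1 and R2."""
    return ["java", "-jar", jar, "PE", r1, r2,
            f"{prefix}_R1_paired.fastq", f"{prefix}_R1_unpaired.fastq",
            f"{prefix}_R2_paired.fastq", f"{prefix}_R2_unpaired.fastq",
            *steps]


def bowtie_paired_cmd(index, r1_paired, r2_paired, sam_out):
    """Paired-end bowtie run (-1/-2) writing SAM output (-S)."""
    return ["bowtie", "-S", index, "-1", r1_paired, "-2", r2_paired, sam_out]


def bowtie_unpaired_cmd(index, unpaired_files, sam_out):
    """Single-end bowtie run over a comma-separated list of orphan-read files."""
    return ["bowtie", "-S", index, ",".join(unpaired_files), sam_out]


# Placeholder names; the trimming steps are examples, not recommendations.
trim = trimmomatic_pe_cmd("trimmomatic.jar", "R1.fastq", "R2.fastq", "sample",
                          ["SLIDINGWINDOW:4:15", "MINLEN:36"])
paired = bowtie_paired_cmd("genome_index", "sample_R1_paired.fastq",
                           "sample_R2_paired.fastq", "paired.sam")
unpaired = bowtie_unpaired_cmd("genome_index",
                               ["sample_R1_unpaired.fastq", "sample_R2_unpaired.fastq"],
                               "unpaired.sam")
for cmd in (trim, paired, unpaired):
    print(shlex.join(cmd))
```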

Hope this makes sense.
Maria
Old 09-16-2013, 06:32 AM   #5
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12

Thank you for your quick reply.
By the way, is it better to align the trimmed paired files when the raw data are paired-end? And what is the point of aligning the trimmed unpaired files when the raw data are paired-end?
Old 09-16-2013, 06:36 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

It doesn't matter whether your data is single-end or paired-end, it is always better to do QC first, and then trim the reads if the QC indicates that you have low quality regions or adapter sequences.
Old 09-16-2013, 06:56 AM   #7
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12

Sorry for my poor English. I guess I didn't make my point clear.
I know it is always better to do QC first.
For example:
I have two fastq files (R1.fastq and R2.fastq), which are paired-end data.
After trimming with Trimmomatic, I get R1_trimmed_paired.fastq, R1_trimmed_unpaired.fastq, R2_trimmed_paired.fastq and R2_trimmed_unpaired.fastq.
Then,
1. I can run bowtie with R1_trimmed_paired.fastq and R2_trimmed_paired.fastq as paired-end data to get an alignment file, say R1R2.sam.
2. Or I can run bowtie with R1_trimmed_unpaired.fastq and R2_trimmed_unpaired.fastq separately to get two alignment files, say R1.sam and R2.sam.

As I understand it, step 1 makes sense because we are processing paired-end files. But why would we do step 2? Step 2 seems to treat the paired-end files as single-end files. If we can do that, why don't we just treat all the files as single-end?
Old 09-16-2013, 07:26 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Doing 1. makes total sense and does what you describe. Doing 2. may or may not be worthwhile (in my experience, at least, aligning an R2_trimmed_unpaired file is usually not worthwhile). The reads in the unpaired files are not the same as those in the paired files. In brief, if one read of a pair has terrible quality, is mostly adapter, or has some other problem that results in it being trimmed too short to use, then its mate is written to the appropriate unpaired file. These, then, are single-end reads, because their mates aren't useful for anything. In general, paired-end reads will give you somewhat more confident alignments (they can also more easily be used for detecting structural variations and other things, if that's your goal).
Old 09-16-2013, 07:27 AM   #9
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Because there are advantages to using paired-end reads.

When you are doing alignment or assembly, it is easier to map the reads correctly if you know that R2 should map within so many bases from R1.
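
A toy Python example of that idea, under heavy simplification: each mate has a list of candidate positions on one chromosome, and only combinations within a maximum distance are kept. Real aligners also check strand and orientation and score the alignments; the positions and the 500-base limit here are invented.

```python
def consistent_placements(r1_hits, r2_hits, max_distance):
    """Return (r1_pos, r2_pos) combinations whose mates lie close enough together."""
    return [(p1, p2)
            for p1 in r1_hits
            for p2 in r2_hits
            if abs(p2 - p1) <= max_distance]


# R1 falls in a repeat and matches three places; R2 matches only one.
r1_hits = [1000, 52000, 910000]
r2_hits = [52300]
placements = consistent_placements(r1_hits, r2_hits, max_distance=500)
print(placements)
```

On its own, R1 is ambiguous between three locations. Together with R2, only the placement at 52000 remains, which is the kind of extra certainty paired-end data provides.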
Old 11-29-2013, 05:31 PM   #10
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Memory Space issues and Unpaired Reads

Hello.

I finished trimming my data and now have both paired-end reads and unpaired reads.

I have limited disk space and want to delete the unpaired reads. To be sure I do not need them, would a FastQC report on the trimmed paired data be enough? That is, if I know the trimmed paired reads are of good quality, can I delete the unpaired data?

Thank you
Old 11-30-2013, 10:00 AM   #11
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

I guess it depends how many of your reads are paired-end and how many are single-end after trimming.

I would also run FastQC on the single-end reads, to see how the quality compares with that of the trimmed paired-end reads. Then decide whether you want to delete them or not.
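
To see how many reads ended up unpaired, a quick count per file is usually enough. Here is a minimal Python sketch, assuming uncompressed four-line FASTQ records; the function names and the example counts are made up.

```python
def count_reads(lines):
    """Count FASTQ records, assuming exactly four lines per record."""
    return sum(1 for _ in lines) // 4


def unpaired_fraction(n_pairs, n_r1_unpaired, n_r2_unpaired):
    """Fraction of surviving reads that lost their mate during trimming."""
    surviving = 2 * n_pairs + n_r1_unpaired + n_r2_unpaired
    if surviving == 0:
        return 0.0
    return (n_r1_unpaired + n_r2_unpaired) / surviving


# In-memory stand-in for a file; with real data use open("R1_paired.fastq").
example = ["@a", "ACGT", "+", "IIII", "@b", "GGCC", "+", "IIII"]
print(count_reads(example))

# Hypothetical counts: 90 intact pairs, 8 orphaned R1 reads, 2 orphaned R2 reads.
fraction = unpaired_fraction(90, 8, 2)
print(f"{fraction:.1%} of surviving reads are unpaired")
```

If that fraction is small, dropping the unpaired files loses little data. If it is large, it may be worth looking at why so many mates were removed.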
Old 11-30-2013, 01:40 PM   #12
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Deleting the Unpaired Reads

Hello. Thank you for your reply.

I may not have time to compare the trimmed paired and trimmed unpaired reads for every sample; I have too many.

So if my trimmed paired data passes FastQC, it would make sense to use only the paired-end data. Comparing is not time-efficient.

Especially if a lot of folks are writing:

"Doing 1. makes total sense and does what you describe. Doing 2. may or may not be worthwhile (in my experience, at least, aligning an R2_trimmed_unpaired file is usually not worthwhile). "

Thank you.

Tags
alignment, bowtie, fastx-toolkit, illumina, paired-end