**ctseto** · 11-14-2013, 08:46 AM

If you can reformat your R1/R2 files so that they contain only paired reads, and put the unpaired reads in an extra FASTQ file, you can do...

-q -1 R1.fastq -2 R2.fastq -U unpaired.fastq

On my end, I've had good luck with Trimmomatic, which writes the paired and unpaired reads to separate files (1P, 2P, 1U, 2U).

**jordi** · 11-14-2013, 08:54 AM

Hi ctseto! Thanks for your reply!
I had a look at Trimmomatic. However, it didn't work as well as I expected: some primer sequences still remained in my fastq files.
That's why I decided to give cutadapt a chance, but I don't know how to "reformat" my R1 and R2 files...
Thanks!

**ctseto** · 11-14-2013, 09:05 AM

Not familiar with cutadapt, but I'll look into it. From https://github.com/marcelm/cutadapt/...ster/README.md

If you use one of the read-discarding options, then the --paired-output option is needed to keep the two files synchronized. First trim the forward read, writing output to temporary files:

cutadapt -a ADAPTER_FWD --minimum-length 20 --paired-output tmp.2.fastq -o tmp.1.fastq reads.1.fastq reads.2.fastq

Then trim the reverse read, using the temporary files as input:

cutadapt -a ADAPTER_REV --minimum-length 20 --paired-output trimmed.1.fastq -o trimmed.2.fastq tmp.2.fastq tmp.1.fastq

Finally, remove the temporary files:

rm tmp.1.fastq tmp.2.fastq

I assume this is what you tried?

An inelegant way of pulling reads that still had pairs would be to build an index of readnames that existed in both R1 and R2; then extract those reads from R1 and R2 and construct a new pair of files that had the appropriate reads.
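A minimal Python sketch of that index-and-extract idea, under some assumptions: both files are plain (uncompressed) four-line FASTQ, mates share a read name once any trailing /1 or /2 and any header comment are stripped, and everything fits in memory. The function names (read_name, read_fastq, split_pairs) are made up for illustration and are not part of cutadapt, Trimmomatic, or bowtie2.

```python
from itertools import islice


def read_name(header):
    """Strip the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_fastq(lines):
    """Yield (name, record) for each 4-line FASTQ record in an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield read_name(record[0]), record


def split_pairs(r1_lines, r2_lines):
    """Return (paired_r1, paired_r2, unpaired) lists of FASTQ records.

    Paired records come out in R1 order in both lists, so the two
    paired outputs stay synchronized.
    """
    r1 = dict(read_fastq(r1_lines))
    r2 = dict(read_fastq(r2_lines))
    shared = r1.keys() & r2.keys()  # read names present in both files
    paired_r1 = [r1[name] for name in r1 if name in shared]
    paired_r2 = [r2[name] for name in r1 if name in shared]
    unpaired = [rec for name, rec in r1.items() if name not in shared]
    unpaired += [rec for name, rec in r2.items() if name not in shared]
    return paired_r1, paired_r2, unpaired
```

Opening the two trimmed files, passing the file objects to split_pairs, and writing each returned list back out (four lines per record) would give the paired R1/R2 files plus an unpaired file that could go to bowtie2 with -1, -2 and -U as in the first reply. For large runs the in-memory dictionaries may be too big, so this is only a starting point.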

**Wallysb01** · 11-14-2013, 09:14 AM

This is why I prefer Trimmomatic. It handles paired end reads more elegantly. I suggest you give it a try.

**jordi** · 11-14-2013, 09:36 AM

Exactly!
Older version....

Thanks a lot ctseto!
However, I will give Trimmomatic another chance...

thanks!

**ctseto** · 11-14-2013, 11:19 AM

Might be worth checking your Trimmomatic's list of adapters to be on the safe side.

That would be the TruSeq3-PE.fa file in the adapters directory of your Trimmomatic installation. (Current version is 0.30.)

If you know which adapters/primers/kits were used upstream of your NGS run, something like Illumina's Customer Sequence Letter (http://support.illumina.com/download...es_letter.ilmn) should give you enough information (too much, even) to put together a list of adapters to trim off.

A figure illustrating the schema of Trimmomatic is in its manual (http://www.usadellab.org/cms/uploads...nual_V0.30.pdf)

From Trimmomatic's notes:

These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

To make a custom version of fasta, you must first understand how it will be used. Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple

With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

'Palindrome' trimming is specifically designed for the case of 'reading through' a short fragment into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data).

Naming of the sequences indicates how they should be used. For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter. All other sequences are checked using 'simple' mode. Sequences with names ending in '/1' or '/2' will be checked only against the forward or reverse read. Sequences not ending in '/1' or '/2' will be checked against both the forward and reverse read. If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name.
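To make the naming rules in that note concrete, here is a small Python sketch that reports how a given FASTA record name would be treated according to the text quoted above. It only restates those rules; it is not Trimmomatic's code. The function names and the example adapter names are hypothetical, and treating 'Prefix' as an exact, case-sensitive match is an assumption.

```python
def classify_adapter(name):
    """Return (mode, reads) for an adapter FASTA record name.

    mode  is 'palindrome' or 'simple'
    reads is a tuple of the reads the sequence is checked against
    """
    if name.endswith("/1"):
        reads = ("forward",)
    elif name.endswith("/2"):
        reads = ("reverse",)
    else:
        reads = ("forward", "reverse")
    # Palindrome clipping needs a name starting with 'Prefix' AND ending in /1 or /2
    # (assumption: the match on 'Prefix' is case-sensitive).
    if name.startswith("Prefix") and len(reads) == 1:
        return "palindrome", reads
    return "simple", reads


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, for adding the second form by hand."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


# Hypothetical record names, for illustration only
for example in ("PrefixDemo/1", "PrefixDemo/2", "MyPrimer/1", "MyPrimer"):
    print(example, classify_adapter(example))
```

Since the note says the reverse-complemented form must be listed explicitly under another name, reverse_complement can be used to generate that second entry for a custom primer before adding it to the adapter file.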

fewer reads R1 than R2 bowtie2

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News