**jimmybee** · 10-25-2012, 04:05 PM

Originally posted by Jiafen

I have a really bad pair of FASTQ files, so I tried to use the FASTX-Toolkit for quality control.

The main question I have is how to remove the adapter sequence if I do not know the adapter (though I know some adapter sequence is present). The default adapter option in fastx_clipper is 'CCTTAAGG'. If by any chance I knew my adapters, for example two different adapters, should I remove them one by one, or is there a way to remove them all together?

Thanks in advance.

Have you run FastQC? It will show you any over-represented sequences. You could then use something like cutadapt, which allows you to input a file containing multiple adapter sequences.
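If you do collect several candidate adapters from the FastQC report, a small helper can write them out as FASTA for a trimmer that accepts an adapter file. This is only a sketch: the function name, the record names, and the example sequences are illustrative assumptions, not adapters taken from this data set.

```python
def adapters_to_fasta(adapters):
    """Format a {name: sequence} mapping as FASTA text.

    The names and sequences are whatever you copied from the FastQC
    report; nothing here validates that they are real adapters.
    """
    lines = []
    for name, seq in adapters.items():
        lines.append(f">{name}")
        lines.append(seq.upper())
    return "\n".join(lines) + "\n"


# Hypothetical example: two placeholder adapter entries.
example = {"adapter_1": "AGATCGGAAGAGC", "adapter_2": "ccttaagg"}
fasta_text = adapters_to_fasta(example)
```

Recent cutadapt versions can read such a file with `-a file:adapters.fasta`; check `cutadapt --help` for the version you have installed.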

**FiReaNG3L** · 10-25-2012, 11:58 PM

Don't use fastx_clipper, it's wayyyy too aggressive - use cutadapt or something else instead.

**Jiafen** · 10-26-2012, 05:25 AM

Thanks, Jimmy.

Yes, I have run FastQC. There are many over-represented sequences, but only one or two of them are annotated as 'TruSeq Adapter, Index 3 (**% over **)' in the 'Possible Source' column. Most of those matches are not 100%, so should I give cutadapt the whole sequence or just part of it? And what about the other over-represented sequences: they should not be listed in the adapter file, right?

**Jiafen** · 10-26-2012, 05:46 AM

I forgot to mention that all my reads are roughly the same length, 50 or 51 bp. It seems cutadapt is designed to remove the adapter only from reads that are longer than the molecule that was sequenced.

**Jiafen** · 10-26-2012, 05:48 AM

Thank you for your comments.

Would you mind giving more information about why fastx_clipper is too aggressive?

**FiReaNG3L** · 10-26-2012, 08:26 AM

If you look at the source code, it cuts adapters if:

- the first base matches and
- more than 5 bases match through the adapter

With long adapters and long reads, it tends to cut at too many places, and in general it is not suited to many analysis scenarios. Look for other threads on SEQanswers about fastx_clipper for the details.
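To see why a rule like that over-trims, here is a minimal sketch of it. It is one reading of the two conditions listed above, not the actual fastx_clipper source; the function name and the 13-base adapter string are assumptions for illustration.

```python
def aggressive_clip(read, adapter, min_matches=6):
    """Clip the read at the first position where the adapter's first base
    matches and at least `min_matches` bases agree over the overlap
    ("more than 5" in the post, hence the default of 6)."""
    for start in range(len(read)):
        if read[start] != adapter[0]:
            continue
        overlap = min(len(adapter), len(read) - start)
        matches = sum(
            1 for i in range(overlap) if read[start + i] == adapter[i]
        )
        if matches >= min_matches:
            return read[:start]
    return read


ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter prefix, not from this thread

# A genuine adapter after ten bases of insert is clipped as expected.
true_hit = aggressive_clip("TTTTTTTTTT" + ADAPTER, ADAPTER)

# A read with no adapter at all is clipped to nothing, because 6 of its
# 13 bases happen to agree with the adapter starting at the first base.
false_hit = aggressive_clip("ACGTACGTACAGA", ADAPTER)
```

With these inputs `true_hit` keeps the ten-base insert, while `false_hit` comes back empty, which is the kind of spurious cut that becomes more frequent as adapters and reads get longer.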

**Rzinna** · 04-09-2013, 07:04 AM

Originally posted by Jiafen

Thanks, Jimmy.

Yes, I have run FastQC. There are many over-represented sequences, but only one or two of them are annotated as 'TruSeq Adapter, Index 3 (**% over **)' in the 'Possible Source' column. Most of those matches are not 100%, so should I give cutadapt the whole sequence or just part of it? And what about the other over-represented sequences: they should not be listed in the adapter file, right?

Not to dredge up old posts, but Jiafen, did you ever solve this problem?

I am at the exact same place: I have several libraries that FastQC reported as having over-represented TruSeq adapters.

My instinctive reaction is to use fastx trimmer and have it simply discard sequences that contain the adapters I have found.

What I am worried about is keeping the files in register, because they are paired-end runs. What would you all suggest?

Should I try and get trimmomatic installed and working?

Thanks in advance!

**Jiafen** · 04-10-2013, 11:37 AM

Hi Rzinna,

I found two ways to solve the problem.

The first and easier way is to download Trimmomatic-0.22 from http://www.usadellab.org/cms/index.php?page=trimmomatic, and follow the examples on the page.

The second and more tedious way is to use the FASTX-Toolkit to remove poor-quality reads and then use fastqcombinepairedend_update.py from the Stanford Palumbi Lab to match up the paired-end reads afterwards (http://sfg.stanford.edu/scripts.html).

Method 2 does not remove the adaptors directly, and quite a few adaptors were left in its output.

Hope this helps,
Jiafen
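For anyone wondering what "matching up" the pairs involves after each file has been filtered separately, the idea can be sketched as below. This is not the Palumbi Lab script; the function names are hypothetical, and it assumes well-formed four-line FASTQ records whose names differ only by a trailing /1 or /2.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # the '+' separator line
        qual = next(it)
        yield header.strip()[1:], seq.strip(), qual.strip()


def pair_key(name):
    """Reduce a read name to the part shared by both mates."""
    key = name.split()[0]
    if key.endswith("/1") or key.endswith("/2"):
        key = key[:-2]
    return key


def repair(reads1, reads2):
    """Split two filtered read lists into still-paired reads and orphans.

    Returns (paired1, paired2, orphans1, orphans2); the paired lists are
    in the same order, so they can be written out as matching files.
    """
    reads2 = list(reads2)
    by_key2 = {pair_key(r[0]): r for r in reads2}
    paired1, paired2, orphans1 = [], [], []
    seen = set()
    for r in reads1:
        key = pair_key(r[0])
        mate = by_key2.get(key)
        if mate is None:
            orphans1.append(r)
        else:
            paired1.append(r)
            paired2.append(mate)
            seen.add(key)
    orphans2 = [r for r in reads2 if pair_key(r[0]) not in seen]
    return paired1, paired2, orphans1, orphans2
```

This holds the second file in memory, which is fine for a sketch but may not suit very large runs.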

**aforntacc** · 07-31-2013, 07:35 AM

Hi Jiafen,

Please, I need some help. I want to trim my reads and remove the adapter sequence.
Could you show me your command line? I could not understand the Trimmomatic command line. My reads are Illumina HiSeq 2000 paired-end reads.

Thanks

**Jiafen** · 07-31-2013, 10:22 AM

Hi aforntacc

It also took me a while to get Trimmomatic working. I don't know whether you made the same mistake I did: at first I downloaded only the source, but it is the binary that you need to download to run it.

I run the command in a terminal from the folder that contains trimmomatic-0.22.jar. If you are not in that folder, you need to give the path to the file. My command line:

java -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 -trimlog poole.adaptor.log fastq_afterQualityFilter/poole.1_filtered_stillpaired.fastq fastq_afterQualityFilter/poole.2_filtered_stillpaired.fastq poole.adaptor1.fastq poole.adaptor1.unpair.fastq poole.adaptor2.fastq poole.adaptor2.unpair.fastq ILLUMINACLIP:adaptor2:2:39:29

So -trimlog poole.adaptor.log specifies the log file. It is followed by the two paired FASTQ files, and then you specify the adaptors.

Hope it helps.

**mastal** · 07-31-2013, 11:01 AM

I'm sorry Jiafen,
but your explanation/command line is incorrect.

The correct version of a command line to use is on the trimmomatic web page.

USADELLAB.org - Trimmomatic: A flexible read trimming tool for Illumina NGS data

http://www.usadellab.org/cms/?page=trimmomatic

Trimmomatic produces 4 output files with the adapter-trimmed sequences (in addition to the .log file), and you need to give Trimmomatic names for the 4 files, in the following order:
1. read1-paired: read1, where both reads of the pair survive the trimming;
2. read1-unpaired: read1, where only read1 of the pair survives the trimming;
3. read2-paired: read2, where both reads of the pair survive the trimming;
4. read2-unpaired: read2, where only read2 of the pair survives the trimming.

Then, in the ILLUMINACLIP part of the command, you need to specify the name of a FASTA file with the adapter sequences. The current version of Trimmomatic comes with files containing sequences for the TruSeq Illumina adapters.
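The argument order described above can be made explicit by assembling the command as a list. This is a sketch only: the function name and all file names are hypothetical, the class-path invocation mirrors the one posted earlier in this thread for version 0.22, and the 2:30:10 clip thresholds are taken from the example on the Trimmomatic web page rather than tuned for any data set.

```python
def trimmomatic_pe_command(jar, in1, in2, prefix, adapters_fasta,
                           clip="2:30:10", phred="-phred33"):
    """Build a TrimmomaticPE argument list: two inputs, then four outputs
    in the order read1-paired, read1-unpaired, read2-paired,
    read2-unpaired, then the ILLUMINACLIP step."""
    outputs = [
        f"{prefix}_1_paired.fastq",
        f"{prefix}_1_unpaired.fastq",
        f"{prefix}_2_paired.fastq",
        f"{prefix}_2_unpaired.fastq",
    ]
    return [
        "java", "-classpath", jar,
        "org.usadellab.trimmomatic.TrimmomaticPE",
        phred, in1, in2, *outputs,
        f"ILLUMINACLIP:{adapters_fasta}:{clip}",
    ]


# Hypothetical file names, for illustration only.
cmd = trimmomatic_pe_command(
    "trimmomatic-0.22.jar", "sample_1.fastq", "sample_2.fastq",
    "sample_trimmed", "adapters.fa",
)
# The list could then be handed to subprocess.run(cmd, check=True).
```

Set `phred` to `-phred64` if the quality scores use the older Illumina encoding.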

**Jiafen** · 07-31-2013, 11:12 AM

You are right, Mastal. I gave the wrong explanation; my command line is correct, though. After the two original FASTQ files, I did list four output files. Aforntacc, I am sorry for misleading you. The adaptor list is in the file adaptor2, right after ILLUMINACLIP.

Thank you, Mastal.
Jiafen

**aforntacc** · 08-07-2013, 07:17 AM

OK guys, I am very grateful for all your help. I trimmed successfully, but now I have another issue.
I want to know what value of -r/--mate-inner-dist to use for TopHat. I ran a subset of my reads through bowtie2 and looked at the output file (.sam). I understand that the 9th field gives the fragment length, from which I can subtract the read length twice. But I am seeing different values in the 9th field, so which fragment length should I choose? See the SAM alignment lines below.

[samopen] SAM header is present: 23720 sequences.
HWI-ST365_0157:7:1101:1975:2074#GCGGTC 77 * 0 0 * * 0 0 CCCANCTTCACACTCAAAATTTGTTCTGTAGTGTTTGACCATACACAACTTTTGTTTCTTTTGTTACAAAAGTATTGTATAATTGGAACTAAACAAAGGC _bbeBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB YT:Z:UP

HWI-ST365_0157:7:1101:1975:2074#GCGGTC 141 * 0 0 * * 0 0 TCAAAACACATGCACCCACTAGCTTCCTTGGAAAAANA a__ecceeeggggfffdgggfgeghdegfghffebgBB YT:Z:UP

HWI-ST365_0157:7:1101:1991:2120#GCGGTC 83 gi|359484478|ref|XM_002281860.2| 952 42 100M = 888 -164 TGCTTCACGACTGAGTAGTGACGACAAACATGCACGTTGCAGAATGAAATGACTTAAATTACCCTTGTTTTTATATGATTTGGAGTTAATGTAATGGGTG cccabcaab`dbdccdeeeegeggdgfbhihiihhihhiihhhihiihiihihggfhhhdciiihiiiiiiiiiiiiiihiiiiiiigggggeeeeebbb AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:73A26 YS:i:0 YT:Z:CP

HWI-ST365_0157:7:1101:1991:2120#GCGGTC 163 gi|359484478|ref|XM_002281860.2| 888 42 40M = 952 164 GTACTTCCTACCATATGCGATGGGCATTCTCGTCATTTTA aabeeeeegggggifhiiiiiihifgggfh[egfgfhiic AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:40 YS:i:-6 YT:Z:CP

HWI-ST365_0157:7:1101:2000:2177#GCGGTC 77 * 0 0 * * 0 0 CGAGAATCGGCTGGGGGCTTGGGTGATGGTCACTTGTTTCCAGACGTCTTGCATTTTGTCTGCCATGGTTTTGGCAGGGATGGCTTGGAGACGCCGTTGG ^__cccc]ce[ceefhW^^bU_R^UHONIOIONNaSM\bS_ecHLZ_N\``dHVZ_Y^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB YT:Z:UP

HWI-ST365_0157:7:1101:2000:2177#GCGGTC 141 * 0 0 * * 0 0 CGGATAAATCACATACAAAAGCAATGGCACAGCAATCAAGGNCNCTA a__ecccae]bg]c`dffha_dReed`R[bfH_aHPY^fgeBBBBBB YT:Z:UP

HWI-ST365_0157:7:1101:2242:2071#GCGGTC 83 gi|359490881|ref|XM_002277964.2| 1169 40 100M = 1075 -194 GCATCACAATCAAGCCCCAAACTGATCGATGGGTTTTCCCAGAAACCAACAGTGGCATCATTATTCTCGCTGAGGGACGACTGATGAACTTAGGANGTGC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBc___ AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:95T4 YS:i:-35 YT:Z:CP

HWI-ST365_0157:7:1101:2242:2071#GCGGTC 163 gi|359490881|ref|XM_002277964.2| 1075 40 100M = 1169 194 AAGCAGATGAAGAACAATGCAATTGTTTGTAACATTGGCCAGTTTGACAATGAGATTGGTATGGTTGGTTTTGAAACCTACCCTGGGGTTAAGGGCATCA a__`ccccagecgda`[`beccegdedfJ`dgf[d[I^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:-35 XN:i:0 XM:i:7 XO:i:0 XG:i:0 NM:i:7 MD:Z:41C16A4C5C1G14T6C6 YS:i:-1 YT:Z:CP

HWI-ST365_0157:7:1101:2219:2076#GCGGGC 153 gi|225443436|ref|XM_002269813.1| 66 0 100M = 66 0 CGGGCCGGTCTGGACGTCCGCCCCCCCCCACGGCACCCAAAAGGAGCGCAACCAGGTCGATGAAGCTCGCCGGAGCTTCGCCGACCTCCGGTTCGAGAAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBa`[_WZRffaf`ac`dd^_ihhgggeaecaee_a_ AS:i:-45 XN:i:0 XM:i:9 XO:i:0 XG:i:0 NM:i:9 MD:Z:1C3A7G0A2T2A5A2T10G59 YT:Z:UP

HWI-ST365_0157:7:1101:2219:2076#GCGGGC 69 gi|225443436|ref|XM_002269813.1| 66 0 * = 66 0 CAGANAGACAGTTGAGAGTTGAAACTAAATTGTATAATGTGGAAGCTGAAGGTGGCCGAAGGGGACGATCCCTGGCTCAGGACGTTGAACGGCCACGGCG _bbeBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB YT:Z:UP
bilbo@ubuntu:/media/My Passport/Trimmomatic-0.30$
thanks in advance

**dpryan** · 08-08-2013, 01:16 AM

There's going to be a range of values, since having all the fragments exactly the same size would be very unusual (just think about how they are actually made to see why). I think one of the Picard tools can output the actual average fragment length (otherwise, that's trivial to script). Keep in mind that it's probably best to over-estimate the value a bit. I also recall that TopHat re-estimates this during alignment, so setting an exact value is probably not overly important (presumably the library was run on a Bioanalyzer at some point, so just use the appropriate value from that).
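A rough version of that script might look like the following. It is a sketch under a few assumptions: the SAM file is tab-separated, only pairs flagged as properly paired (flag bit 0x2) are used, and the inner distance is taken as the absolute value of the 9th field minus the lengths of both mates, which after trimming need not be equal. The function name is hypothetical.

```python
from statistics import mean


def mate_inner_distances(sam_lines):
    """Return the inner distance for each properly paired read pair."""
    pairs = {}  # read name -> [fragment length, summed read lengths, mates seen]
    for line in sam_lines:
        if line.startswith("@") or line.startswith("["):
            continue  # SAM header lines and samtools status messages
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if not flag & 0x2:
            continue  # skip reads that are not properly paired
        rec = pairs.setdefault(fields[0], [abs(int(fields[8])), 0, 0])
        rec[1] += len(fields[9])
        rec[2] += 1
    return [frag - total for frag, total, seen in pairs.values() if seen == 2]


def mean_inner_distance(sam_lines):
    """Average inner distance, or None if no proper pairs were found."""
    distances = mate_inner_distances(sam_lines)
    return mean(distances) if distances else None
```

For the first aligned pair in the post above (9th field 164, read lengths 100 and 40) this gives 164 - 140 = 24; the second pair (194, two 100 bp reads) gives -6, meaning the mates overlap. Averaging over a large subset is what matters, not any single pair.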

BTW, the "BBBBBB" stretches are very low quality sequences, you should probably trim those.
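Trimming those stretches can be sketched in a few lines. This assumes the older Illumina Phred+64 encoding, where a run of 'B' at the 3' end flags bases that should not be trusted (the quality strings posted above look like that encoding, but that is an inference, not something stated in the thread). Under Phred+33, 'B' is an ordinary good score and this function would be wrong. The function name is hypothetical.

```python
def trim_b_tail(seq, qual):
    """Drop the trailing run of 'B' qualities and the matching bases.

    Intended for FASTQ records, where the flagged run sits at the 3' end;
    in SAM output, reads aligned to the reverse strand show it at the start.
    """
    kept = len(qual.rstrip("B"))
    return seq[:kept], qual[:kept]


# Synthetic example: the last four bases carry 'B' qualities.
trimmed_seq, trimmed_qual = trim_b_tail("ACGTACGT", "hhhhBBBB")
```

A quality trimmer run with the correct Phred offset should handle this as well, so treat this as an illustration of the idea rather than a replacement for one.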

Remove the adapter sequence by fastx_clipper in fastq file
