Thread | Thread Starter | Forum | Replies | Last Post |
paired-end adapter trimming | vinay052003 | Bioinformatics | 16 | 05-02-2017 08:58 PM |
Paired-end Illumina RNA-seq adapter trimming | fabrice | Bioinformatics | 8 | 01-05-2015 08:48 AM |
FASTXtoolkit adapter trimming | Mark | Bioinformatics | 36 | 10-24-2013 11:28 AM |
3' Adapter Trimming | caddymob | Bioinformatics | 0 | 05-27-2009 01:53 PM |
Adapter trimming in MAQ for SOLiD | lgoff | Bioinformatics | 0 | 05-11-2009 10:55 AM |
#1 |
Member
Location: germany Join Date: Jun 2012
Posts: 32
|
Hi All,
I am a total newbie in this field. I have to assemble RNA-seq data, and before that I need to trim the sequences. I have 100 bp Illumina paired-end reads in two files, and I was also given the P5 and P7 adapter sequences:
5'-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC-(insert)-ACCTTAAGAGCCCACGGTTCCTTGAGGTCAGTGXXXXXXTAGAGCATACGGCAGAAGACGAAC-3'
But when I use, for example, grep -c 'AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC' file_name to count the adapters, I cannot find a single one. I am a complete beginner, so I would be grateful if anyone could help me out in detail. I have tried reading the different answers on the forums, but I am confused. Regards |
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
You're pretty unlikely to find the entire adapter sequence in any of the reads. You'll want to look into something like cutadapt or trim_galore to make your life easier.
|
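To illustrate why the grep in post #1 comes back empty, here is a minimal Python sketch. It assumes that adapter read-through usually leaves only the first few bases of an adapter at the end of a read, so searching for the full 55-mer finds nothing while a short prefix does. The toy FASTQ records and the function names are made up for illustration, and which adapter orientation actually shows up depends on the library layout.

```python
def fastq_sequences(lines):
    """Yield the sequence line (the 2nd of every 4 lines) from FASTQ lines."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def count_reads_containing(lines, pattern):
    """Count reads whose sequence contains `pattern` exactly."""
    return sum(1 for seq in fastq_sequences(lines) if pattern in seq)


# P5-side adapter exactly as posted in #1.
ADAPTER = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"

# Hypothetical toy records (not real data): the first two reads run off a
# short insert into the first 20 and 12 bases of the adapter.
toy_fastq = [
    "@read1", "ACGTTGCATTGACCA" + ADAPTER[:20], "+", "I" * 35,
    "@read2", "TTGACCGGATAC" + ADAPTER[:12], "+", "I" * 24,
    "@read3", "GGCATTCGATCGATTAGCAGGCTAACGT", "+", "I" * 28,
]

full_hits = count_reads_containing(toy_fastq, ADAPTER)
prefix_hits = count_reads_containing(toy_fastq, ADAPTER[:12])
print(full_hits, prefix_hits)  # 0 2
```

With a real file you could pass an open file handle instead of the toy list. Dedicated trimmers such as cutadapt also tolerate mismatches, which this exact-match count does not.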
#3 | |
Member
Location: germany Join Date: Jun 2012
Posts: 32
|
Quote:
Sequence | Count | Percentage | Possible Source |
ATGACACTCAAACAGGCATGCTCCACGGAATACCATGGAGCGCAAGGTGC | 1155666 | 2.5956349017221085 | No Hit |
AATGACGCTCGAACAGGCATGCCCCTCGGAATACCAAGGGGCGCAATGTG | 225179 | 0.5057538004361837 | No Hit |
AAGACACTCAAACAGGCATGCCTCTCGGAATACCAAGAGGCGCAAGGTGC | 218636 | 0.4910581711090531 | No Hit |
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA | 119619 | 0.2686652123616139 | Illumina RNA PCR Primer (100% over 50bp) |
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA | 111925 | 0.251384428005364 | Illumina RNA PCR Primer (100% over 50bp) |
AAATGACGCTCAAACAGGCATGCCCTTTGGAATACCAAAGGGCGCAATGT | 104210 | 0.2340564774843778 | No Hit |
ACAAACCCTTGTGTCGAGGGCTGACTTTCAATAGATCGCAGCGAGGGAGC | 71881 | 0.16144528987673504 | No Hit |
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAA | 46463 | 0.10435626248303084 | Illumina RNA PCR Primer (100% over 50bp) |
So, do I need to remove all of these from my sequences as well? I hope I am not bugging you too much. Regards |
|
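One thing worth checking in the list above: the sequences flagged as "Illumina RNA PCR Primer" start with the reverse complement of the P5-side adapter from post #1, which is why a grep for the forward sequence misses them. A small Python sketch of that check follows; the function name is just illustrative.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# P5-side adapter as posted in #1.
P5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"

# First overrepresented sequence flagged as "Illumina RNA PCR Primer" in #3.
FLAGGED = "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA"

rc = reverse_complement(P5_ADAPTER)
print(rc)
print(FLAGGED.startswith(rc))  # True
```

So when counting or trimming adapters, it is worth testing both the sequence as supplied and its reverse complement.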
#4 |
Senior Member
Location: Los Angeles Join Date: Nov 2013
Posts: 142
|
Hello.
I have the same question. FastQC reports which sequences are overrepresented. Does this mean we need to remove them? How do you trim the adapters? You can use Trimmomatic's ILLUMINACLIP step, but I don't know how to create the adapter.fa file. Advice? On the other hand, this thread says that if you align with TopHat you don't need to cut the adapters: http://seqanswers.com/forums/showthread.php?t=19799 "If you ignore the adapters, aligning with TopHat effectively filters them out: the adapters are not in the transcriptome, so when you align your reads to a transcriptome, the adapter sequences will not get aligned." |
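On creating the adapter.fa file: it is a plain FASTA file, so a few lines of Python are enough to write one. This is only a sketch. The record names are hypothetical, the sequences are the P5-side adapter from post #1 and its reverse complement, and Trimmomatic's manual describes naming rules for its palindrome mode that this sketch does not try to follow.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


P5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"

# Record names are made up for illustration.
records = [
    ("P5_adapter_from_post_1", P5_ADAPTER),
    ("P5_adapter_from_post_1_rc", reverse_complement(P5_ADAPTER)),
]

fasta_text = to_fasta(records)
print(fasta_text, end="")

# To save it, uncomment the next two lines:
# with open("adapter.fa", "w") as handle:
#     handle.write(fasta_text)
```

The resulting file can then be passed to whichever trimmer you use. For standard kits, the adapter files shipped with the tool are usually a better starting point than a hand-made one.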
#5 |
Member
Location: Germany Join Date: Dec 2012
Posts: 26
|
I have a relatively dumb question. Doesn't the MiSeq have an integrated adaptor trimming option?
|
#6 |
Senior Member
Location: New England Join Date: Jun 2012
Posts: 200
|
The MiSeq has adapter trimming built in if you include it on the sample sheet. We generally do.
|
#7 |
Senior Member
Location: Oxford, Ohio Join Date: Mar 2012
Posts: 253
|
Hello,
With the HiSeq 2000, what is the default for adaptor trimming? Is it "on" or "off" when generating FASTQ files? Thanks |
#8 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
|
To my knowledge, no trimming is performed by the HiSeq 2000. The HiSeq 2000 only calls the bases. Trimming the adapter sequences, if present, is a downstream step.
Our local sequencing centre, with many HiSeq 2000 machines, never trims the adapters at the level of the HiSeq 2000. They do the trimming later, if necessary, with Trimmomatic. |
#9 |
Senior Member
Location: Oxford, Ohio Join Date: Mar 2012
Posts: 253
|
OK, thanks. I called Illumina, and the HiSeq 2000 machine can do trimming: it is a CLI flag on the FASTQ generation.
It turns out the adaptors were not trimmed. - Regards |
#10 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
|
Good to know that the built-in software can do the trimming. I'd still rather have the raw data, and set the trimming parameters myself though.
|
#11 |
Senior Member
Location: USA Join Date: Jul 2012
Posts: 184
|
It's a feature that's been in CASAVA and BCL2FASTQ for a few years, but it's never worked really well.
|
#12 |
Junior Member
Location: Stamford, CT Join Date: Feb 2014
Posts: 4
|
Trimmomatic includes Illumina-supplied adapter fasta files:
NexteraPE-PE.fa, TruSeq2-SE.fa, TruSeq3-PE.fa, TruSeq2-PE.fa, TruSeq3-PE-2.fa, TruSeq3-SE.fa
I don't know which one to use. My data is paired-end. When I asked the principal investigator, she gave me this response:
I'm not sure which of the adapter fa files it is. The index sequences are from Epicenter: http://www.epibio.com/docs/default-s...s.pdf?sfvrsn=8 all are from set 1. As for the adapter sequences, they are from the "scriptseq kit".
I have been using TruSeq3-PE.fa, but only because I read that it is common for recently sequenced data. I read in another forum that TruSeq2-PE.fa is pretty generic and should work. I am not sure what to do, and would appreciate some guidance. Thanks. |
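If the kit information is ambiguous, one rough way to choose between candidate adapter files is to let the data vote: count what fraction of reads contain the first few bases of any adapter in each file. The Python sketch below assumes exact matching on a 12-base seed. The two candidate sets and the reads are invented placeholders, not the contents of the real Trimmomatic files, and all function names are illustrative.

```python
def parse_fasta(text):
    """Parse FASTA text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def hit_fraction(reads, adapters, k=12):
    """Fraction of reads containing the first k bases of any adapter."""
    seeds = {seq[:k] for seq in adapters.values() if len(seq) >= k}
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if any(seed in read for seed in seeds))
    return hits / len(reads)


def rank_adapter_sets(reads, fasta_by_name, k=12):
    """Return (set name, hit fraction) pairs, best-matching set first."""
    scores = {
        name: hit_fraction(reads, parse_fasta(text), k)
        for name, text in fasta_by_name.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder adapter sets (made-up sequences, NOT real kit adapters).
candidates = {
    "set_A": ">a1\nACACACGTGTGTACACTTGG\n",
    "set_B": ">b1\nTTGGCCAATTGGCCAATTGG\n",
}

# Placeholder reads: two of three end in the start of the set_A adapter.
toy_reads = [
    "GGGTTTCCCAAA" + "ACACACGTGTGTAC",
    "CCCCGGGGTTTTAAAACCCC",
    "ATATATAT" + "ACACACGTGTGTACAC",
]

ranked = rank_adapter_sets(toy_reads, candidates)
print(ranked[0][0])  # set_A
```

On real data you would read each .fa file from disk and test a sample of reads from both the R1 and R2 files. A low hit fraction for every file may simply mean there is little adapter contamination.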
#13 |
Senior Member
Location: Los Angeles Join Date: Nov 2013
Posts: 142
|
Hi. Okay, you are using Trimmomatic.
You first need to know which prep kit was used for the data. For my experiment we used an Illumina prep kit listed on their website, and you can easily download the list of adapters used, because the covariate file has the prep kit name. We used the TruSeq2 prep kit (if I remember correctly).
The key is to understand how trimming works. There are 3' and 5' adapter sequences attached to both ends of each fragment. The universal adapter attaches to the 5' end of read 1, and read 1 also has the indexed adapter on the 3' end. When read 1 is sequenced, the sequencing primer binds to the universal adapter, so read 1 skips the universal adapter and consists of everything that comes after it (i.e. <read 1 content><adapter region>).
Since this is paired-end data, read 2 is then sequenced, and it ends up with the reverse complement of the universal adapter. So if you know the universal adapter used in the experiment, simply compute its reverse complement and enter that into the TruSeq-2.fa if it is not already there.
Now, how do you remove the universal adapter? Read 2 is generated by reading in the opposite direction (5' --> 3'); this time the primer binds to the indexed adapter, which is skipped. So read 2 contains the fragment content followed by the reverse complement of the universal adapter.
So all you need to do when using Trimmomatic is:
1) make sure that Trimmomatic removes the match and all the content that FOLLOWS it, not just the exact match itself
2) find the part common to all the indexed adapters and enter that into the adapter.fa file
3) enter the reverse complement of the universal adapter into the adapter.fa file.
Check the alignment files after trimming. |
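To make the "remove the match and everything after it" idea concrete, here is a minimal Python sketch of 3' adapter trimming. It assumes exact matching only and is not how Trimmomatic itself is implemented (real trimmers score alignments and allow mismatches). The adapter is the first 26 bases of the reverse-complemented P5 sequence seen in post #3, and the insert sequence is made up.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut a read at the adapter, dropping the match and everything after it.

    Handles a full adapter anywhere in the read, or a partial adapter
    (at least `min_overlap` bases) hanging off the 3' end. Exact matches only.
    """
    # Case 1: the whole adapter is inside the read.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends partway into the adapter; try longest overlap first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: leave the read unchanged.
    return read


# Start of the reverse-complemented P5 adapter, as seen in post #3.
ADAPTER = "GATCGTCGGACTGTAGAACTCTGAAC"

# Hypothetical insert sequence, for illustration only.
INSERT = "TTGCAAGGCTTACC"

print(trim_3prime_adapter(INSERT + ADAPTER + "AAAA", ADAPTER))  # TTGCAAGGCTTACC
print(trim_3prime_adapter(INSERT + ADAPTER[:8], ADAPTER))       # TTGCAAGGCTTACC
print(trim_3prime_adapter(INSERT, ADAPTER))                     # TTGCAAGGCTTACC
```

A small `min_overlap` removes more true adapter remnants but also clips a few genuine bases by chance, so the value is a trade-off rather than a fixed rule.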