SEQanswers

Old 12-03-2013, 12:53 AM   #1
mmmm
Senior Member
 
Location: UK

Join Date: Jul 2013
Posts: 131
Default adapters sequence fasta file

Hi there, does anyone have the sequences of the adapters Illumina uses for library prep, so that I can trim them from my sequence data? Thanks.
Old 12-03-2013, 03:12 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

See this thread for hints: http://seqanswers.com/forums/showthread.php?t=198

Also here: http://support.illumina.com/download...es_letter.ilmn

Last edited by GenoMax; 12-03-2013 at 03:15 AM.
Old 12-04-2013, 12:39 AM   #3
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

For the most part you don't need to know the full sequence of the specific adapters you used. Nearly all Illumina adapters start with a common sequence and only later diverge into the different variants. If you trim based on the common sequence, you will remove any instance of any of the adapter types. The common sequence (as it would be seen in read-through) is AGATCGGAAGAGC.

The only libraries where we've seen other adapter starts are small RNA libraries, where we trim using ATGGAATTCTCG instead.
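
A minimal sketch of trimming on the common start sequence, assuming exact string matching only (real trimmers also tolerate mismatches and take base quality into account). The function name `trim_adapter` and the `min_overlap` parameter are illustrative and do not come from any particular tool.

```python
# Common start of most Illumina adapters as seen in read-through (quoted in the post above).
COMMON_ADAPTER = "AGATCGGAAGAGC"
# Start sequence used for small RNA libraries (also quoted in the post above).
SMALL_RNA_ADAPTER = "ATGGAATTCTCG"


def trim_adapter(read, adapter=COMMON_ADAPTER, min_overlap=5):
    """Cut a read at the first exact adapter match, or at a partial match at its 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # No full match: look for the longest adapter prefix hanging off the end of the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read
```

Calling `trim_adapter(read)` uses the common sequence; passing `SMALL_RNA_ADAPTER` as the second argument covers the small RNA case.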
Old 12-04-2013, 01:29 AM   #4
mmmm
Senior Member
 
Location: UK

Join Date: Jul 2013
Posts: 131
Default Gff

How do I get a GFF file for a particular genome (DNA)?
Old 12-04-2013, 03:09 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

Quote:
Originally Posted by mmmm View Post
How do I get a GFF file for a particular genome (DNA)?
Please start a new thread for questions that pertain to a new topic.

Do this to start a new thread (always a good idea to search the forum first):

1. Go to Seqanswers.com main page
2. Choose "Forums" from "site navigation menu" (left side).
3. Choose an appropriate forum for your post
4. On the subsequent forum page use the "New Thread" button at top left of the page.

That said, a GFF file may not be available for all genomes (you may need to construct one yourself). If you work with a model/common organism then you can get the files from Ensembl (http://useast.ensembl.org/info/data/ftp/index.html), UCSC or NCBI.
Old 12-12-2013, 04:29 PM   #6
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default

I have a follow-up question about knowing the sequences.

I know all the adapter sequences used for my Illumina RNA-seq reads. I have 27 of them, and know their sequences in full.

To use Trimmomatic I can use the ILLUMINACLIP:<adapters.fa> step.

However, I do not know how to create the adapter.fa file in the correct format, knowing only the name.

Any advice?

Thank you
Old 12-12-2013, 06:01 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

You should make the adapter.fa in multi-fasta format.

Quote:
>Seq1
ACTUAL_SEQUENCE
>Seq2
ACTUAL_SEQUENCE
and so on.

Remember to save it as text.
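
A minimal sketch of producing such a file from a script instead of a text editor, which guarantees plain text output. The function name `write_adapter_fasta` is hypothetical, and the sequences in the usage comment are placeholders, not real adapters.

```python
from pathlib import Path


def write_adapter_fasta(adapters, path):
    """Write {name: sequence} pairs as a plain-text multi-FASTA file."""
    lines = []
    for name, seq in adapters.items():
        lines.append(">" + name)    # ID line: starts with ">"
        lines.append(seq.upper())   # sequence line: sequence only
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


# Usage (placeholder sequences for illustration only):
# write_adapter_fasta({"Seq1": "ACGTACGT", "Seq2": "TTGGCCAA"}, "adapter.fa")
```

The file extension does not matter to the content: the result is plain text whatever it is called.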
Old 12-12-2013, 06:07 PM   #8
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default Making the ILLUMINA adapter.fa file

Hello.

Thank you so very much for your response.


The letter from Illumina lists the adapters as:

TruSeq Universal Adapter
5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter, Index 1
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



So based on what you are saying would I create the file as :

>Universal Adapter 5
>AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq Adapter, Index 1 5 GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



I am concerned about "whitespace" issues... because I am not sure if I should save the file as :

TruSeq Universal Adapter
>5 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter, Index 1
>5 GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



My last question: ILLUMINACLIP requires adapter.fa, but you suggest saving the file as .txt?

Could you post a clip of your adapter.fa so I can see how to format correctly?


Thanks again

(My other option is to update Trimmomatic from version 0.22 to version 0.32.)
Old 12-12-2013, 06:17 PM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

Multi fasta format has (2 or more) ID-sequence pairs. The ID line has to start with ">" and there should be no other ">" on that ID line. The sequence line has only sequence (no other characters).

The right format would be:

Code:
>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq_Adapter_Index_1
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
Though not strictly needed, you can take strange characters and spaces out of the sequence IDs.

I meant that you should save this file as plain text. You can name it "adapter.fa": if you include the quotes around the name in the Windows "Save As" dialog, no extension is appended to the name, and the file is still in text format.
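
The rules above can be checked mechanically. This is a rough sketch, assuming one sequence line or more per ID and an alphabet of A, C, G, T and N; the function name `check_adapter_fasta` is made up for illustration.

```python
import re

# A sequence line may contain only these letters (assumed alphabet).
SEQUENCE_LINE = re.compile(r"^[ACGTNacgtn]+$")


def check_adapter_fasta(text):
    """Return a list of formatting problems found in multi-FASTA text."""
    problems = []
    pending_id = None   # line number of an ID line still waiting for its sequence
    seen_id = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if pending_id is not None:
                problems.append(f"line {pending_id}: ID line has no sequence after it")
            if ">" in line[1:]:
                problems.append(f"line {number}: extra '>' on the ID line")
            pending_id = number
            seen_id = True
        else:
            if not seen_id:
                problems.append(f"line {number}: sequence before any ID line")
            if not SEQUENCE_LINE.match(line):
                problems.append(f"line {number}: characters other than sequence")
            pending_id = None
    if pending_id is not None:
        problems.append(f"line {pending_id}: ID line has no sequence after it")
    return problems
```

An empty list means the text follows the ID-then-sequence pattern; both layouts proposed in the question would be flagged.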
Old 12-12-2013, 06:37 PM   #10
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default

I see.

From the wiki, I was not sure if I needed to include all the barcode information.

thank you
Old 12-12-2013, 09:02 PM   #11
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default Updated Trimmomatic

Quote:
Originally Posted by GenoMax View Post
You should make the adapter.fa in multi-fasta format.



and so on.

Remember to save it as text.
Hello, thank you for the response!

I updated Trimmomatic, and I am not sure which of the adapter files it provides I should use.

Almost none of the adapters they provide seem to match the list I was given by Illumina (attached).

NexteraPE
>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC




The one exception: the PrefixPE/1 adapter from the TruSeq2 file matches the universal adapter in the Illumina sheet.

However, I used multiple indices from this sheet.

So how do I select the correct adapter from Trimmomatic?

Or how do I customize my own adapter sheet?
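
One way to compare a sequence from the Illumina sheet against a bundled adapter file is to test it in both orientations, since files like the one above list entries together with their reverse complements (the `_rc` names). A minimal sketch, assuming exact equality is what you want; `reverse_complement` and `find_matches` are illustrative names, and the dictionary holds two entries copied from the listing above.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_matches(query, adapters):
    """Names of adapters equal to the query or to its reverse complement."""
    query = query.upper()
    rc = reverse_complement(query)
    return [name for name, seq in adapters.items() if seq.upper() in (query, rc)]


# Two entries copied from the NexteraPE listing in this post.
NEXTERA = {
    "Trans1": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "Trans1_rc": "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
}

matches = find_matches("TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG", NEXTERA)
```

Running the same function over the sequences from your own sheet would show which bundled entries, if any, they correspond to.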
Attached Files
File Type: pdf Illumina Adapter sequences.pdf (235.0 KB, 62 views)
Old 12-12-2013, 09:17 PM   #12
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default


Here is my custom adapter.fa. I will upload my FastQC report after I finish running it.

Here is my original FastQC report for a sample that has adapters in it, from before I ran Trimmomatic:

[WARN] Overrepresented sequences
Sequence                                            Count  Percentage           Possible Source
GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC  52178  0.18044319834652642  No Hit
AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG  51279  0.17733425520356333  No Hit
GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG  49511  0.17122011562986064  No Hit
TTTGATGGTAAGGGAGGGATCGTTGACCTCGTCTGTTATGTAAAGGATGC  44215  0.15290536269867885  No Hit
GCCATATCGGGGGCACCGATTATTAGGGGAACTAGTCAGTTGCCAAAGCC  32007  0.11068736727121142  No Hit
GTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGTTT  31149  0.10772021130162042  No Hit
TTGTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGT  31053  0.10738822182250535  No Hit
AGCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAA  30693  0.10614326127582382  No Hit
AGTTAGATTTACGCCGATGAATATGATAGTGAAATGGATTTTGGCGTAGG  29276  0.10124295823513564  No Hit
TGGTCTAGGGTGTAGCCTGAGAATAGGGGAAATCAGTGAATGAAGCCTCC  29193  0.10095592566465071  No Hit
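
FastQC labels these sequences "No Hit", so it may be worth checking whether they contain adapter sequence at all before expecting a trimmer to remove them. A small sketch, assuming a plain substring search for the two adapter start sequences quoted earlier in this thread; the function name `adapter_hits` is illustrative, and only the first three sequences of the table are included to keep it short.

```python
# Adapter start sequences quoted earlier in this thread.
ADAPTER_STARTS = {
    "common": "AGATCGGAAGAGC",
    "small_rna": "ATGGAATTCTCG",
}

# First three overrepresented sequences from the FastQC table above.
OVERREPRESENTED = [
    "GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC",
    "AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG",
    "GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG",
]


def adapter_hits(sequences, adapters=ADAPTER_STARTS):
    """Return (sequence, adapter name) pairs where the adapter start occurs in the sequence."""
    hits = []
    for seq in sequences:
        for name, start in adapters.items():
            if start in seq:
                hits.append((seq, name))
    return hits


hits = adapter_hits(OVERREPRESENTED)
```

If `hits` comes back empty for a sequence, adapter clipping alone would not be expected to remove it.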
Old 12-12-2013, 09:30 PM   #13
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default Custom Adapter.fa for Trimmomatic version .32

Here is the command I used to run Trimmomatic:


java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar \
  org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 \
  CHLA-15_S1_R1.fastq.gz CHLA-15_S1_R2.fastq.gz \
  noadapter_paired_trimmed_CHLA-15_S1_R1.fastq.gz noadapter_unpaired_trimmed_CHLA-15_S1_R1.fastq.gz \
  noadapter_paired_trimmed_CHLA-15_S1_R2.fastq.gz noadapter_unpaired_trimmed_CHLA-15_S1_R2.fastq.gz \
  ILLUMINACLIP:/auto/rcf-proj/sa1/software/Trimmomatic-0.32/adapters/TrueSeq2-PE.fa:2:30:10 \
  LEADING:3 TRAILING:3 HEADCROP:10 SLIDINGWINDOW:4:10 MINLEN:30
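
Before looking further, it may be worth confirming that the ILLUMINACLIP step names the file you intend and that the file exists at that path. The step in the command points at a file under the bundled adapters/ directory; as far as I know the bundled file is spelled TruSeq2-PE.fa, so the spelling is worth a look too. The sketch below assumes the four-field form used here (file, seed mismatches, palindrome clip threshold, simple clip threshold); `parse_illuminaclip` is a hypothetical helper, not part of Trimmomatic.

```python
from pathlib import Path


def parse_illuminaclip(step):
    """Split an ILLUMINACLIP:<file>:<seed>:<palindrome>:<simple> step into its parts."""
    # Unpacking raises ValueError if the step has fewer than four fields.
    name, *path_parts, seed, palindrome, simple = step.split(":")
    if name != "ILLUMINACLIP" or not path_parts:
        raise ValueError(f"not a four-field ILLUMINACLIP step: {step!r}")
    adapter_file = ":".join(path_parts)  # re-join in case the path itself contains ':'
    return {
        "adapter_file": adapter_file,
        "seed_mismatches": int(seed),
        "palindrome_clip_threshold": int(palindrome),
        "simple_clip_threshold": int(simple),
        "file_exists": Path(adapter_file).is_file(),
    }
```

Passing the ILLUMINACLIP argument from the command to this function on the machine where the job runs shows at a glance whether `file_exists` is true.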



and here is my adapter file


>PrefixPE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PCR_Primer2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>FlowCell1
TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC
>FlowCell2
TTTTTTTTTTCAAGCAGAAGACGGCATACGA
>TruSeq_Adapter_Index1
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index3
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index4
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index5
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index6
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index7
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index8
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index9
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index10
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index11
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index12
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index13
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index14
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index15
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATGTCAGAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index16
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index18
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index19
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAACGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index20
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGGCCTTATCTCGTATGCCGTCTTCTGCTTG
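
As I understand the Trimmomatic manual, the names in the adapter file decide how each entry is used: names starting with "Prefix" and ending in /1 or /2 are used for palindrome clipping, other names ending in /1 or /2 are checked only against the forward or reverse read, and everything else is checked against both reads in simple mode. A small sketch of that naming rule, to be verified against the manual for your version; `clipping_mode` is an illustrative name.

```python
def clipping_mode(name):
    """Classify an adapter name by the naming convention described above (assumed)."""
    if name.startswith("Prefix") and name.endswith(("/1", "/2")):
        return "palindrome"
    if name.endswith("/1"):
        return "simple, forward read only"
    if name.endswith("/2"):
        return "simple, reverse read only"
    return "simple, both reads"


# A few names taken from the adapter file above.
modes = {name: clipping_mode(name) for name in ["PrefixPE/1", "PCR_Primer1", "TruSeq_Adapter_Index1"]}
```

Under that convention, the added TruSeq_Adapter_Index entries would all be treated as simple-mode sequences.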



awaiting results
Old 12-12-2013, 09:51 PM   #14
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default

I am now running Trimmomatic with my custom adapter.fa file, which should remove the overrepresented sequences that FastQC reported. Update to follow soon.

Thank you in advance.
Old 12-12-2013, 10:39 PM   #15
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default Trimmomatic is not cutting the adapters

Hello.

I ran the Trimmomatic command shown above, and it is not cutting the adapters.

I could use some help here.