SEQanswers

adapters sequence fasta file (http://seqanswers.com/forums/showthread.php?t=38854)

mmmm 12-03-2013 12:53 AM

adapters sequence fasta file
 
Hi there, does anyone have the sequences of the adapters used in Illumina library prep, so that I can trim them from my sequence data? Thanks.

GenoMax 12-03-2013 03:12 AM

See this thread for hints: http://seqanswers.com/forums/showthread.php?t=198

Also here: http://support.illumina.com/download...es_letter.ilmn

simonandrews 12-04-2013 12:39 AM

For the most part you don't need to know the full adapter sequence for the specific adapters you used. Nearly all Illumina adapters start with a common sequence and only later diverge into the different variants. If you trim based on the common sequence you will remove any instance of any of the adapter types. The common sequence (as it would be seen in read-through) is AGATCGGAAGAGC.

The only libraries where we've seen other adapter starts are small RNA libraries, which we trim using ATGGAATTCTCG.
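
As a rough illustration of trimming on the common sequence, here is a minimal Python sketch. The function name trim_adapter and the min_overlap parameter are made up for this example, and it only does exact matching; real trimmers also allow mismatches and take base quality into account.

```python
COMMON_ADAPTER_START = "AGATCGGAAGAGC"  # common sequence quoted above


def trim_adapter(read, adapter=COMMON_ADAPTER_START, min_overlap=5):
    """Cut the read at the first exact adapter match (hypothetical helper).

    Also trims a partial adapter hanging off the 3' end of the read,
    provided at least `min_overlap` bases of it are present.
    """
    # Case 1: the whole adapter start occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way into the adapter; try longest overlap first.
    longest = min(len(adapter) - 1, len(read))
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:len(read) - length]
    # No adapter found: return the read unchanged.
    return read


insert = "ACGTACGTTTGACCA"  # made-up insert sequence
print(trim_adapter(insert + "AGATCGGAAGAGCACACGTC"))  # full read-through
print(trim_adapter(insert + "AGATCGGA"))              # partial adapter at the 3' end
print(trim_adapter(insert))                           # no adapter present
```

Exact matching keeps the sketch short, but it will miss adapters that contain sequencing errors, which is one reason to prefer a dedicated trimming tool on real data.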

mmmm 12-04-2013 01:29 AM

Gff
 
How do I get a GFF file for a certain genome (DNA)?

GenoMax 12-04-2013 03:09 AM

Quote:

Originally Posted by mmmm (Post 126366)
How do I get a GFF file for a certain genome (DNA)?

Please start a new thread for questions that pertain to a new topic.

Do this to start a new thread (always a good idea to search the forum first):

1. Go to Seqanswers.com main page
2. Choose "Forums" from "site navigation menu" (left side).
3. Choose an appropriate forum for your post
4. On the subsequent forum page use the "New Thread" button at top left of the page.

That said, a GFF file may not be available for all genomes (you may need to construct one yourself). If you work with a model/common organism then you can get the files from Ensembl (http://useast.ensembl.org/info/data/ftp/index.html), UCSC or NCBI.

arcolombo698 12-12-2013 04:29 PM

I have a follow-up question about knowing the sequences.

I know all the adapter sequences used for my Illumina RNA-seq reads. I have 27 of them and know their sequences in full.

With Trimmomatic I can use the ILLUMINACLIP:<adapters.fa> step.

However, I do not know how to create the adapter.fa file in the correct format when all I have is the adapter names and sequences.

Any advice?

Thank you

GenoMax 12-12-2013 06:01 PM

You should make the adapter.fa in multi-fasta format.

Quote:

>Seq1
ACTUAL_SEQUENCE
>Seq2
ACTUAL_SEQUENCE
and so on.

Remember to save it as text.

arcolombo698 12-12-2013 06:07 PM

Making the ILLUMINA adapter.fa file
 
Hello.

Thank you so very much for your response.


The letter from Illumina lists the adapters for users as:

TruSeq Universal Adapter
5 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter, Index 1 5
5 GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



So based on what you are saying would I create the file as :

>Universal Adapter 5
>AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq Adapter, Index 1 5 GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



I am concerned about "whitespace" issues... because I am not sure if I should save the file as :

TruSeq Universal Adapter
>5 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter, Index 1
>5 GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG



My last question: ILLUMINACLIP requires adapter.fa, but you suggest saving the file as .txt?

Could you post a clip of your adapter.fa so I can see how to format correctly?


Thanks again

(My other option is to update Trimmomatic from version 0.22 to version 0.32.)

GenoMax 12-12-2013 06:17 PM

Multi fasta format has (2 or more) ID-sequence pairs. The ID line has to start with ">" and there should be no other ">" on that ID line. The sequence line has only sequence (no other characters).

The right format would be:

Code:

>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq_Adapter_Index_1
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

Though not strictly needed, you can take strange characters and spaces out of the sequence IDs.

I meant that you should save this file as plain text. You can name it "adapter.fa": if you include the quotes around the name in the Windows "Save As" dialog, no extension will be appended to the name, and the file will still be plain text.
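
To make the format rules concrete, here is a minimal Python sketch that turns name/sequence pairs into multi-FASTA text. The function name format_adapter_fasta is made up for this example, and the rule for cleaning IDs (anything other than letters, digits, "_", ".", "/" and "-" becomes "_") is only one reasonable choice, not a requirement of the format.

```python
import re


def format_adapter_fasta(adapters):
    """Return multi-FASTA text for (name, sequence) pairs (hypothetical helper)."""
    lines = []
    for name, seq in adapters:
        # ID line: starts with ">" and contains no further ">" or spaces.
        seq_id = re.sub(r"[^A-Za-z0-9_./-]+", "_", name.strip()).strip("_")
        seq = seq.strip().upper()
        # Sequence line: bases only, no other characters.
        if not re.fullmatch(r"[ACGTN]+", seq):
            raise ValueError(f"non-base characters in sequence for {name!r}")
        lines.append(f">{seq_id}\n{seq}\n")
    return "".join(lines)


adapters = [
    ("TruSeq Universal Adapter",
     "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"),
    ("TruSeq Adapter, Index 1",
     "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"),
]
text = format_adapter_fasta(adapters)
print(text, end="")
# To save it as plain text, something like:
#     with open("adapter.fa", "w") as handle:
#         handle.write(text)
```

Writing the file from a script also avoids the editor adding an unwanted .txt extension or rich-text formatting.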

arcolombo698 12-12-2013 06:37 PM

I see.

From the wiki, I was not sure if I needed to include all the barcode information.

thank you

arcolombo698 12-12-2013 09:02 PM

Updated Trimmomatic
 
1 Attachment(s)
Quote:

Originally Posted by GenoMax (Post 127320)
You should make the adapter.fa in multi-fasta format.



and so on.

Remember to save it as text.

Hello, thank you for the response!

So I updated Trimmomatic, and I am not sure which of the adapter files it provides I should use.

Essentially, none of the adapters in those files seem to match the ones on the attached list that I was given by Illumina.

NexteraPE
>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC




The one match I found is the PrefixPE/1 adapter in the TruSeq2 file, which matches the universal adapter in the Illumina sheet.

However, my samples use multiple indices from this sheet.

So how do I select the correct adapter file from Trimmomatic?

Or how do I make my own custom adapter file?

arcolombo698 12-12-2013 09:17 PM

Here is my custom adapter.fa. I will upload my FastQC report after I finish running it.

Here is my original FastQC report for a sample that has adapters in it, from before I ran Trimmomatic:

[WARN] Overrepresented sequences
Sequence                                            Count  Percentage           Possible Source
GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC  52178  0.18044319834652642  No Hit
AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG  51279  0.17733425520356333  No Hit
GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG  49511  0.17122011562986064  No Hit
TTTGATGGTAAGGGAGGGATCGTTGACCTCGTCTGTTATGTAAAGGATGC  44215  0.15290536269867885  No Hit
GCCATATCGGGGGCACCGATTATTAGGGGAACTAGTCAGTTGCCAAAGCC  32007  0.11068736727121142  No Hit
GTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGTTT  31149  0.10772021130162042  No Hit
TTGTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGT  31053  0.10738822182250535  No Hit
AGCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAA  30693  0.10614326127582382  No Hit
AGTTAGATTTACGCCGATGAATATGATAGTGAAATGGATTTTGGCGTAGG  29276  0.10124295823513564  No Hit
TGGTCTAGGGTGTAGCCTGAGAATAGGGGAAATCAGTGAATGAAGCCTCC  29193  0.10095592566465071  No Hit
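
One quick way to check whether overrepresented sequences like these look like adapter read-through is to search them for the adapter starts quoted earlier in this thread. The following is a minimal Python sketch with exact matching only; the function name find_adapter_hits is made up for this example, and only the first three sequences from the report are included to keep it short.

```python
# Adapter starts quoted earlier in this thread.
ADAPTER_STARTS = {
    "common Illumina start": "AGATCGGAAGAGC",
    "small RNA start": "ATGGAATTCTCG",
}


def find_adapter_hits(sequences, adapter_starts=ADAPTER_STARTS):
    """Return (sequence, label) pairs where an adapter start occurs exactly."""
    hits = []
    for seq in sequences:
        for label, start in adapter_starts.items():
            if start in seq.upper():
                hits.append((seq, label))
    return hits


overrepresented = [
    "GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC",
    "AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG",
    "GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG",
]
hits = find_adapter_hits(overrepresented)
print(f"{len(hits)} adapter hit(s) in {len(overrepresented)} sequences")
```

A sequence with no hit here is not necessarily adapter-free, since this check ignores mismatches and adapters that start near the very end of the sequence.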

arcolombo698 12-12-2013 09:30 PM

Custom Adapter.fa for Trimmomatic version .32
 
So here is the command I used to run Trimmomatic:


java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar \
  org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 \
  CHLA-15_S1_R1.fastq.gz CHLA-15_S1_R2.fastq.gz \
  noadapter_paired_trimmed_CHLA-15_S1_R1.fastq.gz noadapter_unpaired_trimmed_CHLA-15_S1_R1.fastq.gz \
  noadapter_paired_trimmed_CHLA-15_S1_R2.fastq.gz noadapter_unpaired_trimmed_CHLA-15_S1_R2.fastq.gz \
  ILLUMINACLIP:/auto/rcf-proj/sa1/software/Trimmomatic-0.32/adapters/TrueSeq2-PE.fa:2:30:10 \
  LEADING:3 TRAILING:3 HEADCROP:10 SLIDINGWINDOW:4:10 MINLEN:30
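
For reference, the ILLUMINACLIP step has the form ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>, so the 2:30:10 in the command means 2 seed mismatches, a palindrome clip threshold of 30 and a simple clip threshold of 10. The sketch below is a hypothetical Python helper (illuminaclip_step is not part of Trimmomatic) that assembles that argument and fails early if the adapter FASTA is missing or has no records, which is worth ruling out when clipping seems to do nothing.

```python
from pathlib import Path


def illuminaclip_step(adapter_fasta, seed_mismatches=2,
                      palindrome_threshold=30, simple_threshold=10):
    """Build an ILLUMINACLIP argument after sanity-checking the FASTA file."""
    path = Path(adapter_fasta)
    # Fail early if the path given to ILLUMINACLIP does not point at a file.
    if not path.is_file():
        raise FileNotFoundError(f"adapter file not found: {path}")
    # Count ID lines so that an empty or mis-formatted file is caught.
    records = sum(1 for line in path.read_text().splitlines()
                  if line.startswith(">"))
    if records == 0:
        raise ValueError(f"no FASTA records in {path}")
    return (f"ILLUMINACLIP:{path}:{seed_mismatches}:"
            f"{palindrome_threshold}:{simple_threshold}")
```

Calling illuminaclip_step("adapter.fa") (a placeholder path) returns "ILLUMINACLIP:adapter.fa:2:30:10" if that file exists and contains at least one record. It is also worth confirming that the path is the file you intend: the command above names TrueSeq2-PE.fa in the Trimmomatic adapters directory.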



And here is my adapter file:


>PrefixPE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PCR_Primer2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>FlowCell1
TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC
>FlowCell2
TTTTTTTTTTCAAGCAGAAGACGGCATACGA
>TruSeq_Adapter_Index1
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index3
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index4
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index5
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index6
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index7
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index8
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index9
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index10
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index11
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index12
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index13
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index14
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index15
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATGTCAGAATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index16
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index18
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index19
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAACGATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Adapter_Index20
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGGCCTTATCTCGTATGCCGTCTTCTGCTTG
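
The _rc entries in a file like this are meant to be the reverse complements of the corresponding primers. When editing such a file by hand, a small check can catch copy-and-paste slips. The following is a minimal Python sketch; the helper names are made up for this example, and only one primer pair from the file is included.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def parse_fasta(text):
    """Parse multi-FASTA text into {id: sequence}; sequences may span lines."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif line and current is not None:
            records[current] += line
    return records


def check_rc_pairs(records):
    """Return IDs of *_rc records that are not the reverse complement of their partner."""
    return [name for name, seq in records.items()
            if name.endswith("_rc") and name[:-3] in records
            and reverse_complement(records[name[:-3]]) != seq]


fasta_text = """>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
"""
print(check_rc_pairs(parse_fasta(fasta_text)))  # IDs that fail the check
```

An empty list means every _rc entry agrees with its partner; any ID that is printed points at a record to re-check.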



awaiting results

arcolombo698 12-12-2013 09:51 PM

So I am running Trimmomatic with my custom-made adapter.fa file, and it should remove the overrepresented sequences that FastQC has shown. Update to arrive soon.

Thank you in advance.

arcolombo698 12-12-2013 10:39 PM

Trimmomatic is not cutting the adapters
 
Hello.

I ran the Trimmomatic command shown above, and it is not cutting the adapters.

Need some help here.


All times are GMT -8.