SEQanswers

Old 02-07-2015, 04:32 PM   #1
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Question Adapter/primer trimming from RNAseq reads

Our lab received RNA-seq data from Illumina. I was able to do a preliminary quality analysis using the FastQC app. The results are back; however, there seems to be something that I am not understanding.

All the samples have passed basic statistics and the other modules except for 1) per base GC content, 2) per base sequence content, and 3) sequence duplication levels.
I also checked the graphs, and it looks like the first 10 bases of the reads show high duplication levels and the above-mentioned problems. The ends of the reads also show high duplication but not the other two problems, which I think is due to poly-A or the 3' primer. However, the overrepresented sequences module has passed.

I also checked fragment reads in fastq files but it does show any of the adapter sequence in it.

So, my concerns are: 1) Could there be adapter/primer sequence in my fragments? How do I check for it? 2) If so, how can I remove it? I need to prepare the adapter file. Is there a method that lets me do this correctly without removing the important parts of my files?

Any help is appreciated.

Thanks,
Old 02-07-2015, 05:11 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,994
Default

It is possible that you have some adapter contamination. A pass through using trimming software would be recommended. Follow directions included here: http://seqanswers.com/forums/showthread.php?t=42776

Standard adapter sequences are included with BBMap software. At the end you will get comprehensive statistics of what your data looked like, before and after. You can check with FastQC afterwards to see if the trimming has been effective.
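
A quick way to check for adapter read-through before trimming is to count the reads that contain the start of a known adapter. The following is a minimal sketch, not part of BBMap; it assumes TruSeq-style adapters (whose read-through begins with AGATCGGAAGAGC) and a standard four-line FASTQ layout. The function names are made up for illustration.

```python
import gzip
from pathlib import Path

# Assumption: TruSeq-style library; this is the common start of the adapter read-through.
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def open_text(path):
    """Open a plain or gzipped text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_adapter_reads(fastq_path, adapter=TRUSEQ_PREFIX):
    """Return (reads_containing_adapter, total_reads) for a four-line FASTQ file."""
    total = 0
    hits = 0
    with open_text(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # In a four-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                total += 1
                if adapter in line:
                    hits += 1
    return hits, total
```

Calling `count_adapter_reads("reads.fastq")` on your own file gives a rough contamination rate. An exact substring match misses adapters with sequencing errors or adapters cut off at the read end, so a low count does not prove the reads are clean.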
Old 02-08-2015, 06:16 PM   #3
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Thanks for your message, GenoMax.
I have to make a small correction to what I posted earlier:

I also checked fragment reads in fastq files but it does not* show any of the adapter sequence in it.

So, my concerns are: 1) Could there be adapter/primer sequence in my fragments? How do I check for it? 2) If so, how can I remove it? I need to prepare the adapter file. Is there a method that lets me do this correctly without removing the important parts of my files?

Regarding BBMap, I visited several forums but could not find out how to get BBMap to work on the Windows platform.

Sorry to bother you, but I am a biologist and I need straightforward, clear directions to get a computer thing to work.

Thanks,
Old 02-08-2015, 06:28 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

BBMap works on the Windows platform like this:

1) Install Java if not installed

2) Download BBMap and extract it. You can do this with 7-Zip. You need to first unzip it, then untar it; both can be accomplished by right-clicking. Let's say you extract it to C:\, so that you have a bunch of ".sh" files in C:\bbmap\

3) Open a command prompt: start -> run -> type "cmd" and hit enter

4) Type "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=reads.fastq out=trimmed.fastq" along with any other necessary parameters. Basically, follow any of the instructions for Linux, but replace "bbduk.sh" with "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF"

Are your reads paired-end? And do you know which adapters were used? Right now BBMap includes Nextera and TruSeq adapters in /bbmap/resources/. There are also RNA-seq-specific TruSeq adapters that are not currently packaged with BBMap, but I am going to add them sometime tomorrow.
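
The substitution in step 4 can be sketched as a small helper that turns a Linux-style command from the documentation into the Windows form. This is only an illustration: the function name is hypothetical, `jgi.BBDukF` comes from the step above, and the `bbmap.sh` entry (`align2.BBMap`) is my assumption about the class that script launches.

```python
# Script name -> Java main class. Only bbduk.sh is taken from the post above;
# the bbmap.sh entry is an assumption and should be checked against the script itself.
SCRIPT_TO_CLASS = {
    "bbduk.sh": "jgi.BBDukF",
    "bbmap.sh": "align2.BBMap",
}


def to_java_command(linux_command, bbmap_dir=r"C:\bbmap", memory="1g"):
    """Rewrite 'bbduk.sh key=value ...' as the equivalent 'java ...' command line."""
    tokens = linux_command.split()
    script, args = tokens[0], tokens[1:]
    main_class = SCRIPT_TO_CLASS[script]
    # The compiled classes live in the "current" folder of the extracted download.
    classpath = bbmap_dir.rstrip("\\") + "\\current"
    # Drop any -Xmx given in the Linux-style command so the flag is not passed twice.
    args = [a for a in args if not a.lower().startswith("-xmx")]
    return " ".join(["java", f"-Xmx{memory}", "-ea", "-cp", classpath, main_class, *args])


if __name__ == "__main__":
    print(to_java_command("bbduk.sh in=reads.fastq out=trimmed.fastq"))
```

The printed line can be pasted into the command prompt. Paths containing spaces would need quoting, which this sketch does not handle.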
Old 02-08-2015, 07:06 PM   #5
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Quote:
Originally Posted by Brian Bushnell View Post
BBMap works on the Windows platform like this:

1) Install Java if not installed

2) Download BBMap and extract it. You can do this with 7-Zip. You need to first unzip it, then untar it; both can be accomplished by right-clicking. Let's say you extract it to C:\, so that you have a bunch of ".sh" files in C:\bbmap\

3) Open a command prompt: start -> run -> type "cmd" and hit enter

4) Type "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=reads.fastq out=trimmed.fastq" along with any other necessary parameters. Basically, follow any of the instructions for Linux, but replace "bbduk.sh" with "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF"

Are your reads paired-end? And do you know which adapters were used? Right now BBMap includes Nextera and TruSeq adapters in /bbmap/resources/. There are also RNA-seq-specific TruSeq adapters that are not currently packaged with BBMap, but I am going to add them sometime tomorrow.
Thanks. I also tried installing it on Linux (which is loaded on VMware), but sudo apt-get install couldn't find it. The BBMap website says that it is also available on the Linux platform.
Old 02-08-2015, 07:25 PM   #6
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Thanks. I was able to install Java and the program, but running it looks difficult. I am going to work on it for a couple of days and see how it goes.

Yes, my reads are paired-end. I have RNA-seq data from several samples, and also population-specific genomic resequencing data. I have recently been exploring the BWA and Bowtie2/TopHat pipelines. Since BBMap seems to be more efficient, I think I will try exploring my data with all three pipelines.

Also, this pipeline would be more useful if it were available on the iPlant cyberinfrastructure.

Thanks,
Old 02-08-2015, 09:13 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by everestial View Post
Thanks. I also tried installing it on Linux (which is loaded on VMware), but sudo apt-get install couldn't find it. The BBMap website says that it is also available on the Linux platform.
The same download runs in Windows, Linux, or MacOS; in each case, you just download and extract it. I will look into iPlant and see if I can make it available there.
Old 02-19-2015, 08:11 PM   #8
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Thanks for the message Brian.

I think I got this thing to work on Windows; however, there still seems to be some problem. I am posting a screenshot of what it looks like.
Also, is there any kind of documentation that I can work with, so that I can use my own data (RNA-seq and population genome) and align it against my reference sequence?

Here is the screen shot:
Attached Images
File Type: jpg BBmap_Screenshot.jpg (65.3 KB, 11 views)
Old 02-19-2015, 08:12 PM   #9
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Hi Brian,

Is there a chance that this tool might be available on iPlant anytime soon?
Thanks,
Old 02-19-2015, 08:42 PM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

The problem in this case seems to be that "reads.fastq" is not in "C:\". If your input file is not in the current working directory, you need to set the absolute path to use it.
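
To see which file a relative name like "reads.fastq" actually refers to, it can help to resolve it against the working directory before launching Java. A minimal sketch; the helper name is hypothetical and nothing here is part of BBTools:

```python
from pathlib import Path


def resolve_input(path, working_dir=None):
    """Return the absolute path of an input file, or raise if it does not exist."""
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    candidate = Path(path)
    # A relative name is looked up in the working directory, just as the
    # command prompt (and therefore Java) would do.
    if not candidate.is_absolute():
        candidate = base / candidate
    if not candidate.is_file():
        raise FileNotFoundError(f"Input not found: {candidate} (looked relative to {base})")
    return candidate.resolve()
```

If `resolve_input("reads.fastq")` raises, either change into the folder that holds the file first or pass the full path in `in=`.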

As for iPlant, it's a low priority (since nobody at JGI uses it), but I will look into it.

BBTools is already compiled and does not need "make". It is already running successfully; the only problem is that you are pointing it to the wrong location for the input file.
Old 02-24-2015, 08:49 PM   #11
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Hi Brian,
It seems like I am beginning to understand some aspects of the command line, but it's still not working.

To make the process easier I have installed Cygwin on my Windows machine in G: (not in C:), where I also have the bbmap folder and my RNA-seq reads. So, my first understanding is that the Cygwin terminal will work like Linux, and I can type bbduk.sh (let me know if that's not OK). Is there something wrong with placing all these files and folders in G:?
Also, do I have to assign 1 GB of memory using -Xmx1g?
What about the absolute path? Do I need to set it up?

Let me explain what I am trying to do.
I load cygwin, then navigate to my directory where bbmap is located
Home@username /cygdrive/g/bbmap
Now, I try to run bbduk:
bbduk.sh -Xmx1g in=sample1.fq out=trimmed.fq

(I ran it without any additional parameters, just to check whether the program would run; sample1.fq is the extracted sample file under the "resources" directory.)
Result:
-bash: bbduk.sh: command not found

Again, I try cmd prompt under windows after navigating to G:
G:\bbmap>java -Xmx1g -ea -cp G:\bbmap\current\ jgi.BBDukF in=sample.fq out=test1

Result:
Executing jgi.BBDukF [in=sample1.fq, out=test1.fq]

BBDuk version 34.56
Exception in thread "main" java.lang.RuntimeException: Can't read file 'sample1.fq'
at align2.Tools.testInputFiles(Tools.java:217)
at jgi.BBDukF.<init>(BBDukF.java:658)
at jgi.BBDukF.main(BBDukF.java:62)


Well, I hope you might be able to guide me on what I am not understanding.
If Cygwin works better, I would prefer it.

Also, I now want to work with my RNA-seq data. These are Illumina paired-end reads (100 bp). So, I will have two paired-end files, named (say) abc1 and abc2.

I will provide input in Cygwin as: bbmap.sh -Xmx1g in1=abc1 in2=abc2 (parameters)
But how do I align these reads to the reference A. lyrata genome? Do I have to download it and put it inside the BBMap folder? I tried to do so, but there are several files for the A. lyrata genome on the JGI website with different scaffolds (which in turn sends me to the Phytozome webpage). How do I go about this? Could you please write me an example command?
Is it possible to align the reads to the online genome data?

Sorry for asking so many questions at once, but I hope I am asking them clearly.

Thanks,
Old 02-25-2015, 03:59 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,994
Default

@everestial: Why are you complicating things with Cygwin when you originally had trouble getting BBDuk to work even on Windows?

Why don't you retry the directions Brian gave in post #4 in Windows (set Cygwin aside for now)? In step 4 do this:

Note: replace bbmap-nn.nn with the BBMap version number you are using.

Code:
c:\>java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=c:\path_to_your_folder_with_fastq_files\reads.fastq out=trimmed.fastq ref=\path_to_folder\bbmap-nn.nn\bbmap\resources\truseq.fa.gz
For paired-end reads you can use:

Code:
c:\>java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in1=c:\path_to_your_folder_with_fastq_files\reads_R1.fastq in2=c:\path_to_your_folder_with_fastq_files\reads_R2.fastq out1=reads_R1_trimmed.fastq out2=reads_R2_trimmed.fastq ref=\path_to_folder\bbmap-nn.nn\bbmap\resources\truseq.fa.gz
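
The paired-end command is long enough that assembling it from named pieces can reduce typing mistakes. A minimal sketch follows; the function and constant names are hypothetical. One caveat that I believe holds for BBDuk: with only `ref=` set, reads matching the adapters are filtered out rather than trimmed, and trimming needs a `ktrim` mode. The settings in `TRIM_SETTINGS` are the ones commonly given in the BBDuk usage guide; treat them as a starting point to verify against the version you have.

```python
def bbduk_paired_args(in1, in2, out1, out2, adapters, classpath, memory="1g", extra=()):
    """Build the argument list for a paired-end BBDuk run (nothing is executed here)."""
    return [
        "java", f"-Xmx{memory}", "-ea", "-cp", classpath, "jgi.BBDukF",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        *extra,
    ]


# Adapter-trimming parameters as documented in the BBDuk guide (verify for your version).
TRIM_SETTINGS = ("ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo")
```

The returned list can be passed to `subprocess.run(args, check=True)`, which avoids quoting problems with paths that contain spaces.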

Last edited by GenoMax; 02-25-2015 at 08:48 AM. Reason: Added path for the adapters
Old 02-25-2015, 06:46 AM   #13
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Thank you so much for helping me understand the glitch. I now have a rough idea of how the program works.
Well, I think I now have to download the whole A. lyrata genome to my hard drive for the reference sequence. I logged in to JGI, which then directed me to Phytozome, where I could download the bulk data (which I have already downloaded using Globus).
But these data contain lots of folders with information other than just genome FASTA files. There are several scaffolds.
Alternatively, NCBI was providing whole-genome data previously, but it's not available right now.

Could you please suggest what approach I should take next?

Thanks,
Old 02-25-2015, 08:45 AM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,994
Default

Depending on which adapters were used (TruSeq, Nextera, etc.), you will need to provide the correct adapter file to BBDuk. Brian provides TruSeq and Nextera adapters; if you used something else, provide the appropriate sequences in a file. I have modified the example above to include that information. You will need to provide any additional parameters you want (otherwise defaults will be used).

For your reference "genome" you can concatenate all scaffolds into a single file (or, if you already have a multi-FASTA file, use that) and use it to create indexes for BBMap (the aligner). This step has to be done only once. Once created, the pre-made indexes can be reused for alignments.
Old 02-25-2015, 10:41 AM   #15
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by everestial View Post
But these data contain lots of folders with information other than just genome FASTA files. There are several scaffolds.
Alternatively, NCBI was providing whole-genome data previously, but it's not available right now.
You can get the genome here:
http://genome.jgi-psf.org/Araly1/Ara...nload.ftp.html

Specifically, you want:
Araly1_assembly_scaffolds.fasta.gz
Araly1_assembly_chloroplast.scaffolds.fasta.gz
Araly1_assembly_mitochondrion.scaffolds.fasta.gz

I doubt that NCBI has a better assembly, as they would have gotten it from JGI, as far as I know.
Old 03-02-2015, 07:48 PM   #16
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Hi Brian,

Thanks for the message. Should I leave the files gzipped, or extract them to FASTA before doing any alignment?

Thanks,
Old 03-02-2015, 07:55 PM   #17
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You can leave them gzipped.
Old 03-04-2015, 05:36 PM   #18
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Hi Brian

I have been able to run the other example parameters successfully.
So, I tried to create an index from the FASTA files you mentioned (I got them from the JGI website).

I don't seem to be able to prepare a concatenated index. I put all the mentioned files in one folder in G: and ran the following command:
java -Xmx1g -ea -cp G:\bbmap\current\align2.BBmap ref=G:\bbmap\lyrata_genome_index

The folder named lyrata_genome_index contains all three gzipped files.
I got the error message:
Error: Could not find or load main class ref=G:\bbmap\lyrata_genome\index

Again, I tried to prepare the index for just one file (not including the chloroplast and mitochondrial genomes).
I get the same error message.

Can you please help?

Also, is there any command to concatenate all these files into one?

Thanks,
Old 03-04-2015, 05:41 PM   #19
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Code:
java -Xmx1g -ea -cp G:\bbmap\current\align2.BBmap ref=G:\bbmap\lyrata_genome_index
is missing a space; should be:
Code:
java -Xmx1g -ea -cp G:\bbmap\current\ align2.BBMap ref=G:\bbmap\lyrata_genome_index
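
Why the missing space produces that particular error: the java launcher treats the token after -cp as the classpath and the first remaining non-option token as the main class. Without the space, the class name is swallowed into the classpath and the `ref=...` argument is taken as the class to run. The sketch below mimics that splitting in a simplified way (it is not the real launcher logic, and the function name is made up). Note also that Java class names are case-sensitive; as far as I know the aligner class is named `align2.BBMap`, with a capital M.

```python
def split_java_command(command):
    """Return (classpath, main_class, program_args) for a simple 'java ...' command line."""
    tokens = command.split()
    if not tokens or tokens[0] != "java":
        raise ValueError("expected a command starting with 'java'")
    classpath = None
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("-cp", "-classpath"):
            if i + 1 >= len(tokens):
                raise ValueError("-cp needs a value")
            # The token after -cp is always consumed as the classpath.
            classpath = tokens[i + 1]
            i += 2
        elif token.startswith("-"):
            # Other JVM options such as -Xmx1g or -ea.
            i += 1
        else:
            # The first token that is not an option is the main class.
            return classpath, token, tokens[i + 1:]
    raise ValueError("no main class found")


if __name__ == "__main__":
    broken = r"java -Xmx1g -ea -cp G:\bbmap\current\align2.BBmap ref=G:\bbmap\lyrata_genome_index"
    print(split_java_command(broken))
```

Running it on the command without the space reports the `ref=...` token as the main class, which is what the error message complains about.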
To concatenate into one file, in Windows:

Code:
type file1 file2 > file3
...taken from http://stackoverflow.com/questions/6...cat-on-windows
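
If Python is available, the same concatenation can be done in a way that also accepts gzipped inputs and guards against a file that lacks a final newline (which would otherwise glue the last sequence line to the next header). A minimal sketch; the function name and file names are placeholders:

```python
import gzip


def concatenate_fasta(inputs, output):
    """Concatenate plain or gzipped FASTA files into a single gzipped FASTA file."""
    with gzip.open(output, "wb") as out:
        for name in inputs:
            opener = gzip.open if str(name).endswith(".gz") else open
            with opener(name, "rb") as src:
                last = b"\n"
                # Copy in 1 MB chunks so large genomes do not have to fit in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
                # Make sure the next file's header starts on its own line.
                if last != b"\n":
                    out.write(b"\n")
```

For example, `concatenate_fasta(["nuclear.fasta.gz", "chloroplast.fasta.gz"], "combined.fasta.gz")` writes one multi-FASTA file that can then be given to the aligner as the reference.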
Old 03-04-2015, 06:23 PM   #20
everestial
Member
 
Location: North Carolina

Join Date: Feb 2015
Posts: 31
Default

Thanks for the quick reply.

Concatenating works perfectly.
I had used a space in my previous command, and did the same again after I was able to concatenate all the scaffolds into one.
But I am still getting the error message:
Error: Could not find or load main class align2.BBmap

I checked, and BBmap.java, BBmap.class, and several other important BBmap files do exist inside the required folder (current).

I have been running cmd as administrator in Windows 8.1. Hopefully the problem isn't coming from there.
Tags
adapter contamination, fastqc, rnaseq alignment, rnaseq data

All times are GMT -8.