  • Adapter/primer trimming from RNAseq reads

    Our lab received RNAseq data from Illumina. I ran a preliminary quality analysis with FastQC. The results are back, but there is something I do not understand.

    All the samples passed basic statistics and the other modules except for 1) per base GC content, 2) per base sequence content, and 3) sequence duplication levels.
    I also checked the graphs, and it looks like the first 10 bases of the reads show high duplication levels and the problems mentioned above. The ends of the reads also show high duplication but not the other two problems, which I think is due to poly-A or the 3' primer. However, the overrepresented sequences module passed.

    I also checked fragment reads in fastq files but it does show any of the adapter sequence in it.

    So, my concerns are: 1) Could there be adapter/primer sequence in my reads? How do I check for it? 2) If so, how can I remove it? I need to prepare the adapter file. Is there a method to do this correctly without removing the important parts of my reads?

    Any help is appreciated.

    Thanks,

  • #2
    It is possible that you have some adapter contamination. A pass through using trimming software would be recommended. Follow directions included here: http://seqanswers.com/forums/showthread.php?t=42776

    Standard adapter sequences are included with BBMap software. At the end you will get comprehensive statistics of what your data looked like, before and after. You can check with FastQC afterwards to see if the trimming has been effective.
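
For the "how do I check it?" part of the question, a rough first look can be sketched in Python before running a real trimmer. This is only an illustration, not part of BBMap: the function name `count_adapter_reads` is made up here, the adapter prefix assumes an Illumina TruSeq library, and the file is assumed to use the usual four-line FASTQ layout. It counts exact matches only, so adapters with sequencing errors or short partial adapters at the read ends will be missed, which is why a dedicated trimming tool is still the recommended route.

```python
import gzip
from pathlib import Path

# Assumption: the library was made with an Illumina TruSeq kit. This is the
# common start of the TruSeq adapter; substitute the sequence for your own kit.
TRUSEQ_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def count_adapter_reads(fastq_path, adapter=TRUSEQ_ADAPTER_PREFIX):
    """Return (reads_with_adapter, total_reads) for a FASTQ or FASTQ.gz file."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    hits = 0
    total = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                total += 1
                if adapter in line.upper():
                    hits += 1
    return hits, total
```

Calling `count_adapter_reads("reads.fastq")` returns a pair such as (reads containing the adapter, all reads); a clearly non-zero first number suggests that trimming is worthwhile.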


    • #3
      Thanks for your message Genomax.
      I have to make a little correction on what I posted earlier:

      I also checked fragment reads in fastq files but it does not* show any of the adapter sequence in it.

      So, my concerns are: 1) Could there be adapter/primer sequence in my reads? How do I check for it? 2) If so, how can I remove it? I need to prepare the adapter file. Is there a method to do this correctly without removing the important parts of my reads?

      Regarding BBMap, I visited several forums but could not find out how to get BBMap to work on the Windows platform.

      Sorry to bother you, but I am a biologist and I need straight and clear directions to get a computer thing to work.

      Thanks,


      • #4
        BBMap works on the Windows platform like this:

        1) Install Java if not installed

        2) Download BBMap and extract it. You can do this with 7-Zip. You need to first unzip it, then untar it; both can be accomplished by right-clicking. Let's say you extract it to C:\, so that you have a bunch of ".sh" files in C:\bbmap\

        3) Open a command prompt: start -> run -> type "cmd" and hit enter

        4) Type "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=reads.fastq out=trimmed.fastq" along with any other necessary parameters. Basically, follow any of the instructions for Linux, but replace "bbduk.sh" with "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF"

        Are your reads paired-end? And do you know which adapters were used? Right now BBMap includes Nextera and TruSeq adapters in /bbmap/resources/. There are also RNAseq-specific TruSeq adapters that are not currently packaged with BBMap, but I am going to add them sometime tomorrow.


        • #5
          Originally posted by Brian Bushnell
          BBMap works on the Windows platform like this:

          1) Install Java if not installed

          2) Download BBMap and extract it. You can do this with 7-Zip. You need to first unzip it, then untar it; both can be accomplished by right-clicking. Let's say you extract it to C:\, so that you have a bunch of ".sh" files in C:\bbmap\

          3) Open a command prompt: start -> run -> type "cmd" and hit enter

          4) Type "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=reads.fastq out=trimmed.fastq" along with any other necessary parameters. Basically, follow any of the instructions for Linux, but replace "bbduk.sh" with "java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF"

          Are your reads paired-end? And do you know which adapters were used? Right now BBMap includes Nextera and TruSeq adapters in /bbmap/resources/. There are also RNAseq-specific TruSeq adapters that are not currently packaged with BBMap, but I am going to add them sometime tomorrow.
          Thanks. I also tried installing it on Linux (running in VMware), but sudo apt-get install couldn't find it. The BBMap website says that it is also available on the Linux platform.


          • #6
            Thanks. I was able to install Java and the program, but running it looks difficult. I am going to work on it for a couple of days and see how it goes.

            Yes, my reads are paired-end. I have RNAseq data from several samples, and also population-specific genomic resequencing data. I have recently been exploring the BWA and Bowtie2/TopHat pipelines. Since BBMap seems to be more efficient, I think I will try exploring my data with all three pipelines.

            Also, this pipeline would be more useful if it were available on the iPlant cyberinfrastructure.

            Thanks,


            • #7
              Originally posted by everestial
              Thanks. I also tried installing it on Linux (running in VMware), but sudo apt-get install couldn't find it. The BBMap website says that it is also available on the Linux platform.
              The same download runs in Windows, Linux, or MacOS; in each case, you just download and extract it. I will look into iPlant and see if I can make it available there.


              • #8
                Thanks for the message Brian.

                I think I got this to work on Windows, but there still seems to be a problem. I am posting a screenshot of what it looks like.
                Also, is there any documentation I can work from so that I can align my own data (RNAseq and population genome) against my reference sequence?

                Here is the screen shot:
                Attached Files


                • #9
                  Hi Brian,

                  Is there a chance that this tool might be available on iPlant anytime soon?
                  Thanks,


                  • #10
                    The problem in this case seems to be that "reads.fastq" is not in "C:\". If your input file is not in the current working directory, you need to set the absolute path to use it.

                    As for iPlant, it's a low priority (since nobody at JGI uses it), but I will look into it.

                    BBTools is already compiled and does not need "make" - it is already running successfully, the only problem is that you are pointing it to the wrong location for the input file.
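
The "wrong location" problem can be caught before Java is even started. Below is a minimal Python sketch, assuming Python is installed; the helper name `build_bbduk_command` is hypothetical and not part of BBTools. It turns the input path into an absolute path, refuses to continue if the file is missing, and assembles the same Java command given in post #4, using only the parameters that appear in this thread (in=, out=, ref=).

```python
import os


def build_bbduk_command(bbmap_dir, reads, out, ref=None, memory="1g"):
    """Build the Java command from post #4 with absolute paths (hypothetical helper)."""
    reads_path = os.path.abspath(reads)
    if not os.path.isfile(reads_path):
        # This is the situation behind "Can't read file": the input is not where we point.
        raise FileNotFoundError(f"Input reads not found: {reads_path}")
    command = [
        "java", f"-Xmx{memory}", "-ea",
        "-cp", os.path.join(os.path.abspath(bbmap_dir), "current"),
        "jgi.BBDukF",
        f"in={reads_path}",
        f"out={os.path.abspath(out)}",
    ]
    if ref is not None:
        # Optional adapter file, e.g. one of the files under bbmap\resources.
        command.append(f"ref={os.path.abspath(ref)}")
    return command
```

The returned list can be printed to see exactly what would be run, or handed to `subprocess.run(command, check=True)` from the standard library to launch BBDuk.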


                    • #11
                      Hi Brian,
                      It seems like I am beginning to understand some aspects of the command line, but it is still not working.

                      To make the process easier I have installed Cygwin on my Windows machine in G: (not in C:), where I also have the bbmap folder and my RNAseq reads. My understanding is that the Cygwin terminal will work like Linux, so I can type bbduk.sh (let me know if that is not OK). Is there something wrong with placing all these files and folders in G:?
                      Also, do I have to assign 1 GB of memory using -Xmx1g?
                      What about the absolute path? Do I need to set it up?

                      Let me explain what I am trying to do.
                      I load cygwin, then navigate to my directory where bbmap is located
                      Home@username /cygdrive/g/bbmap
                      Now I try to run bbduk:
                      bbduk.sh -Xmx1g in=sample1.fq out=trimmed.fq

                      (I ran it without any additional parameters, just to check whether the program runs; sample1.fq is the extracted sample file under the "resources" directory.)
                      Result:
                      -bash: bbduk.sh: command not found

                      Again, I try cmd prompt under windows after navigating to G:
                      G:\bbmap>java -Xmx1g -ea -cp G:\bbmap\current\ jgi.BBDukF in=sample.fq out=test1

                      Result:
                      Executing jgi.BBDukF [in=sample1.fq, out=test1.fq]

                      BBDuk version 34.56
                      Exception in thread "main" java.lang.RuntimeException: Can't read file 'sample1.fq'
                      at align2.Tools.testInputFiles(Tools.java:217)
                      at jgi.BBDukF.<init>(BBDukF.java:658)
                      at jgi.BBDukF.main(BBDukF.java:62)


                      Well, I hope you can guide me on what I am not understanding.
                      If Cygwin works better I would prefer it.

                      Also, I now want to work with my RNAseq data. These are Illumina paired-end reads (100 bp), so I will have two read files per library (say, abc1 and abc2).

                      I will provide the input in Cygwin as: bbmap.sh -Xmx1g in1=abc1 in2=abc2 (parameters)
                      But how do I align these reads to the reference A. lyrata genome? Do I have to download it and put it inside the BBMap folder? I tried to do so, but there are several files for the A. lyrata genome on the JGI website with different scaffolds (which again redirects me to the Phytozome webpage). How do I go about this? Could you please write me an example command?
                      Is it possible to align the reads to the online genome data?

                      Sorry for asking so many questions at once, but I hope I am asking them clearly.

                      Thanks,


                      • #12
                        @everestial: Why are you complicating things with Cygwin when you originally had trouble getting BBDuk to work even on Windows?

                        Why don't you re-try the directions Brian gave in post #4 in windows (keep cygwin aside for now). In step 4 do this:

                        Note: replace nn.nn in bbmap-nn.nn with the BBMap version number you are using.

                        Code:
                        c:\>java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in=c:\path_to_your_folder_with_fastq_files\reads.fastq out=trimmed.fastq ref=\path_to_folder\bbmap-nn.nn\bbmap\resources\truseq.fa.gz
                        For paired-end reads you can use:

                        Code:
                        c:\>java -Xmx1g -ea -cp C:\bbmap\current jgi.BBDukF in1=c:\path_to_your_folder_with_fastq_files\reads_R1.fastq in2=c:\path_to_your_folder_with_fastq_files\reads_R2.fastq out1=reads_R1_trimmed.fastq out2=reads_R2_trimmed.fastq ref=\path_to_folder\bbmap-nn.nn\bbmap\resources\truseq.fa.gz
                        Last edited by GenoMax; 02-25-2015, 08:48 AM. Reason: Added path for the adapters


                        • #13
                          Thank you so much for helping me understand the glitch. I now have a rough idea of how the program works.
                          I think I now have to download the whole A. lyrata genome to my hard drive to use as the reference sequence. I logged in to JGI, which then directed me to Phytozome, where I could download the bulk data (which I have already downloaded using Globus).
                          But this data contains lots of folders with information other than just the genome FASTA files. There are several scaffolds.
                          Alternatively, NCBI was providing whole-genome data previously, but it is not available right now.

                          Could you please suggest what approach I should take next?

                          Thanks,


                          • #14
                            Depending on which adapters were used (TruSeq, Nextera, etc.), you will need to provide the correct file to BBDuk. Brian provides TruSeq and Nextera adapters; if you used something else, put the appropriate sequences in a file. I have modified the example above to include that information. You will need to add any other parameters you want (otherwise the defaults will be used).

                            For your reference "genome", you can concatenate all scaffolds into a single file (or, if you already have a multi-FASTA file, use that) and use it to create indexes for BBMap (the aligner). This step has to be done only once; afterwards you can use the pre-made indexes for alignments.


                            • #15
                              Originally posted by everestial
                              But this data contains lots of folders with information other than just the genome FASTA files. There are several scaffolds.
                              Alternatively, NCBI was providing whole-genome data previously, but it is not available right now.
                              You can get the genome here:


                              Specifically, you want:
                              Araly1_assembly_scaffolds.fasta.gz
                              Araly1_assembly_chloroplast.scaffolds.fasta.gz
                              Araly1_assembly_mitochondrion.scaffolds.fasta.gz

                              I doubt that NCBI has a better assembly, as they would have gotten it from JGI, as far as I know.
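
Since concatenating the scaffolds into one reference file was suggested earlier in the thread, here is a small Python sketch of that step for the three files listed above. It is only an illustration under stated assumptions: the function name `concatenate_fasta` and the output name used in the note below are made up, and the inputs are assumed to be ordinary FASTA files, gzipped or not.

```python
import gzip
from pathlib import Path


def concatenate_fasta(input_paths, output_path):
    """Merge FASTA files (plain or .gz) into one plain FASTA; return the record count."""
    records = 0
    with open(output_path, "w") as out:
        for input_path in input_paths:
            path = Path(input_path)
            opener = gzip.open if path.suffix == ".gz" else open
            with opener(path, "rt") as handle:
                for line in handle:
                    if line.startswith(">"):
                        records += 1
                    # Guarantee a trailing newline so the next header starts on its own line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return records
```

For example, `concatenate_fasta(["Araly1_assembly_scaffolds.fasta.gz", "Araly1_assembly_chloroplast.scaffolds.fasta.gz", "Araly1_assembly_mitochondrion.scaffolds.fasta.gz"], "Araly1_reference.fasta")` would write a single file (the output name is just a placeholder) that can then be given to BBMap as the reference when the index is built.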
