  • Downstream RNA-seq analysis without reference genome

    Hello SEQanswers community,

    I am a biologist learning bioinformatics from scratch, and I am performing RNA-seq analysis for the first time.

    I got FASTQ files from an Ion Proton instrument. The reads are single-end, 50-340 nt long.
    I don't have a reference genome.
    Here is how I did my analysis:
    1] FastQC
    2] Trimming some reads using the FASTX-Toolkit:
    a] First I used fastq_quality_trimmer -Q33 -t 20 -l 50 -i -o, because some of the sequences have quality below 18
    b] fastx_trimmer, to trim off a few bases from the end of each read

    For de novo assembly:
    3] velveth with k-mer 31
    4] velvetg
    5] bowtie-build, to build a reference index from the contigs.fa file created by velvetg
    6] Mapping my FASTQ reads against the index built above

    So my questions are:
    1] Am I doing the trimming in the right manner?
    2] On what basis do you select the parameters of velveth and velvetg?
    3] Which k-mer value should I select?
    4] Am I running bowtie in the correct manner?
    5] If yes, how do I confirm that the assembly created by Velvet preserves the input information, and how do I check its accuracy?

    Hope you all can help me out.
    Thanks a bunch in advance.

    Naresh

  • #2
    I would use a tool designed to put RNA-seq reads together. It has been a while since I used Velvet, but as far as I know it is designed to assemble genomes, not transcripts. My favorite RNA-seq tool is 'Trinity'.

    The above assumes that your reads are from transcripts and not from the entire genome.

    I know that the above advice does not answer your specific questions. However, in case you did start down a poor path, I wanted to concentrate on correcting that instead of the specifics. I suppose I could answer #1 -- trimming. It seems OK. I am not sure why you want to trim off the end of the sequence after quality trimming, but it won't hurt.


    • #3
      Hi Westerman,

      You are right, my reads are from transcripts. I forgot to mention that I have also used Oases, a post-assembly processor for Velvet that works as a transcriptome assembler.

      I trimmed some bases from the end to improve the per-base GC content and per-base sequence content.
      If you don't mind, can you please explain this command in detail:
      fastq_quality_trimmer -Q33 -t 20 -l 50 -i -o
      To my understanding, it removes nucleotides with a quality score lower than 20 from the ends of the reads. Furthermore, any trimmed reads shorter than 50 nt are discarded altogether.

      Trinity is also good for transcriptome assembly, but I have never used it.
      Can you please help with the parameters for the Trinity command line?

      Trinity.pl --seqType fq --min_contig_length 150 --JM 10G --single inputfilename --CPU 2 --output output_filename

      Which other options do I need to consider for running Trinity, such as those for Butterfly, Inchworm, Chrysalis, k-mer, etc.?

      Thanks for your input.
      I would really appreciate your feedback.

      Naresh



      • #4
        Your understanding of fastq_quality_trimmer is correct. BTW, if you ever get paired-end sequences then use 'trimmomatic' instead since it works much better with PE reads.
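
        As a rough illustration of the behaviour described above, here is a minimal Python sketch of that kind of trimming. It is not the FASTX-Toolkit code: the function name `quality_trim` and its arguments are made up for this example, and it assumes Phred+33 qualities (as with -Q33) and trimming at the 3' end only. The defaults mirror -t 20 and -l 50; check the tool's own help for its exact behaviour.

```python
def quality_trim(seq, qual, threshold=20, min_length=50, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    seq  -- nucleotide string
    qual -- quality string of the same length (Phred+offset encoded)
    Returns (trimmed_seq, trimmed_qual), or None if the trimmed read
    is shorter than min_length and should be discarded.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end < min_length:
        return None  # read is discarded altogether
    return seq[:end], qual[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(quality_trim("ACGTACGT", "IIIII###", min_length=4))
```

        With a stricter `min_length` (for example 6) the same read would be dropped, because only five bases survive the trimming.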

        The other parameters to Trinity will depend on the size of your computer system -- e.g., how much memory, how many CPUs -- but these parameters are not required and what you have should be good enough. I suggest running Trinity with the parameters you have and seeing what happens. In the end you should get a 'Trinity.fasta' file. The other files can be discarded.

        Once you get a 'Trinity.fasta' file, you can use bowtie2 to back-map your reads or, perhaps better, the Trinotate annotation pipeline described on the Trinity web site.


        • #5
          Hi Westerman,

          Thanks for your prompt reply.
          I really appreciate your suggestion.
          Do you think that adding more options for Butterfly, Inchworm, Chrysalis, and k-mer would give me a better contig file?

          Thanks,
          Naresh



          • #6
            No. The options to Inchworm and Chrysalis are really performance related. Butterfly has some non-performance-related options, but I would stick with the defaults unless you get something that seems weird. Really the only useful extra non-performance option is '--jaccard_clip', which is used on genomes with high gene density.


            • #7
              Hi,

              Thanks a lot.


              Naresh

              • #8
                One metric you can look at to assess your assembly is the percentage of reads that align back to your transcriptome assembly.


                • #9
                  Hi Cofactor,

                  Can you please elaborate on how I can perform that?

                  Thanks,
                  Naresh


                  • #10
                    Well, your newly formed transcriptome assembly is your reference, the raw read data used to generate the assembly is your data, and you treat it like an RNA-seq project. That is, you align the raw data (trimming does not matter here; this is just a QA check) to the assembly and divide the number of reads aligning to the assembly by the total number of reads that went into the assembly. This is only a rough check, and one could argue that you will miss things, but it is good for a rough check.
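
                    The ratio described above can be sketched in a few lines of Python that read the aligner's SAM output. This is only an illustration: the function name `alignment_rate` is made up here, and the sketch assumes single-end reads and that unaligned reads are also written to the SAM file (otherwise divide by the number of reads in the FASTQ instead). Reads are counted once by name, so an aligner that reports several hits per read does not inflate the numbers.

```python
def alignment_rate(sam_lines):
    """Return (aligned_reads, total_reads, percent_aligned) for SAM text lines."""
    seen = set()     # every read name found in the file
    aligned = set()  # read names with at least one alignment
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        name, flag = line.split("\t", 2)[:2]
        seen.add(name)
        if not int(flag) & 0x4:  # SAM flag bit 0x4 means "read unmapped"
            aligned.add(name)
    total = len(seen)
    percent = 100.0 * len(aligned) / total if total else 0.0
    return len(aligned), total, percent


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python alignment_rate.py reads_vs_assembly.sam
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            n_aligned, n_total, pct = alignment_rate(handle)
        print(f"{n_aligned} of {n_total} reads aligned ({pct:.1f}%)")
```

                    Keeping all read names in memory is fine for a quick check, but for very large files a streaming count (or the summary that the aligner prints itself) would be more economical.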

                    From these alignments, you may be surprised to find that the percentage of aligned reads is pretty low. Non-transcriptome assemblers do not like to see large differences in coverage in an assembly; they assume such regions are repeats where reads are piling up, but large coverage differences are inherent in RNA data.

                    Did you perform any manipulations during library prep to treat the RNA for the assembly process, like double-stranded nuclease treatment (to compress the dynamic range of the sample)? This can greatly help an assembly if one is not too heavy-handed in the treatment.

                    It is hard to tell you what percentages are good and bad since I am not sure how the material was treated prior to sequencing or what your goals are for the assembly.

                    Hope this helps.

                    Jon Armstrong


                    • #11
                      Hi,
                      Thanks for your prompt reply.
                      I didn't perform any manipulation during library prep.

                      Thanks in advance,
                      Naresh


                      • #12
                        Hi westerman,

                        I am trying to align the reads in my FASTQ file to the Trinity.fasta file using Trinity's alignReads.pl script.

                        I used the command below:
                        #### /bin/util/alignReads.pl -seqType fq -single inputfile_name -target Trinity.fasta -aligner bowtie2 -p 4 -retain_intermediate_files -num_top_hits 20 -output align_bowtie_output

                        but I am getting the error below:
                        Must specify target_db and it must exist at that location at /bin/util/alignReads.pl line 180

                        I don't know what that means, as I am not good at reading scripts.

                        Hope you can help me out.
                        Thanks,
                        Naresh



                        • #13
                          Well, off-hand I would say that you need to give the whole path to the Trinity.fasta file. I suspect that it is not located in the directory in which you are located.

                          In these 'file not found' cases it is always helpful to the rest of us if you can include the results of:

                          pwd

                          and a

                          ls -l
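
                          The same check can be scripted before launching a long alignment job. The sketch below is only an illustration of the idea (the helper name `resolve_target` is made up): it turns a possibly relative path into an absolute one and fails early, reporting the current directory, if the file is not there.

```python
import os


def resolve_target(path):
    """Return the absolute path of an existing file, or raise FileNotFoundError."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full):
        # Include the working directory, the same information 'pwd' would give.
        raise FileNotFoundError(
            f"target not found: {full} (current directory: {os.getcwd()})"
        )
    return full


if __name__ == "__main__":
    try:
        print(resolve_target("Trinity.fasta"))
    except FileNotFoundError as err:
        print(err)
```

                          Passing the returned absolute path to the aligner avoids depending on which directory the command happens to be run from.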


                          • #14
                            Hi westerman,

                            Thanks for your reply.
                            The full path was missing from my command line.

                            The above command now works with bowtie, but not with bowtie2.

                            With the command below:
                            /bin/util/alignReads.pl --seqType fq --single CombineIonXpressRNA_010_NareshPool_Chip1_2_WT2_fastxtrimmer_from_quality_trimmer.fastq --target /media/DATAPART3/Combine_Files/Velvetoptimiser_27_37/trinity_output/Trinity.fasta --aligner bowtie2 --retain_intermediate_files --num_top_hits 20 --output align_bowtie_output1

                            I get the following error:
                            which: no tophat2 in (/root/perl5/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/root/Trinity/util/../trinity-plugins/rsem/sam/)
                            Error, path to required tophat2 cannot be found at /bin/util/alignReads.pl line 234.

                            If I use bowtie instead of bowtie2, it works, but with one problem: the SAM file has no header.

                            Hope you can help me out.
                            Thanks in advance.
                            Naresh



                            • #15
                              I am surprised that Trinity would be looking for tophat2. However bowtie2 is closely associated with tophat2. I suggest installing tophat2. You may never need it but that should be the way to get Trinity to use bowtie2.
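
                              The 'which: no tophat2' message in the error output means the program could not be found in any directory on the PATH. A minimal Python sketch of the same kind of lookup is given below; the helper name `missing_tools` is made up for this example, and the list of tools simply reflects the ones named in this thread.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the given list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["bowtie", "bowtie2", "tophat2"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

                              Running a check like this first makes it clear whether a failure comes from a missing installation or from a tool that is installed but not on the PATH of the current user.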
