
  • New to bioinformatics - Read Alignment/Mapping

    Good day to you,

    I am a biology B.Sc. student from Germany, and for my graduation project I am working on RNA-seq data sequenced on an Illumina NextSeq 500.

    So far I've managed to run quality checks and trim the adapters.

    For my next step I wanted to learn read alignment. But since I am really new to bioinformatics, my knowledge is scant.

    Anyway, I tried STAR, HISAT2 and TopHat with a subset of my data and a reference. Before I ask specific questions about the software, I'd like to hear your opinions and recommendations. To be honest, this is very overwhelming and I hope you guys can give me a hand.

    I am not sure whether I may share details of the data, because most of it is not published yet. Let's say I have paired-end RNA-seq data with a read length of 150 bp from different treatments of an invertebrate species. In the end, the analysis should show differential gene expression between the treatments.

  • #2
    What's the question?

    HISAT replaces TopHat anyway, so there is no point running them both. If you want to try something different, try kallisto/sleuth or another pseudoalignment approach.

    A question not to ask: 'They all give different results - why?' That is amply answered in several years' worth of literature.

    You will need to describe the experiment in detail. The organism is irrelevant, but how many samples, treatment groups, replicates, reads per sample etc. will help people understand what you are dealing with.


    • #3
      Thank you for your quick response, and I am sorry for not providing enough information. I will try to do so from now on.

      The experiment consists of 4 different treatments. Each treatment was replicated 5 times. Illumina sequencing resulted in 25 million paired-end reads per sample. In total I have 160 RNA samples (80 forward, 80 reverse).


      So, the question might be formulated as "Which of the current read alignment tools might be suitable for this experiment?"

      I hope this can clarify some things.


      • #4
        Well, at least you have replicates, which is a good start. But your 'forward' and 'reverse' reads are not different samples, and your maths doesn't add up.

        4 treatments x 5 replicates = 20 samples.

        Were these run on multiple lanes of a sequencer, i.e. do you have 4 lanes' worth of sequencing? That would make me think you have 80 'samples' with forward and reverse reads, which are actually 20 samples. These will need to be combined for analysis, not treated as separate entities. You should proceed with your analysis using 40 files: 20 'forward' and 20 'reverse' fastq.gz files.
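
The file counts above can be checked with a few lines of shell arithmetic. This is only a sketch: the lane count of 4 is the assumption raised in the question above (80 forward files divided by 20 samples), and the variable names are made up for illustration.

```shell
# Experimental design as described in the thread.
treatments=4
replicates=5
lanes=4        # assumption: 4 lanes, inferred from 80 forward files / 20 samples
directions=2   # forward (R1) and reverse (R2)

# Biological samples, raw per-lane files, and files left after merging lanes.
samples=$((treatments * replicates))
raw_files=$((samples * lanes * directions))
merged_files=$((samples * directions))

echo "samples:      $samples"
echo "raw files:    $raw_files"
echo "merged files: $merged_files"
```

Under that assumption the 160 files on disk correspond to 20 samples, and merging the lanes leaves 40 files to align.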

        I'd argue that unless you're running a direct comparison of different tools, or want to take orthogonal approaches, then learning what a tool does well will far outweigh the benefits of trying to cram the results of multiple tools together.

        *If* HISAT2 is giving you differentially expressed genes with the treatment groups, that's all you need. I'd focus on what those results might *mean* in the experimental situation: what's the biological story you're trying to explain, and how do the results you see fit in with any prior hypothesis (or what do they mean if no one thought of one)? If you do move on with HISAT2, then take a look at this: https://www.nature.com/articles/nprot.2016.095

        You could take a look at other routes of analysis (DESeq2/limma/edgeR, all very powerful R/Bioconductor packages) or the ones in my previous post if you're desperate to compare and contrast packages, but I can tell you now: you'll get different results with different packages, which will overlap by various amounts.

        My advice stands if you're a biologist and you're interested in the biology - focus on the story.

        To be honest, running multiple pipelines end to end is an achievement if this isn't your background.
        Last edited by Bukowski; 11-28-2018, 06:21 AM.


        • #5
          Oh, I see. Yes, you are definitely right. It's 20 samples, each with 4 reverse and 4 forward fastq.gz files.
          I guess I will have to see how to combine them before I start aligning them to the reference genome.

          It's not that I wanted to compare the different tools; I just did not have a clue which one to use, so I figured I'd try various approaches.

          This is a lot to consider and I appreciate every bit of it. There will surely be more questions in the future, but for now I thank you. Maybe we will talk again.

          Have a nice day


          • #6
            You can crudely combine fastq.gz files with the 'cat' command. You just need to concatenate the 'forward' and 'reverse' reads for each sample separately, listing the lanes in the same order for both so the read pairs stay in sync:

            cat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz > fastq_sample1_R1.fastq.gz

            If you want to be super cautious, use zcat to decompress and recompress:

            zcat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz | gzip > fastq_sample1_R1.fastq.gz

            Should do the trick
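
For 20 samples it may be easier to loop over the files than to type each command. The function below is a minimal sketch, not a tested pipeline: the name `merge_lanes` is hypothetical, and it assumes the file naming scheme used in the commands above (`<sample>_lane<N>_R1.fastq.gz` and `<sample>_lane<N>_R2.fastq.gz`) with all per-lane files in one directory.

```shell
# Merge per-lane FASTQ files into one R1 and one R2 file per sample.
# Assumed (hypothetical) naming: <sample>_lane<N>_R1.fastq.gz / _R2.fastq.gz
merge_lanes() {
    local in_dir="$1" out_dir="$2"
    local first sample read
    mkdir -p "$out_dir"

    # Discover sample names from each sample's lane1 R1 file.
    for first in "$in_dir"/*_lane1_R1.fastq.gz; do
        [ -e "$first" ] || continue   # skip if the glob matched nothing
        sample="$(basename "$first" _lane1_R1.fastq.gz)"

        for read in R1 R2; do
            # The glob expands in the same sorted order for R1 and R2,
            # so the lanes are concatenated identically and pairs stay in sync.
            cat "$in_dir/${sample}"_lane*_"${read}".fastq.gz \
                > "$out_dir/${sample}_${read}.fastq.gz"
        done
    done
}
```

It could be called as `merge_lanes raw_fastq merged_fastq`, where both directory names are placeholders. Writing the merged files to a separate directory keeps them from being picked up by the input glob on a second run.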


            • #7
              Works like a charm.
