  • New to bioinformatics - Read Alignment/Mapping

    Good day to you,

    I am a biology B.Sc. student from Germany, and for my graduation I am working on RNA-seq data sequenced on an Illumina NextSeq 500.

    So far I've managed to do quality checks and trim the adapters.

    For my next step I wanted to gain skills in read alignment. But since I am really new to bioinformatics, my knowledge is scant.

    Anyway, I tried STAR, HISAT2 and TopHat with a subset of my data and a reference. Before I ask specific questions about the software, I'd like to hear your opinions and recommendations. To be honest, this is very overwhelming and I hope you guys can give me a hand.

    I am not sure if I may give details on the data, because most of it is not published yet. Let's say I have RNA-seq data from different treatments of an invertebrate species, with a read length of 150 bp (paired-end). In the end, the analysis should show differential gene expression between the treatments.

  • #2
    What's the question?

    HISAT2 supersedes TopHat anyway, so there is no point running them both. If you want to try something different, try Kallisto/sleuth or another pseudoalignment approach.

    A question not to ask: 'They all give different results - why?', because this is amply answered in several years' worth of literature.

    You will need to describe the experiment in detail. The organism is irrelevant, but the number of samples, treatment groups, replicates, reads per sample, etc. will help people understand what you are dealing with.


    • #3
      Thank you for your quick response, and I am sorry for not providing enough information. I will try to do so from now on.

      The experiment consists of 4 different treatments. Each treatment was replicated 5 times. Illumina sequencing resulted in 25 million paired-end reads per sample. In total I have 160 RNA samples (80 forward, 80 reverse).


      So, the question might be formulated as "Which of the current read alignment tools would be suitable for this experiment?"

      I hope this can clarify some things.


      • #4
        Well, at least you have replicates, which is a good start. But your 'forward' and 'reverse' reads are not different samples, and your maths doesn't add up.

        4 treatments x 5 replicates = 20 samples.

        Were these run on multiple lanes of a sequencer? i.e. do you have 4 lanes' worth of sequencing? That would make me think you have 80 'samples' with forward and reverse reads, which are actually 20 samples. These will need to be combined for analysis, not treated as separate entities. You should be proceeding with your analysis using 40 files: 20 'forward' and 20 'reverse' fastq.gz files.
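
As a quick sanity check on those file counts, the arithmetic can be written out as a small shell sketch. The variable names are made up for illustration, and the lane count of 4 is an assumption based on the guess above, not something read from the data.

```shell
# Sanity check on the expected file counts (illustrative variable names).
treatments=4
replicates=5
lanes=4            # assumption: each sample was split across 4 lanes
reads_per_pair=2   # forward (R1) and reverse (R2)

samples=$((treatments * replicates))
raw_files=$((samples * lanes * reads_per_pair))
merged_files=$((samples * reads_per_pair))

echo "samples=$samples raw_files=$raw_files merged_files=$merged_files"
```

Under those assumptions the 160 files on disk correspond to 20 biological samples, and 40 files remain once the lanes are merged.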

        I'd argue that unless you're running a direct comparison of different tools, or want to take orthogonal approaches, then learning what a tool does well will far outweigh the benefits of trying to cram the results of multiple tools together.

        *If* HISAT2 is giving you differentially expressed genes between the treatment groups, that's all you need. I'd focus on what those results might *mean* in the experimental situation: what's the biological story you're trying to explain, and how do the results you see fit in with any prior hypothesis (or what do they mean if no one thought of one)? If you do move on with HISAT2, then take a look at this: https://www.nature.com/articles/nprot.2016.095

        You could take a look at other routes of analysis, such as DESeq2/limma/edgeR (all very powerful R/Bioconductor packages), or the ones in my previous post if you're desperate to compare and contrast packages. But I can tell you now, you'll get different results with different packages, which will overlap by various amounts.

        My advice stands if you're a biologist and you're interested in the biology - focus on the story.

        To be honest, running multiple pipelines end to end is an achievement if this isn't your background.
        Last edited by Bukowski; 11-28-2018, 06:21 AM.


        • #5
          Oh, I see. Yes, you are definitely right. It's 20 samples, each with 4 reverse and 4 forward fastq.gz files.
          I guess I will have to see how to combine them before I start aligning them to the reference genome.

          It's not that I wanted to compare the different tools, it's just that I did not have a clue which one to use, so I figured I'd try various approaches.

          This is plenty to think about and I appreciate every bit of it. There will surely be more questions in the future, but for now I thank you. Maybe we will talk again.

          Have a nice day


          • #6
            You can crudely combine fastq.gz files with the 'cat' command. You just need to concatenate the 'forward' and 'reverse' reads for each sample separately:

            cat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz > fastq_sample1_R1.fastq.gz

            if you want to be super cautious use zcat:

            zcat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz | gzip > fastq_sample1_R1.fastq.gz

            Should do the trick
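
With 20 samples it may be easier to loop over the files than to type each command. Below is a minimal sketch of that, assuming the same hypothetical naming scheme as the commands above (`<sample>_lane<N>_R1.fastq.gz` and `<sample>_lane<N>_R2.fastq.gz`); the function name `merge_lanes` and the directory names are made up for illustration, so adjust them to the real file names.

```shell
#!/usr/bin/env bash
# Merge per-lane FASTQ files into one file per sample and read direction.
# Assumes the hypothetical naming scheme <sample>_lane<N>_R<1|2>.fastq.gz.
set -euo pipefail

merge_lanes() {
    local in_dir="$1" out_dir="$2"
    local first sample read
    mkdir -p "$out_dir"
    # Use each sample's lane1 R1 file to discover the sample names.
    for first in "$in_dir"/*_lane1_R1.fastq.gz; do
        [ -e "$first" ] || continue   # nothing matched: skip the literal pattern
        sample="$(basename "$first" _lane1_R1.fastq.gz)"
        for read in R1 R2; do
            # The glob expands in the same sorted order for R1 and R2,
            # so the lanes are concatenated consistently and mates stay in sync.
            cat "$in_dir/${sample}"_lane*_"${read}".fastq.gz \
                > "$out_dir/${sample}_${read}.fastq.gz"
        done
    done
}

# Example call (directory names are placeholders):
# merge_lanes raw_fastq merged_fastq
```

This relies on the same property as the plain 'cat' command above: concatenated gzip files decompress as one stream. Counting lines in the merged R1 and R2 files with `gzip -dc file | wc -l` is a cheap way to confirm that both mates ended up with the same number of reads.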


            • #7
              works like a charm
