SEQanswers

11-28-2018, 04:02 AM #1
hiatus
Junior Member
 
Location: Germany

Join Date: Nov 2018
Posts: 5
New to bioinformatics - Read Alignment/Mapping

Good day to you,

I am a biology B.Sc. student from Germany, and for my graduation thesis I am working on RNA-seq data sequenced on an Illumina NextSeq 500.

So far I have managed to run quality checks and trim adapters.

As a next step I wanted to build skills in read alignment, but since I am really new to bioinformatics, my knowledge is scant.

Anyway, I tried STAR, HISAT2 and TopHat on a subset of my data against a reference. Before I ask specific questions about the software, I'd like to hear your opinions and recommendations. To be honest, this is very overwhelming, and I hope you can give me a hand.

I am not sure how much detail I may share about the data, because most of it is not yet published. Let's say I have RNA-seq data (150 bp paired-end reads) from different treatments of an invertebrate species. In the end, the analysis should show differential gene expression between the treatments.
11-28-2018, 04:24 AM #2
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 362

What's the question?

HISAT2 replaces TopHat anyway, so there's no point running both. If you want to try something different, try kallisto/sleuth or another pseudoalignment approach.

A question not to ask: 'They all give different results, why?' This has been amply answered in several years' worth of literature.

You will need to describe the experiment in more detail. The organism is irrelevant, but the number of samples, treatment groups, replicates, reads per sample, etc. will help people understand what you are dealing with.
11-28-2018, 05:44 AM #3
hiatus
Junior Member
 
Location: Germany

Join Date: Nov 2018
Posts: 5

Thank you for your quick response, and I am sorry for not providing enough information. I will try to do so from now on.

The experiment consists of 4 different treatments, each replicated 5 times. Illumina sequencing yielded 25 million paired-end reads per sample. In total I have 160 RNA samples (80 forward, 80 reverse).


So the question might be phrased as: "Which of the current read alignment tools would be suitable for this experiment?"

I hope this can clarify some things.
11-28-2018, 06:03 AM #4
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 362

Well, at least you have replicates; that is a good start. But your 'forward' and 'reverse' reads are not different samples, and your maths doesn't add up.

4 treatments x 5 replicates = 20 samples.

Were these run on multiple lanes of a sequencer, i.e. do you have 4 lanes' worth of sequencing? That would make me think you have 80 'samples' with forward and reverse reads, which are actually 20 samples. The lane files need to be combined for analysis, not treated as separate entities: you should be proceeding with 40 files, 20 'forward' and 20 'reverse' fastq.gz files.
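For reference, the numbers fit per-lane output: a NextSeq 500 flow cell has four lanes and bcl2fastq writes a separate file per lane by default, so 20 samples x 4 lanes x 2 reads = 160 files, and merging the lanes leaves 20 x 2 = 40. A minimal sketch to check this on your own files, assuming the default bcl2fastq naming `<sample>_S<n>_L00<lane>_R<read>_001.fastq.gz` (the function name is made up for illustration; adjust the pattern if your facility named the files differently):

```shell
#!/usr/bin/env bash
# Count lane files per sample and read direction.
# ASSUMPTION: bcl2fastq default names, e.g. ctrl1_S1_L001_R1_001.fastq.gz
# (sample names are hypothetical); adjust the pattern for other naming schemes.
# The sed step drops the lane part:
#   ctrl1_S1_L001_R1_001.fastq.gz -> "ctrl1_S1 R1"
count_lane_files() {
    local dir=${1:-.} f
    for f in "$dir"/*_L00[0-9]_R[12]_001.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched no files
        basename "$f"
    done |
        sed -E 's/_L00[0-9]_(R[12])_001\.fastq\.gz$/ \1/' |
        sort | uniq -c
}
```

Run it as `count_lane_files /path/to/fastq`; with four lanes, every "sample read" line should show a count of 4.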

I'd argue that unless you're running a direct comparison of different tools, or want to take orthogonal approaches, learning what one tool does well will far outweigh the benefits of trying to cram the results of multiple tools together.

*If* your HISAT2 pipeline is giving you differentially expressed genes between the treatment groups, that's all you need. I'd focus on what those results might *mean* in the experimental situation: what's the biological story you're trying to explain, and how do the results you see fit with any prior hypothesis (or what do they mean if no one thought of one)? If you do move on with HISAT2, take a look at this: https://www.nature.com/articles/nprot.2016.095
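As a rough idea of what the alignment step looks like, here is a sketch that only prints one HISAT2 command per merged sample, so the commands can be inspected before running them. It assumes the index was built beforehand with `hisat2-build`, that the merged reads are named `<sample>_R1.fastq.gz`/`<sample>_R2.fastq.gz`, and that samtools is installed; the function name, index prefix and sample names are hypothetical. `--dta` (alignments tailored for transcript assemblers) matches the StringTie-based workflow in the protocol linked above.

```shell
#!/usr/bin/env bash
# Print (not run) one HISAT2 + samtools command per sample.
# ASSUMPTIONS: the index prefix was built with `hisat2-build genome.fa <prefix>`,
# merged reads are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, and sample
# names contain no spaces. The function name and arguments are hypothetical.
hisat2_commands() {
    local index=$1 threads=$2 sample
    shift 2
    for sample in "$@"; do
        # hisat2 writes SAM to stdout when -S is omitted; samtools sort reads
        # it from stdin ("-") and writes a coordinate-sorted BAM.
        printf 'hisat2 -p %s --dta -x %s -1 %s_R1.fastq.gz -2 %s_R2.fastq.gz | samtools sort -@ %s -o %s.bam -\n' \
            "$threads" "$index" "$sample" "$sample" "$threads" "$sample"
    done
}
```

For example, `hisat2_commands genome_index 8 ctrl1_S1 ctrl2_S2 > align.sh`, check the file, then run it with `bash align.sh`. Printing first rather than running directly makes it easy to spot a wrong file name before a long job starts.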

You could look at other routes of analysis such as DESeq2, limma or edgeR (all very powerful R/Bioconductor packages), or at the ones in my previous post if you're desperate to compare and contrast packages. But I can tell you now: you'll get different results with different packages, and they will overlap by varying amounts.

My advice stands: if you're a biologist and you're interested in the biology, focus on the story.

To be honest, running multiple pipelines end to end is an achievement if this isn't your background.

Last edited by Bukowski; 11-28-2018 at 06:21 AM.
11-28-2018, 06:37 AM #5
hiatus
Junior Member
 
Location: Germany

Join Date: Nov 2018
Posts: 5

Oh, I see. Yes, you are definitely right. It's 20 samples, each with 4 reverse and 4 forward fastq.gz files.
I guess I will have to figure out how to combine them before I start aligning them to the reference genome.

It's not that I wanted to compare the different tools; I just had no clue which one to use, so I figured I'd try various approaches.

This is plenty to think about, and I appreciate every bit of it. There will surely be more questions in the future, but for now, thank you. Maybe we will talk again.

Have a nice day
11-28-2018, 06:50 AM #6
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 362

You can crudely combine fastq.gz files with the 'cat' command; you just need to concatenate the 'forward' and 'reverse' reads for each sample separately, listing the lanes in the same order for both so the read pairs stay in sync.

cat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz > fastq_sample1_R1.fastq.gz

If you want to be extra cautious, use zcat, which decompresses the lane files and recompresses them into a single gzip stream:

zcat fastq_sample1_lane1_R1.fastq.gz fastq_sample1_lane2_R1.fastq.gz | gzip > fastq_sample1_R1.fastq.gz

Should do the trick
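To do this for all 20 samples, a loop over the sample names works. The sketch below assumes bcl2fastq-style lane file names (`<sample>_L00<lane>_R<read>_001.fastq.gz`) and writes hypothetical `<sample>_R1.fastq.gz`/`<sample>_R2.fastq.gz` outputs; the function names are made up for illustration. It uses plain cat, which works because concatenated gzip files are still valid gzip, and then counts reads, since R1 and R2 of a sample must end up with the same number of reads.

```shell
#!/usr/bin/env bash
# Merge per-lane FASTQ files into one R1 and one R2 file per sample.
# ASSUMPTION: input names follow <sample>_L00<lane>_R<read>_001.fastq.gz
# (bcl2fastq style); output names <sample>_R1/_R2.fastq.gz are hypothetical.
merge_lanes() {
    local in_dir=$1 out_dir=$2 sample=$3 read
    mkdir -p "$out_dir" || return 1
    for read in R1 R2; do
        # Glob results are sorted, so lanes are concatenated in the same
        # order (L001, L002, ...) for R1 and R2 and read pairs stay in sync.
        local files=("$in_dir/$sample"_L00[0-9]_"$read"_001.fastq.gz)
        if [ ! -e "${files[0]}" ]; then
            echo "no lane files for $sample $read in $in_dir" >&2
            return 1
        fi
        # Concatenated gzip members form a valid gzip file, so plain cat works.
        cat "${files[@]}" > "$out_dir/${sample}_${read}.fastq.gz" || return 1
    done
}

# Number of reads in a gzipped FASTQ file (4 lines per record).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Print the read count if R1 and R2 agree, otherwise report a mismatch.
check_pair() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1"
}
```

With your own sample names, something like `for s in ctrl1_S1 ctrl2_S2; do merge_lanes raw merged "$s" && check_pair merged/"$s"_R1.fastq.gz merged/"$s"_R2.fastq.gz; done` merges each sample and prints its read count.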
11-28-2018, 07:20 AM #7
hiatus
Junior Member
 
Location: Germany

Join Date: Nov 2018
Posts: 5

works like a charm

Tags
alignment, mapping, new to bioinformatics, read

All times are GMT -8.