Hello everyone,
First of all, you have no idea how great it feels to find this place. I'm a programmer/computer scientist and completely new to bioinformatics. I need to do some analysis for a job, and there are many things I have no clue about.
Here's the story:
I have been given a huge data set for 15 subjects with some disease, and I'm supposed to analyse it to find the SNPs common to that disease. Each subject's data set contains a huge number of FASTQ reads, such as this one:
@1:1:1166:20230:Y
GAATGTAGATTTCTTCTAACACACAACACATNCATG
+
DDD?DBEEEED?DEE?EEDE?B5?CC########
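For reference, a FASTQ record is four lines: a header starting with `@`, the bases, a `+` separator, and a quality string with one character per base. The `#` characters (ASCII 35) rule out the older Phred+64 encoding, so this file is presumably Phred+33, where score = ASCII code - 33. Note that the quality line as pasted has 34 characters against 36 bases, so it was probably cut short when copied; a strict reader rejects such a record. Below is a minimal Python sketch, assuming unwrapped four-line records; `FastqRecord`, `parse_fastq` and `phred33` are illustrative names, not a library API (established readers such as Biopython's `SeqIO` exist for real work).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRecord:
    name: str  # header without the leading '@'
    seq: str   # bases (A/C/G/T/N)
    qual: str  # one quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from unwrapped 4-line FASTQ data, validating each one."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record at {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(
                f"{header}: {len(seq)} bases but {len(qual)} quality characters")
        yield FastqRecord(header[1:], seq, qual)


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    print(phred33("DDD?DB"))  # [35, 35, 35, 30, 35, 33]
    print(phred33("#"))       # [2], i.e. a very unreliable base call
    pasted = ["@1:1:1166:20230:Y",
              "GAATGTAGATTTCTTCTAACACACAACACATNCATG",
              "+",
              "DDD?DBEEEED?DEE?EEDE?B5?CC########"]
    try:
        list(parse_fastq(pasted))
    except ValueError as err:
        print(err)  # @1:1:1166:20230:Y: 36 bases but 34 quality characters
```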
My questions are below (I apologize in advance if there are too many or if they are too basic; please bear with me):
1- I now know that I need to do adapter trimming and quality filtering. Don't I need to be given the adapter sequences for that? If not, where can I find common adapter sequences? Also, how long are adapter sequences? As you can see, my reads are 36 bp long. How do I determine how much of a read is adapter?
2- What are indexed samples? Or, put better: what reference should I use to index my cleaned-up sequences?
3- Can you give an outline of the logical steps I will need to follow to carry out my analysis? (All I need is the order of the steps at a very high level; I will dig into each one myself.)
4- Why do I need two files for each subject, each with reads from a different direction?
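Regarding question 1: adapter trimming is normally done with an established tool such as cutadapt or Trimmomatic, given the adapter sequence used in library preparation (ask whoever produced the data, or check the kit documentation). The adapter does not need to be matched in full: in short reads it typically shows up only at the 3' end, when the DNA fragment is shorter than the read, so the trimmer looks for where a prefix of the adapter begins and cuts there. The Python sketch below shows that idea plus a simple trailing quality trim. It is a simplification under stated assumptions: exact matching only (real tools allow mismatches), a hypothetical minimum overlap of 3 bases and quality threshold of 20, made-up toy reads, and `AGATCGGAAGAGC` as the adapter, which is the commonly cited start of the Illumina TruSeq adapters but should be confirmed for this data set.

```python
from typing import Tuple


def trim_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter, including a partial adapter running off the read end.

    Scans left to right for the first position where the rest of the read
    matches the start of the adapter (or begins with the whole adapter).
    Exact matching only; real trimmers also tolerate sequencing errors.
    """
    for i in range(len(seq)):
        tail = seq[i:]
        if len(tail) < min_overlap:
            break  # overlaps this short would match the adapter by chance too often
        if adapter.startswith(tail[:len(adapter)]):
            return seq[:i]
    return seq


def trim_low_quality_tail(seq: str, qual: str, threshold: int = 20,
                          offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score (ASCII - offset) is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # assumption: confirm the real adapter for your kit
    print(trim_adapter("ACGTACGTAGATCGG", adapter))   # ACGTACGT (partial adapter)
    print(trim_adapter("ACGTACGTACGT", adapter))      # ACGTACGTACGT (no adapter)
    print(trim_low_quality_tail("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```

Trimming only a trailing run of low scores is the simplest possible rule; the dedicated tools use more robust quality-trimming algorithms, so prefer them on real data.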
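Regarding question 4: two files per subject usually means paired-end sequencing (an assumption, since the post does not say how the data were produced). Each DNA fragment is read once from each end, so mate 1 and mate 2 come from opposite strands and face each other, and the Nth record in one file pairs with the Nth record in the other. Aligners such as BWA and Bowtie2 can use both mates and their expected distance apart to place short reads more reliably than single reads alone. The toy Python sketch below, with a made-up fragment and no sequencing errors, shows the strand relationship: reverse-complementing mate 2 recovers the far end of the fragment. `simulate_pair` is a hypothetical helper for illustration only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (uppercase A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(fragment: str, read_len: int) -> tuple:
    """Toy model of a paired-end read: mate 1 is read from the 5' end of the
    forward strand, mate 2 from the 5' end of the reverse strand."""
    mate1 = fragment[:read_len]
    mate2 = reverse_complement(fragment)[:read_len]
    return mate1, mate2


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGT"  # made-up 20-bp fragment
    m1, m2 = simulate_pair(fragment, 8)
    print(m1, m2)  # ACGTTGCA ACGGTTAA
    # Reverse-complementing mate 2 gives the last 8 bases of the fragment.
    print(reverse_complement(m2) == fragment[-8:])  # True
```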
Thank you, it's much appreciated, and sorry again for the many questions.