I am working with 20 specimens of Serratia marcescens, which amounts to 40 FASTQ files, and I am trying to compare them to the ancestral genome, i.e. the reference genome.
My first task was to clean up the raw data, but I am clueless. I was told to first find out whether each file contains good or bad reads (using FastQC), then remove the adapter sequences with cutadapt or a Python script. I've been tinkering with FastQC and I understand the results to a certain extent, but I still don't know how to clean data that I judge to be 'bad'.
I cannot get cutadapt to work, and I am not sure how I would go about writing a Python script because my knowledge relevant to biology is very rusty. I was told that if I can't get cutadapt to work, a Python script would be the faster route.
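To make the "Python script" option concrete, here is a minimal sketch of what such a script might look like. It is only an illustration under stated assumptions: the adapter string is a placeholder (the commonly seen Illumina adapter prefix, which may not match this library prep), the function names and the `min_len` cutoff are made up for the example, and only exact adapter matches are trimmed, which is far cruder than what cutadapt does.

```python
import io

# ASSUMPTION: placeholder adapter (common Illumina adapter prefix).
# Replace it with the adapter actually used for your library prep.
ADAPTER = "AGATCGGAAGAGC"


def read_fastq(handle):
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact adapter match, if there is one."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def clean(in_handle, out_handle, min_len=20):
    """Trim adapters and drop reads shorter than min_len; return reads kept."""
    kept = 0
    for header, seq, qual in read_fastq(in_handle):
        seq, qual = trim_adapter(seq, qual)
        if len(seq) >= min_len:
            out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass open file handles instead.
    demo = io.StringIO("@read1\n" + "ACGT" * 6 + ADAPTER + "TTTT\n+\n" + "I" * 41 + "\n")
    out = io.StringIO()
    print(clean(demo, out), "read(s) kept")
    print(out.getvalue(), end="")
```

One caveat with this sketch: if the two files of a specimen are paired, dropping a short read from one file but not the other puts the pair out of sync, so a real script would have to process both files together.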
After cleaning the data, my next task is gene mapping. I think I was told to use BWA, and the software is installed, but I am still clueless as to how I should use it.
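As far as I can tell, the usual BWA workflow is to index the reference once and then align each specimen's reads against it. The sketch below only builds and runs those two commands from Python. It assumes the 40 files are 20 paired-end pairs, and every file name in it (`reference.fasta`, `s01_R1.fastq` and so on) is hypothetical.

```python
import subprocess


def bwa_commands(reference, read1, read2, threads=4):
    """Build the two BWA command lines: index the reference, then align a pair."""
    index_cmd = ["bwa", "index", reference]
    # 'bwa mem' writes the alignment (SAM) to standard output.
    mem_cmd = ["bwa", "mem", "-t", str(threads), reference, read1, read2]
    return index_cmd, mem_cmd


def align(reference, read1, read2, out_sam, threads=4):
    """Run both commands; requires bwa to be installed and on the PATH."""
    index_cmd, mem_cmd = bwa_commands(reference, read1, read2, threads)
    subprocess.run(index_cmd, check=True)
    with open(out_sam, "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    for cmd in bwa_commands("reference.fasta", "s01_R1.fastq", "s01_R2.fastq"):
        print(" ".join(cmd))
```

The reference only needs to be indexed once, so with 20 specimens the indexing step would be run a single time and the `bwa mem` step repeated per pair.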
The final stage is analysis, I think.
Can anyone point me in the right direction? I am having a difficult time understanding everything that is going on, especially trying to understand both the biology and computer science side.
Thanks.
Edit:
I am expected to know how to do all of this in the Linux terminal. I pretty much have the basics down, but I still have to apply them.