Hello all,
I apologise in advance for asking some basic questions.
I have recently started working on RNA-Seq. I have FASTQ files in zipped format.
I read a lot of threads, and the one that came closest to my queries is
As far as I understand, I have to do the following steps:
1) Merge the FASTQ files for each sample (3 files per sample in my case).
-> Do I just need to concatenate the files, or is there specific software for this?
2) Remove barcodes and adapter sequences.
-> How can I know whether I have barcode and adapter sequences?
-> Should I use cutadapt before FastQC, as FastQC gives results on the first 200,000 sequences?
-> Are there any adapters specific to RNA-Seq?
3) Check the quality with FastQC and discard data based on quality.
* What is FASTQ grooming, and why do we need to do it?
As far as I know, since February 2011 the newest version (1.8) of Illumina's CASAVA pipeline directly produces FASTQ in Sanger format (Phred+33). Hence, I don't need to use FASTQ Groomer.
4) Align to the reference genome.
-> Should I use the human assembly GRCh37 or GRCh38? I am inclined to use GRCh38, as it should be the most up-to-date version.
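For step 1, here is a minimal Python sketch of what I understand "just concatenating" to mean, assuming the three files per sample are gzip-compressed parts of the same library. It relies on the fact that gzip files appended byte-for-byte still form a valid gzip stream. The function names and file names are my own placeholders, not from any tool.

```python
import gzip
import shutil


def concat_gzipped_fastq(parts, merged_path):
    """Append gzip files byte-for-byte into one file.

    Concatenated gzip members are still a valid gzip stream, so the
    parts do not need to be decompressed first.
    """
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_records(path):
    """Count records in a gzipped FASTQ, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Hypothetical usage (file names are placeholders):
# concat_gzipped_fastq(
#     ["sampleA_001.fastq.gz", "sampleA_002.fastq.gz", "sampleA_003.fastq.gz"],
#     "sampleA_merged.fastq.gz",
# )
# print(count_fastq_records("sampleA_merged.fastq.gz"))
```

If the data are paired-end, I assume the parts would have to be given in the same order for the R1 and R2 files so that the mates stay in sync.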
Some more basic questions:
In the HiSeq 2000 FASTQ format
@HWI-ST1146:243:C5HH7ACXX:1:2316:16223:100755 1:N:0:NTTTCG
GGGAGGCTGTTCTGCTTTACGCATCTGAGAACTACATAGGAGAGNAANNN
+
CCCFFFFFHHHHHJJJJJJJJJ1FHIJJJJJJJJJJJJJJJJJJ#0?###
What is the use of the index sequence?
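To make the question concrete, here is a minimal Python sketch of how I currently read the record above, assuming the CASAVA 1.8 header layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index) and Phred+33 quality encoding. The function names are my own and the 59 cut-off is only a rough heuristic, so please correct me if the field meanings are wrong.

```python
HEADER = "@HWI-ST1146:243:C5HH7ACXX:1:2316:16223:100755 1:N:0:NTTTCG"
QUALITY = "CCCFFFFFHHHHHJJJJJJJJJ1FHIJJJJJJJJJJJJJJJJJJ#0?###"


def parse_casava18_header(header):
    """Split a CASAVA 1.8 style header into named fields (assumed layout)."""
    name, _, description = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = description.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),
        "is_filtered": is_filtered == "Y",
        "control": int(control),
        "index": index,  # the index (barcode) read for this cluster
    }


def phred33_scores(quality):
    """Decode a quality string, assuming Sanger / Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


def looks_like_phred33(quality):
    """Heuristic: characters below ASCII 59 cannot occur in Phred+64 data."""
    return any(ord(char) < 59 for char in quality)


fields = parse_casava18_header(HEADER)
scores = phred33_scores(QUALITY)
print(fields["index"], fields["lane"], min(scores), max(scores))
print(looks_like_phred33(QUALITY))
```

If this reading is right, the last header field is the index read for that cluster, and the `#` characters at the end of the quality line decode to a score of 2.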