06-24-2015, 09:01 AM   #1
imsharmanitin
Postdoc Cancer Bioinformatics
 
Location: Oslo, Norway

Join Date: Dec 2014
Posts: 17
Processing FASTQ files from HiSeq 2000 (single end) and RNA-seq analysis

Hello all,

I apologise in advance for asking some basic questions.

I have recently started working on RNA-seq. I have FASTQ files in gzipped format.

I have read a lot of threads, and the one that comes closest to my questions is
http://seqanswers.com/forums/showthread.php?t=21331

As far as I understand, I have to do the following steps:

1) Merge the FASTQ files for each sample (3 files per sample in my case).
-> Do I just need to concatenate the files, or is there specific software for this?

2) Remove barcodes and adapter sequences.
-> How can I tell whether my reads contain barcode and adapter sequences?
-> Should I run cutadapt before FastQC, given that (as far as I understand) FastQC bases its results on the first 200,000 sequences?
-> Are there any adapters specific to RNA-seq?

3) Check the quality with FastQC and discard data based on quality.

* What is FASTQ grooming and why do we need it?
As far as I know, since February 2011 the newest version (1.8) of Illumina's CASAVA pipeline directly produces FASTQ in Sanger format (Phred+33). Hence, I should not need to use FASTQ Groomer.

4) Align to the reference genome.
-> Should I use the human assembly GRCh37 or GRCh38? I am inclined to use GRCh38, as it should be the most up-to-date version.
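
To make step 1 concrete, this is roughly what I mean by "just concatenating". It is only a minimal sketch: it relies on the fact that gzip files can be appended to each other byte for byte and still decompress as one stream, and it assumes every record is exactly four lines. The function names and any file names are my own placeholders, not part of any tool.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def concat_gzipped_fastq(parts: Iterable[PathLike], merged: PathLike) -> None:
    """Append the raw bytes of each .fastq.gz part to one output file, in order.

    No decompression is needed: a file made of several gzip members
    is itself a valid gzip file.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path: PathLike) -> int:
    """Count reads in a gzipped FASTQ, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

As a sanity check, the read count of the merged file should equal the sum of the read counts of the three input files for that sample.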


Some more basic questions:

In the HiSeq 2000 FASTQ format, a record looks like this:

@HWI-ST1146:243:C5HH7ACXX:1:2316:16223:100755 1:N:0:NTTTCG
GGGAGGCTGTTCTGCTTTACGCATCTGAGAACTACATAGGAGAGNAANNN
+
CCCFFFFFHHHHHJJJJJJJJJ1FHIJJJJJJJJJJJJJJJJJJ#0?###

What is the index sequence (the last field of the header line, NTTTCG here) used for?
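
To show how I am reading the record above, here is a small sketch that splits the header into fields and decodes the quality line. It assumes the CASAVA 1.8 header layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index) and a Phred+33 offset; the class and function names are my own, not from any library.

```python
from typing import List, NamedTuple


class Casava18Header(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    is_filtered: bool  # True when the filter flag is "Y"
    control: int
    index: str         # the index sequence field


def parse_header(line: str) -> Casava18Header:
    """Split a CASAVA 1.8 style header line into its fields."""
    line = line.strip()
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    left, right = line[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return Casava18Header(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x), int(y), int(read), filtered == "Y", int(control), index,
    )


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming the Sanger (Phred+33) offset."""
    return [ord(ch) - 33 for ch in quality]


header = parse_header(
    "@HWI-ST1146:243:C5HH7ACXX:1:2316:16223:100755 1:N:0:NTTTCG"
)
scores = phred33_scores(
    "CCCFFFFFHHHHHJJJJJJJJJ1FHIJJJJJJJJJJJJJJJJJJ#0?###"
)
```

With this reading, the index field of my example is NTTTCG, and the '#' characters in the quality line decode to a score of 2, which is only possible with the +33 offset.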