SEQanswers

07-31-2012, 05:52 AM   #1
Lilach
Member
 
Location: Israel

Join Date: Sep 2011
Posts: 20
Using MAQ with Illumina HiSeq results

Hi,
I want to try MAQ (for the first time) for analysis of Illumina HiSeq human whole exome results, and I have two questions:

1. Is it OK to use hg19_chromFa.tar (from UCSC) as a reference, or should I run it against each chromosome individually?

2. Is the text file I got from the HiSeq (which is actually fastq) OK as input for the maq fastq2bfq command, or should I change its fastq format? I think the HiSeq output is already in the Sanger fastq format, but I'm not sure. Below I copied the beginning of the txt file.

Thanks!



@ILLUMINA-FFC6C4_0005:7:1:1941:1087#0/1
CACATTGGATTGATCGGTCTCATTGGCCCCCCGGGAGAAGCTGGGGAGAAAGGAGATCAGGGGGTGCCAGGCGT
+ILLUMINA-FFC6C4_0005:7:1:1941:1087#0/1
faf\f_ccfffffSedaRe\dcdYffffcggg`g^ae`ca^RbaJ_VWW_Z\\Na`d``]a`bGM[UYVa`]`B
@ILLUMINA-FFC6C4_0005:7:1:2045:1092#0/1
GTGTGAATTTCATTTCCACATAAATTTTCTGAGCTGCATCACGGGAGATCCAGTTTGTACGAAGCCAGTTGTTT
+ILLUMINA-FFC6C4_0005:7:1:2045:1092#0/1
fffff_ffff[a\feaacafffffgfggcff]f]ae`beffadffafcf^[ac^dWd^abe`[d`_be_BBBBB
@ILLUMINA-FFC6C4_0005:7:1:2497:1094#0/1
GACGCTCACTCTCTCTGGTATAACTTCACCATCATTCATTTGCCCAGACATGGGCAACAGTGGTGTGAGGTCCA
+ILLUMINA-FFC6C4_0005:7:1:2497:1094#0/1
cbbd][dbdffffcbcccRa^ab^cff[d^Qaa^f[fefg_gcec`_f[Y^a^_ccaa_aZ`dYa[`Y``Y^Z[
@ILLUMINA-FFC6C4_0005:7:1:2730:1091#0/1
GGAACACACAGCTTCCCAGCTTTGGACAGTTGGTACAGCCTGAGGATGAGGGAAGCCAAGAACAAAAAACACCA
+ILLUMINA-FFC6C4_0005:7:1:2730:1091#0/1
ddaa`da`aacZa^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
07-31-2012, 07:02 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Quote:
Originally Posted by Lilach
I think that Hiseq output is already in the Sanger fastq format, but I'm not sure?

That string of BBBBBBBBB qualities in the fourth read looks like the PHRED Q2 marker for bad signal ('B' is ASCII 66, and 66 - 64 = 2), which means this is in the old Illumina encoding (ASCII offset 64), not the Sanger encoding (ASCII offset 33) used in the more recent Illumina pipelines. See:
http://news.open-bio.org/news/2010/0...q2-trim-fastq/
http://seqanswers.com/forums/showpos...91&postcount=3
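A quick way to check the encoding yourself is to look at the lowest quality character across a batch of reads. The Python sketch below is a heuristic, not a validator: it assumes that any character below ';' (ASCII 59) means the Sanger/Illumina 1.8+ offset of 33, and that a batch whose lowest character is '@' (ASCII 64) or higher uses the Illumina 1.3+/1.5+ offset of 64. The function names are hypothetical; the quality strings are the first and fourth reads from the first post.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset used by a batch of FASTQ quality strings.

    Heuristic only: characters below ';' (59) cannot occur with offset 64,
    and a lowest character of '@' (64) or higher points to Illumina 1.3+/1.5+.
    Returns None when the batch is ambiguous (59..63 also fits old Solexa scores).
    """
    lowest = min(ord(ch) for qual in quality_strings for ch in qual)
    if lowest < 59:
        return 33    # Sanger / Illumina 1.8+
    if lowest >= 64:
        return 64    # Illumina 1.3+ / 1.5+
    return None      # look at more reads before deciding


def decode_qualities(quality, offset):
    """Turn a quality string into a list of integer PHRED scores."""
    return [ord(ch) - offset for ch in quality]


# Quality lines of the first and fourth reads from the first post
quals = [
    r"faf\f_ccfffffSedaRe\dcdYffffcggg`g^ae`ca^RbaJ_VWW_Z\\Na`d``]a`bGM[UYVa`]`B",
    r"ddaa`da`aacZa^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",
]

offset = guess_phred_offset(quals)
print(offset)                                   # 64
print(decode_qualities(quals[1][-5:], offset))  # [2, 2, 2, 2, 2]
```

With offset 64, the lowercase 'f' and 'g' characters that dominate these reads decode to Q38 and Q39; read with offset 33 they would be Q69 and Q70, which is implausibly high and another sign that the file is not Sanger-encoded.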
07-31-2012, 09:14 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by Lilach
Hi,
I want to try MAQ (for the first time) for analysis of Illumina HiSeq human whole exome results, and I have two questions:

Do you have any particular reason to use MAQ? I think it is a bit outdated, and nowadays most people use bwa, bowtie (1 or 2) or similar. I think MAQ was designed when sequence files held a few million reads, so with the output of a HiSeq (hundreds of millions of reads) it might take ages and/or require a lot of memory.

Quote:
1. Is it ok to use hg19_chromFa.tar (from UCSC) as a reference, or should I run it again each chromosome individually ?

Not sure, but in general you don't want to split the reference sequence; otherwise you can't tell whether a read aligns equally well to different chromosomes (well, you can, but it would take more work downstream of the alignment, which I don't think pays off). To save time and parallelize, you can split the read (fastq) files instead, unless it is RNAseq data you have.
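Splitting the read files (not the reference) can be sketched in Python as below. This assumes the usual Illumina layout of exactly four lines per fastq record with no wrapped sequence lines; the function names, chunk size and output naming scheme are hypothetical. For paired-end data, split both mate files with the same chunk size so that mates stay in matching chunks, since bwa sampe pairs reads by their order in the two files.

```python
import itertools


def split_fastq(lines, reads_per_chunk):
    """Yield lists of fastq lines, each holding at most reads_per_chunk records.

    Assumes four lines per record (header, sequence, '+' line, qualities).
    """
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("input ends with a truncated fastq record")
        yield chunk


def write_chunks(fastq_path, reads_per_chunk, prefix):
    """Write chunk files named <prefix>_000.fastq, <prefix>_001.fastq, ..."""
    names = []
    with open(fastq_path) as handle:
        for i, chunk in enumerate(split_fastq(handle, reads_per_chunk)):
            name = f"{prefix}_{i:03d}.fastq"
            with open(name, "w") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```

For example, `write_chunks("reads_1.fastq", 5_000_000, "reads_1")` (hypothetical names) streams the input without loading it all into memory; each chunk can then be aligned on its own and the resulting alignments merged afterwards.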

Hope I'm not misunderstanding your question...

Best
Dario
07-31-2012, 09:53 AM   #4
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

MAQ's problem is that it's not a fast aligner. You want one of the aligners based on the Burrows-Wheeler Transform. That means Bowtie or bwa.

And yes, in general, you want to align to the whole reference in one go. These algorithms will always try to fit your reads to the reference they are given, so you want to give the program your whole reference. If a read aligns to Chr 6 perfectly, you don't want the software telling you it aligns to Chr 1 with two errors, but if you only give the software Chr 1 to align to, that's what it will do.
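Aligning against the whole genome at once means building one index from all chromosomes. If the UCSC download from the first post is an archive of per-chromosome FASTA files, a Python sketch like the one below can join them into a single multi-FASTA file to index. The function name, file names and the optional `keep` filter are hypothetical; depending on the release, the archive may also contain unplaced (`_random`, `chrUn`) and alternate-haplotype contigs, and whether to include those is a separate decision.

```python
import tarfile


def concat_chrom_fasta(tar_path, out_path, keep=lambda name: True):
    """Concatenate every *.fa member of a (possibly gzipped) tar into one FASTA.

    Members are written in name order; `keep` receives the bare file name and
    can drop contigs. Returns the list of file names that were written.
    """
    written = []
    # mode "r" lets tarfile detect gzip/bzip2 compression transparently
    with tarfile.open(tar_path, "r") as tar, open(out_path, "wb") as out:
        members = sorted(
            (m for m in tar.getmembers() if m.isfile() and m.name.endswith(".fa")),
            key=lambda m: m.name,
        )
        for member in members:
            base = member.name.rsplit("/", 1)[-1]
            if not keep(base):
                continue
            src = tar.extractfile(member)
            last = b"\n"
            while True:
                block = src.read(1 << 20)  # copy in 1 MiB pieces
                if not block:
                    break
                out.write(block)
                last = block[-1:]
            if last != b"\n":
                out.write(b"\n")  # keep the next '>' header on its own line
            written.append(base)
    return written
```

The resulting file (for example `hg19.fa`) is what you would then index once, e.g. with `bwa index -a bwtsw hg19.fa` or `maq fasta2bfa hg19.fa hg19.bfa`. Name order (chr1, chr10, chr11, ...) does not matter to the aligner; passing `keep=lambda n: "_" not in n` would keep only the primary chromosomes under UCSC naming.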
08-01-2012, 03:56 AM   #5
Lilach
Member
 
Location: Israel

Join Date: Sep 2011
Posts: 20

Thank you for the answers!
So I read a little, and it seems to be Illumina 1.5 fastq, because of the BBBBBBB strings.
Can I use BWA aln and sampe directly on these files, or should I reformat them to Sanger fastq?

Regarding MAQ: I wanted to compare its results to BWA. I already used BWA, but now I'm afraid the fastq qualities were not interpreted correctly.
08-01-2012, 05:43 PM   #6
cam.jack
Member
 
Location: Canberra

Join Date: Jun 2011
Posts: 11

I would have thought MAQ was totally outdated now. Does it even output SAM?

With Illumina 1.3+ to 1.7 quality encoding, you need to use the -I flag with bwa aln.
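The -I flag tells bwa aln to read qualities as ASCII minus 64. Without it, bwa assumes Sanger encoding and would read every quality in these files 31 points too high (for example 'B' as Q33 instead of Q2), which is the concern raised in the previous post. The alternative is to rewrite the files in Sanger encoding once; Biopython's `Bio.SeqIO.convert(in_file, "fastq-illumina", out_file, "fastq-sanger")` does this, and the dependency-free Python sketch below shows the same idea. It assumes four-line records and Illumina 1.3+/1.5+ input; the function names are hypothetical.

```python
def illumina_to_sanger_quality(qual):
    """Re-encode an Illumina 1.3+/1.5+ quality string (ASCII-64) as Sanger (ASCII-33)."""
    out = []
    for ch in qual:
        q = ord(ch) - 64
        if q < 0:
            # Old Solexa scores can be negative and need a log-odds conversion instead
            raise ValueError(f"quality character {ch!r} is outside the Illumina 1.3+ range")
        out.append(chr(q + 33))
    return "".join(out)


def convert_fastq(lines):
    """Yield fastq lines with the quality line (every fourth line) re-encoded."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield illumina_to_sanger_quality(line.rstrip("\r\n")) + "\n"
        else:
            yield line
```

To convert a whole file, something like `dst.writelines(convert_fastq(src))` with `src` and `dst` opened as text files would work; after that, bwa can be run without -I. Converting only shifts the encoding, so the Q2 'B' runs become '#' and still mark the same low-quality tails.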

All times are GMT -8.