#1 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
Hi,
I had my exomes sequenced by Genebygene in Houston, TX, and I have now received the data. But the exome data came as FASTQ files, and I can't use them. What do I have to do?
#2 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
|
fastq is a common format for sequencing data, and most alignment programs (like BWA and Bowtie) use fastq files as input.
|
#3 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
Do I have to download BWA?
|
#4 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
It doesn't run.
|
#5 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
http://seqanswers.com/wiki/How-to/exome_analysis
What on earth possessed you to do an exome experiment with no idea how to analyse the data? You do realise there are other sequencing providers that would do the alignment, variant calling, annotation and everything else for a modest extra cost on top of the capture and sequencing?
#6 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
I am not even able TO OPEN the exome data!
|
#7 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
You don't 'open' exome data. It's probably compressed fastq files, which end in a .gz extension. The compressed files are binary (the fastq inside is plain text), and you are meant to align them to a reference genome as your next step. And yes, that means using BWA, which works on Linux machines, just in case you were entertaining the idea of trying to do it in Windows.
|
#8 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
|
What kind of computer/operating system are you using?
If you need a crash course in basic Linux, see this tutorial: http://www.ee.surrey.ac.uk/Teaching/Unix/
Your fastq files are probably compressed. If they are of the format reads.fastq.gz, you need to do, at the terminal prompt,
$ gunzip reads.fastq.gz
If they are in fastq.tgz format,
$ tar xvzf reads.fastq.tgz
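A minimal sketch of looking inside a compressed FASTQ file without unpacking it, assuming a Unix-like shell with gzip installed. The function names `fastq_head` and `fastq_count` are made up for illustration, and `reads.fastq.gz` is the placeholder file name from the post. It relies on the usual FASTQ layout of four lines per read (header, sequence, `+`, qualities).

```shell
# Print the first N reads (4 lines each) of a gzip-compressed FASTQ file.
# Usage: fastq_head reads.fastq.gz 2
fastq_head() {
    # $1 = compressed FASTQ file, $2 = number of reads (default 1)
    gzip -cd "$1" | head -n $(( ${2:-1} * 4 ))
}

# Count the reads in a gzip-compressed FASTQ file,
# assuming exactly 4 lines per record.
# Usage: fastq_count reads.fastq.gz
fastq_count() {
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

Streaming through `gzip -cd` leaves the original .gz file untouched, which can matter because exome FASTQ files are typically several gigabytes once unpacked.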
#9 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
I'm using Windows Vista 32-bit and WinRAR.
Doesn't it run on Windows? Anyway, I'm looking forward to getting the variant results from the exome, too.
Last edited by Mr.Zurich1992; 05-04-2013 at 03:17 AM.
#10 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
|
A 32-bit machine won't even let you build the bwa index required to do the alignment, so no, you're not going to be able to analyse the data on a 32-bit Windows machine.
Please go and seek out a local bioinformatician who can help you. I've analysed thousands of exomes, and if you are doing it for the first time, you won't even get started without a 64-bit Linux machine with a few GB of RAM and a full week to dedicate to understanding how it all works.
#11 |
Senior Member
Location: Halifax, Nova Scotia Join Date: Mar 2009
Posts: 381
|
Google is a wonderful invention, you know?
|
#12 |
Member
Location: Switzerland Join Date: May 2013
Posts: 20
|
Isn't it possible to convert the FASTQ files to BAM files?
I ask because there is a good program called Bamseek. Bamseek is able to read the FASTQ files, but the result is a big mess.
#13 |
Senior Member
Location: Halifax, Nova Scotia Join Date: Mar 2009
Posts: 381
|
BAM is an alignment file...FASTQ is not
|
#14 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
|
Fastq files are human-readable text files once you unzip/untar them.
Aligners like bwa will output the alignments in bam (binary) or sam (the human-readable text equivalent of bam) format, and you can use samtools to convert between bam and sam.
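The sam/bam conversion mentioned above can be sketched as follows, assuming samtools is installed and on the PATH. The wrapper names `sam_to_bam` and `bam_to_sam` and the file names in the usage comments are hypothetical; the `samtools view` options `-b` (write BAM), `-h` (include the header) and `-o` (output file) are standard.

```shell
# Convert a SAM (text) file to BAM (binary).
# Note: old samtools 0.1.x releases also needed -S to read SAM input.
# Usage: sam_to_bam aln.sam aln.bam
sam_to_bam() {
    samtools view -b -o "$2" "$1"
}

# Convert a BAM file back to SAM, keeping the header lines (-h)
# so the result is a complete, readable SAM file.
# Usage: bam_to_sam aln.bam aln.sam
bam_to_sam() {
    samtools view -h -o "$2" "$1"
}
```

Neither direction turns FASTQ into BAM on its own: the reads have to be aligned first, and the aligner's output is what gets converted.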
#15 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
Look, you already paid a company to do the sequencing for you. They gave you back the raw data, right as it comes out of the machine, instead of running it through the usual analysis pipeline. You can do this yourself, but it will take you at least several weeks to learn how to do that. Why don't you simply pay the company to do the analysis for you, too, and to provide you with a list of differences between your sample and the mouse reference, if this is what you need?
|
#16 |
Junior Member
Location: UK Join Date: Mar 2012
Posts: 3
|
There are commercial software packages (e.g. CLCBio, DNAStar, NextGene) for analysing NGS data that take FASTQ format as input; just follow the manual.
|
#17 |
Member
Location: india Join Date: Jun 2013
Posts: 42
|
Hi Bukowski, mastal / simon andrews,
I am writing to you as you are experts and pioneers in NGS analysis.
1) I have whole-genome sequencing data of human lymphocytes, generated on an Illumina HiSeq 2000 (paired end, 2x100 bp read length).
2) I have run FastQC on my data and trimmed low-quality bases using the FASTX-Toolkit.
Q) I am stuck on how to go further with the analysis, as I am not sure how to perform indexing. What exactly does the indexing do? Do we need to consider the Chr un... files for indexing?
Kindly let me know the downstream analysis steps clearly. I will be very thankful to you.
Vishnu.
#18 | |
Member
Location: Germany Join Date: Nov 2011
Posts: 27
|
The indexing itself is not that important for you; it is just an algorithm to store your reference in a nice data structure. Every aligner uses its own indexing strategy based on its alignment strategy.
What you want to do next is the alignment/mapping of your trimmed reads to a reference. For this you need a program like STAR, TopHat, Bowtie 1/2, BWA, .... So, you have to decide which mapping program to use and which reference. It is up to you and your biological questions whether you want to include unplaced contigs (chr_Un..., UCSC) in your reference.
Finally you end up with a folder of FASTA files or a single FASTA file fully describing your reference chromosomes/contigs. This FASTA has to be converted to an index suitable for your mapping program. Most often it should take only a single command to do this conversion step. For instance, in Bowtie you simply type:
Code:
bowtie-build REF_FASTA IDX_FOLDER/IDX_NAME
For every mapping with Bowtie against this reference you have to provide the path to this index, as it is in fact just another representation of your reference FASTA.
Hope this helps
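The index-then-map order described above can be sketched like this, assuming Bowtie 1 is installed and on the PATH. The function names `build_index` and `map_reads` and all file names are hypothetical, and the mapping call uses the classic Bowtie 1 argument order (index, reads, output), so check `bowtie --help` for your version. Bowtie 2 uses the separate `bowtie2-build` and `bowtie2` programs with different arguments.

```shell
# Build a Bowtie index from a reference FASTA.
# $1 = reference FASTA, $2 = index prefix (e.g. IDX_FOLDER/IDX_NAME)
# Usage: build_index REF_FASTA IDX_FOLDER/IDX_NAME
build_index() {
    # bowtie-build writes several files starting with the prefix,
    # so make sure the target folder exists first.
    mkdir -p "$(dirname "$2")"
    bowtie-build "$1" "$2"
}

# Map single-end reads against an existing index, writing SAM (-S).
# $1 = index prefix (the same one given to build_index)
# $2 = FASTQ file with reads, $3 = output SAM file
# Usage: map_reads IDX_FOLDER/IDX_NAME reads.fastq aligned.sam
map_reads() {
    bowtie -S "$1" "$2" "$3"
}
```

The index only needs to be built once per reference; every later mapping run reuses the same prefix.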
|
#19 |
Member
Location: india Join Date: Jun 2013
Posts: 42
|
Hi hanshart,
Thank you very much for your instant response, and I apologize for my delayed reply. I have now finished the indexing and alignment of my data (whole-genome sequence of human lymphocytes) using BWA. I am interested in looking for SNPs and, if possible, structural variation (indels, CNVs). Other than SAMtools, what other software tools may be required for further downstream analysis? Kindly let me know.
Thank you,
Vishnu.
#20 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
|
Other popular tools are GATK from the Broad Institute for SNP finding/genotype calling, and Dindel or Pindel for finding indels.
|