SEQanswers

05-04-2013, 01:32 AM   #1
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20
How to read FASTQ files?

Hi


I had my exome sequenced by Genebygene in Houston, TX, and I have now received the exome data. However, the data is in FASTQ files, which I can't use. What do I have to do?
05-04-2013, 01:38 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

fastq is a common format for sequencing data, and most alignment programs (like BWA and Bowtie) use fastq files as input.
05-04-2013, 01:44 AM   #3
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20

Do I have to download BWA?
05-04-2013, 01:55 AM   #4
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20

It doesn't run.
05-04-2013, 01:58 AM   #5
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 379

http://seqanswers.com/wiki/How-to/exome_analysis

What on earth possessed you to do an exome experiment with no idea how to analyse the data? You do realise there are other sequencing providers that would do the alignment, variant calling, annotation and everything else for a modest extra cost on top of the capture and sequencing?
05-04-2013, 02:02 AM   #6
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20

I am not even able TO OPEN the exome data!
05-04-2013, 02:05 AM   #7
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 379

You don't 'open' exome data. They are probably compressed FASTQ files, which end in a .gz extension. These files are binary files, and you are meant to align them to a reference genome as your next step. And yes, that means using BWA, which works on Linux machines, just in case you were entertaining the idea of trying to do it in Windows.
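Because a .gz file is compressed binary data, opening it in a text editor only shows gibberish. A minimal Python sketch (standard library only; the demo file name is hypothetical) that checks whether a file really is gzip-compressed by looking for the two-byte gzip magic number 0x1f 0x8b at its start:

```python
import gzip
import os
import tempfile

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Write a tiny compressed FASTQ file to a temporary folder, then check it.
    demo = os.path.join(tempfile.mkdtemp(), "example.fastq.gz")
    with gzip.open(demo, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    print(is_gzipped(demo))  # True
```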
05-04-2013, 02:12 AM   #8
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

What kind of computer/operating system are you using?

If you need a crash course in basic linux, see this tutorial:
http://www.ee.surrey.ac.uk/Teaching/Unix/

Your fastq files are probably compressed.
If they are named like reads.fastq.gz, type the following
at the terminal prompt:

$ gunzip reads.fastq.gz

If they are in .fastq.tgz format (a gzip-compressed tar archive), type:

$ tar xvzf reads.fastq.tgz
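On Windows, where gunzip and tar may not be available, the same two steps can be sketched with Python's standard gzip, shutil and tarfile modules. This is an illustrative sketch, not a tool mentioned in the thread; the file names in the demo are hypothetical.

```python
import gzip
import os
import shutil
import tarfile
import tempfile


def gunzip_file(src: str, dest: str) -> None:
    """Decompress a .gz file, like `gunzip`, streaming so large files never sit in RAM."""
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def extract_tgz(src: str, dest_dir: str) -> list:
    """Extract a .tgz / .tar.gz archive, like `tar xvzf`, and return the member names."""
    with tarfile.open(src, "r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, '..' tricks, device files, etc.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return names


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory.
    work = tempfile.mkdtemp()
    gz_path = os.path.join(work, "reads.fastq.gz")
    with gzip.open(gz_path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    gunzip_file(gz_path, os.path.join(work, "reads.fastq"))
    with open(os.path.join(work, "reads.fastq")) as handle:
        print(handle.readline().strip())  # @read1
```

On real downloads you would call, for example, gunzip_file("reads.fastq.gz", "reads.fastq") or extract_tgz("reads.fastq.tgz", "."), substituting your own file names. Streaming with shutil.copyfileobj avoids loading a multi-gigabyte FASTQ file into memory at once.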
05-04-2013, 02:14 AM   #9
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20

I'm using Windows Vista 32-bit and WinRAR.

Doesn't it run on Windows?

Anyway, I'm also looking forward to getting the variant results from my exome.

Last edited by Mr.Zurich1992; 05-04-2013 at 02:17 AM.
05-04-2013, 02:21 AM   #10
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 379

A 32-bit machine won't even let you build the BWA index required to do the alignment, so no, you're not going to be able to analyse the data on a 32-bit Windows machine.

Please go and seek out a local bioinformatician who can help you. I've analysed thousands of exomes, and if you are doing it for the first time, you won't even get started without a 64-bit Linux machine with a few GB of RAM and a full week dedicated to understanding how it all works.
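The 32-bit limit comes down to simple arithmetic: a 32-bit process can address at most 2^32 bytes (4 GiB) in total, and 32-bit Windows gives each process less than that by default. A small Python sketch for checking whether the Python interpreter you are running is a 32-bit or 64-bit build; note that it reports the interpreter, which can be 32-bit even on a 64-bit OS.

```python
import platform
import struct


def pointer_bits() -> int:
    """Return 32 or 64: the pointer width of the running Python build."""
    return struct.calcsize("P") * 8


def max_address_space_gib(bits: int) -> float:
    """Upper bound on the memory a process of this width can address, in GiB."""
    return 2 ** bits / 2 ** 30


if __name__ == "__main__":
    bits = pointer_bits()
    print(f"{bits}-bit Python on {platform.system()} ({platform.machine()})")
    print(f"Address-space ceiling: {max_address_space_gib(bits):,.0f} GiB")
```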
05-04-2013, 02:46 AM   #11
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

Google is a wonderful invention, you know?
05-04-2013, 03:12 AM   #12
Mr.Zurich1992
Member
 
Location: Switzerland

Join Date: May 2013
Posts: 20

Isn't it possible to convert the FASTQ files to BAM files?

Because there is a good program called Bamseek. Bamseek is able to read FASTQ files, but the result is a big mess.
05-04-2013, 03:40 AM   #13
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

BAM is an alignment format; FASTQ is not.
05-04-2013, 04:29 AM   #14
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

fastq files are human-readable text files if you unzip/untar them.
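Once decompressed, each FASTQ record normally spans four lines: an '@' line with the read name, the bases, a '+' separator line, and a quality string with one character per base. A minimal Python reader, assuming the common four-line layout and Phred+33 quality encoding (used by Sanger and by Illumina pipelines from version 1.8 onward; older Illumina output used an offset of 64):

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading '@'
    sequence: str  # the bases, e.g. "ACGT"
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters into Phred scores (Q = ASCII code - offset)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII#5\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, phred_scores(record.quality))
        # read1 ACGT [40, 40, 2, 20]
```

To read a compressed file directly, pass gzip.open("reads.fastq.gz", "rt") instead of the StringIO (file name hypothetical).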

Aligners like BWA write their alignments in SAM format (human-readable text); BAM is the binary equivalent of SAM, and you can use samtools to convert between BAM and SAM.
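In SAM text, header lines start with '@', and each alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. A short Python sketch that splits one alignment line into those fields; the example line is made up for illustration:

```python
from typing import NamedTuple, Tuple


class SamAlignment(NamedTuple):
    qname: str              # read name
    flag: int               # bitwise flags, e.g. 4 = unmapped, 16 = reverse strand
    rname: str              # reference sequence (chromosome) name
    pos: int                # 1-based leftmost mapping position
    mapq: int               # mapping quality
    cigar: str              # alignment operations, e.g. "50M2I48M"
    rnext: str              # reference name of the mate
    pnext: int              # position of the mate
    tlen: int               # observed template length
    seq: str                # read bases
    qual: str               # base qualities
    tags: Tuple[str, ...]   # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamAlignment:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(fields)}")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


if __name__ == "__main__":
    line = "read1\t0\tchr1\t10468\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    aln = parse_sam_line(line)
    print(aln.rname, aln.pos, aln.cigar, (aln.flag & 16) != 0)  # chr1 10468 4M False
```

For a BAM file, samtools view prints the alignments as SAM text lines, which could be passed to this function one at a time.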
05-06-2013, 12:47 AM   #15
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993

Look, you already paid a company to do the sequencing for you. They gave you back the raw data, right as it comes out of the machine, instead of running it through the usual analysis pipeline. You can do this yourself, but it will take you at least several weeks to learn how. Why don't you simply pay the company to do the analysis for you, too, and to provide you with a list of differences between your sample and the reference genome, if this is what you need?
05-07-2013, 04:31 PM   #16
nkaushik
Junior Member
 
Location: UK

Join Date: Mar 2012
Posts: 3

There are commercial software packages, e.g. CLCBio, DNAStar and NextGene, for analysing NGS data that accept FASTQ files as input; just follow the manual.
06-30-2013, 02:27 AM   #17
vishnuamaram
Member
 
Location: india

Join Date: Jun 2013
Posts: 42

Hi Bukowski, mastal / Simon Anders,

I am writing to you as you are experts and pioneers in NGS analysis.

1) I have whole-genome sequencing data of human lymphocytes, generated on an Illumina HiSeq 2000 (paired-end, 2x100 bp read length).

2) I ran FastQC on my data and trimmed low-quality bases using the FASTX-Toolkit.
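Quality trimming of the kind done with the FASTX-Toolkit removes low-quality bases from the ends of reads. The sketch below is a simplified Python illustration of that idea, not the FASTX code itself: it removes trailing (3') bases whose Phred+33 score falls below a threshold and discards reads that become too short. The default threshold of 20 and minimum length of 30 are arbitrary example values.

```python
from typing import Optional, Tuple


def trim_3prime(seq: str, qual: str, threshold: int = 20,
                min_length: int = 30, offset: int = 33) -> Optional[Tuple[str, str]]:
    """Remove trailing bases whose Phred score is below `threshold`.

    Returns the trimmed (sequence, quality) pair, or None if fewer than
    `min_length` bases survive. Assumes Phred+33 quality encoding by default.
    """
    end = len(seq)
    # Walk backwards from the 3' end while the base quality is too low.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end < min_length:
        return None  # read too short after trimming: drop it
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' = Q40 and '#' = Q2, so the last three bases are trimmed.
    print(trim_3prime("ACGTACGTAC", "IIIIIII###", min_length=5))
    # ('ACGTACG', 'IIIIIII')
```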

Q) I am stuck and can't go further with the analysis, as I am not sure how to perform indexing.
--> What does indexing do exactly? Do we need to consider the Chr un... files for indexing?
Kindly let me know the downstream analysis steps clearly.
I will be very thankful to you.

Vishnu.
06-30-2013, 04:32 AM   #18
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27

Quote:
Originally Posted by vishnuamaram
...
Q) I am stuck and can't go further with the analysis, as I am not sure how to perform indexing.
--> What does indexing do exactly? Do we need to consider the Chr un... files for indexing?
Kindly let me know the downstream analysis steps clearly.
I will be very thankful to you.
Vishnu.
Hi Vishnu,
The indexing itself is not that important for you; it is just an algorithm to store your reference in a nice data structure. Every aligner uses its own indexing strategy based on its alignment strategy.
What you want to do next is the alignment/mapping of your trimmed reads to a reference. For this you need a program like STAR, TopHat, Bowtie 1/2, BWA, ... So you have to decide which mapping program and which reference to use. It is up to you and your biological questions whether you want to include unplaced contigs (chr_Un..., UCSC) in your reference. Finally you end up with a folder of FASTA files or a single FASTA file fully describing your reference chromosomes/contigs. This FASTA has to be converted into an index suitable for your mapping program. Usually this conversion is a single command. For instance, in Bowtie you simply type:
Code:
bowtie-build REF_FASTA IDX_FOLDER/IDX_NAME
where REF_FASTA is the path to your reference FASTA (files).
For every mapping with Bowtie against this reference you have to provide the path to this index, as it is in fact just another representation of your reference FASTA.
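To make "a nice data structure" more concrete: Bowtie and BWA both build indexes based on the Burrows-Wheeler transform (BWT), queried as an FM-index. The toy Python sketch below builds the BWT of a short string by sorting its rotations and counts pattern occurrences with FM-index backward search. Real aligners use far more memory-efficient construction and sampled suffix arrays to report match positions, so this only illustrates the idea and is not how either tool is implemented.

```python
from collections import Counter
from typing import Dict, List


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs only)."""
    text += "$"  # unique end marker; '$' sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Minimal FM-index that counts pattern occurrences by backward search."""

    def __init__(self, reference: str) -> None:
        self.last = bwt(reference)
        counts = Counter(self.last)
        # first[c]: how many characters of the text sort strictly before c
        self.first: Dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: how many times c appears in self.last[:i]
        self.occ: Dict[str, List[int]] = {ch: [0] * (len(self.last) + 1) for ch in counts}
        for i, current in enumerate(self.last):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + (ch == current)

    def count(self, pattern: str) -> int:
        """Return how many times `pattern` occurs in the reference."""
        lo, hi = 0, len(self.last)  # interval of sorted rotations still matching
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("ACGTACGTTACG")
    print(index.count("ACG"), index.count("TAC"), index.count("GGG"))  # 3 2 0
```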
Hope this helps
07-31-2013, 10:25 PM   #19
vishnuamaram
Member
 
Location: india

Join Date: Jun 2013
Posts: 42

Hi hanshart,

Thank you very much for your quick response, and I apologize for my delayed reply.

I have now finished indexing and aligning my data (whole-genome sequence of human lymphocytes) using BWA.

I am interested in looking for SNPs and, if possible, structural variations: indels and CNVs.

Other than SAMtools, what other software tools may be required for further downstream analysis? Kindly let me know.

Thank you,
Vishnu.
08-01-2013, 12:20 AM   #20
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Other popular tools are GATK from the Broad Institute for SNP finding/genotype calling, and Dindel or Pindel for finding indels.
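Dedicated callers such as GATK, Dindel and Pindel use much more evidence and statistics than this, but the raw signal for an indel in a BWA alignment is visible in the CIGAR field of each SAM line: I marks bases inserted relative to the reference, and D marks reference bases missing from the read. A toy Python sketch, assuming a valid CIGAR string and the 1-based POS from the same SAM line, that lists the indels a single alignment implies:

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '50M2I48M' into (length, operation) pairs."""
    if cigar == "*":  # '*' means the CIGAR is unavailable
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return ops


def find_indels(pos: int, cigar: str) -> List[Tuple[str, int, int]]:
    """Return ('I' or 'D', reference position, length) for each indel in one alignment.

    `pos` is the 1-based POS field of the SAM line. An insertion is reported at
    the reference position immediately after the inserted bases.
    """
    indels = []
    ref_pos = pos
    for length, op in parse_cigar(cigar):
        if op in ("I", "D"):
            indels.append((op, ref_pos, length))
        if op in CONSUMES_REFERENCE:
            ref_pos += length
    return indels


if __name__ == "__main__":
    print(find_indels(100, "10M2I5M3D20M"))  # [('I', 110, 2), ('D', 115, 3)]
```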

All times are GMT -8.