SEQanswers

FASTQ to SAM conversion (http://seqanswers.com/forums/showthread.php?t=5325)

albrown415 06-01-2010 11:22 AM

FASTQ to SAM conversion
 
What is the best program to use for converting fastq (or eland extended) files to SAM format? Thanks!

nilshomer 06-01-2010 11:46 AM

Quote:

Originally Posted by albrown415 (Post 19470)
What is the best program to use for converting fastq (or eland extended) files to SAM format? Thanks!

Do you want to align the data first, or just represent the FASTQ data in the SAM format? Is the data paired end (mate-pair)?

albrown415 06-02-2010 08:20 AM

You're right. This was a silly question. Most people map the reads at this point and choose an alignment program that outputs the data into the proper format. I'm working to get Bowtie running on my computer, which I believe should be able to input fastq and output SAM.

maubp 06-04-2010 12:28 PM

If you did want to convert the FASTQ to an unaligned SAM or BAM file, try this:
http://picard.sourceforge.net/comman...tml#FastqToSam
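
For readers who only want to see what an unaligned SAM record looks like, here is a minimal sketch in shell/awk. It is not how Picard's FastqToSam works internally. The function name fastq_to_unmapped_sam is made up for illustration, and the sketch assumes single-end reads, four-line FASTQ records (no wrapped sequences) and qualities already in Phred+33 encoding. Each read is written with FLAG 4 (unmapped) and placeholder values in the alignment columns.

```shell
# fastq_to_unmapped_sam: hypothetical helper, not part of Picard or samtools.
# Reads single-end, four-line FASTQ records on stdin and writes unmapped SAM
# records on stdout. Assumes Phred+33 qualities and unwrapped sequences.
fastq_to_unmapped_sam() {
    awk '
        BEGIN { OFS = "\t"; print "@HD", "VN:1.6", "SO:unsorted" }
        NR % 4 == 1 { name = substr($1, 2) }   # drop the leading @ and any comment after the first space
        NR % 4 == 2 { seq = $0 }               # sequence line
        NR % 4 == 0 { print name, 4, "*", 0, 0, "*", "*", 0, 0, seq, $0 }   # FLAG 4 = read unmapped
    '
}
```

Typical use would be something like `fastq_to_unmapped_sam < reads.fastq | samtools view -bS - > reads.unmapped.bam` (the file names are placeholders). Paired-end data needs different flags for each mate, so for anything beyond a quick look a dedicated tool such as FastqToSam is the safer choice.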

albrown415 06-04-2010 01:59 PM

Thanks. That's great to know.

ebatis2 01-04-2012 07:03 AM

Hi,

I've received results from my first NGS run and like you posted, I'd like to convert my fastq file to a SAM file in order to upload and retrieve data from Galaxy. I'll need to map the reads and align them to the Rice genome...but is this something I could do on my MacOSX? I'm at a loss as far as how to retrieve the sequencing results! Any help would be greatly appreciated!!

husamia 01-04-2012 11:55 AM

Quote:

Originally Posted by ebatis2 (Post 60881)

FASTQ holds the raw reads with their qualities. SAM is a format that describes reads and their alignments. This was hinted at in the previous responses above. It seems you have to align the reads first if you have just received the raw FASTQ. Perhaps you should be asking how to align your reads. I can't help much because I am not sure what your goal is. Galaxy has a tutorial on how to align your reads and produce a SAM file. Check it out.

amolkolte 12-10-2012 10:00 PM

Quote:

Originally Posted by husamia (Post 60902)

I have the raw FASTQ reads, and to perform de novo assembly with transAbySS I need to provide the input as BAM or SAM. Can you please shed some light on this?

TheRob 12-11-2012 05:23 PM

Hi amolkolte,

It is rather unusual for an assembly program to accept SAM/BAM input but not FASTQ. I suspect it accepts FASTQ, but I don't have any experience with transAbyss. Anyway, the only tool I know of that will do the job (short of an awk or Perl script, which can be dangerous) was mentioned above by maubp: FastqToSam.

Why are you using transAbyss exactly?

Stroehli 12-13-2012 11:57 PM

Hi,
I think you cannot run transAbyss on its own. Taking a short look at the manual, (http://www.bcgsc.ca/downloads/trans-...v1.2.0.doc.pdf) I figured you probably need to run Abyss first (see "Data Preparation" in the Workflow on page 7). Plus you might have to install all the external software mentioned in "Installation, 2. External Software" (page 5). Abyss will produce contigs (.fa) and the other aligners will produce the SAM/BAM files for you, so you don't have to convert them, if I got that right. Hope it helps.

Cheers,
Stroehli

amolkolte 01-24-2013 01:57 AM

Thanks TheRob and Stroehli !!

TheRob - I was using transAbyss for de novo assembly, since I don't have a concrete reference to begin with.

Stroehli - I have used Abyss to assemble the contigs. Thank you.

kurban910 08-05-2014 08:54 PM

fastq to sam
 
I have a raw reads dataset in FASTQ format, and I want to use it to find SNPs in our transcriptome data. After some searching I found that I can do this with Samtools and SOAPsnp. Am I right? And before I use them I need to convert my raw reads from FASTQ to SAM format, right?
So I installed Java, samtools and Picard tools on my Ubuntu 12.04 (I mention this because I am new to Linux, so any suggestions would be appreciated). Then I typed this command in the terminal:
java -Xmx2g -jar FastqToSam.jar FASTQ=CD_ATGTCA_L007_R1_001.fastq.gz FASTQ2=CD_ATGTCA_L007_R2_001.fastq.gz OUTPUT=outputfile.sam PREDICTED_INSERT_SIZE=null QUALITY_FORMAT=Solexa SAMPLE_NAME=file4

Then I got this:
Error: Unable to access jarfile FastqToSam.jar

I do not know what is going on.
I guess many people here have done this before, so could anyone please share their knowledge? :)

ajagannath.patro 08-05-2014 09:02 PM

Java looks for FastqToSam.jar in your current directory unless you tell it otherwise. Try giving the complete path to the jar where it is installed. That should work.
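
A small sketch of how the full path could be located and then passed to java. The helper name find_jar and the search directory are assumptions for illustration only; adjust them to wherever Picard was unpacked on your machine.

```shell
# find_jar: hypothetical helper. Prints the path of the first file with the
# given name found under the given directory; returns non-zero if none is found.
find_jar() (
    dir=$1
    name=$2
    path=$(find "$dir" -type f -name "$name" 2>/dev/null | head -n 1)
    [ -n "$path" ] && printf '%s\n' "$path"
)

# Example use (the search directory is a placeholder):
#   jar=$(find_jar "$HOME" FastqToSam.jar) && java -Xmx2g -jar "$jar"
# followed by the same FASTQ=, FASTQ2=, OUTPUT= and SAMPLE_NAME= arguments as before.
```

If find_jar prints nothing, the jar is not under that directory, which would also explain the "Unable to access jarfile" message.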

WhatsOEver 08-05-2014 11:40 PM

Quote:

Originally Posted by kurban910 (Post 146840)

SAM is the abbreviation for Sequence Alignment/Map format, which tells you that it should contain aligned/mapped reads. Although it is possible to create a kind of unmapped SAM file from FASTQ, that will not help answer your question.

My suggestion: Make yourself familiar with read alignment via tophat (the software is here: http://ccb.jhu.edu/software/tophat/tutorial.shtml; the paper is here: http://www.nature.com/nprot/journal/....2012.016.html) and samtools in general (I suggest Dave Tang's brief wiki: http://davetang.org/wiki/tiki-index.php?page=SAMTools) and samtools mpileup in particular (http://samtools.sourceforge.net/mpileup.shtml)

dpryan 08-06-2014 12:34 AM

Since you mention SNP calling, you'll want to use a tool like BWA or bowtie2 rather than tophat for alignment. Aside from that, I'm in agreement with WhatsOEver.

WhatsOEver 08-06-2014 12:51 AM

Quote:

Originally Posted by dpryan (Post 146862)
Since you mention SNP calling, you'll want to use a tool like BWA or bowtie2 rather than tophat for alignment. Aside from that, I'm in agreement with WhatsOEver.

Agreed :)
I mentioned tophat as it has imo the best documentation and is therefore the easiest to start with. But in the end, it might not be the best solution here.

kurban910 08-06-2014 02:36 AM

I am grateful for your replies, guys. Thanks.

kurban910 08-06-2014 08:25 AM

Hello guys!
I have downloaded bwa-0.7.10 and uncompressed it, then ran:
kurban@kurban-X550VC:~/Downloads/bwa-0.7.10$ make

It showed this:
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS utils.c -o utils.o
utils.c:33:18: fatal error: zlib.h: No such file or directory
compilation terminated.
make: *** [utils.o] Error 1
What is happening there?

maubp 08-06-2014 08:30 AM

You (or your SysAdmin) need to install the zlib library including the development files (header files like zlib.h).
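
A quick way to check whether the header is already present before running make again. This is a sketch only: the helper name has_header is made up, and the default directories are just the usual Linux include locations. On Ubuntu and Debian the development files come from the zlib1g-dev package.

```shell
# has_header: hypothetical helper. Succeeds and prints the path if the named
# header exists in one of the listed include directories.
# With no directories given, it checks two common Linux locations.
has_header() (
    name=$1
    shift
    [ $# -gt 0 ] || set -- /usr/include /usr/local/include
    for dir in "$@"; do
        [ -f "$dir/$name" ] && { printf '%s\n' "$dir/$name"; exit 0; }
    done
    exit 1
)

# Example use:
#   has_header zlib.h || echo "zlib headers missing; on Ubuntu/Debian: sudo apt-get install zlib1g-dev"
```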

kurban910 08-06-2014 09:03 AM

thank you, after installing zlib it worked

kurban910 08-07-2014 06:42 AM

Command in the terminal:
kurban@kurban-X550VC:~/Downloads/bwa-0.7.10$ bwa mem gene.fa CD_ATGTCA_L007_R1_001.fastq CD_ATGTCA_L007_R1_002.fastq > aln-pe1.sam
and it shows:
[main] unrecognized command 'mem'

Is this a problem with the bwa version, or something else? Any suggestions would be appreciated.

maubp 08-07-2014 07:02 AM

Probably - which version of bwa are you using? Note your command did NOT run a local copy of bwa in the current folder, but the system default via the $PATH setting.

Brian Bushnell 08-07-2014 08:40 AM

Adding a "./" might fix it, if the bwa executable is in that directory.
./bwa mem gene.fa CD_ATGTCA_L007_R1_001.fastq CD_ATGTCA_L007_R1_002.fastq > aln-pe1.sam
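
To see which copy of a program a bare command name resolves to, the shell builtin `command -v` can be used. The wrapper below is only an illustrative sketch; the name resolve_cmd is made up.

```shell
# resolve_cmd: hypothetical helper. Prints the path the shell would run for a
# bare command name, or reports that nothing by that name is on $PATH.
resolve_cmd() {
    command -v "$1" || { echo "$1: not found on PATH" >&2; return 1; }
}

# Example use:
#   resolve_cmd bwa    # prints the copy found via $PATH, e.g. a system-wide install
#   ./bwa              # runs the freshly built copy in the current directory instead
```

Running bwa with no arguments should also print a usage message that includes a Version line, which makes it easy to compare the system copy with the newly built one.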

kurban910 08-07-2014 10:02 AM

Thank you guys, it seems it was a problem with an old version of bwa. But I have another question:
I have five pairs of raw read files, all in FASTQ format, as below:
CD_ATGTCA_L007_R1_001.fastq
CD_ATGTCA_L007_R1_002.fastq
CD_ATGTCA_L007_R1_003.fastq
CD_ATGTCA_L007_R1_004.fastq
CD_ATGTCA_L007_R1_005.fastq
CD_ATGTCA_L007_R2_001.fastq
CD_ATGTCA_L007_R2_002.fastq
CD_ATGTCA_L007_R2_003.fastq
CD_ATGTCA_L007_R2_004.fastq
CD_ATGTCA_L007_R2_005.fastq

They are the paired-end raw reads of the insects we sequenced. If I run the alignment for each pair with
$ bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
I would get five SAM files, right? But somewhere along the way to SNP calling, should I combine these files into one? If so, at which step should I do that, and how?

Brian Bushnell 08-07-2014 10:11 AM

If they are all the same library, I would cat them first.
cat CD_ATGTCA_L007_R1_*.fastq > r1.fq
cat CD_ATGTCA_L007_R2_*.fastq > r2.fq

Then map. You could concatenate them after mapping, also, if you strip the headers, but this is simpler.

GenoMax 08-07-2014 10:11 AM

You should "cat" the R1 files together into a single file, and do the same for the R2 files. Then use those two files to do the alignment. Keep the order of the files intact (1 --> 5) as you cat them, so that the reads in the two merged files stay paired.
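
A sketch of the merge with a sanity check on the read counts. The helper names merge_lanes and count_reads are made up for illustration, and the sketch assumes uncompressed four-line FASTQ records. The shell expands a glob in sorted order, so the zero-padded chunk numbers (_001 to _005) come out in the same order for R1 and R2.

```shell
# count_reads: number of records in an uncompressed, four-line-per-record FASTQ file.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# merge_lanes: hypothetical helper. Concatenates the given FASTQ chunks, in the
# order given, into the output file named by the first argument.
merge_lanes() (
    out=$1
    shift
    cat "$@" > "$out"
)

# Example use with the file names from this thread:
#   merge_lanes r1.fq CD_ATGTCA_L007_R1_*.fastq
#   merge_lanes r2.fq CD_ATGTCA_L007_R2_*.fastq
#   [ "$(count_reads r1.fq)" -eq "$(count_reads r2.fq)" ] || echo "R1/R2 read counts differ" >&2
```

Equal counts do not prove that the mates are in the same order, but a mismatch is a clear sign that a chunk is missing or was listed twice.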

kurban910 08-07-2014 10:29 AM

Thank you guys, I have really learned a lot. It's already midnight here, so I will do that first thing in the morning. See you.

kurban910 08-08-2014 03:25 AM

Hello!
When I tried to filter SNPs by typing this command in the terminal:
bcftools view my.var.bcf | vcfutils.pl varFilter - > my.var-final.vcf

it showed this:
open: No such file or directory
vcfutils.pl: command not found

Then I found the location of vcfutils.pl:
$ locate vcfutils.pl
/usr/share/samtools/vcfutils.pl

Then I typed this command in the terminal:
$ bcftools view my.var.bcf | /usr/share/samtools/vcfutils.pl varFilter - > my.var-final.vcf
It still gives me this:
open: No such file or directory

I checked, and the file vcfutils.pl is there. I even ran the command this way:
$ bcftools view my.var.bcf | perl /usr/share/samtools/vcfutils.pl varFilter - > my.var-final.vcf
It still gave me this:
open: No such file or directory

What should I change in the command line this time?

GenoMax 08-08-2014 03:45 AM

Is the bcftools executable in the directory you are currently in? Have you tried to "locate" it and provide the full path for it like you did for vcfutils.pl?

Is your my.var.bcf file in the current directory?

In future, start a new thread when you have a new question.

blakeoft 08-08-2014 03:49 AM

kurban910, are you sure that you have the right name for the bcf file? It looks to me like the command is correct.

kurban910 08-08-2014 04:18 AM

Quote:

Originally Posted by GenoMax (Post 147102)

1. bcftools can be executed from my current directory:

kurban@kurban-X550VC:~/Desktop/SNPs/CD$ bcftools

Program: bcftools (Tools for data in the VCF/BCF formats)
Version: 0.1.17-dev (r973:277)

Usage: bcftools <command> <arguments>

Command: view      print, extract, convert and call SNPs from BCF
         index     index BCF
         cat       concatenate BCFs
         ld        compute all-pair r^2
         ldpair    compute r^2 between requested pairs


2. I have tried providing the full path to bcftools:
kurban@kurban-X550VC:~/Desktop/SNPs/CD$ /usr/bin/bcftools view my.var.bcf | /usr/share/samtools/vcfutils.pl varFilter - > my.var-final.vcf
It says:
open: No such file or directory
and the file my.var.bcf is in my current directory.

It is still not working.

GenoMax 08-08-2014 04:27 AM

Are you sure bcftools is in your current directory (~/Desktop/SNPs/CD from what I can see above)? You seem to be running the copy that is in /usr/bin (in the set of commands in #2).

Have you verified that my.var.bcf file is non-zero bytes (i.e. there is something in it)?

Can you use [ CODE] (remove the leading space before CODE) put your commands here [/CODE] to make the commands you are pasting clear (enclose them in CODE tags on both sides like I have shown). Otherwise it is difficult to decipher if there are spaces in wrong spot in your command line.

blakeoft 08-08-2014 04:27 AM

kurban910, I just ran
Code:

bcftools view jim.bcf
and I don't have any file called jim.bcf. It gave me this:
Code:

open: No such file or directory
This makes me think that you have the wrong name. Be careful with 1's and l's, -'s and _'s, etc -- it's easy to get these confused. Heck, I even get .'s and _'s confused sometimes. In the directory that the bcf file is located, can you execute:
Code:

ls $PWD/*.bcf
and copy the bcf file's name exactly as it appears, including the full path, and then try running your command again?
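
The two checks suggested in this part of the thread (does the file exist under exactly that name, and is it non-empty) can be combined into one small function. This is a sketch; the name check_input is made up, and it says nothing about whether the file is a valid BCF.

```shell
# check_input: hypothetical helper. Returns 1 if the path does not exist,
# 2 if it exists but is empty, and otherwise lists the file with its size.
check_input() {
    if [ ! -e "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "empty: $1" >&2
        return 2
    fi
    ls -l "$1"
}

# Example use before running the pipeline:
#   check_input "$PWD/my.var.bcf" && bcftools view "$PWD/my.var.bcf" | head
```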

kurban910 08-08-2014 04:37 AM

Quote:

Originally Posted by blakeoft (Post 147103)
kurban910, are you sure that you have the right name for the bcf file? It looks to me like the command is correct.

Yes, I am sure the file name is right :(

kurban910 08-08-2014 04:44 AM

Quote:

Originally Posted by GenoMax (Post 147107)

You are right, thanks. I will put the commands in a code box next time. And yes, the my.var.bcf file is around 7.6 MB.

As for your first question, bcftools is not in my current directory, but it can be executed from here.

kurban910 08-08-2014 05:28 AM

Quote:

Originally Posted by blakeoft (Post 147108)

Thank you for your time, guys. I have now found that it is not just this one: other commands are also not executed in my terminal. Ubuntu 12.04 sometimes isn't stable. After I reinstall the system I will try again.

maubp 09-18-2015 07:04 AM

Quote:

Originally Posted by maubp (Post 19694)
If you did want to convert the FASTQ to an unaligned SAM or BAM file, try this:
http://picard.sourceforge.net/comman...tml#FastqToSam

I needed to do this locally, and since we didn't have Picard installed but do have Biopython (yes, I'm biased ;) ), I wrote a simple Python script to convert paired FASTQ files into unmapped SAM reads (which you can pipe into samtools to get a BAM file):

https://github.com/peterjc/picobio/b...astq_to_sam.py

I would expect this to be slower than a dedicated tool, so probably not suitable for a high throughput pipeline - but it should be fine for a one-off analysis.


All times are GMT -8.