SEQanswers

Old 09-20-2013, 06:07 AM   #121
Jugsy67
Junior Member
 
Location: UK

Join Date: Jul 2008
Posts: 2
Default new to exome analysis what software

Hi Guys

I am new to exome analysis and was hoping for guidance on which software is accepted as the most robust pipeline for finding SNPs.

cheers

Julian
Old 10-07-2013, 12:29 AM   #122
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default Need some suggestions for downstream analysis

Hi, I need some advice on downstream analysis and variant calling. I am trying to establish a pipeline for my lab and am new to exome sequencing data analysis. I am currently testing it on a single paired-end sample from the 1000 Genomes Project (sample 96, to be precise). I have already done the alignment and have sorted and indexed the BAM file. The next step would be to use GATK to identify the target regions for realignment, realign the BAM to get better indel calling, and then use the GATK tools to call SNPs and indels.

I have two questions. First, I downloaded the latest version of GATK: is it advisable to work with that, or with an older version? Second, in some forums I see that the MarkDuplicates step is skipped; since realignment is done with GATK anyway for better indel calling, it seems this step could be skipped. Suggestions are welcome.
Old 01-10-2014, 05:21 AM   #124
blakeoft
Member
 
Location: Connecticut

Join Date: Oct 2013
Posts: 79
Default

Thank you for the nice how-to guide.

I have a couple questions about it.

1. The second step of 2.2 Actual Alignment uses the -f flag. Is this to specify what the output .sai file is called? I've looked through the bwa manual page (http://bio-bwa.sourceforge.net/bwa.shtml) and it doesn't mention -f.

2. This question is about the -r flag that is used on the same command and the next. The guide has:
Code:
bwa sampe -f out.sam -r "@RQ\tID:<ID>\tLB:<LIBRARY_NAME>\tSM:<SAMPLE_NAME>\tPL:ILLUMINA" hg19 input1.sai input2.sai input1.fq input2.fq
I don't know how much it matters, but should there be "@RG" instead of "@RQ" after the open quotes?

Thanks,
Blake
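
The read-group question can be checked mechanically before aligning. The sketch below is plain shell and does not call bwa at all; it assumes the SAM convention that a read-group header line has the record type @RG and carries an ID field. The function name check_rg and the sample values are made up for illustration.

```shell
# Hypothetical helper: sanity-check a read-group string before handing it to bwa via -r.
# The string is expected to contain literal "\t" sequences, as in the guide's command.
check_rg() {
    rg=$1
    case $rg in
        '@RG\t'*) ;;    # starts with the @RG record type
        *) printf 'read group must start with @RG: %s\n' "$rg" >&2; return 1 ;;
    esac
    case $rg in
        *'\tID:'*) ;;   # an ID field is present
        *) printf 'read group has no ID field: %s\n' "$rg" >&2; return 1 ;;
    esac
}

# Usage with made-up values for the placeholders in the guide's command.
check_rg '@RG\tID:run1\tLB:lib1\tSM:sample1\tPL:ILLUMINA' && echo "read group looks fine"
```

A string that starts with "@RQ" fails the first check, so the typo would be caught before the alignment is run.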
Old 05-18-2015, 01:23 AM   #125
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Hi ulz_peter and everybody,

I have a problem when I try to run the following command (a step in my exome analysis):

Code:
java -Xmx4g -Djava.io.tmpdir=/tmp \
-jar picard/SortSam.jar \
SO=coordinate \
INPUT=input.sam \
OUTPUT=output.bam \
VALIDATION_STRINGENCY=LENIENT \
CREATE_INDEX=true

The output is an error message saying that the jar file SortSam.jar cannot be found or loaded. Sometimes the message is instead that the main class could not be loaded (or something similar).
However, I can see the SortSam.jar file inside my picard-tools folder, so the picard-tools download, SortSam.jar included, went fine. I have tried different paths for SortSam.jar, but the problem remains the same.

What can I do? Could somebody please help me so that I can continue with my exome analysis?

Thank you very much in advance.

JM
Old 05-18-2015, 02:50 AM   #126
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219
Default

Hi JM,

Newer versions of Picard do not ship a separate jar file per command; instead there is a single unified jar file, and the command you want to run is given as its first argument (the version I currently use is Picard 1.128).

So the command would now look something like:

java -jar picard.jar SortSam SO=coordinate INPUT=input.sam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

Besides that, it is hard to tell where the actual problem lies. You could try posting the actual command you are running.
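
To see which of the two layouts a given download uses, a small check along these lines may help. It is only a sketch: the function name picard_prefix and the PICARD_DIR variable are hypothetical, and it prints the java command prefix rather than running anything.

```shell
# Hypothetical helper: print the java command prefix for one Picard tool,
# depending on which jar layout is found in the given directory.
picard_prefix() {
    dir=$1
    tool=$2
    if [ -f "$dir/picard.jar" ]; then
        # unified jar: the tool name is the first argument
        printf 'java -jar %s/picard.jar %s\n' "$dir" "$tool"
    elif [ -f "$dir/$tool.jar" ]; then
        # older layout: one jar per tool
        printf 'java -jar %s/%s.jar\n' "$dir" "$tool"
    else
        printf 'no Picard jar for %s found in %s\n' "$tool" "$dir" >&2
        return 1
    fi
}

# Usage: PICARD_DIR is an assumed variable pointing at the unpacked picard-tools folder.
picard_prefix "${PICARD_DIR:-.}" SortSam || true
```

The remaining KEY=VALUE arguments (INPUT=, OUTPUT= and so on) are appended after the printed prefix in both cases.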
Old 05-18-2015, 10:37 AM   #127
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Thank you, ulz_peter, for your answer. Here is my input (following your instructions) and the resulting output:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ java -jar picard.jar SortSam SO=coordinate INPUT=input.sam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
[Mon May 18 19:25:34 CEST 2015] picard.sam.SortSam INPUT=input.sam OUTPUT=output.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Mon May 18 19:25:34 CEST 2015] Executing as ubuntu@ubuntu-Compaq-CQ58-Notebook-PC on Linux 3.13.0-52-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
[Mon May 18 19:25:34 CEST 2015] picard.sam.SortSam done. Elapsed time: 0,01 minutes.
Runtime.totalMemory()=60293120
To get help, see http://broadinstitute.github.io/pica...ml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.sam
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:308)
at picard.sam.SortSam.doWork(SortSam.java:71)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$

What is your opinion about this output?
I don't understand the message "Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.sam". I followed the earlier steps of "A short guide to Exome seq. analysis using Illumina technology" (your PDF guide) without trouble up to this step (SAM to BAM conversion), using my own two paired-end FASTQ files. I think the conversion to SAM went without problems. So why can it not read /home/ubuntu/Escritorio/picard-tools-1.128/input.sam? What is this input.sam?

Do you think I can go on with the next steps of the guide (marking PCR duplicates and the rest) using exactly the code given there? If the code is now different for all the remaining steps, could you please send me the corrected code for all of them?

Thank you very much for your help. (Sorry, but I have no experience with this pipeline.)

Juan M.
Old 05-18-2015, 10:41 AM   #128
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@Juan M.: Error indicates that file does not exist.

Can you see it with a directory listing? Post the output of

Code:
$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
Note: Just because you went through the steps does not mean that the process worked right. Did you see any other errors upstream of this step?
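
One way to catch a failed upstream step is to test each input before using it. The following is a minimal sketch in plain shell; the function name require_input and the SAM variable are hypothetical. It treats a zero-byte file as a failure as well, since an empty file usually means the step that should have written it did not work.

```shell
# Hypothetical helper: fail early if an input file is missing or empty.
require_input() {
    f=$1
    if [ ! -e "$f" ]; then
        printf 'missing: %s\n' "$f" >&2
        return 1
    elif [ ! -s "$f" ]; then
        printf 'empty (0 bytes): %s\n' "$f" >&2
        return 2
    fi
}

# Usage: only continue to the next pipeline step when the check passes.
SAM=${SAM:-input.sam}
if require_input "$SAM"; then
    echo "input ok: $SAM"
fi
```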

Last edited by GenoMax; 05-18-2015 at 10:45 AM.
Old 05-18-2015, 11:30 AM   #129
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Hi GenoMax. Thank you for your answer.

You are right. This is the output:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
ls: no se puede acceder a /home/ubuntu/Escritorio/picard-tools-1.128/*.sam: No existe el archivo o el directorio

The output says that the file or directory does not exist.

However, when I look in the Escritorio (Desktop) directory, the output is:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/*.sam
-rw-rw-r-- 1 ubuntu ubuntu 0 may 17 21:28 /home/ubuntu/Escritorio/INPUT=input.sam
-rw-rw-r-- 1 ubuntu ubuntu 253M may 11 23:07 /home/ubuntu/Escritorio/out.sam

So what can I do about the SAM to BAM conversion step? Or must I start all the steps again from the beginning, with the BWA alignment?

Until now I have followed the short exome analysis guide (PDF) step by step. Do you know of an easier method or pipeline for exome analysis?

Thank you very much.

Juan M.
Old 05-18-2015, 11:42 AM   #130
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by j6163m View Post
Hi GenoMax. Thank you for your answer.

You are right. This is the output:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
ls: no se puede acceder a /home/ubuntu/Escritorio/picard-tools-1.128/*.sam: No existe el archivo o el directorio

The output says that the file or directory does not exist.
So that mystery is solved. We know that file is not there so you are getting the error.
Quote:
However, when I look in the Escritorio (Desktop) directory, the output is:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/*.sam
-rw-rw-r-- 1 ubuntu ubuntu 0 may 17 21:28 /home/ubuntu/Escritorio/INPUT=input.sam
-rw-rw-r-- 1 ubuntu ubuntu 253M may 11 23:07 /home/ubuntu/Escritorio/out.sam
Can you show the first 10 lines of out.sam by doing this?

Code:
$ head -10 out.sam
I am not sure why the above ls command is showing full paths in your file listing (perhaps your system is set up that way). Which PDF guide are you following? Is it the one at the beginning of this thread?
Old 05-18-2015, 02:33 PM   #131
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Hi again,

This is the output:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ cd /home/ubuntu/Escritorio
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio$ head -10 out.sam
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983

Yes, the pdf guide is at the beginning of this thread.

What is the next step I have to do?

Thank you

Juan M.
Old 05-18-2015, 02:49 PM   #132
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

That sam file looks ok. Are you at the start of section 2.3? In any case don't blindly follow steps in the document if you don't understand what is happening at that step.

Last edited by GenoMax; 05-18-2015 at 02:52 PM.
Old 05-19-2015, 12:25 AM   #133
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Yes, I am at the beginning of section 2.3 (SAM to BAM conversion).

If you look at section 2.2, there is a command like this:

Code:
bwa sampe -f out.sam -r "@RQ\tID:<ID>\tLB:<LIBRARY_NAME>\tSM:<SAMPLE_NAME>\tPL:ILLUMINA" hg19 input1.sai input2.sai input1.fq input2.fq

This is the command I used (paired-end data). As you can see, out.sam appears in it. Is this the out.sam that I have?

Then, can I go on with section 2.4 and use exactly the code given in the following sections until the end, without changing anything? If any of the code in those steps needs to be changed, please send me the changes.


Thank you very much for your help.

Juan M.
Old 05-19-2015, 04:34 AM   #134
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by j6163m View Post
Yes, I am at the beginning of section 2.3 (SAM to BAM conversion).

If you look at section 2.2, there is a command like this:

Code:
bwa sampe -f out.sam -r "@RQ\tID:<ID>\tLB:<LIBRARY_NAME>\tSM:<SAMPLE_NAME>\tPL:ILLUMINA" hg19 input1.sai input2.sai input1.fq input2.fq

This is the command I used (paired-end data). As you can see, out.sam appears in it. Is this the out.sam that I have?

Then, can I go on with section 2.4 and use exactly the code given in the following sections until the end, without changing anything? If any of the code in those steps needs to be changed, please send me the changes.


Thank you very much for your help.

Juan M.
@ulz_peter: The manual looks a bit confusing. File from step 2.3 is called out.sam but then in 2.4 it is being referred to as input.sam? Since the syntax for Picard has changed perhaps you should consider updating your manual.

@Juan M: That looks to be the right file (but make a note that the name is not the same as in manual). Since you are using a newer version of picard you should use the command @ulz_peter provided in post #126.
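
Because the file names in the manual do not line up from one step to the next, it can help to derive each output name from the input name. The sketch below is a dry run that only prints the command; the helper name bam_name_for is hypothetical, the SAM path is the one from the ls listing earlier in this thread, and picard.jar is assumed to be in the current directory.

```shell
# Hypothetical helper: derive the sorted-BAM name from a SAM file name,
# so that each step's output name can be reused as the next step's input.
bam_name_for() {
    sam=$1
    printf '%s.bam\n' "${sam%.sam}"
}

SAM=/home/ubuntu/Escritorio/out.sam      # path taken from the ls listing in this thread
BAM=$(bam_name_for "$SAM")

# Dry run: print the SortSam command from post #126 with the real file names filled in.
echo java -jar picard.jar SortSam SO=coordinate INPUT="$SAM" OUTPUT="$BAM" \
    VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
```

Using a full path for INPUT means Picard no longer looks for the file inside the picard-tools-1.128 directory, which is where the "non-existent file" message pointed.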
Old 05-19-2015, 10:07 AM   #135
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Hi again,

This is my input and output for section 2.4 of the guide (marking PCR duplicates):

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ java -jar picard.jar MarkDuplicates INPUT=input.bam OUTPUT=input.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
[Tue May 19 19:42:48 CEST 2015] picard.sam.markduplicates.MarkDuplicates INPUT=[input.bam] OUTPUT=input.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Tue May 19 19:42:48 CEST 2015] Executing as ubuntu@ubuntu-Compaq-CQ58-Notebook-PC on Linux 3.13.0-52-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
[Tue May 19 19:42:48 CEST 2015] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0,01 minutes.
Runtime.totalMemory()=60293120
To get help, see http://broadinstitute.github.io/pica...ml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.bam
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:308)
at htsjdk.samtools.util.IOUtil.assertFilesAreReadable(IOUtil.java:325)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:108)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)


And this is my search for BAM files on my system:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.bam
-rw-rw-r-- 1 ubuntu ubuntu 0 may 19 19:27 /home/ubuntu/Escritorio/picard-tools-1.128/OUTPUT=input.marked.bam

What is your opinion of the command I used? Is it right or wrong? Could you please help me correct it?

Thank you so much.

Juan M.
Old 05-20-2015, 03:25 AM   #136
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@Juan M: bam files are binary data so you can't read/view them directly (a text file command like 'head' will not work). That is the reason you only saw garbled data on screen (and your last post was tagged as spam since you pasted that data in). So what you have does look like a binary bam file.
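
A rough way to tell a compressed BAM from a text SAM without printing it to the screen is to look only at the first two bytes. This is a sketch under one assumption: BAM files are BGZF-compressed, which is gzip-compatible, so they start with the gzip magic bytes 1f 8b. The function name starts_with_gzip_magic and the BAM variable are hypothetical, and the check cannot tell a BAM from any other gzip file.

```shell
# Hypothetical helper: report whether a file starts with the gzip magic bytes (1f 8b).
# A plain-text SAM file starts with "@" or a read name instead.
starts_with_gzip_magic() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage: BAM is an assumed variable naming the file to inspect.
f=${BAM:-output.bam}
if [ -f "$f" ] && starts_with_gzip_magic "$f"; then
    echo "$f is gzip/BGZF-compressed (possibly a BAM file)"
else
    echo "$f is missing or not compressed"
fi
```

To actually read the contents as text, the usual route is samtools view (for example, samtools view -h file.bam | head), assuming samtools is installed.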

I was hoping that @ulz_peter would chime in by now on a possible update to the manual.

Last edited by GenoMax; 05-20-2015 at 03:27 AM.
Old 05-20-2015, 11:20 AM   #137
j6163m
Member
 
Location: Madrid

Join Date: May 2015
Posts: 14
Default

Thank you GenoMax.

Could you please send me the updated manual (or tell me where to download it) when it becomes available?

Thanks again.

Juan M.
Old 09-16-2015, 03:50 AM   #138
sayali
Junior Member
 
Location: India

Join Date: Sep 2015
Posts: 3
Default

Thanks a lot for the document.
I have a suggestion: you can run cutadapt after FastQC if the quality of the sequenced reads is not good.
Old 03-09-2017, 04:26 AM   #139
hafiztalha
Junior Member
 
Location: Pakistan

Join Date: Mar 2017
Posts: 1
Default

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: IndelRealigner

I keep getting the same error again and again. I have tried updating Java, and I have the latest GATK, 3.7. Can anyone help me?