SEQanswers

Alphabets 03-20-2016 01:12 AM

lobSTR dealing with the paired-end BAM file
 
Hello:

I am using lobSTR to process a paired-end BAM file. According to the documentation, the BAM file is sorted by read name and then the following command is run:

'''
lobSTR \
--index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
-f my_sample.bam --bampair \
--rg-sample my_sample.sorted.bam --rg-lib my_sample \
--out my_sample_output
'''

My question is: which BAM file is correct on the third and fourth lines of the command above, the one before sorting or the one after sorting?
The documentation uses the unsorted BAM file, but I think the third line should take the sorted file rather than the raw BAM file.
Also, I think the fourth line should take a tag, not the sorted file.

Does anyone have any ideas about this, or can anyone clear up my confusion? Thanks!

mastal 03-20-2016 05:42 AM

The --rg-sample and --rg-lib parameters should be tags with read-group information, not the bam files.

See the lobSTR FAQs for more info.

http://melissagymrek.com/lobstr-code/faq.html

I think if you start with a bam file rather than fastq files then it should be sorted and indexed. Note that the command line options description for --bampair says that the file should be sorted by name order, 'samtools sort -n'.
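A minimal sketch of the two steps described above: name-sort the BAM, then pass the name-sorted file to `-f` and plain labels (not files) to `--rg-sample` and `--rg-lib`. It only builds and prints the command lines; it does not run them. It assumes the `samtools sort -n -o OUT IN` syntax of samtools 1.3 and later (older releases took an output prefix instead), uses only the lobSTR options already shown in this thread, and the helper name `build_commands`, the `.namesorted.bam` suffix and the sample/library labels are hypothetical placeholders.

```python
import os
import shlex


def build_commands(raw_bam, index_prefix, sample, library, out_prefix):
    """Return (samtools sort, lobSTR) argument lists. Hypothetical helper."""
    base, _ext = os.path.splitext(raw_bam)
    name_sorted = base + ".namesorted.bam"  # placeholder output name
    # Step 1: sort by read name so the two mates of each pair are adjacent.
    sort_cmd = ["samtools", "sort", "-n", "-o", name_sorted, raw_bam]
    # Step 2: -f gets the name-sorted BAM; --rg-sample / --rg-lib get labels.
    lobstr_cmd = [
        "lobSTR",
        "--index-prefix", index_prefix,
        "-f", name_sorted, "--bampair",
        "--rg-sample", sample, "--rg-lib", library,
        "--out", out_prefix,
    ]
    return sort_cmd, lobstr_cmd


if __name__ == "__main__":
    cmds = build_commands(
        "my_sample.bam",
        "hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_",
        "my_sample",      # read-group sample label (placeholder)
        "my_library",     # read-group library label (placeholder)
        "my_sample_output",
    )
    for cmd in cmds:
        print(shlex.join(cmd))  # shell-quoted, ready to copy
```

Building argument lists instead of one string keeps each file name and label a single, unambiguous argument if you later pass them to `subprocess.run`.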

Alphabets 03-20-2016 05:06 PM

Quote:

Originally Posted by mastal (Post 191160)
The --rg-sample and --rg-lib parameters should be tags with read-group information, not the bam files.

See the lobSTR FAQs for more info.

http://melissagymrek.com/lobstr-code/faq.html

I think if you start with a bam file rather than fastq files then it should be sorted and indexed. Note that the command line options description for --bampair says that the file should be sorted by name order, 'samtools sort -n'.

I know what you mean: the --rg-sample and --rg-lib parameters should not be BAM files. But I am confused by the third line, "-f my_sample.bam --bampair". I think the -f parameter should be the sorted BAM, not the raw BAM. The documentation shows the raw BAM file rather than the sorted one, which confuses me.

The documentation page is:
http://melissagymrek.com/lobstr-code...html#pairedbam

mastal 03-21-2016 03:18 AM

In the end, it shouldn't matter, because anything that is expecting an unsorted bam file will work perfectly well if you give it a sorted bam file (but of course the reverse is not true, and if a program is expecting a sorted bam file it will throw an error if you give it an unsorted bam file).

I agree with you that the lobSTR documentation is confusing and not consistent. I had looked at the 'usage' page, which seems to give different examples than the genotype-calling page you gave the link to. Also in the usage page's definition of parameters, for --bampair it says that you have to give a bam file sorted by name (samtools sort -n), which it doesn't mention anywhere else where it shows examples of sorting the bam files with samtools sort.

mastal 03-21-2016 03:42 AM

I think maybe I understand a bit better now.

I think if you are running lobSTR with a paired-end bam file, you need to give it a bam file that has been name-sorted (samtools sort -n) for the -f parameter, because it needs the two reads of a pair to be next to each other in the file.
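A toy sketch of why adjacency matters, assuming a reader that, like a single streaming pass over a name-sorted file, only compares each read with the record that follows it. This is an illustration, not lobSTR's actual implementation; the `(name, sequence)` tuples are hypothetical stand-ins for real BAM records, and a real tool would also look at the pairing flags.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical simplified record: (read_name, sequence).
Read = Tuple[str, str]


def pair_adjacent(reads: Iterable[Read]) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Pair mates by comparing each read only with the next one in the stream."""
    pairs: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    pending: Optional[Read] = None  # first mate still waiting for its partner
    for read in reads:
        if pending is None:
            pending = read
        elif pending[0] == read[0]:
            pairs.append((pending, read))
            pending = None
        else:
            # The next record has a different name, so no adjacent mate exists.
            orphans.append(pending)
            pending = read
    if pending is not None:
        orphans.append(pending)
    return pairs, orphans


name_sorted = [("r1", "ACGT"), ("r1", "TTGA"), ("r2", "GGCC"), ("r2", "AATT")]
interleaved = [("r1", "ACGT"), ("r2", "GGCC"), ("r1", "TTGA"), ("r2", "AATT")]

if __name__ == "__main__":
    print(len(pair_adjacent(name_sorted)[0]))  # 2 pairs found
    print(len(pair_adjacent(interleaved)[1]))  # 4 reads left unpaired
```

The same reads that pair cleanly when grouped by name are all left unpaired when the mates are not next to each other.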

Later steps, like running allelotype, may need BAM files sorted by coordinate, as shown in some of the examples where they sort the BAM files that are output from running lobSTR.
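A small sketch contrasting the two sort orders. Name order groups the mates of a pair, while coordinate order (what `samtools sort` produces without `-n`, and what `samtools index` requires) orders records by reference and position, so mates of overlapping fragments end up interleaved. The reads and the `REF_ORDER` mapping are hypothetical, and Python's plain string comparison stands in for samtools' own read-name ordering, which may differ in detail.

```python
# Hypothetical aligned reads: (read_name, chromosome, 1-based start position).
# The two fragments overlap, so their mates interleave once sorted by position.
reads = [
    ("readA", "chr1", 400),
    ("readB", "chr1", 500),
    ("readA", "chr1", 100),
    ("readB", "chr1", 200),
]

# Assumed reference order, as it would be listed in the BAM header's @SQ lines.
REF_ORDER = {"chr1": 0, "chr2": 1}


def name_order(records):
    """Group records by read name so mates sit next to each other."""
    return sorted(records, key=lambda r: r[0])


def coordinate_order(records):
    """Order records by reference sequence, then by start position."""
    return sorted(records, key=lambda r: (REF_ORDER[r[1]], r[2]))


if __name__ == "__main__":
    print([r[0] for r in name_order(reads)])        # ['readA', 'readA', 'readB', 'readB']
    print([r[0] for r in coordinate_order(reads)])  # ['readA', 'readB', 'readA', 'readB']
```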

Hope this makes more sense.

Alphabets 03-21-2016 04:56 AM

Quote:

Originally Posted by mastal (Post 191180)
I think maybe I understand a bit better now.

I think if you are running lobSTR with a paired-end bam file, you need to give it a bam file that has been name-sorted (samtools sort -n) for the -f parameter, because it needs the two reads of a pair to be next to each other in the file.

Later steps, like running allelotype, may need bam files sorted by coordinate, as shown in some of the examples where they sort the bam files that are output from running lobSTR.

Hope this makes more sense.



Thank you for your reply. I agree with you, but I am running into a problem.

When I run lobSTR on my name-sorted paired-end BAM file, as you described, I get the following warnings:

'''
WARNING: Could not find pair for BRISCOE:4:... Is the bam file sorted by read name?
'''
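The warning asks whether the BAM file is sorted by read name. One quick check, sketched here under the assumption that the header has been dumped as text (for example with `samtools view -H`), is the `SO` field of the `@HD` header line, which the SAM specification defines as `unknown`, `unsorted`, `queryname` or `coordinate`. The helper name is hypothetical, and `SO` only reports what the writing tool declared, so a file re-sorted by a tool that did not update the header can still be mislabeled.

```python
def header_sort_order(header_text: str) -> str:
    """Return the SO (sort order) value of the @HD line, or "unknown" if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields after the record type are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


if __name__ == "__main__":
    # Hypothetical header text, e.g. as printed by `samtools view -H`.
    header = "@HD\tVN:1.4\tSO:queryname\n@SQ\tSN:chr1\tLN:249250621\n"
    print(header_sort_order(header))  # queryname
```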

My screen is full of these warnings. Is this normal?

Also, when I run allelotype on the BAM file that lobSTR generates, I get these warnings in my output:

'''
WARNING: Skipping locus chr1:123585375. Invalid period size (20)
WARNING: Discarding duplicate of locus chr1: 123587826
'''

I don't know what these mean. What should I do?

Thank you.

mastal 03-21-2016 06:37 AM

Where do the bam files you are trying to use with lobSTR come from?

Are they unaligned bams, or are they the result of alignment with another aligner?

Have any reads been removed from the data set? For example, reads that didn't align, leaving some reads without a mate?
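A toy illustration of the situation this question is probing: if a filtering step drops one mate of a pair, the other mate is left without a partner, and any step that pairs reads by name has nothing to pair it with. The `(read_name, is_mapped)` records and the `drop_unmapped` filter are hypothetical simplifications; real BAM records also carry flags, and secondary or supplementary alignments would need separate handling.

```python
from collections import Counter

# Hypothetical records: (read_name, is_mapped).
reads = [
    ("q1", True), ("q1", True),
    ("q2", True), ("q2", False),  # one mate of q2 failed to align
    ("q3", True), ("q3", True),
]


def drop_unmapped(records):
    """Mimic a filtering step that keeps only aligned reads."""
    return [r for r in records if r[1]]


def orphan_names(records):
    """Read names seen only once, i.e. reads whose mate is no longer present."""
    counts = Counter(name for name, _mapped in records)
    return sorted(name for name, n in counts.items() if n == 1)


if __name__ == "__main__":
    print(orphan_names(reads))                 # []
    print(orphan_names(drop_unmapped(reads)))  # ['q2']
```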

Alphabets 03-21-2016 04:53 PM

Quote:

Originally Posted by mastal (Post 191192)
Where do the bam files you are trying to use with lobSTR come from?

Are they unaligned bams, or are they the result of alignment with another aligner?

Have any reads been removed from the data set? For example, reads that didn't align, leaving some reads without a mate?

The BAM files I use contain HiSeq reads aligned to the reference genome, with redundant (duplicate) reads removed and realignment already performed.

Does this matter? What should I do?

Thank you!


All times are GMT -8.