**Lv Ray** · 07-15-2014, 10:02 PM

To add:
Run the command: samtools view ERR173172_merged.bam | less -S
You can see:
ERR173172.26885410 89 1 1629 1 100M = 1629 0 TTGGGT
ERR173172.26885410 133 1 1629 0 * = 1629 0 CGGTAT
ERR173172.8687716 89 1 1638 1 95M = 1638 0 GTTGGT
ERR173172.8687716 133 1 1638 0 * = 1638 0 GTCTGA
ERR173172.4507000 153 1 1648 1 92M8S = 1648 0 TCCCGT
ERR173172.4507000 69 1 1648 0 * = 1648 0 TGTCTT
ERR173172.53744916 89 1 4280 11 2S69M = 4280 0 GATGCC
ERR173172.53744916 133 1 4280 0 * = 4280 0 GATAGT
ERR173172.60595146 153 1 4308 11 100M = 4308 0 CCCCCC
ERR173172.60595146 69 1 4308 0 * = 4308 0 TGGATA
ERR173172.55733737 153 1 4310 11 100M = 4310 0 CTCTCC
ERR173172.55733737 69 1 4310 0 * = 4310 0 TTATTT
ERR173172.48676987 153 1 4313 11 100M = 4313 0 TCCCCC
ERR173172.48676987 69 1 4313 0 * = 4313 0 ATTTGG
ERR173172.8193734 89 1 4314 1 73M = 4314 0 CCCCCA
ERR173172.8193734 133 1 4314 0 * = 4314 0 GATTTG

**kmcarr** · 07-16-2014, 05:44 AM

Look at the Picard error message in part b of your original message. It tells you exactly what the problem is.

Exception in thread "main" picard.PicardException: Input file /datapool/lvlh/pig_reseq/ERX149135/ERR173172/ERR173172_merged.bam is not coordinate sorted.

The samtools view output shown in your second message further confirms that your BAM file is sorted by read name, not by coordinate. Go back and sort your BAM file into coordinate order, then repeat the Picard command as in (b) above.

**Lv Ray** · 07-16-2014, 06:49 AM

Thank you, kmcarr, but I think you are wrong.
ERR173172.26885410 89 1 1629 1 100M = 1629 0 TTGGGT
In records like this one from my second message, it is the 4th column (position), not the first column (read name), that shows my BAM file is sorted by coordinate.
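The column-4 argument can be checked mechanically. Below is a minimal Python sketch, assuming the reference order comes from the @SQ lines of the header; the function name `is_coordinate_sorted` is made up for illustration and is not part of samtools or Picard. It only inspects the records themselves and says nothing about what the header declares.

```python
def is_coordinate_sorted(sam_lines, reference_order):
    """Return True if alignment lines are ordered by (reference, position)."""
    # Rank references by their order in the header's @SQ lines.
    rank = {name: i for i, name in enumerate(reference_order)}
    previous = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split()
        rname, pos = fields[2], int(fields[3])
        # References not in the list (including "*") are ranked last.
        key = (rank.get(rname, len(rank)), pos)
        if previous is not None and key < previous:
            return False
        previous = key
    return True


# A few of the records pasted earlier in the thread (sequences truncated there).
records = [
    "ERR173172.26885410 89 1 1629 1 100M = 1629 0 TTGGGT",
    "ERR173172.26885410 133 1 1629 0 * = 1629 0 CGGTAT",
    "ERR173172.8687716 89 1 1638 1 95M = 1638 0 GTTGGT",
    "ERR173172.8687716 133 1 1638 0 * = 1638 0 GTCTGA",
    "ERR173172.4507000 153 1 1648 1 92M8S = 1648 0 TCCCGT",
    "ERR173172.4507000 69 1 1648 0 * = 1648 0 TGTCTT",
    "ERR173172.53744916 89 1 4280 11 2S69M = 4280 0 GATGCC",
]

print(is_coordinate_sorted(records, ["1", "10", "11"]))  # prints True
```

The read names in column 1 are clearly not in order here, while the positions in column 4 never decrease.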

**kmcarr** · 07-16-2014, 07:05 AM

Originally posted by Lv Ray

Thank you, kmcarr, but I think you are wrong.
ERR173172.26885410 89 1 1629 1 100M = 1629 0 TTGGGT
In records like this one from my second message, it is the 4th column (position), not the first column (read name), that shows my BAM file is sorted by coordinate.

I see now that the fragment of the BAM file you copied is very unusual in that every read pair shown has only one mate mapped. This is why the sorting appeared at first glance to be by read name. But that output is from ERR173172_merged.bam, and you are trying to run MarkDuplicates on a different file, ERR173172_unpaired.bam. Are you sure the sort order of ERR173172_unpaired.bam is correct? What does the BAM file header look like (run "samtools view -H ERR173172_unpaired.bam")?

Nonetheless, the error message still clearly indicates that Picard believes the BAM file is not properly sorted, and the problem has nothing to do with the read name regex. This may be caused by the unusual nature of this BAM file, i.e. that it contains only reads with one unmapped mate.
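To see why every pair in the pasted output has exactly one mapped mate, the FLAG values in column 2 can be decoded bit by bit. This is a small Python sketch using the bit meanings from the SAM specification; the label strings and the `decode_flag` helper are illustrative names, not an existing API.

```python
# SAM FLAG bits as defined in the SAM specification; labels are informal.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The four FLAG values that appear in the pasted records.
for flag in (89, 133, 153, 69):
    print(flag, decode_flag(flag))
# 89  ['paired', 'mate_unmapped', 'reverse', 'first_in_pair']
# 133 ['paired', 'unmapped', 'second_in_pair']
# 153 ['paired', 'mate_unmapped', 'reverse', 'second_in_pair']
# 69  ['paired', 'unmapped', 'first_in_pair']
```

Flags 89 and 153 are mapped reads whose mate is unmapped, and 133 and 69 are the unmapped mates. By SAM convention an unmapped mate is given the reference and position of its mapped partner, which is why each pair sits on adjacent lines and the file can look name-sorted at a glance.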

**Lv Ray** · 07-16-2014, 07:30 PM

I am sorry, kmcarr, I made a mistake in my question. However, I checked my dataset as you told me ("samtools view -H *.bam"):
samtools view -H ERR173172_unpaired.sorted.bam |less
@HD VN:1.0 SO:unsorted
@SQ SN:1 LN:315321322
@SQ SN:10 LN:79102373
@SQ SN:11 LN:87690581
@SQ SN:12 LN:63588571
@SQ SN:13 LN:218635234
@SQ SN:14 LN:153851969
@SQ SN:15 LN:157681621
@SQ SN:16 LN:86898991
@SQ SN:17 LN:69701581
@SQ SN:18 LN:61220071
@SQ SN:2 LN:162569375
@SQ SN:3 LN:144787322
@SQ SN:4 LN:143465943
@SQ SN:5 LN:111506441
@SQ SN:6 LN:157765593
@SQ SN:7 LN:134764511
@SQ SN:8 LN:148491826
@SQ SN:9 LN:153670197
@SQ SN:MT LN:16613
@SQ SN:X LN:144288218
@SQ SN:Y LN:1637716
@SQ SN:JH118944.1 LN:594937
@SQ SN:JH118636.1 LN:547643
@SQ SN:JH118966.1 LN:497305
@SQ SN:JH118951.1 LN:479775
@SQ SN:JH118524.1 LN:477901
@SQ SN:JH118901.1 LN:451395
It seems that I sorted it, but the command "samtools view -H" reports it as unsorted (@HD VN:1.0 SO:unsorted).
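The declared sort order can be pulled out of the header text with a few lines of Python. This is only a sketch: `header_sort_order` is a hypothetical helper, and it splits on any whitespace because the header pasted above shows spaces where a real SAM header uses tabs. It reads the SO tag only and does not verify that the records are actually in that order, so a file whose records are in coordinate order can still be labelled unsorted.

```python
def header_sort_order(header_lines):
    """Return the SO value from the @HD line, or None if there is none."""
    for line in header_lines:
        fields = line.split()
        if fields and fields[0] == "@HD":
            for field in fields[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    return None


# First lines of the header pasted above.
header = [
    "@HD VN:1.0 SO:unsorted",
    "@SQ SN:1 LN:315321322",
    "@SQ SN:10 LN:79102373",
]

print(header_sort_order(header))  # prints unsorted
```

If the printed value is not "coordinate", that mismatch between header and contents is worth ruling out before rerunning MarkDuplicates, since the Picard error quoted earlier complains about sort order rather than about read names.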

picard MarkDuplicates READ_NAME_REGEX