Releases: samtools/bcftools

1.12

17 Mar 16:21

Download the source code here: bcftools-1.12.tar.bz2.
(The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • The output file type is determined from the output file name suffix, where available, so the -O/--output-type option is often no longer necessary.

  • Make F_MISSING in filtering expressions work for sites with multiple ALT alleles (#1343)

  • Fix N_PASS and F_PASS to behave according to expectation when reverse logic is used (#1397). This fix has the side effect of query (or programs like +trio-stats) behaving differently with these expressions, operating now in site-oriented rather than sample-oriented mode. For example, the new behavior could be:

    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
    11	A	0/0
    11	B	0/0
    11	C	1/1
    

    while previously the same expression would return:

    11	C	1/1
    

    The original mode can be mimicked by splitting the filtering into two steps:

    bcftools view -i'N_PASS(GT="alt")==1' | bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
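The difference between the two modes can be illustrated with a small Python sketch. It is not bcftools code: the helper names (`is_alt`, `n_pass`, `query_site_oriented`, `query_two_step`) and the dictionary representation of a site are assumptions made for illustration, and `is_alt` only approximates GT="alt" as "at least one non-reference, non-missing allele".

```python
def is_alt(gt):
    """Approximate GT="alt": any allele that is neither REF ("0") nor missing (".")."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def n_pass(genotypes, predicate):
    """Count the samples whose genotype satisfies the predicate."""
    return sum(1 for gt in genotypes.values() if predicate(gt))


def query_site_oriented(pos, genotypes):
    """New behaviour: the site passes as a whole, so every sample is printed."""
    if n_pass(genotypes, is_alt) == 1:
        return [(pos, sample, gt) for sample, gt in genotypes.items()]
    return []


def query_two_step(pos, genotypes):
    """Two-step workaround: select the site first, then filter samples."""
    if n_pass(genotypes, is_alt) != 1:
        return []
    return [(pos, sample, gt) for sample, gt in genotypes.items() if is_alt(gt)]


# Hypothetical site at position 11 with three samples, as in the example above.
site = {"A": "0/0", "B": "0/0", "C": "1/1"}

for row in query_site_oriented(11, site):
    print(*row)
for row in query_two_step(11, site):
    print(*row)
```

The first loop prints all three samples and the second prints only sample C, mirroring the outputs shown above.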
    

Changes affecting specific commands:

  • bcftools annotate:

    • New --rename-annots option to help fix broken VCFs (#1335)

    • New -C option allows reading a long list of options from a file, which prevents very long command lines.

    • New append-missing logic allows annotations to be added for each ALT allele in the same order as they appear in the VCF. Note that this is not bulletproof. In order for this to work:

      • the annotation file must have one line per ALT allele

      • fields must contain a single value, because multiple values are appended as they are and would break the correspondence between the alleles and values

  • bcftools concat:

    • With -l, do not phase genotypes by mistake if they are not already phased (#1346)
  • bcftools consensus:

    • New --mask-with, --mark-del, --mark-ins, --mark-snv options (#1382, #1381, #1170)

    • Symbolic <DEL> should have only one REF base. If there are multiple, take POS+1 as the first deleted base.

    • Make consensus work when the first base of the reference genome is deleted. In this situation the VCF record has POS=1 and the first REF base cannot precede the event. (#1330)

  • bcftools +contrast:

    • The NOVELGT annotation was previously not added when requested.
  • bcftools convert:

    • Make the --hapsample and --hapsample2vcf options consistent with each other and with the documentation.
  • bcftools call:

    • Revamp of call -G. Previously, sample grouping by population was not truly independent and could still be influenced by the presence of other sample groups.

    • Optional addition of INFO/PV4 annotation with call -a INFO/PV4

    • Remove generation of useless HOB and ICB annotation; use +fill-tags -- -t HWE,ExcHet instead

    • The call -f option was renamed to -a to (1) make it consistent with mpileup and (2) to indicate that it includes both INFO and FORMAT annotations, not just FORMAT as previously

    • Any sensible Number=R,Type=Integer annotation can be used with -G, such as AD or QS

    • Don't trim QUAL. Although the usefulness of this change is questionable for a true probabilistic interpretation (such high precision is unrealistic), using QUAL as a score rather than a probability is helpful and permits more fine-grained filtering

    • Fix a suspected bug in call -F. In the worst case the change has no functional effect, but it certainly improves readability

    • call -C trio is temporarily disabled

  • bcftools csq:

    • Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too many per-sample consequences

    • Fix a bug which incorrectly handled the --ncsq parameter and could clash with reserved BCF values, consequently producing truncated or even incorrect output of the %TBCSQ formatting expression in bcftools query. To account for the reserved values, the new default value is --ncsq 15 (#1428)

  • bcftools +fill-tags:

    • MAF definition revised for multiallelic sites: the second most common allele is now considered to be the minor allele (#1313)

    • New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads provided FORMAT/AD is present
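The revised MAF definition can be sketched in a few lines of Python. This is an illustration of the rule stated above, not the plugin's implementation; the function name `minor_allele_frequency` and the list-of-counts input (REF count first, then one count per ALT allele) are assumptions.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele at a site.

    allele_counts: observed counts per allele, e.g. [REF, ALT1, ALT2].
    Returns None when fewer than two alleles or no observations exist.
    """
    total = sum(allele_counts)
    if len(allele_counts) < 2 or total == 0:
        return None
    # Sort descending and take the runner-up as the minor allele.
    second_most_common = sorted(allele_counts, reverse=True)[1]
    return second_most_common / total


print(minor_allele_frequency([70, 20, 10]))  # multiallelic site
print(minor_allele_frequency([90, 10]))      # biallelic site
```

For a biallelic site the rule reduces to the usual definition, since the second most common allele is simply the less frequent of the two.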

  • bcftools gtcheck:

    • Support matching of a single sample against all other samples in the file with -s qry:sample -s gt:-. This was previously not possible; either the full cross-check mode had to be run or a list of pairs/samples had to be created explicitly
  • bcftools merge:

    • Make merge -R behavior consistent with other commands and pull in overlapping records with POS outside of the regions (#1374)

    • Bug fix (#1353)

  • bcftools mpileup:

    • Add new optional tag mpileup -a FORMAT/QS
  • bcftools norm:

    • New -a, --atomize functionality to decompose complex variants, for example MNVs into consecutive SNVs

    • New option --old-rec-tag to indicate the original variant
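As an illustration of what decomposing an MNV into consecutive SNVs means, here is a minimal Python sketch. It covers only the equal-length MNV case named above, whereas the real option handles complex variants more generally. The function name `atomize_mnv` and the tuple representation of a record are assumptions.

```python
def atomize_mnv(pos, ref, alt):
    """Split an equal-length MNV into SNVs, one per differing base.

    Returns a list of (pos, ref_base, alt_base) tuples with 1-based positions.
    Positions where REF and ALT agree produce no record.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length REF/ALT")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


print(atomize_mnv(100, "AC", "GT"))    # two adjacent SNVs
print(atomize_mnv(100, "ACG", "TCA"))  # middle base unchanged, so two SNVs
```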

  • bcftools query:

    • Incorrect fields were printed in the per-sample output when a subset of samples was requested via -s/-S and the order of samples in the header was different from the requested -s/-S order (#1435)
  • bcftools +prune:

    • New options --random-seed and --nsites-per-win-mode (#1050)
  • bcftools +split-vep:

    • Transcript selection now works also on the raw CSQ/BCSQ annotation.

    • Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)

  • bcftools stats:

    • Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to predefined bins, use an open-range logarithmic binning instead

    • plot dual ts/tv stats: per quality bin and cumulative as if threshold applied on the whole dataset

  • bcftools +trio-dnm2:

    • Major revamp of +trio-dnm plugin, which is now deprecated and replaced by +trio-dnm2.
      The original trio-dnm calling model used genotype likelihoods (PLs) as the input for calling. However, that is flawed because PLs make assumptions which are unsuitable for de novo calling: PL(RR) can become bigger than PL(RA) even when the ALT allele is present in the parents. Note that this is true also for other programs such as DeNovoGear which rely on the same samtools calculation.
      The new recommended workflow is:
      bcftools mpileup -a AD,QS -f ref.fa -Ou proband.bam father.bam mother.bam | \
      bcftools call -mv -Ou | \
      bcftools +trio-dnm2 -p proband,father,mother -Oz -o output.vcf.gz
      
      This new version also implements the DeNovoGear model. The original behavior of trio-dnm is no longer supported.
      For more details see http://samtools.github.io/bcftools/trio-dnm.pdf

1.11

22 Sep 12:48

Download the source code here: bcftools-1.11.tar.bz2.
(The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • Filtering -i/-e expressions

    • Breaking change in -i/-e expressions on the FILTER column. Originally it was possible to query only a subset of filters, but not an exact match. The new behaviour is:

      Expression     Result
      FILTER="A"     Exact match, for example "A;B" does not pass
      FILTER!="A"    Exact match, for example "A;B" does pass
      FILTER~"A"     Both "A" and "A;B" pass
      FILTER!~"A"    Neither "A" nor "A;B" pass
    • Fix in commutative comparison operators, in some cases reversing sides would produce incorrect results (#1224; #1266)

    • Better support for filtering on sample subsets

    • Add SMPL_*/S* family of functions that evaluate within rather than across all samples. (#1180)

  • Improvements in the build system
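The FILTER matching rules listed above can be mimicked with a short Python sketch. It models the FILTER column as a semicolon-separated set and is meant only to make the four cases concrete. The function name `filter_matches` and its string-based operator argument are assumptions, not part of bcftools.

```python
def filter_matches(filter_column, op, name):
    """Evaluate FILTER<op>"name" against a FILTER column such as "A;B".

    op is one of "=", "!=", "~", "!~".
    """
    values = set(filter_column.split(";"))
    if op == "=":    # exact match: the column holds this filter and nothing else
        return values == {name}
    if op == "!=":   # negation of the exact match
        return values != {name}
    if op == "~":    # subset match: the filter is present, possibly among others
        return name in values
    if op == "!~":   # the filter is absent
        return name not in values
    raise ValueError(f"unknown operator: {op}")


for column in ("A", "A;B"):
    for op in ("=", "!=", "~", "!~"):
        print(f'FILTER{op}"A" on {column!r}:', filter_matches(column, op, "A"))
```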

Changes affecting specific commands:

  • bcftools annotate:

    • It is now possible to use --columns =TAG with INFO tags. The --merge-logic feature, previously restricted to tab files with BEG,END columns, now also works with REF,ALT.

    • Make annotate -TAG/+TAG work also with FORMAT fields. (#1259)

    • ID and FILTER can be transferred to INFO and ID can be populated from INFO. However, the FILTER column still cannot be populated from an INFO tag because all possible FILTER values must be known at the time of writing the header (#947; #1187)

  • bcftools consensus:

    • Fix in handling symbolic deletions and overlapping variants. (#1149; #1155; #1295)

    • Fix --iupac-codes crash on REF-only positions with ALT=".". (#1273)

    • Fix --chain crash. (#1245)

    • Preserve the case of the genome reference. (#1150)

    • Add new -a, --absent option which allows setting positions with no supporting evidence to "N" (or any other character). (#848; #940)

  • bcftools convert:

    • The option --vcf-ids now works also with --haplegendsample2vcf. (#1217)

    • New option --keep-duplicates

  • bcftools csq:

    • Add misc/gff2gff.py script for conversion between various flavors of GFF files. The initial commit supports only one type and was contributed by @flashton2003. (#530)

    • Add missing consequence types. (PR #1203; #1292)

    • Allow overlapping CDS to support ribosomal slippage. (#1208)

  • bcftools +fill-tags:

    • Added new annotations: INFO/END, TYPE, F_MISSING.
  • bcftools filter:

    • Make --SnpGap optionally filter also SNPs close to other variant types. (#1126)
  • bcftools gtcheck:

    • Complete revamp of the command. The new version is faster and allows N:M sample comparisons, not just 1:N or NxN comparisons. Some functionality was lost (plotting and clustering) but may be added back by popular demand.
  • bcftools +mendelian:

    • Revamp of user options, output VCFs with mendelian errors annotation, read PED files (thanks to Giulio Genovese).
  • bcftools merge:

    • Update headers when appropriate with the '--info-rules *:join' INFO rule. (#1282)

    • Local allele merging that produces LAA and LPL when requested, a draft implementation of samtools/hts-specs#434 (#1138)

    • New --no-index option which allows merging unindexed files. Requires the input files to have chromosomes in the same order and consistent with the order of sequences in the header. (PR #1253; samtools/htslib#1089)

    • Fixes in gVCF merging. (#1127; #1164)

  • bcftools norm:

    • Fixes in the --check-ref s reference-setting feature with non-ACGT bases. (#473; #1300)

    • New --keep-sum switch to keep vector sum constant when splitting multiallelics. (#360)

  • bcftools +prune:

    • Extend to allow annotating with various LD metrics: r^2, Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386).
  • bcftools query:

    • New %N_PASS() formatting expression to output the number of samples that pass the filtering expression.
  • bcftools reheader:

    • Improved error reporting to prevent user mistakes. (#1288)
  • bcftools roh:

    • Several fixes and improvements
      • the --AF-file description incorrectly suggested "REF\tALT" instead of the correct "REF,ALT". (#1142)
      • RG lines could have negative length. (#1144)
      • new --include-noalt option to allow also ALT=. records. (#1137)
  • bcftools +scatter:

    • New plugin intended as a convenient inverse to concat (thanks to Giulio Genovese, PR #1249)
  • bcftools +split:

    • New --groups-file option for more flexibility of defining desired output. (#1240)

    • New --hts-opts option to reduce required memory by reusing one output header and allow overriding the default hFile's block size with --hts-opts block_size=XXX. On some file systems (lustre) the default size can be 4M which becomes a problem when splitting files with 10+ samples.

    • Add support for multisample output and sample renaming

  • bcftools +split-vep:

    • Add default types (Integer, Float, String) for VEP subfields and make --columns - extract all subfields into INFO tags in one go.

1.10.2

20 Dec 16:19

Download the source code here: bcftools-1.10.2.tar.bz2.
(The “Source code” download links below are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)


This release fixes crashes reported on files including integer INFO tags with values outside the range officially supported by VCF. It also fixes a bug where invalid BCF files would be created if such values were present.

1.10.1

16 Dec 20:10
@pd3 pd3

(Note: The “Source code” downloads below are generated by GitHub and are incomplete as they are missing some generated files.)


This release has been withdrawn due to inconsistencies with the generated tar files.

1.10

06 Dec 17:18

The bcftools-1.10.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.


  • Numerous bug fixes, usability improvements and sanity checks were added to prevent common user errors.

  • The -r, --regions (and -R, --regions-file) option should never create unsorted VCFs or duplicate records again. This also fixes rare cases where a spanning deletion makes a subsequent record invisible to bcftools isec and other commands.

  • Additions to filtering and formatting expressions

    • support for the spanning deletion alternate allele (ALT=*)

    • new ILEN filtering expression to be able to filter by indel length

    • new MEAN, MEDIAN, MODE, STDEV, phred filtering functions

    • new formatting expression %PBINOM (phred-scaled binomial probability), %INFO (the whole INFO column), %FORMAT (the whole FORMAT column), %END (end position of the REF allele), %END0 (0-based end position of the REF allele), %MASK (with multiple files indicates the presence of the site in other files)

  • New plugins

    • +gvcfz: compress gVCF file by resizing gVCF blocks according to specified criteria

    • +indel-stats: collect various indel-specific statistics

    • +parental-origin: determine parental origin of a CNV region

    • +remove-overlaps: remove overlapping variants.

    • +split-vep: query structured annotations such as INFO/CSQ created by bcftools/csq or VEP

    • +trio-dnm: screen variants for possible de-novo mutations in trios

  • annotate

    • new -l, --merge-logic option for combining multiple overlapping regions
  • call

    • new bcftools call -G, --group-samples option which allows grouping samples into populations and applying the HWE assumption within but not across the groups.
  • csq

    • significant reduction of memory usage in the local -l mode for VCFs with thousands of samples and 20% reduction in the non-local haplotype-aware mode.

    • fixes a small memory leak and formatting issue in FORMAT/BCSQ at sites with many consequences

    • do not print protein sequence of start_lost events

    • support for "start_retained" consequence

    • support for symbolic insertions (ALT="<INS...>"), "feature_elongation" consequence

    • new -b, --brief-predictions option to output abbreviated protein predictions.

  • concat

    • the --naive option now checks header compatibility when concatenating multiple files.
  • consensus

    • add a new -H, --haplotype 1pIu/2pIu feature to output first/second allele for phased genotypes and the IUPAC code for unphased genotypes

    • new -p, --prefix option to add a prefix to sequence names on output
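The 1pIu/2pIu selection rule described above can be sketched in Python as follows. This is a simplified illustration for single-base alleles only, with no handling of missing genotypes or indels. The function name `pick_base`, its arguments, and the lookup table layout are assumptions, while the two-base IUPAC ambiguity codes themselves are standard.

```python
# Standard IUPAC ambiguity codes for pairs of distinct bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def pick_base(gt, alleles, which):
    """Choose the output base for a diploid SNV genotype.

    gt:      genotype string such as "0|1" (phased) or "0/1" (unphased)
    alleles: [REF, ALT1, ...], single bases only
    which:   1 for 1pIu (first allele), 2 for 2pIu (second allele)
    """
    if "|" in gt:
        # Phased: output the requested haplotype's allele.
        index = int(gt.split("|")[which - 1])
        return alleles[index]
    # Unphased: output the IUPAC code (or the base itself if homozygous).
    first, second = (alleles[int(i)] for i in gt.split("/"))
    return first if first == second else IUPAC[frozenset((first, second))]


print(pick_base("0|1", ["A", "G"], 1))
print(pick_base("0|1", ["A", "G"], 2))
print(pick_base("0/1", ["A", "G"], 1))
```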

  • +contrast

    • added support for Fisher's test probability and other annotations
  • +fill-from-fasta

    • new -N, --replace-non-ACGTN option
  • +dosage

    • fix some serious bugs in dosage calculation
  • +fill-tags

    • extended to perform simple on-the-fly calculations such as calculating INFO/DP from FORMAT/DP.
  • merge

    • add support for merging FORMAT strings

    • bug fixed in gVCF merging

  • mpileup

    • a new optional SCR annotation for the number of soft-clipped reads
  • reheader

    • new -f, --fai option for updating contig lines in the VCF header
  • +trio-stats

    • extend output to include DNM homs and recurrent DNMs
  • VariantKey support

1.9

18 Jul 16:03

The bcftools-1.9.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.


  • annotate

    • REF and ALT columns can now be transferred from the annotation file.

    • fixed bug when setting vector_end values.

  • consensus

    • new -M option to control output at missing genotypes

    • variants immediately following insertions should not be skipped. Note, however, that the current fix requires a normalized VCF and may still falsely skip variants adjacent to multiallelic indels.

    • bug fixed in -H selection handling

  • convert

    • the --tsv2vcf option now makes the missing genotypes diploid, "./." instead of "."

    • the behavior of -i/-e with --gvcf2vcf changed. Previously only sites with FILTER set to "PASS" or "." were expanded and the -i/-e options dropped sites completely. The new behavior is to let the -i/-e options control which records will be expanded. In order to drop records completely, one can stream through "bcftools view" first.

  • csq

    • since the real consequences of start/splice events are not known, the amino acid positions at subsequent variants should stay unchanged

    • add --force option to skip malformatted transcripts in GFFs with out-of-phase CDS exons.

  • +dosage: output all alleles and all their dosages at multiallelic sites

  • +fixref: fix serious bug in -m top conversion

  • -i/-e filtering expressions:

    • add two-tailed binomial test

    • add functions N_PASS() and F_PASS()

    • add support for lists of samples in filtering expressions. With many samples it was impractical to list them all on the command line; samples can now be given in a file as, e.g., GT[@samples.txt]="het"

    • allow multiple perl functions in the expressions and some bug fixes

    • fix a parsing problem, @ was not removed from @filename expressions

  • mpileup: fixed bug where, if samples were renamed using the -G (--read-groups) option, some samples could be omitted from the output file.

  • norm: update INFO/END when normalizing indels

  • +split: new -S option to subset samples and to use custom file names instead of the defaults

  • +smpl-stats: new plugin

  • +trio-stats: new plugin

  • Fixed build problems with non-functional configure script produced on some platforms

1.8

03 Apr 16:05
  • -i, -e filtering: Support for custom perl scripts

  • +contrast: New plugin to annotate genotype differences between groups of samples

  • +fixploidy: New options for simpler ploidy usage

  • +setGT: Target genotypes can be set to phased by giving --new-gt p

  • run-roh.pl: Allow passing options directly to bcftools roh

  • A number of bug fixes


The bcftools-1.8.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.

1.7

12 Feb 17:12
  • -i, -e filtering: Major revamp, improved filtering by FORMAT fields and missing values. New GT=ref,alt,mis etc keywords, check the documentation for details.

  • query: Only samples matching the expression are printed when both the -f and -i/-e expressions contain genotype fields. Note that this changes the original behaviour: previously, all samples were output when one matching sample was found. The original behaviour can be achieved by pre-filtering with view and then streaming to query. Compare
    bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
    and
    bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'

  • annotate: New -k, --keep-sites option

  • consensus: Fix --iupac-codes output

  • csq: Homs always considered phased and other fixes

  • norm: Make -c none work and remove query -c

  • roh: Fix errors in the RG output

  • stats: Allow IUPAC ambiguity codes in the reference file; report the number of missing genotypes

  • +fill-tags: Add ExcHet annotation

  • +setGT: Fix bug in binom.test calculation, previously it worked only for nAlt<nRef!

  • +split: New plugin to split a multi-sample file into single-sample files in one go

  • Improve python3 compatibility in plotting scripts


The bcftools-1.7.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.

1.6

28 Sep 16:54
  • New sort command.

  • New options added to the consensus command. Note that the -i, --iupac option has been renamed to -I, --iupac, in favor of the standard -i, --include.

  • Filtering expressions (-i/-e): support for GT=<type> expressions and for lists and ranges (#639) - see the man page for details.

  • csq: relax some GFF3 parsing restrictions to enable using Ensembl GFF3 files for plants (#667)

  • stats: add further documentation to output stats files (#316) and include haploid counts in per-sample output (#671).

  • plot-vcfstats: further fixes for Python3 (@nsoranzo, #645, #666).

  • query bugfix (#632)

  • +setGT plugin: new option to set genotypes based on a two-tailed binomial distribution test. Also, allow combining -i/-e with -t q.

  • mpileup: fix typo (#636)

  • convert --gvcf2vcf bugfix (#641)

  • +mendelian: recognize some mendelian inconsistencies that were being missed (@oronnavon, #660), also add support for multiallelic sites and sex chromosomes.


The bcftools-1.6.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.

1.5

21 Jun 11:56
  • Added autoconf support to bcftools. See INSTALL for more details.

  • norm: Make norm case insensitive (#601). Trim the reference allele (#602).

  • mpileup: fix for misreported indel depths for reads containing adjacent indels (3c1205c).

  • plot-vcfstats: Open stats file in text mode, not binary (#618).

  • fixref plugin: Allow multiallelic sites in the -i, --use-id reference. Also flip genotypes, not just REF/ALT!

  • merge: fix gVCF merge bug when last record on a chromosome opened a gVCF block (#616)

  • New options added to the ROH plotting script.

  • consensus: Properly flush chain info (#606, thanks to @krooijers).

  • New +prune plugin for pruning sites by LD (R2) or maximum number of records within a window.

  • New N_MISSING, F_MISSING (number and fraction missing) filtering expressions.

  • Fix HMM initialization in roh when snapshots are used in multiple chromosome VCF.

  • Fix buffer overflow (#607) in filter.


The bcftools-1.5.tar.bz2 download is the full source code release. The “Source code” downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.