Index | index by Group | index by Distribution | index by Vendor | index by creation date | index by Name | Mirrors | Help | Search |
Name: bgzip | Distribution: openSUSE Tumbleweed |
Version: 1.20 | Vendor: openSUSE |
Release: 1.1 | Build date: Mon Jul 15 14:17:55 2024 |
Group: Productivity/Scientific/Other | Build host: reproducible |
Size: 115932 | Source RPM: libhts-1.20-1.1.src.rpm |
Packager: https://bugs.opensuse.org | |
Url: https://github.com/samtools/htslib/ | |
Summary: Block compression/decompression utility from the HTSlib project |
HTSlib is an implementation of a unified C library for accessing common file formats, such as SAM, CRAM and VCF, used for high-throughput sequencing data, and is the core library used by samtools and bcftools. HTSlib implements a generalized BAM index, with file extension .csi (coordinate-sorted index). The HTSlib file reader first looks for the new index and then for the old if the new index is absent. This project also includes the popular tabix indexer, which indexes both .tbi and .csi formats, and the bgzip compression utility.
MIT
* Mon Jul 15 2024 jun wang <jgwang@suse.com> - Update to version 1.20 * When working on named files, bgzip now sets the modified and access times of the output files it makes to match those of the corresponding input * It's now possible to use a -o option to specify the output file name in bgzip * Improved error faidx error messages * Faster reading of SAM array (type "B") tags. These often turn up in ONT and PacBio data * Improved validity checking of base modification tags * mpileup overlap removal now works where one read has a deletion * The S3 plugin can now find buckets via S3 access point aliases * Added a --threads option (and -@ short option) to tabix * tabix can now index Graph Alignment Format (GAF) files * Security fix: Prevent possible heap overflow in cram_encode_aux() on bad RG:Z tags * Security fix: Prevent attempts to call a NULL pointer if certain URL schemes are used in CRAM @SQ UR: tags * Security fix: Fixed a bug where following certain AWS S3 redirects could downgrade the connection from TLS (i.e. https://) to unencrypted http://. This could happen when using path-based URLs and AWS_DEFAULT_REGION was set to a region other that the one where the data was stored * Fixed arithmetic overflow when loading very long references for CRAM * Fixed faidx and CRAM reference look-ups on compressed fasta where the .fai index file was present, but the .gzi index of compressed offsets was not * Fixed BCF indexing on-the-fly bug which produced invalid indexes when using multiple compression threads * Ensure that pileup destructors are called by bam_plp_destroy(), to prevent memory leaks * Ensure on-the-fly index timestamps are always older than the data file. Previously the files could be closed out of order, leading to warnings being printed when using the index * To prevent data corruption when reading (strictly invalid) VCF files with duplicated FORMAT tags, all but the first copy of the data associated with the tag are now dropped with a warning * Fixed a bug introduced in release 1.19 (PR #1689) which broke variant record data if it tried to remove an over-long tag * Changed error to warning when complaining about use of the CG tag in SAM or CRAM files * Wed Jan 03 2024 Stefan Brüns <stefan.bruens@rwth-aachen.de> - Update to version 1.19 Too many changes to list, for details see https://github.com/samtools/htslib/releases/tag/1.19 - Update to version 1.18 Too many changes to list, for details see https://github.com/samtools/htslib/releases/tag/1.18 - Update to version 1.17 Too many changes to list, for details see https://github.com/samtools/htslib/releases/tag/1.17 * Thu Sep 08 2022 Stefan Brüns <stefan.bruens@rwth-aachen.de> - Update to version 1.16 * Make hfile_s3 refresh AWS credentials on expiry in order to make HTSlib work better with AWS IAM credentials, which have a limited lifespan. * Allow BAM headers between 2GB and 4GB in size once more. * Improve error message when failing to load an index. * Permit MM (base modification) tags containing . and ? suffixes. * Warn if spaces instead of tabs are detected in a VCF file to prevent confusion. * Add an sclen filter expression keyword. * Improve error messages for CRAM reference mismatches. * Expose more of the CRAM API and add new functionality to extract the reference from a CRAM file. * Improvements to the implementation of embedded references in CRAM where no external reference is specified. * The CRAM writer now allows alignment records with RG:Z: aux tags that don't have a corresponding @RG ID in the file header. * Set tab delimiter in man page for tabix GFF3 sort. * When using libdeflate, the 1...9 scale of BGZF compression levels is now remapped to the 1...12 range used by libdeflate instead of being passed directly. * The VCF variant API has been extended so that it can return separate flags for INS and DEL variants as well as the existing INDEL one. * The missing, but trivial, le_to_u8() function has been added to hts_endian. * bcf_format_gt() now works properly on big-endian platforms. * Update htscodecs to version 1.3.0 for new SIMD code + various fixes. * Detect ARM Neon support and only build appropriate SIMD object files. * make print-config now reports extra CFLAGS that are needed to build the SIMD parts of htscodecs. * Fixed some Makefile dependency issues for the check/test targets and plugins. * Fix bug when reading position -1 in BCF (0 in VCF), which is used to indicate telomeric regions. * Various bugs and quirks have been fixed in the filter expression engine, mostly related to the handling of absent tags, and the is_true flag. * Fix buffer overrun in bam_plp_insertion_mod. * Remove limit of returned size from fai_retrieve(). * Cap hts_getline() return value at INT_MAX. * Fix breakend detection and test bcf_set_variant_type(). * Prevent arrays of BCF_BT_NULL values found in BCF files from causing bcf_fmt_array() to call exit() as the type is unsupported. * Improved detection of fasta and fastq files that have very long comments following identifiers. * Fixed a SEGV triggered by giving a SAM file to samtools import. * Wed Apr 20 2022 Ferdinand Thiessen <rpm@fthiessen.de> - Use system provided libhtscodecs - Update to version 1.15.1 * Security fix: Fixed broken error reporting in the sam_cap_mapq() function, due to a missing hts_log() parameter. Prior to this fix it was possible to abuse the log message format string by passing a specially crafted alignment record to this function. * Fixed excessive memory used by multi-threaded SAM output on long reads. * Fixed a bug where tabix would misinterpret region specifiers starting at position 0. * The VCF header parser will now issue a warning if it finds an INFO header with Type=Flag but Number not equal to 0. - Update to version 1.15 * Bgzip now has a --keep option to not remove the input file after compressing. * Improved file format detection so some BED files are no longer detected as FASTQ or FASTA. * Added xz (lzma), zstd and D4 formats to the file type detection functions. We don't actively support reading these data types, but function calls and htsfile can detect them. * CRAM now also uses libdeflate for read-names if the libdeflate version is new enough (1.9 onwards). Previously we used zlib for this due to poor performance of libdeflate. * The VCF and BCF readers will now issue a warning if contig, INFO or FORMAT IDs do not match the formats described in the VCFv4.3 specification. * Bug fixes - Update to version 1.14 * Added a keep option to bgzip to leave the original file untouched. * endpos has been added to the filter language, giving the position of the rightmost mapped base as measured by the CIGAR string. * Interfaces have been added to interpret the new base modification tags added to the SAMtags document in samtools/hts-specs#418. * New API functions hts_flush()/sam_flush()/bcf_flush() for flushing output htsFile/samFile/vcfFile streams. * The synced_bcf_reader now sorts lines with symbolic alleles by END tag as well as POS. * Added synced_bcf_reader options BCF_SR_REGIONS_OVERLAP and BCF_SR_TARGETS_OVERLAP for better control of records that start outside the desired region but overlap it are handled. * HTSlib will now accept long-cigar CG:B: tags made by htsjdk which don't quite follow the specification properly * The FASTA and FASTQ readers get an option to skip over the first item on the header line, and use the second as the read name. * HTSlib is now more strict when parsing the VCF samples line * HTSlib will now warn if it looks like the header has been corrupted by diagnostic messages from the program that made it. * File format detection will now recognise signatures for XZ, Zstd and D4 files (note that HTSlib will not read them yet). * Bug fixes - Update to version 1.13 * In case a PG header line has multiple ID tags supplied by other applications, the header API now selects the first one encountered as the identifying tag and issues a warning when detecting subsequent ID tags. * VCF header reading function (vcf_hdr_read) no longer tries to download a remote index file by default. * Support reading and writing FASTQ format in the same way as SAM, BAM or CRAM. Records read from a FASTQ file will be treated as unmapped data. * Added GCP requester pays bucket access. * Made mpileup's overlap removal choose which copy to remove at random instead of always removing the second one. * It is now possible to use platform specific BAQ parameters. * A new hts_bin_level API function has been added, to compute the level of a given bin in the binning index. * Related to the above, a new API method, hts_idx_nseq, now returns the total number of contigs from an index. * Added bracket handling to bcf_hdr_parse_line, for use with ##META lines. * Bug fixes * Wed Apr 28 2021 Ferdinand Thiessen <rpm@fthiessen.de> - Update to version 1.12: * Added experimental CRAM 3.1 and 4.0 support. * Added a general filtering syntax for alignment records in SAM/BAM/CRAM readers. * The knet networking code has been removed if favor of libcurl * The old htslib/knetfile.h interfaces have been marked as deprecated. * Added an introspection API for checking some of the capabilities provided by HTSlib * Made performance improvements to probaln_glocal method, which speeds up mpileup BAQ calculations. * Added a public method for constructing a BAM record from the component pieces. * Added two public methods, sam_parse_cigar and bam_parse_cigar, as part of a small CIGAR API * HTSlib, and the included htsfile program, will now recognise the old RAZF compressed file format. Note that while the format is detected, HTSlib is unable to read it. * The S3 plugin now has options to force the address style. It will recognise the addressing_style and host_bucket entries in the respective AWS .credentials and s3cmd .s3cfg files. * Fixed VCF #CHROM header parsing to only separate columns at tab characters. * Fixed a crash reported in bcf_sr_sort_set, which expects REF to be present. * Fixed a bcf synced reader bug when filtering with a region list, and the first record for a chromosome had the same position as the last record for the previous chromosome. * Fixed a bug in the overlapping logic of mpileup, dealing with iterating over CIGAR segments. * Fixed a tabix bug that prevented setting the correct number of lines to be skipped in a region file. * Made bam_itr_next an alias for sam_itr_next, to prevent it from crashing when working with htsFile pointers. * Fixed once per outgoing multi-threaded block bgzf_idx_flush assertion, to accommodate situations when a single record could span multiple blocks. * Fixed assumption of pthread_t being a non-structure, as permitted by POSIX. * Fixed the minimum offset of a BAI index bin, to account for unmapped reads. * Fixed the CRLF handling in sam_parse_worker method. * Included unistd.h and errno.h directly in HTSlib files, as opposed to including them indirectly, via third party code. * Fri Feb 19 2021 Wang Jun <jgwang@suse.com> - Update to version 1.11: https://github.com/samtools/htslib/releases/tag/1.11 * Features and Updates - Support added for remote reference files. - Added tabix --separate-regions option - Added tabix --cache option to set a BGZF block cache size. - Improved error checking in tabix and added a --verbosity option - A note about the maximum chromosome length usable with TBI indexes has been added to the tabix manual page. - New method vcf_open_mode() changes the opening mode of a variant file based on its file extension. - The VCF parser has been made faster and easier to maintain. - bcf_record_check() has been made faster - The VCF parser now recognises the <NON_REF> symbolic allele produced by GATK. - Support has been added for simultaneous reading of unindexed VCF/BCF files when using the synced_bcf_reader interface. - The VCF and BCF readers will now attempt to fix up invalid INFO/END tags - The htsFile interface can now detect the crypt4gh encrypted format - hts_srand48() now seeds the same POSIX-standard sequences of pseudo-random numbers regardless of platform, including on OpenBSD where plain srand48() produces a different cryptographically-strong non-deterministic sequence. - Iterators now work with 64 bit positions. - Improved the speed of range queries when using BAI indexes by making better use of the linear index data included in the file. - Alignments which consume no reference bases are now considered to have length 1. - A bam_set_seqi() function to modify a single base in the BAM structure has been added. - Writing SAM format is around 30% faster. - Added sam_format_aux1() which converts a BAM aux tag to a SAM format string. - bam_aux_update_str() no longer requires NUL-terminated strings. - It is now possible to use external plug-ins in language bindings that dynamically load HTSlib. - bgzf_close(), and therefore hts_close(), will now return non-zero when closing a BGZF handle on which errors have been detected. - Added a special case to the kt_fisher_exact() test for when the table probability is too small to be represented in a double. - Improved error diagnostics in the CRAM decoder (#1042), BGZF (#1049), the VCF and BCF readers (#1059), and the SAM parser (#1073). - ks_resize() now allocates 1.5 times the requested size when it needs to expand a kstring instead of rounding up to the next power of two. * Bug fixes - Fixed hfile_libcurl breakage when using libcurl 7.69.1 or later. - Fixed overflows kroundup32() and kroundup_size_t() which caused them to return zero when rounding up values where the most significant bit was set. - Fixed missing return parameter value in idx_test_and_fetch(). - Fixed crashes due to inconsistent selection between BGZF and plain (hFILE) interfaces when reading files. - Added and/or fixed byte swapping code for big-endian platforms. - Fixed a problem with multi-threaded on-the-fly indexes which would occasionally write virtual offsets pointing at the end of a BGZF block. - In sam_hdr_create(), free newly allocated SN strings when encountering an error. - Prevent double free in case of idx_test_and_fetch() failure. - In the header, link a new PG line only to valid chains. Prevents an explosive growth of PG lines on headers where PG lines are already present but not linked together correctly. - Also in the header, when calling sam_hdr_update_line(), update target arrays only when the name or length is changed. - Fixed buffer overflows in CRAM MD5 calculation triggered by files with invalid compression headers, or files with embedded references that were one byte too short. - Fix mpileup regression between 1.9 and 1.10 where overlap detection was incorrectly skipped on reads where RNEXT, PNEXT and TLEN were set to the "unavailable" values ("*", 0, 0 in SAM). - kputs() now checks for null pointer in source string. - Fix potential bcf_update_alleles() crash on 0 alleles. - Added bcf_unpack() calls to some bcf_update functions to fix a bug where updates made after a call to bcf_dup() could be lost. - Error message typo "Number=R" instead of "Number=G" fixed in bcf_remove_allele_set(). - Fixed crashes that could occur in BCF files that use IDX= header annotations to create a sparse set of CHROM, FILTER or FORMAT indexes, and include records that use one of the missing index values. - Fixed potential integer overflows in the VCF parser and ensured that the total length of FORMAT fields cannot go over 2Gbytes. - Download index files atomically in idx_test_and_fetch(). This prevents corruption when running parallel jobs on S3 files. - The pileup constructor callback is now given the copy of the bam1_t struct made by pileup instead of the original one passed to bam_plp_push(). - Fixed possible error in code_sort() on negative CRAM Huffman code length. - Fixed possible undefined shift in cram_byte_array_stop_decode_init(). - Fixed a bug where range queries to the end of a given reference would return incorrect results on CRAM files. - Fixed an integer overflow in cram_read_slice(). - Fixed a memory leak on failure in cram_decode_slice(). - Fixed a regression which caused cram_transcode_rg() to fail, resulting in a crash in "samtools cat" on CRAM files. - Fixed an undersized string reallocation in the threaded SAM reader which caused it to crash when reading SAM files with very long lines. Numerous memory allocation checks have also been added. * Mon Feb 10 2020 Todd R <toddrme2178@gmail.com> - Include baselibs.conf as source file. - Update baselibs.conf. * Wed Feb 05 2020 Todd R <toddrme2178@gmail.com> - Update to release 1.10.2 * This is a release fix that corrects minor inconsistencies discovered in previous deliverables. - Update to release 1.10.1 * The support for 64-bit coordinates in VCF brought problems for files not conforming to VCF/BCF specification. While previous versions would make out-of-range values silently overflow creating nonsense values but parseable file, the version 1.10 would silently create an invalid BCF. - Update to release 1.10 + Brief summary * Addition of support for references longer than 2Gb (NB: SAM and VCF formats only, not their binary counterparts). This may need changes in code using HTSlib. See README.large_positions.md for more information. * Added a SAM header API. * Major speed up to SAM reading and writing. This also now supports multi-threading. * We can now auto-index on-the-fly while writing a file. This also includes to bgzipped SAM.gz. * Overhaul of the S3 interface, which now supports version 4 signatures. This also makes writing to S3 work. * Thu Sep 06 2018 flyos@mailoo.org - Update to 1.9 * If `./configure` fails, `make` will stop working until either configure is re-run successfully, or `make distclean` is used. This makes configuration failures more obvious. (#711, thanks to John Marshall) * The default SAM version has been changed to 1.6. This is in line with the latest version specification and indicates that HTSlib supports the CG tag used to store long CIGAR data in BAM format. * bgzip integrity check option '--test' (#682, thanks to @sd4B75bJ, @jrayner) * Faidx can now index fastq files as well as fasta. The fastq index adds an extra column to the `.fai` index which gives the offset to the quality values. New interfaces have been added to `htslib/faidx.h` to read the fastq index and retrieve the quality values. It is possible to open a fastq index as if fasta (only sequences will be returned), but not the other way round. (#701) * New API interfaces to add or update integer, float and array aux tags. (#694) * Add `level=<number>` option to `hts_set_opt()` to allow the compression level to be set. Setting `level=0` enables uncompressed output. (#715) * Improved bgzip error reporting. * Better error reporting when CRAM reference files can't be opened. (#706) * Fixes to make tests work properly on Windows/MinGW - mainly to handle line ending differences. (#716) * Efficiency improvements: - Small speed-up for CRAM indexing. - Reduce the number of unnecessary wake-ups in the thread pool. (#703) - Avoid some memory copies when writing data, notably for uncompressed BGZF output. (#703) * Bug fixes: - Fix multi-region iterator bugs on CRAM files. (#684) - Fixed multi-region iterator bug that caused some reads to be skipped incorrectly when reading BAM files. (#687) - Fixed synced_bcf_reader() bug when reading contigs multiple times. (#691, reported by @freeseek) - Fixed bug where bcf_hdr_set_samples() did not update the sample dictionary when removing samples. (#692, reported by @freeseek) - Fixed bug where the VCF record ref length was calculated incorrectly if an INFO END tag was present. (71b00a) - Fixed warnings found when compiling with gcc 8.1.0. (#700) - sam_hdr_read() and sam_hdr_write() will now return an error code if passed a NULL file pointer, instead of crashing. - Fixed possible negative array look-up in sam_parse1() that somehow escaped previous fuzz testing. (#731, reported by @fCorleone) - Fixed bug where cram range queries could incorrectly report an error when using multiple threads. (#734, reported by Brent Pedersen) - Fixed very rare rANS normalisation bug that could cause an assertion failure when writing CRAM files. (#739, reported by @carsonhh) * Thu Jul 12 2018 flyos@mailoo.org - Better formatting of the descriptions - Spec file cleaned up using spec-cleaner - Update to 1.8 * The URL to get sequences from the EBI reference server has been changed to https://. This is because the EBI no longer serve sequences via plain HTTP - requests to the http:// endpoint just get redirected. HTSlib needs to be linked against libcurl to download https:// URLs, so CRAM users who want to get references from the EBI will need to run configure and ensure libcurl support is enabled using the - -enable-libcurl option. * Added libdeflate as a build option for alternative faster compression and decompression. Results vary by CPU but compression should be twice as fast and decompression faster. * It is now possible to set the compression level in bgzip. (#675; thanks to Nathan Weeks). * bgzip now gets its own manual page. * CRAM encoding now stored MD and NM tags verbatim where the reference contains 'N' characters, to work around ambiguities in the SAM specification (samtools #717/762). Also added "store_md" and "store_nm" cram-options for forcing these tags to be stored at all locations. This is best when combined with a subsequent decode_md=0 option while reading CRAM. * Multiple CRAM bug fixes, including a fix to free and the subsequent reuse of references with `-T ref.fa`. (#654; reported by Chris Saunders) * CRAM multi-threading bugs fixed: don't try to call flush on reading; processing of multiple range queries; problems with multi-slice containers. * Fixed crashes caused when decoding some cramtools produced CRAM files. * Fixed a couple of minor rANS issues with handling invalid data. * Fixed bug where probaln_glocal() tried to allocate far more memory than needed when the query sequence was much longer than the reference. This caused crashes in samtools and bcftools mpileup when used on data with very long reads. (#572, problem reported by Felix Bemm via minimap2). * sam_prop_realn() now returns -1 (the same value as for unmapped reads) on reads that do not include at least one 'M', 'X' or '=' CIGAR operator, and no longer adds BQ or ZQ tags. BAQ adjustments are only made to bases covered by these operators so there is no point in trying to align reads that do not have them. (#572) * BAM: HTSlib now supports BAMs which include CIGARs with more than 65535 operations as per HTS-Specs 18th November (dab57f4 and 2f915a8). * BCF/VCF: - Removed the need for long double in pileup calculations. - Sped up the synced reader in some situations. - Bug fixing: removed memory leak in bcf_copy. * CRAM: - Added support for HTS_IDX_START in cram iterators. - Easier to build when lzma header files are absent. - Bug fixing: a region query with REQUIRED_FIELDS option to disable sequence retrieval now gives correct results. - Bug fixing: stop queries to regions starting after the last read on a chromosome from incorrectly reporting errors (#651, #653; reported by Imran Haque and @egafni via pysam). * Multi-region iterator: The new structure takes a list of regions and iterates over all, deduplicating reads in the process, and producing a full list of file offset intervals. This is usually much faster than repeatedly using the old single-region iterator on a series of regions. * Curl improvements: - Add Bearer token support via HTS_AUTH_LOCATION env (#600). - Use CURL_CA_BUNDLE environment variable to override the CA (#622; thanks to Garret Kelly & David Alexander). - Speed up (removal of excessive waiting) for both http(s) and ftp. - Avoid repeatedly reconnecting by removal of unnecessary seeks. - Bug fixing: double free when libcurl_open fails. * BGZF block caching, if enabled, now performs far better (#629; reported by Ram Yalamanchili). * Added an hFILE layer for in-memory I/O buffers (#590; thanks to Thomas Hickman). * Tidied up the drand48 support (intended for systems that do not provide this function). * Fixed bug where iterators on CRAM files did not propagate error return values to the caller correctly. Thanks go to Chris Saunders. * Overhauled Windows builds. Building with msys2/mingw64 now works correctly and passes all tests. * More improvements to logging output (thanks again to Anders Kaplan). * Return codes from sam_read1() when reading cram have been made consistent with those returned when reading sam/bam. Thanks to Chris Saunders (#575). * BGZF CRC32 checksums are now always verified. * It's now possible to set nthreads = 1 for cram files. * hfile_libcurl has been modified to make it thread-safe. It's also better at handling web servers that do not honour byte range requests when attempting to seek - it now sets errno to ESPIPE and keeps the existing connection open so callers can revert to streaming mode it they want to. * hfile_s3 now recalculates access tokens if they have become stale. This fixes a reported problem where authentication failed after a file had been in use for more than 15 minutes. * Fixed bug where remote index fetches would fail to notice errors when writing files. * bam_read1() now checks that the query sequence length derived from the CIGAR alignment matches the sequence length in the BAM record.
/usr/bin/bgzip /usr/share/man/man1/bgzip.1.gz
Generated by rpm2html 1.8.1
Fabrice Bellet, Thu Jan 2 23:37:42 2025