Releases: manticoresoftware/manticoresearch

Manticore Search 5.0.0

18 May 14:19

Manticore Search 5.0.0, May 18th 2022

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Release blogpost https://manticoresearch.com/blog/manticore-search-5-0-0/

Major new features

  • 🔬 Support for Manticore Columnar Library 1.15.2, which enables Secondary indexes beta version. Building secondary indexes is on by default for plain and real-time columnar and row-wise indexes (if Manticore Columnar Library is in use), but to enable it for searching you need to set secondary_indexes = 1 either in your configuration file or using SET GLOBAL. The new functionality is supported in all operating systems except old Debian Stretch and Ubuntu Xenial.
  • Read-only mode: you can now specify listeners that process only read queries discarding any writes.
  • New /cli endpoint that makes running SQL queries over HTTP even easier.
  • Faster bulk INSERT/REPLACE/DELETE via JSON over HTTP: previously you could provide multiple write commands via HTTP JSON protocol, but they were processed one by one, now they are handled as a single transaction.
  • #720 Nested filters support in JSON protocol. Previously you couldn't code things like a=1 and (b=2 or c=3) in JSON: must (AND), should (OR) and must_not (NOT) worked only on the highest level. Now they can be nested.
  • Support for Chunked transfer encoding in HTTP protocol. You can now use chunked transfer in your application to transfer large batches with lower resource consumption (since you don't need to calculate Content-Length). On the server's side Manticore now always processes incoming HTTP data in streaming fashion without waiting for the whole batch to be transferred as previously, which:
    • decreases peak RAM consumption, which lowers a chance of OOM
    • decreases response time (our tests showed 11% decrease for processing a 100MB batch)
    • lets you overcome max_packet_size and transfer batches much larger than the largest allowed value of max_packet_size (128MB), e.g. 1GB at once.
  • #719 HTTP interface support of 100 Continue: now you can transfer large batches from curl (including curl libraries used by various programming languages) which by default does Expect: 100-continue and waits some time before actually sending the batch. Previously you had to add Expect: header, now it's not needed.
  • Having at least one full-text field in a real-time/plain index is not mandatory anymore. You can now use Manticore even in cases not having anything to do with full-text search.
  • Fast fetching for attributes backed by Manticore Columnar Library: queries like select * from <columnar table> are now much faster than previously, especially if there are many fields in the schema.
  • ⚠️ BREAKING CHANGE: Implicit cutoff. Manticore now doesn't spend time and resources processing data you don't need in the result set which will be returned. The downside is that it affects total_found in SHOW META and hits.total in JSON output. It is now only accurate in case you see total_relation: eq while total_relation: gte means the actual number of matching documents is greater than the total_found value you've got. To retain the previous behaviour you can use search option cutoff=0, which makes total_relation always eq.
  • ⚠️ BREAKING CHANGE: All full-text fields are now stored by default in plain indexes. You need to use stored_fields = (empty value) to make all fields non-stored (i.e. revert to the previous behaviour).
  • #715 HTTP JSON supports search options.
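The nested-filter example from the list above, a=1 and (b=2 or c=3), can be sketched as a JSON request body. This is a minimal sketch, not the reference syntax: the `equals` filter name is an assumption about the JSON protocol, and the index name `products` is hypothetical. The small `matches` helper evaluates the tree locally only to show how must (AND), should (OR) and must_not (NOT) nest; it does not talk to a server.

```python
import json


def nested_filter_query(index):
    """Build a search body for: a = 1 AND (b = 2 OR c = 3)."""
    return {
        "index": index,  # hypothetical index name supplied by the caller
        "query": {
            "bool": {
                "must": [
                    {"equals": {"a": 1}},  # "equals" is an assumed filter name
                    {"bool": {"should": [
                        {"equals": {"b": 2}},
                        {"equals": {"c": 3}},
                    ]}},
                ]
            }
        },
    }


def matches(node, doc):
    """Evaluate a filter tree against a dict, purely to illustrate the nesting."""
    if "equals" in node:
        return all(doc.get(k) == v for k, v in node["equals"].items())
    clause = node["bool"]
    must_ok = all(matches(n, doc) for n in clause.get("must", []))
    should = clause.get("should", [])
    should_ok = not should or any(matches(n, doc) for n in should)
    must_not_ok = not any(matches(n, doc) for n in clause.get("must_not", []))
    return must_ok and should_ok and must_not_ok


body = nested_filter_query("products")
print(json.dumps(body, indent=2))
```

Before 5.0.0 the inner `bool` inside `must` would not have been accepted, since the three clauses worked only at the top level.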

Minor changes

  • ⚠️ BREAKING CHANGE: Index meta file format change. Previously meta files (.meta, .sph) were in binary format, now they are plain JSON. The new Manticore version will convert older indexes automatically, but:
    • you may get a warning like WARNING: ... syntax error, unexpected TOK_IDENT
    • you won't be able to run the index with previous Manticore versions, so make sure you have a backup
  • ⚠️ BREAKING CHANGE: Session state support with help of HTTP keep-alive. This makes HTTP stateful when the client supports it too. For example, using the new /cli endpoint and HTTP keep-alive (which is on by default in all browsers) you can call SHOW META after SELECT and it will work the same way it works via mysql. Note, previously Connection: keep-alive HTTP header was supported too, but it only caused reusing the same connection. Since this version it also makes the session stateful.
  • You can now specify columnar_attrs = * to define all your attributes as columnar in the plain mode which is useful in case the list is long.
  • Faster replication SST
  • ⚠️ BREAKING CHANGE: Replication protocol has been changed. If you are running a replication cluster, then when upgrading to Manticore 5 you need to:
    • stop all your nodes first cleanly
    • and then start the node which was stopped last with --new-cluster (run tool manticore_new_cluster in Linux).
    • read about restarting a cluster for more details.
  • Replication improvements:
    • Faster SST
    • Noise resistance which can help in case of unstable network between replication nodes
    • Improved logging
  • Security improvement: Manticore now listens on 127.0.0.1 instead of 0.0.0.0 in case no listen at all is specified in config. Even though in the default configuration which is shipped with Manticore Search the listen setting is specified and it's not typical to have a configuration with no listen at all, it's still possible. Previously Manticore would listen on 0.0.0.0 which is not secure, now it listens on 127.0.0.1 which is usually not exposed to the Internet.
  • Faster aggregation over columnar attributes.
  • Increased AVG() accuracy: previously Manticore used float internally for aggregations, now it uses double which increases the accuracy significantly.
  • Improved support for JDBC MySQL driver.
  • DEBUG malloc_stats support for jemalloc.
  • optimize_cutoff is now available as a per-table setting which can be set when you CREATE or ALTER a table.
  • ⚠️ BREAKING CHANGE: query_log_format is now sphinxql by default. If you are used to plain format you need to add query_log_format = plain to your configuration file.
  • Significant memory consumption improvements: Manticore now consumes significantly less RAM under long and intensive insert/replace/optimize workloads when stored fields are used.
  • shutdown_timeout default value was increased from 3 seconds to 60 seconds.
  • Commit ffd0499d Support for Java mysql connector >= 6.0.3: in Java mysql connection 6.0.3 they changed the way they connect to mysql which broke compatibility with Manticore. The new behaviour is now supported.
  • Commit 1da6dbec disabled saving a new disk chunk on loading an index (e.g. on searchd startup).
  • Issue #746 Support for glibc >= 2.34.
  • Issue #784 'VIP' connections are counted separately from usual (non-VIP) ones. Previously VIP connections were counted towards the max_connections limit, which could cause a "maxed out" error for non-VIP connections. Now VIP connections are not counted towards the limit. The current number of VIP connections can also be seen in SHOW STATUS and status.
  • ID can now be specified explicitly.
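To see why the AVG() change from float to double in the list above matters, here is a small sketch that emulates a 32-bit accumulator with the standard `struct` module. It is only an illustration of accumulator width, under the assumption that the rounding happens after every addition; it is not Manticore's aggregation code.

```python
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def average(values, single_precision):
    """Average values, optionally rounding the running sum to 32 bits."""
    total = 0.0
    for v in values:
        total += v
        if single_precision:
            total = to_float32(total)  # emulate a float accumulator
    return total / len(values)


values = [0.1] * 100_000
avg32 = average(values, single_precision=True)
avg64 = average(values, single_precision=False)
print(f"float accumulator:  {avg32:.10f}")
print(f"double accumulator: {avg64:.10f}")
```

Once the running sum grows large, each added 0.1 is rounded to the spacing of 32-bit floats at that magnitude, so the error accumulates; the 64-bit accumulator stays much closer to 0.1.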

⚠️ Other minor breaking changes


Manticore Search 4.2.0

23 Dec 06:37

Manticore Search 4.2.0, Dec 23rd 2021

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Release blogpost

Major new features

  • Pseudo-sharding support for real-time indexes and full-text queries. In the previous release we added limited pseudo-sharding support. Starting from this version you can get all the benefits of pseudo sharding and your multi-core processor by just enabling searchd.pseudo_sharding. The coolest thing is that you don't need to do anything with your indexes or queries for that: just enable it, and if you have free CPU it will be used to lower your response time. It supports plain and real-time indexes for full-text, filtering and analytical queries. For example, here is how enabling pseudo sharding can make most queries' response time about 10x lower on average on the Hacker News curated comments dataset multiplied 100 times (116 million docs in a plain index).

4.2.0 pseudo sharding on vs off

  • PQ transactions are now atomic and isolated. Previously PQ transactions support was limited. It enables much faster REPLACE into PQ, especially when you need to replace a lot of rules at once. Performance details:

Previous version 4.0.2

It takes 48 seconds to insert 1M PQ rules and 406 seconds to REPLACE just 40K in 10K batches.

root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:24:30 AM CET 2021
Wed Dec 22 10:25:18 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 30000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:26:23 AM CET 2021
Wed Dec 22 10:26:27 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real	6m46.195s
user	0m0.035s
sys	0m0.008s

New version 4.2.0

It takes 34 seconds to insert 1M PQ rules and 23 seconds to REPLACE them in 10K batches.

root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:06:38 AM CET 2021
Wed Dec 22 10:07:12 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 990000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:12:31 AM CET 2021
Wed Dec 22 10:14:00 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real	0m23.248s
user	0m0.891s
sys	0m0.047s
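The shell loops in the transcripts above build one transaction containing a multi-row REPLACE per batch of ids. A minimal Python sketch of the same idea follows; the function name `build_replace_script` is hypothetical, the row template is copied from the transcript, and the script only produces SQL text without connecting to a server.

```python
def build_replace_script(ids, batch_size=10_000):
    """Return SQL text: one transaction with a multi-row REPLACE per batch."""
    # Row template taken from the benchmark above; {id} is filled per rule.
    row = ("({id},'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', "
           "'j.json.language=\"de\"', '{{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}}')")
    lines = ["begin;"]
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        values = ",\n".join(row.format(id=i) for i in batch)
        lines.append("replace into pq (id,query,filters,tags) values\n"
                     + values + ";")
    lines.append("commit;")
    return "\n".join(lines) + "\n"


# Small demonstration: 25 ids in batches of 10 give three REPLACE statements.
script = build_replace_script(list(range(1, 26)), batch_size=10)
print(script.count("replace into pq"))
```

The resulting text can be piped to the mysql client the same way /tmp/replace.sql is in the transcript.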

Minor changes

  • optimize_cutoff is now available as a configuration option in section searchd. It's useful when you want to limit the RT chunks count in all your indexes to a particular number globally.
  • Commit 00874743 accurate count(distinct ...) and FACET ... distinct over several local physical indexes (real-time/plain) with identical fields set/order.
  • PR #598 bigint support for YEAR() and other timestamp functions.
  • Commit 8e85d4bc Adaptive rt_mem_limit. Previously Manticore Search was collecting exactly up to rt_mem_limit of data before saving a new disk chunk to disk, and while saving was still collecting up to 10% more (aka double-buffer) to minimize possible insert suspension. If that limit was also exhausted, adding new documents was blocked until the disk chunk was fully saved to disk. The new adaptive limit is built on the fact that we have auto-optimize now, so it's not a big deal if disk chunks do not fully respect rt_mem_limit and start flushing a disk chunk earlier. So, now we collect up to 50% of rt_mem_limit and save that as a disk chunk. Upon saving we look at the statistics (how much we've saved, how many new documents have arrived while saving) and recalculate the initial rate which will be used next time. For example, if we saved 90 million documents, and another 10 million docs arrived while saving, the rate is 90%, so we know that next time we can collect up to 90% of rt_mem_limit before starting flushing another disk chunk. The rate value is calculated automatically from 33.3% to 95%.
  • Issue #628 unpack_zlib for PostgreSQL source. Thank you, Dmitry Voronin for the contribution.
  • Commit 6d54cf2b indexer -v and --version. Previously you could still see indexer's version, but -v/--version were not supported.
  • Issue #662 infinite mlock limit by default when Manticore is started via systemd.
  • Commit 63c8cd05 spinlock -> op queue for coro rwlock.
  • Commit 41130ce3 environment variable MANTICORE_TRACK_RT_ERRORS useful for debugging RT segments corruption.
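The adaptive rt_mem_limit rate described in the list above can be sketched as follows. The formula is an assumption inferred from the single worked example in the note (90 million saved, 10 million arrived, rate 90%) and the stated 33.3% to 95% bounds; the function and constant names are hypothetical and this is not the actual implementation.

```python
MIN_RATE = 0.333      # lower bound, "33.3%" in the release note
MAX_RATE = 0.95       # upper bound from the release note
INITIAL_RATE = 0.5    # the first flush starts at 50% of rt_mem_limit


def next_flush_rate(saved_docs, arrived_while_saving):
    """Share of rt_mem_limit to fill before the next disk chunk flush."""
    total = saved_docs + arrived_while_saving
    if total == 0:
        return INITIAL_RATE  # no statistics yet, keep the initial rate
    rate = saved_docs / total
    return min(MAX_RATE, max(MIN_RATE, rate))


def flush_threshold_bytes(rt_mem_limit, rate):
    """Number of bytes to collect in RAM before starting a flush."""
    return int(rt_mem_limit * rate)


rate = next_flush_rate(90_000_000, 10_000_000)
print(rate, flush_threshold_bytes(128 * 1024 * 1024, rate))
```

The clamp keeps a slow save (many documents arriving meanwhile) from pushing the threshold below a third of the limit, and a fast one from pushing it above 95%.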

Breaking changes

  • Binlog version was increased, binlog from previous version won't be replayed, so make sure you stop Manticore Search cleanly during upgrade: no binlog files should be in /var/lib/manticore/binlog/ except binlog.meta after stopping the previous instance.
  • Commit 3f659f36 new column "chain" in show threads option format=all. It shows stack of some task info tickets, most useful for profiling needs, so if you are parsing show threads output be aware of the new column.
  • searchd.workers has been obsolete since 3.5.0 and is now deprecated. If you still have it in your configuration file, Manticore Search will still start, but it will show a warning.

Bugfixes


Manticore Search 4.0.2

21 Sep 09:42

Version 4.0.2, Sep 21st 2021

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Major new features

  • Full support of Manticore Columnar Library. Previously Manticore Columnar Library was supported only for plain indexes. Now it's supported:

    • in real-time indexes for INSERT, REPLACE, DELETE, OPTIMIZE
    • in replication
    • in ALTER
    • in indextool --check
  • Automatic index compaction (#478). Finally you don't have to call OPTIMIZE manually or via a cron task or other kind of automation. Manticore now does it on its own. You can set the default compaction threshold via optimize_cutoff.

  • Chunk snapshots and locks system revamp. These changes may be invisible from outside at first glance, but they improve the behaviour of many things happening in real-time indexes significantly. In a nutshell, previously most Manticore data manipulation operations relied on locks heavily, now we use disk chunk snapshots instead.

    • read operations (e.g. SELECTs, replication) are performed with snapshots
    • operations that just change internal index structure without modifying schema/documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks in the end
    • UPDATEs and DELETEs are performed against existing chunks, but for the case of merging that may be happening the writes are collected and are then applied against the new chunks
    • UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at the same time only one (merge or update) operation has access to attributes of the chunk.
    • when merging gets to the phase it needs attributes it sets a special flag. When UPDATE finishes it checks the flag and if it's set, the whole update is stored in a special collection. Finally when the merge finishes, it applies the updates set to the newborn disk chunk
    • ALTER runs via an exclusive lock
    • replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST
  • ALTER can add/remove a full-text field. Previously it could only add/remove an attribute.

  • 🔬 Experimental: pseudo sharding for full-scan queries, which lets you parallelize any non-full-text search query. Instead of preparing shards manually you can now just enable the new option searchd.pseudo_sharding and expect response time for non-full-text search queries to drop by a factor of up to the number of CPU cores. Note that it can easily occupy all existing CPU cores, so if you care not only about latency but about throughput too, use it with caution.
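The interplay between UPDATE and merging described in the snapshot bullets above (a flag set when the merge starts collecting attributes, updates stored in a collection, then applied to the newborn chunk) can be modelled with a toy sketch. All class and function names here are hypothetical, chunks are reduced to a dict of attribute values, and the shared/exclusive locks are left out, so this shows only the replay idea, not Manticore's internals.

```python
class Chunk:
    """A disk chunk reduced to a doc_id -> attribute value mapping."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)


class Merge:
    """Merges source chunks into a new chunk without losing later UPDATEs."""

    def __init__(self, sources):
        self.sources = sources
        self.collecting = False  # the "special flag" from the text
        self.pending = []        # updates that arrived after the flag was set
        self.snapshot = {}

    def start_collecting_attributes(self):
        self.collecting = True
        for chunk in self.sources:
            self.snapshot.update(chunk.attrs)

    def finish(self):
        merged = Chunk(self.snapshot)
        for doc_id, value in self.pending:  # replay the stored updates
            if doc_id in merged.attrs:
                merged.attrs[doc_id] = value
        return merged


def update(chunks, merge, doc_id, value):
    """Apply an UPDATE to existing chunks; store it if a merge is collecting."""
    for chunk in chunks:
        if doc_id in chunk.attrs:
            chunk.attrs[doc_id] = value
    if merge is not None and merge.collecting:
        merge.pending.append((doc_id, value))


old = [Chunk({1: 10}), Chunk({2: 20})]
merge = Merge(old)
merge.start_collecting_attributes()
update(old, merge, 1, 99)  # arrives while the merge is running
new_chunk = merge.finish()
print(new_chunk.attrs)
```

Without the replay step the new chunk would still hold the value captured in the snapshot, and the update would be lost once the old chunks are replaced.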

Minor changes

  • Linux Mint and Ubuntu Hirsute Hippo are supported via APT repository
  • faster update by id via HTTP in big indexes in some cases (depends on the ids distribution)

3.6.0

time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m43.783s
user    0m0.008s
sys     0m0.007s

4.0.2

time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m0.006s
user    0m0.004s
sys     0m0.001s
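The request body used in the curl commands above is newline-delimited JSON, one command per line. A minimal sketch of building such a body for several updates by id is shown below; the helper name `bulk_update_body` is hypothetical, and the payload shape is copied from the curl example rather than from a specification.

```python
import json


def bulk_update_body(index, updates):
    """Build an NDJSON body from (doc_id, fields) pairs, one update per line."""
    lines = [
        json.dumps({"update": {"index": index, "id": doc_id, "doc": fields}},
                   separators=(",", ":"))
        for doc_id, fields in updates
    ]
    return "\n".join(lines) + "\n"


body = bulk_update_body("idx", [(4611686018427387905, {"mode": 0})])
print(body, end="")
```

The body would be sent with Content-Type: application/x-ndjson to the /json/bulk endpoint, as in the timing example.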

Breaking changes

  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes
  • removed implicit sorting by id. Sort explicitly if required
  • charset_table's default value changes from 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451 to non_cjk
  • OPTIMIZE happens automatically. If you don't need it make sure to set auto_optimize=0 in section searchd in the configuration file
  • #616 ondisk_attrs_default were deprecated, now they are removed
  • for contributors: we now use Clang compiler for Linux builds as according to our tests it can build a faster Manticore Search and Manticore Columnar Library
  • if max_matches is not specified in a search query it gets updated implicitly with the lowest needed value for the sake of performance of the new columnar storage. It can affect metric total in SHOW META, but not total_found which is the actual number of found documents.

Migration from Manticore 3

  • make sure you stop Manticore 3 cleanly:
    • no binlog files should be in /var/lib/manticore/binlog/ (only binlog.meta should be in the directory)
    • otherwise the indexes for which Manticore 4 can't replay binlogs won't be run
  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes, so make sure you make a backup if you want to be able to rollback the new version easily
  • if you run a replication cluster make sure you:
    • stop all your nodes first cleanly
    • and then start the node which was stopped last with --new-cluster (run tool manticore_new_cluster in Linux).
    • read about restarting a cluster for more details
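The clean-stop condition from the migration steps above (only binlog.meta left in the binlog directory) can be checked before upgrading. This is a minimal sketch: the helper name `leftover_binlogs` and the file name `binlog.001` used in the demonstration are hypothetical, and the demonstration runs against a temporary directory rather than /var/lib/manticore/binlog/.

```python
import os
import tempfile


def leftover_binlogs(binlog_dir):
    """Return the files other than binlog.meta that remain in binlog_dir."""
    return sorted(
        name for name in os.listdir(binlog_dir)
        if name != "binlog.meta"
    )


# Demonstration on a temporary directory with one hypothetical leftover file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("binlog.meta", "binlog.001"):
        open(os.path.join(demo_dir, name), "w").close()
    leftovers = leftover_binlogs(demo_dir)
    print(leftovers)
```

An empty result for the real binlog directory suggests the previous instance was stopped cleanly; any other file means the upgrade should wait.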

Bugfixes

  • Lots of replication issues have been fixed:
    • 696f8649 - fixed crash during SST on joiner with active index; added sha1 verify at joiner node at writing file chunks to speed up index loading; added rotation of changed index files at joiner node on index load; added removal of index files at joiner node when active index gets replaced by a new index from donor node; added replication log points at donor node for sending files and chunks
    • b296c55a - crash on JOIN CLUSTER in case the address is incorrect
    • 418bf880 - during initial replication of a large index the joining node could fail with ERROR 1064 (42000): invalid GTID, (null); the donor could become unresponsive while another node was joining
    • 6fd350d2 - hash could be calculated wrong for a big index which could result in replication failure
    • #615 - replication failed on cluster restart
  • #574 - indextool --help doesn't display parameter --rotate
  • #578 - searchd high CPU usage while idle after ca. a day
  • #587 - flush .meta immediately
  • #617 - manticore.json gets emptied
  • #618 - searchd --stopwait fails under root. It also fixes systemctl behaviour (previously it was showing failure for ExecStop and didn't wait long enough for searchd to stop properly)
  • #619 - INSERT/REPLACE/DELETE vs SHOW STATUS. command_insert, command_replace and others were showing wrong metrics
  • #620 - charset_table for a plain index had a wrong default value
  • [8f75368](https://github.com/manti...

Manticore Search 3.6.0

04 May 06:50

Version 3.6.0, May 3rd 2021

Maintenance release before Manticore 4

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Major new features

  • Support for Manticore Columnar Library for plain indexes. New setting columnar_attrs for plain indexes
  • Support for Ukrainian Lemmatizer
  • Fully revised histograms. When building an index Manticore also builds histograms for each field in it, which it then uses for faster filtering. In 3.6.0 the algorithm was fully revised and you can get a higher performance if you have a lot of data and do a lot of filtering.

Minor changes

Optimizations

  • faster JSON parsing, our tests show 3-4% lower latency on queries like WHERE json.a = 1
  • non-documented command DEBUG SPLIT as a prerequisite for automatic sharding/rebalancing

Bugfixes

  • #584 - inaccurate and unstable FACET results
  • #506 - Strange behavior when using MATCH: those who suffer from this issue need to rebuild the index as the problem was on the phase of building an index
  • #387 - intermittent core dump when running query with SNIPPET() function
  • Stack optimizations useful for processing complex queries:
    • #469 - SELECT results in CRASH DUMP
    • e8420cc7 - stack size detection for filter trees
  • #461 - Update using the IN condition does not take effect correctly
  • #464 - SHOW STATUS immediately after CALL PQ returns
  • #481 - Fixed static binary build
  • #502 - bug in multi-queries
  • #514 - Unable to use unusual names for columns when use 'create table'
  • d1dbe771 - daemon crash on replay binlog with update of string attribute; set binlog version to 10
  • 775d0555 - fixed expression stack frame detection runtime (test 207)
  • 4795dc49 - percolate index filter and tags were empty for empty stored query (test 369)
  • c3f0bf4d - breaks of replication SST flow at network with long latency and high error rate (different data centers replication); updated replication command version to 1.03
  • ba2d6619 - joiner lock cluster on write operations after join into cluster (test 385)
  • de4dcb9f - wildcards matching with exact modifier (test 321)
  • 6524fc6a - docid checkpoints vs docstore
  • f4ab83c2 - Inconsistent indexer behavior when parsing invalid xml
  • 7b727e22 - Stored percolate query with NOTNEAR runs forever (test 349)
  • 812dab74 - wrong weight for phrase starting with wildcard
  • 1771afc6 - percolate query with wildcards generate terms without payload on matching causes interleaved hits and breaks matching (test 417)
  • aa0d8c2b - fixed calculation of 'total' in case of parallelized query
  • 18d81b3c - crash in Windows with multiple concurrent sessions at daemon
  • 84432f23 - some index settings could not be replicated
  • 93411fe6 - At a high rate of adding new events the netloop sometimes froze because the atomic 'kick' event was processed once for several events at a time, losing the actual actions from them
    status of the query, not the server status
  • d805fc12 - New flushed disk chunk might be lost on commit
  • 63cbf008 - inaccurate 'net_read' in profiler
  • f5379bb2 - Percolate issue with arabic (right to left texts)
  • 49eeb420 - id not picked correctly on duplicate column name
  • refactoring of network events to fix a crash in rare cases
  • e8420cc7 fix in indextool --dumpheader
  • ff716353 - TRUNCATE WITH RECONFIGURE worked wrong with stored fields

Breaking changes:

  • New binlog format: you need to make a clean stop of Manticore before upgrading
  • Index format has changed slightly: the new version can read your existing indexes fine, but if you decide to downgrade from 3.6.0 to an older version the newer indexes will be unreadable
  • Replication format change: don't replicate from an older version to 3.6.0 and vice versa, switch to the new version on all your nodes at once
  • reverse_scan is deprecated. Make sure you don't use this option in your queries as of 3.6.0, since they will fail otherwise
  • As of this release we don't provide builds for RHEL6, Debian Jessie and Ubuntu Trusty any more. If it's mission critical for you to have them supported contact us

Deprecations

  • No more implicit sorting by id. If you rely on it make sure to update your queries accordingly
  • Search option reverse_scan has been deprecated

Manticore Search 3.5.4

11 Dec 20:18

Manticore Search 3.5.2

01 Oct 13:27

Manticore Search 3.5.0

22 Jul 20:17

Manticore Search 3.4.2

10 Apr 08:59
Release 3.4.0

Manticore Search 3.4.0

26 Mar 09:44
Release 3.4.0

Release 3.3.0

04 Feb 13:20
(cherry picked from commit 3885c2976171f3390a7cace2c062997c20f42279)