indexer now supports DBMode=blob, which is the
fastest DBMode for both indexing and searching.
libmnogosearch.so can now be installed as a MySQL
fulltext parser plugin. See the "MySQL fulltext parser plugin" manual
section for details.
It's now possible to use variables in an external
parser command line. This example passes URL and TAG values in the
parser command line:
Mime "text/pdf" "text/plain" "/path/to/parser -u ${URL} -t ${TAG}"
The full list of available variables is shown in the
"indexer -v6" output, in the lines beginning with the "Response." prefix.
An optional fourth parameter was added to the Mime
command. It passes extra information to an external parser together
with the document content. For example:
Mime mytype "text/plain" "cat" "${URL} # ${HTTP.Content}"
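The placeholder expansion used in the Mime command lines above can be illustrated with a minimal Python sketch. It only shows how ${URL} and ${TAG} in the first example might be replaced with per-document values. The values are hypothetical, and this is not how indexer itself performs the substitution.

```python
from string import Template

# Hypothetical per-document values; the real ones come from indexer.
variables = {"URL": "http://example.com/doc.pdf", "TAG": "docs"}

# The parser command line from the Mime example, used as a template.
command_template = Template("/path/to/parser -u ${URL} -t ${TAG}")

# Replace each ${NAME} placeholder with the matching value.
command = command_template.substitute(variables)
print(command)
```

Names containing a dot, such as ${HTTP.Content}, are not valid placeholders for Python's default Template class, so this sketch covers only the simple names.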
"SQLWordForms sql" search.htm command was added. It
introduces a new fuzzy search method that loads synonyms or word
forms from an SQL database. It can be used as a faster replacement for
the Synonym and Ispell fuzzy search methods.
"indexer -Edumpspell" command was added to dump spell
data in a format suitable for loading into an SQL database for further use
with "SQLWordForms".
A new optional "when" parameter was added to the
"Section" indexer.conf command. It supports three values,
"afterheaders", "afterguesser" and "afterparser", and makes it possible
to create user-defined sections at different stages of document
processing. For example, this allows replacing HTTP headers sent by a
remote server.
"Limit" indexer.conf command and "fl" search parameter
were added, introducing fast limits support, which speeds up searches
restricted to a part of the database, especially with DBMode=blob.
A new command "ReplaceVar name value" was added.
Synonym files now understand the "Mode: reverse" and
"Mode: oneway" commands, which switch word expansion behaviour between
"all words expand to all words on the same line" and "only the leftmost
word expands to other words on the same line".
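The two expansion behaviours described above can be sketched in Python as follows. The function name and the mode labels "all_to_all" and "leftmost_only" are hypothetical names chosen for illustration. They are not mnoGoSearch keywords, and the sketch does not read real synonym files.

```python
def expand(word, line, mode):
    """Return the words that `word` expands to for one synonym line."""
    words = line.split()
    if word not in words:
        return []
    if mode == "all_to_all":
        # Every word on the line expands to all the other words.
        return [w for w in words if w != word]
    if mode == "leftmost_only":
        # Only the first word on the line expands to the rest.
        return words[1:] if word == words[0] else []
    raise ValueError(f"unknown mode: {mode}")


line = "car auto automobile"
print(expand("auto", line, "all_to_all"))
print(expand("auto", line, "leftmost_only"))
```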
"NumWordFactor num" search.htm command was added, where
num is between 0 and 255. It specifies how much the number of found
words in a document affects its final score. 255 means maximum effect,
0 means the count of found words is ignored.
"MinCoordFactor num" search.htm command was added. Use
this command to give a higher score to documents in which the first
found word is closer to the beginning of the document. The value is a
number between 0 and 255. The default value is 0, which means no effect.
"URLDataThreshold num" search.htm command was added. It
improves search performance with DBMode=blob for queries
returning a small number of results (no more than several hundred).
If a search returns fewer than "num" documents, full URL information is
not loaded from the "bdict" table and the "url" table is used instead.
The default value is 0, which means URL data is always read from the
"bdict" table. Find a value that works well for your installation
experimentally.
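The threshold rule described above amounts to a single comparison. The following Python sketch uses a hypothetical function name and models only the decision, not the actual table access.

```python
def url_data_source(result_count, threshold=0):
    """Pick the table to read URL data from, per the rule described above."""
    # Fewer than `threshold` results: use the "url" table.
    # Otherwise (including the default threshold of 0): use "bdict".
    return "url" if result_count < threshold else "bdict"


print(url_data_source(50, threshold=300))
print(url_data_source(5000, threshold=300))
print(url_data_source(50))
```

With the default of 0 no result count is below the threshold, so the sketch always selects "bdict", matching the documented default.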
"UseNumericOperators yes/no" search.htm command was
added. When set to "yes", the "<" and ">" signs are treated as
numeric comparison operators, e.g. "<100" finds all documents which
have numbers less than 100 in their body or title or other sections
according to the "wf" settings. The default value is "no", meaning that
numeric operators are ignored.
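As an illustration of the comparison semantics described above, here is a small Python sketch. It assumes integer values and a simple digit-run tokenizer. How mnoGoSearch actually extracts numbers from sections is not specified here, and the function name is hypothetical.

```python
import re


def matches_numeric(query, text):
    """Check whether `text` contains a number satisfying a "<N" or ">N" query."""
    op, bound = query[0], int(query[1:])
    # Assumption: a "number" is any run of decimal digits in the text.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if op == "<":
        return any(n < bound for n in numbers)
    if op == ">":
        return any(n > bound for n in numbers)
    raise ValueError(f"unsupported operator: {op}")


print(matches_numeric("<100", "price 42 usd"))
print(matches_numeric("<100", "price 250 usd"))
```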
New character set name aliases were added: "armscii8",
"koi8r", "koi8u" and "ujis", for compatibility with MySQL character set
names.
Fixed that the XML character set declaration was not
processed, e.g.: <?xml version="1.0" encoding="utf-8"?>
Fixed that query tracking didn't work with Oracle, DB2,
Firebird, Mimer and Sybase (Bug#742).
Fixed that "crossdict" table wasn't created for Oracle,
DB2, Mimer and Interbase/Firebird (Bug#748).
Fixed that $(PerSite) value was calculated incorrectly
with several DBAddr search.htm commands.
Fixed that template operators inside an HTML comment were
interpreted instead of being printed as part of the comment (Bug#708,
part 2).
Fixed that <!EREG> didn't work with "<" and
">" characters inside the REPLACE attribute (Bug#1010).
Fixed that <META NAME="ROBOTS" CONTENT="NOINDEX">
didn't prevent indexing of the url.file, url.path, url.site, url.proto
sections (Bug#679).
indexer now chooses the character set value in this order:
the "Content-Type" HTTP header, the "Content-Type" META tag, then the
RemoteCharset value from indexer.conf. Previously RemoteCharset was
incorrectly given the highest priority (Bug#575).
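The priority order described above can be sketched as a first-match lookup. This Python example uses hypothetical parameter names and treats a missing source as None. It illustrates the ordering only and is not indexer's actual code.

```python
def choose_charset(http_header=None, meta_tag=None, remote_charset=None):
    """Return the first available charset, in the priority order described above."""
    # Order: Content-Type HTTP header, Content-Type META tag, RemoteCharset.
    for candidate in (http_header, meta_tag, remote_charset):
        if candidate:
            return candidate
    return None


print(choose_charset(http_header="utf-8", remote_charset="koi8-r"))
print(choose_charset(meta_tag="windows-1251", remote_charset="koi8-r"))
print(choose_charset(remote_charset="koi8-r"))
```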
Fixed that "Sun, 6 Nov 1994 08:49:37 GMT" date format
was not recognized when indexing a NEWS server (Bug#694).
Syntax error in PostgreSQL trigger was fixed (Bug#784).
Build error on IRIX using the native CC compiler was fixed
(Bug#778).
Bug#760 "Empty title and body in search.cgi - Can't get
BLOB from oracle" was fixed.
Fixed that "mconv" incorrectly exited with "An output
error" message in some cases.
Fixed that search.cgi could crash when running with
DBMode=blob in some cases. Thanks to Goga for proposing the fix.
Fixed that the "regexp" keyword didn't work as an alias
for "regex" in some indexer.conf commands.