Why is a search using the File Content field returning different results than expected?

Searching for Documents using the File Content field performs a search of the file contents and properties of several kinds of files, including Microsoft Office, PDF, HTML, and zip.

The results from a full-text search depend on how the underlying Microsoft SQL Server database parses the “words” in a file and performs pattern matching.  Specifically, search results are driven by the following behaviors:

  • SQL Server treats the hyphen (“-”) as a word breaker.

    • Example: It doesn’t treat “DCR-08474” as a single string, but rather as two separate words: “DCR” and “08474”.

  • SQL Server ignores quotes.

  • SQL Server ignores single letters and numbers as well as “noise” words like “the”, “an”, and “and”.

  • SQL Server matches each search word to the beginning of an indexed word.

    • Example: Searching for “part” will match “part”, “parts” and “particular”, but will not match “apart” or “apartment”.

  • SQL Server matches a search phrase with multiple words to an indexed phrase with matching words in the same order uninterrupted.

    • Example: Searching for “qual manu” will match “quality manual” but not “quality process manual”.

  • SQL Server ignores leading zeroes when matching a number.

    • Example: Searching for “000847” will match “08474” and “847”.
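
The behaviors listed above can be approximated with a short Python sketch. This is only a toy model of the rules as described in this article, not SQL Server's actual word breaker or query engine. The names `NOISE_WORDS`, `tokenize`, and `matches` are hypothetical, the noise-word list is a small made-up sample, and the sketch treats every search as a single phrase.

```python
import re

# Hypothetical sample; the real noise-word list is maintained by SQL Server.
NOISE_WORDS = {"the", "an", "and"}

def tokenize(text):
    """Split text into normalized words, following the rules listed above."""
    words = []
    # Hyphens, quotes, spaces, and other punctuation all act as word breakers.
    for raw in re.split(r"[^0-9A-Za-z]+", text.lower()):
        if not raw:
            continue
        if raw.isdigit():
            raw = raw.lstrip("0") or "0"  # leading zeroes are ignored
        if len(raw) == 1 or raw in NOISE_WORDS:
            continue  # single characters and noise words are dropped
        words.append(raw)
    return words

def matches(query, content):
    """True if the query words prefix-match consecutive words in the content."""
    q = tokenize(query)
    c = tokenize(content)
    if not q:
        return False
    for start in range(len(c) - len(q) + 1):
        # Each query word must match the beginning of the word at that position.
        if all(c[start + i].startswith(word) for i, word in enumerate(q)):
            return True
    return False

print(tokenize("DCR-08474"))                            # ['dcr', '8474']
print(matches("part", "particular"))                    # True
print(matches("part", "apartment"))                     # False
print(matches("qual manu", "quality process manual"))   # False
print(matches("000847", "08474"))                       # True
```

Under this model, a quoted search for “DCR-08474” behaves the same as a search for the two words “DCR” and “8474” in sequence, which is why it can return documents that never contain the exact string.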

The “Additional Information about Full-Text Searches” section of the Document Control online help shows several examples of what does and does not match various File Content search criteria.

This is one of the tradeoffs of SQL Server’s full-text capabilities. They are powerful for quickly finding possible matches, but they are designed to err on the side of “false positives” rather than risk leaving out possibilities. The assumption is that a person can check an unexpected result that the search returns, but has no way to know what the search left out.

Because of that, Grand Avenue encourages customers to use the File Content search as a convenience, not as a rigorous part of a process, and to rely instead on relationships, naming conventions, and custom fields to organize data.

Copyright © 2022, Grand Avenue Software, Inc. All rights reserved.