Content search (Grep)

VTGrep supports several of the content-related features in YARA. Some example valid content search queries are:

Strings within double quotes are handled as UTF-8 strings escaped as Python string literals (e.g., "\n" is a newline, "\"" is a double quote, "\\" is a single backslash, …). Because of this escaping, Windows paths are particularly tricky, so either escape the backslashes accordingly or write the query in hexadecimal. As an example, the path C:\Windows\System32 can be searched as either content:"C:\\Windows\\System32" or content:{43 3a 5c 57 69 6e 64 6f 77 73 5c 53 79 73 74 65 6d 33 32}.
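The two equivalent forms of a query can be produced mechanically. The following is a minimal Python sketch, not part of any VirusTotal tooling: the helper names to_quoted_query and to_hex_query are hypothetical, and the sketch assumes that only backslashes and double quotes need escaping in the quoted form and that the text is encoded as UTF-8, as described above.

```python
def to_quoted_query(text: str) -> str:
    """Build a content:"..." query, escaping backslashes and double quotes."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'content:"{escaped}"'


def to_hex_query(text: str) -> str:
    """Build a content:{...} query from the UTF-8 bytes of the text."""
    hex_bytes = " ".join(f"{b:02x}" for b in text.encode("utf-8"))
    return "content:{" + hex_bytes + "}"


path = r"C:\Windows\System32"
print(to_quoted_query(path))  # content:"C:\\Windows\\System32"
print(to_hex_query(path))     # content:{43 3a 5c 57 69 6e 64 6f 77 73 5c 53 79 73 74 65 6d 33 32}
```

The hexadecimal form sidesteps escaping entirely, which makes it the safer choice for strings that contain many backslashes or non-printable bytes.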

You can see a snippet of the file being matched if you hover over the eye icon on the left-hand side.

VTGrep Matches

Furthermore, VTGrep not only matches the raw content of the file but also searches decompressed and unpacked subfiles, plus VBA code streams.

Matches in subfiles are still reported with the hash of their respective parent files, so you'll need to extract the subfiles manually if you want to locate the match (other than seeing it in the Match context hover-over box).

The following caveats apply to content search queries:

  • Results cannot be sorted.
  • Files submitted in the last 24 to 48h are not yet indexed. Consider using Hunting if you need the freshest results.
  • Content searches can only be combined with the following search modifiers:
    • size
    • type
    • fs
    • ls
    • imphash
    • positives
    • tag
    • submissions
  • content and other search modifiers cannot be combined with an OR operator. However, combining the other modifiers with each other using OR is fine. See examples below.
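The combination rules above can be sketched as a small query builder. This is an illustrative Python sketch under stated assumptions: the function build_content_query is hypothetical, the explicit AND between terms and the parenthesized OR groups are assumed query syntax, and the modifier values in the usage example (peexe, pedll, 5+) are placeholders rather than values taken from this section.

```python
# Modifiers that the list above says may accompany a content search.
ALLOWED_WITH_CONTENT = {
    "size", "type", "fs", "ls", "imphash", "positives", "tag", "submissions",
}


def build_content_query(content_term: str, *groups: list[tuple[str, str]]) -> str:
    """Combine one content term with groups of modifiers.

    Modifiers inside a group are joined with OR; the content term and the
    groups are joined with AND, so content itself never takes part in an OR.
    """
    parts = [content_term]
    for group in groups:
        terms = []
        for name, value in group:
            if name not in ALLOWED_WITH_CONTENT:
                raise ValueError(f"modifier {name!r} cannot be combined with content")
            terms.append(f"{name}:{value}")
        if len(terms) == 1:
            parts.append(terms[0])
        elif terms:
            parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


query = build_content_query(
    'content:"foo"',
    [("type", "peexe"), ("type", "pedll")],  # OR between modifiers is allowed
    [("positives", "5+")],
)
print(query)  # content:"foo" AND (type:peexe OR type:pedll) AND positives:5+
```

Keeping the content term outside every OR group is the design choice that mirrors the last caveat: the builder has no way to express content OR modifier.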

VTGrep leverages rare substrings to quickly narrow down content searches and find matches among petabytes of data. Conversely, extremely common substrings are impractical to index. The content query is cut into substrings in a way that tries to avoid such extremely common substrings, but this is sometimes not possible (e.g., at the extremes of the query or when the extremely common substring is too long) and may affect the results. Content searches with unavoidable extremely common substrings return either a) partial matches (where a small percentage of the characters may not match the query) or b) empty results. A warning will be shown in either case:

  1. No more results found due to unselective query.
  2. No more results found due to unselective query. Try avoiding extremely common substrings: "http", "www."

In case 1, VTGrep couldn't zero in on any rare query substring and timed out. This can happen even if there are no matches for the whole query. A simple retry or extending the query at the extremes may work, but it's normally better just to search for something rarer.

Fixing case 2 is simpler because you can often rewrite the query to avoid the given extremely common substrings (like "http" or "www." above). Splitting the query or adding content on either side of the extremely common substrings is often enough for VTGrep to be able to avoid the popular n-grams.

Examples:

With extremely common substrings (bad) | Avoids extremely common substrings (better)
content:"google.com"                   | content:"photos.google.co"
content:{00 00 00 00}                  | content:{CAFE 00 00 00 00 CAFE}
content:{CAFE 00 00 00 00 00 CAFE}     | content:{CAFE 00 00 ?? 00 00 CAFE}
content:"http://www.rare.com"          | content:"ww.rare.c"
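
A query can be pre-checked for extremely common substrings before it is submitted. The sketch below is a rough Python illustration, not VTGrep's actual logic: COMMON_SUBSTRINGS is a hypothetical list in which only "http" and "www." come from the warning text above, the zero-byte run is an assumption inferred from the examples, and the real set of common substrings is not given in this section.

```python
# Hypothetical deny-list; the real set used by VTGrep is not documented here.
COMMON_SUBSTRINGS = [b"http", b"www.", b"\x00\x00\x00\x00"]


def common_substrings_in(query: bytes) -> list[bytes]:
    """Return the deny-listed substrings that occur in the query bytes."""
    return [s for s in COMMON_SUBSTRINGS if s in query]


def is_fully_common(query: bytes) -> bool:
    """True when every byte of the query lies inside a deny-listed substring."""
    covered = [False] * len(query)
    for s in COMMON_SUBSTRINGS:
        start = query.find(s)
        while start != -1:
            for i in range(start, start + len(s)):
                covered[i] = True
            # Step by one byte so overlapping occurrences are also marked.
            start = query.find(s, start + 1)
    return bool(query) and all(covered)


print(common_substrings_in(b"http://www.rare.com"))  # [b'http', b'www.']
print(common_substrings_in(b"ww.rare.c"))            # []
print(is_fully_common(bytes(4)))                     # True
print(is_fully_common(b"\xca\xfe" + bytes(4) + b"\xca\xfe"))  # False
```

A query for which is_fully_common returns True leaves nothing rare to anchor on, which corresponds to the "bad" column above; adding rarer bytes on either side or trimming the common prefix moves it toward the "better" column.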