MBot Posted September 9, 2024 at 01:48 PM prevent re-compiling due to file changes caused by build.rs build.rs updates src/modules/add_modules.rs and src/modules/modules.rs every time it runs, but at the same time cargo:rerun-if-changed=src/modules tells cargo to compile the crate again if a change occurs in src/modules. This means that the build.rs script is invalidating the compiler's cache all the time. sort modules by name in auto-generated files. If modules are not explicitly sorted they will vary from one platform to the other. upgrade dependencies reduce the number of dependencies by disabling unused features from wasmtime With this change all default wasmtime features are disabled, and the ones that are actually used are explicitly enabled. This required changes in the implementation of shift operations, which were using the WASM select instruction. Using this instruction requires enabling references and garbage collection, which is overkill. re-organize imports fix typos make CompileError::join_with_or visible only within the crate. fix clippy warnings remove trailing spaces fix incorrect comment fix clippy warning due to WasmExportedFn4 not used. This type is necessary if we ever declare a WASM function with 4 arguments, but so far no fix clippy warnings and typos remove unused vars field in EmitContext change error numbers upgrade wasmtime and walrus Also bump the MSRV to 1.76.0 because wasmtime 22.0.0 requires it. remove unused function. rename yara-x-parser-ng to yara-x-parser assign code "E001" to syntax errors add newline after warnings in test cases upgrade dependencies fix comment add assertion show warning when build.rs cannot re-generate src/modules/modules.rs Instead of panicking when build.rs is unable to re-generate the file, show a warning and return without error. fix typos use #![deny(missing_docs)] to force documentation of public APIs.
fix minor issues in comments exclude test files from being published to crates.io The limit of 10MB imposed by crates.io doesn't allow to upload the test cases. remove yara-x dependency from yara-x-proto. remove redundant function check_type2 add some metadata to Cargo.toml upgrade wasmtime crate to version 19.0.0 remove benchmarks remove unused import add fmt="t" modifier to timestamp and export_timestamp fields upgrade regex-syntax and regex-automata crates. This solves an issue that prevented using unicode characters in regular expressions. See: rust-lang/regex@04f5d7b upgrade wasmtime, wasmprinter, yara and base64 split regexp_patterns_3 in two functions. fix typos remove unused method fix typos remove trailing space fix typos remove println statement rename macho module functions from x_present to has_x. Also implemented build_version_command in a more idiomatic way. improve comment fix minor error in feature description remove some debug messages. fix typos add derive feature to serde dependency for build script fix compilation upgrade goldenfile and chrono crates individual descriptions for each crate rename top-level directories. In the future we want to add a go directory that contains the Go bindings for YARA-X. Having a yara-x-go directory is not ideal, because the import path for that Go module would be github.com/VirusTotal/yara-x/yara-x-go. Instead we want the more natural github.com/VirusTotal/yara-x/go. However, adding a go directory makes the current naming incoherent, but with this change everything fits well again. Documentation some improvements to the documentation fix typo. fix typo. add comment about multiple nested Authenticode signatures add example in yara_x::mods New Features implement cuckoo module. The cuckoo module parses behaviour reports from the Cuckoo Sandbox https://cuckoosandbox.org/ The use of this module is currently discouraged. 
It is here for backward compatibility with YARA, but it won't be actively maintained or improved as the Cuckoo Sandbox seems to be abandoned since 2017. add support for the --module-data option Adds support for the --module-data option. This required some changes in the API, particularly the introduction of two new methods to Scanner. add feature for compiling .proto files with protoc By default, .proto files are compiled using the pure-Rust compiler implemented by the protobuf-rust, but the protoc feature allows switching to protoc, the official Protobuf compiler. This required some changes in .proto files, because protoc is stricter than the pure-Rust protobuf compiler. raise warning when patterns are very common byte repetitions When a pattern is a repetition of bytes where all bytes are 0x00, 0x90 or 0xff, raise a warning. Such patterns are very common, and they may slow down the scan. add warning for potentially slow loops When a for loop iterates over a range that goes from 0 (or some other constant) to filesize (or some other expression that uses filesize) it may be very slow for large files. A loop that iterates over every byte in the file is not a very good idea. add a new parallel-compilation feature that enables/disables WASM parallel compilation This feature is enabled by default, because it greatly reduces compilation time when compiling a large number of rules. However, the new feature allows disabling parallel compilation if needed. implement ExactSizeIterator trait for ModuleOutputs. add modules subcommand to display modules available. Add the ability to print available modules via yr debug modules.
I experimented with making it take a verbose flag that would also print the protobuf definition of the module, which would make it easier for users to find out what is "available" to use from the module in their rules, but it was a lot to print out and would require users to translate from the protobuf types into YARA types, which is probably more than they want to do. Add debug-cmd feature and put debug command behind it. Add a debug-cmd feature that enables the debug command. Prior to this the command was always available but just hidden. This makes it a compile-time feature that can be useful if you're into doing developer type things like debugging the syntax trees. implement DFS traversal of the IR tree. Use the DFS traversal for implementing the Debug trait for IR and add test cases. better representation of flags in YAML output. When representing module outputs in YAML format, fields that are flags now have a comment indicating which of the flags are set. For instance, pe.characteristics was previously represented as: characteristics: 8226 And now is represented as: characteristics: 0x2022 # EXECUTABLE_IMAGE | LARGE_ADDRESS_AWARE | DLL implement tag printing and filtering This commit adds the -g and -t arguments to the scan command. The -g argument prints the tags associated with the matching rules, while the -t argument allows filtering the results by tag name. accept comparison between boolean expression and integer constant Expressions like pe.is_signed == 1 are now accepted. This kind of expression is valid in legacy YARA, and very common. So we are now accepting them, but raising a warning. improve error messages for unclosed comments, literal strings and regexps raise warning when a rule is always true or false. add error/warning codes and allow disabling warnings by code. This is the detailed list of changes: Assign a code to each error and warning Add a Compiler::switch_warning API that allows enabling/disabling specific warnings.
Add a Compiler::switch_all_warnings API that allows enabling/disabling all warnings. Add a command-line option --disable-warning for disabling specific warnings, or all warnings. Move warnings from parser to compiler. implement mach-o export trie parsing and export hashing function Implements mach-o trie export parsing as well as the export_hash() function mentioned previously in #93. implement API for treating slow patterns as errors instead of warnings Slow patterns are those that have atoms with fewer than 2 bytes. Such patterns can slow down scanning and the compiler raises a warning when it finds one of those patterns. The new APIs allow telling the compiler to raise an error instead of a warning. implement the Metadata.into_json API This function returns the metadata associated with a rule as a serde_json::Value. implement dylib and entitlement hashing for macho Implemented the Mach-O similarity functions dylib_hash() and entitlement_hash(), which are similar to imphash or any other attribute hashing mechanism. This will hash dylib entries as defined in: https://github.com/g-les/macho_similarity/blob/main/implementation.md#dylib-hashing. expose rule metadata in the Rust API implement array entitlement parsing for complete entitlement parsing add environment variable that controls whether build.rs re-generates modules.rs and add_modules.rs. Now the YRX_REGENERATE_MODULES_RS environment variable can be used for disabling the re-generation of modules.rs and add_modules.rs. If the environment variable is present, and its value is either "false", "no", or "0", the files won't be re-generated. In any other case they will be re-generated.
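As a rough illustration of the YRX_REGENERATE_MODULES_RS rule described above, the decision could be sketched as follows. This is not the actual build.rs code: the function names are hypothetical, and the exact, case-sensitive comparison is an assumption, since the changelog does not say whether values such as "FALSE" are also recognized.

```rust
use std::env;

/// Hypothetical helper: decides whether the generated files should be rebuilt,
/// given the raw value of the variable (`None` when the variable is unset).
/// Assumption: the comparison is exact and case-sensitive.
fn should_regenerate(value: Option<&str>) -> bool {
    // Only these three values disable re-generation; anything else,
    // including an unset variable, keeps it enabled.
    !matches!(value, Some("false") | Some("no") | Some("0"))
}

/// Hypothetical wrapper that reads the variable from the process environment.
fn should_regenerate_from_env() -> bool {
    let value = env::var("YRX_REGENERATE_MODULES_RS").ok();
    should_regenerate(value.as_deref())
}
```

Keeping the decision in a function that takes the value as an argument makes the rule easy to check without touching the real environment.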
the relaxed_re_syntax option now handles invalid escape sequences inside character classes generalize the relaxed_escape_sequences option to relaxed_re_syntax With this change, the new option not only controls whether invalid escape sequences should be accepted, it also relaxes other cases in which YARA-X is stricter with regular expressions than YARA. This is the case with characters that have a special meaning in a regular expression, but that YARA treats as literal if they appear in a context where this special meaning doesn't make sense. For instance, YARA interprets the curly braces in /foo{}bar/ as literal characters, while in /foo{1,2}bar/ they are interpreted as part of the repetition operator {1,2}. YARA-X doesn't accept /foo{}bar/ as a valid regular expression, unless you enable the relaxed_re_syntax option. add warnings() method to Compiler. This new method exposes the warnings issued by the compiler. implement math.to_number Required some changes in the grammar, which wasn't accepting boolean expressions as function arguments. implement an option for allowing invalid escape sequences in regular expressions Historically, YARA has accepted any character that is preceded by a backslash in a regular expression, even if the sequence is not a valid one. For instance, \n, \t and \w are valid escape sequences in a regexp, but \N, \T and \j are not. However, YARA accepts all of these sequences. The valid escape sequences are interpreted according to their special meaning (\n is a new-line, \w is a word character, etc.), while invalid escape sequences are interpreted simply as the character that appears after the backslash. So, \N becomes N, and \j becomes j. This change introduces the Compiler::relaxed_regexp_escape_sequences API, which allows turning on an option that makes YARA-X behave in the same way as YARA with respect to invalid escape sequences in regular expressions. This option is turned off by default.
Also, the option --relaxed-escape-sequences is added to the CLI. implement constant folding for subtraction and multiplication add support for md2 hashes in Authenticode signatures and certificates implement Authenticode parsing and verification without relying on OpenSSL Until now we were using the authenticode-parser crate for Authenticode parsing and verification. This is simply a Rust wrapper around https://github.com/avast/authenticode-parser which is written in C and uses OpenSSL under the hood. Depending on OpenSSL makes building and deploying YARA-X harder, especially when you want to integrate YARA-X in other systems. With this change all the Authenticode parsing and validation is re-written in Rust. add note for clarifying regexp error add explicative notes for some regexp errors interpret the sequences \< and \> in regexp as literals instead of word boundaries The regex_syntax crate interprets \< and \> as start of word and end of word boundaries respectively. However, in YARA this has always been interpreted as the escaped form of < and > literals. In order to keep backward compatibility, this commit introduces the Transformer type, which receives the AST produced by regex_syntax and replaces the word boundary tokens with literal ones. issue warning when a rule is ignored because it indirectly depends on some unsupported module implement modulerefs in dotnet module. implement start-of-word and end-of-word matches in regular expressions. With this change \b{start} and \< in a regex are zero-width matches that match only at the start of a word, while \b{end} and \> match only at the end of a word. implement the Compiler::add_unsupported_module API. This API allows telling the compiler that some YARA module should be ignored. Any import statement for an unsupported module is ignored without errors (only a warning is issued) and any rule that uses the module is also ignored, while maintaining rules that don't depend on it.
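To make the "directly or indirectly depends on an unsupported module" behaviour concrete, here is a minimal sketch of how the set of ignored rules could be computed. It does not use the YARA-X API: the Rule struct and the ignored_rules function are hypothetical, and the real compiler works on its own internal representation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a rule: the modules it uses and the
/// other rules it references in its condition.
struct Rule {
    name: &'static str,
    modules: Vec<&'static str>,
    rules: Vec<&'static str>,
}

/// Returns the names of the rules that must be ignored because they depend,
/// at any level of indirection, on an unsupported module.
fn ignored_rules(rules: &[Rule], unsupported: &HashSet<&str>) -> HashSet<String> {
    let mut ignored: HashSet<String> = HashSet::new();
    // Repeat until no new rule is added, so that chains like
    // rule_3 -> rule_2 -> rule_1 -> module are followed completely.
    loop {
        let mut changed = false;
        for rule in rules {
            if ignored.contains(rule.name) {
                continue;
            }
            let direct = rule.modules.iter().any(|m| unsupported.contains(*m));
            let indirect = rule.rules.iter().any(|r| ignored.contains(*r));
            if direct || indirect {
                ignored.insert(rule.name.to_string());
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    ignored
}
```

The fixed-point loop is chosen for brevity; it handles rules declared in any order, at the cost of scanning the rule list more than once.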
implement caching in magic module. implement the magic module. Mostly a copy of #12. After too many changes since the PR, it's easier to commit the code again than resolving the merge conflicts. parsing code signature data + more load commands for mach-o feat: implement code_signature_data parsing for mach-o feat: implement entitlement parsing for Mach-O from code_signature_data clippy fixes update tests for new tests feat: implement signing certificate parsing for mach-o feat: implement dyld_info load command parsing for mach-o feat: implement lc_symtab parsing for mach-o feat: implement lc_symtab table entries parsing for mach-o feat: implement LC_UUID parsing for mach-o feat: implement LC_BUILD_VERSION parsing for mach-o feat: implement LC_VERSION_MIN_* load command parsing for Mach-O feat: implement entitlement_present function for Mach-O fix: mach-o deps and comments fix: use device_type enum implement API for setting the output data for a YARA module control the name of the feature associated to each YARA module from the protobuf defining the module Now each YARA module can define the name of the cargo feature that enables/disables the module. This is done using the new cargo_feature option, like in: option (yara.module_options) = { name : "text" root_message: "text.Text" rust_module: "text" cargo_feature: "text-module" }; The cargo_feature option is not required. If missing the module will always be enabled. add support for the YRX_MOD_PROTO_DIRS environment variable With this environment variable you can specify additional directories where to find .proto files with module definitions. This variable can contain a single directory path or a comma-separated list of directory path. The directories will be walked recursively looking for .proto files containing module definitions. 
Notice however that these modules are data-only, they can't contain functions because for adding functions to a YARA module the .proto file must be accompanied by Rust source files that implement the functions. advance in the implementation of the C and Golang APIs implement ExactSizeIterator trait for Patterns Bug Fixes setns related issues chg:[lib] parallel-compilation removed from wasmtime dependency So that it is possible for a user to disable wasmtime parallel-compilation it is necessary to remove it from wasmtime dependency definition. It is not such an issue because default feature includes it. chg:[lib] unshare(CLONE_FS) in heartbeat thread issue while creating statically linked binary that depends on musl instead of glibc. compilation error when fast-regexp feature is disabled. implement DFS traversal for loops and lookup expressions. avoid OOM errors due to corrupted files Corrupted import tables in PE files were causing high memory consumption because the thunks iterator was not limited. prevent panic with some corrupted PE files. avoid panic while scanning some corrupted PE files avoid crashes when scanning malformed files Malformed PE files were making YARA-X crash due to unknown digest algorithms. error when a module is ignored and some rule depends on that module with two-levels of indirection or more. For instance if rule_1 depends on some ignored module, rule_2 depends on rule_1, and rule_3 depends on rule_2. This was causing an error when it should raise a warning. some issues with error reports some minor fixes in error reports revert accidental change issue with error spans that don't end at a UTF-8 char boundary. panic due to wrong debug assertion broken test cases make the order in which matching rules are reported deterministic We were using a HashMap for storing the matching rules, which resulted in matches being returned in a non-deterministic order because HashMap doesn't guarantee any iteration order.
Now we use an IndexMap, which maintains the insertion order. panic in macho module panic in macho module when parsing corrupted files panic while parsing ULEB128 numbers that are too large. allow empty DLL names in pe module For compatibility with YARA empty DLL names are now allowed. increase the MAX_ROWS_PER_TABLE limit in dotnet module. File 67984703c89ee30cadaa8d7dd5c1a0e9f7f5d096ab0d6d03fdb01115780fa7c3 has more than 10,000 rows. the obsolete OID 1.3.14.3.2.29 may be used for identifying the authenticode hash algorithm. bug in AtomsQuality comparison. issue in pattern extraction causing too many atoms. Certain patterns were producing more atoms than expected. issues while parsing PE files rva_to_offset now returns the rva as file offset when the RVA is lower than the RVA of all sections, but only in the case where there's at least one section. resource ID strings are limited to 1000 characters at most use MAX_ATOMS_PER_REGEXP constant instead of literal number. prevent rustfmt error rustfmt was failing with: error[internal]: left behind trailing whitespace. stop parsing import descriptor when finding one where name is zero. This is the behaviour exhibited by YARA. rename function invoke_mod to invoke in fuzzers don't allow empty DLL names when optional header magic is invalid, assume the file is a 32-bit PE That's what YARA assumes, and it looks like most of the files that have a 0 as the optional header magic are actually 32-bit PE files. bug in try_match_literal_bck try_match_literal_bck was not matching wide literals when the number of bytes from the start of the file to the end of the literal found in the file was odd. wrong method name in test case allow empty DLL names, as YARA does. test case that only works when constant folding is enabled ignore resource entries with offset 0 This is for compatibility with YARA, which does the same.
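For readers unfamiliar with the ULEB128 entry above, the following sketch shows one way a decoder can reject numbers that are too large instead of panicking. It is an illustration only, not the macho module's code; the function name and the choice of u64 as the target type are assumptions.

```rust
/// Hypothetical ULEB128 decoder. Returns the decoded value and the number of
/// bytes consumed, or `None` if the input is truncated or the value does not
/// fit in a `u64`.
fn uleb128(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in input.iter().enumerate() {
        // Each byte carries 7 bits of payload, least significant group first.
        let low = (byte & 0x7f) as u64;
        // Reject the input if shifting would drop bits or overflow the shift
        // amount; the `shift >= 64` test runs first, so the shift below is
        // always in range.
        if shift >= 64 || (low << shift) >> shift != low {
            return None;
        }
        value |= low << shift;
        // A clear high bit marks the last byte of the number.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
    }
    // Ran out of bytes before finding the terminating byte.
    None
}
```

Returning an Option keeps the error path explicit, so a corrupted file leads to a parse failure rather than an arithmetic panic.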
handle integer overflows in constant folding verification of Microsoft countersignatures Verification of Microsoft countersignatures now takes into account signer info digest and verifies it against digest of content info (timestamp info). The signature of the signed attributes was verified, the digest of countersigned signature was verified but what was missing is the verification that the signed attributes are actually signing the embedded timestamp information. This would allow anyone to take a valid countersignature, replace content info of the signed data with their own timestamp info, recalculate hash against the countersigned signature and it would be verified as OK. issue while parsing certificates in some PE files don't abort PE parsing when opt_hdr.magic is incorrect. When a file is corrupt opt_hdr.magic may have an incorrect value that doesn't correspond to a 32-bit nor a 64-bit file. In such cases, instead of making the parsing fail, try to continue as it was a 32-bit file even if some values are incorrect. This is what YARA does, and this makes YARA-X closer to how YARA works in these edge cases. support for more digest algorithms while computing Authenticode hash. handle errors when oid_to_object_identifier fails out of memory issue due to accumulation of items in ScanContext ScanContext contains hash maps that tracks the matches found for each pattern. The values in these maps are vectors that can become very large. For performance reasons, instead of completely deallocating those vectors after each scan, we were clearing the vector but retaining its capacity, so that they can be reused in later scans. The problem with that is that these vectors are not freed while the scanner is in use, and this can have a large impact on the process' memory footprint, causing OOM issues. 
This PR adds a PatternMatches type that encapsulates all the logic for tracking pattern matches, including a more sophisticated approach that tries to reuse the vectors as much as possible, but frees them when they grow too much. issues while parsing array type definitions in dotnet module. recognize more digest algorithms while parsing Authenticode signatures Some files (like 1e435fea9ced78bd31ae8320a894df290cdf8a262ba1b50c9b116caa26983145) identify the digest signature with OID 1.2.840.113549.1.1.5, which corresponds to sha1WithRSAEncryption. prevent slow resource parsing in corrupted files add more sanitation while parsing PE imports This prevents OOM errors while parsing corrupt files. integer overflow with corrupted files. integer overflow in corrupted files. avoid parsing too many corrupted resources. better sanitation while parsing corrupted resources. issue while building the crate by doc.rs When the crate is built by doc.rs the source code is put in a read-only file system and we can't modify source files, as we were doing with modules.rs. Now build.rs detects when the crate is being built by doc.rs and doesn't update the modules.rs file. The add_modules.rs file is now stored in the repository for consistency and simplicity. copy yaml.proto and yara.proto into modules/protos The yara-x crate can't be published if it depends on files that are outside the crate, even if they are in the same workspace. populate the field_offsets array in dotnet module. issue while parsing ELF files with incorrect size in section headers File 71adb87ee8ee76f32f54c70584ef14f67a4bc6f55df3f847c344726405927a1e has an incorrect size in the section header corresponding to the section that contains the string table. As we were strictly applying that limit, the string table was truncated, even though it is perfectly valid in the file.
YARA's parser was more relaxed and didn't take into account the size indicated in the section header, and was therefore able to get strings from the string table without issues. With this change we follow the same strategy in YARA-X. syntax when quantifier in for loop is an arithmetic expression A condition like for 1+1 i in (0..10) : ( i <= 1 ) was failing with a syntax error because 1+1 was not accepted as a quantifier. better handling of tab characters in error messages Tabs in source code were breaking the layout of error messages. issue with math.entropy when input was an empty string In such cases math.entropy must return 0.0 instead of undefined. issue while parsing version information in corrupted PE. issue while rendering error message that contains snippets from multiple sources increase recursion limit while parsing protocol buffers. issue while parsing PE exports If OriginalFirstThunk is non-null, but the RVA can't be translated to a file offset, try using the FirstThunk instead. include the module name in the error shown for unknown modules more integrity checks for .NET files File dae837cd632a2436a7225026760edc008d6f36d6d1a713f2e8d8f384e064cabc was being reported as a .NET file, but it's not. invalid ascii bytes are encoded as hex when written into YAML when writing string into YAML, don't escape valid unicode characters. bug in class_to_masked_byte The function was returning a masked byte for class [:;|,], which can't be expressed as a masked byte. issue while parsing PE exports issues while parsing PE files issue while parsing PE version information There are some files like abeef1c9452835ba856c3bef32657076b7757c21e9f5c78f6336cfedc87d0b46, where the size of version information strings is the number of UTF-16 characters, without taking into account the null terminator. issue with .NET resources Fields offset and length were being populated even if the resources didn't reside in the .NET file.
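The math.entropy fix listed above concerns the empty-input edge case. As a hedged illustration, assuming the function computes the Shannon entropy of the input in bits per byte (the changelog itself does not spell out the formula), a standalone version that returns 0.0 for empty input could look like this. The function name is hypothetical and this is not the module's actual code.

```rust
/// Hypothetical standalone entropy function: Shannon entropy of `data`,
/// in bits per byte. Empty input yields 0.0 rather than an undefined value.
fn entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        // Without this guard the probabilities below would be 0/0.
        return 0.0;
    }
    // Count how many times each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually appear.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

With this definition a buffer holding a single repeated byte has entropy 0.0, and a buffer with two equally frequent byte values has entropy 1.0.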
issue with some regular expressions that were producing unicode classes. Unicode classes can appear even on regexps that were compiled without unicode support. This a well-known issue with the regex-syntax crate, and we should be able to handle it. See: rust-lang/regex#1088 bug in parse_dir_entries. regexps should be created without unicode support and allow invalid UTF-8. handle error returned by the magic library bug in caching logic in magic module issue with enums not defined in all cases There were cases in which the enums declared in a module's protobuf were not added as fields Struct and therefore were not found by YARA. PE version information not parsed in some corrupted files. The version information parsing was strictly following the PE specification, and failed with files where the StringFileInfo was corrupted. issue while matching regexps with FastVM When case-insensitive regexps contained jumps that didn't allow newline characters, there were cases in which false-positive matches were returned. bug while matching regexp that contain long jumps. accept invalid UTF-8 strings as PDB paths in pe module Files like d3ad9f9db7c89a229b1bdb8b49310ea51f71fa70be014b191b8a545dc8251cdb contain non-UTF-8 PDB paths. issue while matching wide regular expression with FastVM be more flexible about the number of PE sections parsed Before this change the parser was strict about the number of PE sections correctly parsed. If the parser was not able to parse as many sections as the number of sections indicated in the header, the parser failed. Now it tries to parse the number of sections indicated by the header, but as soon as it is able to parse at least one section, it succeeds. imphash and checksum must return undefined when the file is not PE. edge cases in FastVM::jump_fwd and FastVM::jump_bck When the literal after a jump starts with the newline character (0x0A), these functions were not working properly. issue while parsing PE version information. 
PE files can contain multiple resources of type RESOURCE_TYPE_VERSION, all of them must be parsed, not only the first one. edge case in which the compiler for fast regexps was emitting incorrect code. base64 patterns that were followed by padding not being matched Zm9vYg is the base64 encoding of foob without padding, but it can also appear with padding (Zm9vYg==). The padded form was not properly matched. PE files are now considered as such even if the optional header is corrupted add some additional validation while parsing PE files. bug while parsing .NET file. File 55e4ce3fe726043070ecd7de5a74b2459ea8bed19ef2a36ce7884b2ab0863047 was considered a valid .NET file, even if the IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR had size 0. This was a regression introduced in 981eb9c. issue in FastVM while matching wide regexps. \b not matching the underscore (_) character issue with FastVM not matching in certain cases. FastVM was not properly matching wide regexps in backward direction when the starting offset was an even number. issue while parsing PE resources. While reading PE resources we were relying on the size specified in the directory entry (IMAGE_DIRECTORY_ENTRY_RESOURCE). However, some files like 8970b83310e030f7b7191ed1329433bcf8a4f574e66f2d39bca5a4360c1dac96 may have an incorrect size in the directory entry and still have valid resources that are parsed by both YARA and pefile. issue in for .. in loops when iterating a list of expressions where one of them is undefined. test case test case issue introduced in dc2c9ea The previous commit fixed part of the issue with wide regexps, but didn't fix it correctly. issue with wide regexps causing false positives Strings matching wide regexps must contain interleaved zeroes, but we were not making sure that these bytes were actually zero when matching the regexp with PikeVM. compiling error when rules profiling is enabled compilation error when logging is enabled issues when number of rules is multiple of 8. 
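The base64 entry above mentions that foob encodes to Zm9vYg without padding and Zm9vYg== with padding. The small encoder below reproduces both forms using only the standard library. It is a sketch for illustration, with a hypothetical function name, and says nothing about how YARA-X generates or matches base64 patterns internally.

```rust
/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: encodes `input` as base64, with or without the
/// trailing `=` padding characters.
fn base64(input: &[u8], pad: bool) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        for i in 0..4 {
            if i <= chunk.len() {
                // A chunk of k bytes produces k + 1 data characters.
                let idx = (n >> (18 - 6 * i)) & 0x3f;
                out.push(ALPHABET[idx as usize] as char);
            } else if pad {
                // The remaining positions are filled with padding.
                out.push('=');
            }
        }
    }
    out
}
```

Both outputs share the prefix Zm9vYg, which is why a pattern for the unpadded form also needs to match when the padded form appears in a file.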
show more explicative error when built-in module is not found issue with for .. in .. loop returning undefined value instead of false. issue with constant folding Constant folding for boolean AND and OR operations was broken. Boolean variables with known values were being folded, even if they were not constant. set a larger limit for NFA when compiling regular expressions issue introduced in d50dbfd The order in which enum fields are inserted in structs must be preserved. The HashSet type doesn't preserve the order, but IndexSet does. enum types declared in .proto file that doesn't contain the root message Enum types that were declared in a .proto file different from the one that contains the root message were being ignored. remove the logic that parses signatures from Macho files. This code depended on the cryptographic-message-syntax crate, which introduces a ridiculous amount of indirect dependencies, including crates like tokio and reqwest. I don't know why a crate devoted to parsing cryptographic data structures depends on a HTTP client library. I've tried replacing cryptographic-message-syntax with cms, but it doesn't seem to be a straightforward change. We need to find a way to implement this feature using cms or some other crate that doesn't introduce insane dependencies. Other add more test cases for IR trees add IR tests with and without constant folding. minor fixes in comments fix test cases that don't work without constant folding fix test case that doesn't work without constant folding fix broken test cases fix test cases that don't work without constant folding re-enable test cases re-enable test cases fix test cases fix test cases that won't work when constant-folding feature is disabled. add tests covering regexp repetition operators with spaces. add test case The test case makes sure that the dot (.) in a regexp doesn't match unicode characters by default. fix misleading test An existing test misleadingly suggested that the dot (.)
in regular expressions matched unicode characters by default, but that's not the case. The dot matches a single byte, unless unicode mode is explicitly enabled by using the (?u). In that case the dot matches an unicode character, no matter if it takes more than one byte. parallelize module tests add test case that makes sure that empty strings are matched by regexps fix test cases fix test broken after upgrading wasmprinter. add more regexp test cases re-write yara.proto as proto3 This is for compatibility with an internal version of yara.proto that we have at VirusTotal. Performance use iterators instead of allocating vectors in parse_exports use iterators instead of creating unnecessary vectors Refactor improve some associated functions in Structure. large refactoring of how compilation errors are handled The changes in this PR include: Redesign how the compilation errors are exposed in the Rust API. With this change users of the API have access to more details about the error, including the structure of error reports (report title, individual labels with their corresponding text and code spans, etc). These changes are backward-incompatible, though. The CLI now shows multiple compilation errors at a time, instead of showing only the first error found. All the error details exposed in the Rust API are also exposed in the C API and the Golang API. change arguments to ReportBuilder::create_report use associated functions to create unary operations in IR add associated functions to Expr for building expressions. migrate macros to syn v2 and do some improvements to errors and warnings. Errors and warnings now support more label types, like info, note and help. redesign the error recovery logic. check if source file is valid UTF-8 before trying to parse it start using the new parser start the migrating the compiler to the new parser check for valid base64 alphabet in the AST->IR phase check that $ is used inside a for .. of loop in the AST->IR phase. 
remove the get_pattern_index function. check for unknown patterns in the AST->IR phase. check for short patterns in the AST->IR phase. check for wrong xor range in the AST->IR phase. check for duplicate patterns in the AST->IR phase. check for unused patterns in the AST->IR phase. use BTreeSet instead of BTreeMap for tracking duplicate pattern modifiers check for duplicate tags in the AST->IR phase. check pattern modifiers in the AST->IR phase Instead of checking for duplicate pattern modifiers and invalid modifier combinations in the CST->AST phase, we now do it in the AST->IR phase. The idea is that the CST->AST phase should perform as few checks as possible. Most semantic validations will be done in the AST->IR phase. some code simplification in macho module. iterative DFS instead of recursion for Mach-O exports Refactor parser_export_node, replacing the recursive implementation with a non-recursive one. simplify the uleb128 parser. make concat_seq easy to understand by moving part of its logic to optimize_seq more improvements to the atom extraction logic. allow calling non-global rules from global ones. make dylib_hash panic-free. implement emit_switch without recursion. This reduces the stack space consumption, avoiding stack overflow errors with rules that have a lot of patterns and use the x of them statement. remove unnecessary argument from emit_rule_condition expose all data structures defined by YARA modules. Also renamed invoke_mod and invoke_mod_dyn to invoke and invoke_dyn respectively. API refactoring Changes in the Python, Golang and C APIs for accommodating the new option that allows invalid escape sequences in regular expressions. limit the number of warnings Also, move some of the warnings from the parser to the compiler. minor change in Expr::Const replace the ariadne crate with annotate-snippets-rs for error reporting. The annotate-snippets-rs crate handles very long strings in the source code better, and seems to be better maintained.
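The move from recursion to an iterative DFS (Mach-O exports, emit_switch) follows a common pattern: replace the call stack with an explicit, heap-allocated stack. The sketch below shows that pattern on a made-up trie; the `Node` type and `leaf_names` function are assumptions for illustration and do not mirror the actual data structures in the macho module.

```rust
/// Hypothetical trie node; the real Mach-O export trie layout differs.
struct Node {
    label: String,
    children: Vec<Node>,
}

/// Collects the full name of every leaf with a depth-first traversal
/// driven by an explicit stack instead of recursive calls.
fn leaf_names(root: &Node) -> Vec<String> {
    let mut names = Vec::new();
    // Each entry pairs a node with the prefix accumulated on the way to it.
    let mut stack = vec![(root, String::new())];
    while let Some((node, prefix)) = stack.pop() {
        let name = format!("{}{}", prefix, node.label);
        if node.children.is_empty() {
            names.push(name);
        } else {
            // Push in reverse so children are visited left to right.
            for child in node.children.iter().rev() {
                stack.push((child, name.clone()));
            }
        }
    }
    names
}

/// Small constructor to keep the example readable.
fn node(label: &str, children: Vec<Node>) -> Node {
    Node { label: label.to_string(), children }
}

fn main() {
    let trie = node(
        "_",
        vec![
            node("foo", vec![]),
            node("ba", vec![node("r", vec![]), node("z", vec![])]),
        ],
    );
    assert_eq!(leaf_names(&trie), ["_foo", "_bar", "_baz"]);
}
```

Because the pending work lives in a `Vec`, the traversal depth is bounded by available heap memory rather than by the thread's stack size, which is the property the changelog entries are after.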
use golden files for testing compiling errors and warnings. use our own regexp parser for building the HIR in get_regexp rename Scanner::timeout to Scanner::set_timeout rename add_unsupported_module to ignore_module. minor improvements make catch_undef more flexible Now catch_undef can be used with blocks that return any type (including blocks that return nothing), and the result returned by the catch_undef block in case of exception can be customized. change the logic used for specifying additional modules This removes the use of the YRX_PROTOC_CONFIG_FILE environment variable, in favor of two new environment variables YRX_EXTRA_PROTOS and YRX_EXTRA_PROTOS_BASE_PATH. The previous mechanism for specifying extra modules was more complex and didn't fit well with Bazel, which was the primary use case for this mechanism. The new YRX_EXTRA_PROTOS accepts a list of space-separated file paths pointing to .proto files. These paths are either absolute or relative to the build.rs file. With YRX_EXTRA_PROTOS_BASE_PATH you can make the paths relative to some other point in the file system. merge CompileError and CompileErrorInfo in a single type. replace the YRX_MOD_PROTO_DIRS env variable with YRX_PROTOC_CONFIG_FILE Instead of having an environment variable that points to a directory with .proto files, now the environment variable points to a JSON file, that in turn contains the list of .proto files that should be included in the compilation. This way we can control which individual .proto files are included, in a directory that could contain other unrelated .proto files that don't need to be compiled. Style fix clippy warnings remove trailing spaces.
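The path rules described for YRX_EXTRA_PROTOS and YRX_EXTRA_PROTOS_BASE_PATH can be sketched as follows. This is not the actual build.rs code: `resolve_extra_protos` is a hypothetical function, the directories are made up, and the values are passed in as arguments instead of being read from the environment so the example stays runnable on its own.

```rust
use std::path::{Path, PathBuf};

/// Resolves a space-separated list of .proto paths (hypothetical helper).
/// Absolute paths are kept as they are; relative paths are resolved against
/// `base_path` when provided, otherwise against `build_dir` (assumed here to
/// be the directory containing build.rs).
fn resolve_extra_protos(
    list: &str,
    base_path: Option<&Path>,
    build_dir: &Path,
) -> Vec<PathBuf> {
    let root = base_path.unwrap_or(build_dir);
    list.split_whitespace()
        .map(|p| {
            let p = Path::new(p);
            if p.is_absolute() {
                p.to_path_buf()
            } else {
                root.join(p)
            }
        })
        .collect()
}

fn main() {
    // Made-up locations, standing in for the two environment variables.
    let build_dir = Path::new("/src/yara-x/lib");
    let list = "protos/a.proto /opt/b.proto";

    // Without a base path, relative entries are resolved against build_dir.
    let resolved = resolve_extra_protos(list, None, build_dir);
    assert_eq!(
        resolved,
        [
            PathBuf::from("/src/yara-x/lib/protos/a.proto"),
            PathBuf::from("/opt/b.proto"),
        ]
    );

    // With a base path, relative entries move; absolute ones are untouched.
    let resolved =
        resolve_extra_protos(list, Some(Path::new("/workspace")), build_dir);
    assert_eq!(resolved[0], PathBuf::from("/workspace/protos/a.proto"));
    assert_eq!(resolved[1], PathBuf::from("/opt/b.proto"));
}
```

A flat list of paths plus an optional base directory is easy to generate from a build system such as Bazel, which may be why it replaced the JSON configuration file.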
remove trailing spaces fix clippy warning minor code simplification re-format comments prefix unused field with underscore fix clippy warning remove trailing spaces add trailing empty line make Compiler::c_alternation_literal return Result like most of the c_* functions improve legibility in macho module by reducing indentation levels. remove trailing whitespaces minor fixes. remove trailing spaces minor style change fix some clippy warnings fix clippy warning rename variable. fix clippy warnings fix clippy warnings add num_rules() method to Rules remove trailing spaces fix clippy warning fix more clippy warnings fix clippy warning fix clippy warnings Commit Statistics 307 commits contributed to the release. 307 commits were understood as conventional. 25 unique issues were worked on: #100, #103, #104, #110, #113, #116, #131, #132, #140, #143, #148, #171, #179, #180, #183, #188, #189, #191, #78, #82, #83, #84, #86, #91, #93 Commit Details #100 Implement Authenticode parsing and verification without relying on OpenSSL (fa628cf) #103 Out of memory issue due to accumulation of items in ScanContext (7093a31) #104 Verification of Microsoft countersignatures (e50c163) #110 Fix some clippy warnings (b2b8ec3) #113 Fix typos (e587a82) #116 Implement array entitlement parsing for complete entitlement parsing (33eba84) #131 Implement API for treating slow patterns as error instead of warnings (8c96849) #132 Implement mach-o export trie parsing and export hashing function (48d799d) #140 Add error/warning codes and allow disabling warnings by code. (f1b91a9) #143 Upgrade dependencies (b253ac4) #148 Iterative DFS instead of recursion for Mach-O exports (a837b6c) #171 Implement tag printing and filtering (0ee4100) #179 Add modules subcommand to display modules available. 
(83eb9ea) #180 Large refactoring of how compilation errors are handled (7def597) #183 Add support for the --module-data option (57ebf6e) #188 Add feature for compiling .proto files with protoc (5b1c348) #189 Setns related issues (34f872e) #191 Implement cuckoo module. (af2c42f) #78 Parsing code signature data + more load commands for mach-o (8550f8c) #82 Merge CompileError and CompileErrorInfo in a single type. (96488cc) #83 Change the logic used for specifying additional modules (bd1707f) #84 Rename macho module functions from x_present to has_x. (41ef9d0) #86 Set a larger limit for NFA when compiling regular expressions (180f00f) #91 Implement the magic module. (5ec7aef) #93 Implement dylib and entitlement hashing for macho (eec301a) Uncategorized Fix clippy warnings (8adf813) Improve some associated functions in Structure. (a70e7d8) Prevent re-compiling due to file changes caused by build.rs (1b180b1) Sort modules by name in auto-generated files. (4a9e0cc) Remove trailing spaces. (43cf02f) Raise warning when patterns are very common byte repetitions (1db2190) Add warning for potentially slow loops (38ddfb1) Add a new parallel-compilation feature that enables/disables WASM parallel compilation (40e9d17) Upgrade dependencies (ace86e8) Reduce the number of dependencies by disabling unused features from wasmtime (cc07498) Issue while creating statically linked binary that depends on musl instead of glibc. (6953528) Remove trailing spaces (984b90f) Some improvements to the documentation (a4b1ff5) Implement ExactSizeIterator trait for ModuleOutputs. (33d6dcd) Compilation error when fast-regexp feature is disabled. (907eb55) Re-organize imports (c5727f7) Fix typo. (f26b481) Fix typos (c34873e) Change arguments to ReportBuilder::create_report (7ee166a) Use associated functions to create unary operations in IR (fd91ebd) Fix clippy warning (2f9596d) Make CompileError::join_with_or visible only within the crate. 
(628aef6) Add more test cases for IR trees (4edbec8) Implement DFS traversal for loops and lookup expressions. (4fa807e) Add associated functions to Expr for building expressions. (56e5f0c) Add IR tests with and without constant folding. (5066415) Implement DFS traversal of IR tree. (c979c90) Better representation of flags in YAML output. (aff398b) Migrate macros to syn v2 and do some improvements to errors and warnings. (205b29d) Fix clippy warnings (8076b0e) Remove trailing spaces (fb15a01) Fix incorrect comment (e258a2d) Minor code simplification (ab2aca2) Fix clippy warning due to WasmExportedFn4 not used. (a598ded) Fix clippy warnings and typos (3a53aef) Remove unused vars field in EmitContext (80458cc) Re-format comments (3edc342) Prefix unused field with underscore (9054408) Fix typo. (0386d93) Avoid OOM errors due to corrupted files (258e090) Change error numbers (8cb3b56) Prevent panic with some corrupted PE files. (5a6b944) Avoid panic while scanning some corrupted PE files (3f011ee) Fix clippy warning (1ba5dcc) Avoid crashes when scanning malformed files (b82c930) Accept comparison between boolean expression and integer constant (1dd3ade) Upgrade wasmtime and walrus (0261299) Improve error messages for unclosed comments, literal strings and regexps (e313325) Error when a module is ignored and some rule depends on that module with two-levels of indirection or more. (7df210b) Some issues with error reports (1e3ed73) Some minor fixes in error reports (9aced3e) Redesign the error recovery logic. (81c4a6d) Revert accidental change (3317825) Minor fixes in comments (f937ec0) Check if source file is valid UTF-8 before trying to parse it (e9814d6) Issue with error spans that don't end at a UTF-8 char boundary. (abdbf6e) Fix test cases that don't work without constant folding (9e7eaa6) Fix test case that doesn't work without constant folding (fbb56fc) Remove unused function.
(b7a1758) Fix broken test cases (f58c097) Fix test cases that don't work without constant folding (faeeb19) Re-enable test cases (12e0585) Re-enable test cases (3d7ddbc) Rename yara-x-parser-ng to yara-x-parser (534ffee) Panic due to wrong debug assertion (7212bdd) Remove trailing spaces (32470af) Start using the new parser (e7d99ba) Start migrating the compiler to the new parser (fd42c93) Add trailing empty line (1ed3647) Broken test cases (e42e9d8) Check for valid base64 alphabet in the AST->IR phase (a7b881d) Check that $ is used inside a for .. of loop in the AST->IR phase. (ae05f04) Remove the get_pattern_index function. (068c9b9) Check for unknown patterns in the AST->IR phase. (f054105) Check for short patterns in the AST->IR phase. (44c6dee) Check for wrong xor range in the AST->IR phase. (c9701da) Check for duplicate patterns in the AST->IR phase. (c69763e) Check for unused patterns in the AST->IR phase. (9bdccc9) Use BTreeSet instead of BTreeMap for tracking duplicate pattern modifiers (cf8b515) Check for duplicate tags in the AST->IR phase. (832cd2e) Check pattern modifiers in the AST->IR phase (1f135d1) Assign code "E001" to syntax errors (d561f62) Make the order in which matching rules are reported deterministic (d63893f) Make Compiler::c_alternation_literal return Result like most of the c_* functions (cc5d05c) Improve legibility in macho module by reducing indentation levels. (d8f74d7) Panic in macho module (a27349d) Some code simplification in macho module. (69c8541) Panic in macho module when parsing corrupted files (ba36c60) Panic while parsing ULEB128 numbers that are too large. (695a130) Allow empty DLL names in pe module (693797f) Fix test cases (fcde8d5) Fix test cases that won't work when constant-folding feature is disabled. (df2009f) Remove trailing whitespaces (d14d407) Add newline after warnings in test cases (f126840) Raise warning when a rule is always true or false. (5f6a1d7) Increase the MAX_ROWS_PER_TABLE limit in dotnet module.
(2f410f7) Minor fixes. (44cb517) Simplify the uleb128 parser. (1bd014d) The obsolete OID 1.3.14.3.2.29 may be used for identifying the authenticode hash algorithm. (bae43d1) Make concat_seq easy to understand by moving part of its logic to optimize_seq (3e230f2) Fix comment (08163e1) Remove trailing spaces (75a1e81) More improvements to the atom extraction logic. (204d05a) Bug in AtomsQuality comparison. (522e0cd) Issue in pattern extraction causing too many atoms. (539e858) Issues while parsing PE files (49864a4) Use MAX_ATOMS_PER_REGEXP constant instead of literal number. (e8dedd7) Add assertion (164b085) Prevent rustfmt error (ae2ea2d) Allow calling non-global rules from global ones. (865db1d) Stop parsing import descriptor when finding one where name is zero. (547cb8e) Rename function invoke_mod to invoke in fuzzers (58e14c3) Implement the Metadata.into_json API (710ed69) Minor style change (8d160a5) Don't allow empty DLL names (e69ff75) Make dylib_hash panic-free. (5ac725f) Show warning when build.rs can not re-generate src/modules/modules.rs (964a296) Add tests covering regexp repetition operators with spaces. (e48592c) Expose rule metadata in the Rust API (1e816a7) Implement emit_switch without recursion. (b134252) Use #![deny(missing_docs)] to force documentation of public APIs. (dee0f65) Add environment variable that controls whether build.rs re-generates modules.rs and add_modules.rs. (63c9b03) When optional header magic is invalid, assume the file is a 32-bits PE (43a3f37) Remove unnecessary argument from emit_rule_condition (d977a41) Bug in try_match_literal_bck (555f7f3) Expose all data structures defined by YARA modules. (91795a5) The relaxed_re_syntax option now handles invalid escape sequences inside character classes (f423fbd) Generalize the relaxed_escape_sequences option to relaxed_re_syntax (14b4efe) Add warnings() method to Compiler.
(d1fb7d6) Implement math.to_number (3d1b9ba) Wrong method name in test case (8a16dc2) API refactoring (ba3d27e) Fix clippy warning (6ee6352) Implement an option for allowing invalid escape sequences in regular expressions (9aa4477) Limit the number of warnings (eb69534) Allow empty DLL names, as YARA does. (17a15b0) Test case that only works with constant folding is enabled (e8d158f) Ignore resource entries with offset 0 (c09d64f) Handle integer overflows in constant folding (8ccad97) Add comment about multiple nested Authenticode signatures (c573ec5) Minor change in Expr::Const (9ca198e) Implement constant folding for subtraction and multiplication (c30a9ac) Issue while parsing certificates in some PE files (d7c5181) Don't abort PE parsing when opt_hdr.magic is incorrect. (9d9b01d) Support for more digest algorithms while computing Authenticode hash. (472ec49) Handle errors when oid_to_object_identifier fails (0193258) Use iterators instead of allocating vectors in parse_exports (b7766d2) Issues while parsing array type definitions in dotnet module. (f26df27) Recognize more digest algorithms while parsing Authenticode signatures (dc49fcc) Add support for md2 hashes in Authenticode signatures and certificates (3b2497a) Prevent slow resource parsing in corrupted files (52a87ee) Use iterators instead of creating unnecessary vectors (229527a) Add more sanitation while parsing PE imports (f49972b) Integer overflow with corrupted files. (2ecab59) Integer overflow in corrupted files. (587508c) Avoid parsing too many corrupted resources. (15ff70e) Better sanitation while parsing corrupted resources. (a6c59d1) Fix minor issues in comments (768d816) Issue while building the crate by doc.rs (44c393f) Exclude test files from being published to crates.io (b4978a1) Remove yara-x dependency from yara-x-proto. (5a23a86) Copy yaml.proto and yara.proto into modules/protos (0516d24) Populate the field_offsets array in dotnet module. 
(33c2027) Issue while parsing ELF files with incorrect size in section headers (1a42530) Add note for clarifying regexp error (3b2b53a) Add note for clarifying regexp error (cf8325b) Syntax when quantifier in for loop is an arithmetic expression (6233989) Add explicative notes for some regexp errors (f6c1c70) Remove redundant function check_type2 (4a16716) Better handling of tab characters in error messages (2bda579) Add test case (e3d7d05) Fix misleading test (3c1d700) Add some metadata to Cargo.toml (6dedd58) Issue with math.entropy when input was an empty string (ed6d9f1) Parallelize module tests (e2a4877) Issue while parsing version information in corrupted PE. (df93f14) Upgrade wasmtime crate to version 19.0.0 (bc8e5a9) Remove benchmarks (8079b78) Issue while rendering error message that contains snippets from multiple sources (7d07bf8) Increase recursion limit while parsing protocol buffers. (c50a5b4) Replace the ariadne crate with annotate-snippets-rs for error reporting. (9fc17d1) Use golden files for testing compiling errors and warnings. (05ee5f4) Issue while parsing PE exports (6922819) Remove unused import (2e0acfd) Interpret the sequences \< and \> in regexp as literals instead of word boundaries (0c6672d) Add test case that makes sure that empty strings are matched by regexps (d5a5430) Include the module name in the error shown for unknown modules (9ac238f) Use our own regexp parser for building the HIR in get_regexp (5a5aea2) More integrity checks for .NET files (91f55df) Invalid ascii bytes are encoded as hex when written into YAML (87f18ce) When writing string into YAML, don't escape valid unicode characters. (133b111) Bug in class_to_masked_byte (d6519b5) Add fmt="t" modifier to timestamp and export_timestamp fields (a8257fc) Issue while parsing PE exports (7dd709f) Issues while parsing PE files (45593df) Rename Scanner::timeout to Scanner::set_timeout (644eab9) Rename add_unsupported_module to ignore_module.
(e344cd5) Issue while parsing PE version information (396666f) Issue with .NET resources (8abac87) Issue with some regular expressions that were producing unicode classes. (8edb86e) Issue warning when a rule is ignored because it indirectly depends on some unsupported module (74ea70c) Bug in parse_dir_entries. (5a96896) Implement modulerefs in dotnet module. (6f32a08) Implement start-of-word and end-of-word matches in regular expressions. (bfba792) Fix test cases (7437523) Regexps should be created without unicode support and allow invalid UTF-8. (e1a0552) Upgrade regex-syntax and regex-automata crates. (03a52a2) Handle error returned by the magic library (a7190fe) Implement the Compiler::add_unsupported_module API. (7fcdf63) Bug in caching logic in magic module (2c14ffb) Implement caching in magic module. (dd537ee) Issue with enums not defined in all cases (6351d32) PE version information not parsed in some corrupted files. (7aee275) Rename variable. (b2ffebd) Issue while matching regexps with FastVM (7ca768d) Bug while matching regexp that contain long jumps. (3da6fc8) Accept invalid UTF-8 strings as PDB paths in pe module (ede8b9b) Issue while matching wide regular expression with FastVM (449cb03) Be more flexible about the number of PE sections parsed (013b539) imphash and checksum must return undefined when the file is not PE. (72a2ac8) Edge cases in FastVM::jump_fwd and FastVM::jump_bck (154efc8) Issue while parsing PE version information. (48e7554) Edge case in which the compiler for fast regexps was emitting incorrect code. (c183be5) Base64 patterns that were followed by padding not being matched (8354a5c) PE files are now considered as such even if the optional header is corrupted (a180b8d) Fix test broken after upgrading wasmprinter. (e2586ff) Upgrade wasmtime, wasmprinter, yara and base64 (cfc7626) Add some additional validation while parsing PE files. (8ce1d3b) Bug while parsing .NET file. (ca1ece9) Issue in FastVM while matching wide regexps. 
(fc19e22) Fix clippy warnings (c7e63f5) Split regexp_patterns_3 in two functions. (04c8cfc) \b not matching the underscore (_) character (a4e4498) Issue with FastVM not matching in certain cases. (15a0f8a) Fix typos (6ada059) Issue while parsing PE resources. (981eb9c) Issue in for .. in loops when iterating a list of expressions where one of them is undefined. (1c27d63) Minor improvements (4a063dc) Test case (fafbbb7) Test case (da2ae7a) Add more regexp test cases (866e0e6) Fix clippy warnings (9ad76ce) Issue introduced in dc2c9ea (cd87d88) Issue with wide regexps causing false positives (dc2c9ea) Compiling error when rules profiling is enabled (4767ae1) Compilation error when logging is enabled (8bf8ce8) Remove unused method (fc7aa80) Add num_rules() method to Rules (432d83c) Remove trailing spaces (0ea8204) Issues when number of rules is multiple of 8. (5b7e79f) Fix typos (a4dcbce) Show more explicative error when built-in module is not found (7b5e994) Issue with for .. in .. loop returning undefined value instead of false. (ce6f7e8) Make catch_undef more flexible (889a7cd) Remove trailing space (117cc3c) Fix typos (9cd712d) Issue with constant folding (e158260) Issue introduced in d50dbfd (d5a0924) Remove println statement (7695c7f) Enum types declared in .proto file that doesn't contain the root message (d50dbfd) Fix clippy warning (0a1c40a) Remove the logic that parses signatures from Macho files. (154994e) Improve comment (ae55e89) Fix minor error in feature description (b665654) Remove some debug messages. 
(e1fabdb) Fix typos (40d982f) Add example in yara_x::mods (0dbc8a7) Implement API for setting the output data for a YARA module (b4b54e7) Add derive feature to serde dependency for build script (b02e768) Fix more clippy warnings (414b25c) Fix compilation (d335613) Fix clippy warning (ef05df2) Fix clippy warnings (c3a5e08) Control the name of the feature associated to each YARA module from the protobuf defining the module (a4b12d8) Replace the YRX_MOD_PROTO_DIRS env variable with YRX_PROTOC_CONFIG_FILE (1d75a56) Re-write yara.proto as proto3 (dc98dfb) Upgrade goldenfile and chrono crates (42cca1b) Add support for the YRX_MOD_PROTO_DIRS environment variable (0261a8f) Advance in the implementation of the C and Golang APIs (be21199) Implement ExactSizeIterator trait for Patterns (6cf62f4) Individual descriptions for each crate (d5f84ad) Rename top-level directories. (e7374d6)