MBot Posted September 9, 2024 at 01:48 PM

- revert rowan to version 0.15.15
  Version 0.15.16 requires rustc 1.77 and our MSRV is still 1.76.
- upgrade dependencies
- re-organize imports
- hide the into_cst function from the documentation.
  This API is still unstable and should not be used by third-party code.
- hide the CST from the documentation
  The CST is still immature and subject to change. It should not be used by third-party code.
- inherit repository field from workspace in parser/Cargo.toml
- rename yara-x-parser-ng to yara-x-parser
- remove unused error.
- assign code "E001" to syntax errors
- fix typos
- fix minor issues in comments
- individual descriptions for each crate
- rename top-level directories.
  In the future we want to add a go directory that contains the Go bindings for YARA-X. Having a yara-x-go directory is not ideal, because the import path for that Go module would be github.com/VirusTotal/yara-x/yara-x-go. Instead we want the more natural github.com/VirusTotal/yara-x/go. Adding a go directory would make the current naming incoherent, but with this change everything fits well again.
- remove dependency on the num crate
- fix typos
- cover all cases in ErrorInfo::printable_string explicitly
- small change in wording ("postfix" to "suffix")
- remove the new_string_expr macro
- remove unused arguments for macros
- remove unreachable code.
- fix typos
- improve error message for non-global rules that depend on global ones
  Also do some refactoring that removes the rule_ident_spans vector.
- some preliminary preparation for implementing regexp parsing
- remove unused test cases
- remove the implementation of Hash trait for patterns in AST.
  The idea behind implementing this trait was detecting duplicate patterns across different rules and reusing them. But this will be done at the IR level, not at the AST level.
- improve some comments
- upgrade ariadne crate to version 0.2.0
- remove unused dependencies

Documentation

- shorten parser description, as it is rendered badly in docs.rs
  The current description is too long to be properly rendered in docs.rs.
- minor documentation improvements

New Features

- distinguish between mutable and immutable nodes in the CST
- implement the CST public API
- improve error messages for unclosed comments, literal strings and regexps
- add error/warning codes and allow disabling warnings by code.
  This is the detailed list of changes:
  - Assign a code to each error and warning
  - Add a Compiler::switch_warning API that allows enabling/disabling specific warnings.
  - Add a Compiler::switch_all_warnings API that allows enabling/disabling all warnings.
  - Add a command-line option --disable-warning for disabling specific warnings, or all warnings.
  - Move warnings from parser to compiler.
- Implement multiline string literals in metadata.
  This commit implements multiline string literals in the metadata section.

      rule a {
        meta:
          a = """
      I'm a multiline string literal!
      Hooray!
      \"test\"
      I also handle escapes, \x41\x42\x43!
      ... and emojis 🤖!
      """
        condition:
          true
      }

  While I'm here, also quote the metadata string values when printing the AST.
- expose rule metadata in the Rust API
- add warnings() method to Compiler.
  This new method exposes the warnings issued by the compiler.
- implement math.to_number
  Required some changes in the grammar, which wasn't accepting boolean expressions as function arguments.
- implement an option for allowing invalid escape sequences in regular expressions
  Historically, YARA has accepted any character that is preceded by a backslash in a regular expression, even if the sequence is not a valid one. For instance, \n, \t and \w are valid escape sequences in a regexp, but \N, \T and \j are not. However, YARA accepts all of these sequences.
  The valid escape sequences are interpreted according to their special meaning (\n is a new-line, \w is a word character, etc.), while invalid escape sequences are interpreted simply as the character that appears after the backslash. So, \N becomes N, and \j becomes j.
  This change introduces the Compiler::relaxed_regexp_escape_sequences API, which turns on an option that makes YARA-X behave in the same way as YARA with respect to invalid escape sequences in regular expressions. This option is turned off by default. Also, the option --relaxed-escape-sequences is added to the CLI.
- issue warning when a rule is ignored because it indirectly depends on some unsupported module
- implement the Compiler::add_unsupported_module API.
  This API allows telling the compiler that some YARA module should be ignored. Any import statement for an unsupported module is ignored without errors (only a warning is issued) and any rule that uses the module is also ignored, while keeping the rules that don't depend on it.
- advance in the implementation of the C and Golang APIs
- allow pattern identifiers starting with underscore to remain unused.
  For example, if a pattern is named $_foo it doesn't need to be used in the rule's condition.
- produce warning for potentially slow patterns
  Any pattern producing an atom of length < 2 will raise this warning.
- handle error when regexp is too large
  Very large regexps were causing a panic instead of being handled as an error.
- raise warning when a regexp uses both /i and nocase
- draft the implementation of the regexp compiler.
- draft the implementation of the regexp compiler.
- implement the matches operator
- implement the conversion of hex pattern AST into HIR.
- add IR for patterns.
- provide a mechanism for accessing the protobuf message produced by a module.
  The protobuf message returned by the module's main function is now accessible from other functions exported by the module.
  This is useful for implementing functions that rely on information that was already extracted from the scanned data and stored in that protobuf message.

Bug Fixes

- wrong function name.
  Add more test cases for CST.
- tokenizer not recognizing regexps that consist of a single character
- issue while computing the span of unexpected tokens.
- some issues with error reports
- some minor fixes in error reports
- issue while parsing some multi-line comments
- issue with bitwise not (~) operator.
  The operator was being treated as a valid infix operator.
- bug while parsing rule.
- issue with error spans that don't end at a UTF-8 char boundary.
- issue with string literal that ends in \"
- broken test cases
- syntax error when quantifier in for loop is an arithmetic expression
  A condition like for 1+1 i in (0..10) : ( i <= 1 ) was failing with a syntax error because 1+1 was not accepted as a quantifier.
- better handling of tab characters in error messages
  Tabs in source code were breaking the layout of error messages.
- issue while rendering error message that contains snippets from multiple sources
- don't use usize::next_multiple_of, use the num crate instead.
- issue in error reports when the source file contains UTF-8 characters longer than a byte.
  The root cause of this issue is that the ariadne crate works with character-wise spans, while our AST works with byte-wise spans. Our AST gets the spans from pest, so switching to character-wise spans is not an option. Also, I think that byte-wise spans are easier to work with because they can be used directly with &str types containing the source code. In the future we may consider adapting ariadne to our needs by making it work with byte-wise spans, but after some examination of the source code it doesn't look like a trivial task.
- cover more grammar rules in ErrorInfo::printable_string
- issue when literal string contains a backslash followed by a unicode character larger than 1 byte.
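
The mismatch between byte-wise and character-wise spans mentioned in the entries above can be illustrated with a small sketch. It is not taken from the YARA-X source: the function name byte_span_to_char_span is hypothetical, and it only shows one plausible way to translate a byte span into a character span using the Rust standard library.

```rust
/// Converts a byte-wise span into a character-wise span over `src`.
/// Hypothetical helper for illustration only, not part of YARA-X.
/// Returns `None` if the span is out of range or if either end does
/// not fall on a UTF-8 character boundary.
fn byte_span_to_char_span(src: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end || end > src.len() {
        return None;
    }
    if !src.is_char_boundary(start) || !src.is_char_boundary(end) {
        return None;
    }
    // Count the characters before the span and inside the span.
    let char_start = src[..start].chars().count();
    let char_len = src[start..end].chars().count();
    Some((char_start, char_start + char_len))
}

fn main() {
    let src = "rule é { condition: true }";
    // "é" occupies two bytes (5..7) but is a single character.
    println!("{:?}", byte_span_to_char_span(src, 5, 7));
    // A span ending in the middle of "é" is rejected.
    println!("{:?}", byte_span_to_char_span(src, 5, 6));
}
```

A byte span can be used directly to slice a &str, which is the convenience the entry above refers to; the conversion is only needed when a reporting library expects character offsets.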
- raise compiler error when some pattern set is empty
- prevent the parser from taking too long while parsing pathological cases
- issue with identifiers that start with some keywords
  Identifiers like trueFoo were parsed as true followed by Foo.
- produce error when a pattern identifier is used without being declared
- don't complain about unused patterns when them is used in for .. of loop.
  The unused patterns set was not being emptied when them was used in this kind of loop, causing an error.
- don't panic when a function that expects arguments is called without them.
- remove duplicated entry in invalid_combinations

Other

- minor fixes in comments
- re-enable test cases
- fix test cases
- fix test case broken by previous commit
- run tests with nightly and MSRV 1.66.1

Performance

- small performance improvement
- improve the grammar by avoiding unnecessary backtracking
- minor optimization in string_lit_from_cst

Refactor

- large refactoring of how compilation errors are handled
  The changes in this PR include:
  - Redesign how the compilation errors are exposed in the Rust API. With this change users of the API have access to more details about the error, including the structure of error reports (report title, individual labels with their corresponding text and code spans, etc). These changes are backward-incompatible, though.
  - The CLI now shows multiple compilation errors at a time, instead of showing only the first error found.
  - All the error details exposed in the Rust API are also exposed in the C API and the Golang API.
- redesign the error recovery logic.
- change assert to debug_assert
- check if source file is valid UTF-8 before trying to parse it
- check for valid base64 alphabet in the AST->IR phase
- check that $ is used inside a for .. of loop in the AST->IR phase.
- check for unknown patterns in the AST->IR phase.
- check for short patterns in the AST->IR phase.
- check for wrong xor range in the AST->IR phase.
- check for duplicate patterns in the AST->IR phase.
- check for unused patterns in the AST->IR phase.
- check for duplicate tags in the AST->IR phase.
- check pattern modifiers in the AST->IR phase
  Instead of checking for duplicate pattern modifiers and invalid modifier combinations in the CST->AST phase, we now do it in the AST->IR phase. The idea is that the CST->AST phase should perform as few checks as possible. Most semantic validations will be done in the AST->IR phase.
- small changes that make the UTF-8 validation easier to understand
- limit the number of warnings
  Also, move some of the warnings from the parser to the compiler.
- replace the ariadne crate with annotate-snippets-rs for error reporting.
  The annotate-snippets-rs crate handles very long strings in the source code better, and seems to be better maintained.
- use golden files for testing compiling errors and warnings.
- rename add_unsupported_module to ignore_module.
- big refactor that addresses multiple issues
  This refactor started as an attempt to fix a bug in the logic for structure field lookups, but fixing this issue required more fundamental changes. Some of the changes are:
  - TypeValue::String now has a reference-counted string
  - WasmArg has been refactored for making Rust-to-WASM type conversions simpler
  - ScanContext now contains a map of used objects (structs, arrays, maps or strings) for tracking values that are passed from Rust to WASM and back to Rust.
  - The field lookup mechanism for finding fields in a structure has been refactored.
  - ast::FieldAccess is now an n-ary expression.
- make the fields of Span private.
- finish the introduction of n-ary expressions in AST
- make some changes in the compiler API
  Methods add_source, define_global and new_namespace now receive a mutable reference instead of taking ownership of the compiler object. The problem with the previous approach is that add_source is a fallible function. When the function succeeds it returns the compiler back to the caller, and the caller regains ownership, but when it fails the ownership is lost.
  This means that once add_source fails the compiler can't be used anymore: you can't add more sources or get the compiler rules.
- make the AST more flat by allowing multiple operands in some expressions.
  For example, in expression a and b and c, instead of having a tree with two levels and two and operations that are binary expressions, we have a tree with one level where there's a single and operation with three operands. This reduces the amount of stack space used while traversing the AST recursively and improves performance.
- don't put HexJump into a Box
- don't put HexByte into boxes.
- Context::span() now receives a CSTNode
- improve error reporting logic
  Introduce the SourceId field on each Span, which indicates the original source code the span is associated with. With this change we don't need to pass a reference to the original source code to ReportBuilder::build_report, making the code simpler. Also, it allows creating multi-file error reports, showing code snippets from different source files. Fix an issue where duplicate rule declarations were not being detected if the rules were in different source files under the same namespace.
- remove namespaces from the AST.
  For the time being we are not introducing the concept of namespace at the source-code level. Remove namespaces from the AST to reduce the cognitive load while maintaining the code.
- implement intermediate representation (IR)
  This is a large refactor that introduces an intermediate representation (IR) and removes type information from the AST. Type information is moved to the IR, which is built from the AST. The code emitter now uses the IR as input instead of the AST. The IR can be manipulated in the future for applying some optimizations before emitting the final code.

Style

- reformat code
- fix indentation in grammar.pest
- fix some clippy warnings
- fix clippy warnings
  Remove the mut modifier from arguments that are not used as mutable.
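
As an aside on the Refactor entry about flattening the AST: the idea of turning nested binary and operations into a single operation with several operands can be sketched as follows. This is only an illustration under assumed names. The Expr enum, its variants and the flatten function are hypothetical and do not reflect the actual YARA-X AST types.

```rust
// Hypothetical expression type, for illustration only.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    // Binary form: one node per `and` operator.
    BinaryAnd(Box<Expr>, Box<Expr>),
    // N-ary form: a single node holding all the operands.
    NAryAnd(Vec<Expr>),
}

fn ident(name: &str) -> Expr {
    Expr::Ident(name.to_string())
}

fn and(lhs: Expr, rhs: Expr) -> Expr {
    Expr::BinaryAnd(Box::new(lhs), Box::new(rhs))
}

/// Collects the operands of a chain of binary `and` nodes, left to right.
fn collect(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::BinaryAnd(lhs, rhs) => {
            collect(*lhs, out);
            collect(*rhs, out);
        }
        other => out.push(other),
    }
}

/// Rewrites a chain of binary `and` nodes as one n-ary node.
/// Any other expression is returned unchanged.
fn flatten(expr: Expr) -> Expr {
    if matches!(expr, Expr::BinaryAnd(..)) {
        let mut operands = Vec::new();
        collect(expr, &mut operands);
        Expr::NAryAnd(operands)
    } else {
        expr
    }
}

fn main() {
    // `a and b and c` parsed as `(a and b) and c`.
    let nested = and(and(ident("a"), ident("b")), ident("c"));
    println!("{:?}", flatten(nested));
}
```

With the n-ary form, a recursive visitor descends one level for the whole chain instead of one level per operator, which is the stack-space saving the entry describes.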
fix clippy warnings fix clippy warning fix clippy warnings fix warning fix clippy warning fix clippy warning minor stylistic changes Test more test cases for hex patterns Commit Statistics 166 commits contributed to the release. 126 commits were understood as conventional. 12 unique issues were worked on: #110, #113, #121, #13, #140, #162, #167, #18, #180, #25, #30, #57 Commit Details #110 Fix some clippy warnings (b2b8ec3) #113 Fix typos (e587a82) #121 Implement multiline string literals in metadata. (afe7266) #13 Implement Aho-Corasick scanning. (01890bf) #140 Add error/warning codes and allow disabling warnings by code. (f1b91a9) #162 Inherit repository field from workspace in parser/Cargo.toml (14f42ba) #167 Distinguish between mutable and immutable nodes in the CST (cc58ac4) #18 Implement the base64 modifier. (36cb6e4) #180 Large refactoring of how compilation errors are handled (7def597) #25 Implement intermediate representation (IR) (a6d83b3) #30 Improve error reporting logic (d27e9ce) #57 Big refactor that addresses multiple issues (debde08) Uncategorized Revert rowan to version 0.15.15 (309f53d) Upgrade dependencies (ace86e8) Re-organize imports (c5727f7) Shorten parser description, as it is rendered badly in docs.rs (06b28c7) Hide the into_cst function from the documentation. (8ff07f3) Hide the CST from the documentation (7505ef6) Wrong function name. (adb126c) Implement the CST public API (a7c94b6) Improve error messages for unclosed comments, literal strings and regexps (e313325) Tokenizer not recognizing regexps that a single character (801b405) Reformat code (934814e) Issue while computing the span of unexpected tokens. (4d4bcff) Some issues with error reports (1e3ed73) Some minor fixes in error reports (9aced3e) Issue while parsing some multi-line comments (2a3fea5) Small performance improvement (66526f2) Redesign the error recovery logic. (81c4a6d) Issue with bitwise not (~) operator. (ad5759c) Bug while parsing rule. 
(194afb2) Change assert to debug_assert (6db1e35) Minor fixes in comments (f937ec0) Check if source file is valid UTF-8 before trying to parse it (e9814d6) Issue with error spans that don't end at a UTF-8 char boundary. (abdbf6e) Issue with string literal that ends in \" (6492bba) Re-enable test cases (12e0585) Rename yara-x-parser-ng to yara-x-parser (534ffee) Broken test cases (e42e9d8) Check for valid base64 alphabet in the AST->IR phase (a7b881d) Check that $ is used inside a for .. of loop in the AST->IR phase. (ae05f04) Check for unknown patterns in the AST->IR phase. (f054105) Remove unused error. (144dece) Check for short patterns in the AST->IR phase. (44c6dee) Check for wrong xor range in the AST->IR phase. (c9701da) Check for duplicate patterns in the AST->IR phase. (c69763e) Check for unused patterns in the AST->IR phase. (9bdccc9) Fix test cases (d72715c) Check for duplicate tags in the AST->IR phase. (832cd2e) Check pattern modifiers in the AST->IR phase (1f135d1) Assign code "E001" to syntax errors (d561f62) Fix indentation in grammar.pest (c45c8c7) Improve the grammar by avoiding unnecessary backtracking (6c3fce5) Expose rule metadata in the Rust API (1e816a7) Small changes that make the UTF-8 validation easier to understand (482156c) Add warnings() method to Compiler. (d1fb7d6) Fix test case broken by previous commit (395da6b) Implement math.to_number (3d1b9ba) Implement an option for allowing invalid escape sequences in regular expressions (9aa4477) Limit the number of warnings (eb69534) Fix minor issues in comments (768d816) Syntax when quantifier in for loop is an arithmetic expression (6233989) Better handling of tab characters in error messages (2bda579) Issue while rendering error message that contains snippets from multiple sources (7d07bf8) Replace theariadne crate with annotate-snippets-rs for error reporting. (9fc17d1) Use golden files for testing compiling errors and warnings. (05ee5f4) Rename add_unsupported_module to ignore_module. 
(e344cd5) Issue warning when a rule is ignored because it indirectly depends on some unsupported module (74ea70c) Implement the Compiler::add_unsupported_module API. (7fcdf63) Advance in the implementation of the C and Golang APIs (be21199) Individual descriptions for each crate (d5f84ad) Rename top-level directories. (e7374d6) Remove dependency from num crate (752d74b) Fix typos (8a245f2) Don't use usize::next_multiple_of, use the num crate instead. (22a2831) Issue in error reports when the source file contains UTF-8 characters longer than a byte. (67b9c0b) Cover all cases in ErrorInfo::printable_string explictily (7de5124) Cover more grammar rules in ErrorInfo::printable_string (803b321) Minor optimization in string_lit_from_cst (67ef2a9) Issue when literal string contains a backslash followed by a unicode character larger than 1 byte. (bf8e673) Fix clippy warnings (c5a8c0f) Small change in wording ("postfix" to "suffix") (89e785b) Make the fields of Span private. (5eb7ede) Merge branch 'main' into fast_regexp (ab2ebfe) Raise compiler error when some pattern set is empty (47c230f) Merge branch 'main' into fast_regexp (912e250) Fix clippy warnings (aa2dc88) Fix clippy warning (089da64) Allow pattern identifiers starting with underscore to remain unused. (4e12bfe) Finish the introduction of n-ary expressions in AST (3cc89ab) Produce warning for potentially slow patterns (bfbe243) Fix clippy warnings (47dbac6) Make some changes in the compiler API (dbb2406) Prevent the parser from taking too long while parsing pathological cases (c0d07ca) Make the AST more flat by allowing multiple operands in some expressions. 
(e69afd5) Remove the new_string_expr macro (24a0262) Remove unused arguments for macros (dfcdbd0) Handle error when regexp is too large (f55f9b0) Issue with identifiers that start with some keywords (16e15df) Fix warning (f37d049) Raise warning when a regexp uses both /i and nocase (3b9f86c) More test cases for hex patterns (fd45fba) Draft the implementation of the regexp compiler. (d80a50e) Draft the implementation of the regexp compiler. (9c4c70b) Implement the matches operator (f11b259) Fix clippy warning (a6fdf51) Don't put HexJump into a Box (16a99d1) Don't put HexByte into boxes. (b044754) Implement the conversion of hex pattern AST into HIR. (32ce07c) Remove unreachable code. (cf1f88a) Fix clippy warning (fa1cd6c) Minor stylistic changes (9a2324a) Minor documentation improvements (67bcdf6) Context::span() now receives a CSTNode (9965445) Fix typos (46bdb25) Improve error message for non-global rules that depend on global ones (63d4458) Some preliminary preparation for implementing regexp parsing (7a2a6ff) Remove unused test cases (4ce15ca) Remove the implementation of Hash trait for patterns in AST. (6a93408) Improve some comments (5a9e441) Remove namespaces from the AST. (56cba67) Run tests with nightly and MSRV 1.66.1 (afc2bdb) Add IR for patterns. (ebb1605) Produce error when a pattern identifier is used without being declared (923f844) Don't complain about unused patterns when them is used in for .. of loop. (11668de) Upgrade ariadne create to version 0.2.0 (4c916f4) Provide a mechanism for accessing the protobuf message produced by a module. (b90d1cc) Don't panic when a function that expects arguments is called without them. (f5da442) Remove unused dependencies (c31c3ff) Remove duplicated entry in invalid_combinations (ebb8d92) Implement base64wide modifier. (254220d) Implement Hash trait for ast::Pattern. (8262ec9) Remove the as_str method from Ident. (671025d) Start implementing logic for extracting atoms from text patterns. 
(8301665) Derive Debug trait for Struct. (01c5404) Handle the DOT_DOT token in ErrorInfo::printable_string. (ddd6217) Minor improvements in documentation. (a20eaec) Handle undefined fields in proto3 correctly. (2254fb2) Better error message when using a non-supported type as a map key. (7e36785) More code simplification. (6232bb5) More code simplification. (558f516) Draft implementation of "for" loops with maps. (7f44b3e) Allow returning a tuple from #[wasm_export] functions. (637ba8b) Fix all issues found by clippy. (2c2e8fa) Remove MaybeUndef<T> and use Option<T> instead. (fd4b990) Implement defined operator. (cdc7946) Stop using fast-line-col feature from pest crate. (2a43296) Fix wrong merge. (ff32319) Multiple changes in preparation for implementing built-in functions (9ffe658) Prepare for implementing built-in functions. (6dfd8cf) Start implementing built-in functions uintXX. (b58a5cc) Allow associating module's functions with nested structures. (e257ca2) Allow returning strings and undefined values from #[wasm_export] functions. (7b1e62c) Fix another issue related to unstable order of signatures. (6ebf69e) Fix issue with random order of items in error message. (802602b) Implement support for overloaded functions. (7d51ce7) Implement module function calls. (7169173) Fix issue with source files that use CRLF newlines. (ed9bc8d) Remove redundant code. (7468597) Polish minor details. (0313edf) Improve documentation. (525a30f) Make cast_to_bool a function instead of a macro. (f2e3bc1) Improve documentation. (6a63a95) Add a "ascii-tree" feature. (84324bc) Add missing features and fix build issues. (83d3e3b) Split the project into multiple crates. (8098de4)