Rust macro_rules Study Notes

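The previous article covered the basics of macro_rules. There are many finer details beyond that, and not knowing them leads to all kinds of errors when writing macros. This article collects those details in one place.
Some of the content is translated directly from The Little Book of Rust Macros; the rest is the author's own summary.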

Reference: The Rust Reference

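Details about metavariables

Matching order
Once the parser has started matching a metavariable of some rule against the input, it does not stop or backtrack. Even if the input as a whole turns out not to fit that rule, macro_rules will not go on to try the following rules; it reports an error instead. For example: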

    macro_rules! some_rules {
        ($e:expr) => { $e };
        ($a:ident++) => { { $a = $a+1; $a } };
    }

    fn main() {
        let mut a = 0;
        println!("{}", some_rules!(a++)); // compile error
    }

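In this example, the first two tokens, a+, can begin a valid expression, so the input is captured by $e:expr in the first rule (with no backtracking, that is, with no attempt to match the second rule). The full input a++ is not a valid expression, however, so the compiler reports an error.
For this reason, rules in a macro_rules definition should be ordered from most specific to most general.
The example above can be fixed like this: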

    macro_rules! some_rules {
        // Put the more specific rule first
        ($a:ident++) => { { $a = $a+1; $a } };
        ($e:expr) => { $e };
    }

    fn main() {
        let mut a = 0;
        println!("{}", some_rules!(a++));
    }
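The effect of rule order can also be checked with a small sketch. The macro name `classify` and the strings it returns are hypothetical, chosen only for illustration; the point is that the specific `$a:ident ++` rule is listed before the catch-all `$e:expr` rule.

```rust
// Hypothetical macro: reports which rule matched the input.
macro_rules! classify {
    // Specific rule first: an identifier followed by `++`.
    ($a:ident ++) => { "increment" };
    // General rule last: any expression.
    ($e:expr) => { "expression" };
}

fn main() {
    assert_eq!(classify!(a++), "increment");
    assert_eq!(classify!(1 + 2), "expression");
    // A lone identifier fails the first rule without a parse error,
    // so the matcher moves on to the second rule.
    assert_eq!(classify!(x), "expression");
}
```

Swapping the two rules would make `classify!(a++)` fail to compile, for the same reason as the first version of `some_rules`.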

Never Look Ahead
How can we get the last element of a repetition? Would the following macro work?

    macro_rules! get_last {
        ($($i:ident),* , $j:ident) => { };
    }

    fn main() {
        get_last!(a,b,c,d);
    }

Compiling this example produces an error:

    error: local ambiguity when calling macro `get_last`: multiple parsing options: built-in NTs ident ('j') or ident ('i').
     --> src/lib.rs:6:17
      |
    6 |     get_last!(a,b,c,d);
      |                 ^

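The reason is that the macro parser performs no lookahead: it will not first locate $j and then check whether a run of $i precedes it.
The Rust Reference explains it as follows: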

When matching, no lookahead is performed; if the compiler cannot unambiguously determine how to parse the macro invocation one token at a time, then it is an error.

The following example shows one solution:

    macro_rules! get_last {
        ($($i:ident),*) => { get_last!(@internal $($i),*) };
        (@internal $i0:ident) => { // note: this rule must come before the next one
            $i0
        };
        (@internal $i0:ident, $($i:ident),*) => { get_last!(@internal $($i),*) };
    }

    fn main() {
        let d = 1;
        println!("{}", get_last!(a,b,c,d));
    }

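Opacity
The expansion of a macro may invoke other macro_rules macros, but note that most metavariables become "opaque" to other macros once they are substituted. That is, when a metavariable is passed as input to a second macro, the second macro sees only an opaque syntax tree and cannot inspect its concrete contents.
This opacity restriction applies to every kind of metavariable except ident, tt, and lifetime.
Here is the simplest possible example: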

    macro_rules! foo {
        ($e:expr) => { bar!($e); }
        // ERROR:           ^^ no rules expected this token in macro call
    }

    macro_rules! bar {
        (3) => {}
    }

    foo!(3);

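In this example, the second macro bar sees the $e of the first macro foo only as a syntax tree of type expr. bar cannot know what the actual tokens are, so compilation fails. (bar knows only that the argument it received is an expr, which might be an ident or some other expression and is not necessarily 3.)
The following example, however, does compile: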

    macro_rules! foo {
        ($e:expr) => { bar!($e); }
    }

    macro_rules! bar {
        ($l:tt) => {}
    }

    foo!(3 + 4);

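This works because a captured expr is always forwarded as a single token tree.
In short, a macro can process an already-captured metavariable if and only if one of the following holds:

The macro's parameter is $t:tt (or $($t:tt)?, and so on), because a captured metavariable is always forwarded as a single token tree.
The type of the metavariable is compatible with the parameter type that the macro expects.
- A type is always compatible with itself.
- Other cases: for example, path is accepted by ty and pat (a path can serve as a type or as a pattern); block is accepted by expr; item is accepted by stmt; and so on.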

Because of this property, macros cannot be nested as freely as functions. Most needs for nested macros can nevertheless be met with the Token Tree Munching (TT munching) technique.
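The difference between a transparent tt and an opaque expr can be observed directly. The sketch below uses three hypothetical macros (`is_three`, `forward_tt`, `forward_expr`) that are not part of the original notes; `is_three` has a fallback tt rule so that both cases compile and the result can be compared.

```rust
// Hypothetical helper: tells whether its input is the literal token `3`.
macro_rules! is_three {
    (3) => { true };
    ($other:tt) => { false };
}

// Forwards the input as a `tt`, which stays transparent.
macro_rules! forward_tt {
    ($t:tt) => { is_three!($t) };
}

// Forwards the input as an `expr`, which becomes opaque.
macro_rules! forward_expr {
    ($e:expr) => { is_three!($e) };
}

fn main() {
    // `is_three` still sees the literal token `3`.
    assert!(forward_tt!(3));
    // `is_three` only sees an opaque expr, which matches the `tt` fallback.
    assert!(!forward_expr!(3));
}
```

This is why macros that need to inspect their input further usually capture it as tt and defer parsing to the innermost macro.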
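Suffixes
Rust is a fast-evolving language. To keep future syntax changes from altering how existing macros are parsed, macro_rules restricts which tokens may follow a metavariable. (For convenience, the token that immediately follows a metavariable is called its "suffix" here.)
According to the Rust Reference, the complete list is as follows (Rust 1.58):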

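expr and stmt may be followed only by one of: =>, , or ;
pat_param may be followed only by one of: =>, ,, =, |, if or in. pat follows the same list, except that since the 2021 edition it may no longer be followed by |.
path and ty may be followed only by one of:
- =>, ,, =, |, ;, :, >, >>, [, {, as or where
- a block metavariable
vis may be followed only by one of:
- ,
- any identifier other than priv
- any token that can begin a type
- an ident, ty or path metavariable
All other metavariables have no restriction on their suffix.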

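The restriction also applies to repetitions. If a metavariable (or any repetition) can repeat more than once, its separator (if there is one) must be a valid suffix for that metavariable; if a metavariable can appear zero or one times, the token that follows it must obey the same rules. For example: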

    macro_rules! some_rules {
        ( $($e:expr),* ) => {};
        ( $($e:expr);* ) => {};
        // ( $($e:expr):* ) => {}; // error
        ( $($idt:ident):* ) => {}; // ok
    }
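As a runnable illustration of the separator rule, the sketch below defines two hypothetical macros, `sum_exprs` and `ident_names` (both names are assumptions made for this example). The first uses `;` as a separator after expr, which is on the allowed list; the second uses `:` after ident, which has no restriction.

```rust
// `;` is a legal suffix for `expr`, so it may serve as the separator.
macro_rules! sum_exprs {
    ( $($e:expr);* ) => { 0 $(+ $e)* };
}

// `ident` has no suffix restriction, so `:` is accepted as a separator.
macro_rules! ident_names {
    ( $($idt:ident):* ) => { vec![$(stringify!($idt)),*] };
}

fn main() {
    // 0 + 1 + (2 * 3) + 4
    assert_eq!(sum_exprs!(1; 2 * 3; 4), 11);
    assert_eq!(ident_names!(a:b:c), vec!["a", "b", "c"]);
}
```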

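Scope and exporting

Scope
A macro_rules macro is in scope from the point of its definition to the end of the mod that defines it, including any child mods declared after the definition. For example: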

    foo!{} // undefined
    macro_rules! foo { () => {} }
    foo!{} // defined

    mod some {
        foo!{} // defined
        macro_rules! bar { () => {} }
    }

    bar!{} // undefined
    some::bar!{} // Error
    //    ^^^ could not find `bar` in `some`

A macro_rules definition cannot carry a visibility qualifier such as pub.
By default, a macro_rules macro cannot be reached through a path either.

Applying the #[macro_use] attribute to a mod extends the scope of all macros defined in that mod to its parent mod:

    mod some {
        #[macro_use]
        mod inner {
            macro_rules! bar { () => {} }
        }
        bar!{} // defined
    }
    bar!{} // undefined

Alternatively, the inner attribute #![macro_use] can be placed at the top of the mod itself:

    mod some {
        #![macro_use]
        #[macro_use]
        mod inner {
            macro_rules! bar { () => {} }
        }
        macro_rules! foo { () => {} }
        bar!{} // defined
        foo!{} // defined
    }
    bar!{} // defined
    foo!{} // defined
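The snippets above are annotated fragments rather than complete programs. A minimal compilable sketch of the same idea follows; the module `helpers`, the macro `double`, and the function `four` are hypothetical names introduced only for this example.

```rust
mod helpers {
    // Without #[macro_use], `double!` would be visible only inside `inner`.
    #[macro_use]
    mod inner {
        macro_rules! double {
            ($e:expr) => { $e * 2 };
        }
    }

    // `double!` is in scope here because of #[macro_use] on `inner`.
    pub fn four() -> i32 {
        double!(2)
    }
}

fn main() {
    // `double!` itself is not in scope here, since `helpers` carries no
    // #[macro_use]; only the ordinary function is reachable.
    assert_eq!(helpers::four(), 4);
}
```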

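Exporting
You might guess that placing #![macro_use] at the root of a crate would export that crate's macros. In practice this does not work.
To export a macro_rules macro, apply the #[macro_export] attribute to it. A few points are worth noting: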

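#[macro_export] is an attribute of the macro, not of a mod (unlike #[macro_use]);
A macro exported with this attribute lives in the crate's root module, regardless of where it is actually defined;
The attribute exports the macro but does not change its scope inside the defining crate.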

For example:

    /**
     * crate foo
     */
    mod some {
        mod inner {
            #[macro_export]
            macro_rules! call { () => {} }
        }
        call!(); // ERROR! undefined macro `call`
        crate::call!(); // OK
    }
    call!(); // OK
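The same behaviour can be reproduced in a single compilable file. The sketch below assumes it is compiled as the crate root (for example as `main.rs`); the macro `answer` and the function `via_path` are hypothetical names used only for illustration.

```rust
mod some {
    pub mod inner {
        // Exported macros are placed at the crate root, not at `some::inner`.
        #[macro_export]
        macro_rules! answer {
            () => { 42 };
        }
    }

    pub fn via_path() -> i32 {
        // The macro's scope by name ended with `inner`, so it is reached
        // here through the crate-root path instead.
        crate::answer!()
    }
}

fn main() {
    assert_eq!(some::via_path(), 42);
    // At the crate root the exported macro resolves by its bare name.
    assert_eq!(answer!(), 42);
}
```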

Importing
To import the macro above into another crate, bar, first add the dependency to Cargo.toml. Then:

    /**
     * crate bar
     */
    use foo::call;
    // use foo::inner::call;
    // Error:  ^^^^^^^^^^^ no `call` in `inner`
    call!(); // defined

With Edition 2015, one extra line is required:

    /**
     * crate bar
     */
    extern crate foo; // bring in the external crate
    use foo::call;
    call!(); // defined

Or:

    /**
     * crate bar
     */
    // #[macro_use]     // import all macro_rules macros
    #[macro_use(call)]  // import only `call`
    extern crate foo;   // bring in the external crate
    call!(); // defined

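The special metavariable $crate
When exporting a macro_rules macro, keep in mind that the items used in its expansion (variables, types, traits, and so on) are not necessarily defined in the crate that imports it.
The special metavariable $crate addresses this. It refers to the crate in which the macro is defined, as in $crate::Type or $crate::Trait.
For example: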

    pub mod inner {
        #[macro_export]
        macro_rules! call_foo {
            () => { $crate::inner::foo() };
        }

        pub fn foo() {}
    }

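Although $crate lets the expansion refer to items of the crate that defines the macro, it does not change the visibility of those items. In other words, an item referenced through $crate must still be visible at the place where the macro is invoked; it simply does not need to be imported separately. In the example below, invoking call_foo from another crate causes an error, because crate::foo is not visible from other crates.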

    #[macro_export]
    macro_rules! call_foo {
        () => { $crate::foo() };
    }

    fn foo() {}
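A cross-crate setup cannot be shown in one file, but the name resolution that $crate provides can be sketched within a single crate. The example assumes it is compiled as the crate root; the module `elsewhere`, the function `run`, and the string return value of `foo` are assumptions added for illustration.

```rust
pub mod inner {
    #[macro_export]
    macro_rules! call_foo {
        // `$crate` expands to a path to the defining crate, so the call
        // resolves no matter what the caller has imported.
        () => { $crate::inner::foo() };
    }

    // Must be `pub` (inside a `pub mod`) to be usable from other crates.
    pub fn foo() -> &'static str {
        "foo"
    }
}

mod elsewhere {
    // No `use crate::inner::foo;` is needed here.
    pub fn run() -> &'static str {
        crate::call_foo!()
    }
}

fn main() {
    assert_eq!(elsewhere::run(), "foo");
    assert_eq!(call_foo!(), "foo");
}
```

If `foo` were private, the calls inside this crate would still compile when `inner` is a direct child of the crate root, but a call from another crate would be rejected, as described above.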

Other details

Variable pollution
What happens if a macro expansion binds a new variable? Consider the following example:

    macro_rules! with_a {
        ($e:expr) => { { let a = 10; $e } }
    }

    fn main() {
        dbg!(with_a!(a * 2));
    }

Compiling this program produces an error:

    dbg!(with_a!(a * 2));
    // ERROR:    ^ not found in this scope

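Every identifier used in a macro carries an invisible "syntax context". Two identifiers are considered the same only if both their textual names and their syntax contexts are equal.
In the example above, the a in a * 2 and the a in let a = 10; inside the expansion have different syntax contexts, so they are not treated as the same variable.
To give both occurrences of a the same syntax context, the macro can be changed like this: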

    macro_rules! with_a {
        ($a:ident; $e:expr) => { { let $a = 10; $e } }
    }

    fn main() {
        dbg!(with_a!(a; a * 2));
    }

In other words, macro_rules does not pollute the caller's local variables; it is said to be "hygienic" (hygiene).
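Hygiene also protects the caller in the opposite direction: a local variable inside the expansion does not shadow a caller variable of the same name. The sketch below uses a hypothetical macro `times_ten` whose internal variable `factor` deliberately collides with a variable at the call site.

```rust
// Hypothetical macro with an internal local variable named `factor`.
macro_rules! times_ten {
    ($e:expr) => { { let factor = 10; $e * factor } };
}

fn main() {
    let factor = 3;
    // `$e` refers to the caller's `factor` (3), not the macro's (10),
    // so the result is 3 * 10.
    assert_eq!(times_ten!(factor), 30);
}
```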
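Debugging

trace_macros
Calling trace_macros!(true) makes the compiler print every macro_rules invocation as it is expanded; calling trace_macros!(false) turns this off again. (A nightly toolchain is required.)
For example: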

    #![feature(trace_macros)]

    macro_rules! each_tt {
        () => {};
        ($_tt:tt $($rest:tt)*) => { each_tt!($($rest)*); };
    }

    each_tt!(foo bar baz quux);
    trace_macros!(true);
    each_tt!(spim wak plee whum);
    trace_macros!(false);
    each_tt!(trom qlip winp xod);

Compiling this program outputs:

    note: trace_macro
      --> src/lib.rs:10:1
       |
    10 | each_tt!(spim wak plee whum);
       | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
       = note: expanding `each_tt! { spim wak plee whum }`
       = note: to `each_tt! (wak plee whum) ; `
       = note: expanding `each_tt! { wak plee whum }`
       = note: to `each_tt! (plee whum) ; `
       = note: expanding `each_tt! { plee whum }`
       = note: to `each_tt! (whum) ; `
       = note: expanding `each_tt! { whum }`
       = note: to `each_tt! () ; `
       = note: expanding `each_tt! {}`
       = note: to ``

This output is very helpful when debugging recursive macros.

log_syntax
log_syntax is more targeted than trace_macros: it prints whatever tokens are passed to it.
For example:

    #![feature(log_syntax)]

    macro_rules! each_tt {
        () => {};
        ($_tt:tt $($rest:tt)*) => { log_syntax!($_tt); each_tt!($($rest)*); };
    }

    each_tt!(spim wak plee whum);

Compiling it prints: