C# | Automating simulated clicks with the C# WebBrowser control

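  • What it is for:
  • Writing idle scripts for games and similar tools.
  • Writing crawlers that automatically scrape data from pages.
  • Replacing manual effort on simple, tedious, repetitive work.
  • Well, purely for the sake of the technology.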
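I. Element selector
1. Designing the element selector
WebBrowser has some basic element selection features built in, for example:
  • GetElementById: selects an element by its id
  • GetElementsByTagName: selects elements by tag name
  • GetElementFromPoint: selects the element at a given coordinate
These basic features fall far short of what is needed to locate elements across all kinds of pages. An ideal choice is the jQuery element selector (or the CSS selector), since jQuery's strength lies precisely in its selectors. This article therefore ports jQuery's element selection to C# for driving WebBrowser.
An element selector usually consists of two parts: a base element selector and filters. The selection process can be viewed as:
  • 1. Use the base element selector to collect the matching elements on the current page into an element list
  • 2. Take the next filter; if there is none, stop
  • 3. Filter the element list with it, then go back to step 2
Base element selectors fall into:
  • id selector: selects an element by id and starts with #, for example: #username
  • class selector: selects elements by class and starts with ., for example: .content
  • tag selector: selects elements by tag name and starts with the tag name, for example: input
  • * is a special selector that selects every element; it can be omitted when filters follow
  • a combination of the four selectors above separated by commas, for example: #username,#password
Filters fall into:
  • attribute filter: enclosed in [], for example: [text='OK']
  • colon filter: starts with :, for example: :contains('Log in')
Colon filter is an umbrella term covering basic filters, content filters, visibility filters, child filters, form filters, and so on.
Note: this article implements most of jQuery's selector features; it is not a complete port.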
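The select-then-filter process described above can be sketched on plain data. This is a minimal sketch rather than the article's implementation: Node and PipelineSketch are hypothetical stand-ins for HtmlElement and the selector class, and the filters are passed in as delegates instead of being parsed from a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a page element; the article works on HtmlElement.
public sealed class Node
{
    public string Tag = "";
    public string Id = "";
}

public static class PipelineSketch
{
    // Step 1: the base selector builds the initial element list.
    // Steps 2 and 3: each filter narrows that list in turn until none are left.
    public static List<Node> Select(
        IEnumerable<Node> page,
        Func<Node, bool> baseSelector,
        params Func<Node, int, bool>[] filters)
    {
        var elements = page.Where(baseSelector).ToList();
        foreach (var filter in filters)
        {
            // The index passed to a filter is the position in the current list.
            elements = elements.Where(filter).ToList();
        }
        return elements;
    }
}

public static class PipelineDemo
{
    public static void Main()
    {
        var page = new List<Node>
        {
            new Node { Tag = "input", Id = "username" },
            new Node { Tag = "div", Id = "content" },
            new Node { Tag = "input", Id = "password" },
        };
        // Roughly what "input:eq(1)" means: all inputs, then the one at index 1.
        var result = PipelineSketch.Select(page, n => n.Tag == "input", (n, i) => i == 1);
        Console.WriteLine(result[0].Id); // password
    }
}
```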
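The element selector designed here must satisfy the following:
  • 1. It selects one or more elements
  • 2. A selector object is bound to a particular WebBrowser object
  • 3. The whole selection process operates on a single element list
The basic structure of the element selector is as follows: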

    public class Selector : SelectorLexer
    {
        #region Data properties

        /// <summary>
        /// The browser control
        /// </summary>
        public WebBrowser WebBrowser { get; set; }

        /// <summary>
        /// The document object
        /// </summary>
        public HtmlDocument Document { get; set; }

        /// <summary>
        /// The element list
        /// </summary>
        protected List<HtmlElement> Elements { get; set; }

        #endregion

        #region Constructor

        public Selector(WebBrowser webBrowser)
        {
            this.WebBrowser = webBrowser;
            this.Document = webBrowser.Document;
            this.Elements = new List<HtmlElement>();
        }

        #endregion

        #region Public methods

        /// <summary>
        /// Selects multiple elements
        /// </summary>
        public List<HtmlElement> SelectElements(string selector);   // outline only

        /// <summary>
        /// Selects a single element
        /// </summary>
        public HtmlElement SelectElement(string selector);          // outline only

        #endregion
    }


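The sections below walk through how the element selector works in detail.
2. Reading the core code
2.1 Selector entry point
The following code is the entry point for element selection: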

    /// <summary>
    /// Selects multiple elements
    /// </summary>
    public List<HtmlElement> SelectElements(string selector)
    {
        // A selector that starts with a filter implies a leading "*"
        if (selector.StartsWith("[") || selector.StartsWith(":"))
            selector = "*" + selector;
        this.Elements = new List<HtmlElement>();
        this.src = selector;
        this.position = -1;
        this.GetChar();
        ParseSelector();
        return Elements;
    }


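This code runs through the following steps:
  • 1. Check whether the selector omits the default * and, if so, prepend it
  • 2. Create a fresh element list to start a new run
  • 3. Reset the pointer position
  • 4. Parse the root node of the selector (this starts the parse)
  • 5. Return the element list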
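The entry point relies on members inherited from SelectorLexer (src, position, currentChar, GetChar, Expect, ExpectThenDrop, NeedThenDrop, Identifier), which the article does not show. The sketch below is one guess at how such helpers could behave. The class name MiniLexer, the Rest property, and every method body are assumptions; only the member names come from the article.

```csharp
using System;

// Assumed behavior only: the real SelectorLexer is not shown in the article.
public class MiniLexer
{
    protected string src;
    protected int position = -1;
    protected char currentChar = '\0';

    public MiniLexer(string source)
    {
        src = source;
        GetChar(); // moves from -1 to the first character
    }

    // Unread input, exposed for demonstration (hypothetical helper).
    public string Rest { get { return src.Substring(position); } }

    // Advance one character; '\0' marks the end of the input.
    protected void GetChar()
    {
        if (position < src.Length) position++;
        currentChar = position < src.Length ? src[position] : '\0';
    }

    // Look ahead without consuming anything.
    public bool Expect(string text)
    {
        if (position + text.Length > src.Length) return false;
        for (var i = 0; i < text.Length; i++)
        {
            if (src[position + i] != text[i]) return false;
        }
        return true;
    }

    // Consume the text if it is next in the input.
    public bool ExpectThenDrop(string text)
    {
        if (!Expect(text)) return false;
        for (var i = 0; i < text.Length; i++) GetChar();
        return true;
    }

    // Consume a required character or fail.
    public void NeedThenDrop(char c)
    {
        if (currentChar != c)
            throw new FormatException("Expected '" + c + "' at position " + position + ".");
        GetChar();
    }

    // Read a run of letters, digits, '_' and '-'.
    public string Identifier()
    {
        var start = position;
        while (char.IsLetterOrDigit(currentChar) || currentChar == '_' || currentChar == '-')
            GetChar();
        return src.Substring(start, position - start);
    }
}

public static class MiniLexerDemo
{
    public static void Main()
    {
        var lexer = new MiniLexer("#username,#password");
        lexer.NeedThenDrop('#');
        Console.WriteLine(lexer.Identifier());        // username
        Console.WriteLine(lexer.ExpectThenDrop(",")); // True
        Console.WriteLine(lexer.Rest);                // #password
    }
}
```

If ExpectThenDrop is a plain prefix match as assumed here, ExpectThenDrop("first") also succeeds on the input first-child, so the order in which keywords are tested matters.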
2.2 Root node of the syntax tree
An element selector consists of one base selector followed by zero or more base filters. The code is:

    /// <summary>
    /// Selector := BaseSelector { BaseFilter }
    /// </summary>
    protected void ParseSelector()
    {
        BaseSelector();
        SkipWhiteSpace();
        while (currentChar != '\0')
        {
            BaseFilter();
        }
    }

2.3 Base selector
A base selector is either the * selector or a combination of one or more of the following:
  • Id selector
  • Class selector
  • Tag selector

    /// <summary>
    /// BaseSelector := (IdSelector | ClassSelector | TagSelector | StarSelector) (',' BaseSelector)?
    /// </summary>
    public void BaseSelector()
    {
        if (Expect("#"))
        {
            IdSelector();
        }
        else if (Expect("."))
        {
            ClassSelector();
        }
        else if (char.IsLetter(currentChar))
        {
            TagSelector();
        }
        else if (Expect("*"))
        {
            StarSelector();
        }
        else
            throw new Exception("Invalid syntax.");
        // A comma introduces another base selector whose matches are added to the list
        if (ExpectThenDrop(","))
            BaseSelector();
    }

The id selector calls the built-in GetElementById function directly:

    /// <summary>
    /// IdSelector := '#' id
    /// </summary>
    public void IdSelector()
    {
        NeedThenDrop('#');
        var id = Identifier();
        if (string.IsNullOrEmpty(id)) return;
        var element = Document.GetElementById(id);
        if (element != null && !Elements.Contains(element))
            Elements.Add(element);
    }

The class selector works by checking each element's class attribute:

    /// <summary>
    /// ClassSelector := '.' class
    /// </summary>
    public void ClassSelector()
    {
        NeedThenDrop('.');
        var css = Identifier();
        if (string.IsNullOrEmpty(css)) return;
        // Scan the whole document rather than Elements: the list may still be
        // empty here, and it cannot be modified while it is being enumerated.
        foreach (HtmlElement element in Document.All)
        {
            var cssList = element.GetAttribute("className").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var s in cssList)
            {
                if (s == css && !Elements.Contains(element))
                {
                    Elements.Add(element);
                    break;
                }
            }
        }
    }


Note: to read the value of the class attribute, use className.
The tag selector calls the built-in GetElementsByTagName function directly:

    /// <summary>
    /// TagSelector := TagName
    /// </summary>
    public void TagSelector()
    {
        var tagName = TagName();
        foreach (HtmlElement element in Document.GetElementsByTagName(tagName))
        {
            if (!Elements.Contains(element))
                Elements.Add(element);
        }
    }


The * selector selects every element on the page:

    /// <summary>
    /// StarSelector := '*'
    /// </summary>
    public void StarSelector()
    {
        NeedThenDrop('*');
        foreach (HtmlElement element in Document.All)
        {
            Elements.Add(element);
        }
    }


2.4 Base filter
A base filter is either an attribute filter or a colon filter:

    /// <summary>
    /// BaseFilter := { AttributeFilter | ColonFilter }
    /// </summary>
    public void BaseFilter()
    {
        if (Expect("["))
            AttributeFilter();
        else if (Expect(":"))   // peek only: ColonFilter consumes the ':' itself
            ColonFilter();
        else
            throw new Exception("Invalid syntax.");
    }

The attribute filter is implemented as follows:

    /// <summary>
    /// AttributeFilter := '[' attr ']'
    ///                  | '[' attr '=' val ']'
    ///                  | '[' attr '!=' val ']'
    ///                  | '[' attr '^=' val ']'
    ///                  | '[' attr '$=' val ']'
    ///                  | '[' attr '*=' val ']'
    ///                  | '[' attr '~=' val ']'
    ///                  | '[' attr '|=' val ']'
    /// </summary>
    public void AttributeFilter()
    {
        NeedThenDrop('[');
        /* 1. Read the attribute name */
        var attribute = AttributeName();
        /* 2. Read the operator */
        SelectedByAttribute handler = null;
        var hasVal = true;
        if (ExpectThenDrop("]"))
        {
            handler = (element, attr, val) => !string.IsNullOrEmpty(GetAttribute(element, attr));
            hasVal = false;
        }
        else if (ExpectThenDrop("="))
        {
            handler = (element, attr, val) => GetAttribute(element, attr) == val;
        }
        else if (ExpectThenDrop("!="))
        {
            handler = (element, attr, val) => GetAttribute(element, attr) != val;
        }
        else if (ExpectThenDrop("^="))
        {
            handler = (element, attr, val) => (GetAttribute(element, attr) ?? "").StartsWith(val);
        }
        else if (ExpectThenDrop("$="))
        {
            handler = (element, attr, val) => (GetAttribute(element, attr) ?? "").EndsWith(val);
        }
        else if (ExpectThenDrop("*="))
        {
            handler = (element, attr, val) => (GetAttribute(element, attr) ?? "").Contains(val);
        }
        else if (ExpectThenDrop("~="))
        {
            // Matches when val is one of the space-separated words in the attribute
            handler = (element, attr, val) =>
            {
                var list = (GetAttribute(element, attr) ?? "").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                foreach (var s in list)
                {
                    if (s == val) return true;
                }
                return false;
            };
        }
        else if (ExpectThenDrop("|="))
        {
            // Matches the exact value, or the value followed by a hyphen
            handler = (element, attr, val) =>
            {
                var s = GetAttribute(element, attr) ?? "";
                return s == val || s.StartsWith(val + "-");
            };
        }
        else
            throw new Exception("Syntax error.");
        /* 3. Read the value */
        var value = string.Empty;
        if (hasVal)
        {
            value = ReadString();
            NeedThenDrop(']');
        }
        /* 4. Run the match */
        var ls = new List<HtmlElement>();
        foreach (var element in Elements)
        {
            if (handler(element, attribute, value)) ls.Add(element);
        }
        Elements = ls;
    }
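The meaning of the eight operators is easier to check on plain strings. The following is a small sketch along the lines of jQuery's documented attribute selectors; AttributeMatch.Matches is a hypothetical helper and not part of the article's code, and an empty operator string stands for the bare [attr] form.

```csharp
using System;
using System.Linq;

public static class AttributeMatch
{
    // actual:   the attribute's value on the element, or null when it is absent
    // op:       the operator between the attribute name and the value ("" for [attr])
    // expected: the value written in the selector
    public static bool Matches(string actual, string op, string expected)
    {
        // An absent attribute only satisfies "!=".
        if (actual == null) return op == "!=";
        switch (op)
        {
            case "":   return actual.Length > 0;
            case "=":  return actual == expected;
            case "!=": return actual != expected;
            case "^=": return actual.StartsWith(expected, StringComparison.Ordinal);
            case "$=": return actual.EndsWith(expected, StringComparison.Ordinal);
            case "*=": return actual.Contains(expected);
            case "~=": // one of the space-separated words
                return actual.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Contains(expected);
            case "|=": // the exact value, or the value followed by a hyphen
                return actual == expected || actual.StartsWith(expected + "-", StringComparison.Ordinal);
            default:
                throw new ArgumentException("Unknown operator: " + op);
        }
    }
}

public static class AttributeMatchDemo
{
    public static void Main()
    {
        Console.WriteLine(AttributeMatch.Matches("en-US", "|=", "en"));            // True
        Console.WriteLine(AttributeMatch.Matches("english", "|=", "en"));          // False
        Console.WriteLine(AttributeMatch.Matches("btn primary", "~=", "primary")); // True
        Console.WriteLine(AttributeMatch.Matches(null, "!=", "x"));                // True
    }
}
```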


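Note: some attributes cannot be read through the built-in GetAttribute function, so the code above uses a custom GetAttribute written with a regular expression.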
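The article does not show that custom GetAttribute. One way such a helper might look, assuming it works on the element's OuterHtml, is sketched below. AttributeReader and its pattern are illustrative assumptions rather than the author's code, and the sketch deliberately ignores edge cases such as a '>' inside an attribute value or valueless attributes like a bare checked.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeReader
{
    // Reads name="value", name='value' or name=value from the opening tag.
    // Returns null when the attribute is not found.
    public static string GetAttribute(string outerHtml, string name)
    {
        if (string.IsNullOrEmpty(outerHtml)) return null;

        // Look only at the opening tag so the element's text cannot match.
        var tagEnd = outerHtml.IndexOf('>');
        var openingTag = tagEnd >= 0 ? outerHtml.Substring(0, tagEnd + 1) : outerHtml;

        // Whitespace, the attribute name, '=', then a double-quoted,
        // single-quoted or unquoted value.
        var pattern = @"\s" + Regex.Escape(name) + @"\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))";
        var match = Regex.Match(openingTag, pattern, RegexOptions.IgnoreCase);
        if (!match.Success) return null;

        for (var i = 1; i <= 3; i++)
        {
            if (match.Groups[i].Success) return match.Groups[i].Value;
        }
        return null;
    }
}

public static class AttributeReaderDemo
{
    public static void Main()
    {
        var html = "<input type=\"text\" id='kw' class=s_ipt>";
        Console.WriteLine(AttributeReader.GetAttribute(html, "type"));  // text
        Console.WriteLine(AttributeReader.GetAttribute(html, "id"));    // kw
        Console.WriteLine(AttributeReader.GetAttribute(html, "class")); // s_ipt
    }
}
```

With an HtmlElement at hand, the call would presumably be AttributeReader.GetAttribute(element.OuterHtml, "type").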
The colon filter is implemented as follows:

    /// <summary>
    /// ColonFilter := ':' 'animated'
    ///              | ':' 'button'
    ///              | ':' 'checkbox'
    ///              | ':' 'checked'
    ///              | ':' 'contains' '(' text ')'
    ///              | ':' 'disabled'
    ///              | ':' 'empty'
    ///              | ':' 'enabled'
    ///              | ':' 'eq' '(' n ')'
    ///              | ':' 'even'
    ///              | ':' 'file'
    ///              | ':' 'first'
    ///              | ':' 'first-child'
    ///              | ':' 'gt' '(' n ')'
    ///              | ':' 'has' '(' sel ')'
    ///              | ':' 'header'
    ///              | ':' 'hidden'
    ///              | ':' 'image'
    ///              | ':' 'input'
    ///              | ':' 'last'
    ///              | ':' 'last-child'
    ///              | ':' 'lt' '(' n ')'
    ///              | ':' 'not' '(' sel ')'
    ///              | ':' 'nth' '(' n ')'
    ///              | ':' 'nth-child' '(' n ')'
    ///              | ':' 'odd'
    ///              | ':' 'only-child'
    ///              | ':' 'parent'
    ///              | ':' 'password'
    ///              | ':' 'radio'
    ///              | ':' 'reset'
    ///              | ':' 'selected'
    ///              | ':' 'submit'
    ///              | ':' 'text'
    ///              | ':' 'visible'
    /// </summary>
    public void ColonFilter()
    {
        NeedThenDrop(':');
        var children = new List<HtmlElement>();
        /* 1. Prepare the filter condition */
        // Longer names are tested before their prefixes (first-child before first,
        // last-child before last, nth-child before nth) in case ExpectThenDrop
        // does a plain prefix match.
        SelectedHandle handle = null;
        if (ExpectThenDrop("animated"))
        {
            throw new Exception("Unsupported element selector: animated");
        }
        else if (ExpectThenDrop("button"))
        {
            handle = (element, index) => element.TagName.ToLower() == "button" || (element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "button");
        }
        else if (ExpectThenDrop("checkbox"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "checkbox";
        }
        else if (ExpectThenDrop("checked"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "checkbox" && GetAttribute(element, "checked") == "checked";
        }
        else if (ExpectThenDrop("contains"))
        {
            NeedThenDrop('(');
            var text = ReadUntil(')');
            NeedThenDrop(')');
            handle = (element, index) => (element.InnerText ?? "").Contains(text);
        }
        else if (ExpectThenDrop("disabled"))
        {
            // Compare, do not assign: "element.Enabled = false" would disable the element
            handle = (element, index) => !element.Enabled;
        }
        else if (ExpectThenDrop("empty"))
        {
            handle = (element, index) => string.IsNullOrEmpty(element.InnerText) && string.IsNullOrEmpty(element.InnerHtml);
        }
        else if (ExpectThenDrop("enabled"))
        {
            handle = (element, index) => element.Enabled;
        }
        else if (ExpectThenDrop("nth-child"))
        {
            NeedThenDrop('(');
            var text = ReadUntil(')');
            NeedThenDrop(')');
            var n = int.Parse(text);
            handle = (element, index) =>
            {
                if (element.Children.Count > n)
                {
                    children.Add(element.Children[n]);
                }
                return false;
            };
        }
        else if (ExpectThenDrop("eq") || ExpectThenDrop("nth"))
        {
            NeedThenDrop('(');
            var text = ReadUntil(')');
            NeedThenDrop(')');
            var n = int.Parse(text);
            handle = (element, index) => index == n;
        }
        else if (ExpectThenDrop("even"))
        {
            handle = (element, index) => index % 2 == 0;
        }
        else if (ExpectThenDrop("file"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "file";
        }
        else if (ExpectThenDrop("first-child"))
        {
            handle = (element, index) =>
            {
                if (element.Children.Count > 0)
                    children.Add(element.Children[0]);
                return false;
            };
        }
        else if (ExpectThenDrop("first"))
        {
            handle = (element, index) => index == 0;
        }
        else if (ExpectThenDrop("gt"))
        {
            NeedThenDrop('(');
            var text = ReadUntil(')');
            NeedThenDrop(')');
            var n = int.Parse(text);
            handle = (element, index) => index > n;
        }
        else if (ExpectThenDrop("has"))
        {
            NeedThenDrop('(');
            var sel = ReadInnerText('(', ')');
            NeedThenDrop(')');
            var innerSelector = new Selector(WebBrowser);
            var innerElements = innerSelector.SelectElements(sel);
            // Note: this keeps elements that themselves match sel;
            // jQuery's :has tests an element's descendants instead.
            handle = (element, index) => innerElements.Contains(element);
        }
        else if (ExpectThenDrop("header"))
        {
            var headers = new List<string> { "h1", "h2", "h3", "h4", "h5", "h6" };
            handle = (element, index) => headers.Contains(element.TagName.ToLower());
        }
        else if (ExpectThenDrop("hidden"))
        {
            throw new Exception("This feature is not supported yet.");
        }
        else if (ExpectThenDrop("image"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "image";
        }
        else if (ExpectThenDrop("input"))
        {
            var inputs = new List<string> { "input", "textarea", "select", "button" };
            handle = (element, index) => inputs.Contains(element.TagName.ToLower());
        }
        else if (ExpectThenDrop("last-child"))
        {
            handle = (element, index) =>
            {
                if (element.Children.Count > 0)
                    children.Add(element.Children[element.Children.Count - 1]);
                return false;
            };
        }
        else if (ExpectThenDrop("last"))
        {
            var last = Elements.Count - 1;
            handle = (element, index) => index == last;
        }
        else if (ExpectThenDrop("lt"))
        {
            NeedThenDrop('(');
            var text = ReadUntil(')');
            NeedThenDrop(')');
            var n = int.Parse(text);
            handle = (element, index) => index < n;
        }
        else if (ExpectThenDrop("not"))
        {
            NeedThenDrop('(');
            var sel = ReadInnerText('(', ')');
            NeedThenDrop(')');
            var innerSelector = new Selector(WebBrowser);
            var innerElements = innerSelector.SelectElements(sel);
            handle = (element, index) => !innerElements.Contains(element);
        }
        else if (ExpectThenDrop("odd"))
        {
            handle = (element, index) => index % 2 == 1;
        }
        else if (ExpectThenDrop("only-child"))
        {
            handle = (element, index) =>
            {
                if (element.Children.Count == 1)
                    children.Add(element.Children[0]);
                return false;
            };
        }
        else if (ExpectThenDrop("parent"))
        {
            handle = (element, index) => element.Children.Count > 0;
        }
        else if (ExpectThenDrop("password"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "password";
        }
        else if (ExpectThenDrop("radio"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "radio";
        }
        else if (ExpectThenDrop("reset"))
        {
            handle = (element, index) => (element.TagName.ToLower() == "input" || element.TagName.ToLower() == "button") && GetAttribute(element, "type") == "reset";
        }
        else if (ExpectThenDrop("selected"))
        {
            handle = (element, index) => element.TagName.ToLower() == "option" && GetAttribute(element, "selected") == "selected";
        }
        else if (ExpectThenDrop("submit"))
        {
            handle = (element, index) => (element.TagName.ToLower() == "input" || element.TagName.ToLower() == "button") && GetAttribute(element, "type") == "submit";
        }
        else if (ExpectThenDrop("text"))
        {
            handle = (element, index) => element.TagName.ToLower() == "input" && GetAttribute(element, "type") == "text";
        }
        else if (ExpectThenDrop("visible"))
        {
            throw new Exception("This feature is not supported yet.");
        }
        else
            throw new Exception("The filter must not be empty.");
        /* 2. Run the filter */
        var ls = new List<HtmlElement>();
        for (var i = 0; i < Elements.Count; i++)
        {
            var element = Elements[i];
            if (handle(element, i)) ls.Add(element);
        }
        // The *-child filters collect children instead of keeping the element itself
        ls.AddRange(children);
        Elements = ls;
    }
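The index-based filters (:first, :last, :eq, :gt, :lt, :even, :odd) all test an element's zero-based position in the current list, so chaining them re-indexes after each step. The sketch below shows that behavior on a list of strings; IndexFilters is a hypothetical helper written for illustration, not part of the article's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexFilters
{
    // Keeps the items whose zero-based position satisfies the named filter.
    // n is only used by "eq", "gt" and "lt".
    public static List<T> Apply<T>(List<T> items, string filter, int n = 0)
    {
        Func<int, bool> keep;
        switch (filter)
        {
            case "first": keep = i => i == 0; break;
            case "last":  keep = i => i == items.Count - 1; break;
            case "eq":    keep = i => i == n; break;
            case "gt":    keep = i => i > n; break;
            case "lt":    keep = i => i < n; break;
            case "even":  keep = i => i % 2 == 0; break;
            case "odd":   keep = i => i % 2 == 1; break;
            default: throw new ArgumentException("Unsupported filter: " + filter);
        }
        return items.Where((item, index) => keep(index)).ToList();
    }
}

public static class IndexFiltersDemo
{
    public static void Main()
    {
        var items = new List<string> { "a", "b", "c", "d", "e" };
        var even = IndexFilters.Apply(items, "even");
        var last = IndexFilters.Apply(even, "last"); // like ":even:last"
        Console.WriteLine(string.Join(",", even));                               // a,c,e
        Console.WriteLine(string.Join(",", last));                               // e
        Console.WriteLine(string.Join(",", IndexFilters.Apply(items, "gt", 2))); // d,e
    }
}
```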


3. Using the element selector
The following code shows how to use the element selector:

    var selector = new Selector(webBrowser);
    selector.SelectElement("#kw").InnerText = "Baidu";
    selector.SelectElement(".s_btn").InvokeMember("click");

II. Automated simulated clicks
1. Interacting with elements
1.1 Setting an element's value
To set the value of an input element, set its value attribute:

    /// <summary>
    /// Sets an element's value
    /// </summary>
    public void SetValue(string selector, object value)
    {
        var ele = Selector.SelectElement(selector);
        if (ele != null)
        {
            if (ele.TagName.ToLower() == "input")
            {
                ele.SetAttribute("value", value?.ToString());
                return;
            }
            ele.InnerText = value?.ToString();
        }
        Debug.WriteLineIf(ele == null, $"Element not found: {selector}");
    }


1.2 Getting an element's value
To get the value of an input element, read it from its value attribute:

    /// <summary>
    /// Gets an element's value
    /// </summary>
    public string GetValue(string selector)
    {
        var ele = Selector.SelectElement(selector);
        if (ele != null)
        {
            if (ele.TagName.ToLower() == "input")
                return ele.GetAttribute("value");
            return ele.InnerText;
        }
        Debug.WriteLine($"Element not found: {selector}");
        return null;
    }


1.3 Clicking the specified element
The InvokeMember function simulates the click:

    /// <summary>
    /// Clicks the specified element
    /// </summary>
    public void Click(string selector)
    {
        var ele = Selector.SelectElement(selector);
        if (ele != null)
        {
            ele.InvokeMember("click");
        }
        Debug.WriteLineIf(ele == null, $"Element not found: {selector}");
    }


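1.4 Removing the specified elements

    /// <summary>
    /// Removes the specified elements
    /// </summary>
    public void Remove(string selector)
    {
        var list = Selector.SelectElements(selector);
        foreach (var element in list)
        {
            Debug.WriteLine("Remove:" + element.TagName);
            // Replacing the outer HTML with nothing removes the element from the page
            element.OuterHtml = string.Empty;
        }
    }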

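1.5 Simulating keyboard input
SetValue forcibly sets an element's value. Sometimes the focus and blur events also need to fire, and read-only elements should be left unchanged; that is what simulating a user's keyboard input means here. The code is:

    /// <summary>
    /// Simulates typing a line of text into the specified input control
    /// </summary>
    public void SendKeys(string selector, string value, bool readOnly = true)
    {
        var ele = Selector.SelectElement(selector);
        if (ele != null)
        {
            // When readOnly is set, leave read-only elements untouched
            if (readOnly && ele.GetAttribute("readonly") == "readonly") return;
            ele.InvokeMember("focus");
            ele.InnerText = value;
            ele.InvokeMember("blur");
        }
        Debug.WriteLineIf(ele == null, $"Element not found: {selector}");
    }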

1.6 Giving an element focus
Sometimes a specific element needs to receive focus:

    /// <summary>
    /// Gives the specified element focus
    /// </summary>
    public void Focus(string selector)
    {
        var ele = Selector.SelectElement(selector);
        if (ele != null) ele.Focus();
        Debug.WriteLineIf(ele == null, $"Element not found: {selector}");
    }


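2. Wait operations
2.1 Waiting for a browser state
Wait until the browser reaches the specified state:

    /// <summary>
    /// Waits until the browser reaches the specified state
    /// </summary>
    public void Wait(WebBrowserReadyState state = WebBrowserReadyState.Complete)
    {
        while (WebBrowser.ReadyState != state)
        {
            // Keep processing window messages so the browser can make progress
            Application.DoEvents();
        }
    }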


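2.2 Waiting for a specified URL
Wait until the specified URL has finished loading; this is usually called right after Navigate. The code is:

    /// <summary>
    /// Waits until the specified URL has finished loading. Only one URL can be awaited at a time.
    /// </summary>
    public void WaitUrl(string url)
    {
        // waitUrl is a field; the loop ends once it is reset to null elsewhere
        // (the article does not show where).
        waitUrl = url;
        while (waitUrl != null)
        {
            Application.DoEvents();
        }
    }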


2.3 Waiting for an element to appear
Wait until the specified element appears:

    /// <summary>
    /// Waits until the specified element appears
    /// </summary>
    public void Wait(string selector)
    {
        while (Selector.SelectElement(selector) == null)
        {
            Application.DoEvents();
        }
    }

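2.4 Waiting for an element to disappear
Wait until the specified element disappears:

    /// <summary>
    /// Waits until the specified element disappears
    /// </summary>
    public void WaitClose(string selector)
    {
        while (Selector.SelectElement(selector) != null)
        {
            Application.DoEvents();
        }
    }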
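The wait loops above spin until their condition holds, so a page that never reaches the expected state keeps them running forever. A timeout can be added with one shared helper. The sketch below is an assumption-laden illustration, not the article's code: WaitSketch.WaitUntil is a hypothetical name, and the message pump is passed in as a delegate so that the example runs without WinForms.

```csharp
using System;
using System.Diagnostics;

public static class WaitSketch
{
    // Spins until condition() returns true or the timeout elapses.
    // pump runs on every iteration; in the article's setting it would be
    // Application.DoEvents. Returns false when the wait timed out.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, Action pump)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout) return false;
            pump();
        }
        return true;
    }
}

public static class WaitSketchDemo
{
    public static void Main()
    {
        // The "pump" here just counts; the condition becomes true after 3 calls.
        var ticks = 0;
        var ok = WaitSketch.WaitUntil(() => ticks >= 3, TimeSpan.FromSeconds(1), () => ticks++);
        Console.WriteLine(ok); // True
    }
}
```

Inside the helper class from this article, waiting for an element could then presumably be written as WaitUntil(() => Selector.SelectElement("#kw") != null, TimeSpan.FromSeconds(10), Application.DoEvents).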


2.5 Waiting for a period of time
Wait for a period of time, similar to Thread.Sleep(), while still processing window messages:

    /// <summary>
    /// Waits for a period of time
    /// </summary>
    public void Sleep(int milliseconds)
    {
        var start = DateTime.Now;
        var stop = start.AddMilliseconds(milliseconds);
        while (DateTime.Now < stop)
        {
            Application.DoEvents();
        }
    }


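2.6 Running after a delay
Run an action after a delay, without blocking the browser thread:

    /// <summary>
    /// Runs an action after a delay
    /// </summary>
    // Note: the type argument of Action was lost in the source text. Because the
    // callback is invoked as action(this), it is presumably Action<T> with T being
    // the enclosing class, whose name the article does not show.
    public void Delay(int milliseconds, Action action)
    {
        var timer = new Timer();
        timer.Interval = milliseconds;
        timer.Tick += (sender, e) =>
        {
            // Fire once: stop and release the timer before running the action
            timer.Stop();
            timer.Dispose();
            action(this);
        };
        timer.Start();
    }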


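2.7 Running once a condition is met
Run an action once a condition is met, without blocking the browser thread:

    /// <summary>
    /// Runs an action once a condition is met
    /// </summary>
    // Note: as in 2.6, the type argument of Action was lost in the source text.
    public void Delay(string selector, Action action, bool isDisplay = true)
    {
        var timer = new Timer();
        timer.Interval = 20;
        timer.Tick += (sender, e) =>
        {
            var element = Selector.SelectElement(selector);
            // isDisplay = true waits for the element to appear,
            // isDisplay = false waits for it to disappear.
            if (isDisplay == (element == null)) return;
            timer.Stop();
            timer.Dispose();
            action(this);
        };
        timer.Start();
    }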
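The same poll-until-true idea can also be written with async/await. The following sketch swaps the article's Timer for Task.Delay; DelaySketch.RunWhen is a hypothetical name, and the condition is passed as a delegate so the example has no WinForms dependency. When awaited from a WinForms UI thread, the continuation normally resumes on that thread; in the console demo below it runs on a thread-pool thread.

```csharp
using System;
using System.Threading.Tasks;

public static class DelaySketch
{
    // Polls condition() every pollMilliseconds and runs action once it is true.
    // The calling thread is not blocked while waiting.
    public static async Task RunWhen(Func<bool> condition, Action action, int pollMilliseconds = 20)
    {
        while (!condition())
        {
            await Task.Delay(pollMilliseconds);
        }
        action();
    }
}

public static class DelaySketchDemo
{
    public static void Main()
    {
        // The condition becomes true on the third poll.
        var polls = 0;
        DelaySketch.RunWhen(() => ++polls >= 3, () => Console.WriteLine("ready"))
            .GetAwaiter().GetResult(); // prints "ready"
    }
}
```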


3. Script injection
3.1 Running a named script
Run a script function by name through the built-in InvokeScript method:

    /// <summary>
    /// Runs the named script. The script must already be loaded.
    /// </summary>
    public object Invoke(string scriptName, params object[] args)
    {
        var document = WebBrowser.Document;
        if (document == null) return null;
        var result = document.InvokeScript(scriptName, args);
        return result;
    }

3.2 Running arbitrary script with eval
The code is:

    /// <summary>
    /// Runs arbitrary script with eval. Note: eval runs in local scope, while execScript runs in global scope.
    /// </summary>
    public object Eval(string script)
    {
        var document = WebBrowser.Document;
        if (document == null) return null;
        var result = document.InvokeScript("eval", new object[] { script });
        return result;
    }


3.3 Running arbitrary script with execScript
Run arbitrary script with execScript. Note: eval runs in local scope, while execScript runs in global scope. The code is:

    /// <summary>
    /// Runs arbitrary script with execScript. Note: eval runs in local scope, while execScript runs in global scope.
    /// </summary>
    public void Exec(string script)
    {
        var document = WebBrowser.Document;
        if (document == null) return;
        document.InvokeScript("execScript", new object[] { script });
    }
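Because Eval and Exec take the script as one string, any C# value spliced into that string has to be escaped first, or a quote or line break in the value will break the script. The sketch below shows one way to turn a C# string into a single-quoted JavaScript string literal. JsLiteral.Quote is a hypothetical helper added for illustration, not part of the article's code.

```csharp
using System;
using System.Text;

public static class JsLiteral
{
    // Wraps value in single quotes and escapes the characters that would
    // otherwise end or corrupt a JavaScript string literal.
    public static string Quote(string value)
    {
        var sb = new StringBuilder("'");
        foreach (var c in value)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '\'': sb.Append(@"\'"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                default:
                    // Other control characters and the two Unicode line
                    // separators are written as \uXXXX escapes.
                    if (c < ' ' || c == '\u2028' || c == '\u2029')
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.Append('\'').ToString();
    }
}

public static class JsLiteralDemo
{
    public static void Main()
    {
        Console.WriteLine(JsLiteral.Quote("it's")); // 'it\'s'
        // With the Eval method above, a call might then look like:
        // Eval("document.title = " + JsLiteral.Quote(title));
    }
}
```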
