Extracting text from HTML with a C# regular expression

2024-11-02 02:27:56

Recommended answers (5)

Answer 1:

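using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Sample HTML; the markup around these text fragments is missing from the post.
        string s = @"

Text A to extract Text B to extract Link text is not extracted Text C to extract

";
        // Group 1: tag name (with a leading "/" for a closing tag).
        // Group 2: the text between that tag and the next "<".
        Regex regex = new Regex("(/?\\w+)[^>]*>([^<]*)<", RegexOptions.IgnoreCase);

        MatchCollection ms = regex.Matches(s);

        foreach (Match m in ms)
        {
            string tagName = m.Groups[1].Value.ToLower();
            string text = m.Groups[2].Value.Trim();
            // Skip text that directly follows an opening <a> tag (link text).
            if (tagName != "a" && text.Length > 0)
                Console.WriteLine(text);
        }
    }
}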
Result:
Text A to extract
Text B to extract
Text C to extract
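
The sample string in the answer above no longer shows its tags, so here is a minimal, self-contained sketch of the same approach. The markup in `html` (`<div>`, `<p>`, `<a href="#">`) and the `ExtractText` helper are assumptions made for illustration, not the original poster's input. Note that the pattern only captures text that is followed by another `<`, so trailing text after the last tag would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagTextExtractor
{
    // Same pattern as the answer: group 1 is the tag name (with a leading
    // "/" for closing tags), group 2 is the text between that tag and the next "<".
    private static readonly Regex TagThenText =
        new Regex(@"(/?\w+)[^>]*>([^<]*)<", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the non-empty text that follows every tag
    // except an opening <a>, so link text is left out.
    public static List<string> ExtractText(string html)
    {
        var result = new List<string>();
        foreach (Match m in TagThenText.Matches(html))
        {
            string tagName = m.Groups[1].Value.ToLowerInvariant();
            string text = m.Groups[2].Value.Trim();
            if (tagName != "a" && text.Length > 0)
                result.Add(text);
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        // Assumed sample markup; the tags in the original post are unknown.
        string html = "<div><p>Text A to extract</p> <p>Text B to extract</p> "
                    + "<a href=\"#\">Link text is not extracted</a>Text C to extract</div>";

        foreach (string text in TagTextExtractor.ExtractText(html))
            Console.WriteLine(text);
    }
}
```

This is fine for small, well-formed fragments. Comments, `<script>` blocks, or a `>` inside an attribute value will confuse the pattern.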

Answer 2:

Get a component called HtmlAgilityPack and use XPath to locate the nodes. It is much easier than using a regex.
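
A rough sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced. The XPath expression, the sample markup, and the `ExtractText` helper are illustrative choices of mine rather than anything given in the answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

static class HapTextExtractor
{
    // Hypothetical helper: collects non-empty text nodes that are not inside an <a> element.
    public static List<string> ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
        if (nodes == null)
            return result;

        foreach (HtmlNode node in nodes)
        {
            // DeEntitize turns entities such as &amp; back into plain characters.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
                result.Add(text);
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        // Assumed sample markup.
        string html = "<div><span>Text A to extract</span> <span>Text B to extract</span> "
                    + "<a href=\"#\">Link text is not extracted</a>Text C to extract</div>";

        foreach (string text in HapTextExtractor.ExtractText(html))
            Console.WriteLine(text);
    }
}
```

Unlike the regex, this also picks up text that has no tag after it. It would also return the contents of `<script>` or `<style>` elements unless the XPath excludes them.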

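Answer 3:

Question 1: (?<=…).*?(?=…)

Question 2: [^^]*?
Question 3: (?<=…).*?(?=…)
You can work out the rest by analogy. The first answer already covers most of it; just swap in a different regex.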

Answer 4:

string s1 = new Regex("(?<=…).*?(?=…)").Match("…value…").Value;
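
The tags inside the lookbehind and lookahead are not recoverable from the post, so the following is only a sketch of the technique with a caller-supplied tag name. The `Between` helper and the `<title>` example are assumptions of mine.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Hypothetical helper: returns the text between <tag> and </tag>,
    // or an empty string when there is no match.
    public static string Between(string html, string tag)
    {
        string open = Regex.Escape("<" + tag + ">");
        string close = Regex.Escape("</" + tag + ">");

        // The lookbehind and lookahead assert that the tags are present
        // without including them in the match. Singleline lets "." span
        // line breaks; ".*?" stops at the first closing tag.
        var regex = new Regex("(?<=" + open + ").*?(?=" + close + ")",
                              RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return regex.Match(html).Value;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(LookaroundDemo.Between("<title>value</title>", "title"));
    }
}
```

This only matches an opening tag written without attributes. An empty element and a missing element both yield an empty string, so check `Match.Success` if that difference matters.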

Answer 5: