Releases: zhegexiaohuozi/JsoupXpath
V2.5.3
Optimized the behavior of `following-sibling`, `following`, `preceding-sibling`, and `preceding` to better suit text-extraction scenarios, for example:
```java
// Imports from the surrounding JUnit 4 test class
import java.util.Objects;
import java.util.stream.Collectors;
import org.junit.Assert;
import org.junit.Test;
import org.seimicrawler.xpath.JXDocument;

@Test
public void issue64And65(){
    String content = "<div class='a'>1</div>" +
            "<div>2</div>\n" +
            "<div class='a'>3</div>\n" +
            "<div>4</div>\n" +
            "<div>5</div>11" +
            "<tag>6</tag>12" +
            "<div>7<span>8</span></div>";
    JXDocument j = JXDocument.create(content);
    // Sibling axes: selNOne returns the nearest matching sibling, text nodes included
    Assert.assertEquals("7", j.selNOne("//div[text()='5']/following-sibling::div/text()").asString());
    Assert.assertEquals("6", j.selNOne("//div[text()='5']/following-sibling::tag/text()").asString());
    Assert.assertEquals("11", j.selNOne("//div[text()='5']/following-sibling::text()").asString());
    Assert.assertEquals("12", j.selNOne("//div[text()='7']/preceding-sibling::text()").asString());
    Assert.assertEquals("5", j.selNOne("//div[text()='7']/preceding-sibling::div/text()").asString());
    Assert.assertEquals("6", j.selNOne("//div[text()='7']/preceding-sibling::tag/text()").asString());
    // following / preceding: all matching nodes after / before the context node, descendants included;
    // preceding results come back nearest-first ("2 1", "3 2 1")
    Assert.assertEquals("11 6 12 7 8", j.selN("//div[text()='5']/following::text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
    Assert.assertEquals("6", j.selN("//div[text()='5']/following::tag/text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
    Assert.assertEquals("8", j.selN("//div[text()='5']/following::span/text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
    Assert.assertEquals("5 7", j.selN("//div[text()='4']/following::div/text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
    Assert.assertEquals("2 1", j.selN("//div[text()='3']/preceding::text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
    Assert.assertEquals("3 2 1", j.selN("//div[text()='4']/preceding::text()").stream().map(Objects::toString).collect(Collectors.joining(" ")).trim());
}
```
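For comparison, the standard XPath 1.0 rule behind "nearest match" can be checked with the JDK's built-in `javax.xml.xpath` (this is not JsoupXpath): on a reverse axis such as `preceding-sibling`, a positional predicate like `[1]` counts outward from the context node, so it selects the closest sibling. This is a minimal sketch under stated assumptions: the markup is the test's content wrapped in a hypothetical `<root>` element so that it parses as well-formed XML, and `ReverseAxisDemo` and `select` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReverseAxisDemo {
    // The test's markup, wrapped in a hypothetical <root> so it is well-formed XML.
    public static final String XML = "<root><div class='a'>1</div><div>2</div>\n"
            + "<div class='a'>3</div>\n<div>4</div>\n<div>5</div>11"
            + "<tag>6</tag>12<div>7<span>8</span></div></root>";

    // Hypothetical helper: evaluates an XPath 1.0 expression and returns each node's trimmed text.
    public static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Without a predicate, the axis yields every preceding text sibling:
        // the three "\n" whitespace nodes plus "11" and "12".
        System.out.println(select(XML, "//div[text()='7']/preceding-sibling::text()").size()); // 5
        // On a reverse axis, [1] is the sibling closest to the context node.
        System.out.println(select(XML, "//div[text()='7']/preceding-sibling::text()[1]")); // [12]
        System.out.println(select(XML, "//div[text()='7']/preceding-sibling::text()[2]")); // [11]
        // On a forward axis, [1] is likewise the closest following sibling.
        System.out.println(select(XML, "//div[text()='5']/following-sibling::text()[1]")); // [11]
    }
}
```

In plain XPath the `[1]` predicate is what pins down the nearest node; the JsoupXpath test reaches the same nodes through `selNOne` without an explicit predicate.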
The release also includes a Douban detail-page extraction test:
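```java
// Imports from the surrounding JUnit 4 test class
import org.junit.Assert;
import org.junit.Test;
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;

// createFromResource(...) and logger come from the surrounding test class (not shown)
@Test
public void testDoubanDetailInfoExtra() throws Exception{
    JXDocument doc = createFromResource("d_detail_page.html");
    JXNode score = doc.selNOne("//*[@id=\"interest_sectl\"]/div/div[2]/strong/text()");
    logger.info("{}", score.asString());
    JXNode title = doc.selNOne("//*[@id=\"wrapper\"]/h1/span/text()");
    logger.info("{}", title.asString());
    // '页数' (page count): the value is the bare text node that follows the label <span>
    JXNode pageNum = doc.selNOne("//*[@id=\"info\"]/span[contains(text(),'页数')]/following-sibling::text()");
    logger.info("{}", pageNum.asString());
    Assert.assertEquals("956", pageNum.asString());
    // '定价' (list price), extracted the same way
    JXNode price = doc.selNOne("//*[@id=\"info\"]/span[contains(text(),'定价')]/following-sibling::text()");
    logger.info("{}", price.asString());
    Assert.assertEquals("139.00元", price.asString());
}
```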
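The Douban test relies on a common label/value pattern: find the label `<span>`, then take the text node right after it. Below is a minimal sketch of the same idea using the JDK's `javax.xml.xpath` rather than JsoupXpath. It assumes a simplified, hypothetical `#info` fragment with English labels (the real page's markup is not shown in these notes), and `LabelValueDemo` and `valueAfterLabel` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LabelValueDemo {
    // Hypothetical, simplified stand-in for a detail page's info block.
    public static final String XML = "<div id='info'>"
            + "<span class='pl'>Pages:</span> 956<br/>"
            + "<span class='pl'>Price:</span> 139.00<br/>"
            + "</div>";

    // Returns the trimmed text node right after the label span, or null if the label is absent.
    // The label is spliced into the expression, so it must not contain a single quote.
    public static String valueAfterLabel(String xml, String label) {
        String xpath = "//*[@id='info']/span[contains(text(),'" + label + "')]"
                + "/following-sibling::text()[1]";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node value = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            return value == null ? null : value.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueAfterLabel(XML, "Pages")); // 956
        System.out.println(valueAfterLabel(XML, "Price")); // 139.00
        System.out.println(valueAfterLabel(XML, "ISBN"));  // null
    }
}
```

The explicit `[1]` makes the intent (only the value directly after the label) visible in the expression instead of depending on which node the caller picks from the result.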
v2.5.2
- Optimized `last()`
- Upgraded the jsoup dependency to address potential security risks
v2.5.1
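- Fixed `PrecedingSiblingOneSelector` having no effect. Thanks to @s24963386 for the contribution!
- Fixed #66: the context used when evaluating function-argument expressions was incomplete.
- Optimized the attribute information of `text()` block nodes to better support reverse-order indexing.
- Added a `double/long sum(node-set)` function that sums the numeric node values in the given node set. If the argument contains any non-numeric content, the result is invalid.
- Optimized `num()` results to better match user intuition: integers are returned as integers and floating-point numbers as floating-point numbers, instead of always returning a floating-point number.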
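The typing rules described for `sum()` and `num()` can be sketched in plain Java. This is not JsoupXpath's implementation: it assumes a value counts as numeric only when its whole trimmed text is an integer or decimal literal, and it models an "invalid" sum as an empty `Optional`; `NumericSemanticsSketch`, `parseNumber`, and `sum` are hypothetical names. For comparison, standard XPath 1.0 `sum()` yields NaN when any value fails to convert to a number.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class NumericSemanticsSketch {
    // Assumption: a value is numeric only if its whole trimmed text is an integer or decimal literal.
    private static final Pattern INTEGER = Pattern.compile("[-+]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[-+]?(\\d+\\.\\d*|\\.\\d+)");

    // num()-style typing: integers stay Long, decimals become Double, anything else is empty.
    public static Optional<Number> parseNumber(String text) {
        String t = text.trim();
        try {
            if (INTEGER.matcher(t).matches()) {
                return Optional.<Number>of(Long.valueOf(t));
            }
            if (DECIMAL.matcher(t).matches()) {
                return Optional.<Number>of(Double.valueOf(t));
            }
        } catch (NumberFormatException tooLong) {
            // More digits than a long can hold: treated as non-numeric in this sketch.
        }
        return Optional.empty();
    }

    // sum()-style aggregation: Long when every value is an integer, Double otherwise,
    // and empty (an "invalid" result) as soon as one value is not numeric.
    public static Optional<Number> sum(List<String> values) {
        long longSum = 0;
        double doubleSum = 0;
        boolean allIntegers = true;
        for (String value : values) {
            Optional<Number> parsed = parseNumber(value);
            if (!parsed.isPresent()) {
                return Optional.empty();
            }
            Number n = parsed.get();
            if (n instanceof Long) {
                longSum += n.longValue();
            } else {
                allIntegers = false;
            }
            doubleSum += n.doubleValue();
        }
        return allIntegers ? Optional.<Number>of(longSum) : Optional.<Number>of(doubleSum);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList("1", "2", "3")));   // Optional[6]
        System.out.println(sum(Arrays.asList("1.5", "2")));      // Optional[3.5]
        System.out.println(sum(Arrays.asList("1", "abc")));      // Optional.empty
        System.out.println(parseNumber("42").get().getClass().getSimpleName());   // Long
        System.out.println(parseNumber("3.14").get().getClass().getSimpleName()); // Double
    }
}
```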
2.5.0
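Upgraded several dependencies to their latest versions. There are no functional changes, so decide whether to upgrade based on your own usage.
- Jsoup upgraded from `1.10.3` to `1.14.1`
- commons-lang3 upgraded from `3.3.2` to `3.12.0`
- slf4j-api upgraded from `1.7.25` to `1.7.32`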
2.4.3
v2.4.2
- fix #44
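```java
// Imports from the surrounding JUnit 4 test class; logger is a test-class field (not shown)
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.junit.Assert;
import org.junit.Test;
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;

@Test
public void fixTextElNoParentTest(){
    String test="<div class='a'> a <div>need</div> <div class='e'> not need</div> c </div>";
    JXDocument j = JXDocument.create(test);
    // Every text node under div.a, except those inside div.e
    List<JXNode> l = j.selN("//div[@class='a']//text()[not(ancestor::div[@class='e'])]");
    Set<String> finalRes = new HashSet<>();
    for (JXNode i : l){
        logger.info("{}",i.toString());
        finalRes.add(i.asString());
    }
    Assert.assertFalse(finalRes.contains("not need"));
    Assert.assertTrue(finalRes.contains("need"));
    Assert.assertEquals(4, finalRes.size());
}
```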
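The expression in this test reads: every text node under `div.a` that has no `div.e` ancestor. Evaluating it with the JDK's `javax.xml.xpath` gives a standard XPath 1.0 baseline (not JsoupXpath). There, the trimmed results also form a set of four, because the whitespace-only text node between the two inner `<div>`s trims to an empty string. `AncestorFilterDemo` and `trimmedTexts` are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AncestorFilterDemo {
    public static final String XML =
            "<div class='a'> a <div>need</div> <div class='e'> not need</div> c </div>";
    // Text nodes under div.a that do not sit anywhere inside div.e
    public static final String XPATH =
            "//div[@class='a']//text()[not(ancestor::div[@class='e'])]";

    // Hypothetical helper: trimmed text of each selected node, de-duplicated in the order returned.
    public static Set<String> trimmedTexts(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            Set<String> texts = new LinkedHashSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Set<String> texts = trimmedTexts(XML, XPATH);
        System.out.println(texts.size());               // 4
        System.out.println(texts.contains("not need")); // false
        System.out.println(texts.contains(""));         // true: the whitespace-only node
    }
}
```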
v2.4.1
- fix #52
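- Optimized the output order of `//text()` when it recurses through descendants

The previous `text()` implementation was somewhat simplified and, in certain edge cases, could not precisely extract a specific text block by index. This release refactors `text()` to support all standard behaviors within the supported syntax.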
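```java
// Imports from the surrounding JUnit 4 test class; logger is a test-class field (not shown)
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.junit.Assert;
import org.junit.Test;
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;

@Test
public void FixTextBehaviorTest(){
    String html = "<p><span class=\"text-muted\">分类:</span>动漫<span class=\"split-line\"></span><span class=\"text-muted hidden-xs\">地区:</span>日本<span class=\"split-line\"></span><span class=\"text-muted hidden-xs\">年份:</span>2010</p>";
    JXDocument jxDocument = JXDocument.create(html);
    // [3] applies per parent: only <p> has a third direct text node ("2010");
    // each label such as "分类:" is the only text node of its own <span>
    List<JXNode> jxNodes = jxDocument.selN("//text()[3]");
    String actual = StringUtils.join(jxNodes,"");
    logger.info("actual = {}",actual);
    Assert.assertEquals("2010", actual);
}
```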
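A positional predicate on `text()` applies per parent: `//text()[3]` means "the third text child of each element", while `(//text())[3]` is the third text node in the whole document. A small check with the JDK's `javax.xml.xpath` (standard XPath 1.0, not JsoupXpath), assuming an English stand-in for the test's HTML (label text translated, structure unchanged); `TextIndexDemo` and `select` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextIndexDemo {
    // English stand-in for the test's HTML; only the text was translated.
    public static final String XML = "<p><span class='text-muted'>Category:</span>Anime"
            + "<span class='split-line'></span><span class='text-muted hidden-xs'>Region:</span>Japan"
            + "<span class='split-line'></span><span class='text-muted hidden-xs'>Year:</span>2010</p>";

    // Hypothetical helper: evaluates an XPath 1.0 expression and returns each node's trimmed text.
    public static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Third text child of each parent: only <p> has three ("Anime", "Japan", "2010").
        System.out.println(select(XML, "//text()[3]"));   // [2010]
        // Third text node in document order: "Category:", "Anime", "Region:", ...
        System.out.println(select(XML, "(//text())[3]")); // [Region:]
    }
}
```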
Impact on existing code
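`text()` no longer simply returns all the text under a node. It now identifies the separate text blocks according to standard semantics and returns them as a list. For example, given `<p> one <span> two</span> three </p>`, `//text()` returns `["one", "two", "three"]` and `//text()[2]` returns `["three"]`.
- Each text block has its leading and trailing whitespace trimmed automatically.
- `allText()` behaves the same as before and can be used where that suits your code.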
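The per-block view can be reproduced with standard XPath 1.0 through the JDK's `javax.xml.xpath` (not JsoupXpath). This sketch trims each block as described above, and for contrast evaluates `normalize-space()` over the whole `<p>`, which returns the node's text as one string, roughly like the old "all text under the node" behavior. `TextBlocksDemo`, `textBlocks`, and `wholeText` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextBlocksDemo {
    public static final String XML = "<p> one <span> two</span> three </p>";

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: one trimmed string per selected text node (one per text block).
    public static List<String> textBlocks(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, parse(xml), XPathConstants.NODESET);
            List<String> blocks = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                blocks.add(nodes.item(i).getTextContent().trim());
            }
            return blocks;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    // Hypothetical helper: the whole <p> as a single whitespace-normalized string.
    public static String wholeText(String xml) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("normalize-space(/p)", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate normalize-space(/p)", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textBlocks(XML, "//text()"));    // [one, two, three]
        // [2] is the second text child of each parent: only <p> has one (" three ").
        System.out.println(textBlocks(XML, "//text()[2]")); // [three]
        System.out.println(wholeText(XML));                 // one two three
    }
}
```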