Wikipedia markup (wikitext) could be parsed appropriately.
Candidate: https://github.com/vsch/flexmark-java, which started as a fork of https://github.com/commonmark/commonmark-java.
sirthias/pegdown, which is built on sirthias/parboiled, is deprecated and suggests vsch/flexmark-java as its replacement.
Example without table parsing, but with LinkRef extraction:
https://en.wikipedia.org/w/api.php?action=parse&prop=wikitext&format=json&oldid=1126125069
```java
package org.dice_research.launuts.linking;

import java.io.File;

import org.dice_research.launuts.Config;
import org.dice_research.launuts.io.Io;
import org.jetbrains.annotations.NotNull;
import org.json.JSONObject;

import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.ast.NodeVisitor;
import com.vladsch.flexmark.util.ast.TextCollectingVisitor;
import com.vladsch.flexmark.util.ast.VisitHandler;
import com.vladsch.flexmark.util.ast.Visitor;

public class WikipediaLinking {

    public static final String PREFIX_WP_OLDID = "https://en.wikipedia.org/w/api.php?action=parse&prop=wikitext&format=json&oldid=";
    public static final String WP_NUTS1EU_OLDID = "1126125069";
    public static final String WP_NUTS1EU_FILENAME = "NUTS-1-EU.json";

    public static File getWpNuts1euFile() {
        return new File(Config.get(Config.KEY_DOWNLOAD_DIRECTORY), WP_NUTS1EU_FILENAME);
    }

    /**
     * Downloads NUTS 1 sources from 17:45, 7 December 2022.
     *
     * @see https://en.wikipedia.org/w/index.php?title=First-level_NUTS_of_the_European_Union&oldid=1126125069
     * @see https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=First-level_NUTS_of_the_European_Union&rvslots=*&rvprop=content
     */
    public WikipediaLinking downloadWpNuts1Eu() {
        Io.download(PREFIX_WP_OLDID + WP_NUTS1EU_OLDID, getWpNuts1euFile(), false);
        return this;
    }

    // Reads the downloaded API response and returns the wikitext at parse.wikitext.*
    private String getNutsWikisource() {
        return new JSONObject(Io.readFileToString(getWpNuts1euFile())).getJSONObject("parse").getJSONObject("wikitext")
                .getString("*");
    }

    public static void main(String[] args) {
        String markdown = new WikipediaLinking().downloadWpNuts1Eu().getNutsWikisource();

        // Disabled debug output: raw wikitext
        if (Boolean.FALSE) {
            System.out.println(markdown);
        }

        // https://github.com/vsch/flexmark-java/blob/0.64.0/flexmark-java-samples/src/com/vladsch/flexmark/java/samples/BasicSample.java
        Parser parser = Parser.builder().build();
        Node document = parser.parse(markdown);

        // Disabled debug output: text collected by the custom visitor
        if (Boolean.FALSE) {
            VisitorSmpl visitorSmpl = new WikipediaLinking().new VisitorSmpl();
            visitorSmpl.visit(document);
            System.out.println(visitorSmpl.getText());
        }

        // Disabled debug output: text collected by flexmark's built-in visitor
        if (Boolean.FALSE) {
            TextCollectingVisitor textCollectingVisitor = new TextCollectingVisitor();
            System.out.println(textCollectingVisitor.collectAndGetText(document));
        }
    }

    /**
     * Usage: VisitorSmpl visitorSmpl = new WikipediaLinking().new VisitorSmpl();
     * visitorSmpl.visit(document); System.out.println(visitorSmpl.getText());
     *
     * @see https://github.com/vsch/flexmark-java/blob/0.64.0/flexmark-java-samples/src/com/vladsch/flexmark/java/samples/VisitorSample.java
     */
    public class VisitorSmpl implements Visitor<Node> {

        NodeVisitor visitor = new NodeVisitor(new VisitHandler<>(Node.class, this::visit));
        StringBuilder sb = new StringBuilder();

        // Appends the node's unescaped source text, then descends into its children
        @Override
        public void visit(@NotNull Node node) {
            sb.append(node.getChars().unescape());
            visitor.visitChildren(node);
        }

        public String getText() {
            return sb.toString();
        }
    }
}
```
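For comparison, link targets can also be pulled straight from the wikitext without a Markdown parser. The following is only a rough sketch using `java.util.regex`, not what the code above does: the class name `WikiLinkSketch`, the method `extractTargets`, and the sample row are made up for illustration, and nested links (for example inside image captions) are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the launuts code base.
public class WikiLinkSketch {

    // Matches [[Target]] and [[Target|Label]].
    // Group 1 is the link target, group 2 the optional display label.
    private static final Pattern WIKILINK = Pattern
            .compile("\\[\\[([^\\[\\]|]+)(?:\\|([^\\[\\]]*))?\\]\\]");

    // Returns the targets of all wikilinks in order of appearance.
    public static List<String> extractTargets(String wikitext) {
        List<String> targets = new ArrayList<>();
        Matcher matcher = WIKILINK.matcher(wikitext);
        while (matcher.find()) {
            targets.add(matcher.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        // Made-up table row in wikitext syntax, for illustration only.
        String sample = "| [[Belgium]] || BE1 || [[Brussels|Brussels Capital Region]]";
        System.out.println(extractTargets(sample));
    }
}
```

Whether this is good enough depends on how regular the table rows of the revision are; a real wikitext parser would be more robust.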
Also see #12 and #13