Migrate JDT wiki pages #54

Closed
trancexpress opened this issue Feb 15, 2023 · 23 comments

@trancexpress

trancexpress commented Feb 15, 2023

See: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/681

Essentially we'll try the automatic process: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/681#note_414703

Export the pages you want to convert from https://wiki.eclipse.org/Special:Export, save the file to /tmp/wikiexport (or somewhere else, but adjust the commands below accordingly), then run:

docker run -it --rm -v /tmp/wikiexport:/wikiexport --entrypoint /bin/bash php:8
curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
apt-get update && apt-get install -y git vim pandoc
git clone https://github.com/outofcontrol/mediawiki-to-gfm.git
cd mediawiki-to-gfm
composer update --no-dev
./convert.php --filename=/wikiexport/Eclipsepedia-.xml --output=/wikiexport/markdown/

Aiming to convert everything under: https://wiki.eclipse.org/Category:JDT

There is also the question of whether we can link the old wiki pages to the new ones, assuming the old wiki pages will be archived.

@trancexpress
Author

trancexpress commented Feb 15, 2023

Note that the converter has a problem with wiki table markup like this and similar:

{|class

So the XML generated by the export should be adjusted a bit:

--- Eclipsepedia-20230215110023_original.xml    2023-02-15 13:02:13.559548788 +0200
+++ Eclipsepedia-20230215110023.xml     2023-02-15 13:38:01.612788671 +0200
@@ -311,7 +311,7 @@
 === Milestone 1 ===
 Here we will list all the features which are currently available in the plugin.
 
-{| style="width:75%; 
+{{| style="width:75%;
 |- valign="top"
 | '''Compile warnings for non-final locks'''
 || Many developers have the need that every lock should be final to not change the lock inside the synchronized block. The plugins adds a new compiler warning which checks every used lock of synchronized statements wether they are final.
@@ -337,11 +337,11 @@
 
 [[Image:Jdtc_nullSynchronizedFix.png]]
 
-|}
+|}}
 
 === Milestone 2 ===
 
-{| style="width:75%; 
+{{| style="width:75%;
 |- valign="top"
 | '''Preference page for problem severities'''
 || As everyone has it's own style of development or is only interested in some of the compile checks, there is now a preference page which let's you decide if you want to see the problems as warnings or as errors. You can even disable the checks completely.
@@ -359,11 +359,11 @@
 
 [[Image:Jdtc_globalLocksString.png]]
 
-|}
+|}}
 
 === Milestone 3 ===
 
-{| style="width:75%; 
+{{| style="width:75%;
 |- valign="top"
 | '''Inlining synchronized methods'''
 || JDT does a great job with providing the "Inline method" refactoring. There is just one major issue that synchronization is completely ignored. In order to fix this, I'm working on a patch for the "Inline method" refactoring which allows you to inline synchronized methods and still having the proper synchronization in place. This will be availabe in one of the next builds of JDT after the patch is accepted (see [https://bugs.eclipse.org/bugs/show_bug.cgi?id=112100 bug 112100]).
@@ -384,11 +384,11 @@
 
 [[Image:Jdtc_override.png]]
 
-|}
+|}}
 
 === Milestone 4 ===
 
-{| style="width:75%; 
+{{| style="width:75%;
 |- valign="top"
 | '''Convert between lock types'''
 || With the addition of the <code>java.util.concurrent</code> package in Java 1.5 you're now able to use other types of locks then the built-in <code>synchronized</code> statement. In order to quickly convert your synchronized statement to a ReentrantLock (which is essentially the same), we will now provide a quick fix for this. But please use the ReentrantLock only if you really need it's additional features which are not available with the <code>synchronized</code> statement.
@@ -404,7 +404,7 @@
 And the end result:
 
 [[Image:Jdtc_convertLockResult.png]]
-|}
+|}}
 
 === Milestone 5 ===
 
@@ -419,7 +419,7 @@
 == Timeline ==
 Here is a complete list of the milestones and release candidates planned for this plugin.
 
-{| class="wikitable" style="text-align:center"
+{{| class="wikitable" style="text-align:center"
 |- style="background:#efefef;"
 ! Milestone !! Date !! Planned items
 |- style="background:lightgrey;"
@@ -454,7 +454,7 @@
 ! Pencils down
 | August 11
 |align="left"| -
-|}
+|}}
 
 
 == Community Involvement ==
@@ -5795,7 +5795,7 @@
 === Overview ===
 [https://bugs.eclipse.org/bugs/showdependencytree.cgi?id=539137&maxdepth=1&hide_resolved=0 The Java Release Tree]
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | Java Version Support
         | JDT Bug(s)
         | Remarks
@@ -5839,14 +5839,14 @@
         | [http://openjdk.java.net/projects/jdk/10/ Java 10]
         | [https://bugs.eclipse.org/bugs/show_bug.cgi?id=525732 Top Level Java 10 Bug]
         | Release (3/2018)
-|}</div>
+|}}</div>
 
 === JDK 19 ===
 
 Handy General OPEN JDK Queries aka top-level requirement
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>[https://tinyurl.com/eclipseJDTcsr19 Generic Query]</b>
         | <b>[https://tinyurl.com/eclipseJDTjls19 Language Specification]</b>
         | <b>[https://tinyurl.com/eclipseJDTvm19 VM Specification]</b>
@@ -5854,14 +5854,14 @@
 | <b>[https://tinyurl.com/eclipseJDTjavadoc19 javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20303 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20304 CSR Dashboard] </b>
-|}
+|}}
 </div>
 
 ==== Java 19 Planning Input Data ====
 
 Planning data to be filled if needed - else track via bugs
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | 
         | style="background: none repeat scroll 0% 0% green;" | [https://bugs.eclipse.org/bugs/show_bug.cgi?id=571397 For Example Array and Record Pattern (First Preview)]
         | style="background: none repeat scroll 0% 0% green;" | [https://bugs.eclipse.org/bugs/show_bug.cgi?id=571398 For Example Switch Pattern (First Preview)]
@@ -5914,7 +5914,7 @@
         | 0d
         | 0d
         |-
-  |}</div>
+  |}}</div>
 
 
 {| cellspacing="0" cellpadding="5" border="1" style="width: 300px; height: 25px;"
@@ -5937,7 +5937,7 @@
 Handy General OPEN JDK Queries aka top-level requirement
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>[https://tinyurl.com/3khyf37r Generic Query]</b>
         | <b>[https://bit.ly/3zJfs8K Language Specification]</b>
         | <b>[https://bit.ly/2ZBAJon VM Specification]</b>
@@ -5945,7 +5945,7 @@
 | <b>[https://bit.ly/3kJG31a javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20303 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20304 CSR Dashboard] </b>
-|}
+|}}
 </div>
 
 === JDK 17 ===
@@ -5953,7 +5953,7 @@
 Handy General OPEN JDK Queries aka top-level requirement
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>[https://bit.ly/3tIUcNH Generic Query]</b>
         | <b>[https://bit.ly/3tJxpS0 Language Specification]</b>
         | <b>[https://bit.ly/38ZWj7F VM Specification]</b>
@@ -5961,14 +5961,14 @@
 | <b>[https://bit.ly/3eZtKvb javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19800 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19801 CSR Dashboard] </b>
-|}
+|}}
 </div>
 === JDK 16 ===
 
 Handy General OPEN JDK Queries aka top-level requirement [https://eclip.se/gW Dep Tree]
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>[https://bit.ly/33Cxawx Generic Query]</b>
         | <b>[https://bit.ly/3iIh8Yb Language Specification]</b>
         | <b>[https://bit.ly/3kt68i2 VM Specification]</b>
@@ -5976,7 +5976,7 @@
 | <b>[https://bit.ly/3mHnDx0 javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19517 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19118 CSR Dashboard] </b>
-|}
+|}}
 </div>
 
 === JDK 15 ===
@@ -5985,7 +5985,7 @@
 [https:://eclip.se/gu | Top Level Bug - 559959]
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b> [https://bit.ly/2xOUdI2 Generic Query]</b>
         | <b>[https://bit.ly/2V3rHdF Language Specification]</b>
         | <b>[https://bit.ly/2UY5cXG VM Specification]</b>
@@ -5993,7 +5993,7 @@
 | <b>[https://bit.ly/3bTrTTV javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19114 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19115 CSR Dashboard] </b>
-|}
+|}}
 </div>
 
 === JDK 14 ===
@@ -6002,7 +6002,7 @@
 [https://bugs.eclipse.org/bugs/showdependencytree.cgi?id=549808&hide_resolved=1 | Top Level Bug - 549808]
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b> [https://tinyurl.com/y37semed Generic Query]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-8231435?jql=fixVersion%20in%20(%2214%22%2C%2014.0.1%2C%2014.0.2)%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20in%20(language%2C%20language%2C%20language) Language Specification]</b>
         | <b>[https://bugs.openjdk.java.net/issues/?jql=fixVersion%20in%20(%2214%22%2C%2014.0.1%2C%2014.0.2)%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20in%20(vm%2C%20vm%2C%20vm%2C%20vm) VM Specification]</b>
@@ -6010,7 +6010,7 @@
 | <b>[https://bugs.openjdk.java.net/browse/JDK-8231587?jql=project%20%3D%20JDK%20AND%20component%20%3D%20tools%20AND%20Subcomponent%20%3D%20"javadoc(tool)" javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=18512 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=18511 CSR Dashboard] </b>
-|}
+|}}
 </div>
 === JDK 13 ===
 
@@ -6019,7 +6019,7 @@
 [https://bugs.eclipse.org/bugs/showdependencytree.cgi?id=539066&hide_resolved=1 | Top Level Bug - 539066]
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b> [https://bugs.openjdk.java.net/browse/JDK-8217698?jql=project%20%3D%20JDK%20AND%20issuetype%20%3D%20CSR%20AND%20fixVersion%20in%20(%2213%22%2C%2013.0.1)%20AND%20Subcomponent%20in%20(java.lang%2C%20tools%2C%20build%2C%20build%2C%20launcher%2C%20build%2C%20javac%2C%20launcher%2C%20tools%2C%20tools%2C%20tools%2C%20build%2C%20build%2C%20java.lang%2C%20compiler%2C%20build%2C%20tools%2C%20Compiler%2C%20java.lang.module%2C%20build%2C%20build%2C%20build%2C%20tools%2C%20tools%2C%20VM%2C%20build%2C%20lang%2C%20compiler%2C%20Parser%2C%20Parser%2C%20java.lang%2C%20language%2C%20java.lang.module%2C%20java.lang%2C%20java.lang.module%2C%20javac%2C%20tools%2C%20compiler%2C%20java.lang.module%2C%20javac%2C%20%22javadoc(tool)%22%2C%20build%2C%20build%2C%20compiler%2C%20Tools%2C%20build%2C%20tools%2C%20build%2C%20launcher%2C%20build%2C%20language%2C%20%22javadoc(tool)%22%2C%20java.lang.module%2C%20compiler%2C%20language%2C%20build%2C%20java.lang%2C%20tools%2C%20%22javadoc(tool)%22%2C%20javac) Generic Query]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2213%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20language Language Specification]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2213%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20vm VM Specification]</b>
@@ -6027,7 +6027,7 @@
 | <b>[https://bugs.openjdk.java.net/browse/JDK-8217842?Jql=project%20%3D%20JDK%20AND%20fixVersion%20in%20(%2213%22%2C%2013.0.1)%20AND%20Subcomponent%20in%20(%22javadoc(tool)%22%2C%20%22javadoc(tool)%22%2C%20%22javadoc(tool)%22) javadoc] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=18216 JEP Dashboard] </b>
 | <b>[https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=18217 CSR Dashboard] </b>
-|}
+|}}
 </div>
 
 === JDK 12 ===
@@ -6036,18 +6036,18 @@
 Handy General OPEN JDK Queries aka top-level requirement
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b> [https://bugs.openjdk.java.net/issues/?jql=project%20%3D%20JDK%20AND%20issuetype%20%3D%20CSR%20AND%20fixVersion%20%3D%20%2212%22%20AND%20Subcomponent%20in%20(java.lang%2C%20tools%2C%20build%2C%20build%2C%20launcher%2C%20build%2C%20javac%2C%20launcher%2C%20tools%2C%20tools%2C%20tools%2C%20build%2C%20build%2C%20java.lang%2C%20compiler%2C%20build%2C%20tools%2C%20Compiler%2C%20java.lang.module%2C%20build%2C%20build%2C%20build%2C%20tools%2C%20tools%2C%20VM%2C%20build%2C%20lang%2C%20compiler%2C%20Parser%2C%20Parser%2C%20java.lang%2C%20language%2C%20java.lang.module%2C%20java.lang%2C%20java.lang.module%2C%20javac%2C%20tools%2C%20compiler%2C%20java.lang.module%2C%20javac%2C%20%22javadoc(tool)%22%2C%20build%2C%20build%2C%20compiler%2C%20Tools%2C%20build%2C%20tools%2C%20build%2C%20launcher%2C%20build%2C%20language%2C%20%22javadoc(tool)%22%2C%20java.lang.module%2C%20compiler%2C%20language%2C%20build%2C%20java.lang%2C%20tools%2C%20%22javadoc(tool)%22%2C%20javac) Generic Query]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2212%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20language Language Specification]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2212%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20vm VM Specification]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-8028563?jql=project%20%3D%20JDK%20AND%20fixVersion%20%3D%20%2212%22%20AND%20Subcomponent%20%3D%20javac javac]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-8215291?jql=project%20%3D%20JDK%20AND%20fixVersion%20%3D%20%2212%22%20AND%20Subcomponent%20%3D%20%22javadoc(tool)%22 javadoc]</b>
-        |}
+        |}}
 </div>
 
 Distilled from the above, specific bugs are listed below:
 <div>
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>JEP /JDK Bug</b>
         | <b>JDK Status</b>
         | <b>Eclipse Bug/Wiki Link</b>
@@ -6097,7 +6097,7 @@
         | Done
         | 0d
         | 
-        |}
+        |}}
 </div>
 
 === JDK 11 ===
@@ -6105,19 +6105,19 @@
 Handy General OPEN JDK Queries aka top-level requirement
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b> [https://bugs.openjdk.java.net/browse/JDK-8193576?jql=project%20%3D%20JDK%20AND%20issuetype%20%3D%20CSR%20AND%20fixVersion%20%3D%20%2211%22%20AND%20Subcomponent%20in%20(java.lang%2C%20build%2C%20build%2C%20launcher%2C%20build%2C%20javac%2C%20launcher%2C%20tools%2C%20tools%2C%20tools%2C%20build%2C%20build%2C%20java.lang%2C%20compiler%2C%20build%2C%20tools%2C%20Compiler%2C%20java.lang.module%2C%20build%2C%20build%2C%20build%2C%20tools%2C%20tools%2C%20VM%2C%20build%2C%20lang%2C%20compiler%2C%20Parser%2C%20java.lang%2C%20language%2C%20java.lang.module%2C%20java.lang%2C%20java.lang.module%2C%20javac%2C%20tools%2C%20compiler%2C%20java.lang.module%2C%20javac%2C%20build%2C%20build%2C%20compiler%2C%20Tools%2C%20build%2C%20build%2C%20launcher%2C%20build%2C%20language%2C%20java.lang.module%2C%20compiler%2C%20language%2C%20build%2C%20java.lang%2C%20tools%2C%20javac) Generic Query]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2211%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20language Language Specification]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-4660984?jql=fixVersion%20%3D%20%2211%22%20AND%20component%20%3D%20specification%20AND%20Subcomponent%20%3D%20vm VM Specification]</b>
         | <b>[https://bugs.openjdk.java.net/browse/JDK-8203690?jql=project%20%3D%20JDK%20AND%20fixVersion%20%3D%20%2211%22%20AND%20Subcomponent%20%3D%20javac javac] </b>
-        |}
+        |}}
 </div>
 
 Distilled from the above, specific bugs are listed below:
 
 
 <div>
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>JEP /JDK Bug</b>
         | <b>JDK Status</b>
         | <b>Eclipse Bug/Wiki Link</b>
@@ -6264,7 +6264,7 @@
 |
 |NA
 |
-        |}
+        |}}
 </div>
 
 === JDK 10 ===
@@ -6275,7 +6275,7 @@
 Investigate Java 10 features and the possible support from JDT Core [https://bugs.eclipse.org/bugs/show_bug.cgi?id=525732 Top Level Bug].
 
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | <b>Sl No</b>
         | <b>JEP /JDK Bug</b>
         | <b>Eclipse Bug/Wiki Link</b>
@@ -6317,12 +6317,12 @@
         |  to add bug number
         | 
         | CSR
-        |}
+        |}}
 </div>
 
 === Technical Debt - Current and Past Releases ===
 <div> 
-        {|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
+        {{|class="wikitable" border="1"  cellpadding="4"  cellspacing="4"
         | Java Version Support
         | Open JDT Bug(s)
         | Remarks
@@ -6358,7 +6358,7 @@
         | Java 8
         | [https://bit.ly/2wZaQQD Open]
         | 
-|}</div>
+|}}</div>
 
 [[Category:JDT]]</text>
       <sha1>n2xdlbo6h3kwm83n5blx0zqjf1s7sbh</sha1>

I'll look into changing the wiki pages so this is not necessary.
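
The manual adjustment shown in the diff above could also be scripted. The following is only a sketch, assuming that every line starting (after optional whitespace) with `{|` or `|}` is a table marker that should be rewritten. The class and method names are made up for illustration and are not part of mediawiki-to-gfm. Note that the diff above left at least one table untouched, so a blanket rewrite may change more than the manual edit did.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of mediawiki-to-gfm.
final class TableMarkerRewriter {

    // "{|" at the start of a line (optional leading whitespace): table start.
    private static final Pattern OPEN = Pattern.compile("^(\\s*)\\{\\|");
    // "|}" at the start of a line, unless it is already "|}}": table end.
    private static final Pattern CLOSE = Pattern.compile("^(\\s*)\\|\\}(?!\\})");

    // Rewrites a single line; lines without a table marker are returned unchanged.
    static String rewriteLine(String line) {
        String opened = OPEN.matcher(line).replaceFirst("$1{{|");
        return CLOSE.matcher(opened).replaceFirst("$1|}}");
    }

    // Applies rewriteLine to every line of the export.
    static List<String> rewrite(List<String> lines) {
        return lines.stream()
                .map(TableMarkerRewriter::rewriteLine)
                .collect(Collectors.toList());
    }
}
```

The lines could be read with `Files.readAllLines` and written back with `Files.write`; running the rewrite twice does not add further braces.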

@trancexpress
Author

trancexpress commented Feb 15, 2023

Looks like nothing is done for images. E.g.: https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/Concurrency-related_refactorings_for_JDT.md

While the original is: https://wiki.eclipse.org/Concurrency-related_refactorings_for_JDT

FYI, I did an experiment moving CBI and Jenkins wiki pages to GitHub markdown format last week. It works pretty well. Here are the steps:

I guess I'll have to ask about this in the gitlab ticket.

trancexpress added a commit to trancexpress/eclipse.jdt that referenced this issue Feb 15, 2023
trancexpress added a commit to trancexpress/eclipse.jdt that referenced this issue Feb 15, 2023
@trancexpress
Author

> Looks like nothing is done for images.

I wrote a small crawler to download the images. Local disk paths have to be adjusted, of course.

package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

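/**
 * Downloads the images referenced via [[Image:...]] in a MediaWiki XML export.
 * For every image, the wiki's "File:" page is fetched to find the real image URL.
 */
public class EclipseWikiCrawler {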

    private static final String NO_TITLE = "no_title";
    private static final Pattern TITLE_PATTERN = Pattern.compile(".*<title>(.*)</title>.*");
    private static final Pattern IMAGE_PATTERN = Pattern.compile(".*\\[\\[Image:([^\\]]*)\\]\\].*");

    public static void main(String[] args) throws IOException {
    	Map<String, List<String> > images = getImages();
        String site = "https://wiki.eclipse.org/File:";
        for (Entry<String, List<String>> entry : images.entrySet()) {
        	String title = entry.getKey();
        	String path = title.replace(' ', '_');
        	int indexOfLastSlash = path.lastIndexOf('/');
        	if (indexOfLastSlash == -1) {
        		path = "";
        	} else {
        		path = path.substring(0, indexOfLastSlash);
        	}
        	Set<String> imageNames = new LinkedHashSet<>(entry.getValue());
        	for (String imageName : imageNames) {
        		// The "File:" page references the real image location in a src="/images/..." attribute.
        		URL url = new URL(site + imageName);
        		try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
        			String inputLine;
        			while ((inputLine = in.readLine()) != null) {
        				int srcStartIndex = inputLine.indexOf("src=\"/images/");
        				if (srcStartIndex != -1) {
        					int srcEndIndex = inputLine.indexOf(imageName, srcStartIndex);
        					if (srcEndIndex != -1) {
        						String srcPart = inputLine.substring(srcStartIndex, srcEndIndex);
        						String src = srcPart.substring("src=\"".length());
        						String imageUrl = "https://wiki.eclipse.org/" + src + imageName;
        						downloadImage(imageUrl, "/data/git/eclipse/eclipse.jdt/wiki/" + path +"/" + imageName);
        						break;
        					}
        				}
        			}
        		}
        	}
        }
    }

    private static Map<String, List<String> > getImages() throws IOException {
    	Map<String, List<String> > images = new LinkedHashMap<>();
    	String xmlFile = "/tmp/wikiexport/Eclipsepedia-20230215110023.xml";
    	List<String> lines = Files.readAllLines(Paths.get(xmlFile));
    	String currentTitle = NO_TITLE;
    	for (String line : lines) {
    		Matcher m = TITLE_PATTERN.matcher(line);
    		if (m.matches()) {
    			currentTitle = m.group(1);
    			continue;
    		}
    		m = IMAGE_PATTERN.matcher(line);
    		if (m.matches()) {
    			String image = m.group(1);
    			// Collect the image names per page title.
    			images.computeIfAbsent(currentTitle, k -> new ArrayList<>()).add(image);
    		}
    	}
    	return images;
    }

    private static void downloadImage(String url, String fileLocation) {
        Address address = new Address(url);
        Image image = new Image(address, fileLocation);
        try {
            image.download();
            if (!image.successful()) {
            	throw new IOException("Failed to download image: " + url);
            }
            System.out.println("Done with " + address.url() + " -> " + fileLocation);
        } catch (IOException e) {
            System.err.println("Error when downloading " + address.url() + " : " + e.getMessage());
        }
    }

    private static class Address {

        private final String url;

        public Address(String url) {
            this.url = url;
        }

        public String url() {
            return url;
        }
    }

    private static class Image {

        private static final int MIN_BYTES = 1;

        private final Address address;
        private final String location;

        private boolean successful;


        public Image(Address address, String location) {
            this.address = address;
            this.location = location;
            successful = false;
        }


        public void download() throws IOException {
            successful = true;

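            File imageOnDisk = new File(location);
            if (imageOnDisk.exists()) {
                return;
            }
            // Create missing directories for nested pages; FileOutputStream fails without them.
            File parent = imageOnDisk.getParentFile();
            if (parent != null) {
                parent.mkdirs();
            }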

            URL website = new URL(address.url());
            try (
                ReadableByteChannel rbc = Channels.newChannel(website.openStream());
                FileOutputStream fos = new FileOutputStream(imageOnDisk)) {

                fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
            }


            if (imageOnDisk.exists() && (imageOnDisk.length() < MIN_BYTES)) {
                imageOnDisk.delete();
                successful = false;
            }
        }

        public boolean successful() {
            return successful;
        }
    }

}

@trancexpress
Author

trancexpress commented Feb 15, 2023

@iloveeclipse can you check this? https://github.com/trancexpress/eclipse.jdt/tree/gh54/wiki

We have the files now; we probably want to check each page for obvious problems.

But first, how do we want to structure the wiki? Do we want to place project-specific pages in each project's repository, e.g. JDT Core topics in the JDT Core repository? Or do we want all pages in this repository, as I've done on my fork and branch? Please involve whoever you think is active to decide this.

@trancexpress
Author

Note that there are some hard-coded links to the wiki within the wiki pages (i.e. the full web address is in the HTML code), so we'll have to redirect those manually.
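
To get a list of the hard-coded links that need attention, the converted text could be scanned for absolute wiki.eclipse.org URLs. This is a rough sketch only: the class name is hypothetical, and the regular expression is an assumption that stops at whitespace and common delimiters, so URLs containing parentheses or quotes would be cut short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for listing absolute links to the old wiki.
final class HardCodedWikiLinkFinder {

    // Absolute http/https links to wiki.eclipse.org, up to the next delimiter.
    private static final Pattern WIKI_URL =
            Pattern.compile("https?://wiki\\.eclipse\\.org/[^\\s)\\]\"'<>]*");

    // Returns all matches in order of appearance (duplicates included).
    static List<String> find(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = WIKI_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

Feeding each converted .md file through `find` gives a checklist of links to redirect by hand.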

@iloveeclipse
Member

iloveeclipse commented Feb 15, 2023

Simeon, looking at the converted result, I think we want only the JDT Core part converted, which would include:
* JDT
* JDT_Core
* JDT_Core_Programmer_Guide

That should be inside https://github.com/eclipse-jdt/eclipse.jdt.core/wiki repo.

We can also check individually if something can be migrated to JDT UI/Debug repos, but so far the pages I've seen are only about very old plans and have no practical use anymore.

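However, if I look at https://wiki.eclipse.org/Category:JDT and at some of the pages referenced there, some content is missing from the "converted" repo.

E.g. missing:
* https://wiki.eclipse.org/JDT_Code_Setup_Using_Oomph
* https://wiki.eclipse.org/JDT_Core (entry page)
* https://wiki.eclipse.org/JDT_Core/Null_Analysis (entry page explaining the rest)
* https://wiki.eclipse.org/JDT/FAQ (again entry page is missing)

Judging by the above, it looks like entry pages are missing in general.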

@akurtakov
Contributor

If I may make a suggestion: strip the outdated content from these UI/Debug wiki pages whenever you see it, as it has a huge impact when searching for something.

@trancexpress
Author

> * https://wiki.eclipse.org/JDT_Code_Setup_Using_Oomph

https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Code_Setup_Using_Oomph.md

> * https://wiki.eclipse.org/JDT_Core (entry page)

https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core.md

> * https://wiki.eclipse.org/JDT_Core/Null_Analysis (entry page explaining the rest)

https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core/Null_Analysis.md

> * https://wiki.eclipse.org/JDT/FAQ (again entry page is missing)

https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT/FAQ.md

What is missing?

@trancexpress
Author

trancexpress commented Feb 15, 2023

> Simeon, looking at the converted result, I think we want only the JDT Core part converted, which would include:
> * JDT
> * JDT_Core
> * JDT_Core_Programmer_Guide
>
> That should be inside https://github.com/eclipse-jdt/eclipse.jdt.core/wiki repo.

I'm not sure you understand the structure after the conversion; there are a lot more articles. The folders are a result of nested pages. Most JDT pages are in the directory as .md files. Which of these do you want?

I think it will be simpler if you point to the wiki pages you want to have, from https://wiki.eclipse.org/Category:JDT. Just the names copied from the list will be enough, or call me if you want to show me which ones; it will likely save time.

@iloveeclipse
Member

OK, I expected to see entry pages inside the matching directory, but they seem to be placed one level higher.
Not sure if that is the reason for cross-links not working?

E.g. https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Programmer_Guide.md points to https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Committer_FAQ which is not there, but at https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Committer_FAQ.md

@trancexpress
Author

trancexpress commented Feb 15, 2023

> OK, I expected to see entry pages inside the matching directory, but they seem to be placed one level higher. Not sure if that is the reason for cross-links not working?
>
> E.g. https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Programmer_Guide.md points to https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Committer_FAQ which is not there, but at https://github.com/trancexpress/eclipse.jdt/blob/gh54/wiki/JDT_Core_Committer_FAQ.md

No idea, this is all generated by mediawiki-to-gfm. We'll have to go over the generated .md and fix whatever is broken manually. But first, I need to know what we want and where.
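
Part of that manual fixing could perhaps be automated for the cross-link case quoted above, where a link points to `JDT_Core_Committer_FAQ` but the file is `JDT_Core_Committer_FAQ.md`. The sketch below assumes the generated links have the usual Markdown form `[text](target ...)`; I have not verified the exact converter output, and the class name is hypothetical. It appends `.md` to relative targets that have no file extension, so page names that contain a dot would be skipped. It needs Java 9 or later for `appendReplacement` with a `StringBuilder`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the link format is an assumption about the converter output.
final class RelativeLinkFixer {

    // Captures the link target after "](" up to whitespace or ")".
    private static final Pattern LINK_TARGET = Pattern.compile("\\]\\(([^)\\s]+)");

    static String appendMdExtension(String markdown) {
        Matcher m = LINK_TARGET.matcher(markdown);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String fixed = fixTarget(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement("](" + fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String fixTarget(String target) {
        // Leave absolute URLs and mail links alone.
        if (target.contains("://") || target.startsWith("mailto:")) {
            return target;
        }
        // Keep a "#section" fragment after the inserted extension.
        int hash = target.indexOf('#');
        String page = hash == -1 ? target : target.substring(0, hash);
        String fragment = hash == -1 ? "" : target.substring(hash);
        if (page.isEmpty()) {
            return target; // pure in-page anchor
        }
        String lastSegment = page.substring(page.lastIndexOf('/') + 1);
        if (lastSegment.contains(".")) {
            return target; // already has an extension (e.g. an image)
        }
        return page + ".md" + fragment;
    }
}
```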

@trancexpress
Author

trancexpress commented Feb 15, 2023

Another problem I see: the converter tool replaces spaces with _, but GitHub wants dashes for spaces. I'm not sure whether this can be configured in the tool.
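
If the tool cannot be configured, the generated files could be renamed afterwards. A minimal sketch, assuming the files sit in one flat directory and that every underscore in a file name stands for a space; the class name is hypothetical. Page titles that legitimately contain an underscore would be changed too, and links inside the pages would need the same treatment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for renaming converted pages.
final class WikiFileRenamer {

    // "JDT_Core.md" -> "JDT-Core.md"
    static String toDashedName(String fileName) {
        return fileName.replace('_', '-');
    }

    // Renames all .md files directly inside the given directory (not recursive).
    static void renameMarkdownFiles(Path directory) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(directory)) {
            files = stream
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".md"))
                    .collect(Collectors.toList());
        }
        for (Path file : files) {
            String oldName = file.getFileName().toString();
            String newName = toDashedName(oldName);
            if (!newName.equals(oldName)) {
                Files.move(file, file.resolveSibling(newName));
            }
        }
    }
}
```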

@trancexpress
Author

trancexpress commented Feb 15, 2023

> If I may make a suggestion: strip the outdated content from these UI/Debug wiki pages whenever you see it, as it has a huge impact when searching for something.

@iloveeclipse I think we'll have to prune outdated information from the pages we are adding (where we can tell the information is outdated, at least). IMO it is better not to have a page than to add something as outdated as, e.g., an explanation of how Gerrit works. We'll do more harm than good if we don't.

@iloveeclipse
Member

I would separate things that are "trivial" to prune (like simply not including entire pages) from non-trivial ones, where some parts of the page have to be reviewed carefully.

So to get things done, I would propose avoiding non-automated content editing as much as possible, so that we have one big "move" step where it's easy to review the content by comparing the old and new pages, given that only "automated" changes are applied to the content.

Once we have the content in the wiki, one can go through the pages one by one, manually throwing away obsolete parts.

@trancexpress
Author

> So to get things done, I would propose avoiding non-automated content editing as much as possible, so that we have one big "move" step where it's easy to review the content by comparing the old and new pages, given that only "automated" changes are applied to the content.

Sure, but let's do that while the wiki contents are on my fork, before they go "live" in JDT.

@iloveeclipse
Member

Even there, I personally would find it easier to review the first "big" move without manual edits.

@trancexpress
Author

> Even there, I personally would find it easier to review the first "big" move without manual edits.

I'm not sure what you mean. Do you want the full contents on the JDT Core wiki (and not just on my fork), and then we prune the old information from it?

Or do you want to review the "freshly generated" .md contents? Those are bad in some respects, such as images and links to other parts of the wiki pages we are migrating. I'm in the process of fixing those.

@iloveeclipse
Member

My assumption was that the automated "migration" is one step, and manual editing & polishing is another one.

I don't care in which repo it is done; I assume yours would be easier for you.

@trancexpress
Author

> My assumption was that the automated "migration" is one step, and manual editing & polishing is another one.

I've already started the manual work, though it won't be in a separate commit (I'm amending and force pushing).

The automated content is up though: https://github.com/trancexpress/eclipse.jdt.core/wiki

It's just full of dead links and missing images.

@msohn

msohn commented Nov 30, 2023

@trancexpress: thanks for this crawler. I made some improvements to make it work for exporting the EGit wiki.

* I'd like to upload this to a GitHub repo to simplify reuse for other projects which may want to use it.
* Are you OK with that?
* Which license would you prefer to use for the source code?

@trancexpress
Author

> @trancexpress: thanks for this crawler. I made some improvements to make it work for exporting the EGit wiki.
>
> * I'd like to upload this to a GitHub repo to simplify reuse for other projects which may want to use it.
> * Are you OK with that?

Sure, no problem.

> * Which license would you prefer to use for the source code?

3-clause BSD is fine with me, but choose whichever license you think fits best - you are uploading the code after all.

msohn added a commit to msohn/EclipseWikiCrawler that referenced this issue Nov 30, 2023
@msohn

msohn commented Nov 30, 2023

I pushed the repo to https://github.com/msohn/EclipseWikiCrawler
