Clean up merge from iev-data gem, update README #39

Merged: ronaldtse merged 4 commits into main from rt-clean-up-merge on Jan 6, 2025.


ronaldtse (Member)

Fixes #37

Comment on lines 41 to +43
     a&.sub(%r{<br/>.*$}, "")
-      &.sub(%r{, &lt;.*$}, "")
-      &.gsub(%r{<[^<>]*>}, "")&.strip
+      &.sub(/, &lt;.*$/, "")
+      &.gsub(/<[^<>]*>/, "")&.strip

Check failure: Code scanning / CodeQL

Incomplete multi-character sanitization (High)

This string may still contain <script, which may cause an HTML element injection vulnerability.

Copilot Autofix (AI)

To fix the problem, we need to ensure that all instances of the targeted pattern are removed from the string. We can achieve this by applying the regular expression replacement repeatedly until no more replacements can be performed. This ensures that the unsafe text does not re-appear in the sanitized input.

We will modify the code to repeatedly apply the sub method until the string no longer changes. This will be done for both the %r{<br/>.*$} and /, &lt;.*$/ patterns.

Suggested changeset 1: lib/iev.rb

Autofix patch

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/lib/iev.rb b/lib/iev.rb
--- a/lib/iev.rb
+++ b/lib/iev.rb
@@ -40,5 +40,13 @@
     a = doc&.at(xpath)&.children&.to_xml
-    a&.sub(%r{<br/>.*$}, "")
-      &.sub(/, &lt;.*$/, "")
-      &.gsub(/<[^<>]*>/, "")&.strip
+    previous = nil
+    while a != previous
+      previous = a
+      a = a&.sub(%r{<br/>.*$}, "")
+    end
+    previous = nil
+    while a != previous
+      previous = a
+      a = a&.sub(/, &lt;.*$/, "")
+    end
+    a&.gsub(/<[^<>]*>/, "")&.strip
   end
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
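The suggested patch loops the two `sub` calls, but the tag-stripping step, `gsub(/<[^<>]*>/, "")`, still runs only once, and that step seems the more likely source of this alert: on `<<script>script>`, one pass removes the inner `<script>` and leaves a new `<script>` behind. Below is a minimal sketch of the same loop-until-unchanged idea applied to that step instead; `strip_tags_fully` is a hypothetical helper, not part of lib/iev.rb.

```ruby
# Hypothetical helper (not in lib/iev.rb): remove tag-like substrings
# repeatedly until the string stops changing, so that removing one tag
# cannot leave a newly formed tag behind.
def strip_tags_fully(str)
  return nil if str.nil?

  loop do
    stripped = str.gsub(/<[^<>]*>/, "")
    # Every pass that changes the string makes it shorter, so this terminates.
    return stripped.strip if stripped == str

    str = stripped
  end
end

single_pass = "<<script>script>alert(1)".gsub(/<[^<>]*>/, "")
puts single_pass                                  # => <script>alert(1)
puts strip_tags_fully("<<script>script>alert(1)") # => alert(1)
```

The loop only removes complete `<...>` tags, so an unterminated `<script` with no closing `>` still passes through unchanged; if the result is ever rendered as HTML, escaping it (for example with `CGI.escapeHTML` from Ruby's standard library) remains the safer option.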
Comment on lines +18 to +48
text.gsub(
  %r{<a href="?(IEV)\s*(\d\d\d-\d\d-\d\d\d?)"?>(.*?)</?a>},
  '{{\3, \1:\2}}',
).gsub(
  %r{<a href="?\s*(\d\d\d-\d\d-\d\d\d?)"?>(.*?)</?a>},
  '{{\3, IEV:\2}}',
).gsub(
  # To handle <a> tags without ending tag like
  # `Voir <a href=IEV103-05-21>IEV 103-05-21`
  # for concept '702-03-11' in `fr`
  /<a href="?(IEV)?\s*(\d\d\d-\d\d-\d\d\d?)"?>(.*?)$/,
  '{{\3, IEV:\2}}',
).gsub(
  %r{<a href="?([^<>]*?)"?>(.*?)</a>},
  '\1[\2]',
).gsub(
  Regexp.new([SIMG_PATH_REGEX, '\\s*', FIGURE_TWO_REGEX].join),
  "#{IMAGE_PATH_PREFIX}/#{term_domain}/\\1[Figure \\2 - \\3; \\6]",
).gsub(
  Regexp.new([SIMG_PATH_REGEX, '\\s*', FIGURE_ONE_REGEX].join),
  "#{IMAGE_PATH_PREFIX}/#{term_domain}/\\1[Figure \\2 - \\3]",
).gsub(
  /<img\s+([^<>]+?)\s*>/,
  "#{IMAGE_PATH_PREFIX}/#{term_domain}/\\1[]",
).gsub(
  /<br>/,
  "\n",
).gsub(
  %r{<b>(.*?)</b>},
  '*\\1*',
)

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
Comment on lines +18 to +42

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '<img\\t' and with many repetitions of '\\t\\t'.
This regular expression that depends on a library input may run slow on strings starting with '<img\\t;' and with many repetitions of '\\t'.
This regular expression that depends on a library input may run slow on strings starting with '<img\\t' and with many repetitions of '\\t\\t'.
This regular expression that depends on a library input may run slow on strings starting with '<img\\t;' and with many repetitions of '\\t'.
This regular expression that depends on a library input may run slow on strings starting with '<img\\t' and with many repetitions of '\\t\\t'.
This regular expression that depends on a library input may run slow on strings starting with '<img\\t;' and with many repetitions of '\\t'.
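These alerts appear to target the image patterns. In `/<img\s+([^<>]+?)\s*>/`, the `\s+`, `[^<>]+?` and `\s*` parts can all match the same whitespace, so an unterminated `<img` followed by many tabs gives the engine many ways to split them before it fails. A sketch of one way to remove the overlap, assuming that stripping the captured attributes in Ruby is acceptable: match exactly one whitespace character after `img`, capture up to `>` with a class that excludes `>`, then strip. `IMAGE_PATH_PREFIX` is given a placeholder value and `convert_img_tags` is a hypothetical name; the `SIMG_PATH_REGEX` and `FIGURE_*_REGEX` patterns are not shown on this page and are not covered.

```ruby
# Placeholder value for illustration; the real IMAGE_PATH_PREFIX is defined
# elsewhere in the gem.
IMAGE_PATH_PREFIX = "image::/assets"

# Hypothetical rewrite of the /<img\s+([^<>]+?)\s*>/ step. `\s` consumes
# exactly one character and `[^<>]*` cannot run past `>` or `<`, so the
# whitespace is no longer split between competing quantifiers.
def convert_img_tags(text, term_domain)
  text.gsub(/<img\s([^<>]*)>/) do
    attrs = Regexp.last_match(1).strip
    # Leave whitespace-only tags such as "<img >" unchanged; the original
    # pattern does not match that input either.
    next Regexp.last_match(0) if attrs.empty?

    "#{IMAGE_PATH_PREFIX}/#{term_domain}/#{attrs}[]"
  end
end

puts convert_img_tags("see <img   fig1.png  > here", "102")
# => see image::/assets/102/fig1.png[] here
```

For ordinary tags such as `<img   fig1.png  >` this produces the same text as the original pattern; whitespace-only edge cases like `<img  >` (two spaces) differ, since the original captures a single space there.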
Comment on lines +18 to +33

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '<a href=>' and with many repetitions of '<a href=>a'.
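In `%r{<a href="?([^<>]*?)"?>(.*?)</a>}`, the lazy `(.*?)` scans to the end of the line from every `<a href=>` that has no matching `</a>`, so a line of many unterminated anchors can take quadratic time. One possible mitigation, chosen here for illustration and not taken from the PR, is to let the link text contain anything except a newline or a `<` that begins another `<a` or `</a` tag, so each failed attempt stops at the next anchor. `convert_links` is a hypothetical name; for nested or unterminated anchors the output differs from the original.

```ruby
# Hypothetical replacement for %r{<a href="?([^<>]*?)"?>(.*?)</a>}.
# The link text may contain other tags such as <b>, but not a newline
# (matching the original `.`) and not a "<" that starts "<a" or "</a",
# so a failed attempt stops at the next anchor instead of the line end.
GENERIC_LINK = %r{<a href="?([^<>]*?)"?>((?:[^<\n]|<(?!/?a\b))*)</a>}

def convert_links(text)
  text.gsub(GENERIC_LINK, '\1[\2]')
end

puts convert_links('see <a href="https://example.org">the <b>site</b></a> now')
# => see https://example.org[the <b>site</b>] now
```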
Comment on lines +18 to +30

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
Comment on lines +18 to +24

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
Comment on lines +18 to +21

Check failure: Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on a library input may run slow on strings starting with '' and with many repetitions of 'a'.
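The alerts on lines +18 to +21, +18 to +24 and +18 to +30 appear to point at the IEV cross-reference patterns, whose `(.*?)</?a>` capture can also scan to the end of the line from every unterminated `<a href=...>`. The sketch below bounds the link text so it cannot run past the next `<a` or `</a` tag, while still allowing inline tags such as `<i>`; `IEV_LINK_TEXT`, `IEV_LINK`, `BARE_IEV_LINK` and `convert_iev_links` are hypothetical names, and the third pattern (for unterminated anchors) is left out. The second original pattern has only two capture groups, so its replacement `'{{\3, IEV:\2}}'` appears to emit an empty term with the link text in the IEV-number position; the sketch uses `'{{\2, IEV:\1}}'` for it.

```ruby
# Hypothetical bounded link text: any character except a newline or a "<"
# that opens "<a" or "</a". Other inline tags such as <i> are still allowed.
IEV_LINK_TEXT = '((?:[^<\n]|<(?!/?a\b))*)'

# Hypothetical rewrites of the first two IEV patterns from the PR.
IEV_LINK = %r{<a href="?(IEV)\s*(\d\d\d-\d\d-\d\d\d?)"?>#{IEV_LINK_TEXT}</?a>}
BARE_IEV_LINK = %r{<a href="?\s*(\d\d\d-\d\d-\d\d\d?)"?>#{IEV_LINK_TEXT}</?a>}

def convert_iev_links(text)
  text.gsub(IEV_LINK, '{{\3, \1:\2}}')
      .gsub(BARE_IEV_LINK, '{{\2, IEV:\1}}') # groups: 1 = number, 2 = text
end

puts convert_iev_links('<a href="IEV102-01-01">electric charge</a>')
# => {{electric charge, IEV:102-01-01}}
```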
ronaldtse merged commit 3a4ad30 into main on Jan 6, 2025
14 of 15 checks passed
ronaldtse deleted the rt-clean-up-merge branch on January 6, 2025 at 12:35
Development

Successfully merging this pull request may close these issues.

Updating the README to incorporate documentation from iev-data