-
Notifications
You must be signed in to change notification settings - Fork 82
#186 Add string encoders #770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughA new utility for encoding ASCII strings to EBCDIC was introduced, including a method for the conversion and a corresponding test suite. Additionally, the core code page abstraction was enhanced with a default ASCII-to-EBCDIC mapping and a public accessor, supporting the new encoding functionality. Changes
Estimated code review effort3 (30–60 minutes) Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoders/StringEncoders.scala (1)
28-29
: Consider adding conversion table validation.While the length validation is good, consider adding validation for the conversion table parameter to ensure it has exactly 256 elements, which would catch configuration errors early.
def encodeEbcdicString(string: String, conversionTable: Array[Byte], length: Int): Array[Byte] = { require(length >= 0, s"Field length cannot be negative, got $length") + require(conversionTable.length == 256, s"Conversion table must have exactly 256 elements, got ${conversionTable.length}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoders/StringEncoders.scala
(1 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage.scala
(2 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoders/StringEncodersSpec.scala
(1 hunks)
🔇 Additional comments (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage.scala (1)
39-56
: Add documentation and tests for the ASCII→EBCDIC table in CodePage.scalaTo ensure the hard-coded
asciiToEbcdicMapping
is correct:• File & location:
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage.scala
Lines 39–56 (theArray[Byte](0x00.toByte, …)
definition)• Verify every byte against the official IBM CP037 ASCII→EBCDIC specification (for example, IBM’s docs or the table on https://en.wikipedia.org/wiki/EBCDIC_037).
• Add a comment above the mapping with the source URL or spec reference.
• Add unit tests that cover all printable ASCII characters (round-trip: ASCII→EBCDIC→ASCII) to detect any mismatches.
fix_required
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoders/StringEncoders.scala (1)
28-40
: LGTM! Solid implementation with proper byte handling.The encoding logic correctly handles:
- Negative length validation with
require()
- Proper signed/unsigned byte conversion using
(asciiByte + 256) % 256
- Truncation when input string exceeds specified length
- Zero-padding when input string is shorter than specified length
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoders/StringEncodersSpec.scala (2)
1-66
: Excellent comprehensive test coverage!The test suite thoroughly covers:
- Round-trip encoding/decoding verification with complex ASCII character set
- Truncation behavior when string exceeds specified length
- Zero-padding when string is shorter than specified length
- Empty string handling
- Error conditions with negative length parameters
- Actual byte value verification using hex comparison
The tests provide confidence that the encoding functionality works correctly across various scenarios.
10-19
: Smart use of round-trip testing for validation.Using
StringDecoders.decodeEbcdicString()
to verify the encoded output is an excellent approach that validates the encoding produces the expected results when decoded back to ASCII.
/** | ||
* Each class inherited from CodePage should provide its own conversion table | ||
*/ | ||
protected def asciiToAsciiMapping: Array[Byte] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the method name to match its purpose.
The method name asciiToAsciiMapping
is incorrect. Based on the documentation comment and usage context, this should be asciiToEbcdicMapping
since it provides ASCII-to-EBCDIC conversion.
- /**
- * Each class inherited from CodePage should provide its own conversion table
- */
- protected def asciiToAsciiMapping: Array[Byte] = {
+ /**
+ * Each class inherited from CodePage should provide its own conversion table
+ */
+ protected def asciiToEbcdicMapping: Array[Byte] = {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/** | |
* Each class inherited from CodePage should provide its own conversion table | |
*/ | |
protected def asciiToAsciiMapping: Array[Byte] = { | |
/** | |
* Each class inherited from CodePage should provide its own conversion table | |
*/ | |
protected def asciiToEbcdicMapping: Array[Byte] = { |
🤖 Prompt for AI Agents
In
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage.scala
around lines 35 to 38, rename the method asciiToAsciiMapping to
asciiToEbcdicMapping to correctly reflect its purpose of providing
ASCII-to-EBCDIC conversion as indicated by the comment and usage context.
/** | ||
* Gets a mapping table for ASCII to EBCDIC conversions. Uses underlying protected abstract method to get | ||
* the actual table. | ||
* | ||
* @return An ASCII to EBCDIC conversion table as an array of chars | ||
*/ | ||
@throws(classOf[IllegalArgumentException]) | ||
final def getAsciiToEbcdicMapping: Array[Byte] = { | ||
asciiToAsciiMapping | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add array size validation for consistency.
The public accessor should validate the conversion table size like the existing getEbcdicToAsciiMapping
method does. This ensures consistency and prevents runtime issues with malformed conversion tables.
@throws(classOf[IllegalArgumentException])
final def getAsciiToEbcdicMapping: Array[Byte] = {
- asciiToAsciiMapping
+ val ConversionTableElements = 256
+ val table = asciiToEbcdicMapping
+ if (table.length != ConversionTableElements) {
+ throw new IllegalArgumentException(
+ s"An ASCII to EBCDIC conversion table should have exactly $ConversionTableElements elements. It has ${table.length} elements.")
+ }
+ table
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/** | |
* Gets a mapping table for ASCII to EBCDIC conversions. Uses underlying protected abstract method to get | |
* the actual table. | |
* | |
* @return An ASCII to EBCDIC conversion table as an array of chars | |
*/ | |
@throws(classOf[IllegalArgumentException]) | |
final def getAsciiToEbcdicMapping: Array[Byte] = { | |
asciiToAsciiMapping | |
} | |
@throws(classOf[IllegalArgumentException]) | |
final def getAsciiToEbcdicMapping: Array[Byte] = { | |
val ConversionTableElements = 256 | |
val table = asciiToEbcdicMapping | |
if (table.length != ConversionTableElements) { | |
throw new IllegalArgumentException( | |
s"An ASCII to EBCDIC conversion table should have exactly $ConversionTableElements elements. It has ${table.length} elements.") | |
} | |
table | |
} |
🤖 Prompt for AI Agents
In
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage.scala
around lines 79 to 88, the getAsciiToEbcdicMapping method returns the
asciiToAsciiMapping array without validating its size. To fix this, add a check
to verify that the returned array has the expected length, similar to the
validation done in getEbcdicToAsciiMapping. If the size is incorrect, throw an
IllegalArgumentException to prevent runtime issues caused by malformed
conversion tables.
Summary by CodeRabbit
New Features
Tests