-
Notifications
You must be signed in to change notification settings - Fork 697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SEDONA-475] Add RS_NormalizeAll #1221
Conversation
|
||
// Normalize the band values | ||
for (int i = 0; i < bandValues.length; i++) { | ||
bandValues[i] = minLim + ((bandValues[i] - minValue) * (maxLim - minLim)) / (maxValue - minValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if all band values are the same. Then the denominator becomes zero and hence will lead to unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if all band values are zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to the NoDataValue of a raster after normalization? @rbavery any experience?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @rbavery, we should exclude NoDataValue when performing the normalization. Moreover, we should provide an optional parameter for users to set the new NoDataValue.
double normalizedMax1 = Arrays.stream(normalizedBand1).max().getAsDouble(); | ||
double normalizedMin2 = Arrays.stream(normalizedBand2).min().getAsDouble(); | ||
double normalizedMax2 = Arrays.stream(normalizedBand2).max().getAsDouble(); | ||
double[] expected1 = {0.0, 17.0, 34.0, 51.0, 68.0, 85.0, 102.0, 119.0, 136.0, 153.0, 170.0, 187.0, 204.0, 221.0, 238.0, 255.0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please add tests for bands that have all same values.
- Please add tests for rasters that have different pixel types.
NormalizeAll
is not supposed to change the pixel types. Moreover, the data truncation should happen as expected. If the input raster is integer, the normalized output raster should still have integer values, which means some decimal places derived from normalization will get truncated. One example is from MapAlgebra: https://sedona.apache.org/1.5.1/api/sql/Raster-map-algebra/#map-algebra
// Find min and max values in the band | ||
double minValue = Arrays.stream(bandValues).min().orElse(Double.NaN); | ||
double maxValue = Arrays.stream(bandValues).max().orElse(Double.NaN); | ||
System.out.println("minValue: "+minValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not commit System.out.println
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, this function should have the following variants:
RS_NormalizeAll (raster: Raster)
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double)
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, NoDataValue:Double)
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, NoDataValue:Double, minValue:Double, maxValue:Double)
|
||
if (minValue == maxValue) { | ||
// If all values in the band are same - set middle value of range as default value. | ||
double defaultValue = (minLim + maxLim) / 2.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this happens, please choose minLim
as the default value.
Please describe the behavior in the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Have described the current behavior in the PR description above.
common/src/main/java/org/apache/sedona/common/raster/MapAlgebra.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/apache/sedona/common/raster/MapAlgebra.java
Outdated
Show resolved
Hide resolved
@@ -437,6 +437,90 @@ public static double[] normalize(double[] band) { | |||
return result; | |||
} | |||
|
|||
public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom) { | |||
return normalizeAll(rasterGeom, 0d, 255d, -9999, -99999, -99999); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use magic numbers here. Can you use null?
// Get the band values as an array | ||
double[] bandValues = bandAsArray(rasterGeom, bandIndex); | ||
|
||
// Find min and max values in the band, excluding NoDataValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should fetch the NoDataValue
from each band. There are operators functions in RasterUtils can already do it. The NoDataValue
of the input is to be used when storing in the raster band.
} | ||
|
||
// Unset minmaxFlag if minValue and maxValue are not given; | ||
boolean minmaxFlag = minValue != -99999; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check null
instead of a magic number.
// Normalize the band values, setting NoDataValue to maxLim | ||
for (int i = 0; i < bandValues.length; i++) { | ||
if (bandValues[i] == noDataValue) { | ||
bandValues[i] = maxLim; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the band value equals to the no data value of a band, then you should set it as noDataValue
from the input.
If the input noDataValue
is not given, then set it to maxLim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiayuasu setting noDataValue to the maxLim by default will corrupt real data values.
instead I think the default should be to scale real values from 0, 254 and set the noDataValue to 255 if it is not given.
Or, scale real values between 1-255, set noDataValue to 0 by default. this makes more sense to me, 0 is used as a noDataValue more often than 255.
* Implement RS_NormalizeAll * add IllegalArgumentException * Fix override issue * Handle same band values * Refactor NormalizeAll; Add tests * Update Documentation * Add flag argument: normalizeAcrossBands * fix lint * Optimize normalizeAll: remove redundant min/max calculations
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
.What changes were proposed in this PR?
Adds RS_NormalizeAll, to normalize values in all bands of a raster between a given normalization range.
Key features -
minLim
andmaxLim
. By default, the values are normalized to range [0, 255].noDataValue
to be used for missing or invalid data in raster bands. By default, noDataValue is set to maxLim.minValue
andmaxValue
to skip computation of them during runtime.normalizeAcrossBands
flag, when set, normalization is performed across all bands based on global min and max values. If false, each band is normalized individually based on its own min and max values.How was this patch tested?
Passes new and existing tests
Did this PR include necessary documentation updates?
vX.Y.Z
format.