NaN values in kmeans (1 cluster, 0 iterations) #2

Closed
epifanio opened this issue Mar 17, 2014 · 11 comments

@epifanio

Hi,
I'm trying to perform a kmeans classification on an 8-band ENVI file.
I can load my image in IPython with spectral, but when I try to perform the kmeans classification, it ends with this output:

In [26]: import spectral.io.envi as envi
In [27]: img = envi.open('spy.hdr','spy.envi').load()

In [28]: print img.info()
    # Rows:           2114
    # Samples:        2264
    # Bands:             8
    Data format:   float32

In [29]: (m, c) = kmeans(img, 5, 30)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...  0.0%
kmeans terminated with 1 clusters after 0 iterations.
Running gdalinfo on this image gives:
epinux@Debian-70-wheezy-64-minimal:~$ gdalinfo /var/www/shared/seabbass/spy.envi 
Driver: ENVI/ENVI .hdr Labelled
Files: /var/www/shared/seabbass/spy.envi
       /var/www/shared/seabbass/spy.hdr
Size is 2264, 2114
Coordinate System is:
PROJCS["NAD_1983_UTM_Zone_18N",
    GEOGCS["GCS_North_American_1983",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-75],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["Meter",1]]
Origin = (506789.000000000000000,4204310.000000000000000)
Pixel Size = (0.500000000000000,-0.500000000000000)
Metadata:
  Band_1=Band 1
  Band_2=Band 2
  Band_3=Band 3
  Band_4=Band 4
  Band_5=Band 5
  Band_6=Band 6
  Band_7=Band 7
  Band_8=Band 8
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  506789.000, 4204310.000) ( 74d55'21.68"W, 37d59'11.08"N)
Lower Left  (  506789.000, 4203253.000) ( 74d55'21.71"W, 37d58'36.78"N)
Upper Right (  507921.000, 4204310.000) ( 74d54'35.27"W, 37d59'11.04"N)
Lower Right (  507921.000, 4203253.000) ( 74d54'35.31"W, 37d58'36.75"N)
Center      (  507355.000, 4203781.500) ( 74d54'58.49"W, 37d58'53.91"N)
Band 1 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 1
Band 2 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 2
Band 3 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 3
Band 4 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 4
Band 5 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 5
Band 6 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 6
Band 7 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 7
Band 8 Block=2264x1 Type=Float64, ColorInterp=Undefined
  Description = Band 8

Am I doing something wrong?
Thanks for any help!

@tboggs
Member

tboggs commented Mar 17, 2014

That is an odd output/result. A few questions for you:

  1. Are you using either Python 2.6 or 2.7?
  2. Does the image appear normal when viewed with spectral.imshow(img[:, :, 0])?
  3. Does the kmeans function give the same result if you run it on a small subset of the image
    (e.g., kmeans(img[:400, :400, :], 5, 30))?

@epifanio
Author

  • I'm using Python 2.7.6.
  • I'm running this on a remote server, so I can't test imshow right now. Can I save the result to a figure file instead?
  • Applying the code you suggested to a subset of my image seems to work as expected:
In [7]: kmeans(img[:400, :400, :], 5, 30)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...159987 pixels reassigned.
Iteration 2...14264 pixels reassigned.
Iteration 3...25448 pixels reassigned.
Iteration 4...13515 pixels reassigned.
Iteration 5...16316 pixels reassigned.
Iteration 6...518 pixels reassigned.
Iteration 7...622 pixels reassigned.
Iteration 8...895 pixels reassigned.
Iteration 9...8305 pixels reassigned.
Iteration 10...868 pixels reassigned.
Iteration 11...1045 pixels reassigned.
Iteration 12...32 pixels reassigned.
Iteration 13...42 pixels reassigned.
Iteration 14...1423 pixels reassigned.
Iteration 15...9185 pixels reassigned.
Iteration 16...1829 pixels reassigned.
Iteration 17...41 pixels reassigned.
Iteration 18...56 pixels reassigned.
Iteration 19...  0.0%
kmeans terminated with 5 clusters after 18 iterations.
Out[7]: 
(array([[4, 3, 2, ..., 4, 3, 2],
        [4, 1, 1, ..., 4, 3, 3],
        [2, 4, 4, ..., 4, 3, 3],
        ..., 
        [3, 3, 3, ..., 3, 3, 4],
        [1, 2, 2, ..., 2, 3, 2],
        [4, 1, 2, ..., 3, 3, 2]]),
 array([[  4.87387029e-01,   8.80335790e-02,   5.87207557e-04,
           8.08299900e-04,  -5.05539110e-03,   6.45170923e-03,
           5.88018233e-04,  -4.32912569e+01],
        [  4.98791716e-01,   6.05523754e-02,  -2.45059305e-04,
          -2.55586132e-04,  -3.53794920e-03,   3.03717744e-03,
          -2.45185623e-04,  -3.57781930e+01],
        [  4.99665285e-01,   6.06814825e-02,  -4.82010714e-05,
          -1.10220508e-04,  -3.28787793e-03,   3.12945896e-03,
          -4.81984661e-05,  -3.31973796e+01],
        [  5.01100766e-01,   6.17189042e-02,   1.31622379e-05,
          -8.25569570e-07,  -3.15459040e-03,   3.16692798e-03,
           1.31631411e-05,  -3.14069450e+01],
        [  5.06499741e-01,   5.80774928e-02,   1.77698221e-04,
           8.67825375e-05,  -2.66000710e-03,   2.92453750e-03,
           1.77747856e-04,  -2.94223762e+01]]))

@epifanio
Author

It works up to:

img[:440, :440, :]

but the problem appears for any subset larger than that:

In [22]: img[:440, :440, :].shape
Out[22]: (440, 440, 8)

It seems it doesn't like band 0 of my dataset.
I tried running it on different band subsets:

kmeans(img[:, :, [0, 1]], 5, 30)  .. gave error
kmeans(img[:, :, [0, 1, 2]], 5, 30) .. gave error

After removing band 0 and using bands 1 to 7,
the algorithm converges at iteration 21 (but with a drastic jump around iteration 19):

In [39]: kmeans(img[:, :, [1,2,3,4,5,6,7]], 5, 30)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...4786089 pixels reassigned.
Iteration 2...1064607 pixels reassigned.
Iteration 3...923154 pixels reassigned.
Iteration 4...307705 pixels reassigned.
Iteration 5...235729 pixels reassigned.
Iteration 6...76280 pixels reassigned.
Iteration 7...204053 pixels reassigned.
Iteration 8...124739 pixels reassigned.
Iteration 9...197998 pixels reassigned.
Iteration 10...91149 pixels reassigned.
Iteration 11...231949 pixels reassigned.
Iteration 12...19991 pixels reassigned.
Iteration 13...113651 pixels reassigned.
Iteration 14...146500 pixels reassigned.
Iteration 15...129638 pixels reassigned.
Iteration 16...31791 pixels reassigned.
Iteration 17...39267 pixels reassigned.
Iteration 18...108815 pixels reassigned.
Iteration 19...47464 pixels reassigned.
Iteration 20...1 pixels reassigned.
Iteration 21...  0.0%
kmeans terminated with 5 clusters after 20 iterations.
Out[39]: 
(array([[4, 3, 3, ..., 2, 4, 0],
        [4, 2, 2, ..., 3, 2, 3],
        [3, 4, 4, ..., 3, 3, 4],
        ..., 
        [4, 4, 3, ..., 3, 4, 3],
        [4, 4, 3, ..., 4, 3, 3],
        [3, 3, 3, ..., 3, 3, 3]]),
 array([[  1.52154902e-01,   3.70025044e-04,   5.62316940e-04,
          -5.57668854e-03,   6.51024847e-03,   3.71242999e-04,
          -4.05720013e+01],
        [  1.32276382e-01,  -4.87968880e-05,   5.90554698e-05,
          -5.09471708e-03,   5.10521747e-03,  -4.85566433e-05,
          -3.78903018e+01],
        [  1.17825378e-01,  -7.07515093e-05,  -4.46298710e-05,
          -4.75192526e-03,   4.63654304e-03,  -7.07482685e-05,
          -3.54834363e+01],
        [  9.89556395e-02,   1.31215113e-05,  -2.37434461e-06,
          -4.04723271e-03,   4.05791686e-03,   1.30518545e-05,
          -3.27382162e+01],
        [  8.51595316e-02,  -6.48957268e-05,  -3.04117366e-05,
          -3.71132631e-03,   3.61589750e-03,  -6.50171122e-05,
          -2.98744709e+01]]))

Using only bands 1 to 6, the classification needs more iterations.
(I removed band 0, which gives the error on the full image,
and also band 7, which causes the drastic jump in the number of pixels reassigned.)
Could this depend on the different orders of magnitude of the values in the various bands?

In [43]: kmeans(img[:, :, [1,2,3,4,5,6]], 5, 150)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...4423112 pixels reassigned.
Iteration 2...1489803 pixels reassigned.
Iteration 3...782223 pixels reassigned.
Iteration 4...349305 pixels reassigned.
Iteration 5...172782 pixels reassigned.
Iteration 6...101796 pixels reassigned.
Iteration 7...67593 pixels reassigned.
Iteration 8...89262 pixels reassigned.
Iteration 9...100563 pixels reassigned.
Iteration 10...106266 pixels reassigned.
Iteration 11...108522 pixels reassigned.
Iteration 12...108542 pixels reassigned.
Iteration 13...106762 pixels reassigned.
Iteration 14...104212 pixels reassigned.
Iteration 15...100701 pixels reassigned.
Iteration 16...96556 pixels reassigned.
Iteration 17...91909 pixels reassigned.
Iteration 18...87439 pixels reassigned.
Iteration 19...82686 pixels reassigned.
Iteration 20...77589 pixels reassigned.
Iteration 21...72987 pixels reassigned.
Iteration 22...67979 pixels reassigned.
Iteration 23...64062 pixels reassigned.
Iteration 24...59902 pixels reassigned.
Iteration 25...56295 pixels reassigned.
Iteration 26...52681 pixels reassigned.
Iteration 27...48951 pixels reassigned.
Iteration 28...45530 pixels reassigned.
Iteration 29...42271 pixels reassigned.
Iteration 30...38962 pixels reassigned.
Iteration 31...36264 pixels reassigned.
Iteration 32...33506 pixels reassigned.
Iteration 33...31079 pixels reassigned.
Iteration 34...28664 pixels reassigned.
Iteration 35...26447 pixels reassigned.
Iteration 36...24408 pixels reassigned.
Iteration 37...22415 pixels reassigned.
Iteration 38...20346 pixels reassigned.
Iteration 39...18693 pixels reassigned.
Iteration 40...17290 pixels reassigned.
Iteration 41...15980 pixels reassigned.
Iteration 42...14752 pixels reassigned.
Iteration 43...13519 pixels reassigned.
Iteration 44...12520 pixels reassigned.
Iteration 45...11459 pixels reassigned.
Iteration 46...10806 pixels reassigned.
Iteration 47...9879 pixels reassigned.
Iteration 48...9308 pixels reassigned.
Iteration 49...8663 pixels reassigned.
Iteration 50...8121 pixels reassigned.
Iteration 51...7489 pixels reassigned.
Iteration 52...6845 pixels reassigned.
Iteration 53...6518 pixels reassigned.
Iteration 54...6022 pixels reassigned.
Iteration 55...5518 pixels reassigned.
Iteration 56...5067 pixels reassigned.
Iteration 57...4663 pixels reassigned.
Iteration 58...4325 pixels reassigned.
Iteration 59...4025 pixels reassigned.
Iteration 60...3803 pixels reassigned.
Iteration 61...3530 pixels reassigned.
Iteration 62...3315 pixels reassigned.
Iteration 63...3019 pixels reassigned.
Iteration 64...2713 pixels reassigned.
Iteration 65...2481 pixels reassigned.
Iteration 66...2320 pixels reassigned.
Iteration 67...2136 pixels reassigned.
Iteration 68...1938 pixels reassigned.
Iteration 69...1778 pixels reassigned.
Iteration 70...1593 pixels reassigned.
Iteration 71...1473 pixels reassigned.
Iteration 72...1347 pixels reassigned.
Iteration 73...1196 pixels reassigned.
Iteration 74...1063 pixels reassigned.
Iteration 75...935 pixels reassigned.
Iteration 76...853 pixels reassigned.
Iteration 77...819 pixels reassigned.
Iteration 78...760 pixels reassigned.
Iteration 79...633 pixels reassigned.
Iteration 80...599 pixels reassigned.
Iteration 81...546 pixels reassigned.
Iteration 82...532 pixels reassigned.
Iteration 83...468 pixels reassigned.
Iteration 84...459 pixels reassigned.
Iteration 85...452 pixels reassigned.
Iteration 86...431 pixels reassigned.
Iteration 87...392 pixels reassigned.
Iteration 88...361 pixels reassigned.
Iteration 89...327 pixels reassigned.
Iteration 90...334 pixels reassigned.
Iteration 91...263 pixels reassigned.
Iteration 92...240 pixels reassigned.
Iteration 93...218 pixels reassigned.
Iteration 94...175 pixels reassigned.
Iteration 95...159 pixels reassigned.
Iteration 96...147 pixels reassigned.
Iteration 97...127 pixels reassigned.
Iteration 98...111 pixels reassigned.
Iteration 99...87 pixels reassigned.
Iteration 100...90 pixels reassigned.
Iteration 101...92 pixels reassigned.
Iteration 102...92 pixels reassigned.
Iteration 103...76 pixels reassigned.
Iteration 104...79 pixels reassigned.
Iteration 105...60 pixels reassigned.
Iteration 106...56 pixels reassigned.
Iteration 107...50 pixels reassigned.
Iteration 108...42 pixels reassigned.
Iteration 109...38 pixels reassigned.
Iteration 110...37 pixels reassigned.
Iteration 111...35 pixels reassigned.
Iteration 112...44 pixels reassigned.
Iteration 113...31 pixels reassigned.
Iteration 114...24 pixels reassigned.
Iteration 115...20 pixels reassigned.
Iteration 116...18 pixels reassigned.
Iteration 117...14 pixels reassigned.
Iteration 118...18 pixels reassigned.
Iteration 119...12 pixels reassigned.
Iteration 120...13 pixels reassigned.
Iteration 121...7 pixels reassigned.
Iteration 122...10 pixels reassigned.
Iteration 123...10 pixels reassigned.
Iteration 124...9 pixels reassigned.
Iteration 125...6 pixels reassigned.
Iteration 126...8 pixels reassigned.
Iteration 127...11 pixels reassigned.
Iteration 128...8 pixels reassigned.
Iteration 129...10 pixels reassigned.
Iteration 130...7 pixels reassigned.
Iteration 131...6 pixels reassigned.
Iteration 132...5 pixels reassigned.
Iteration 133...2 pixels reassigned.
Iteration 134...2 pixels reassigned.
Iteration 135...  0.0%
kmeans terminated with 5 clusters after 134 iterations.
Out[43]: 
(array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 array([[  3.86463895e-02,  -1.12364379e-05,  -9.48796184e-06,
          -3.06150554e-03,   3.04078403e-03,  -1.12376759e-05],
        [  9.10712089e-02,  -3.27034963e-05,   6.40955110e-06,
          -3.95789453e-03,   3.93158537e-03,  -3.27187167e-05],
        [  1.57725510e-01,  -5.65761830e-05,   7.07470689e-05,
          -5.41263624e-03,   5.42673644e-03,  -5.66468686e-05],
        [  2.48059478e-01,   1.52573066e-05,   1.64463233e-04,
          -6.99177974e-03,   7.17158760e-03,   1.53446294e-05],
        [  3.87974672e-01,   1.93248540e-04,  -3.51770148e-04,
          -9.35825187e-03,   9.20136210e-03,   1.94880376e-04]]))


I'm running this on Debian stable (64-bit) on an i7 with 32 GB of RAM.

@epifanio
Author

  • It seems that band 0 contains some pixels with value = nan.
  • Band 7 has a value range of [-69.37, -20.55], while all the other bands lie within [-1, 1].
In [58]: for i in arange(img.shape[2]):
   ....:     print 'min: ', i, img[:, :, [i]].min()
   ....:     print 'max: ', i, img[:, :, [i]].max()

min:  0 nan
max:  0 nan
min:  1 0.0
max:  1 1.0
min:  2 -0.126686185598
max:  2 0.121467545629
min:  3 -0.109070904553
max:  3 0.0927563458681
min:  4 -0.140595197678
max:  4 0.061065658927
min:  5 -0.0892281010747
max:  5 0.134907007217
min:  6 -0.128155678511
max:  6 0.122419863939
min:  7 -69.3799972534
max:  7 -20.5599994659
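
Because `min()` and `max()` propagate NaN, the loop above reports `nan` for band 0 without showing how many pixels are affected or what the valid range is. A minimal sketch using NumPy's NaN-aware reductions follows. The `band_stats` helper is a hypothetical name (not part of SPy), and a small synthetic array stands in for the image:

```python
import numpy as np

def band_stats(cube):
    """Return (nan_count, min, max) per band, ignoring NaN in min/max."""
    stats = []
    for b in range(cube.shape[2]):
        band = cube[:, :, b]
        nan_count = int(np.isnan(band).sum())
        stats.append((nan_count, float(np.nanmin(band)), float(np.nanmax(band))))
    return stats

# Synthetic stand-in for the image: 4 x 4 pixels, 2 bands.
cube = np.arange(32, dtype=np.float32).reshape(4, 4, 2)
cube[1, 2, 0] = np.nan  # one bad pixel in band 0

for b, (n_nan, lo, hi) in enumerate(band_stats(cube)):
    print(b, n_nan, lo, hi)
```

`np.argwhere(np.isnan(cube))` would additionally list the (row, col, band) position of each bad pixel.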

@tboggs
Member

tboggs commented Mar 17, 2014

Yes, it appears to be due to the nan in the data. I can reproduce this with the SPy sample image as follows:

In [5]: (m, c) = kmeans(data, 5, 3)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...21016 pixels reassigned.
Iteration 2...3485 pixels reassigned.
Iteration 3...5579 pixels reassigned.
kmeans terminated with 5 clusters after 3 iterations.

In [6]: data[10, 10, 10] = np.nan

In [7]: (m, c) = kmeans(data, 5, 3)
Initializing clusters along diagonal of N-dimensional bounding box.
Iteration 1...  0.0%
kmeans terminated with 1 clusters after 0 iterations.

You might want to replace the nan values with values interpolated from adjacent pixels, rather than ignoring the band entirely. With regard to your band 7 values, consider looking at the statistics for that band. If the large spread is not due to a few anomalous pixels, then that band will likely dominate the k-means cluster centers. You could eliminate that by scaling the bands.
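
As one possible reading of "interpolated from adjacent pixels", the sketch below replaces each NaN with the mean of its valid neighbours in a 3x3 window. This is an assumption about what a simple fill could look like, not SPy functionality; `fill_nan_from_neighbors` is a hypothetical helper and it works on a single 2-D band held as a plain NumPy array:

```python
import numpy as np

def fill_nan_from_neighbors(band):
    """Replace each NaN with the mean of its valid 3x3 neighbours (one pass)."""
    filled = band.astype(float)  # astype returns a copy, the input is untouched
    # Pad with NaN so edge pixels also get a full 3x3 window.
    padded = np.pad(filled, 1, mode="constant", constant_values=np.nan)
    rows, cols = np.nonzero(np.isnan(filled))
    for r, c in zip(rows, cols):
        window = padded[r:r + 3, c:c + 3]  # 3x3 window centred on (r, c)
        valid = window[~np.isnan(window)]
        if valid.size:  # leave NaN in place if no neighbour is valid
            filled[r, c] = valid.mean()
    return filled

band = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])
filled = fill_nan_from_neighbors(band)
print(filled[1, 1])  # 5.0
```

A single pass only fills NaN pixels that have at least one valid neighbour, so larger gaps would need repeated passes or a proper interpolation routine.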

I'll consider issuing a warning for data containing nan values. The load method might be a good place to do it since the data will all be in memory at that point.
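
Until such a warning exists in the library, a user-side check before calling kmeans could look like the sketch below. It is not SPy's implementation; `warn_if_nan` is a hypothetical helper that works on any NumPy float array:

```python
import warnings
import numpy as np

def warn_if_nan(data, name="image"):
    """Emit a warning and return True if `data` contains any NaN."""
    n_nan = int(np.isnan(data).sum())
    if n_nan:
        warnings.warn("%s contains %d NaN value(s)" % (name, n_nan))
        return True
    return False

data = np.ones((3, 3, 2))
data[0, 0, 0] = np.nan
has_nan = warn_if_nan(data)  # warns: "image contains 1 NaN value(s)"
```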

@epifanio
Author

Thanks!
I'm now filling the NaN values with a spline interpolation, and I rescaled band 7 to the same range as the other bands using:

def normalize(array, minval=0, maxval=1):
    normalized = minval + ( ((array-array.min())*(maxval-minval)) /(array.max() - array.min()))
    return normalized
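
The function above rescales whatever array it is given as a whole, so it has to be called once per band. A sketch of a per-band variant is shown below, assuming a (rows, cols, bands) NumPy array. The name `normalize_bands` is hypothetical, and the NaN-aware reductions and the guard for constant bands are additions that are not in the original function:

```python
import numpy as np

def normalize_bands(cube, minval=0.0, maxval=1.0):
    """Rescale every band of a (rows, cols, bands) array to [minval, maxval]."""
    cube = np.asarray(cube, dtype=float)
    lo = np.nanmin(cube, axis=(0, 1), keepdims=True)  # per-band minimum
    hi = np.nanmax(cube, axis=(0, 1), keepdims=True)  # per-band maximum
    span = np.where(hi > lo, hi - lo, 1.0)            # avoid 0/0 on constant bands
    return minval + (cube - lo) * (maxval - minval) / span

# Two bands with very different ranges, similar to bands 1 and 7 above.
cube = np.zeros((2, 2, 2))
cube[:, :, 0] = [[0.0, 0.5], [1.0, 0.25]]
cube[:, :, 1] = [[-69.38, -20.56], [-45.0, -30.0]]
scaled = normalize_bands(cube)
```

After scaling, each band spans the same interval, so no single band dominates the Euclidean distances purely because of its units.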

@tboggs changed the title from "Kmeans on 8 band Envi image (1 cluster, 0 iteration)" to "NaN values in kmeans (1 cluster, 0 iterations)" on Mar 17, 2014
@tboggs added this to the v0.15 milestone on Mar 17, 2014
@tboggs
Member

tboggs commented Mar 17, 2014

Good idea. I've updated the title to reflect the issue you encountered. I've also labelled this as an enhancement (to warn/fail when nan is encountered) to be included in the next release.

@epifanio
Author

Thanks a lot for your help! I got my classification working:

SEABASS_KMEANS-test.ipynb

Compared with R, where I was using clara() from the cluster package, the results are pretty similar, but Spectral Python is much faster!

I have a few questions related to data I/O in Spectral Python. Can I ask them here, or should I open a new issue?

@tboggs
Member

tboggs commented Mar 17, 2014

Nice! Yes, please open a new issue/question if it is beyond the scope of the NaN issue.

The envi.save_image function accepts an optional metadata keyword argument. You could try passing in the img.metadata attribute of the original image, which should preserve the ENVI header parameters.

Regarding your notebook, I notice you are viewing the kmeans clusters by calling spectral.imshow(m). That works but it gives you a gray-scale image, which probably isn't best for seeing the different classes. Try viewing it with spectral.imshow(classes=m) instead and see if that doesn't work better for you. Also, SPy progress output usually doesn't play nice in an IPython notebook so you can disable that with

spectral.settings.show_progress = False

@epifanio
Author

Thanks for the hints! I updated the notebook with the latest changes you suggested.
It is at the same link.
I got an error from the envi.save_image function, probably because I'm using it incorrectly.

@tboggs
Member

tboggs commented Mar 18, 2014

envi.save_image expects the image to have shape (nrows, ncols, nbands), but the cluster map, m, only has shape (nrows, ncols). You can get around this by saving the cluster map like this:

envi.save_image('spy.hdr', m[:, :, np.newaxis], metadata=img.metadata)
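
To see what the `np.newaxis` indexing does to the shape, here is a small NumPy-only sketch with a zero array standing in for the cluster map (the 4 x 6 size is arbitrary):

```python
import numpy as np

m = np.zeros((4, 6), dtype=int)   # stand-in for the (nrows, ncols) cluster map
m3 = m[:, :, np.newaxis]          # view with a trailing band axis of length 1

print(m.shape)   # (4, 6)
print(m3.shape)  # (4, 6, 1)
```

The result is a view on the same data, so no copy of the cluster map is made before saving.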

@tboggs tboggs closed this as completed in 3800f32 Mar 18, 2014