This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Numerical categorical columns are not supported #442

Open
pieths opened this issue Feb 18, 2020 · 0 comments

pieths commented Feb 18, 2020

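NimbusML only supports string-based categorical columns. Numerical categorical columns (KeyDataViewType) returned from ML.NET are not converted back to their original representation, even though pandas supports numerical categories. See the age_1 column in the output below: it comes back as int32 key values rather than as a category over the original age values, whereas the string-based edu_1 column comes back as a category.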

import numpy
from pandas import Categorical
from nimbusml import Pipeline
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import ToKey

# Load the 'infert' sample dataset into a DataFrame; numeric columns are read as float32.
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                               names={0: 'id'}).to_df()
print(data.head())
print(data.dtypes)

# Convert a numeric column (age) and a string column (education) to key columns.
pipeline = Pipeline([ToKey(columns={'age_1': 'age', 'edu_1': 'education'})])

features = pipeline.fit_transform(data)
print(features.head())
print(features.dtypes)

# pandas itself can build categoricals over string, float, and integer categories.
cat = Categorical.from_codes([0, 1, 2, 1], ['a', 'b', 'c'])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [4.2, 5.1, 6.34])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [10, 11, 12])
print(cat)

Output:

    id education   age  parity  induced  case  spontaneous  stratum  pooled.stratum
0  1.0    0-5yrs  26.0     6.0      1.0   1.0          2.0      1.0             3.0
1  2.0    0-5yrs  42.0     1.0      1.0   1.0          0.0      2.0             1.0
2  3.0    0-5yrs  39.0     6.0      2.0   1.0          0.0      3.0             4.0
3  4.0    0-5yrs  34.0     4.0      2.0   1.0          0.0      4.0             2.0
4  5.0   6-11yrs  35.0     3.0      1.0   1.0          1.0      5.0            32.0
id                float32
education          object
age               float32
parity            float32
induced           float32
case              float32
spontaneous       float32
stratum           float32
pooled.stratum    float32
dtype: object
    id education   age  parity  induced  case  spontaneous  stratum  pooled.stratum  age_1    edu_1
0  1.0    0-5yrs  26.0     6.0      1.0   1.0          2.0      1.0             3.0      0   0-5yrs
1  2.0    0-5yrs  42.0     1.0      1.0   1.0          0.0      2.0             1.0      1   0-5yrs
2  3.0    0-5yrs  39.0     6.0      2.0   1.0          0.0      3.0             4.0      2   0-5yrs
3  4.0    0-5yrs  34.0     4.0      2.0   1.0          0.0      4.0             2.0      3   0-5yrs
4  5.0   6-11yrs  35.0     3.0      1.0   1.0          1.0      5.0            32.0      4  6-11yrs
id                 float32
education           object
age                float32
parity             float32
induced            float32
case               float32
spontaneous        float32
stratum            float32
pooled.stratum     float32
age_1                int32
edu_1             category
dtype: object
[a, b, c, b]
Categories (3, object): [a, b, c]
[4.20, 5.10, 6.34, 5.10]
Categories (3, float64): [4.20, 5.10, 6.34]
[10, 11, 12, 11]
Categories (3, int64): [10, 11, 12]
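
Until this is supported, one possible workaround is to rebuild the numerical category on the pandas side. The sketch below uses only pandas and the first five rows of the output above. It assumes that each age_1 value is a 0-based index into the distinct age values in order of first appearance, which matches those five rows but is not confirmed NimbusML behavior. The names `categories` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

# First five rows of 'age' and 'age_1', copied from the output above.
age = pd.Series([26.0, 42.0, 39.0, 34.0, 35.0], dtype=np.float32)
age_1 = pd.Series([0, 1, 2, 3, 4], dtype=np.int32)

# Assumption: key i refers to the i-th distinct value of 'age',
# counted in order of first appearance.
categories = pd.unique(age)

# Rebuild a categorical whose categories are the original numeric values.
restored = pd.Categorical.from_codes(age_1.to_numpy(), categories=categories)
print(restored)
```

If the assumption about key ordering does not hold for a given pipeline, the categories would need to come from the transform's own key-to-value mapping instead of from `pd.unique`.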