Training regression model from DataView #1235

Closed
derekendres opened this issue Oct 12, 2018 · 9 comments
Labels
documentation (Related to documentation of ML.NET), question (Further information is requested)

Comments

@derekendres

How do I train a regression model if I bring in the data via a DataView built from an IEnumerable<> that is populated from a custom endpoint? I tried following the cookbook example, but I couldn't find the right trainer to do this.

@sfilipi added the documentation label Oct 15, 2018
@derekendres
Author

I guess I am still stuck. In my scenario I am not reading from a data file (the data comes from a REST API source), so I don't have the reader from TextLoader.CreateReader(....) from which to create the learning pipeline using reader.MakeNewEstimator(...).

@sfilipi
Member

sfilipi commented Oct 17, 2018

See if the dynamic style of the API helps:
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamples.cs#L369

to https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamples.cs#L400

Also, it might be easier if you post a snippet of your code. Which estimator do you want to use first? You can just instantiate it.

@derekendres
Author

Yeah, that helped. I think I can get it going. I had looked at it before, but re-reading it today helped. For some reason I was thinking the EstimatorChain had to start in a certain way. I am going to spend time on it tomorrow, so I will update my response more concretely then.

@derekendres
Author

I think I am at an odd spot, waiting for the .7 version of the NuGet package (or building the source locally). The examples now use MLContext, which isn't in .6 (at least not in the .NET Core package I added from NuGet today), but it looks like once I have that, I should be on the right path with what I have below.

This is the code I have. The commented-out parts currently don't compile, but they are what I expect to use. Basically, I am taking the taxi example and, instead of getting the data from the file, I uploaded the file to a database and want to get it from there.

        IEnumerable<ocsObj> ocsData = dataService.GetRangeValuesAsync<ocsObj>(streamId, first._time.ToString(), 10).Result;

        // Microsoft.ML.Data, MLContext
        //var mlContext = new MLContext();

        LinearRegressionPredictor pred = null;
        var env = new LocalEnvironment();
        var regressionContext = new RegressionContext(env);

        var trainData = env.CreateStreamingDataView(ocsData);


        var dynamicLearningPipeline = new CopyColumnsEstimator(env, "fare_amount", "label")
            .Append(new ConcatEstimator(env, "features", "rate_code", "passenger_count", "trip_time_in_secs", "trip_distance"))
            //.Append(regressionContext.Trainers.Sdca("label", "Features")
            //or
            //.Append(r => (r.label, score: regressionContext.Trainers.Sdca(r.label,r.features,l1Threshold: 0f,maxIterations: 100,onFit: p => pred = p)))
            ;

        var dynamicModel = dynamicLearningPipeline.Fit(trainData);

@artidoro added the question label Oct 27, 2018
@artidoro
Contributor

If you want to play around with the current version of the code (before .7), you can build the packages locally.

Just follow this doc to build locally: https://github.com/dotnet/machinelearning/blob/master/docs/project-docs/developer-guide.md

And then to produce the packages locally you can run the command: build.cmd -buildPackages

@artidoro
Contributor

If you don't have any other questions on this topic, I will close the issue in a few days.

@derekendres
Author

Compiling .7 locally works. Thanks

@derekendres
Author

I can get the context now, but I am still not getting a good result from the linear regression. I am using 0.7.0-preview-27030-0. I'm not sure if I should reopen this issue or move it to a different one...

Basically, I get 0 for my prediction when I run the code below. I get an error if I don't include the Data ColumnName on a property of ocsObj. (Data was added to ocsObj even though it holds no values at all.) The error I get if I don't include the Data ColumnName on the PredictionFunction TSrc type is: System.InvalidOperationException: 'Column 'Data' not found in the data view'.

        IEnumerable<ocsObj> ocsData = dataService.GetRangeValuesAsync<ocsObj>(streamId, first.index.ToString(), 10).Result;
        IEnumerable<ocsObj> ocsData_test = dataService.GetRangeValuesAsync<ocsObj>(streamId_test, first.index.ToString(), 10).Result;

        var mlContext = new MLContext();
        var trainData = mlContext.CreateStreamingDataView(ocsData);
        // testData was not defined in the original snippet; presumably it is built from ocsData_test
        var testData = mlContext.CreateStreamingDataView(ocsData_test);

        // Copy the target into "Label", gather the numeric inputs into "Features", then train
        var dynamicLearningPipeline = new CopyColumnsEstimator(mlContext, "fare_amount", "Label")
            .Append(new ConcatEstimator(mlContext, "Features", "rate_code", "passenger_count", "trip_time_in_secs", "trip_distance"))
            .Append(mlContext.Regression.Trainers.FastTree());

        var dynamicModel = dynamicLearningPipeline.Fit(trainData);

        var metrics = mlContext.Regression.Evaluate(dynamicModel.Transform(testData), "Label");

        var predictionFunc = dynamicModel.MakePredictionFunction<ocsObj, ocsObjPrediction>(mlContext);
        var input = ocsData_test.First();

        var prediction = predictionFunc.Predict(input);

        Console.WriteLine("Prediction {0} \n Actual {1}", prediction.Data, input.fare_amount);

public class ocsObj
{
    [ColumnName("Data")] //needed for prediction function, but i am just assigning to this, because I don't know why this is needed...
    public float Data { get; set; } //added only for make Prediction function
    public float fare_amount { get; set; }
    public float rate_code { get; set; }
    public float passenger_count { get; set; }
    public float trip_time_in_secs { get; set; }
    public float trip_distance { get; set; }
    public string vendor_id { get; set; }
    public string payment_type { get; set; }
    public String index { get; set; }
}

public class ocsObjPrediction
{
    [ColumnName("Data")]
    public float Data { get; set; }
}
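
One thing that may be worth checking, offered as a sketch rather than a confirmed fix: ML.NET regression trainers write their prediction to a column named Score. An output class bound to a column called Data would therefore read back the unset input property (0) instead of the model output, which would also explain why the Data column has to exist on the input type. The class below is a hypothetical replacement for ocsObjPrediction; the Microsoft.ML.Runtime.Api namespace is an assumption for the 0.7 preview and may differ in other versions.

```csharp
using Microsoft.ML.Runtime.Api; // assumed home of ColumnNameAttribute in the 0.7 previews

// Hypothetical output type: binds to the trainer's "Score" column
// instead of the placeholder "Data" column.
public class ocsObjPrediction
{
    [ColumnName("Score")]
    public float Score { get; set; }
}
```

If this is the cause, the Data property on ocsObj should no longer be needed, and the final line would print prediction.Score rather than prediction.Data.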

@derekendres reopened this Oct 30, 2018
@ghost locked as resolved and limited conversation to collaborators Mar 28, 2022