Asked by Ashley

I'm currently working on a project to see which data imputation method works best with a dataset I have.
I have the complete dataset.

Dependent (target) variable: Yield of the crop
Independent variables (predictors): Year, Season, Production per hectare

I'm planning to apply data imputation methods such as multiple linear regression, KNN, and polynomial interpolation.

My method is to randomly remove some Yield values (the test set), train each technique on the remaining rows, use it to impute the removed values, and compare the imputed values with the original Yield values.
Then I plan to select the data imputation method which works best for this dataset.
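
The procedure above could be sketched roughly as follows. This is only a minimal sketch, not the actual project code: the DataFrame is synthetic (made-up values), the column names `Year`, `Season`, `Production`, and `Yield` and the season labels are assumptions, and only two of the candidate methods (linear regression and KNN, both from scikit-learn) are shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset (hypothetical values and column names).
n = 200
df = pd.DataFrame({
    "Year": rng.integers(2000, 2020, size=n),
    "Season": rng.choice(["Wet", "Dry"], size=n),
    "Production": rng.uniform(1.0, 5.0, size=n),
})
df["Yield"] = (
    2.0 * df["Production"]
    + 0.05 * (df["Year"] - 2000)
    + np.where(df["Season"] == "Wet", 0.5, 0.0)
    + rng.normal(0.0, 0.2, size=n)
)

# One-hot encode the categorical Season column so both models can use it.
X = pd.get_dummies(
    df[["Year", "Season", "Production"]], columns=["Season"], drop_first=True
).astype(float)
y = df["Yield"]

# Hide 20% of the Yield values at random; these rows play the role of missing data.
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=n // 5, replace=False)] = True

models = {
    "linear regression": LinearRegression(),
    # KNN is distance based, so the predictors are standardized first.
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

# Fit each model on the rows that kept their Yield, impute the hidden rows,
# and score the imputed values against the original ones.
rmse = {}
for name, model in models.items():
    model.fit(X.loc[~mask], y.loc[~mask])
    imputed = model.predict(X.loc[mask])
    rmse[name] = float(np.sqrt(mean_squared_error(y.loc[mask], imputed)))

print(rmse)
```

The method with the lowest error on the hidden rows would be the candidate to pick; repeating the masking with several random seeds should make the comparison less dependent on one particular split.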

Assume this procedure is done in Python (Google Colab environment).

So far I've coded up to the point where the model is trained using an 80:20 train:test split.

I've computed the linear regression coefficients, and the Yield values predicted by the model have already been inserted into my test set.

Since I need graphical and statistical evidence of the efficiency and accuracy of each model, how should I impute Yield values for the whole dataset and compare them with the original Yield values?

Do I have to manually write out the equation of the linear model, substitute the independent variables, compute the Yield values from it, and then compare them with the original Yield values?

Is there any code that automatically adds a column with the Yield values derived from the linear regression model for the whole dataset? Any method that gives estimated values for all the Yield values in the dataset would do.
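
For illustration, one way this might look with scikit-learn and pandas is sketched below. It assumes the model is a scikit-learn `LinearRegression`; the data is synthetic, and the column names (`Year`, `Season`, `Production`, `Yield`, `Yield_pred`, `Residual`) are hypothetical. Calling `predict` on the full feature table applies the fitted coefficients and intercept to every row, so the equation does not need to be written out by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for the real dataset (hypothetical values and column names).
n = 200
df = pd.DataFrame({
    "Year": rng.integers(2000, 2020, size=n),
    "Season": rng.choice(["Wet", "Dry"], size=n),
    "Production": rng.uniform(1.0, 5.0, size=n),
})
df["Yield"] = (
    2.0 * df["Production"]
    + 0.05 * (df["Year"] - 2000)
    + np.where(df["Season"] == "Wet", 0.5, 0.0)
    + rng.normal(0.0, 0.2, size=n)
)

# Numeric feature table; Season is one-hot encoded.
features = pd.get_dummies(
    df[["Year", "Season", "Production"]], columns=["Season"], drop_first=True
).astype(float)

# 80:20 train:test split, as described in the question.
X_train, X_test, y_train, y_test = train_test_split(
    features, df["Yield"], test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Add a column of model estimates for every row, plus the residuals.
df["Yield_pred"] = model.predict(features)
df["Residual"] = df["Yield"] - df["Yield_pred"]

# The same numbers computed by hand from the fitted equation, as a cross-check.
manual = features.to_numpy() @ model.coef_ + model.intercept_

# Accuracy metrics on the held-out 20% only, so the score is not inflated
# by rows the model was trained on.
test_pred = model.predict(X_test)
metrics = {
    "MAE": float(mean_absolute_error(y_test, test_pred)),
    "RMSE": float(np.sqrt(mean_squared_error(y_test, test_pred))),
    "R2": float(r2_score(y_test, test_pred)),
}

print(df[["Yield", "Yield_pred", "Residual"]].head())
print(metrics)
```

For graphical evidence, a scatter plot of `Yield` against `Yield_pred` (points near the diagonal indicate good estimates) or a histogram of `Residual` are common choices.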

4 years ago
