What is Training Dataset in Machine Learning

In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms are used to make data-driven predictions or decisions, and to build a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.

A training data set is a sample data set used during the learning process and is used to fit the parameters (e.g, weights) of a predictive data model.

For classification tasks, a supervised learning algorithm looks at the training data set to determine, or learn, the optimal combinations of variables that will generate a good predictive data model. The goal is to produce a trained (fitted) model that generalizes well to new, unknown data. The fitted model is evaluated using “new” examples from the held-out datasets (validation and test datasets) to estimate the model’s accuracy in classifying new data. To reduce the risk of issues such as over-fitting, the examples in the validation and test datasets should not be used to train the model.

In this tutorial, I tried to brief about the training dataset in Machine Learning. Hope you have enjoyed the tutorial. If you want to get updated, like the Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.

Add a Comment