Introduction
Machine learning is now used across many industries because of its ability to make predictions from data. Before building a model, the data usually needs to be cleaned, whether it comes from static files or from APIs. Equally important is understanding the variables a model works with. This article explains the two most basic ones: the independent variable X and the dependent variable y.
X - the independent variable
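Consider a housing price dataset with columns such as "LotArea," "Street," "Alley," and "LotShape." Attributes like these describe each house and serve as the model's inputs. Together they form what machine learning calls the variable 'X' (the features, or independent variables). "Independent" here means these are inputs the model does not try to predict. It does not mean the columns are statistically independent of each other; in practice, features are often correlated. For instance, the following code builds X from seven numeric columns of the training data: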
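import pandas as pd
from sklearn.model_selection import train_test_split
# Paths to the test and training CSV files
test_path = "/content/test (1).csv"
train_path = "/content/train (1).csv"
# Load both files into DataFrames
test_data_full = pd.read_csv(test_path)
train_data_full = pd.read_csv(train_path)
# Columns used as model inputs
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
# Keep only those columns: X is the feature matrix (independent variables)
X = train_data_full[features]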
The code above loads the data and fills X, the independent variable, with a limited set of features: seven numeric columns from the training data.
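The CSV paths above are Colab-style locations that may not exist on your machine, so here is a minimal, self-contained sketch of the same selection step on a tiny hand-made DataFrame. The values are invented placeholders, not rows from the real dataset; only the column names come from the features list above.

```python
import pandas as pd

# Hypothetical toy data: column names match the article's feature list,
# but every value is invented for illustration.
train_data_full = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2000, 1975, 1990],
    "1stFlrSF": [900, 1200, 1000],
    "2ndFlrSF": [800, 0, 850],
    "FullBath": [2, 1, 2],
    "BedroomAbvGr": [3, 2, 4],
    "TotRmsAbvGrd": [8, 5, 7],
    "Street": ["Pave", "Pave", "Grvl"],   # a column we deliberately leave out
    "SalePrice": [200000, 180000, 220000],
})

features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF',
            'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

# Selecting a list of columns returns a new DataFrame:
# one row per house, one column per feature.
X = train_data_full[features]

print(X.shape)                       # (3, 7)
print(list(X.columns) == features)   # True
```

Columns not in the list, such as Street and SalePrice, are simply left out of X.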
y - the dependent variable
The variable 'y' (also called the target) holds the values we aim to predict, which depend on the features provided. In the context of this article's dataset:
import pandas as pd
from sklearn.model_selection import train_test_split
test_path = "/content/test (1).csv"
train_path = "/content/train (1).csv"
test_data_full = pd.read_csv(test_path)
train_data_full = pd.read_csv(train_path)
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
# SalePrice is the target column the model should predict
y = train_data_full.SalePrice
This code selects the dependent variable, y, from the dataset. Here it is the 'SalePrice' column.
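To see how X and y line up, the sketch below builds both from a small toy DataFrame (invented values, with column names borrowed from the article) and checks that every row of X has exactly one target value. Selecting a single column by attribute access, as in `train_data_full.SalePrice`, returns a pandas Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical toy data with invented values; only the column names
# are taken from the article.
train_data_full = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 7000],
    "YearBuilt": [2000, 1975, 1990, 1960],
    "SalePrice": [200000, 180000, 220000, 120000],
})

features = ["LotArea", "YearBuilt"]
X = train_data_full[features]     # 2-D DataFrame of inputs
y = train_data_full.SalePrice     # 1-D Series of targets

print(type(y).__name__)           # Series
print(len(X) == len(y))           # True: one target per row
print(y.index.equals(X.index))    # True: rows are aligned by index
```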
Understanding the distinction between independent ('X') and dependent ('y') variables is the cornerstone of building effective machine learning models. Assigning them correctly matters: if the target column SalePrice were accidentally included among the features, the model could simply read off the answer and would appear far more accurate than it really is.
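To show how X and y are consumed together, here is a minimal sketch. It assumes scikit-learn's `DecisionTreeRegressor` as the model; this is our choice for illustration, since the article does not name one. It also uses `train_test_split`, which the snippets above import, to hold out some rows for validation. The data is synthetic: SalePrice is generated from LotArea and YearBuilt with a made-up formula, so the example runs anywhere.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): the price is a made-up
# function of lot area and build year plus noise, so y genuinely depends on X.
rng = np.random.default_rng(0)
lot_area = rng.integers(5000, 15000, size=100)
year_built = rng.integers(1950, 2010, size=100)
sale_price = 20 * lot_area + 500 * (year_built - 1950) + rng.normal(0, 5000, size=100)

train_data_full = pd.DataFrame({
    "LotArea": lot_area,
    "YearBuilt": year_built,
    "SalePrice": sale_price,
})

features = ["LotArea", "YearBuilt"]
X = train_data_full[features]   # independent variables (inputs)
y = train_data_full.SalePrice   # dependent variable (target), kept out of X

# Hold out 25% of the rows so the model is scored on houses it never saw.
train_X, val_X, train_y, val_y = train_test_split(X, y, test_size=0.25, random_state=1)

model = DecisionTreeRegressor(random_state=1)
model.fit(train_X, train_y)     # learn the mapping X -> y
predictions = model.predict(val_X)

print(len(predictions))         # 25
```

A decision tree is only one option. scikit-learn estimators generally follow the same `fit(X, y)` / `predict(X)` pattern, which is why keeping X and y cleanly separated pays off regardless of the model chosen.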