Data preparation in our example essentially means bringing all four training datasets together (and doing the same for the new-data datasets). The final result of this exercise should be a table with 5298 rows, one row per customer. The data should be ready for training the model, which means performing a series of data transformations before we start building the model.
Dataflow: Account Info
- Conversion of TotalCharges to Numeric.
- Transformation of DOE into separate features for Year and Month, followed by Min-Max normalisation of Year and Onehot transformation of Month. As a result, one new column called Year and 11 Month_xxx columns are created.
- Onehot transformations for ElectronicBilling, ContractType and PaymentMethod are also performed.
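For readers who prefer to see these steps as code, here is a minimal pandas sketch of the same transformations. It is not the dataflow itself, only an illustration: the sample rows, the CustomerID column and the date format of DOE are assumptions, and `get_dummies` creates a column only for values that actually occur in the data.

```python
import pandas as pd

# Hypothetical sample rows: column names follow the text, values are invented.
account = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3"],           # assumed key column
    "TotalCharges": ["29.85", "n/a", "108.15"],
    "DOE": ["2019-01-15", "2020-06-30", "2021-12-01"],  # assumed date format
    "ElectronicBilling": ["Yes", "No", "Yes"],
    "ContractType": ["Month-to-month", "One year", "Two year"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Credit card"],
})

# Convert TotalCharges to numeric; values that cannot be parsed become NaN.
account["TotalCharges"] = pd.to_numeric(account["TotalCharges"], errors="coerce")

# Split DOE into separate Year and Month features.
doe = pd.to_datetime(account["DOE"])
account["Year"] = doe.dt.year
account["Month"] = doe.dt.month

# Min-Max normalisation of Year: (x - min) / (max - min), giving values in [0, 1].
year_min, year_max = account["Year"].min(), account["Year"].max()
account["Year"] = (account["Year"] - year_min) / (year_max - year_min)

# Onehot transformation of Month and the three categorical columns.
# DOE is dropped because Year and Month now carry its information.
account = pd.get_dummies(
    account.drop(columns=["DOE"]),
    columns=["Month", "ElectronicBilling", "ContractType", "PaymentMethod"],
    dtype=int,
)

print(account.head())
```

Note that the normalisation bounds (`year_min`, `year_max`) would have to be reused when preparing new data, otherwise the same year would map to different values in training and scoring.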
- For two columns we need to subtract 1 to get values 0 and 1.
- Onehot transformation is done for two columns.
- Two columns are removed (Country, State).
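The three steps above can be sketched in pandas as follows. The text does not name the affected columns, so every column name here except Country and State is hypothetical, as are the sample values.

```python
import pandas as pd

# Hypothetical sample data: only Country and State are named in the text.
demo = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3"],
    "HasPartner": [1, 2, 2],        # hypothetical column coded as 1/2
    "HasDependents": [2, 1, 1],     # hypothetical column coded as 1/2
    "Gender": ["Male", "Female", "Female"],            # hypothetical
    "Education": ["Graduate", "High School", "Other"],  # hypothetical
    "Country": ["US", "US", "US"],
    "State": ["CA", "NY", "TX"],
})

# Shift the two columns coded as 1/2 down by 1 so they hold 0/1.
binary_cols = ["HasPartner", "HasDependents"]
demo[binary_cols] = demo[binary_cols] - 1

# Onehot transformation for the two categorical columns.
demo = pd.get_dummies(demo, columns=["Gender", "Education"], dtype=int)

# Remove the two columns that are not used for training.
demo = demo.drop(columns=["Country", "State"])

print(demo.head())
```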
- import original dataset TCP Original Train Service DS,
- filter on selected TypeOfService,
- perform onehot transformation on selected TypeOfService using ServiceDetails,
- remove original column for TypeOfService and ServiceDetails.
- Fiber optic,
- DSL and
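The filter-then-onehot pattern for a single service type could look roughly like this in pandas. This is a sketch under assumptions: the service table is taken to be in long format (one row per customer and service), and the CustomerID column and the service type name `InternetServiceType` are hypothetical. Only TypeOfService, ServiceDetails and the values Fiber optic and DSL come from the text.

```python
import pandas as pd

# Hypothetical long-format service data: one row per customer and service type.
services = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "TypeOfService": ["InternetServiceType", "OnlineBackup"] * 3,
    "ServiceDetails": ["Fiber optic", "Yes", "DSL", "No", "DSL", "Yes"],
})


def onehot_service(df: pd.DataFrame, service_type: str) -> pd.DataFrame:
    """Filter on one TypeOfService and onehot-encode its ServiceDetails."""
    # Keep only the rows for the selected service type.
    selected = df[df["TypeOfService"] == service_type]
    # One indicator column per distinct ServiceDetails value.
    dummies = pd.get_dummies(selected["ServiceDetails"], prefix=service_type, dtype=int)
    # TypeOfService and ServiceDetails are left out of the result.
    out = pd.concat([selected[["CustomerID"]], dummies], axis=1)
    return out.reset_index(drop=True)


internet = onehot_service(services, "InternetServiceType")
print(internet)
```

Wrapping the steps in a function makes it easy to repeat them for each service type and then join the resulting tables on the customer key.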
Dataflow: Bringing them all together
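As a rough illustration of what bringing the prepared tables together amounts to, the sketch below joins four small tables on a shared customer key and checks that the result has one row per customer. The table names, the CustomerID key and all values are assumptions made for the example.

```python
from functools import reduce

import pandas as pd

# Four hypothetical prepared tables, each with one row per customer.
account = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"], "TotalCharges": [29.85, 56.95, 108.15]})
demographics = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"], "HasPartner": [0, 1, 1]})
internet = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"], "InternetServiceType_DSL": [0, 1, 1]})
churn = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"], "Churn": [1, 0, 0]})

tables = [account, demographics, internet, churn]

# Join the tables one after another on the customer key.
# validate="one_to_one" raises an error if a key is duplicated on either side.
training = reduce(
    lambda left, right: left.merge(right, on="CustomerID", how="inner", validate="one_to_one"),
    tables,
)

# Sanity check: exactly one row per customer.
assert training["CustomerID"].is_unique
print(training.shape)
```

On the real data, the same check would confirm that the final table has the expected 5298 rows.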
Setting up sequences and automating the whole thing
- Sequence to prepare training data
- Sequence to prepare new (testing) data