Today's machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets, which may in turn call for multiple models to predict effectively. Also, deploying complex multi-model ML solutions in production can be a challenging task. A common example is when compatibility issues between different frameworks lead to delayed insights. A solution that easily serves various combinations of deep neural nets and tree-based models, and that is framework-agnostic, would help simplify deployment and scale ML solutions as they take on multiple layers.

In this post, I discuss how to leverage the versatility of NVIDIA software to handle different types of models and integrate them into your application. I demonstrate how NVIDIA RAPIDS supports data preparation and ML training for large datasets, and how NVIDIA Triton Inference Server seamlessly serves both deep neural networks built with PyTorch and tree-based models built with XGBoost when predicting credit default. Using the American Express Default Prediction competition as an example, I explain how the multi-model solution can be deployed on either a GPU or a CPU. GPU deployment results in significantly faster inference times. This solution was one of the top 10 solutions out of 4,874 teams in the Kaggle American Express Default Prediction competition.

Future credit default predictions

Credit default prediction is central to managing risk in a consumer lending business. American Express, the largest payment card issuer in the world, provided an industrial-scale dataset that includes time-series behavioral data and anonymized customer profile information. This dataset is highly representative of real-world scenarios: it is large, contains both numerical and categorical columns, and presents a time-series problem. The key to solving this business problem successfully is to uncover the temporal patterns within the data.

Tree-based models and deep neural networks are widely considered to be the most popular choices for ML practitioners. Tree-based models, such as XGBoost, are mostly used for tabular datasets because they can handle noisy, redundant features and make it easy to interpret and understand the logic behind the predictions. Deep neural networks, on the other hand, excel at learning long-term temporal dependencies and sequential patterns in data. They also can automatically extract features from raw data. Recently, deep neural networks have been widely used to generate high-quality new data by exploiting their ability to learn the distribution of the existing data.

Now, I describe how my team used these techniques in our American Express Default Prediction solution.

Essential tools for data preparation and deployment

When you're preparing a complex ML model, there are many steps to prepare, train, and deploy an effective model. RAPIDS and Triton Inference Server both support key phases in the ML process.

RAPIDS is a suite of open-source software libraries and APIs designed to accelerate data science workflows on GPUs. It includes a variety of tools and libraries for data preprocessing, ML, and visualization. In this case, it supports data preprocessing and exploratory data analysis at the beginning of the workflow.

To support deployment, NVIDIA Triton is a high-performance, multi-model inference server, supporting both GPU and CPU, that enables the easy deployment of models from a variety of frameworks, such as TensorFlow, PyTorch, and ONNX. NVIDIA Triton also supports tree-based models such as XGBoost and LightGBM through its Forest Inference Library (FIL) backend, which made it a great fit for the models in our solution.

The aim is to predict whether a customer will default on their credit card balance in the future, using their past monthly customer profile data. The binary target variable, default or no default, is determined by whether a customer pays back their outstanding credit card balance within 120 days of the statement date. Figure 1 shows an overview of the problem and dataset, highlighting the key aspects of credit default prediction and the characteristics of the dataset.

The test dataset is massive, with 900K customers, 11M rows, and 191 columns, including both numerical and categorical features.
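For the deployment side, serving an XGBoost model through the Triton FIL backend comes down to a model repository entry with a `config.pbtxt`. Below is a minimal sketch of such a config, assuming a binary classifier saved in XGBoost JSON format with 191 input features (the post's column count); the model name and batch size are invented, and the exact parameter values should be adapted to the model at hand.

```
name: "amex_xgboost"          # hypothetical model name
backend: "fil"
max_batch_size: 32768
input [
  { name: "input__0", data_type: TYPE_FP32, dims: [ 191 ] }
]
output [
  { name: "output__0", data_type: TYPE_FP32, dims: [ 2 ] }  # per-class probabilities
]
instance_group [ { kind: KIND_GPU } ]   # switch to KIND_CPU for CPU serving
parameters [
  { key: "model_type",    value: { string_value: "xgboost_json" } },
  { key: "output_class",  value: { string_value: "true" } },
  { key: "predict_proba", value: { string_value: "true" } }
]
```

Changing `instance_group` between `KIND_GPU` and `KIND_CPU` is what lets the same model repository back either deployment mode the post compares.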
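To make the label definition concrete: the post defines default as failing to pay back the outstanding balance within 120 days of the statement date. The sketch below derives such a binary target with pandas; the column names (`payoff_date`, and so on) are invented for illustration and are not from the actual anonymized dataset.

```python
import pandas as pd

# Hypothetical illustration of the 120-day default label: a customer is
# labeled 1 (default) if the outstanding balance is never paid off, or is
# paid off more than 120 days after the statement date. Columns are invented.
statements = pd.DataFrame({
    "customer_ID": ["A", "B", "C"],
    "statement_date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]),
    "payoff_date": pd.to_datetime(["2023-02-15", "2023-06-30", None]),
})

deadline = statements["statement_date"] + pd.Timedelta(days=120)
statements["target"] = (
    statements["payoff_date"].isna() | (statements["payoff_date"] > deadline)
).astype(int)

print(statements["target"].tolist())  # → [0, 1, 1]
```

Customer A pays within the window (no default), customer B pays late, and customer C never pays, so both B and C are labeled as defaults.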
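Because each customer contributes multiple monthly statements, a common preprocessing step for this kind of time-series tabular data is aggregating per-customer statistics before training a tree-based model. RAPIDS cuDF mirrors much of the pandas API, so a sketch like the one below (written in pandas so it runs anywhere) typically ports to GPU by switching the import to cuDF. The columns and aggregations here are illustrative, not the solution's actual feature set.

```python
import pandas as pd  # with RAPIDS installed, `import cudf as pd` is the usual GPU swap

# Toy monthly statement data: several rows per customer, as in the
# competition dataset. Column names are invented for illustration.
monthly = pd.DataFrame({
    "customer_ID": ["A", "A", "A", "B", "B"],
    "balance": [100.0, 150.0, 120.0, 300.0, 280.0],
    "spend": [20.0, 35.0, 10.0, 50.0, 40.0],
})

# Collapse the time series into one feature row per customer.
agg = monthly.groupby("customer_ID").agg(
    balance_mean=("balance", "mean"),   # average balance over all statements
    balance_last=("balance", "last"),   # most recent statement balance
    spend_max=("spend", "max"),         # peak monthly spend
).reset_index()

print(agg)
```

Aggregations such as "last statement value" are a simple way to surface the temporal signal the post highlights while keeping the data in a shape tree-based models handle well.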
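Finally, when a deployment serves both a deep neural network and a tree-based model for the same task, their probability outputs are often combined with a weighted average. The blend below is purely illustrative (the actual solution's combination scheme and weights are not described here); it only shows the mechanics of merging the two model families' predictions.

```python
import numpy as np

# Illustrative predictions from the two model families the post serves:
# an XGBoost model and a PyTorch neural net, each emitting P(default).
xgb_probs = np.array([0.10, 0.80, 0.45])
nn_probs = np.array([0.20, 0.70, 0.55])

# Equal weights are an arbitrary choice for illustration; in practice the
# weights would be tuned on a validation set.
blended = 0.5 * xgb_probs + 0.5 * nn_probs

print(np.round(blended, 2))
```

Because Triton serves both models behind one endpoint, a blend like this can run in the client or in a downstream pipeline step, regardless of which framework produced each score.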