Weld Quality Prediction Using Machine Learning

Overview

As part of an academic project at CentraleSupélec, Louis Gauthier and Clément Florval of Digiwave, together with three other students, developed a machine learning model that predicts weld quality from mechanical, physical, and chemical parameters. The objective was a predictive tool that assesses weld quality without resorting to destructive testing.

[Figure: Weld Quality Prediction Interface]

Project Details

Objective

The primary goals were to:

Predict weld quality using parameters obtainable without destructive testing.
Identify factors influencing key properties like tensile strength, elasticity, and ductility.
Reduce reliance on expert welders by capturing and analyzing welding data.

Data Source

The dataset used for this project is publicly available and can be accessed via the following link: WeldDB Dataset. Compiled by Tracey Cool and H. K. D. H. Bhadeshia from the University of Cambridge, it contains detailed information on various weld samples, including their chemical compositions and mechanical properties.

Challenges and Complexity

The project presented several complexities:

Undefined Target Variables: The dataset did not have explicit target variables representing weld quality. We had to deeply understand the data to define appropriate target variables that indicate weld quality, such as yield strength, ultimate tensile strength, and elongation.
Missing Data: A significant portion of the data had missing values, especially in the target variables. We conducted a thorough analysis of missingness types (MCAR, MAR, MNAR) to decide on appropriate imputation strategies.
Destructive vs. Non-Destructive Tests: Many variables were the results of destructive tests (e.g., tensile strength tests) and cannot serve as input features in real-world scenarios where the weld must stay intact (e.g., bridges, pipelines). We therefore treated these properties as prediction targets and used only non-destructive parameters as inputs.
Variable Interdependencies: Certain variables, like the Charpy impact toughness, were dependent on testing conditions such as temperature, making them challenging to model without proper domain understanding.
Domain Knowledge Integration: The project required integrating welding domain knowledge to make informed decisions during data preprocessing, feature selection, and model development.
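
One way to probe the missingness question raised above is to compare missing rates per column and check whether the absence of a target is associated with an observed variable, which would argue against MCAR. The following is a minimal sketch on synthetic data; the column names and the missingness pattern are hypothetical and are not taken from WeldDB.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a weld table; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "carbon_wt_pct": rng.uniform(0.03, 0.15, n),
    "heat_input_kj_mm": rng.uniform(0.5, 4.0, n),
    "yield_strength_mpa": rng.normal(500, 60, n),
})

# Assumption for the demo: the target is more often missing at high heat input,
# i.e. missingness depends on an observed variable (a MAR-like pattern).
p_missing = np.where(df["heat_input_kj_mm"] > 2.5, 0.6, 0.1)
df.loc[rng.random(n) < p_missing, "yield_strength_mpa"] = np.nan

# 1) Fraction of missing values per column, highest first.
missing_rate = df.isna().mean().sort_values(ascending=False)

# 2) Compare an observed feature between rows with and without the target.
status = df["yield_strength_mpa"].isna().map({True: "missing", False: "observed"})
group_means = df.groupby(status)["heat_input_kj_mm"].mean()

print(missing_rate)
print(group_means)
```

A clear gap between the two group means suggests the target is not missing completely at random. It cannot rule out MNAR, which depends on the unobserved values themselves.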

Solution

Data Exploration and Preprocessing:
- Understanding the Data: Performed an in-depth analysis to identify which variables could serve as targets and which should be inputs. Recognized that variables resulting from destructive tests should not be used as inputs but could be predicted.
- Handling Missing Data: Analyzed missing data patterns and applied appropriate imputation methods. Used semi-supervised learning techniques to fill missing target values, acknowledging that some missingness was due to the nature of the testing methods.
- Feature Engineering: Excluded variables resulting from destructive tests as inputs and focused on non-destructive parameters. Transformed categorical variables and handled outliers effectively.
- Normalization and Standardization: Standardized feature variables to have a mean of 0 and a standard deviation of 1 to improve model performance.
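
The preprocessing steps above could be sketched with a scikit-learn `ColumnTransformer`. This is an illustration under assumptions, not the project's actual pipeline: the column names, the tiny data frame, and the choice of median imputation are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real WeldDB columns may differ.
numeric_features = ["carbon_wt_pct", "manganese_wt_pct", "heat_input_kj_mm"]
categorical_features = ["weld_type"]
destructive_targets = ["yield_strength_mpa", "elongation_pct"]  # never used as inputs

df = pd.DataFrame({
    "carbon_wt_pct": [0.05, 0.08, np.nan, 0.10],
    "manganese_wt_pct": [1.2, np.nan, 1.5, 0.9],
    "heat_input_kj_mm": [1.0, 1.5, 2.0, 2.5],
    "weld_type": ["MMA", "SA", "MMA", "TSA"],
    "yield_strength_mpa": [480.0, 510.0, np.nan, 450.0],
    "elongation_pct": [28.0, np.nan, 25.0, 30.0],
})

# Destructive-test results are dropped from the inputs and kept only as targets.
X = df.drop(columns=destructive_targets)

preprocess = ColumnTransformer(
    [
        # Numeric inputs: fill gaps with the median, then scale to mean 0, std 1.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_features),
        # Categorical inputs: one-hot encode, ignoring categories unseen at fit time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_ready = preprocess.fit_transform(X)
print(X_ready.shape)
```

Wrapping the steps in one transformer means the imputation and scaling statistics are learned on training data only when the object is used inside cross-validation.
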
Correlation Analysis and PCA:
- Correlation Matrix: Computed correlations to identify significant predictors for each target variable.
- Principal Component Analysis: Applied PCA to reduce dimensionality and identify the most impactful variables, aiding in eliminating redundancy.
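
As a rough illustration of these two steps, the sketch below ranks features by absolute correlation with one target and reports how many principal components are needed to retain 95% of the variance. The data are synthetic, and the column names, coefficients, and 95% threshold are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical column names and made-up coefficients.
rng = np.random.default_rng(0)
n = 300
carbon = rng.uniform(0.03, 0.15, n)
manganese = rng.uniform(0.5, 2.0, n)
heat_input = rng.uniform(0.5, 4.0, n)
df = pd.DataFrame({
    "carbon_wt_pct": carbon,
    "manganese_wt_pct": manganese,
    "heat_input_kj_mm": heat_input,
    "yield_strength_mpa": 300 + 1500 * carbon + 80 * manganese
                          - 20 * heat_input + rng.normal(0, 15, n),
})
features = ["carbon_wt_pct", "manganese_wt_pct", "heat_input_kj_mm"]
target = "yield_strength_mpa"

# Pearson correlation of each feature with the target, strongest first.
corr = df[features].corrwith(df[target])
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)

# PCA on standardized features; cumulative explained variance per component.
X_std = StandardScaler().fit_transform(df[features])
pca = PCA().fit(X_std)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = int(np.searchsorted(cumulative_variance, 0.95) + 1)

print(ranked)
print(cumulative_variance, n_components_95)
```

Note that PCA describes variance among the inputs only, so a component with high variance is not necessarily predictive of the target. The correlation ranking and later feature importance analysis address that question.
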
Machine Learning Models:
- Model Selection: Tested various regression models, including Linear Regression, Random Forest, Gradient Boosting, Support Vector Machines, and XGBoost.
- Hyperparameter Tuning: Utilized grid search and cross-validation to optimize model hyperparameters.
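
A minimal sketch of grid search with cross-validation, shown here for a gradient boosting regressor on synthetic data. The parameter grid, the 5-fold split, and the RMSE scoring are assumptions for illustration; the source does not list the grids the team searched.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression problem standing in for the weld data.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Hypothetical search space; real grids would be chosen per model.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(search.best_params_, best_rmse)
```

The same pattern applies to the other candidate models by swapping the estimator and its grid.
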
Semi-Supervised Learning:
- Handling Limited Labels: Implemented semi-supervised learning to leverage unlabeled data, improving the model's ability to generalize from limited labeled instances.
- Iterative Self-Training: Used self-training techniques to iteratively predict missing target values and retrain the model.
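
The source does not specify which self-training variant was used. Since scikit-learn's `SelfTrainingClassifier` covers classification only, the sketch below uses a hand-written loop for regression: it trains a random forest, pseudo-labels the unlabeled rows on which the trees agree most, and retrains. The function name `self_train`, the confidence rule, and the defaults are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def self_train(X_labeled, y_labeled, X_unlabeled, n_rounds=3, top_fraction=0.3):
    """Iteratively pseudo-label the most confident unlabeled rows and retrain."""
    X_l, y_l, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_l, y_l)
    for _ in range(n_rounds):
        if len(X_u) == 0:
            break
        # Per-tree predictions: their spread is a rough confidence proxy.
        per_tree = np.stack([tree.predict(X_u) for tree in model.estimators_])
        spread = per_tree.std(axis=0)
        n_take = max(1, int(top_fraction * len(X_u)))
        confident = np.argsort(spread)[:n_take]  # smallest spread first
        # Move the confident rows into the training set with their predicted values.
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, per_tree[:, confident].mean(axis=0)])
        X_u = np.delete(X_u, confident, axis=0)
        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_l, y_l)
    return model, len(y_l)


# Synthetic demo: 60 labeled rows and 240 rows whose target is treated as missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
model, n_train = self_train(X[:60], y[:60], X[60:])
print(n_train)
```

Pseudo-labels can reinforce the model's own errors, so a held-out set of truly labeled rows is still needed to judge whether self-training helps.
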
Model Evaluation and Feature Importance:
- Validation Protocol: Employed rigorous cross-validation protocols to ensure robust evaluation.
- Metrics: Evaluated models using RMSE and R² score, focusing on standardized targets.
- Feature Importance Analysis: Used the trained model to identify which variables most significantly affect weld quality, providing valuable insights for process optimization.
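
The evaluation steps above might look like the following sketch: cross-validated RMSE and R² on a standardized target, followed by a feature ranking. Permutation importance is used here as one possible technique; the source does not say which importance measure the team used, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold, cross_validate, train_test_split

# Synthetic data: feature 0 dominates, feature 1 matters less, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
# Standardized target (mean 0, std 1). Done on the full set for brevity;
# a stricter protocol would fit the scaling on the training folds only.
y = (y - y.mean()) / y.std()

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    model, X, y, cv=cv, scoring=("neg_root_mean_squared_error", "r2")
)
rmse = -scores["test_neg_root_mean_squared_error"].mean()
r2 = scores["test_r2"].mean()

# Permutation importance on a held-out split: the drop in score when one
# feature is shuffled measures how much the model relies on it.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
importance = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(importance.importances_mean)[::-1]

print(rmse, r2, ranking)
```
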
Web Application Development:
- User Interface: Developed a Flask-based web interface for easy interaction with the model.
- Functionality: Allowed users to input welding parameters manually or upload CSV files for batch predictions.
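
A minimal sketch of how a Flask endpoint could accept either a single set of parameters or a CSV upload for batch predictions. Everything here is an assumption for illustration: the `/predict` route, the feature names, and the placeholder linear model trained on synthetic data stand in for the project's actual interface and trained model.

```python
import io

import numpy as np
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression

# Hypothetical input columns expected by the model.
FEATURES = ["carbon_wt_pct", "manganese_wt_pct", "heat_input_kj_mm"]

# Placeholder model fitted on synthetic data; a real app would load a trained one.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(50, len(FEATURES)))
y_demo = X_demo @ np.array([1.0, 0.5, -0.3])
model = LinearRegression().fit(X_demo, y_demo)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    if "file" in request.files:
        # Batch mode: an uploaded CSV with one weld per row.
        frame = pd.read_csv(io.BytesIO(request.files["file"].read()))
    else:
        # Manual mode: a single JSON object of parameter values.
        frame = pd.DataFrame([request.get_json(force=True)])
    missing = [c for c in FEATURES if c not in frame.columns]
    if missing:
        return jsonify({"error": f"missing columns: {missing}"}), 400
    predictions = model.predict(frame[FEATURES].to_numpy(dtype=float))
    return jsonify({"predictions": predictions.tolist()})
```

Saved as a module, the app can be started with Flask's development server, for example `flask --app <module_name> run`, and queried by posting JSON or a multipart form with a `file` field.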

Key Features

Non-Destructive Evaluation: Predicts weld quality using non-destructive parameters, eliminating the need for physical tests that could damage the weld.
Domain Knowledge Integration: Leveraged expertise in welding to make informed decisions throughout the project.
Advanced Data Analysis: Utilized techniques like PCA, cluster analysis, and feature importance analysis to understand the data deeply and determine the most critical factors affecting weld quality.
User-Friendly Interface: The web app facilitates easy interaction with the model, making it accessible for industrial applications.

Technologies Used

Python: For data processing, model development, and web application.
Pandas and NumPy: Data manipulation and numerical computations.
Scikit-Learn and XGBoost: Machine learning algorithms and model evaluation.
Flask: Web framework for developing the user interface.
Matplotlib and Seaborn: Data visualization.

Results

Accuracy: Achieved an RMSE of 0.5 on standardized target variables (mean = 0, standard deviation = 1), meaning a typical prediction error of about half a standard deviation, and R² scores above 0.8 for key target variables.
Efficiency: Reduced the need for destructive testing by providing reliable predictions based on readily available parameters.
Insightful Analysis: Through feature importance analysis and PCA, identified the most influential variables on weld quality, such as certain chemical compositions and welding parameters.
Practical Application: The web application demonstrates potential real-world usage in industries where weld quality is critical.

[Figure: True vs Predicted Values]

Conclusion

This project underscores the importance of domain knowledge in data science projects, especially when dealing with complex datasets lacking explicit target variables. By thoroughly understanding the welding process and the nature of the data, we were able to:

Define Appropriate Targets: Identified which mechanical properties best represent weld quality, acknowledging that some variables are results of destructive tests and should not be used as inputs.
Handle Data Challenges: Addressed missing data and the complexities of variables that depend on testing conditions, such as Charpy impact toughness, which varies with test temperature.
Leverage Advanced Techniques: Applied semi-supervised learning, PCA, clustering, and feature importance analysis to enhance model performance and uncover hidden patterns.

Recommendations for Obtaining High-Quality Welds:

Optimize Welding Parameters: Use the predictive model and insights from feature importance analysis to adjust parameters like current, voltage, and material composition for better weld quality.
Continuous Data Collection: Regularly collect welding data to refine the model and adapt to new welding technologies or materials.
Training and Knowledge Sharing: Educate welding personnel using insights derived from the data analysis to enhance their understanding of factors affecting weld quality.

Repository and Notebook

Project Repository: Weld Quality Prediction Repository
Main Notebook: Weld Quality Prediction Notebook

This project was developed by Louis Gauthier and Clément Florval of Digiwave, in collaboration with three other students from CentraleSupélec as part of an academic course. For more information about our services, visit Digiwave's Portfolio.
