Data quality and its impact on patient diagnostics

Bots 'N Brains, October 15, 2020

Data quality is an important factor in the success or failure of a machine learning system; in fact, data quality often matters more than the choice of machine learning algorithm. Two main components determine how a machine learning system performs: the dataset and the model. The model learns from the dataset. It cannot learn anything that lies outside this dataset, and the dataset's size and variation determine how easily the model can learn from it. Therefore, data scientists play an important role in how well algorithms scale.

An AI system may fail, or prove unreliable, because the data is not representative or is not suitable for the task at hand. Therefore, to make medical AI more reliable, it is important to ensure the necessary data quality and to ensure that the algorithms are robust enough and fit for purpose. In short, establishing the safety and effectiveness of AI rests on verifying data quality and verifying its compatibility with the algorithm model. Furthermore, because AI has the potential to change over time, verification and validation should not be a one-time premarket activity; they should continue throughout the system's life cycle, from initial design and clinical compliance through post-market use until decommissioning. Continuous assurance of the safety and life-cycle performance of an AI-based device helps regulators, physicians and patients gain confidence in machine learning AI.
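Life-cycle validation can be made concrete with a small post-market monitoring check. The sketch below is a minimal illustration, not a regulatory procedure: it assumes that reference labels eventually become available for batches of deployed predictions, and the function names, the use of plain accuracy and the 0.05 tolerance are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

# A batch pairs reference labels (obtained after deployment) with the model's predictions.
Batch = Tuple[Sequence[int], Sequence[int]]


def batch_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that agree with the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def flag_degraded_batches(
    baseline: float, batches: Sequence[Batch], tolerance: float = 0.05
) -> List[int]:
    """Return indices of post-market batches whose accuracy falls more than
    `tolerance` below the accuracy measured during premarket validation.

    The 0.05 default is a hypothetical value, not a regulatory threshold.
    """
    return [
        i
        for i, (y_true, y_pred) in enumerate(batches)
        if batch_accuracy(y_true, y_pred) < baseline - tolerance
    ]
```

For example, with a validation-time accuracy of 0.90, a later batch scoring 0.70 would be flagged for review while a batch scoring 0.90 would not. Accuracy alone is a blunt signal; a real monitoring plan might also track sensitivity and specificity per patient subgroup.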

There are many factors that contribute to data quality, including the completeness and accuracy of the data; annotation; bias; and consistency in data labeling (e.g., different labels may mean the same thing, but the algorithm treats them differently).
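Two of these factors, completeness and label consistency, are easy to check mechanically. The following Python sketch assumes records are plain dictionaries and uses a hypothetical synonym table (`LABEL_SYNONYMS`); the field names and label strings are illustrative, not taken from any real dataset.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table: raw annotation strings -> one canonical label.
# In practice this would be derived from the annotation guidelines.
LABEL_SYNONYMS: Dict[str, str] = {
    "pneumonia": "pneumonia",
    "pna": "pneumonia",
    "lung infection": "pneumonia",
    "normal": "normal",
    "no finding": "normal",
}

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Fraction of records in which each required field is present and non-empty."""
    if not records:
        raise ValueError("no records to check")
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }


def normalize_labels(raw_labels: List[str]) -> List[str]:
    """Map label variants to canonical labels; unknown variants raise KeyError
    so that a person reviews them instead of the model learning a new class."""
    canonical = []
    for label in raw_labels:
        key = label.strip().lower()
        if key not in LABEL_SYNONYMS:
            raise KeyError(f"unmapped label: {label!r}")
        canonical.append(LABEL_SYNONYMS[key])
    return canonical
```

Given raw labels such as "Pneumonia", "PNA", "No finding" and "normal", an algorithm that treats strings literally sees four classes; after normalization it sees two. Raising on unmapped labels, rather than silently passing them through, forces a human to decide what a new label means.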

Dataset annotations are applied by humans so that the AI solution can learn to detect the labeled variables; as a result, they also carry the annotators' biases.

Any bias in the dataset will affect the performance of the machine learning system. Bias can come from many sources, including population bias, frequency bias and instrument bias.
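Population bias can be screened for by comparing how often each subgroup appears in the dataset with its share of the population the model is meant to serve. This is a minimal sketch; the subgroup labels and the reference shares are assumptions that would have to come from real census, registry or epidemiological data.

```python
from collections import Counter
from typing import Dict, Sequence


def representation_gap(
    subgroups: Sequence[str], reference_shares: Dict[str, float]
) -> Dict[str, float]:
    """For each subgroup in the reference population, return its share of the
    dataset minus its share of the population.

    Positive values mean over-representation, negative values under-representation.
    `reference_shares` is an assumed input, e.g. from census or registry data.
    """
    if not subgroups:
        raise ValueError("dataset has no subgroup labels")
    counts = Counter(subgroups)  # missing subgroups count as 0
    n = len(subgroups)
    return {g: counts[g] / n - share for g, share in reference_shares.items()}
```

A chest X-ray set that is 80% male, checked against a 50/50 reference, gives a gap of about +0.3 for men and -0.3 for women. A subgroup that never appears gets a gap equal to minus its full reference share.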

A system that is inadvertently biased toward one subset of the patient population performs poorly when confronted with another subset, which can eventually lead to healthcare inequalities. When working with quality data, there may also be instances of intentional bias (also known as positive bias), such as a dataset designed for people over the age of 70 to study age-related health issues.
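The performance side of this problem shows up when results are broken down by subgroup rather than reported as one overall number. A minimal sketch, assuming each test case carries a subgroup tag (the tag values used here are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_subgroup(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Accuracy computed separately for each subgroup tag."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / totals[g] for g in totals}
```

An overall accuracy of 4/6 can hide the fact that every error falls in one subgroup: in a made-up case with four "M" cases and two "F" cases, predictions can be perfect for "M" and wrong for both "F" cases.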

When considering a dataset for a machine learning application, it is important to understand the claims it makes. Those claims depend on whether the data can be reproduced, whether the annotations are reliable, and whether a proper balance has been achieved across the classes of the represented population. For example, a dataset may contain chest X-rays from men aged 18-30 in a particular country, half of whom have pneumonia. This dataset cannot be said to represent pneumonia in women. Nor can it be said to represent young men of a particular ethnic group, since that subgroup may not be recorded in the dataset variables and may not be explicitly represented in the sample.
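One way to keep dataset claims honest is to query the dataset's own metadata before stating what it represents. The sketch below assumes each study records sex, age, country and a pneumonia label; the `Study` fields, the `min_per_class` threshold and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Study:
    """Hypothetical per-image metadata; real datasets may record other fields."""
    sex: str         # "M" or "F"
    age: int         # years
    country: str
    pneumonia: bool  # reference label


def subgroup_summary(
    studies: List[Study], predicate: Callable[[Study], bool]
) -> Tuple[int, int]:
    """Return (number of studies, number of pneumonia cases) in the subgroup."""
    members = [s for s in studies if predicate(s)]
    return len(members), sum(s.pneumonia for s in members)


def claim_supported(
    studies: List[Study], predicate: Callable[[Study], bool], min_per_class: int = 30
) -> bool:
    """A claim about a subgroup needs enough positive and negative cases.
    The default of 30 per class is an arbitrary illustrative threshold."""
    n, positives = subgroup_summary(studies, predicate)
    return positives >= min_per_class and (n - positives) >= min_per_class
```

On a made-up set of 100 studies from men aged 18-30, half with pneumonia, the query for women returns (0, 0), so any claim about women is unsupported. A query for ethnicity cannot even be written, because that variable is not recorded.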

An AI model is trained on a dataset. It learns the variables and annotations present in that dataset. In healthcare, most neural networks are trained on a dataset, evaluated for accuracy and then used for inference (e.g., by applying the model to new images).
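The train, evaluate, infer sequence can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier on two numeric features stands in for a neural network; this substitution is for illustration only, and the toy data are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """'Training': average the feature vectors of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Inference: assign the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids: Dict[int, Vector], X: Sequence[Vector], y: Sequence[int]) -> float:
    """Evaluation: fraction of held-out cases classified correctly."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)
```

Training on the points (0, 0) and (0, 1) labeled 0 and (5, 5) and (5, 6) labeled 1 gives centroids (0, 0.5) and (5, 5.5). A held-out test set is then scored with `accuracy`, and `predict` is what would run at inference time on new cases.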

It is important to understand what the model can reliably identify (i.e., the model's claims). Neural networks can generalize to some extent, which allows them to handle data that differ slightly from their training dataset. For example, a model carefully trained on male chest X-rays may work well on the female population, or even with different X-ray devices. The only way to verify this is to evaluate the trained model on a new test dataset. Depending on the model's performance, this can show that the AI accurately diagnoses pneumonia in both male and female patients and that it generalizes across various X-ray machines. There may be minor differences in performance between datasets, but the model may still be much more accurate than a human reader.
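Checking generalization amounts to scoring the same frozen model on more than one test set. In this hypothetical sketch, the model is a simple threshold on a single image-derived score, and the "external device" set mimics a scanner that shifts scores upward; none of these values come from real X-ray data.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical model: maps one image-derived score to a label (1 = pneumonia).
Model = Callable[[float], int]
TestSet = Tuple[Sequence[float], Sequence[int]]


def evaluate(model: Model, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Accuracy of a fixed, already-trained model on one test set."""
    if not labels or len(scores) != len(labels):
        raise ValueError("test set must be non-empty with one label per case")
    return sum(model(s) == t for s, t in zip(scores, labels)) / len(labels)


def evaluate_on_test_sets(model: Model, test_sets: Dict[str, TestSet]) -> Dict[str, float]:
    """Score the same model, unchanged, on each named test set."""
    return {name: evaluate(model, x, y) for name, (x, y) in test_sets.items()}


def threshold_model(score: float) -> int:
    """Illustrative stand-in for a trained classifier; 0.5 is an arbitrary cut-off."""
    return int(score > 0.5)
```

For instance, scores 0.1, 0.2, 0.8 and 0.9 (labels 0, 0, 1, 1) give an accuracy of 1.0, while the same cases shifted to 0.45, 0.55, 1.15 and 1.25 give 0.75, because the second negative case crosses the threshold.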

In summary, AI learns the variables, biases and annotations of a dataset, on the assumption that it can identify an important feature. After training, the algorithm is tested to show that it can detect this feature with a certain level of accuracy. To test a claim that the AI can detect a specific feature, it needs to be tested on a dataset that represents this feature fairly. If it performs satisfactorily on this dataset, the model can claim to detect this feature in future datasets that share the same variables as the test dataset.
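The last condition limits the claim to data that share the test set's variables. A crude way to enforce that at inference time is to record which values each variable took in the test set and flag inputs outside them. This is a hypothetical sketch: the field names are invented, and a min/max range only shows that a value was inside the tested interval, not that it was well represented there.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Observed categorical values and numeric (min, max) ranges of the test set.
Domain = Tuple[Dict[str, Set[Any]], Dict[str, Tuple[float, float]]]


def build_domain(
    test_rows: Iterable[Dict[str, Any]], categorical: List[str], numeric: List[str]
) -> Domain:
    """Summarize which variable values the test dataset actually covered."""
    rows = list(test_rows)
    if not rows:
        raise ValueError("test set metadata is empty")
    cats = {f: {r[f] for r in rows} for f in categorical}
    nums = {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in numeric}
    return cats, nums


def in_tested_domain(sample: Dict[str, Any], domain: Domain) -> bool:
    """True only if every variable of `sample` falls inside what was tested."""
    cats, nums = domain
    return all(sample[f] in values for f, values in cats.items()) and all(
        lo <= sample[f] <= hi for f, (lo, hi) in nums.items()
    )
```

If the test set contained only men aged 18-30 imaged on devices "A" and "B", an image of a woman, an image from device "C", or a 70-year-old patient would all fall outside the tested domain, so the model's accuracy claim would not extend to them.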

The following example shows a poor-quality dataset whose flawed relationship with the algorithm model caused failures in the output.

A machine learning classifier [1] analyzed photographs to distinguish between wolves and huskies. Instead of identifying the features that distinguish the two animals, the system learned that the photos of wolves had snow in the background while the photos of huskies did not. The system's conclusions held on its training data but failed in real-world use, because an extraneous and inappropriate variable (the background) was included in the training dataset. [2] This is an example of how an AI system can detect random patterns or correlations in a dataset and assign them a false or meaningless causal relationship.
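The same failure can be reproduced in miniature. The sketch below is a toy, not the model from [1]: each photo is reduced to two hand-made binary features, an animal feature and whether snow is in the background, and a one-feature decision stump picks whichever feature best separates the training labels. In the invented training set, every photo of one class has snow, so the stump chooses snow.

```python
from typing import Sequence, Tuple

# Each toy "photo" is (animal_feature, snow_background), both 0 or 1.
# Label 1 is whichever class happened to be photographed on snow in training.
Sample = Tuple[int, int]
Stump = Tuple[int, bool]  # (feature index, invert the feature?)


def stump_predict(stump: Stump, x: Sample) -> int:
    """Predict the label directly from one binary feature (optionally inverted)."""
    index, invert = stump
    return 1 - x[index] if invert else x[index]


def stump_accuracy(stump: Stump, X: Sequence[Sample], y: Sequence[int]) -> float:
    return sum(stump_predict(stump, x) == t for x, t in zip(X, y)) / len(y)


def fit_stump(X: Sequence[Sample], y: Sequence[int]) -> Stump:
    """Pick the single feature (and polarity) with the best training accuracy."""
    candidates = [(i, inv) for i in range(len(X[0])) for inv in (False, True)]
    return max(candidates, key=lambda s: stump_accuracy(s, X, y))


# Training set: snow perfectly tracks the label; the animal feature is only 75% reliable.
TRAIN_X = [(1, 1), (1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0)]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]
# Test set: same animals, but the backgrounds are swapped.
TEST_X = [(1, 0), (1, 0), (1, 0), (0, 0), (0, 1), (0, 1), (0, 1), (1, 1)]
TEST_Y = [1, 1, 1, 1, 0, 0, 0, 0]
```

On the training data the snow stump is perfect (1.0), but on the test set with swapped backgrounds it gets every case wrong (0.0), while the stump based on the animal feature still reaches 0.75. High training accuracy alone says nothing about which variable the model relied on.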

About Prathamesh Gosavi
