Breast Cancer
Prediction

Project Overview

Breast Cancer Prediction is a classification task aimed at predicting the diagnosis of a breast mass as either malignant or benign. The dataset used for this prediction consists of features computed from a digitized image of a fine needle aspirate (FNA) of the breast mass. These features describe various characteristics of the cell nuclei present in the image.

About the Dataset:

The dataset contains the following information for each instance:

  • ID number: A unique identifier for each sample.
  • Diagnosis: The target variable indicating the diagnosis, where 'M' represents malignant and 'B' represents benign.
  • For each cell nucleus, ten real-valued features are computed, which are:

  • Radius: The mean distance from the center to points on the perimeter of the nucleus.
  • Texture: The standard deviation of gray-scale values in the nucleus.
  • Perimeter: The perimeter of the nucleus.
  • Area: The area of the nucleus.
  • Smoothness: A measure of local variation in radius lengths.
  • Compactness: Computed as the square of the perimeter divided by the area minus 1.0.
  • Concavity: Describes the severity of concave portions of the nucleus contour.
  • Concave points: Represents the number of concave portions of the nucleus contour.
  • Symmetry: Measures the symmetry of the nucleus.
  • Fractal dimension: This feature approximates the "coastline" of the nucleus, using the concept of fractal geometry.
  • These features provide quantitative measurements that can be used to assess the characteristics of cell nuclei and aid in distinguishing between malignant and benign breast masses. By training a machine learning model on this dataset, it is possible to develop a predictive model that can assist in the early detection and diagnosis of breast cancer.

    View Notebook Download Dataset