Please use this identifier to cite or link to this item: https://hdl.handle.net/1946/33577
Breast cancer screening is usually performed with digital mammography. These screenings generate large data sets that open up possibilities for further study in medical image analysis. Image data sets may appear high-dimensional when every pixel is treated as a dimension. However, for natural images, or images belonging to a given category, the signal can usually be represented with far fewer dimensions. In recent years, advances in low-dimensional representation of images can to a large extent be attributed to deep-learning-based methods. In this study, we apply two unsupervised feature extraction methods to a data set of 4648 mammograms. First, we apply stochastic singular value decomposition (SSVD) to extract 200 features. Second, we train a convolutional autoencoder (CAE) with a bottleneck of size 200. Using the two representations of the images, we carry out a feature-age correlation study. The CAE model features correlate significantly more strongly with age than those of the SSVD model. Finally, we run the features through a correlation study with roughly 20,000 phenotypes. We found 190 breast-related phenotypes that correlate significantly with features from the CAE model, but only 2 that correlate with features from the SSVD model. Our study suggests that CAE models are a good starting point for creating biologically meaningful phenotypes from mammograms.
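The SSVD step and the feature-age correlation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that "stochastic SVD" refers to a randomized range-finder approximation of the SVD, it runs on small synthetic data instead of mammograms, and the function names (`randomized_svd_features`, `feature_age_correlations`), the oversampling parameter, and the data sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def randomized_svd_features(X, k, oversample=10, seed=0):
    """Return k features per row of X from an approximate (randomized) SVD.

    X has one flattened image per row. This is a generic randomized
    range-finder, assumed here to stand in for the SSVD of the abstract.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center every pixel column
    # Random test matrix: sample the column space of Xc.
    omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)  # orthonormal basis of the sampled range
    B = Q.T @ Xc  # small matrix that shares Xc's dominant right singular vectors
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Xc @ Vt[:k].T  # project images onto the top-k directions


def feature_age_correlations(features, age):
    """Pearson correlation between each feature column and age."""
    f = features - features.mean(axis=0)
    a = age - age.mean()
    return (f.T @ a) / (np.linalg.norm(f, axis=0) * np.linalg.norm(a))


# Synthetic stand-in data (NOT real mammograms): 200 "images" of 400 pixels,
# containing one pixel pattern whose strength depends on age, plus noise.
rng = np.random.default_rng(42)
n_images, n_pixels, k = 200, 400, 5
age = rng.uniform(40, 70, size=n_images)
z = (age - age.mean()) / age.std()
pattern = rng.standard_normal(n_pixels)
X = np.outer(z, pattern) + rng.standard_normal((n_images, n_pixels))

features = randomized_svd_features(X, k)
corr = feature_age_correlations(features, age)
print("absolute feature-age correlations:", np.round(np.abs(corr), 3))
```

In the thesis setting, `k` would be 200 and the rows of `X` would be the 4648 flattened mammograms. The same `feature_age_correlations` helper could be applied to the 200 bottleneck activations of a CAE to compare the two representations.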
Filename | Size | Visibility | Description | Format |
---|---|---|---|---|
brynjar_MSc_thesis_final.pdf | 1.14 MB | Open | Complete Text | PDF |