Announcement
Postdoc - Computationally efficient kernel non-negative factorization and applications to mass spectrometry-based proteomic data unmixing
July 26, 2024
Category: Postdoctoral researcher
- Laboratory: Gipsa-lab (GAIA department) / CEA (EDyP Lab)
- Advisors: Thomas Burger (CEA, CNRS), Antoine Chatalic (Gipsa-Lab, CNRS)
- Topic: Signal processing, machine learning
- Keywords: NMF, factor analysis, sparse analysis, kernel methods, kernel approximations
- Duration: 24 months
- Gross salary: between 2991€ and 4166€/month, depending on experience
## Subject
Factor analysis techniques, which consist in finding a factorization X≈DW of a matrix of observations X into a dictionary D and corresponding weights W, have proven to be a powerful tool for a variety of applications. We are interested in the setting where a non-negativity constraint is imposed on D and W, which is known as non-negative matrix factorization (NMF) (Gillis, 2020), and where the columns or rows of W may additionally be constrained to be sparse. When modeling a mixing operation that is not strictly linear, more flexibility can be obtained by lifting the columns of X using a non-linear transformation, yielding approaches such as kernel NMF (Li and Ngom 2012).
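For readers less familiar with the factorization X≈DW, the following is a minimal sketch of plain (linear) NMF using the classical multiplicative updates for the Frobenius loss. It is only an illustration of the baseline problem, not the method to be developed in this project; the function name `nmf_multiplicative`, its parameters, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (features x observations) as D @ W.

    Uses the classical multiplicative updates for the squared Frobenius
    loss. Hypothetical helper written for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Strictly positive initialization keeps every update well defined.
    D = rng.random((n_features, rank)) + eps
    W = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so D and W
        # stay non-negative throughout.
        W *= (D.T @ X) / (D.T @ D @ W + eps)
        D *= (X @ W.T) / (D @ W @ W.T + eps)
    return D, W

# Synthetic data with an exact non-negative rank-4 factorization.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 50))

D, W = nmf_multiplicative(X, rank=4)
rel_err = np.linalg.norm(X - D @ W) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Note that this baseline alternates between updating W and D, which is precisely the scheme that becomes costly on large datasets.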
Although many computationally efficient algorithms have been proposed for NMF and sparse factorization (Mensch et al. 2018; Schnass and Teixeira 2020), applying these techniques to large datasets remains challenging, and the case of kernel NMF has been studied less. We propose to work in this direction and develop efficient and theoretically grounded algorithms for kernel NMF. To achieve this goal, we will in particular consider algorithms that learn the dictionary D before learning the weights W, rather than alternating between the two operations. Inspired by work in archetypal analysis (Courty et al. 2014), we will devise greedy selection rules that build the dictionary by iteratively selecting some of the dataset entries. The use of compressive learning strategies, based on a parametric model for the weights, will also be investigated (Gribonval et al. 2021). Kernel approximation techniques such as the Nyström method will also be used to deal efficiently with the high or infinite dimension induced by the data embedding.
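As an illustration of the kind of kernel approximation mentioned above, here is a minimal sketch of the Nyström method with a Gaussian kernel and uniformly sampled landmarks. The choice of kernel, the uniform sampling, the row-wise data layout (one sample per row, unlike the column convention used for X above), and the helper names `gaussian_kernel` and `nystrom_features` are all assumptions made for the example, not choices fixed by the project.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * bandwidth**2))

def nystrom_features(X, n_landmarks, bandwidth=1.0, seed=0):
    """Return Z such that Z @ Z.T approximates the full kernel matrix.

    Landmarks are drawn uniformly at random from the rows of X
    (an assumption for this sketch; other sampling schemes exist).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False)
    landmarks = X[idx]
    K_mm = gaussian_kernel(landmarks, landmarks, bandwidth)
    K_nm = gaussian_kernel(X, landmarks, bandwidth)
    # Pseudo-inverse square root of K_mm via its eigendecomposition,
    # discarding numerically negligible eigenvalues.
    vals, vecs = np.linalg.eigh(K_mm)
    keep = vals > 1e-10 * vals.max()
    inv_sqrt = vecs[:, keep] / np.sqrt(vals[keep])
    return K_nm @ inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))       # 200 samples in dimension 5
K = gaussian_kernel(X, X)               # exact 200 x 200 kernel matrix

Z = nystrom_features(X, n_landmarks=50)
rel_err = np.linalg.norm(K - Z @ Z.T) / np.linalg.norm(K)
print(f"relative error with 50 landmarks: {rel_err:.3f}")
```

The features Z have at most `n_landmarks` columns, so downstream computations can work with a 200 x 50 matrix instead of the full 200 x 200 kernel matrix.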
The proposed methods will be applied to data resulting from high-throughput mass spectrometry-based proteomic analysis of complex biological samples, i.e., data consisting of mass spectra collected over time. In this context, working with kernel embeddings has been shown empirically to provide good performance with clustering models (Permiakova et al. 2021), but has not been investigated with NMF.
## Required Skills
The candidate holds or is about to complete a PhD degree in signal processing, computer science, mathematics, statistics or a closely related field. The research topic involves both a theoretical component and a practical component, the latter consisting in implementing the proposed algorithms for subsequent benchmarking on proteomic data. Some prior knowledge of factor analysis techniques and kernel methods will be appreciated, as well as an interest in working in an interdisciplinary context with applications in biology.
## Contact
Applications (CV and application letter) should be sent by September 15 to:
- antoine.chatalic@cnrs.fr
- thomas.burger@cea.fr
## References
- Courty, Nicolas, Xing Gong, Jimmy Vandel, and Thomas Burger (2014). “SAGA: Sparse and Geometry-Aware Non-Negative Matrix Factorization through Non-Linear Local Embedding”. In: Machine Learning 97.1, pp. 205–226. doi: 10.1007/s10994-014-5463-y.
- Gillis, Nicolas (2020). Nonnegative Matrix Factorization.
- Gribonval, Rémi, Antoine Chatalic, Nicolas Keriven, Vincent Schellekens, Laurent Jacques, and Philip Schniter (2021). “Sketching Data Sets for Large-Scale Learning: Keeping Only What You Need”. In: IEEE Signal Processing Magazine 38.5, pp. 12–36. doi: 10.1109/MSP.2021.3092574.
- Li, Yifeng and Alioune Ngom (2012). “A New Kernel Non-Negative Matrix Factorization and Its Application in Microarray Data Analysis”. In: 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). San Diego, CA, USA: IEEE, pp. 371–378. doi: 10.1109/CIBCB.2012.6217254.
- Mensch, Arthur, Julien Mairal, Bertrand Thirion, and Gael Varoquaux (2018). “Stochastic Subsampling for Factorizing Huge Matrices”. In: IEEE Transactions on Signal Processing 66.1, pp. 113–128. doi: 10.1109/TSP.2017.2752697.
- Permiakova, Olga, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse, and Thomas Burger (2021). “CHICKN: Extraction of Peptide Chromatographic Elution Profiles from Large Scale Mass Spectrometry Data by Means of Wasserstein Compressive Hierarchical Cluster Analysis”. In: BMC Bioinformatics 22.1, p. 68. doi: 10.1186/s12859-021-03969-0.
- Schnass, Karin and Flavio Teixeira (2020). “Compressed Dictionary Learning”. In: Journal of Fourier Analysis and Applications 26.2, p. 33. doi: 10.1007/s00041-020-09738-6.