Summary
The objective of this project is to investigate scalability questions arising with a new wave of smart relational data management systems that integrate analytics and query processing. These questions will be addressed by a fundamental shift from centralized processing over tabular data representations, as supported by traditional systems and analytics software packages, to distributed and approximate processing over factorized data representations.
Factorized representations exploit algebraic properties of relational algebra and the structure of queries and analytics to achieve radically better data compression than generic compression schemes, while at the same time allowing processing in the compressed domain. They can effectively boost the performance of relational processing by avoiding redundant computation in the one-server setting, yet they can also be naturally exploited for approximate and distributed processing. Large relations can be approximated by their subsets and supersets, i.e., lower and upper bounds, that factorize much better than the relations themselves. Factorizing relations, which represent intermediate results shuffled between servers in distributed processing, can effectively reduce the communication cost and improve the latency of the system.
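As a rough illustration of the idea, and not of the project's actual system, the sketch below factorizes a small join result as a union of products and computes a count directly on the compressed form. The relation names, the toy data, and the helper functions are all hypothetical.

```python
from itertools import product

# Hypothetical toy relations, both keyed by customer (illustrative data only).
orders = {"ann": ["pizza", "salad", "soup"], "bob": ["pizza", "pasta"]}
cities = {"ann": ["Oxford", "London"], "bob": ["Zurich", "Basel", "Bern"]}

# Factorized join result: for each customer, keep the product
# (items x cities) symbolically instead of expanding it into tuples.
factorized = {c: (orders[c], cities[c]) for c in orders.keys() & cities.keys()}


def flatten(fact):
    """Expand the factorized form into the flat (tabular) join result."""
    return [
        (c, item, city)
        for c, (items, towns) in fact.items()
        for item, city in product(items, towns)
    ]


def size_factorized(fact):
    """Stored values: one per customer plus one per listed item or city."""
    return sum(1 + len(items) + len(towns) for items, towns in fact.values())


def count_tuples(fact):
    """COUNT(*) evaluated on the compressed form, without expanding it."""
    return sum(len(items) * len(towns) for items, towns in fact.values())


flat = flatten(factorized)
flat_size = 3 * len(flat)  # three values per flat tuple

print(len(flat), flat_size, size_factorized(factorized), count_tuples(factorized))
```

In this toy case the flat result stores 36 values while the factorized form stores 12, and the count is obtained by multiplying list lengths rather than enumerating tuples. The gap widens as the per-key lists grow, which is the redundancy the paragraph above refers to.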
The key deliverables will be novel algorithms that combine distribution, approximation, and factorization for computing mixed workloads of queries and predictive and descriptive analytics on large-scale data. This research will yield fundamental theoretical contributions, such as complexity results for large-scale processing and tractable algorithms, as well as a scalable factorized data management system that exploits these theoretical insights. We will collaborate with industrial partners, who have committed to helping provide datasets and realistic workloads, infrastructure for large-scale distributed systems, and support for transferring the products of the research to industrial users.
More information & hyperlinks
Web resources: | https://cordis.europa.eu/project/id/682588 |
Start date: | 01-06-2016 |
End date: | 31-05-2022 |
Total budget - Public funding: | 1 980 966,00 Euro - 1 980 966,00 Euro |
Cordis data
Status: | CLOSED |
Call topic: | ERC-CoG-2015 |
Update Date: | 27-04-2024 |
Geographical location(s)