Summary
The goal of the project is to automatically analyse human activities observed in videos. A solution to this problem would enable novel applications: it could be used to create short videos that summarize daily activities to support patients suffering from Alzheimer's disease, or for education, e.g., by providing a video analysis for a hospital trainee that shows whether the tasks have been executed correctly.
The analysis of complex activities in videos, however, is very challenging: activities vary in duration from minutes to hours, involve interactions with several objects that change their appearance and shape, e.g., food during cooking, and are composed of many sub-activities, which can happen simultaneously or in various orders.
While the majority of recent work in action recognition focuses on developing better feature encoding techniques for classifying sub-activities in short video clips of a few seconds, this project moves forward and aims to develop a higher-level representation of complex activities to overcome the limitations of current approaches. This includes handling large temporal variations and the ability to recognize and localize complex activities in videos. To this end, we aim to develop a unified model that provides detailed information about the activities and sub-activities in terms of time and spatial location, as well as the involved pose motion, objects, and their transformations.
Another aspect of the project is to learn a representation from videos that is not tied to a specific source of videos or limited to a specific application. Instead, we aim to learn a representation that is invariant to a change of perspective, e.g., from a third-person to an egocentric perspective, and can be applied to various modalities such as video or depth data without the need to collect massive training data for every modality. In other words, we aim to learn the essence of activities.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/677650
Start date: 01-06-2016
End date: 31-05-2021
Total budget - Public funding: 1 499 875,00 Euro - 1 499 875,00 Euro
CORDIS data
Status: CLOSED
Call topic: ERC-StG-2015
Update Date: 27-04-2024
Geographical location(s)