Data processing algorithms

Summary
The work within this deliverable will start by defining in detail the Big Data processing functionality required by the platform to support users. Requirements will be defined for three broad categories of functionality: data management, data mining, and real-time stream processing.

The partners will identify all the sources of data, both internal and external, that will be handled by the platform. For internal data sources, the partners will specify the expected data characteristics: format, data rate, and transfer method (e.g. streaming or batch uploads). For external data sources, the partners will also specify volume and access method. Expected future data sources will also be considered.

Once the data sources have been identified and described in technical detail, the partners will specify the types of storage methods and tools employed for storing and processing this data (distributed file systems, key-value databases, document databases, etc.), the types of preprocessing performed on data from each source, and the interface between the data management layer and the higher services layer, including the types of queries and aggregations supported by the data management layer.

The partners will identify the types of machine learning and data mining tasks involved in producing the secondary data to be provided to the various stakeholders. They will also identify the relevant data sources and define appropriate features for each data mining task. Classification labels on training datasets will be provided for supervised learning tasks. The partners will address issues such as concept drift and will define clear and realistic performance goals, such as accuracy and recall, for each data mining task.

The partners will specify the real-time alerts to be detected by the platform. Exact performance goals will be set for each alert, including false positive and false negative rates and detection latency. The relevant data sources will be specified, and triggering conditions and thresholds will be set for each alert. The partners will consider employing machine learning when the conditions or thresholds for triggering an alert are not clear.
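
As an illustration of how a false positive or false negative goal for a threshold-based alert might be checked against labelled historical data, a minimal Python sketch follows. It is not part of the deliverable: the names `AlertRates` and `evaluate_threshold_alert`, the sample readings, and the rule "fire when the value exceeds the threshold" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlertRates:
    """Hypothetical container for the two error rates of one alert."""
    false_positive_rate: float  # share of non-events on which the alert fired
    false_negative_rate: float  # share of true events the alert missed


def evaluate_threshold_alert(
    values: Sequence[float],
    is_event: Sequence[bool],
    threshold: float,
) -> AlertRates:
    """Score a simple 'value > threshold' alert against labelled data."""
    if len(values) != len(is_event):
        raise ValueError("values and is_event must have the same length")

    tp = fp = tn = fn = 0
    for value, event in zip(values, is_event):
        fired = value > threshold  # assumed triggering condition
        if fired and event:
            tp += 1
        elif fired and not event:
            fp += 1
        elif not fired and event:
            fn += 1
        else:
            tn += 1

    # Guard against division by zero when one class is absent from the data.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return AlertRates(false_positive_rate=fpr, false_negative_rate=fnr)


# Made-up readings and labels, for illustration only.
readings = [0.2, 0.9, 0.4, 0.95, 0.7, 0.1]
labels = [False, True, False, True, False, False]

for candidate in (0.6, 0.92):
    rates = evaluate_threshold_alert(readings, labels, candidate)
    print(candidate, rates)
```

Comparing the rates for several candidate thresholds in this way shows the trade-off between false positives and false negatives. If no threshold meets both goals, that is one sign that a learned model may be worth considering for the alert.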