Research on social mining relies on massive data sets of digital traces of human activities. Many big data sets are already available at the proposers’ labs (see Section 2.2.2 for a detailed table). These include call graphs from mobile phone call data; networks crawled from many online social networks such as Facebook and Flickr; transaction micro-data from diverse retailers; query logs from both search engines and e-commerce sites; society-wide mobile phone call data records; GPS tracks from personal navigation devices; survey data on customer satisfaction and market research; billions of tweets; and data from location-aware social networks. The partners will make these data available for collaborative research through several strategies: sharing open data sets with the scientific community at large, sharing data under disclosure restrictions within the consortium (including on a bilateral basis), and allowing data access within secure environments at each local installation.
In addition, the consortium will develop methods and ad-hoc campaigns for logging and collecting the digital footprints of multiple social dimensions that humans generate when interacting with the physical environment, by exploiting pervasive sensors. These sensors will include a wide variety of smart devices deployed in the environment, such as RFID readers, sensors and actuators, and sensor-rich smartphones, as well as crowd-sensing campaigns in which users participate as data producers. Virtual Access (VA) and Transnational Access (TA) in SoBigData will cover both existing and newly collected datasets. Access through VA will be granted for all datasets whose policies allow open diffusion; conversely, datasets whose access is limited by licensing restrictions will be accessible only through TA. Moreover, to avoid infringing Terms of Service (ToS), some datasets will be offered only through Virtual Data Collections (see T8.1), under both VA and TA. All results of this WP will serve as input to T10.1 and will be integrated into the SoBigData RI.
T8.1: Data Management and Integration Plan. The purpose of the data management plan is to define policies that guide the partners in the collection, description, preservation and sharing of their data sets for virtual and transnational access.
T8.2: Participatory & Opportunistic Crowd Sensing. This task implements the crowd-sensing paradigm in its two distinct forms, participatory and opportunistic. For opportunistic sensing, the infrastructure will offer methodologies for sensing the crowd during ongoing events.
T8.3: Analytical Crawling. This task develops and provides an adaptable, integrated Web crawler that is easy to use and supports the specific requirements of data scientists.
T8.4: Integrating Open Data through the Living Archive platform. Living Archive is a search engine for Open Data that maintains a community-run catalogue of interesting open datasets available on the Web.