Analyzing Social Media for Improving Transportation Safety

Source Organization:
University: Carnegie Mellon University
Principal Investigator: Feng Chen and Ramayya Krishnan
PI Contact:
Project Manager: Courtney Ehrlichman
Funding Source(s) and Amounts Provided (by each agency or organization):
Total Dollars: $149,354
Agency ID/Contract/Grant Number:
Start and End Dates: 01/2013-12/2013
Project Status: Complete
Subject Categories:
Abstract: The goal of this project is to develop an online intelligent system that automatically monitors and collects timely and comprehensive information from social media (e.g., blogs, online forums, and Twitter) about the current status of the transportation network and traffic flow to support advanced safety enhancement.

Our proposed approach is composed of five major components:

1) Public Safety Data Extraction. We plan to build a classifier (e.g., SVM) to automatically identify transportation-safety related posts on local social media platforms covering the area of interest. However, training a classifier directly on social media is expensive: the messages are short, high in volume, and full of non-standard abbreviations, which makes them costly to label. It is much cheaper to collect labels for news articles (e.g., from the National Transportation Safety Board), so transfer learning techniques can be applied to build the classifier without directly labeling social media.

2) Heterogeneous Safety Data Modeling. Social media is heterogeneous by nature and has a variety of both entity types (e.g., user, post, hashtag, term, link, mention, location, and time) and relationships (e.g., originator, reply, friendship, and followership). To model this complex data structure, we plan to build a heterogeneous network model for the safety data.

3) Transportation Safety Topic Discovery. Transportation safety covers many different topics, such as road blockage or damage due to heavy snow or floods, missing people swept away by a flood, malfunctioning traffic lights, traffic incidents, and drunk driving, to name but a few. In addition, topics may relate to different geographic locations and time periods. We propose to design a customized spatiotemporal topic model specifically for transportation safety applications.

4) Bias Estimation Using Traditional Traffic Sensor Data. Social media could be a biased sample, and it is important to estimate this bias by cross-validating it against traditional transportation data sources, such as loop detector and camera data, incident reports, and transportation surveys.

5) User Interface and High Level Applications. These will include a regional sentiment index, safety alarms, and safety recommendations.
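To illustrate the idea behind component 1, the sketch below trains a text classifier on labeled news-style sentences and then applies it, unchanged, to short social media posts. It is a minimal illustration only and not the project's method: it swaps in a small multinomial naive Bayes classifier in place of the SVM mentioned above, and it uses plain vocabulary sharing plus abbreviation expansion rather than a real transfer learning technique. The training sentences, the labels "safety" and "other", the abbreviation map, and all function names are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation map (an assumption for this sketch);
# a real system would need a far larger, curated or learned mapping.
ABBREVIATIONS = {"rd": "road", "hwy": "highway", "accdnt": "accident", "trfc": "traffic"}

# Made-up, news-style labeled sentences standing in for the cheap source domain.
NEWS = [
    ("Highway accident closes two lanes after a truck crash", "safety"),
    ("Flood damage blocks the main road and traffic is diverted", "safety"),
    ("Police report a drunk driving crash on the highway", "safety"),
    ("City council approves the new library budget", "other"),
    ("Local team wins the championship game", "other"),
    ("New restaurant opens downtown this weekend", "other"),
]


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and expand known abbreviations."""
    words = re.findall(r"[a-z]+", text.lower())
    return [ABBREVIATIONS.get(w, w) for w in words]


def train(labeled_docs):
    """Collect per-label word counts, per-label document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in the news data are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Train once on the news sentences, then classify unlabeled posts.
model = train(NEWS)
for post in ["accdnt on hwy 28, trfc stopped", "new library opens this weekend"]:
    print(post, "->", predict(model, post))
```

The abbreviation expansion step is what lets vocabulary learned from news text match the informal wording of posts. Words that never occur in the news data contribute nothing to the score, which is one reason a dedicated transfer learning method would be needed in practice.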
Describe Implementation of Research Outcomes (or why not implemented):
Impacts/Benefits of Implementation (actual, not anticipated):
Project URL: