Computer Science > Social and Information Networks
[Submitted on 11 Sep 2017]
Title:Social Media Text Processing and Semantic Analysis for Smart Cities
View PDFAbstract:With the rise of Social Media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to gain valuable insights from these large volumes of freely available user generated content.
With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English language, topic modeling, and transportation-specific text classifiers, as well as, aggregation and data visualization.
We performed an exploratory data analysis of geo-located tweets in 5 different cities: Rio de Janeiro, São Paulo, New York City, London and Melbourne, comprising a total of more than 43 million tweets in a period of 3 months. Furthermore, we performed a large scale topic modelling comparison between Rio de Janeiro and São Paulo. Interestingly, most of the topics are shared between both cities which despite being in the same country are considered very different regarding population, economy and lifestyle.
We take advantage of recent developments in word embeddings and train such representations from the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words to train travel-related classifiers in both Portuguese and English to filter travel-related content from non-related. We created specific gold-standard data to perform empirical evaluation of the resulting classifiers. Results are in line with research work in other application areas by showing the robustness of using word embeddings to learn word similarities that bag-of-words is not able to capture.
Submission history
From: João Pereira Sr. [view email][v1] Mon, 11 Sep 2017 14:30:35 UTC (8,207 KB)
Current browse context:
cs.SI
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.