Research suggests that public fear and anger in the wake of a terror attack can each uniquely contribute to policy attitudes and risk-avoidance behaviors. Given the importance of these negatively valenced emotions, there is value in studying how terror events incite fear and anger at various times and locations relative to an attack. We analyzed 36,496 Twitter posts, including re-Tweets, authored in response to the 2016 Orlando nightclub shooting and examined how fear- and anger-related language varied with time and distance from the attack. Fear-related words sharply decreased over time, though the trend was strongest at locations near the attack, while anger-related words slightly decreased over time and increased with distance from Orlando. Comparing these results to users' pre-attack emotional language suggested that distant users remained both angry and fearful after the shooting, while users close to the attack remained angry but quickly reduced expressions of fear to pre-attack levels.
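A simplified version of this word-frequency analysis can be sketched as follows; the fear lexicon and the linear time/distance model are illustrative assumptions, not the study's actual emotion dictionary or statistical model:

```python
import numpy as np

# Hypothetical stand-in for a fear lexicon; the study used a full
# emotion dictionary, not this handful of example words.
FEAR_WORDS = {"afraid", "scared", "terrified", "fear"}

def fear_rate(text):
    """Fraction of tokens in `text` that appear in the fear lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in FEAR_WORDS for t in tokens) / len(tokens)

def fit_trend(hours_since_attack, km_from_attack, rates):
    """Least-squares fit of rate ~ b0 + b1*time + b2*distance."""
    X = np.column_stack([np.ones(len(rates)),
                         hours_since_attack,
                         km_from_attack])
    coef, *_ = np.linalg.lstsq(X, np.asarray(rates), rcond=None)
    return coef  # [intercept, time slope, distance slope]
```

A negative time slope on the fear rate would correspond to the reported decrease in fear-related language over time.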
In the landscape of online social networking sites, many platforms are reaching a scale and longevity that require designers to address the post-mortem data interactions that follow people's deaths. To evaluate the experiences and challenges people face when caring for memorialized profiles, we conducted 28 qualitative interviews with people serving as legacy contacts for memorialized Facebook accounts. We report on who legacy contacts are, their practices and their expectations, and find that people were chosen to be a legacy contact for one major reason: trust. In our analysis we find disconnects between how people understand trust in the context of interpersonal relationships and how trust is technically implemented. We conclude by discussing the persistent challenges of representing the ambiguity of interpersonal trust in impersonal, computational systems.
In recent years, streaming platforms for video games have attracted increasing interest, as so-called "esports" have developed into a lucrative branch of business. As with other sports, watching esports has become a new kind of entertainment medium, made possible by platforms that allow gamers to live-stream their gameplay, the most popular being Twitch.tv. On these platforms, users can comment on streams in real time and thereby express their opinions about events in the stream. Due to the popularity of Twitch.tv, this can be a valuable source of feedback for streamers aiming to improve their reception among a gaming-oriented audience. In this work, we explore the possibility of deriving feedback for video streams on Twitch.tv by analyzing the sentiment of live text comments made by stream viewers in highly active channels. Automatic sentiment analysis on these comments is a challenging task: the language used on Twitch.tv resembles that of an audience in a stadium, shouting as loudly as possible in sometimes disorganized ways. This language differs markedly from standard English, mixing Internet slang and gaming-related language with abbreviations, intentional and unintentional grammatical and orthographic mistakes, as well as emoji-like images called emotes. Classic lexicon-based sentiment analysis techniques therefore fail when applied to Twitch comments. To overcome the challenge posed by this non-standard language, we propose two unsupervised lexicon-based approaches that make heavy use of the information encoded in emotes, as well as a weakly supervised neural-network classifier, trained on the lexicon-based outputs, that is intended to generalize to unknown words through domain-specific word embeddings.
To enable a better understanding of Twitch.tv comments, we analyze a large dataset of comments, uncovering specific properties of their language, and provide a smaller set of comments labeled with sentiment information via crowdsourcing. We present two case studies showing the effectiveness of our methods in generating sentiment trajectories for events live-streamed on Twitch.tv that correlate well with specific topics in the given stream. This enables a new kind of implicit real-time feedback gathering for Twitch streamers and for companies producing games or streaming content on Twitch. We make our datasets as well as our code publicly available for further research.
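As a rough illustration of the emote-centric lexicon idea, one could score comments by combining word and emote polarities. The emote names below are real Twitch emotes, but the polarity values and the weighting scheme are hypothetical stand-ins, not the lexicons actually learned in this work:

```python
# Illustrative polarity assignments; emotes are weighted more heavily
# than plain words, reflecting how much sentiment they carry on Twitch.
EMOTE_POLARITY = {
    "PogChamp": 1.0,     # excitement / hype
    "Kreygasm": 1.0,
    "BibleThump": -1.0,  # sadness
    "FailFish": -1.0,    # disappointment
}
WORD_POLARITY = {"gg": 0.5, "nice": 0.5, "fail": -0.5, "trash": -0.5}

def comment_sentiment(comment):
    """Weighted average polarity over known tokens; emotes count double.
    Returns 0.0 when no token is in either lexicon."""
    score, weight = 0.0, 0.0
    for tok in comment.split():
        if tok in EMOTE_POLARITY:          # emotes are case-sensitive
            score += 2.0 * EMOTE_POLARITY[tok]
            weight += 2.0
        elif tok.lower() in WORD_POLARITY:
            score += WORD_POLARITY[tok.lower()]
            weight += 1.0
    return score / weight if weight else 0.0
```

Averaging such scores over a sliding time window would yield the kind of sentiment trajectory described in the case studies.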
Geolocating Twitter users---the task of identifying their home locations---serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this paper, we propose a guide for a standardized evaluation of Twitter user geolocation by analyzing fifteen models and two baselines in a controlled experimental setting. Models are evaluated using ten metrics over four geographic granularities. We use rank correlations to assess the agreement between these metrics. Our results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. We show that for general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Given the global geographic coverage of this task, we specifically recommend evaluation at micro versus macro levels to measure the impact of the bias in distribution over locations. Although many complex geolocation algorithms have been applied in recent years, a majority-class baseline is still competitive at coarse geographic granularity. We propose a suite of statistical analysis tests, based on the employed metric, to ensure that the results are not coincidental.
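The rank-correlation comparison of metrics can be sketched as follows, assuming each metric assigns a score to every system and there are no tied ranks (the paper's actual analysis spans ten metrics, four granularities, and statistical tests):

```python
import numpy as np

def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between the system rankings induced by
    two evaluation metrics (simplified: assumes no tied scores)."""
    ra = np.argsort(np.argsort(scores_a))  # rank of each system under A
    rb = np.argsort(np.argsort(scores_b))  # rank of each system under B
    n = len(ra)
    d = ra - rb
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

A rho near 1 means the two metrics rank systems almost identically; a low or negative rho signals that the choice of metric changes which systems appear best.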
Online discussions help individuals to gather knowledge and make important decisions in diverse areas from health and finance to computing and data science. Online discussion groups exhibit unique group dynamics not found in traditional small groups, such as staggered participation and asynchronous communication, and the effects of these features on knowledge sharing are not well understood. In this paper we focus on one such aspect: wide variation in group size. Using a controlled experiment with a hidden profile task, we evaluate online discussion groups' capacity to share distributed knowledge when group size ranges from 4 to 32 participants. We found that individuals in medium-sized discussions performed the best, and we suggest that this reflects a tradeoff: larger groups tend to share more facts but have more difficulty than smaller groups in resolving misunderstandings.
Current theories struggle to explain how participants in peer-production self-organize to produce high-quality knowledge in the absence of formal coordination mechanisms. The literature traditionally holds that norms, policies, and roles make coordination possible. However, peer-production is largely free from workflow constraints and most peer-production communities do not allocate or assign tasks. Yet, scholars have suggested that ordered work sequences can emerge in such settings. We refer to sequences of activities that emerge organically as components of "emergent routines". The volunteer nature of peer-production, coupled with high degrees of turnover, makes learning and coordination difficult, calling into question the extent to which emergent routines could be ingrained in the community. The objective of this paper is to characterize the work sequences that organically emerge in peer-production, as well as to understand the temporal dynamics of these emergent routine components. We center our empirical investigation on the peer-production of a set of 1,000 Wikipedia articles. Using a dataset of labelled wiki work, we employ Variable-Length Markov Chains (VLMC) to identify sequences of activities exhibiting structural dependence, cluster the sequences to identify components of emergent routines, and then track their prevalence over time. We find that work is organized according to several routine components and that the prevalence of these components changes over time.
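A minimal sketch of the VLMC idea, using illustrative activity labels rather than the paper's actual coding scheme, counts next-activity frequencies for contexts of varying length and backs off to shorter contexts when predicting:

```python
from collections import defaultdict

def fit_vlmc(sequence, max_order=3):
    """Count next-activity frequencies for every context of length
    0..max_order observed in the activity sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence)):
        for k in range(0, max_order + 1):
            if i - k < 0:
                break
            ctx = tuple(sequence[i - k:i])
            counts[ctx][sequence[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Predict the next activity by backing off from the longest
    matching context to progressively shorter ones."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if ctx in counts:
            nxt = counts[ctx]
            return max(nxt, key=nxt.get)
    return None
```

Contexts whose next-activity distributions differ from their shorter suffixes are the structurally dependent sequences that the clustering step would group into routine components.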
This article examines the principles outlined in the General Data Protection Regulation (GDPR) in the context of social network data. We provide both a practical guide to GDPR-compliant social network data processing, covering aspects such as data collection, consent, anonymization and data analysis, and a broader discussion of the problems emerging when the general principles on which the regulation is based are instantiated to this research area.
Using fitness trackers to generate and collect quantifiable data is a widespread practice aimed at better understanding one's health and body. The intentional design of fitness trackers as genderless or universal is predicated on masculinist design values and assumptions that do not result in "neutral" devices and systems. Instead, ignoring gender in the design of fitness tracking devices marks a dangerous ongoing inattention to the needs, desires, and experiences of women, as well as transgender and gender non-conforming persons. We utilize duoethnography, a methodology emphasizing personal narrative and dialogue, as a tool that promotes feminist reflexivity in the design and study of fitness tracking technologies. Using the Jawbone UP3 as our object of study, we present findings that illustrate the gendered physical and interface design features and discuss how these features reproduce narrow understandings of gender, health, and lived experiences.
The increasing size of data sets with which researchers in a variety of domains are confronted has led to a range of creative responses, including the deployment of modern machine learning techniques and the advent of large-scale "citizen science projects". However, the ability of the latter to provide suitably large training sets for the former is stretched as the size of the problem (and competition for attention amongst projects) grows. We explore the application of unsupervised learning to leverage structure that exists in an initially unlabelled data set. We simulate grouping similar points before presenting those groups to volunteers to label. Citizen science labelling of grouped data is more efficient, and the gathered labels can be used to improve efficiency further when labelling future data. To demonstrate these ideas we perform experiments using data from the Pan-STARRS Survey for Transients (PSST) with volunteer labels gathered by the Zooniverse project, Supernova Hunters, and a simulated project using the MNIST handwritten digit data set. Our results show that, in the best case, we might expect to reduce the required volunteer effort by 87.0% and 92.8% for the two data sets respectively. These results illustrate a symbiotic relationship between machine learning and citizen scientists where each empowers the other, with important implications for the design of citizen science projects in the future.
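The labelling step after grouping can be illustrated as follows; the clustering itself is abstracted away, and the centroids, labels, and nearest-centroid propagation rule are hypothetical simplifications of the pipeline described above:

```python
import numpy as np

def propagate_labels(points, centroids, centroid_labels):
    """Assign each unlabelled point the volunteer label of its nearest
    group centroid, so volunteers label one representative per group
    instead of every point individually."""
    # Pairwise distances: points (n, d) vs centroids (k, d) -> (n, k)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                           axis=2)
    return [centroid_labels[i] for i in np.argmin(dists, axis=1)]
```

Labelling k groups instead of n points is the source of the efficiency gain, at the cost of errors wherever a group mixes classes.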
Social isolation has been identified as a major risk for elderly people living alone because of its association with cognitive decline, depression, and other mental health issues. Ambient Assisted Living (AAL) is identified as a key technology for facilitating independent living and maintaining social connectedness between elderly people, their families, and caregivers. AAL combines the Internet of Things (IoT), smart homes, and machine learning to produce a smart solution that encourages an independent, safe, and socially active life for elderly people within their own homes. In this paper, we propose, develop, implement and validate a novel Internet of Things-based solution that uses passive (i.e., unobtrusive) sensing for real-time monitoring of elderly people in their homes. The significance of the proposed solution lies in the use of machine learning and statistical models to automatically build a personalised model by learning the person's normal behavioural patterns from sensors deployed in the house. It then uses this model to detect significant changes in behavioural patterns, should they occur, that could be a consequence of possible health deterioration. We evaluate the performance of the proposed solution via real-world in-home trials installed in six elderly people's homes for periods of 1.5 to 4 months. A discussion and analysis of the in-home trial outcomes and feedback from the elderly participants conclude the paper.
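The behavioural-change detection component can be caricatured as a per-person baseline plus a deviation test; the use of daily sensor-activation counts and the three-sigma threshold are illustrative assumptions, not the deployed system's actual models:

```python
import statistics

def fit_baseline(daily_counts):
    """Learn a per-person baseline (mean, stdev) of daily sensor
    activations over a training period, so 'normal' is defined by
    the individual's own routine rather than a population norm."""
    return statistics.mean(daily_counts), statistics.stdev(daily_counts)

def is_anomalous(count, baseline, k=3.0):
    """Flag a day whose activity deviates more than k standard
    deviations from the person's learned baseline."""
    mean, sd = baseline
    return abs(count - mean) > k * max(sd, 1e-9)
```

Persistent flags across consecutive days, rather than a single outlier, would be the kind of significant behavioural change worth escalating to caregivers.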
Virtual teams that use integrated communication platforms are ubiquitous in cross-border collaboration. This study explored the use of communication media and team outcomes, both social outcomes and task accomplishment, in multilingual virtual teams. Based on surveys from 96 virtual teams (with 578 team members), the research showed that more time spent in rich communication channels, such as online conferences, increased inclusion and satisfaction, whereas more time spent with written communication that is lower in richness increased the level of task accomplishment. Team members with lower language proficiency felt less included in all collaboration channels, whereas team members with higher language proficiency felt less satisfied with lean collaboration. Also, speakers with limited language proficiency were significantly less likely to view rich tools as helpful for their teams in reaching a mutual decision. Our data support media richness theory in its original context for native and highly proficient English speakers. Our study extends the scope of the theory by applying it to the new context of team members with limited language proficiency. Management should implement a collaboration infrastructure consisting of communication platforms that integrate a variety of media to account for different tasks and different communication needs.