Most users on social media have intrinsic characteristics, such as interests and political views, that can be exploited to identify and track them, thus raising privacy and identity concerns in online communities. In this paper we investigate the problem of user identity linkage on two behavior datasets collected from different experiments. Specifically, we focus on user linkage based on users' interaction behaviors with respect to content topics. We propose an embedding method that models a topic as a vector in a latent space so as to capture its deeper semantics. A user is then modeled as a vector based on his or her interactions with topics. The embedding representations of topics are learned by optimizing a joint objective: the compatibility between topics with similar semantics, the discriminative ability of topics to distinguish identities, and the consistency of the same user's characteristics across the two datasets. The effectiveness of our method is verified on real-life datasets, and the results show that it outperforms related methods. We also analyze failure cases in the application of our identity linkage method. Our analysis shows that factors such as the visibility and variance of user behaviors and users' group psychology can result in mis-linkages. We further examine the detailed behaviors of some representative users to understand the essential reasons their identities are mis-linked, and find that these users exhibit a high level of variance in their behaviors. Based on these experimental results, we introduce a confidence score into identity linkage to provide information about the reliability of the method's results.
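The user-as-topic-vector idea in this abstract can be sketched as follows. This is a minimal illustration under assumed details: the interaction-weighted averaging, the cosine comparison, and all names (`user_vector`, `cosine`) are assumptions for exposition, and the random embeddings stand in for the learned topic vectors; none of this is the paper's actual objective or implementation.

```python
import numpy as np

# Stand-in "learned" topic embeddings; the paper learns these by optimizing
# a joint objective, which is not reproduced here.
rng = np.random.default_rng(0)
num_topics, dim = 5, 4
topic_embeddings = rng.normal(size=(num_topics, dim))

def user_vector(interaction_counts, topics):
    """Model a user as the average of topic embeddings, weighted by how
    often the user interacted with each topic (an assumed aggregation)."""
    w = np.asarray(interaction_counts, dtype=float)
    w = w / w.sum()          # normalize counts to a distribution over topics
    return w @ topics        # weighted average in the latent space

def cosine(u, v):
    """Cosine similarity, used to compare candidate accounts across datasets."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two accounts with similar topical interaction profiles yield nearby vectors,
# which is the signal identity linkage exploits.
acct_a = user_vector([10, 0, 3, 0, 1], topic_embeddings)
acct_b = user_vector([9, 1, 4, 0, 0], topic_embeddings)
print(cosine(acct_a, acct_b))
```

A confidence score like the one the abstract mentions could then be derived from such similarity values, e.g. by thresholding or calibrating them.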
Today's online question and answer (Q&A) services are receiving a large volume of questions. It becomes increasingly challenging to motivate domain experts to provide quick and high-quality answers. Recent systems seek to engage real-world experts by allowing them to set a price on their answers. This leads to a "targeted" Q&A model where users ask questions of a target expert by paying the corresponding price. In this paper, we perform a case study on two emerging targeted Q&A systems, Fenda (China) and Whale (US), to understand how monetary incentives affect user behavior. By analyzing a large dataset of 220K questions (worth 1 million USD), we find that payments indeed enable quick answers from experts, but also drive certain users to game the system for profits. In addition, this model requires experts to proactively adjust their prices to make profits. Those who are unwilling to lower their prices are likely to see their income and engagement decline over time.
Targeted social media advertising based on psychometric user profiling has emerged as an effective way of reaching individuals who are predisposed to accept and be persuaded by the advertising message. This paper argues that in the case of political advertising, this may present a democratic and ethical challenge. Hypertargeting methods such as psychometrics can "crowd out" political communication with opposing views due to individual attention and time limitations, creating inequities in access to information essential for voting decisions. Psychometrics also appears to have been used to spread both information and misinformation through social media in recent elections in the U.S. and Europe. This paper is an applied ethics study of these methods in the context of democratic processes, as compared with purely commercial situations. The ethical approach is based on the theoretical, contractarian work of John Rawls, which serves as a lens through which the author examines whether the rights of individuals, as Rawls attributes them, are violated by this practice. The paper concludes that within a Rawlsian framework, the use of psychometrics in commercial advertising on social media platforms, though not immune to criticism, is not necessarily unethical. In a democracy, however, the individual cannot abandon the consumption of political information, and since using psychometrics in political campaigning makes access to such information unequal, it violates Rawlsian ethics and should be regulated.