Publications and Unpublished Papers

* denotes equal contribution and/or joint lead authorship.

This page lists relevant papers written for classes, theses, or research. The label on the left-hand side indicates the organization, department, or lab each paper was written or published for.

2020

STS

Social Construction of Privacy: Reddit Case Study

Rajiv Sarvepalli

Abstract Prospectus

In today’s technological age, privacy is becoming an increasingly valuable commodity. With so many companies built on the idea that information is money, the amount of an individual’s information that is public is increasingly concerning. It is public in every sense of the word: not just to a group of people, but to the whole world. Consider the constant data scandals that plague our technological world. Whether it is Facebook, Google, or governments, someone is always getting caught selling, collecting, or losing data in ways that many consider an infringement on their privacy. Therefore, as stewards of these technologies, we must develop preemptive ways of protecting the privacy of the individual in an information-based world focused on the collective. Because society is heterogeneous, perspectives on privacy vary greatly from person to person. This study focuses on Reddit, an anonymous social media platform, since individuals within anonymous social media communities tend to view anonymity as a form of privacy and therefore tend to care about privacy in some manner. To understand these perspectives and definitions of privacy, privacy needs to be analyzed in the context of a society.
VISLANG

Image-Caption Geolocation for Privacy

Rajiv Sarvepalli*, Nicholas Mohammad*, and Ramya Bhaskara*

Motivation Abstract PDF Demo Project

Image geolocation, the task of classifying the location of an input image, is a difficult problem in computer vision with many applications. In recent years, large datasets of geotagged images have become readily available to researchers, and interest in the area has increased. Current state-of-the-art models like img2gps use deep image classification approaches in which the world is split into a quadtree and the model predicts which cell an input image resides in. Unlike these approaches, which focus solely on vision, we propose to include both visual and textual data in our model. Specifically, our model estimates geographic location with a multi-modal model that leverages both an image classifier and a text-based geolocation parser. Our results indicate that hierarchical models improve differentiation between geographically similar locations, and that while a text parser can disambiguate explicit locations in text near perfectly, colloquial, misspelled, and more specific locations are more challenging to disambiguate.
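
The quadtree formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed-depth split of latitude/longitude space; the function name `quadtree_cell`, the digit encoding, and the sample coordinates are illustrative assumptions and not the project's implementation. A real system might instead split cells adaptively, based on how many training images fall in each one.

```python
def quadtree_cell(lat, lon, depth):
    """Return a quadtree cell ID (a string of digits 0-3) for a coordinate.

    Hypothetical helper: each digit picks one quadrant of the current cell,
    so a longer ID names a smaller cell nested inside the shorter one.
    """
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # northern half of the current cell
            quadrant += 2
            south = mid_lat
        else:                   # southern half
            north = mid_lat
        if lon >= mid_lon:      # eastern half of the current cell
            quadrant += 1
            west = mid_lon
        else:                   # western half
            east = mid_lon
        digits.append(str(quadrant))
    return "".join(digits)


# Arbitrary example coordinates (latitude, longitude).
coarse = quadtree_cell(38.03, -78.48, depth=2)
fine = quadtree_cell(38.03, -78.48, depth=3)
```

In this encoding the coarse cell ID is always a prefix of the fine one, so a classifier over cell IDs can be arranged hierarchically: one prediction per depth, each narrowing the previous one.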

In recent years, “anonymous” social media sources have become increasingly popular, with the rise of sites like Reddit, where true identities are typically masked with usernames. With this comes the need to check the content you post to make sure it doesn’t reveal anything about your identity. Our goal with this project is to create a tool that allows users to scan their image and text pairings to see whether they reveal too much personal information. Such a tool could be extended to all forms of social media, allowing users to control the amount of information they share with the world.

2019

SRC

PERSONASCOPE: Defending Against Persona Abuse Attacks

Rajiv Sarvepalli, Yonghwi Kwon

Abstract Poster

The influence of advanced cyber attacks (e.g., Advanced Persistent Threats) is ever-increasing. Notoriously, these attacks frequently use stolen user credentials to frame innocent users for the attacker’s crimes. While attempts have been made to detect abnormalities in user behavior to identify the real users behind explicit credentials (e.g., log-in user names), current modeling of user behavior is limited. Simply detecting anomalies in user actions is insufficient; users are too dynamic and often change their behavior (as they change their roles). As a result, existing techniques have difficulty identifying stolen-credential attacks (also known as persona abuse attacks). To solve this problem, we develop an advanced user modeling technique that automatically synthesizes realistic user models to identify (ideally all) possible user models, including those of attackers. Specifically, we formulate the problem as a search for user models within the complete user model space, which includes all the plausible users made from combinations of user actions (e.g., reading a file, printing, opening an application) based on states. A finite state machine is created to represent all the possible user actions based on states, and each traversal of the finite state machine from a possible starting state provides a user model in the user model space. Later, real users in a real-world business environment can be fitted to a user model, and a change of user behavior can be precisely detected as a transition to a different model. Transitions between user models allow for dynamic modeling of users, since a real user can then be defined as a set of user models. This categorization of users allows for accurate user identification, enabling precise determination of user transitions. Anomaly detection on user transitions from one model to another will provide more precise detection of inappropriate user behaviors, including malicious ones.
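
The idea of deriving user models from traversals of a finite state machine can be sketched as follows. This is only an illustration under stated assumptions: the states, the actions, the `TRANSITIONS` table, the function `enumerate_user_models`, and the bound on traversal length are all hypothetical and are not taken from PERSONASCOPE itself.

```python
# Hypothetical state machine: state -> {action: next_state}.
TRANSITIONS = {
    "idle": {"open_application": "app_open"},
    "app_open": {"read_file": "file_open", "close_application": "idle"},
    "file_open": {"print": "file_open", "close_file": "app_open"},
}


def enumerate_user_models(transitions, start, max_actions):
    """Return every action sequence of length 1..max_actions from `start`.

    Each sequence is one traversal of the state machine and stands in for
    one candidate user model in this sketch.
    """
    models = []
    stack = [(start, ())]
    while stack:
        state, actions = stack.pop()
        if actions:
            models.append(actions)
        if len(actions) == max_actions:
            continue  # bound the traversal so cycles terminate
        for action, next_state in transitions.get(state, {}).items():
            stack.append((next_state, actions + (action,)))
    return models


models = enumerate_user_models(TRANSITIONS, "idle", max_actions=3)

# An observed action log fits a model if it matches one traversal exactly.
observed = ("open_application", "read_file", "print")
fits_known_model = observed in models
```

Bounding the traversal length is a simplification made here so that the enumeration finishes on a machine with cycles; how the actual technique searches the model space is not specified in the abstract.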
