* denotes equal contribution and/or joint lead authorship.
This page includes relevant papers that may have been written for classes, theses, or research. The left-hand side indicates the organization, department, or lab each paper was written or published for.
In today’s technological age, privacy is becoming an increasingly valuable commodity. With so many companies built on the idea that information is money, the amount of an individual’s information that is public is increasingly concerning. It is public in every sense of the word: not just to a group of people, but to the whole world. Consider the constant data scandals that plague our technological world. Whether it is Facebook, Google, or governments, someone is always getting caught selling, collecting, or losing data in ways that many consider an infringement of their privacy. Therefore, as stewards of these technologies, we must develop preemptive ways of protecting the privacy of the individual in an information-based world focused on the collective. Because society is heterogeneous, perspectives on privacy vary greatly from person to person. This study focuses on Reddit, an anonymous social media platform, since individuals within anonymous social media communities tend to view anonymity as a form of privacy and therefore tend to care about privacy in some manner. To understand the perspectives and definitions of privacy, privacy needs to be analyzed in the context of a society.
Image geolocation, the task of classifying the location of an input image, is a difficult problem in computer vision with many applications. In recent years, large datasets of geotagged images have become readily available to researchers, and interest in the area has increased. Current state-of-the-art models like img2gps use deep image classification approaches in which the world is split into a quadtree and the model predicts which cell an input image resides in. Unlike these approaches, which focus solely on vision, we propose to include not only visual data in our model but also textual data. Specifically, our multi-modal model estimates geographic location by leveraging both an image classifier and a text-based geolocation parser. Our results indicate that hierarchical models improve differentiation between geographically similar locations, and that while a text parser can disambiguate explicit locations from text near perfectly, it is more challenging to disambiguate colloquial, misspelled, and more specific locations.
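To make the quadtree formulation concrete, the following is a minimal sketch of how a coordinate can be mapped to a cell label that a classifier could then predict. It assumes a fixed-depth, evenly split quadtree over raw latitude and longitude; the function name `quadtree_cell` and the digit encoding are illustrative assumptions, not the partitioning used in the paper or in any particular model.

```python
def quadtree_cell(lat: float, lon: float, depth: int) -> str:
    """Return a cell id (one digit per level) for the cell containing (lat, lon).

    Assumption: each level splits the current cell evenly into four quadrants,
    encoded as 0 = north-west, 1 = north-east, 2 = south-west, 3 = south-east.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")

    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:
            south = mid_lat      # keep the northern half
        else:
            north = mid_lat      # keep the southern half
            quadrant += 2
        if lon >= mid_lon:
            west = mid_lon       # keep the eastern half
            quadrant += 1
        else:
            east = mid_lon       # keep the western half
        digits.append(str(quadrant))
    return "".join(digits)


# Hypothetical usage: the cell id at a shallower depth is a prefix of the
# id at a deeper depth, which is what makes the labels hierarchical.
paris_coarse = quadtree_cell(48.8566, 2.3522, 2)
paris_fine = quadtree_cell(48.8566, 2.3522, 6)
```

Because a coarse cell id is always a prefix of the finer one, the same labels can serve a coarse-to-fine (hierarchical) classifier, which is one plausible reading of the hierarchical models mentioned above.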
In recent years, “anonymous” social media sources have become increasingly popular, with the rise of sites like Reddit, where true identities are typically masked with usernames. With this comes the need to scan the content you post to make sure it doesn’t reveal anything about your identity. Our goal with this project is to create a tool that allows users to scan their chosen image and text pairings to see whether they reveal too much personal information. The applications of such a tool can be expanded to all forms of social media, allowing users to control the amount of information they share with the world.
The influence of advanced cyber attacks (e.g., Advanced Persistent Threats) is ever-increasing. Notoriously, these attacks frequently use stolen user credentials to frame innocent users for the attacker’s crimes. While attempts have been made to detect abnormalities in user behavior in order to identify the real users behind explicit credentials (e.g., log-in user names), the current modeling of user behavior is limited. Simply detecting anomalies in user actions is insufficient; users are dynamic and often change their behavior (for example, as they change roles). As a result, existing techniques have difficulty identifying stolen user credential attacks (also known as persona abuse attacks).
To solve this problem, we develop an advanced user modeling technique that automatically synthesizes realistic user models to identify (ideally all) possible user models, including those of attackers. Specifically, we formulate the problem as a search for user models within the complete user model space, which includes all the plausible users made from combinations of user actions (e.g., reading a file, printing, opening an application) based on states. A finite state machine is created to represent all the possible user actions based on states, and each traversal of the finite state machine from a possible starting state provides a user model in the user space.
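The idea that each traversal of the state machine yields a user model can be sketched as follows. This is only an illustration under stated assumptions: the states, the actions, the bound on traversal length, and the choice to represent a user model as a tuple of actions are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy state machine: state -> {action: next_state}.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "idle": {"open_app": "app_open"},
    "app_open": {"read_file": "file_read", "close_app": "idle"},
    "file_read": {"print_file": "app_open", "close_app": "idle"},
}


def enumerate_user_models(
    transitions: Dict[str, Dict[str, str]], start: str, max_actions: int
) -> List[Tuple[str, ...]]:
    """Enumerate every action sequence of length 1..max_actions from `start`.

    Assumption: a "user model" is represented as one bounded traversal,
    i.e. a tuple of actions. The bound keeps the search finite even though
    the state machine contains cycles.
    """
    models: List[Tuple[str, ...]] = []
    stack: List[Tuple[str, Tuple[str, ...]]] = [(start, ())]
    while stack:
        state, actions = stack.pop()
        if actions:
            models.append(actions)
        if len(actions) == max_actions:
            continue  # do not extend traversals past the bound
        for action, next_state in transitions.get(state, {}).items():
            stack.append((next_state, actions + (action,)))
    return sorted(models)


candidate_models = enumerate_user_models(TRANSITIONS, "idle", max_actions=3)
```

The number of traversals grows quickly with the bound and with the number of actions per state, so a practical search would likely need pruning or sampling; this sketch simply enumerates everything.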
Real users in a real-world business environment can then be fitted to a user model, and a change in user behavior can be precisely detected as a transition to a different model. Transitions between user models allow for dynamic modeling of users, since a real user can then be defined as a set of user models. This categorization of users allows for accurate user identification, enabling precise determination of user transitions. Anomaly detection on user transitions from one model to another will provide more precise detection of inappropriate user behaviors, including malicious behaviors.
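As a rough illustration of fitting observed behavior to a model and flagging a change as a transition, the sketch below assigns fixed-size windows of an action log to the closest model and reports where the assignment changes. The windowing scheme, the Jaccard similarity measure, and the example models and log are assumptions made for this sketch; the abstract does not specify how fitting is performed.

```python
from typing import Dict, List, Sequence, Set, Tuple


def jaccard(a: Sequence[str], b: Set[str]) -> float:
    """Jaccard similarity between the set of actions in `a` and the set `b`."""
    left = set(a)
    union = left | b
    return len(left & b) / len(union) if union else 1.0


def fit_model(window: Sequence[str], models: Dict[str, Set[str]]) -> str:
    """Return the name of the model most similar to the window.

    Ties resolve to the alphabetically first model name.
    """
    return max(sorted(models), key=lambda name: jaccard(window, models[name]))


def detect_transitions(
    actions: Sequence[str], models: Dict[str, Set[str]], window_size: int
) -> List[Tuple[int, str, str]]:
    """Return (start_index, old_model, new_model) for each change of fitted model."""
    transitions: List[Tuple[int, str, str]] = []
    previous = None
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(actions) - window_size + 1, window_size):
        current = fit_model(actions[start:start + window_size], models)
        if previous is not None and current != previous:
            transitions.append((start, previous, current))
        previous = current
    return transitions


# Hypothetical models (as action sets) and a hypothetical action log.
MODELS = {
    "analyst": {"open_app", "read_file", "print_file"},
    "admin": {"open_app", "create_user", "change_permissions"},
}
LOG = [
    "open_app", "read_file", "print_file",
    "open_app", "read_file", "print_file",
    "open_app", "create_user", "change_permissions",
]
found = detect_transitions(LOG, MODELS, window_size=3)
```

In this toy log the account behaves like the hypothetical `analyst` model for two windows and then like `admin`, so a single transition is reported at index 6. Whether such a transition is benign (a role change) or malicious would be decided by a separate anomaly detector over transitions, as the abstract describes.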