It’s summer and the topic of conversation is the fleeting nature of our college years. Two years done with just two more to go, it’s all anyone can think about. And though the reminiscing tends to be forlorn, it’s still helpful to reflect on successes and failures for the future.
For someone raised in one house, college was the biggest lifestyle shift teenage me had seen thus far, but any unease that caused was drowned out by excitement. I had learned that college-related questions from relatives and family friends were just excuses for them to monologue about the ‘good ol’ days’ at XYZ University. This hadn’t been the case for high school, where reviews were mixed. A gone-too-soon bliss for those who found their clique. A living nightmare for those who didn’t. In college, gone were the days of teenage awkwardness and constant pressure to fit in. New in town: maturity and independence. Also promised were meaningful friendships built to last.
But moving into my dorm breathing through an N95 and having not been in a classroom in six months, the only thing promised was uncertainty. It took time to relearn how to socialize. Small talk seemed to fall to the floor and conversations failed to extend beyond hometowns and intended majors. It was hard not to be cynical about the development of friendships.
note from the literature: Teenage relationships moved online during the lockdown.1
“Over 12% of adolescents reported using social media more than 10 hours a day [during lockdown].” The increase in social media usage was found to be correlated with higher rates of depression. There was certainly a craving for in-person friendships; we just had to relearn talking and not typing.
1Ellis, W. E., Dumas, T. M., & Forbes, L. M. (2020). Physically isolated but socially connected: Psychological adjustment and stress among adolescents during the initial COVID-19 crisis. Canadian Journal of Behavioural Science, 52, 177–187.
Two years later, though the masks aren’t fully off, the socialization rust is. Club meetings have locations instead of Zoom links. Professors’ office hours are actually in their offices. Even the dining halls are back to their usual overcrowded capacities.
I’m fortunate to have made countless friends, some of whom even pass the “wedding test” (whether we’re close enough that I could envision inviting them to my eventual wedding). I’ve never been in a better place socially and for that I’m thankful.
Despite all that success, the trials and tribulations of young adult fraternizing raise interesting questions about what it takes to make a friendship. What are the initial signs of a strong connection? Why do we lose some friends? With the sudden exposure to a vast ocean of people, how quickly can we recognize a lifelong friendship?
Dataset Introduction
The dataset consists of two years of messages from an iMessage group chat containing 31 participants whom I consider some of my closest friends in college. In total there are 96,467 messages spanning from the beginning of our freshman year to now. Each message is timestamped.
To learn how to pull your own iMessage data and clean it for analysis, check out my guide.
Relationship Metric
To measure relationships within the group over time, I calculate a relationship metric. The metric is intended to measure the engagement between each pair of people in the group. It is a linear combination of two relative frequencies: how often the two people react to each other's iMessages and how often they mention one another.
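As a rough illustration of the linear combination described above, here is a minimal Python sketch. The weights, the function names, and the reading of "relative frequency" as a pair's share of all reactions (or mentions) in a time window are assumptions made for this example; the article does not specify them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights for the linear combination; the article does not state them.
W_REACTION = 0.5
W_MENTION = 0.5


def pair_key(a, b):
    """Order-independent key so (ana, ben) and (ben, ana) count as one pair."""
    return tuple(sorted((a, b)))


def relationship_scores(reactions, mentions, people):
    """Score every pair of people for one time window (for example, one month).

    reactions: list of (reactor, message_author) tuples
    mentions:  list of (sender, mentioned_person) tuples
    """
    reaction_counts = Counter(pair_key(a, b) for a, b in reactions if a != b)
    mention_counts = Counter(pair_key(a, b) for a, b in mentions if a != b)
    total_reactions = sum(reaction_counts.values()) or 1  # avoid dividing by zero
    total_mentions = sum(mention_counts.values()) or 1

    scores = {}
    for pair in combinations(sorted(people), 2):
        rel_reactions = reaction_counts[pair] / total_reactions
        rel_mentions = mention_counts[pair] / total_mentions
        scores[pair] = W_REACTION * rel_reactions + W_MENTION * rel_mentions
    return scores


# Made-up events for three made-up people.
reactions = [("ana", "ben"), ("ben", "ana"), ("ana", "cy"), ("cy", "ben")]
mentions = [("ana", "ben"), ("ben", "cy")]
scores = relationship_scores(reactions, mentions, ["ana", "ben", "cy"])
```

Running this once per month would give each pair a monthly series of scores, which is the kind of time series the rest of the analysis works with.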
When I attempted to use sentiment analysis to classify positive and negative engagement, genuinely negative engagement turned out to be very rare. Also, a decline in relationship strength between two people seems to cause a lack of engagement more than it does negative engagement. Thus, I chose to treat all engagement as positive engagement. This current metric is simple. Future project goals include incorporating unsupervised conversation disentanglement to allow for more hidden engagement to be included. For instance, I would want to measure how often person A participates in the same conversations as person B.
The end result? Here is what two years of friendships (between 11 sample people) look like:
From a graphical perspective, several storylines are recognizable.
The Common Shapes of Friendship
A falling out. A friendship that picked up quickly and reached a peak far above the rest. Perhaps it was pre-destined to crash and burn like a meme stock artificially pumped by an army of Reddit users. Was there a particular fatal argument that could have been avoided? I honestly can’t remember.
Initially strong with a slow decline. Barring a late-stage uptick, never in a state of improvement. A failure by almost any measure. Or was it? Nobody is meant for everybody. At least this time we could move on quickly.
A slow success. Here’s one that took time to grow and hasn’t stopped rising. A success. These are the ones that validate all the agonizing small talk. But why so slow? Those first few months seem like such wasted time. Maybe, for better or worse, we’re just different people than we once were.
These few patterns are easily identifiable, but they raise the question: can we use data to predict sustained friendship on a larger scale?
Machine Learning Methodology
We can treat measuring the predictability of sustained friendship as a time series classification task. Specifically, I wanted to find out how the predictive accuracy changes as we consider data increasingly distant from the initial point of meeting. That is, I wanted to answer the question: how many months into a friendship can we accurately determine whether it will last?
Friendships were labeled as ‘strong’ or ‘weak’ based on the strength of the relationship at the two-year mark (now). A major issue with our data is sparsity. To combat this issue, classification was done using seven models trained on the mean, slope, and standard deviation of the intervals between months. This interval-based approach is based on Time Series Forest (TSF) models which perform comparatively well with sparse series.
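To make the interval features concrete, here is a minimal sketch of computing the mean, standard deviation, and slope over slices of a monthly series. The interval boundaries and the sample scores are hypothetical; a real Time Series Forest samples its intervals at random, and this sketch is not the article's actual pipeline.

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return numerator / denominator


def interval_features(series, intervals):
    """Return [mean, std, slope] for each (start, end) slice of a monthly series."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features.extend([mean(window), pstdev(window), slope(window)])
    return features


# Hypothetical monthly relationship scores for one pair of friends.
monthly_scores = [0.0, 0.1, 0.2, 0.3, 0.0, 0.0]
# Hypothetical fixed intervals, given as (start, end) month indices.
features = interval_features(monthly_scores, [(0, 4), (3, 6)])
```

Summarizing each interval with three numbers is part of what makes this approach tolerant of sparse series: a month with no activity shifts the summary statistics slightly instead of leaving a hole in the input.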
The models' accuracies were calculated given more and more months of data (from one month to nine months).
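The evaluation loop can be sketched as follows. This is only an illustration under loud assumptions: it swaps in a nearest-centroid classifier with leave-one-out scoring (the article does not say which validation scheme its seven models used), it feeds in raw monthly values rather than interval features, and the toy series are invented for the example.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy: hold out each friendship once and predict it."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def accuracy_by_months(series_list, labels, max_months):
    """Accuracy when only the first k months of each series are visible."""
    return {
        k: loo_accuracy([s[:k] for s in series_list], labels)
        for k in range(1, max_months + 1)
    }


# Invented toy data: every friendship looks identical in month one.
series_list = [
    [0.2, 0.3, 0.5, 0.6],
    [0.2, 0.4, 0.6, 0.7],
    [0.2, 0.2, 0.1, 0.0],
    [0.2, 0.1, 0.1, 0.0],
]
labels = ["strong", "strong", "weak", "weak"]
curve = accuracy_by_months(series_list, labels, 4)
```

Plotting `curve` against the number of visible months gives the kind of accuracy-over-time picture discussed here, with the caveat that the toy data is far cleaner than real group-chat data.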
From just the first four months of a friendship, we can predict its strength after two years with accuracy that varies between the models (roughly 50% to 65%). Looking further at the first six months of friendship, the accuracy increases as the models converge to around 66%. By nine months the models have plateaued at around 68%, with the support vector machine model performing the best (above 70%). Randomly guessing the success of a friendship would have an accuracy of 50% and perfect prediction would have 100%.
From the results, we can draw several conclusions, the most important of which is that friendship takes time to reveal itself.
Though an immediate connection may hasten the bonding process, a true connection requires chemistry at a depth of character that isn’t quickly revealed. The aspects of your character that dictate who you can form a bond with are largely intangible. More important than your sense of humor or niche academic interests, they are the core values that determine your actions and decisions.
note from the literature: Common interests may be overvalued.2
“Geographic proximity and race are greater determinants of social interaction than are common interests.” My familial global ties are eclectic and I can attest to having found more shared values among peers of similar backgrounds because of it. That’s not to say race is the sole determinant of one’s values, but it certainly plays a role. It’s why the commonalities I’ve found with diverse students are far more meaningful than academic and athletic interests.
2David Marmaros, Bruce Sacerdote. How Do Friendships Form?, The Quarterly Journal of Economics, Volume 121, Issue 1, February 2006, Pages 79–119.
Also important to note are the challenges added by personal growth. College exposes you to ideas and environments outside your comfort zone. Like a snowball rolled through fresh powder, you evolve with the novel challenges thrown at you. And though that development can be congruent with those in your circle, it can also push you in opposite directions. I’m proudly a much different person than the one that first set foot on campus. I question whether I now would be friends with that past version of myself. And I definitely doubt his ability to pick the best friends for me now.
Even with time, friendship isn’t completely predictable. Random occurrences can bring otherwise unalike people together. Like with most things, luck plays a role. Many of my friends I met simply because they lived in my dorm, attended my classes, or joined my clubs.
note from the literature: Luck plays a role.2
“We find that long-term friendships grow from chance meetings and that small and random differences in proximity have a big impact on our circle of friends.”
So the optimal strategy? Cast a wide net.
In parameterizing the machine learning models, one has to make decisions based on the precision-recall trade-off. Models can be adjusted to classify a high number of positive cases correctly, but that often means a high number of negative cases will be classified incorrectly. That forces the engineer to consider which mistake is more costly: incorrect positive classification or incorrect negative classification?
Consider a pandemic-related example. One COVID test identifies 83% of positive cases as positive, but also incorrectly classifies 40% of negative cases as positive. Another test identifies only 56% of positive cases as positive, but incorrectly classifies only 8% of negative cases as positive. Which test is preferable depends on whether false-positive or false-negative cases are more harmful. For COVID, the answer is typically false-negative because missing positive cases could lead to the virus spreading further.
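To put numbers on that comparison, here is a small worked example. The detection and false-alarm rates come from the paragraph above; the population of 1,000 people and the 10% prevalence are hypothetical figures chosen only to make the arithmetic concrete.

```python
def confusion_counts(n_people, prevalence, sensitivity, false_positive_rate):
    """Expected confusion-matrix counts for a screened population."""
    positives = n_people * prevalence
    negatives = n_people - positives
    tp = positives * sensitivity          # infected and flagged
    fn = positives - tp                   # infected but missed
    fp = negatives * false_positive_rate  # healthy but flagged
    tn = negatives - fp                   # healthy and cleared
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def precision(counts):
    """Share of flagged people who are actually positive."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


# Hypothetical population: 1,000 people, 10% of whom are infected.
test_a = confusion_counts(1000, 0.10, sensitivity=0.83, false_positive_rate=0.40)
test_b = confusion_counts(1000, 0.10, sensitivity=0.56, false_positive_rate=0.08)

for name, counts in (("A", test_a), ("B", test_b)):
    print(name, {k: round(v) for k, v in counts.items()},
          "precision:", round(precision(counts), 3))
```

Under these assumed numbers, the first test misses about 17 infected people but raises about 360 false alarms, while the second misses about 44 and raises about 72. Neither is better in the abstract; the choice depends on which kind of error costs more.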
In this case, we are forced to consider which mistake is more problematic: wrongly labeling someone a strong friend (a false positive) or wrongly labeling someone a weak friend (a false negative).
Coming into college, I would have argued for the former. With so little time and so many people, it seems important to quickly filter out people with whom you don’t recognize an immediate connection. However, our results show that initial chemistry is almost meaningless. With the importance of letting relationships develop, I can recognize the value in simply maximizing connections – in casting a wide net. Ask that kid sitting next to you about his cool sneakers. Join the clubs that match your interests and join the clubs that don’t. Be thankful for invites when your friends make plans. The future is unknowable so cherish every person that happens into your life.
The optimal machine learning model? One that classifies every person as passing the “wedding test”.