[Figure: Top 10 emojis in Ramadan tweets, by language]

Emojis have been called the world’s fastest-growing language. Yet remarkably little research has examined how emoji usage differs between cultures. Previous analyses have usually focused on aggregate differences between countries, based on data from third-party keyboard apps. Here, we analyze emojis in nearly 35,000 tweets in nine languages about a common topic: the Islamic holy month of Ramadan, which began this past weekend. In doing so, we find broad commonalities across languages, but also noteworthy and meaningful differences.

We used the Twitter Search API to collect tweets mentioning #Ramadan or #رمضان from May 27–28, 2017, the first two days of Ramadan. Because Twitter automatically detects tweet language and supports searching by language, we were able to obtain tweets from around the world in nine languages: English, Arabic, Urdu, Farsi, Indonesian, Turkish, French, German, and Spanish.

A universal language for a 🌍 celebration, with local variation as well

Previous research has found that faces and hearts make up the majority of emoji usage in general. Our Ramadan dataset largely replicates this pattern, but we also observe some novel findings when we look at the top 10 most commonly used emojis in each language (presented in descending order in the above graphic):

  • In four languages – English, Turkish, German, and Spanish – the ❤️ (red heart) is the most commonly used emoji. In four other languages, the red heart takes second place. (Farsi is the only language in which the red heart is neither first nor second; it appears in fifth place.) The other most frequently occurring heart emojis are 😍 (heart eyes), which is in the top 10 for six languages, 💕 (two hearts), 💙 (blue heart), and 💚 (green heart).
  • 🌙 (crescent moon) is the most common emoji in Arabic, Urdu, and Farsi tweets and ranks second in English and Turkish tweets. Users include it even though Twitter already appends a crescent moon to the #Ramadan hashtag automatically.
  • 🙏 (praying hands) is the top emoji in Indonesian and French, and is in the top three for English, Farsi, Turkish, German, and Spanish. Among Arabic- and Urdu-speaking Twitter users, 🙏 is rarely used, perhaps because of its associations with non-Muslim prayer in these communities, which may make Muslims reluctant to use it in the context of Ramadan.
  • 😭 (loudly crying face) appears in the top 10 emojis in English, French, and German tweets, often in low-key complaints about the difficulties of fasting and evening prayers.
  • 💪 (flexed biceps) appears in the top 10 emojis in French and German tweets, usually in motivational tweets encouraging those observing Ramadan to remain strong and go hard (e.g. “Kraft an alle die grad Fasten ❤ Es ist schwer aber wir schaffen das 💪🏼 #Ramadan”, which roughly translates as “Strength to all those who are fasting, it’s hard but we can do this”, or “Très bon #Ramadan à tous les musulmans qui verront ce tweet, courage 💪🏻🔥”, which roughly translates as “A very good Ramadan to all Muslims who see this tweet, courage”).
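Rankings like these come down to counting emoji occurrences in each language's tweets. The methodology note points to an R tutorial and a full emoji Unicode dictionary; the Python sketch below is only an illustrative stand-in. It assumes a small, hypothetical `EMOJI_DICT`, scans each tweet with greedy longest-match lookup so that multi-code-point emojis such as ❤️ (U+2764 U+FE0F) count as one token, and folds skin-tone modifiers into the base emoji. Zero-width-joiner sequences are not handled here.

```python
from collections import Counter

# Hypothetical mini-dictionary standing in for a full emoji Unicode
# dictionary; keys are emoji strings, values are readable names.
EMOJI_DICT = {
    "\u2764\ufe0f": "red heart",         # ❤️ with emoji variation selector
    "\u2764": "red heart",               # ❤ without the variation selector
    "\U0001F319": "crescent moon",       # 🌙
    "\U0001F64F": "praying hands",       # 🙏
    "\U0001F60D": "heart eyes",          # 😍
    "\U0001F62D": "loudly crying face",  # 😭
    "\U0001F4AA": "flexed biceps",       # 💪
    "\U0001F525": "fire",                # 🔥
}
# Fitzpatrick skin-tone modifiers U+1F3FB..U+1F3FF.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}
MAX_LEN = max(len(key) for key in EMOJI_DICT)


def extract_emojis(text):
    """Return the names of dictionary emojis in `text`, in order of appearance."""
    found = []
    i = 0
    while i < len(text):
        # Try the longest possible match first so ❤️ is not split in two.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            name = EMOJI_DICT.get(text[i:i + size])
            if name is not None:
                found.append(name)
                i += size
                # Fold a trailing skin-tone modifier into the base emoji.
                if i < len(text) and text[i] in SKIN_TONES:
                    i += 1
                break
        else:
            i += 1  # no emoji starts here; move on one code point
    return found


def top_emojis(tweets_by_lang, n=10):
    """Map each language code to its n most frequent (name, count) pairs."""
    return {
        lang: Counter(
            name for tweet in tweets for name in extract_emojis(tweet)
        ).most_common(n)
        for lang, tweets in tweets_by_lang.items()
    }


sample = {
    "de": ["Kraft an alle die grad Fasten ❤ Es ist schwer aber wir schaffen das 💪🏼 #Ramadan"],
    "fr": ["Très bon #Ramadan à tous les musulmans qui verront ce tweet, courage 💪🏻🔥"],
}
print(top_emojis(sample))
```

Counting per occurrence (rather than per tweet) means a tweet with three hearts contributes three; which convention the original analysis used is not stated in the post.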

The long tail of emoji usage

[Figure: Top 20 emojis in English-language Ramadan tweets]

[Figure: Top 20 emojis in Arabic-language Ramadan tweets]

One of the fascinating aspects of emoji data science is the degree to which a small set of emojis dominates overall usage. We see this in our Ramadan dataset as well: above, we plot the frequency of the top 20 emojis separately for English- and Arabic-language tweets. The long tail is at work in both. In English-language tweets, frequency falls steeply after the top three emojis (❤️, 🌙, and 🙏), while in Arabic, the top emoji (🌙) is used more than twice as often as the second most common emoji (❤️), after which usage of the remaining emojis levels off.

We found that, averaged across the nine languages in our analysis, the top eight emojis in a given language are sufficient to account for 50% of all emoji occurrences in that language's tweets. Even more strikingly, the top 32 emojis are, on average, sufficient to account for 80%.
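A coverage figure like "the top eight emojis account for 50%" can be computed by ranking a language's emojis by frequency and accumulating their share of all occurrences until the threshold is reached, then averaging that count across languages. A minimal sketch, assuming per-language counts are already held in `collections.Counter` objects; the toy counts below are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def emojis_needed(counts, share):
    """Smallest k such that the k most frequent emojis cover `share` of all occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0
    running = 0
    for k, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total >= share:
            return k
    return len(counts)


def average_needed(counts_by_lang, share):
    """Average of emojis_needed over all languages in `counts_by_lang`."""
    return sum(emojis_needed(c, share) for c in counts_by_lang.values()) / len(counts_by_lang)


# Toy counts, invented for illustration only.
toy = Counter({"crescent moon": 50, "red heart": 20, "praying hands": 10,
               "rose": 10, "heart eyes": 5, "green heart": 5})
print(emojis_needed(toy, 0.5), emojis_needed(toy, 0.8))  # prints: 1 3
```

Averaging the per-language k values (rather than pooling all languages into one Counter) keeps English and Arabic, the largest samples, from dominating the result.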

Here are the top 20 emojis for the remaining languages in our dataset:

Crescents versus Praying Hands

We can also compare the top 20 emojis used in English vs. Arabic tweets. In both languages, hearts are the most common type of emoji, making up seven of the top 20. However, the English top 20 includes six faces and five hand gestures, while the Arabic top 20 has only two faces and three hand gestures. Instead, it includes five plant and flower emojis, which might indicate a greater closeness to nature and less emphasis on face emojis that represent one’s own feelings.

Is there a more robust way of comparing emoji usage between two mutually exclusive subsets of the Ramadan dataset? Indeed, there is. Just as in our previous work comparing emoji usage in tweets mentioning Kanye West vs. Taylor Swift and Donald Trump vs. Hillary Clinton, we can compute odds ratios for each emoji between two sets of languages and represent the results in a two-dimensional ✈️.
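The post does not spell out its exact odds-ratio formula, so the sketch below assumes one common formulation: for each emoji, compare the odds that a randomly chosen emoji occurrence in group A is that emoji with the same odds in group B, adding 0.5 to every cell of the 2×2 table (the Haldane–Anscombe correction) so an emoji missing from one group does not cause division by zero, and taking the log so positive values lean toward A and negative values toward B. The hypothetical `pool` helper shows how per-language counts might be merged into a group; the two-dimensional layout of the published plot is not reproduced.

```python
import math
from collections import Counter


def pool(per_lang_counts, langs):
    """Sum per-language emoji Counters into one group.

    Languages with more tweets contribute more counts, so they dominate the pooled group.
    """
    return sum((per_lang_counts[lang] for lang in langs), Counter())


def log_odds_ratios(counts_a, counts_b, pseudo=0.5):
    """Log odds ratio per emoji: > 0 leans toward group A, < 0 toward group B.

    Adds `pseudo` to every cell of each 2x2 table (Haldane-Anscombe correction)
    so emojis that never appear in one group do not cause division by zero.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    for emoji in set(counts_a) | set(counts_b):
        in_a = counts_a.get(emoji, 0)
        in_b = counts_b.get(emoji, 0)
        odds_a = (in_a + pseudo) / (total_a - in_a + pseudo)
        odds_b = (in_b + pseudo) / (total_b - in_b + pseudo)
        ratios[emoji] = math.log(odds_a / odds_b)
    return ratios


# Toy counts, invented for illustration only.
western = Counter({"praying hands": 30, "crescent moon": 10, "red heart": 20})
non_western = Counter({"praying hands": 10, "crescent moon": 30, "red heart": 20})
for emoji, lor in sorted(log_odds_ratios(western, non_western).items(), key=lambda kv: kv[1]):
    print(f"{emoji:>14} {lor:+.2f}")
```

Working on the log scale makes the comparison symmetric: an emoji twice as likely in group A sits as far from zero as one twice as likely in group B, which suits a plot with one group on each side and shared emojis such as ❤️ near the center.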

[Figure: Emoji odds ratios, Western vs. non-Western languages]

In the above plot, for simplicity, we take the naive approach of grouping Western languages (English, French, German, and Spanish) on one side and non-Western languages (Arabic, Farsi, Indonesian, Turkish, and Urdu) on the other. Because English and Arabic contribute the most tweets, they exert a disproportionate influence on the final plot.

We find a strong bifurcation, with 🙏 over-indexing in Western languages on the right and 🌙 over-indexing in non-Western languages on the left, and ❤️ smack in the center. We also see hearts and flowers used disproportionately in non-Western-language Ramadan tweets. The emojis that lean toward Western languages – 💪, 🔥 (fire), 💯 (hundred points), and 😭 – might all be more common in American and European vernacular usage.

[Figure: Usage of 🙏 vs. 🌙, by language]

Finally, we have one more way of comparing the frequency of 🙏 and 🌙 across languages. Above, we see a pattern slightly more nuanced than the split between Western and non-Western languages. French has the highest usage of 🙏, closely followed by Indonesian and Turkish. Meanwhile, Arabic and Farsi show minimal usage of 🙏, and Urdu shows no usage of 🙏 at all in our admittedly modest sample of Ramadan tweets. We also see that 🌙 is most heavily used in Arabic tweets, followed by Turkish tweets.
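Because the nine languages differ greatly in tweet volume, a per-language comparison like this needs normalized rates rather than raw counts. The post does not state which denominator its figure uses; the sketch below assumes the share of each language's tweets that contain the emoji at least once, with hypothetical input mapping language codes to lists of tweet texts.

```python
PRAYING_HANDS = "\U0001F64F"  # 🙏
CRESCENT_MOON = "\U0001F319"  # 🌙


def share_of_tweets_with(tweets_by_lang, emoji):
    """Fraction of each language's tweets containing `emoji` at least once.

    A substring test also catches skin-tone variants such as 🙏🏽,
    since the base code point comes first in the sequence.
    """
    return {
        lang: (sum(emoji in tweet for tweet in tweets) / len(tweets)) if tweets else 0.0
        for lang, tweets in tweets_by_lang.items()
    }


# Hypothetical toy tweets, invented for illustration only.
tweets = {
    "fr": ["Ramadan Mubarak \U0001F64F", "courage \U0001F64F\U0001F3FD", "bon ramadan"],
    "ur": ["\U0001F319 Ramadan", "Ramadan Kareem"],
}
for lang in tweets:
    hands = share_of_tweets_with(tweets, PRAYING_HANDS)[lang]
    moon = share_of_tweets_with(tweets, CRESCENT_MOON)[lang]
    print(f"{lang}: praying hands {hands:.0%}, crescent moon {moon:.0%}")
```

A per-tweet rate ignores repeated emojis within one tweet; dividing by total emoji occurrences instead would be an equally reasonable choice and can rank languages differently.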

But does 🙏 here really mean praying hands?

The meaning of 🙏 remains highly contentious in certain parts of the internet, with vocal factions continuing to maintain that it actually means “please” or “thank you” and has only minimal religious connotations. A source no less authoritative than Emojipedia maintains to this day that the emoji represents “two hands placed firmly together, meaning please or thank you in Japanese culture.”

Regardless of how the emoji was originally intended to be used, in the context of tweets about Ramadan, 🙏 almost always refers to prayer or more abstractly to Islamic religious or spiritual practices.

Here are some illustrative examples from English language tweets in our dataset:

We also find tweets in Arabic with explicit Islamic prayers appended with 🙏, for example:

What about French, the language that uses 🙏 the most? We observe the same type of usage:

Final thoughts

In his 2002 novel The Double, the late Portuguese writer and Nobel laureate José Saramago offers a piercing observation about the power of language. Although the book predates the widespread adoption of emojis, Saramago’s insightful words seem deeply prescient in the modern age:

We have an odd relationship with words. We learn a few when we are small, throughout our lives we collect others through education, conversation, our contact with books, and yet, in comparison, there are only a tiny number about whose meaning, sense, and denotation we would have absolutely no doubts if, one day, we were to ask ourselves seriously what they meant. Thus we affirm and deny, thus we convince and are convinced, thus we argue, deduce, and conclude, wandering fearlessly over the surface of concepts about which we have only the vaguest of ideas, and, despite the false air of confidence that we generally affect as we feel our way along the road in the verbal darkness, we manage, more or less, to understand each other and even, sometimes, to find each other.

Compared to words, emojis are, on the one hand, more universal and, on the other hand, infinitely more nuanced. Since we use them mostly in interpersonal communication (texting friends or posting on social media), the prevalence of filter bubbles means we may never know what an emoji objectively means in the broader world outside our personal social networks.

Unlike words, which we learn at university and from books and authoritative sources such as newspapers, emojis are something we learn in the real world. The risk of meaning being lost in translation is much greater, but so is the intimacy, since we ourselves give meaning to, and find meaning in, emojis.

Our analysis of tweets from the opening weekend of Ramadan in nine languages shows us that order can emerge from chaos when it comes to emoji usage. There is tremendous commonality in the emojis we use to talk about Ramadan, whether in Arabic, Urdu, Spanish, English, French, or the four other languages we evaluated.

Nevertheless, meaningful differences in emoji usage also emerge, and better understanding these differences may be a major 🔑 that allows us to make sense of the effect of technology on communication, the evolution of language over time, and, eventually, each other.

By Hamdan Azhar

 

* A NOTE ON METHODOLOGY: The above analysis relies on an older emoji Unicode dictionary that doesn’t include some of the latest emojis that may also appear in Ramadan-related tweets, namely 🕋 (Kaaba), 🕌 (mosque), and 📿 (prayer beads). For more information on our general methodology, check out our recently released tutorial on Emoji Data Science in R. You can also watch my talk on PRISMOJI and emoji data science at CSV Conference in Portland last month. If you’d like to contribute to open source emoji data science research, say hi at hello@prismoji.com 🙂