In one of the more ambitious and enlightening projects in recent memory, designer, coder, and data scientist Matt Daniels set out to measure the vocabulary of hip-hop artists. He limited each artist to the first 35,000 words they put on record, so that long careers like Jay Z's could be compared fairly with newer upstarts like Drake. In his own words, “35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don’t have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick). I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.”
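For readers curious how a count like this might be produced, here is a minimal Python sketch of the approach Daniels describes: take the first 35,000 words, strip apostrophes, and count each distinct word once. It is not his actual code. The function names, the lowercasing step, and the rule that a "word" is any run of letters or digits are all assumptions made for illustration.

```python
import re

WORD_LIMIT = 35_000  # sample size quoted in the analysis


def tokenize(text):
    """Split lyrics into word tokens (hypothetical helper).

    Assumptions not stated in the source: text is lowercased, and a word
    is any run of ASCII letters or digits.
    """
    # Remove straight and curly apostrophes so pimpin' and pimpin match.
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z0-9]+", cleaned)


def unique_word_count(text, limit=WORD_LIMIT):
    """Count distinct tokens among the first `limit` words of `text`."""
    tokens = tokenize(text)[:limit]
    return len(set(tokens))


sample = "Pimpin' ain't easy, pimpin aint easy; pimps pimp pimping"
print(unique_word_count(sample))  # prints 6
```

As in the quoted methodology, pimps, pimp, pimping, and pimpin stay as four separate words, because nothing here reduces words to a common stem.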
Here are some of the main takeaways:
1. Aesop Rock is well above every other artist.
2. Wu-Tang Clan at #6 is impressive given that 10 members, with vastly different styles, are equally contributing lyrics.
3. U-God and GZA bolster the group’s average. Raekwon and Method Man’s contributions have a lower average compared to other members, but their data points would still exceed those of most artists in hip-hop.
4. The South has the lowest average (4,268) and the East Coast the highest (4,804).
5. Snoop Dogg, 2Pac, Kanye West, and Lil Wayne are all in the bottom 20 percent.
Read the entire analysis here.