Most cliched adjectives and nouns
Photo © Tom Newby Photography. CC-BY-2.0
While playing Articulate with my parents over Christmas, I had to describe ‘immense’, and tried to do so by saying that you could say something was ‘of <blank> proportions’. This didn’t work, but it got me thinking about what adjectives are typically only used to describe one or two things, and, conversely, which nouns are typically described by only one or two adjectives.
Time to dust off the Google N-Gram data from a few years ago, and combine it with a parts-of-speech database.
Most cliched adjectives
For each adjective, I found the noun it was most commonly used in front of, and the percentage of uses of that adjective explained by a use before that noun. The adjectives with the highest such percentage are the ‘most cliched’.
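The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the post: it assumes the N-Gram and parts-of-speech data have already been reduced to a dictionary mapping (adjective, noun) pairs to counts, and the function name `most_cliched_adjectives` and the toy counts are invented for the example.

```python
from collections import Counter, defaultdict


def most_cliched_adjectives(bigram_counts, top_n=20):
    """Rank adjectives by the share of their uses taken by their commonest noun.

    bigram_counts is assumed to map (adjective, noun) -> count, i.e. the
    part-of-speech filtering has already been done elsewhere.
    """
    # Group the counts so each adjective has a tally of the nouns it precedes.
    per_adjective = defaultdict(Counter)
    for (adjective, noun), count in bigram_counts.items():
        per_adjective[adjective][noun] += count

    rows = []
    for adjective, nouns in per_adjective.items():
        top_noun, top_count = nouns.most_common(1)[0]
        share = top_count / sum(nouns.values())
        rows.append((share, adjective, top_noun))

    # Highest share first: these are the 'most cliched' adjectives.
    rows.sort(reverse=True)
    return rows[:top_n]


# Made-up toy counts, for illustration only (not taken from the N-Gram data).
toy_counts = {
    ("stainless", "steel"): 90,
    ("stainless", "cutlery"): 10,
    ("big", "house"): 30,
    ("big", "dog"): 30,
    ("big", "deal"): 40,
}

for share, adjective, noun in most_cliched_adjectives(toy_counts):
    print(f"{share:.1%} of the time {adjective!r} precedes a noun, it is {noun!r}")
```

Grouping by noun instead of by adjective gives the corresponding measure for nouns.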
After manually weeding out not-really-adjectives and geographical phrases (like ‘Saudi Arabia’), I took the twenty most cliched adjectives. The most cliched of all is:
|91·5%||of the time||stainless||is used, it’s to describe||steel|
(This is an approximate statement. A more accurate one would be: 91·5% of the time that ‘stainless’ precedes a noun, that noun is ‘steel’. But the above gets the point across.)
The full top twenty list is:
‘Untitled document’ is pretty good, as are ‘Grateful Dead’ and ‘Looney Tunes’.
Most cliched nouns
In a similar way, I examined each noun and found the adjective most commonly used to describe it, and the percentage of occurrences of that noun which were paired with that adjective. With this measure, the most cliched noun is:
|97·4%||of the time||annotation||is used, it’s described as||functional|
(Again, this is an approximate description. A more accurate one would be: 97·4% of the time ‘annotation’ follows an adjective, that adjective is ‘functional’.)
Giving the list as <adjective> <noun>, the full top twenty list is:
Funny to see two British metal bands in there. I think the presence of ‘super saver’ and ‘other shoppers’ might indicate a large contribution to the corpus from commerce sites.
What about ‘immense’?
But if I had to try to get someone to say ‘immense’ by giving a noun it’s commonly used before, what would that noun be? There are two ways you could answer that. For each noun, find the fraction of times it’s preceded by ‘immense’, and pick the noun with the highest fraction. Or, for each noun, find the rank of ‘immense’ in the list of adjectives that precede it, and choose the noun with the highest (i.e., numerically smallest) rank. These two methods don’t necessarily pick the same noun, but for ‘immense’ it turns out they do, and the best you can do is:
|2·0%||of the time||multitude||is used, it’s described as||immense|
And ‘immense’ is the sixth-most common adjective used to describe a multitude; the top ten being:
An example adjective where ‘best rank’ and ‘highest fraction’ do not lead to the same noun is ‘striking’:
striking contrast — the highest rank ‘striking’ reaches is when describing ‘contrast’, when it’s the sixth-most common adjective for ‘contrast’ and accounts for 2·1% of descriptions of ‘contrast’.
striking similarity — the highest fraction ‘striking’ reaches is when describing ‘similarity’, when it accounts for 2·8% of descriptions of ‘similarity’, and ranks seventh amongst adjectives for ‘similarity’.
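The difference between ‘highest fraction’ and ‘best rank’ is easier to see with a small sketch. This is only an illustration under assumptions: the input is taken to be a dictionary of (adjective, noun) counts, the helper name `best_nouns_for` is hypothetical, and the toy numbers are invented so that the two rules disagree (they are not the real N-Gram figures).

```python
from collections import Counter, defaultdict


def best_nouns_for(adjective, bigram_counts):
    """Return the best noun for an adjective under two rules.

    Returns (by_fraction, by_rank), each a (noun, fraction, rank) tuple,
    or (None, None) if the adjective never appears.
    """
    # For each noun, tally the adjectives that precede it.
    per_noun = defaultdict(Counter)
    for (adj, noun), count in bigram_counts.items():
        per_noun[noun][adj] += count

    candidates = []
    for noun, adjectives in per_noun.items():
        if adjective not in adjectives:
            continue
        fraction = adjectives[adjective] / sum(adjectives.values())
        # Rank 1 means the most common adjective for this noun. Adjectives
        # with equal counts keep insertion order, so ties are arbitrary.
        ranked = [a for a, _ in adjectives.most_common()]
        rank = ranked.index(adjective) + 1
        candidates.append((noun, fraction, rank))

    if not candidates:
        return None, None
    by_fraction = max(candidates, key=lambda c: c[1])
    by_rank = min(candidates, key=lambda c: c[2])
    return by_fraction, by_rank


# Invented counts, chosen so the two rules pick different nouns.
toy_counts = {
    ("sharp", "contrast"): 90,
    ("striking", "contrast"): 10,
    ("close", "similarity"): 40,
    ("strong", "similarity"): 40,
    ("striking", "similarity"): 20,
}

by_fraction, by_rank = best_nouns_for("striking", toy_counts)
print("highest fraction:", by_fraction)
print("best rank:", by_rank)
```

In the toy data, ‘striking’ is only the third-ranked adjective for ‘similarity’ but takes a larger share of it, because ‘contrast’ is dominated by a single adjective.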
And what would somebody with knowledge of the Google N-Gram data say when given the clue ‘<blank> proportions’?
|6·9%||of the time||proportions||is used, they’re described as||epic|
It wasn’t a very good clue for ‘immense’ at all.
Representativeness of the data
The data-set shows its origins as a corpus drawn from the web, in that there are several amusing features of the results which sadly are not suitable for a general readership. Even though I don’t really have a readership of any sort, I therefore omit them. Sigh.
Most common doubled word
A couple of years ago, a related question came up: What is the most common doubled word? Filtering for unsuitable content, the winners turned out to be:
|1.||blah blah||7.||no no|
|2.||had had||8.||really really|
|3.||very very||9.||much much|
|4.||ha ha||10.||long long|
|5.||la la||11.||Duran Duran|
|6.||big big||12.||etc etc|
A good showing from band names altogether in that data-set.
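For the curious, a doubled-word count of this sort might look something like the sketch below. It assumes the data is available as a dictionary of two-word counts, that a doubled word means an exact, case-sensitive repeat, and that unsuitable words are supplied as a blocklist; the function name and the toy counts are made up for the example.

```python
from collections import Counter


def most_common_doubled_words(bigram_counts, top_n=12, blocklist=frozenset()):
    """Return the top_n (word, count) pairs for bigrams of the form 'word word'.

    bigram_counts is assumed to map (first_word, second_word) -> count.
    Matching is exact and case-sensitive; blocklisted words are skipped.
    """
    doubled = Counter()
    for (first, second), count in bigram_counts.items():
        if first == second and first not in blocklist:
            doubled[first] += count
    return doubled.most_common(top_n)


# Made-up toy counts, for illustration only.
toy_bigrams = {
    ("blah", "blah"): 5,
    ("had", "had"): 3,
    ("had", "been"): 9,
    ("very", "very"): 2,
}

for position, (word, count) in enumerate(most_common_doubled_words(toy_bigrams), 1):
    print(f"{position}. {word} {word} ({count})")
```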