Most cliched adjectives and nouns

Photo © Tom Newby Photography. CC-BY-2.0

While playing Articulate with my parents over Christmas, I had to describe ‘immense’, and tried to do so by saying that you could say something was ‘of <blank> proportions’. This didn’t work, but it got me thinking about which adjectives are typically used to describe only one or two things, and, conversely, which nouns are typically described by only one or two adjectives.

Time to dust off the Google N-Gram data from a few years ago, and combine it with a parts-of-speech database.
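As a rough sketch of what that combination might look like, the snippet below filters a table of two-word phrase counts down to adjective-noun pairs using a part-of-speech lookup. The counts, the `POS` dictionary, and the function name are all invented for illustration; the real N-Gram files and parts-of-speech database have their own formats.

```python
# Toy stand-ins: the real bigram counts and part-of-speech database are far
# larger and stored differently. All names and numbers here are made up.
BIGRAM_COUNTS = {
    ("bated", "breath"): 95,
    ("bated", "silence"): 5,
    ("the", "breath"): 4000,
    ("light", "rain"): 870,
    ("run", "quickly"): 300,
}

# Word -> set of parts of speech it can take (hypothetical format).
POS = {
    "bated": {"adj"},
    "light": {"adj", "noun", "verb"},
    "breath": {"noun"},
    "silence": {"noun", "verb"},
    "rain": {"noun", "verb"},
    "the": {"det"},
    "run": {"verb", "noun"},
    "quickly": {"adv"},
}


def adjective_noun_pairs(bigram_counts, pos):
    """Keep only bigrams whose first word can be an adjective and whose
    second word can be a noun."""
    return {
        (first, second): count
        for (first, second), count in bigram_counts.items()
        if "adj" in pos.get(first, set()) and "noun" in pos.get(second, set())
    }


pairs = adjective_noun_pairs(BIGRAM_COUNTS, POS)
```

A permissive rule like this lets through words that are only sometimes adjectives, so some hand-checking of the results is still likely to be needed.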

Most cliched adjectives

For each adjective, I found the noun it was most commonly used in front of, and the percentage of that adjective’s uses that come before that noun. The adjectives with the highest such percentage are the ‘most cliched’.
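A minimal sketch of that calculation, assuming the data has already been reduced to counts of adjective-noun pairs. The counts below are toy numbers, not values from the N-Gram data, and `most_cliched_adjectives` is a name made up here.

```python
from collections import defaultdict

# Toy (adjective, noun) -> count table; numbers are illustrative only.
PAIR_COUNTS = {
    ("bated", "breath"): 95,
    ("bated", "silence"): 5,
    ("big", "house"): 300,
    ("big", "dog"): 300,
    ("big", "deal"): 400,
}


def most_cliched_adjectives(pair_counts):
    """For each adjective, find its most common following noun and the share
    of the adjective's uses that noun accounts for; sort by that share."""
    totals = defaultdict(int)   # adjective -> total uses before any noun
    best = {}                   # adjective -> (noun, count) of its top noun
    for (adj, noun), count in pair_counts.items():
        totals[adj] += count
        if adj not in best or count > best[adj][1]:
            best[adj] = (noun, count)
    ranked = [
        (100.0 * count / totals[adj], adj, noun)
        for adj, (noun, count) in best.items()
    ]
    return sorted(ranked, reverse=True)


for share, adj, noun in most_cliched_adjectives(PAIR_COUNTS):
    print(f"{share:.1f}% {adj} {noun}")
```

In this toy table ‘bated’ comes out as far more cliched than ‘big’, because almost all of its uses are before a single noun.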

After manually weeding out not-really-adjectives and geographical phrases (like ‘Saudi Arabia’), I took the twenty most cliched adjectives. The most cliched of all is:

91·5% of the time stainless is used, it’s to describe steel

(This is an approximate statement. A more accurate one would be: 91·5% of the time that ‘stainless’ precedes a noun, that noun is ‘steel’. But the above gets the point across.)

The full top twenty list is:

91·5% stainless steel
89·4% objectionable content
87·0% wrought iron
84·7% typographical errors
84·0% elapsed time
82·6% martial arts
81·9% supreme court
81·1% movable type
79·3% untitled document
76·9% breaking news
76·5% sporting goods
75·9% stained glass
75·1% motley fool
74·8% designated trademarks
74·5% Grateful Dead
73·1% respective owners
72·6% vice president
72·1% deviant comments
71·3% Looney Tunes
70·9% nervous system

‘Untitled document’ is pretty good, as are ‘Grateful Dead’ and ‘Looney Tunes’.

Most cliched nouns

In a similar way, I examined each noun and found the adjective most commonly used to describe it, and the percentage of occurrences of that noun which were paired with that adjective. With this measure, the most cliched noun is:

97·4% of the time annotation is used, it’s described as functional

(Again, this is an approximate description. A more accurate one would be: 97·4% of the time ‘annotation’ follows an adjective, that adjective is ‘functional’.)

Giving the list as <adjective> <noun>, the full top twenty list is:

97·4% functional annotation
97·0% real estate
96·8% creative commons
91·6% global warming
91·4% registered trademark
91·0% super saver
90·8% free counters
89·1% planned parenthood
88·5% used textbooks
88·1% national aeronautics
87·7% other shoppers
86·5% multiple sclerosis
86·3% hot tub
85·6% simple syndication
85·5% due diligence
84·7% grand theft
84·2% Iron Maiden
82·8% remote sensing
82·3% Black Sabbath
81·7% self catering

Funny to see two British metal bands in there. I think the presence of ‘super saver’ and ‘other shoppers’ might indicate a large contribution to the corpus from commerce sites.

What about ‘immense’?

But if I had to try to get someone to say ‘immense’ by giving a noun it’s commonly used before, what would that noun be? There are two ways you could answer that. For each noun, find the fraction of times it’s preceded by ‘immense’, and pick the noun with the highest fraction. Or, for each noun, find the rank of ‘immense’ in the list of adjectives that precede it, and choose the noun where that rank is highest (i.e., numerically smallest). These two approaches don’t necessarily give the same noun, but for ‘immense’ it turns out they do, and the best you can do is:

2·0% of the time multitude is used, it’s described as immense

And ‘immense’ is the sixth-most common adjective used to describe a multitude; the top ten being:

40·4% great
11·3% whole
8·2% vast
5·1% mixed
2·6% assembled
2·0% immense
1·3% countless
1·3% infinite
1·0% any
1·0% large

An example adjective where ‘best rank’ and ‘highest fraction’ do not lead to the same noun is ‘striking’:

striking contrast — the highest rank ‘striking’ reaches is when describing ‘contrast’, when it’s the sixth-most common adjective for ‘contrast’ and accounts for 2·1% of descriptions of ‘contrast’.

striking similarity — the highest fraction ‘striking’ reaches is when describing ‘similarity’, when it accounts for 2·8% of descriptions of ‘similarity’, and ranks seventh amongst adjectives for ‘similarity’.
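The two selection rules can be sketched as follows. The counts are toy numbers constructed so that the rules disagree, in the spirit of the ‘striking’ example; they are not taken from the N-Gram data, and the function names are made up here.

```python
from collections import defaultdict

# Toy (adjective, noun) -> count table, built so the two rules disagree.
PAIR_COUNTS = {
    ("great", "contrast"): 500,
    ("striking", "contrast"): 20,
    ("sharp", "contrast"): 10,
    ("great", "similarity"): 40,
    ("close", "similarity"): 35,
    ("striking", "similarity"): 25,
}


def by_noun(pair_counts):
    """Regroup the counts as noun -> {adjective: count}."""
    grouped = defaultdict(dict)
    for (adj, noun), count in pair_counts.items():
        grouped[noun][adj] = count
    return grouped


def best_noun_by_fraction(pair_counts, target):
    """Noun for which `target` makes up the largest share of its adjectives."""
    def fraction(item):
        _, adjs = item
        return adjs.get(target, 0) / sum(adjs.values())
    return max(by_noun(pair_counts).items(), key=fraction)[0]


def best_noun_by_rank(pair_counts, target):
    """Noun for which `target` ranks highest (rank 1 = most common adjective).
    Nouns never preceded by `target` are skipped; ties keep the first seen."""
    best = None
    for noun, adjs in by_noun(pair_counts).items():
        if target not in adjs:
            continue
        rank = 1 + sum(1 for c in adjs.values() if c > adjs[target])
        if best is None or rank < best[0]:
            best = (rank, noun)
    return None if best is None else best[1]
```

With these toy counts, ‘striking’ is the second-most common adjective for ‘contrast’ but only a small share of its uses, while it is a quarter of all descriptions of ‘similarity’ despite ranking third there, so the two functions return different nouns.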

And ‘proportions’?

And what would somebody with knowledge of the Google N-Gram data say when given the clue ‘<blank> proportions’?

6·9% of the time proportions is used, they’re described as epic

It wasn’t a very good clue for ‘immense’ at all.

Representativeness of the data

The data-set shows its origins as a corpus drawn from the web, in that there are several amusing features of the results which sadly are not suitable for a general readership. Even though I don’t really have a readership of any sort, I therefore omit them. Sigh.

Most common doubled word

A couple of years ago, a related question came up: What is the most common doubled word? Filtering for unsuitable content, the winners turned out to be:

1. blah blah
2. had had
3. very very
4. ha ha
5. la la
6. big big
7. no no
8. really really
9. much much
10. long long
11. Duran Duran
12. etc etc

A good showing from band names altogether in that data-set.
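For the curious, picking out doubled words from two-word phrase counts might look something like the sketch below. The counts are invented, the blocklist is a placeholder for whatever filtering of unsuitable content is wanted, and folding upper and lower case together is a choice made here, not necessarily what was done originally.

```python
from collections import Counter

# Toy two-word phrase counts; the numbers are invented for illustration.
BIGRAM_COUNTS = {
    ("blah", "blah"): 900,
    ("Blah", "blah"): 50,
    ("had", "had"): 800,
    ("very", "very"): 700,
    ("very", "good"): 5000,
    ("xxx", "xxx"): 9999,
}

BLOCKLIST = {"xxx"}  # placeholder for words to filter out


def top_doubled_words(bigram_counts, blocklist=frozenset(), n=12):
    """Return the n most frequent bigrams whose two words are the same,
    comparing case-insensitively and skipping blocklisted words."""
    doubled = Counter()
    for (first, second), count in bigram_counts.items():
        word = first.lower()
        if word == second.lower() and word not in blocklist:
            doubled[word] += count
    return doubled.most_common(n)


winners = top_doubled_words(BIGRAM_COUNTS, BLOCKLIST)
for place, (word, count) in enumerate(winners, 1):
    print(f"{place}. {word} {word}")
```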