Most cliched adjectives and nouns

Posted on February 6, 2013 (updated February 19, 2020) by Ben North

Photo © Tom Newby Photography. CC-BY-2.0

While playing Articulate with my parents over Christmas, I had to describe ‘immense’, and tried to do so by saying that you could say something was ‘of <blank> proportions’. This didn’t work, but it got me thinking about what adjectives are typically only used to describe one or two things, and, conversely, which nouns are typically described by only one or two adjectives.

Time to dust off the Google N-Gram data from a few years ago, and combine it with a parts-of-speech database.

Most cliched adjectives

For each adjective, I found the noun it was most commonly used in front of, and the percentage of uses of that adjective explained by a use before that noun. The adjectives with the highest such percentage are the ‘most cliched’.
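
As a rough illustration of that measure, here is a minimal Python sketch. It is not the code behind the results in this post: the name `most_cliched_adjectives` and the toy `bigram_counts` are made up for the example, and it assumes the bigrams have already been filtered down to adjective-noun pairs.

```python
from collections import Counter, defaultdict

# Hypothetical toy counts of (adjective, noun) bigrams. The numbers are
# invented for illustration; real counts would come from the N-Gram data
# after part-of-speech filtering.
bigram_counts = {
    ("stainless", "steel"): 915,
    ("stainless", "alloy"): 85,
    ("immense", "multitude"): 20,
    ("immense", "amount"): 30,
    ("immense", "value"): 50,
}


def most_cliched_adjectives(counts):
    """Return (share, adjective, top_noun) tuples, highest share first."""
    by_adjective = defaultdict(Counter)
    for (adjective, noun), n in counts.items():
        by_adjective[adjective][noun] += n

    results = []
    for adjective, nouns in by_adjective.items():
        # The noun this adjective most often precedes, and how often.
        top_noun, top_count = nouns.most_common(1)[0]
        # Share of all adjective-before-noun uses explained by that noun.
        share = top_count / sum(nouns.values())
        results.append((share, adjective, top_noun))
    return sorted(results, reverse=True)


ranking = most_cliched_adjectives(bigram_counts)
for share, adjective, noun in ranking:
    print(f"{share:.1%}\t{adjective}\t{noun}")
```

Note that the denominator is the number of times the adjective precedes any noun, not the number of times the adjective appears at all.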

Taking the twenty most cliched adjectives, after manually weeding out words that are not really adjectives and geographical phrases (like ‘Saudi Arabia’), I found the most cliched adjective is:

91·5%	of the time	stainless	is used, it’s to describe	steel

(This is an approximate statement. A more accurate one would be: 91·5% of the time that ‘stainless’ precedes a noun, that noun is ‘steel’. But the above gets the point across.)

The full top twenty list is:

91·5%	stainless	steel	76·5%	sporting	goods
89·4%	objectionable	content	75·9%	stained	glass
87·0%	wrought	iron	75·1%	motley	fool
84·7%	typographical	errors	74·8%	designated	trademarks
84·0%	elapsed	time	74·5%	Grateful	Dead
82·6%	martial	arts	73·1%	respective	owners
81·9%	supreme	court	72·6%	vice	president
81·1%	movable	type	72·1%	deviant	comments
79·3%	untitled	document	71·3%	Looney	Tunes
76·9%	breaking	news	70·9%	nervous	system

‘Untitled document’ is pretty good, as are ‘Grateful Dead’ and ‘Looney Tunes’.

Most cliched nouns

In a similar way, I examined each noun and found the adjective most commonly used to describe it, and the percentage of occurrences of that noun which were paired with that adjective. With this measure, the most cliched noun is:

97·4%	of the time	annotation	is used, it’s described as	functional

(Again, this is an approximate description. A more accurate one would be: 97·4% of the time ‘annotation’ follows an adjective, that adjective is ‘functional’.)
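
The more accurate wording matters because of what goes into the denominator. The sketch below shows one way that could look, assuming a part-of-speech lookup is available as plain Python sets. The names `ADJECTIVES`, `NOUNS`, `raw_bigrams` and `noun_cliche_scores` are hypothetical, and the counts are invented.

```python
from collections import defaultdict

# Hypothetical part-of-speech lookup and raw bigram counts (made-up numbers).
ADJECTIVES = {"real", "industrial", "hot"}
NOUNS = {"estate", "tub"}
raw_bigrams = {
    ("real", "estate"): 970,
    ("industrial", "estate"): 30,
    ("the", "estate"): 5000,  # skipped below: 'the' is not an adjective
    ("hot", "tub"): 863,
    ("industrial", "tub"): 137,
}


def noun_cliche_scores(bigrams, adjectives, nouns):
    """Map each noun to (top_adjective, share of its adjective-preceded uses)."""
    totals = defaultdict(int)  # noun -> count of uses preceded by any adjective
    best = {}                  # noun -> (adjective, count) with the largest count
    for (first, second), n in bigrams.items():
        if first not in adjectives or second not in nouns:
            continue
        totals[second] += n
        if second not in best or n > best[second][1]:
            best[second] = (first, n)
    return {
        noun: (adjective, count / totals[noun])
        for noun, (adjective, count) in best.items()
    }


scores = noun_cliche_scores(raw_bigrams, ADJECTIVES, NOUNS)
for noun, (adjective, share) in scores.items():
    print(f"{share:.1%}\t{adjective}\t{noun}")
```

In this toy data ‘the estate’ is far more common than ‘real estate’, but it never enters the total, because only occurrences that follow an adjective are counted.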

Giving the list as <adjective> <noun>, the full top twenty list is:

97·4%	functional	annotation	87·7%	other	shoppers
97·0%	real	estate	86·5%	multiple	sclerosis
96·8%	creative	commons	86·3%	hot	tub
91·6%	global	warming	85·6%	simple	syndication
91·4%	registered	trademark	85·5%	due	diligence
91·0%	super	saver	84·7%	grand	theft
90·8%	free	counters	84·2%	Iron	Maiden
89·1%	planned	parenthood	82·8%	remote	sensing
88·5%	used	textbooks	82·3%	Black	Sabbath
88·1%	national	aeronautics	81·7%	self	catering

Funny to see two British metal bands in there. I think the presence of ‘super saver’ and ‘other shoppers’ might indicate a large contribution to the corpus from commerce sites.

What about ‘immense’?

But if I had to try to get someone to say ‘immense’ by giving a noun it’s commonly used before, what would that noun be? There are two ways you could answer that. For each noun, find the fraction of times it’s preceded by ‘immense’, and pick the noun with the highest fraction. Or, for each noun, find the rank of ‘immense’ in the list of adjectives that precede it, and choose the noun with the highest (i.e., numerically smallest) rank. These don’t necessarily give the same answer, but for ‘immense’ they do, and the best you can do is:

2·0%	of the time	multitude	is used, it’s described as	immense

And ‘immense’ is the sixth-most common adjective used to describe a multitude; the top ten being:

40·4%	great	2·0%	immense
11·3%	whole	1·3%	countless
8·2%	vast	1·3%	infinite
5·1%	mixed	1·0%	any
2·6%	assembled	1·0%	large

An example adjective where ‘best rank’ and ‘highest fraction’ do not lead to the same noun is ‘striking’:

striking contrast — the highest rank ‘striking’ reaches is when describing ‘contrast’, when it’s the sixth-most common adjective for ‘contrast’ and accounts for 2·1% of descriptions of ‘contrast’.

striking similarity — the highest fraction ‘striking’ reaches is when describing ‘similarity’, when it accounts for 2·8% of descriptions of ‘similarity’, and ranks seventh amongst adjectives for ‘similarity’.
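
To make the difference between the two measures concrete, here is a small Python sketch. The adjective counts are invented so that the two answers disagree, loosely echoing the ‘striking’ example, and the function names `fraction_and_rank` and `best_nouns_for` are hypothetical.

```python
# Hypothetical per-noun adjective counts (made-up numbers).
by_noun = {
    "contrast": {"sharp": 79, "striking": 21},
    "similarity": {"close": 36, "remarkable": 36, "striking": 28},
}


def fraction_and_rank(adjective_counts, target):
    """Return (fraction, rank) of `target` among one noun's adjectives."""
    total = sum(adjective_counts.values())
    count = adjective_counts.get(target, 0)
    fraction = count / total
    # Rank 1 is the most common adjective; tied adjectives share a rank.
    rank = 1 + sum(1 for c in adjective_counts.values() if c > count)
    return fraction, rank


def best_nouns_for(target, nouns):
    """Return (noun with highest fraction, noun with best rank) for `target`."""
    candidates = {
        noun: fraction_and_rank(adjectives, target)
        for noun, adjectives in nouns.items()
        if target in adjectives
    }
    by_fraction = max(candidates, key=lambda noun: candidates[noun][0])
    by_rank = min(candidates, key=lambda noun: candidates[noun][1])
    return by_fraction, by_rank


highest_fraction_noun, best_rank_noun = best_nouns_for("striking", by_noun)
print(highest_fraction_noun, best_rank_noun)
```

With these toy numbers, ‘striking’ has a larger share of ‘similarity’ but ranks higher for ‘contrast’, because ‘contrast’ is dominated by a single other adjective.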

And ‘proportions’?

And what would somebody with knowledge of the Google N-Gram data say when given the clue ‘<blank> proportions’?

6·9%	of the time	proportions	is used, they’re described as	epic

It wasn’t a very good clue for ‘immense’ at all.

Representativeness of the data

The data-set shows its origins as a corpus drawn from the web, in that there are several amusing features of the results which sadly are not suitable for a general readership. Even though I don’t really have a readership of any sort, I therefore omit them. Sigh.

Most common doubled word

A couple of years ago, a related question came up: What is the most common doubled word? Filtering for unsuitable content, the winners turned out to be:

1.	blah blah	7.	no no
2.	had had	8.	really really
3.	very very	9.	much much
4.	ha ha	10.	long long
5.	la la	11.	Duran Duran
6.	big big	12.	etc etc

A good showing from band names altogether in that data-set.
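
For the curious, finding doubled words amounts to keeping only the bigrams whose two words match and sorting by count. A minimal sketch follows, with made-up counts and a hypothetical `blocklist` parameter standing in for the unsuitable-content filter.

```python
# Hypothetical toy bigram counts (made-up numbers).
bigram_counts = {
    ("blah", "blah"): 500,
    ("had", "had"): 400,
    ("very", "very"): 300,
    ("had", "been"): 9000,
    ("very", "good"): 7000,
}


def most_common_doubled(counts, blocklist=frozenset(), top=12):
    """Return the most frequent doubled words, most common first."""
    doubled = {
        first: n
        for (first, second), n in counts.items()
        # Compare case-insensitively so 'Ha ha' would count as a doubled word.
        if first.lower() == second.lower() and first.lower() not in blocklist
    }
    return sorted(doubled, key=doubled.get, reverse=True)[:top]


winners = most_common_doubled(bigram_counts)
filtered = most_common_doubled(bigram_counts, blocklist={"blah"})
print(winners)
```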
