06 February 2013

Most cliched adjectives and nouns


Photo © Tom Newby Photography. CC-BY-2.0

While playing Articulate with my parents over Christmas, I had to describe 'immense', and tried to do so by saying that you could say something was 'of <blank> proportions'. This didn't work, but it got me thinking about which adjectives are typically used to describe only one or two things, and, conversely, which nouns are typically described by only one or two adjectives.

Time to dust off the Google N-Gram data from a few years ago, and combine it with a parts-of-speech database.

Most cliched adjectives

For each adjective, I found the noun it was most commonly used in front of, and the percentage of uses of that adjective accounted for by uses before that noun. The adjectives with the highest such percentage are the 'most cliched'.
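A minimal sketch of that calculation in Python, assuming the adjective-noun pairs have already been pulled out of the N-Gram data into a dictionary of counts. The function name `most_cliched` and the toy counts are made up for illustration; they are not the code or numbers behind the results here.

```python
from collections import defaultdict

def most_cliched(pair_counts):
    """Rank first words by the share of their uses taken by their top partner.

    pair_counts maps (adjective, noun) -> count (a hypothetical input format).
    Returns (share, adjective, noun) tuples, highest share first.
    """
    totals = defaultdict(int)   # total count for each adjective
    best = {}                   # adjective -> (most common noun, its count)
    for (adj, noun), count in pair_counts.items():
        totals[adj] += count
        if adj not in best or count > best[adj][1]:
            best[adj] = (noun, count)
    results = [
        (count / totals[adj], adj, noun)
        for adj, (noun, count) in best.items()
    ]
    return sorted(results, reverse=True)

# Made-up counts for illustration only; not taken from the N-Gram data.
toy_counts = {
    ("stainless", "steel"): 90,
    ("stainless", "reputation"): 10,
    ("immense", "multitude"): 20,
    ("immense", "pressure"): 30,
    ("immense", "wealth"): 50,
}
ranking = most_cliched(toy_counts)
for share, adj, noun in ranking:
    print(f"{share:.1%}  {adj} {noun}")
```

Flipping the keys to (noun, adjective) would give the equivalent ranking for nouns.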

Taking the twenty most cliched adjectives, after manually weeding out not-really adjectives and geographical phrases (like 'Saudi Arabia'), I found the most cliched adjective is:

91·5% of the time 'stainless' is used, it's to describe 'steel'.

(This is an approximate statement. A more accurate one would be: 91·5% of the time that 'stainless' precedes a noun, that noun is 'steel'. But the above gets the point across.)

The full top twenty list is:

91·5%  stainless steel
89·4%  objectionable content
87·0%  wrought iron
84·7%  typographical errors
84·0%  elapsed time
82·6%  martial arts
81·9%  supreme court
81·1%  movable type
79·3%  untitled document
76·9%  breaking news
76·5%  sporting goods
75·9%  stained glass
75·1%  motley fool
74·8%  designated trademarks
74·5%  Grateful Dead
73·1%  respective owners
72·6%  vice president
72·1%  deviant comments
71·3%  Looney Tunes
70·9%  nervous system

'Untitled document' is pretty good, as are 'Grateful Dead' and 'Looney Tunes'.

Most cliched nouns

In a similar way, I examined each noun and found the adjective most commonly used to describe it, and the percentage of occurrences of that noun which were paired with that adjective. With this measure, the most cliched noun is:

97·4% of the time 'annotation' is used, it's described as 'functional'.

(Again, this is an approximate description. A more accurate one would be: 97·4% of the time 'annotation' follows an adjective, that adjective is 'functional'.)

Giving the list as <adjective> <noun>, the full top twenty list is:

97·4%  functional annotation
97·0%  real estate
96·8%  creative commons
91·6%  global warming
91·4%  registered trademark
91·0%  super saver
90·8%  free counters
89·1%  planned parenthood
88·5%  used textbooks
88·1%  national aeronautics
87·7%  other shoppers
86·5%  multiple sclerosis
86·3%  hot tub
85·6%  simple syndication
85·5%  due diligence
84·7%  grand theft
84·2%  Iron Maiden
82·8%  remote sensing
82·3%  Black Sabbath
81·7%  self catering

Funny to see two British metal bands in there. I think the presence of 'super saver' and 'other shoppers' might indicate a large contribution to the corpus from commerce sites.

What about 'immense'?

But if I had to try to get someone to say 'immense' by giving a noun it's commonly used before, what would that noun be? There are two ways you could answer that. For each noun, find the fraction of its uses in which it's preceded by 'immense', and pick the noun with the highest fraction. Or, for each noun, find the rank of 'immense' in the list of adjectives that precede it, and choose the noun with the highest (i.e., numerically smallest) rank. These don't necessarily give the same noun, but for 'immense' it turns out they do, and the best you can do is:

2·0% of the time 'multitude' is used, it's described as 'immense'.

And 'immense' is the sixth-most common adjective used to describe a multitude; the top ten being:

40·4%  great
11·3%  whole
8·2%  vast
5·1%  mixed
2·6%  assembled
2·0%  immense
1·3%  countless
1·3%  infinite
1·0%  any
1·0%  large

An example adjective where 'best rank' and 'highest fraction' do not lead to the same noun is 'striking':

striking contrast: the highest rank 'striking' reaches is when describing 'contrast', where it's the sixth-most common adjective for 'contrast' and accounts for 2·1% of descriptions of 'contrast'.

striking similarity: the highest fraction 'striking' reaches is when describing 'similarity', where it accounts for 2·8% of descriptions of 'similarity', and ranks seventh amongst adjectives for 'similarity'.
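The two selection rules can be sketched side by side in Python. This assumes the same kind of (adjective, noun) count dictionary as input; `best_nouns_for` is a hypothetical helper, and the counts are invented so that the two rules disagree, much as they do for 'striking'.

```python
from collections import defaultdict

def best_nouns_for(adjective, pair_counts):
    """Return (noun with highest fraction, noun with best rank, per-noun stats).

    pair_counts maps (adjective, noun) -> count (a hypothetical input format).
    """
    by_noun = defaultdict(dict)
    for (adj, noun), count in pair_counts.items():
        by_noun[noun][adj] = count

    stats = {}  # noun -> (fraction of the noun's uses, rank of the adjective)
    for noun, adj_counts in by_noun.items():
        if adjective not in adj_counts:
            continue
        own = adj_counts[adjective]
        fraction = own / sum(adj_counts.values())
        # Rank 1 means no other adjective is more common before this noun.
        rank = 1 + sum(1 for c in adj_counts.values() if c > own)
        stats[noun] = (fraction, rank)

    if not stats:
        return None, None, stats
    by_fraction = max(stats, key=lambda n: stats[n][0])
    # Ties on rank are broken in favour of the larger fraction.
    by_rank = min(stats, key=lambda n: (stats[n][1], -stats[n][0]))
    return by_fraction, by_rank, stats

# Invented counts, chosen so that the two rules pick different nouns.
toy_counts = {
    ("sharp", "contrast"): 60,
    ("striking", "contrast"): 20,
    ("stark", "contrast"): 10,
    ("marked", "contrast"): 10,
    ("close", "similarity"): 35,
    ("strong", "similarity"): 35,
    ("striking", "similarity"): 30,
}
by_fraction, by_rank, stats = best_nouns_for("striking", toy_counts)
print("highest fraction:", by_fraction)
print("best rank:", by_rank)
```

In the toy data 'striking' is second for 'contrast' but only a fifth of its uses, and third for 'similarity' but nearly a third of its uses, so the two rules pick different nouns.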

And 'proportions'?

And what would somebody with knowledge of the Google N-Gram data say when given the clue '<blank> proportions'?

6·9% of the time 'proportions' is used, they're described as 'epic'.

It wasn't a very good clue for 'immense' at all.

Representativeness of the data

The data-set shows its origins as a corpus drawn from the web, in that there are several amusing features of the results which sadly are not suitable for a general readership. Even though I don't really have a readership of any sort, I therefore omit them. Sigh.

Most common doubled word

A couple of years ago, a related question came up: what is the most common doubled word? After filtering out unsuitable content, the winners turned out to be:

1. blah blah
2. had had
3. very very
4. ha ha
5. la la
6. big big
7. no no
8. really really
9. much much
10. long long
11. Duran Duran
12. etc etc

A good showing from band names altogether in that data-set.
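Finding doubled words is a simple filter over two-word counts. A rough sketch in Python, assuming a dictionary of (first word, second word) counts and a case-sensitive comparison; the function name and the counts are invented for illustration, and the content filtering step is left out.

```python
def doubled_words(bigram_counts, top=12):
    """Return the most common bigrams whose two words are identical.

    bigram_counts maps (first, second) -> count (a hypothetical input format).
    """
    doubled = [
        (count, first)
        for (first, second), count in bigram_counts.items()
        if first == second
    ]
    doubled.sort(key=lambda item: -item[0])  # most common first
    return [f"{word} {word}" for count, word in doubled[:top]]

# Invented counts for illustration only.
toy_bigrams = {
    ("blah", "blah"): 500,
    ("had", "had"): 300,
    ("had", "been"): 900,
    ("very", "very"): 200,
    ("very", "good"): 700,
}
winners = doubled_words(toy_bigrams)
print(winners)
```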
