Google's Role in Paying for Shoe Leather

This is a very rough approximation of the talk I gave at Google on March 16, 2006. Some parts have been added and some have been removed. The ordering has also changed to improve the flow. I should note that I prepared this talk and this text on my own. Nothing in it should be considered to be the opinion of any organization that has paid me in the past.

I would like to thank everyone for coming to this talk. Google has pointed me toward many pieces of information over the years and I hope to return part of the favor.

The question for today is what Google can do to help news organizations pay for shoe leather. Throughout history, the information ecology has shifted with the arrival of new technology. The telegraph carried information across the United States in record time. The radio first and the television second made it possible for people to witness live events without being there. The strength of pure visual connections let television news take a number of consumers from newspapers. Now, the immediacy of the Internet is taking consumers from both dead tree papers and fixed broadcast networks.

There is the general sense that the Internet is slowly eroding the value of any of the traditional techniques used to pay for gathering the news. The classic newspaper is a bundle of products designed to make it easy for people who want one particular stream of information to work with others. Everyone can spend a small amount and aggregate their support devoted to gathering the news. Some want the sports scores, others want an update on new shows, and some want to keep an eye on what happens to their tax dollar. Most people have wandering interests in all three. Each of these found their answer in the general pile of information called the newspaper.

Bundling the information gave the editors the freedom to spend vastly different amounts to gather information and that allowed them to deliver relatively expensive information as long as it was balanced with relatively inexpensive facts.

For instance, covering a war is very expensive in economic terms and occasionally infinite in personal terms. In 2005, two New York City police officers were killed in the line of duty. In that same year, one New York Times reporter died in Iraq. Overall, 61 reporters have died while covering the Iraq war. To put this in perspective, according to some reports 66 died in Vietnam and 68 died during WWII. (see here)

Luckily for newspaper publishers and readers, sending someone to cover the ball game isn't very expensive and the only personal cost comes from the fact that the reporter often has to work nights. A general newspaper can subsidize expensive information gathering with the simple and local.

This old equation is failing as websites like ESPN.com and others start picking off the low hanging fruit. The folks who lived for the ballgame scores don't have the same need to buy the morning paper. The same goes for the financial results. So the daily paper becomes a harder sell.

Search engines like Google are helping this along by letting people focus on what they want. Let me say this isn't bad. It's just a consequence of building a machine that lets us find the tidbits we really want. The problem is that it's trashing this old technique of bundling, and that's making it harder to support some of the expensive information gathering. It should be no surprise that the bloggers are concentrating on reports of sightings of George Clooney and not the war effort. Gossip is cheap because it's trivial and often provided for free by sources. Spending real shoe leather to bring back reports from foreign bureaus or war zones means paying for people's time. There will be some soldiers and others who do it for free. Some of them will even do a better job than the generalists. But the odds are much lower that we'll ever see the kind of independent coverage delivered by the major media during Vietnam.

Google is not a direct competitor in the news business like some other Internet companies, but it does play in the space by letting readers search through text and video distributed throughout the web. The Google News site is an entirely automated mechanism for crawling major news sites and ordering the stories so the most important rise to the top. It may be pretty simple, but it is clearly a major first stop for people curious about what is going on in the world.

The first part of this talk will give you a quick summary of the bad news percolating through the newsrooms. It's not meant to be a sob story because there is also much good news today. But the economics of news gathering is changing and this is spelled out in the news of layoffs and cutbacks in newsrooms.

The second half of the talk will be a summary of five things that Google might do to help the world support content creators who want to do more than blog about Jessica Simpson's break up. This isn't an exhaustive list. It's just a wish list I created with another reporter over dinner. Paying reporters to investigate stories and provide in-depth coverage is becoming harder in this environment and there are a few things that Google could do to help them. I don't expect that any of them will be very simple, but some of them aren't particularly hard to add.


Okay, first the context: Many newspapers are watching their circulation erode. The print runs for the dead tree editions are shrinking. In some cases, the numbers show dramatic drops of more than 10%. Some of this is the newspapers' own fault because they've been caught padding their subscription numbers, but there's no doubt that fewer people are buying dead tree editions.

This same effect, incidentally, is being felt in television newsrooms. Many fewer people watch the evening news and some local stations are even considering dropping the local coverage.

The managers of newspapers and television stations are usually responding with vigorous cuts. The Washington Post just announced layoffs of 80 people from the newsroom. The New York Times laid off 250 from across the organization in the latter half of 2005. This isn't the first round of cuts and the same story is being repeated throughout the industry. The well-managed newspapers make do with fewer staff members and usually continue to generate a nice profit and a good return on equity for the shareholders.

The effect for the reader, however, is usually hidden from all but the careful observers. The newspapers are reusing content whenever they can. It's not unusual to see the New York Times use news with a byline of a reporter for the International Herald Tribune. In the past, the editor might have assigned the story to a different reporter maintained in the paper's local office. Wire stories are more common and many chain newspapers trade stories between them. The Baltimore Sun, for instance, often carries news originally written by a Chicago Tribune reporter.

Much of this reuse makes good economic sense. There's often no need for someone in Baltimore and someone in Chicago to write about imported olive oils for the food section. Both writers will probably come up with much of the same story. But local flavors disappear. Few in Chicago can write knowledgeably about crab cakes and I don't know if anyone at the Baltimore Sun knows that a hot dog in Chicago should come with celery salt.

It's impossible to measure the shrinking news coverage for much the same reason that it's philosophically impossible to prove something doesn't exist. We don't know which stories would have been written by the 80 people at the Washington Post. The folks given the hook might have been the slackers. But if we assume that each would write two stories a week on average, the odds are that there will be 8000 fewer stories in the Washington Post each year.
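The arithmetic behind that figure can be sketched in a few lines of Python. The number of working weeks is an assumption that isn't stated above: 50 weeks gives the round 8000, and a full 52 weeks gives slightly more.

```python
# Back-of-the-envelope estimate of stories lost to the layoffs.
# The reporter count and stories-per-week come from the talk;
# the weeks-per-year values are assumptions for illustration.
LAID_OFF_REPORTERS = 80
STORIES_PER_WEEK = 2


def stories_lost_per_year(reporters, per_week, weeks_per_year):
    """Stories that go unwritten in one year."""
    return reporters * per_week * weeks_per_year


low = stories_lost_per_year(LAID_OFF_REPORTERS, STORIES_PER_WEEK, 50)
high = stories_lost_per_year(LAID_OFF_REPORTERS, STORIES_PER_WEEK, 52)
print(low, high)
```

Either way the estimate lands in the neighborhood of 8000, which is all the argument needs.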

Some of those 8000 stories won't be missed. We can probably live without a few of the trashy items in the Style section. The blogs can easily replace the daily recommended quota of snark. The editors are sure to keep reporters on the most important topics, because it's still easy to send someone to a White House press conference. But the erosion will eat away at the coverage. Some important stories will be shorter and some beats will go uncovered. We'll never really know what we missed.

The biggest losses will probably be complete surprises. There will be less coverage of the city council meetings and I'm sure that reporters will stop showing up at the zoning board meetings of the towns in the Maryland suburbs of Washington. It will be easier for politicians to come up with clever loopholes and I'm sure there will be an increase in slippery dealings. We'll wake up ten years from now and someone will say, "And it was all legal. We passed it in a meeting soon after the Post laid off 80 reporters." I know that the press is far from all powerful, but sometimes a bit of sunshine can keep the political process a bit cleaner.

There is still plenty of good news. The newspapers were quick to recognize the power of the Internet and they were some of the first to establish viable presences. Most of the top news sources in cyberspace are the top news sources in meatspace. One of the others, Yahoo, is largely known for repackaging AP wire stories, but is widely rumored to be building a real newsroom filled with living, breathing reporters.

The Internet is very friendly to newspapers. Now that most of the software is written and debugged, moving electrons is dramatically cheaper than moving paper. Newsprint continues to get more expensive as the costs of creating it and moving it rise with energy prices.

Many news companies also embraced the Internet by giving away all of their content. This has brought dramatically more readers. The New York Times usually distributes about 1.1m papers during the week and 1.7m on Sundays, but the website enjoyed about 18.6m unique visitors in January 2006. Most papers are reaching larger audiences than they ever have through their free websites. I regularly read my hometown newspaper online, something I could never do when the news was only available on paper.

One problem is that web ads aren't yielding the same revenues. The new readers are appreciated, but they don't generate the same cash as the folks that looked at many full-page ads. Print and full-page ads are still a better mechanism for advertisers because they manage to intrude without being too annoying.

Adam Penenberg analyzed the NY Times financial statements and found that in 2004 the dead tree subscribers generated $900 of revenue apiece. The website produced $11 per visitor. The gap between the two numbers has probably tightened since then because print advertising continues to become less popular while web advertising is booming. The NY Times is also experimenting with a walled garden for some of its content. There are said to be 150,000 TimesSelect subscribers.

The Wall Street Journal, by comparison, has about 761,000 subscribers, a number that's been relatively stable lately. It locks out non-subscribers from most of its content, something that may or may not increase the revenue depending upon who is speculating.

It's also important to note the good news from the world of blogs. Many of them are wonderful publications and some are truly outstanding. I read a number of them frequently and some of them every day. Someone really should coin a word like "jblog" or "journolog" to apply to bloggers who are doing more than just expressing their opinion. People who are making some attempt to double-check facts and generate a reasonably accurate report should have a different word that indicates that someone is burning some shoe leather.

The old joke used to be that freedom of the press belonged to anyone who could afford a press (and ink and libel lawyers etc). The invention of low cost blogging software erased that barrier, but it could not change the brutal passage of time. Now, freedom of the press belongs to those who can afford the time (and energy and connections). Only a serious few can make enough money in advertising and so the rest must pay the rent with another job.

This brings us to the question of shoe leather and what Google can do to support those who want to produce original content. If you asked me five to ten years ago, I would have thought that a search engine to the web was all that was needed. Helping the content consumer meet the right content creator is a marvelous gift to the world and something that is continuing to have amazing effects on almost every part of human life. Innovation is easier than ever before and magical creations are coming faster than ever before.

Today, I'm not as certain, in part because I think the ecology of free information has serious limitations. The Internet isn't supporting the shoe leather. I can't be certain of this, but my guess is that the blogs won't be able to replace the 8000 lost stories from the Washington Post. Yes, the blogs will replace some of them, perhaps as many as 7500 of the 8000. Yes, the blogs will offer a wider range of voices from a wider range of society. But I just don't see the same amount of serious journalism appearing. I can't quantify this and I don't know if anyone will ever be able to know. You just can't measure the depth of coverage very easily.

But even if I'm wrong, I'm beginning to see serious problems with the free information ecology. At first glance, free information seems like a great gift for the world. It's kind of like the mythical frictionless economy, at least in ideas.

The danger is that the free information ecology will drown out the paid information ecology. Incidentally, this often happens in the world of money. People instinctively hoard solid metal cash and spend paper money first. The economists call this Gresham's Law and summarize it with the phrase "the bad money drives out the good money". But in cyberspace, there's no such thing as money, just information and so it's no surprise that we could have a similar thing occur with bits. The cheap bits drive out the dear ones.

Google is a big supporter of the free information ecology but there's no great champion of the paid information ecology. Some publishers are doing great things with paid electronic content, but it's clear that it doesn't have the same mindshare as free content. This only makes sense given the price. But without some support for the paid information ecology, the freedom of the press belongs to those who have the time to blog. Everyone else is locked out.

This evolved by accident. Google has strict rules about eligibility and websites that require registration are verboten. Now, let me say that I'm not a big fan of registration. It annoys me as much as everyone else and I'm not sure that it really delivers much of value to the advertisers. But I'm not in ad sales so I can't be sure. It does have one interesting technical feature. Customized pages aren't cached by the various routers and firewalls, something that allows Internet websites to give their ad buyers some very accurate statistics.

Google's position was once a natural assumption for those in the Internet world and it was also one that I assumed myself. The Internet is so enjoyable because so much is free and open. I'm sure that no one even thought about including anything but free and open sites in the results of a Google search, even though it's possible to mix in anything you choose. Google, for instance, adds stock quotes if you type in GOOG. These come from a separate feed.

Let me give you a personal example of how the free ecology can drown the paid information ecology. Let me warn you that it's something that makes me bitter.

Let's say you want to see an article I wrote on encryption policy a long time ago. You can type "wayner encryption bill" into Google and the article appears. The Google engine does a good job of finding the right text. Isn't the Internet great?

But if you click through, you might not notice anything wrong with this at all. You're not writers and you're not in the business of creating content. You'll look at this page and you might say, "The text is all there. The person who cut and pasted it left some extra crap from the NY Times website in the ASCII feed, but the whole thing is there. Isn't the Internet great?"

But I notice something that continues to bug me to this day. The text didn't come from the NY Times website, the company that paid for its creation. There are no ads that are helping pay off the investment in shoe leather. Nope. It's a website run by a pirate.

Now, I don't think that Dave Farber thinks of himself as a pirate when he ships complete copies of articles to thousands of his close and personal friends. Much of the traffic on his so-called Interesting People mailing list is generated by list members who are debating something, well, interesting. And he's a certified member of the technorati. People call him the "grandfather of the Internet" because his students helped build it.

This kind of attitude reminds me of rich shoplifters, the class of folks who steal things from dime stores and argue that no one could really care about something so inconsequential. A number of programmers and folks from the tech culture seem to believe that Farber's actions are laudable. Farber isn't stealing from my livelihood, they tell me, he's just "building a library." In fact, Farber is forced to do such a thing, I'm told, because the NY Times is going to put the content "behind a wall of pay". I'm not kidding about this. One particularly loathsome alpha geek actually told me, "Krugman and Dowd were once influential; now they're invisible" behind the wall of pay. Somehow he couldn't conceive of paying $50 for a subscription to TimesSelect and gaining the opportunity to see invisible things. And I've been told this by people with millions of dollars in the bank. It's not just an attitude of the poor kids at CMU struggling to make the tuition payments.

But this kind of piracy adds up in many ways. Not only does the NYT lose ad revenues and potential subscriptions, but the editors get incorrect data about the amount of interest in a topic. Piracy doesn't show up in the ratings. And you would think a professor of computer science would know that passing a pointer (URL) is more efficient than passing the complete text.

This is just one extreme example of how the free information ecology is strangling the paid information ecology. Farber works for an organization that pays few taxes but receives huge checks from the US government. It's easy to assume that "information wants to be free" when you've cut this kind of deal with society. (The information in a CMU education, though, will cost you. Sorry. That rule doesn't apply to diplomas.)

This kind of story is repeated day after day on smaller and grander scales. The reporters risk their lives to bring back relatively independent stories from Iraq and a bunch of fat, happy bloggers chatter about it. (Incidentally, I think bloggers are often very, very generous to the reporters because they've learned just how hard it is to generate content. They usually include gracious links when they lift half of the piece.) Or Aunt Millie posts her cookie recipe and ten other cookie fanatics copy it to their website with or without attribution. In either case, the creators are getting a small portion of the ad revenue.

So what can Google do about this? First, it might not care to do anything at all. The free ecology is quite wonderful. Google rules the information jungle and there's no obvious reason why it has to start mixing in text from the walled gardens built by some of the serious content providers.

But I think this would be a mistake. As we've seen again and again, when the creators aren't supported, they stop creating. If the anarchy of the Internet isn't rewarding the folks who are burning the shoe leather, then they'll be laid off like the 80 folks at the Washington Post. If the creators aren't rewarded, they'll go elsewhere and Google will have less and less worth indexing.

There's no reason why Google can't put paid information on the same level as free information. I think it can be done without harming the free information or reversing the bias and putting the free information at a disadvantage. Here are five suggestions that might help the content creators out there make enough money to pay for more than bandwidth.

Solution I: End the Bias Against Walled Gardens

Why not loosen the rules on what content creators can do to protect their information? It's clear that the prohibition on registration rewards thieves who extract the information from the walled gardens. There's no reason why this information can't be woven into the search results. In fact some search engines have three columns of results: news, web, and ads. Make it simpler for publishers to get their information into the index even if they're not as wide open as we would like.

Solution II: Tilt the Table Against the Copyists

Let me say that I'm a big believer in fair use. I think it's very important for people to be able to quote frequently and liberally. But some blogs take this to an extreme. It's easy to find blogs that are 80, 90, even 95 percent borrowed text. Some frequently cut huge chunks of an article and then wrap it with the thinnest amount of comment. Not surprisingly, some of these folks are big believers in "fair use". I can think of one blog where the writers spend more time agitating for fair use than they do writing their thin, snarky wrapper around huge blocks of borrowed text.

I don't think these sites are necessarily bad, but I think they end up taking an unfair amount of the return on the content. Many sell ads and some even support nice lifestyles without consuming too much shoe leather in gathering the content.

So why not add another term to the exponentially growing PageRank equation? Declan McCullagh suggested this during dinner last night. Why not compute the fraction of the text that's original and the fraction that's borrowed? This is possible to do because most bloggers are kind enough to include a link to the original text. If they don't, it's usually possible for a few searches of complete sentences to find the original.
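Here is a minimal sketch of how that fraction might be computed, assuming the text of the linked originals has already been fetched. The function names, the naive sentence splitter and the exact-match rule are all illustrative assumptions, not a description of anything Google actually runs.

```python
import re


def split_sentences(text):
    # Naive splitter on '.', '!' and '?'; a real system would need
    # something sturdier. (Hypothetical helper.)
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    # Lowercase and collapse whitespace so trivial edits don't hide a copy.
    return " ".join(sentence.lower().split())


def borrowed_fraction(post_text, source_texts):
    """Fraction of the post's words that sit in sentences copied
    verbatim from any of the linked sources."""
    source_sentences = set()
    for source in source_texts:
        source_sentences.update(normalize(s) for s in split_sentences(source))

    total_words = 0
    borrowed_words = 0
    for sentence in split_sentences(post_text):
        words = len(sentence.split())
        total_words += words
        if normalize(sentence) in source_sentences:
            borrowed_words += words
    return borrowed_words / total_words if total_words else 0.0


original = "The bill passed on Tuesday. Critics say it weakens encryption."
post = ("Wow, look at this. The bill passed on Tuesday. "
        "Critics say it weakens encryption.")
print(round(borrowed_fraction(post, [original]), 2))
```

Exact sentence matching misses light paraphrase, so this would undercount borrowing; it is meant only to show that the measurement is mechanical once the original can be located.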

Let's call this LeechRank. If 20% of the text is borrowed, let's do nothing to the PageRank. If 50% is borrowed, we bump them down a few notches. If 80% is borrowed, let's send them down 20 to 30 notches. And if 100% is borrowed, as some pirates do, well, let's just knock them straight out to the bottom of the listings, sort of a way station on their trip to the circle in hell reserved for people who steal and destroy a person's livelihood.
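The rule of thumb above could be sketched as a simple step function. The talk only names four points (20%, 50%, 80% and 100%), so the choices here are assumptions: each named percentage is treated as the lower edge of a band, "a few notches" is read as 3, and "20 to 30 notches" as 25. Moving a single result down a list is also a simplification of how a ranking score would really be adjusted.

```python
def leechrank_demotion(borrowed):
    """Positions to drop for a given borrowed fraction (0.0 to 1.0).

    Returns None to signal "send straight to the bottom".
    Band edges and notch counts are illustrative assumptions.
    """
    if not 0.0 <= borrowed <= 1.0:
        raise ValueError("borrowed must be between 0 and 1")
    if borrowed >= 1.0:
        return None          # pure copy: bottom of the listings
    if borrowed >= 0.8:
        return 25            # "20 to 30 notches"
    if borrowed >= 0.5:
        return 3             # "a few notches"
    return 0                 # light quoting: no penalty


def demoted_position(position, borrowed, total_results):
    """1-based position after the penalty, capped at the last slot."""
    drop = leechrank_demotion(borrowed)
    if drop is None:
        return total_results
    return min(position + drop, total_results)


print(demoted_position(1, 0.2, 100))
print(demoted_position(1, 0.8, 100))
print(demoted_position(1, 1.0, 100))
```

A smoother curve would avoid the cliff edges at each threshold, but the step version matches the way the penalty is described here.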

I realize that it's not Google's job to police the net for copyright violations but I do believe that people want original content from the original source. This LeechRank could help lift up the real contributors to the web and tilt the table by rewarding them with higher PageRanks. It should be noted that the PageRank is already sort of designed to do this. A good source of information is supposed to generate many links as blogs point to it. So I'm just asking that it be tweaked to penalize large scale copying.

Solution III: Offer a Better Split of the Revenue

There's no easy way to know how Google should split the revenue for Google's ads with the creators who host the ads, but I'm guessing that the split may not be as fair as it should be. Many of the bloggers complain about their revenues and it's difficult to know what's going on because everything is so secret. Some big companies like Netscape and foundations like Mozilla are powerful enough to negotiate very good rates, but most of the folks are too small and take what they can get.

My guess is that the marketplace will take care of this with time. There are a number of good companies that are offering bloggers more control over their ads and I'm sure that they'll succeed if they do a better job. Ad sales don't seem to require the same investment in capital as a web crawler and search engine.

Incidentally, I can point out that Google's operation is largely automated. Yes, there's plenty of upkeep that needs to be done, but it's nothing like the challenge facing the newspaper editors each morning. Many of the editors I know like to keep plenty of spare "evergreen" articles around because they live in mortal fear of not being able to fill the page. Google admins don't wake up to start their job all over again.

The point is that the equation is different for creator and indexer. Every good new article requires new work, but every new website to crawl requires very little new programming. Google could try to keep most of the revenues if it wanted to use all of its leverage with small bloggers, but I think this would hurt the web. Eventually the creators will get bored and move on. If the best bloggers start leaving, people will find another way to amuse themselves online, another way that doesn't buy ads through Google.

Consider this other anecdote from Tuscany. I was once at an olive farm there and the guide explained that long ago, the landowners received some large percentage, say 80%, from revenue produced by their trees. This only made sense because the landowners owned the land and their ancestors invested in the trees. The peasants were just itinerant folks who didn't save or invest. They didn't deserve more.

The formula went unchallenged until the peasants started striking. The landowners reluctantly cut the percentage to something more equitable, say 50%, and found they were actually making more than before. The peasants had more incentive to do a better job harvesting the olives and everyone benefitted. This is what I think would happen if Google started raising the reward for small bloggers.

Solution IV: More Ad Types

Google's ads are currently a very good version of the Yellow Pages. It now has the market presence to start offering different types of display ads and I'm sure that it's investigating all avenues. The people at Google are very creative and I'm sure they can come up with some good tools that will help the advertisers and the content creators. Good luck.

Solution V: Micropayments

It's time to open up the index to articles that are kept behind a wall of pay. I imagine a system with three different columns of results from a search. The first would be pointers to articles from the free ecology. The last would be paid advertisements. In the middle could be articles from web sites that charge people to read the text.

Google could either help collect the payment or leave that marketplace to another company. Both have their advantages and limitations.

There are some neat collateral advantages of this solution. Cash is a great metric for measuring people's real intentions. Money talks and big talk walks. The original PageRank was pretty easy to game. I'm sure it's much harder now, but it's not impossible. But cash cuts through all of that. If people are paying real money for a document, then there's a good chance that people think that the information is quite valuable.

Why Free Isn't Enough

Let me close by giving you two anecdotes that explain why I think Google must start going beyond the advertising model if it wants to be serious about organizing all of the world's information.

First, think back to the time when MTV first started broadcasting music videos and mixing in a few ads for pimple cream. A friend of mine given to conspiratorial thinking told me, "Think of it, man. It's the first network that's 100% advertising." He meant that the music videos were themselves advertisements for record albums, not content produced for content's sake, whatever that means, if it can mean anything at all. (And who's to say that albums weren't just a gimmick the video artists used to support their true artistic talent in quick cutting montages of images?)

The free information ecology is close to MTV and getting closer. Sure, there are some passionate folks who are running earnest blogs that are vigorously fighting the good fight.

But everyone has to pay the rent and where someone stands on the issues depends upon where they sit. We're already beginning to see a proliferation of websites supported by groups that are political, religious or whatever. I'm sure there are bloggers who take cash from their buddies and some even disclose it up front. There's absolutely nothing wrong with them having a voice, but the danger is that they will have the only voice.

This is the major problem with the free-only ecology. A friend of mine sat me down when I first started writing a book and explained that it was a very different process than writing a long, long magazine article. The newspapers and magazines, he explained, have two loyalties: the subscribers and the advertisers. Both pay the bills. The job for a newspaper or magazine writer is to attract the kind of audience that will make the advertisers happy.

A book, however, is sold directly to the reader. The writer's loyalty is to the audience first and last. There's no complicated dance with an advertiser. That's why books continue to be the preferred way for someone who really has a strong message to deliver. It's a medium built for the Ann Coulters, the Dan Browns and the Popes. There's no editorial hand wringing or demands for "balance" to get in the way. There's a very tight feedback loop.

The free information ecology is the exact opposite. The same picky consumer who could make book authors dance has very little leverage over the free ecology. The free economy can only be dominated by those who get their rent money from other sources. Sometimes this won't affect their writing, but many times it will. The problem is that the free ecology doesn't have the feedback loop. The reader doesn't have the same leverage with the creator. Sometimes it may work out well, but in most cases, the creator will take care of the one who pays the bills first. It's just how the world has to work.

This brings me to the second proof by anecdote. Long ago but not really long ago, the Olympics were closed to professional athletes. This rule kept the Olympics very pure. It was like the free information ecology because that icky money stuff didn't get in the way of the purity of sport. I think there's a lot of good wishes in this idealism, but it also had many nasty side effects.

The most important is that subtle and not so subtle biases emerged when the athletes had to find a way to pay the rent while training. (Before injection-molded plastic, the runners had to literally pay for shoe leather.) Jim Thorpe is famous for winning the gold medal at the decathlon only to return it when someone found out he accepted a few dollars for playing in a few ball games. Ooops.

Keeping money out of the loop created many strange inequalities because the rich could still fund their own workouts and maintain their purity. In 1928, the gold medal in the 400m hurdles was won by David George Brownlow Cecil, 6th Marquess of Exeter. I'm sure he was a really fast guy and maybe the 6th Marquess of Exeter was really the fastest 400m hurdle runner in 1928. But maybe there was someone faster who took a few bucks to play ball and didn't qualify. We'll never know. It's like the question of those 8000 missing Washington Post stories.

Today, the Olympics are different. They aim to make sure the best athletes are in the arena. If money is involved, they aren't upset if the athletes get a cut. Is the play better? Undoubtedly. Is money dominant? Perhaps, but anyone who's watched the coddled stars of the NHL or the NBA in the Olympics knows that the less compensated folks still have a good chance.

This is why I think that Google needs to nurture the paid information ecology and find a way to support the creators in what they do. They don't need to abandon the free ecology or even favor the paid over the free. But the world will be a richer place if more people are given several ways to fund the shoe leather it takes to create content.