tsJensen

A quest for software excellence...

Respecting Terms of Use -- The Ethics of Meta-Searching

(Modified slightly on Monday, August 21, 2006, after considering issues of fairness and further introspection.)

For the first seven months of this year I worked on a project which, in part, was a meta-search tool designed to bypass the router pacing algorithms used by sites such as Google, Yahoo and MSN. I have come to believe that even if this were not a violation of their terms of use, it would be fundamentally unethical. I cannot claim the high road in having come to this conclusion. I did not arrive at this conclusion until some time after becoming unemployed and then re-employed.

Yesterday I began to question myself more seriously about the ethics of meta-searching than I had done before. Without doing any online research, a rarity for me, I just wrote down in long hand some basic questions and let one lead to another. My conclusion was that I could not ethically or morally justify the acquisition of data and resale of it in some form or another using meta-search techniques in violation of the source's terms of use.

On July 21, 2006, I was suddenly unemployed along with the rest of the development team at Provo Labs LLC, a Paul Allen (the lesser) venture. I didn't abandon the project even then. I looked for ways to keep the project alive. After all, I had spent months, including nearly every Saturday and Sunday, working on the code for this project. It was my baby. I was the only developer on it. A week or so after that fateful day in July, reality set in. I had no income and four children to feed. I had to find a job. And I did. A great job! The timing could not have been better.

In my first three days on the new job, I was impressed by the effort and expense the company is willing to expend to be sure that copyrighted material used in their product is properly licensed. This reminded me of a conversation I had had with management at Provo Labs earlier in the year. I had raised the question of the ethics of meta-searching and collecting data using automation from public search engines and other resources whose terms of use statements clearly prohibit such behavior. The discussion was brief and the subject was quickly swept aside. It boiled down to "everybody does it, including the search engines, so that makes it okay". I filed that rationalization away and kept going.

The intellectual property transfer from Provo Labs LLC to the new company Phil Burns is starting had not yet happened. I had even contemplated using my company, NetBrick Inc, an S corp of which I am the sole shareholder, as a holding company for this new venture. But I had become impatient and as Phil put it, "emotional and panicky".

I had my doubts about the whole deal and so today I pulled myself out of the deal entirely in part because I had lost faith that we would successfully negotiate the intellectual property rights to this product, in part because I did not believe I would have time to continue working on the project, but mostly because I had come to believe that it would simply be the wrong thing to do.

This process of introspection has been painful. I had to admit to myself that for the last seven months of my life, I have been building, enthusiastically, a product that was in large measure designed to violate the terms of use and possibly violate the law in the acquisition of meta-data from search engines and other sites for the express purpose of reselling that data in the form of market research and other such reports. I had rationalized this by thinking that we would not sell the data but only the conclusions we reached from the data. Splitting hairs like this was just another way to sweep the ethical inconsistency under the rug.

Today I informed Phil and Paul that I will no longer be involved with the project as it stands and that I will deliver the code in its existing form. I did not share with them my reasoning behind my decision because I really did not want to engage them in a debate on the merits of my decision. We had already been down that road.

After I informed Phil and Paul by email, I did some online research--something I really should have done, and unbelievably did not ever do, prior to starting the project. From any of the big three engines (Google, Yahoo, and MSN), you can click one or two links to get to the following terms of service information.

Google
http://www.google.com/intl/en/terms_of_service.html
"The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons... You may not take the results from a Google search and reformat and display them... You may not "meta-search" Google... You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things: using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
"meta-searching" Google; and performing "offline" searches on Google.

MSN
http://tou.live.com/en-us/
"In using the service, you may not:...use any automated process or service to access and/or use the service (such as a BOT, a spider, periodic caching of information stored by Microsoft, or “meta-searching”);"

Yahoo & Overture
http://docs.yahoo.com/info/terms/
"Except as expressly authorized by Yahoo! or advertisers, you agree not to modify, rent, lease, loan, sell, distribute or create derivative works based on the Service or the Software, in whole or in part."

Clearly, these search engines do not want you to use automated search software to mine their meta-data presented in search results and the results of other search related queries. It is clear that their intent is to only allow individual users through a normal web browser to access and use this information. Yahoo is more vague than the other two but the intent is still there.

So is the search engine behavior of crawling the content and indexing the content of other web sites unethical or immoral? Does that violate the terms of use posted by many other sites? Will the search engines remove your content from their site if you request it? I do not believe that it is unethical or immoral to drive traffic to a web site because its content contains what a search engine user probably wants to find. The search engine is not repackaging and reselling the data they find on the crawled sites. Yet they do profit in some measure from mining that content, for without the content, they would have no users. It seems to be a trade that most web site owners are willing to make.

I want to make it clear here and now that I believe that if I had made my concerns known to Provo Labs management more forcefully in the early days of this project, they would not have required me to work on it. They would have, I think, found something else for me to do. I hope this illustrates the flaw in my own character, which I hope to remedy in this, and does not leave the reader of this post to believe that Provo Labs LLC acted in an unethical manner.

The code is powerful and capable of being extended and used in a variety of ways. A friend of mine pointed out to me that not everything it does is a violation of terms of use document. In fact there is a lot of things that it is designed to do which goes no further than a typical web crawler in terms of gathering data. Perhaps a means can be found to make use of what it can do without violating terms of use policies. Perhaps the power of the code can be leveraged within the framework of licensed APIs. This is something that will have be determined.

Until that time, I'll continue my work at my new job and focus my personal coding efforts on my Forseti Project to keep my coding skills as sharp as I can. And I will take away an important lesson from this whole roller coaster ride: always examine and question the ethics of a project and then listen to your instincts.

If you'd like to comment and berate me here, go right ahead. I deserve it. If you're particularly vicious, I reserve the right to edit or remove the comment. If you've had similar experiences and stood up more valiantly, I'd like to hear about it and how it all turned out for you.