Respecting Terms of Use -- The Ethics of Meta-Searching

(Modified slightly on Monday, August 21, 2006, after considering issues of fairness and further introspection.)

For the first seven months of this year I worked on a project which, in part, was a meta-search tool designed to bypass the router pacing algorithms used by sites such as Google, Yahoo and MSN. I have come to believe that even if this were not a violation of their terms of use, it would be fundamentally unethical. I cannot claim the high road in having come to this conclusion. I did not arrive at this conclusion until some time after becoming unemployed and then re-employed.

Yesterday I began to question the ethics of meta-searching more seriously than I had before. Without doing any online research, a rarity for me, I just wrote down some basic questions in longhand and let one lead to another. My conclusion was that I could not ethically or morally justify acquiring data with meta-search techniques, in violation of the source's terms of use, and reselling it in one form or another.

On July 21, 2006, I was suddenly unemployed along with the rest of the development team at Provo Labs LLC, a Paul Allen (the lesser) venture. I didn't abandon the project even then. I looked for ways to keep the project alive. After all, I had spent months, including nearly every Saturday and Sunday, working on the code for this project. It was my baby. I was the only developer on it. A week or so after that fateful day in July, reality set in. I had no income and four children to feed. I had to find a job. And I did. A great job! The timing could not have been better.

In my first three days on the new job, I was impressed by the effort and expense the company is willing to put into making sure that copyrighted material used in its product is properly licensed. This reminded me of a conversation I had had with management at Provo Labs earlier in the year. I had raised the question of the ethics of meta-searching and of collecting data through automation from public search engines and other resources whose terms of use statements clearly prohibit such behavior. The discussion was brief and the subject was quickly swept aside. It boiled down to "everybody does it, including the search engines, so that makes it okay." I filed that rationalization away and kept going.

The intellectual property transfer from Provo Labs LLC to the new company that Phil Burns is starting had not yet happened. I had even contemplated using my company, NetBrick Inc., an S corp of which I am the sole shareholder, as a holding company for this new venture. But I had become impatient and, as Phil put it, "emotional and panicky".

I had my doubts about the whole arrangement, and so today I pulled out of the deal entirely: in part because I had lost faith that we would successfully negotiate the intellectual property rights to the product, in part because I did not believe I would have time to keep working on it, but mostly because I had come to believe that it would simply be the wrong thing to do.

This process of introspection has been painful. I had to admit to myself that for the last seven months of my life, I have been building, enthusiastically, a product that was in large measure designed to violate the terms of use and possibly violate the law in the acquisition of meta-data from search engines and other sites for the express purpose of reselling that data in the form of market research and other such reports. I had rationalized this by thinking that we would not sell the data but only the conclusions we reached from the data. Splitting hairs like this was just another way to sweep the ethical inconsistency under the rug.

Today I informed Phil and Paul that I will no longer be involved with the project as it stands and that I will deliver the code in its existing form. I did not share with them my reasoning behind my decision because I really did not want to engage them in a debate on the merits of my decision. We had already been down that road.

After I informed Phil and Paul by email, I did some online research--something I really should have done, and unbelievably never did, before starting the project. From any of the big three engines (Google, Yahoo, and MSN), you can click one or two links to get to the following terms of service information.

Google
http://www.google.com/intl/en/terms_of_service.html
"The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons... You may not take the results from a Google search and reformat and display them... You may not "meta-search" Google... You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things: using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
"meta-searching" Google; and performing "offline" searches on Google.

MSN
http://tou.live.com/en-us/
"In using the service, you may not:...use any automated process or service to access and/or use the service (such as a BOT, a spider, periodic caching of information stored by Microsoft, or “meta-searching”);"

Yahoo & Overture
http://docs.yahoo.com/info/terms/
"Except as expressly authorized by Yahoo! or advertisers, you agree not to modify, rent, lease, loan, sell, distribute or create derivative works based on the Service or the Software, in whole or in part."

Clearly, these search engines do not want you to use automated search software to mine the meta-data presented in their search results and the results of other search-related queries. Their intent is plainly to allow only individual users, working through a normal web browser, to access and use this information. Yahoo's language is vaguer than the other two's, but the intent is still there.

So is the search engines' own behavior of crawling and indexing the content of other web sites unethical or immoral? Does it violate the terms of use posted by many other sites? Will the search engines remove your content from their index if you request it? I do not believe it is unethical or immoral to drive traffic to a web site because its content contains what a search engine user probably wants to find. The search engines are not repackaging and reselling the data they find on the crawled sites. Yet they do profit in some measure from mining that content, for without the content, they would have no users. It seems to be a trade that most web site owners are willing to make.

I want to make it clear here and now that I believe that if I had made my concerns known to Provo Labs management more forcefully in the early days of this project, they would not have required me to work on it. They would, I think, have found something else for me to do. I hope this illustrates a flaw in my own character, one I hope to remedy, and does not leave the reader of this post believing that Provo Labs LLC acted in an unethical manner.

The code is powerful and capable of being extended and used in a variety of ways. A friend of mine pointed out that not everything it does violates a terms of use document. In fact, much of what it is designed to do goes no further than a typical web crawler in terms of gathering data. Perhaps a means can be found to make use of what it can do without violating terms of use policies. Perhaps the power of the code can be leveraged within the framework of licensed APIs. That is something that will have to be determined.

Until that time, I'll continue my work at my new job and focus my personal coding efforts on my Forseti Project to keep my coding skills as sharp as I can. And I will take away an important lesson from this whole roller coaster ride: always examine and question the ethics of a project and then listen to your instincts.

If you'd like to comment and berate me here, go right ahead. I deserve it. If you're particularly vicious, I reserve the right to edit or remove the comment. If you've had similar experiences and stood up more valiantly, I'd like to hear about it and how it all turned out for you.

Revive an Old Turbo Flame?

Just found this referenced on an FTPOnline story: http://www.turboexplorer.com/

As an old Delphi aficionado (version 5 was my last), I can't wait to download the Explorer versions to see what they've done with the place.

I've always felt that a Borland tool was a better place for a beginner to start. And then you take a corporate job and everyone is drinking the blue Kool-Aid. Don't get me wrong. I like the Kool-Aid too. Visual Studio 2005 is hands down the best IDE I've worked with. And no, for you Eclipse fans, I haven't tried that highly vaunted IDE. I do know people who have used both, and they invariably have good things to say about each.

Borland is spinning off the tools, so they say. So where will they be spun, and how do these new dolled-up Turbo versions fit into the equation? And so I don't have to wait so long, is there anyone at Borland who can get me a sneak peek copy?

I promise to run it through its paces and report back here. I'm especially eager to try the C++ flavor. Could the good old days of Turbo be back? Let's see....

SOAP vs REST -- Clean vs Comfortable

SOAP vs REST
In my work I've had occasion to use both SOAP and REST, on the client and on the server. SOAP is easy if you have good tools. Hard-wiring a WSDL is not my thing. At the risk of committing a pun foul, I'd rather eat a bar of soap than hand-code good WSDL. Fortunately, .NET makes WSDL for simple web services easy, on both the server and client ends of things. And WSCF makes more complex web services easy in the .NET world.
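To show what I mean about the tooling doing the work, here's a minimal sketch of an ASMX service in C#. The class, namespace URI, and method are purely hypothetical; the point is that .NET generates the WSDL for you at the .asmx endpoint, so nobody has to hand-code it.

// Quote.asmx code-behind -- a bare-bones ASMX web service. .NET exposes
// the generated WSDL at the .asmx URL (e.g. Quote.asmx?WSDL) automatically.
using System.Web.Services;

[WebService(Namespace = "http://example.com/quotes/")]   // hypothetical namespace
public class QuoteService : WebService
{
    // Exposing an operation is as simple as adding the [WebMethod] attribute.
    [WebMethod]
    public decimal GetQuote(string symbol)
    {
        // Hypothetical lookup; a real service would query a data store here.
        return symbol == "MSFT" ? 27.15m : 0m;
    }
}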

At the same time, REST is more comfortable, especially for those without nice support tools for consuming SOAP served on a plate of WSDL. A plain HTTP POST. An agreement between friends to pass X, Y, and Z along as simple name-value pairs.
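For the curious, here's roughly what that agreement between friends looks like from the client side, sketched with WebClient in C#. The URL and field names are made up for illustration.

// Posting name-value pairs to a REST-style endpoint -- no WSDL, no proxy generation.
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

class RestClient
{
    static void Main()
    {
        // The X, Y, and Z of the agreement, as plain form fields.
        NameValueCollection data = new NameValueCollection();
        data["x"] = "42";
        data["y"] = "13";
        data["z"] = "7";

        using (WebClient client = new WebClient())
        {
            byte[] response = client.UploadValues(
                "http://example.com/api/report", "POST", data);   // hypothetical URL
            Console.WriteLine(Encoding.UTF8.GetString(response));
        }
    }
}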

Trust or Verify
I guess in some ways it comes down to trust. Do you trust the client to submit clean data? Can you trust your server application to parse through and make safe any data that is not clean? Or would you rather automate some of that validation via schema and the rigidity of SOAP? For me, it all depends on the circumstances.
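On the verify side, here's a rough sketch of what the server end of that REST call might do before trusting anything it receives. The handler and field names are hypothetical; the point is simply that with REST, the checking is all yours to write.

// A minimal ASP.NET handler that refuses to act on anything it cannot parse.
using System.Web;

public class ReportHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        int x, y, z;
        bool valid =
            int.TryParse(context.Request.Form["x"], out x) &&
            int.TryParse(context.Request.Form["y"], out y) &&
            int.TryParse(context.Request.Form["z"], out z);

        if (!valid)
        {
            context.Response.StatusCode = 400;   // reject malformed input outright
            context.Response.Write("Bad request");
            return;
        }

        context.Response.ContentType = "text/plain";
        context.Response.Write(x + y + z);   // stand-in for the real work
    }
}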

The Illusion of SOAP and Schema
How tight are your contracts? A good lawyer will take a two-page agreement and expand it to ninety pages, not only because she wants to bill you more but because she needs to cover all the bases. Are your web service contract bases covered? Are the schema and the secondary validation sufficient?

Can REST Be Secure?
This line of thought takes me to the question: can we trust REST? Well, the short answer is no. But the longer answer is yes, just as much as we trust SOAP. The brilliance of SOAP is that the contract is carried with the data, or at least that the data is transported in a container against which the contract can be validated. So is that really better? Well, the underlying truth is that someone else wrote a bunch of helper code to help us perform the first level of validation on the message--its form. But what about content? Yes, schema validation can do some content validation as well, especially data-type validation. Beyond that, it's pretty much up to you.
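To make the form-versus-content point concrete, here's a sketch of schema validation with XmlReader in .NET. The schema namespace and file names are hypothetical; the validation callback is where problems of form and data type surface, and anything beyond that is still your code's job.

// Validating an incoming document against a schema before processing it.
using System;
using System.Xml;
using System.Xml.Schema;

class SchemaCheck
{
    static void Main()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add("http://example.com/order", "order.xsd");   // hypothetical schema
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("Validation error: " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create("order.xml", settings))
        {
            while (reader.Read()) { }   // walk the document; errors raise the handler
        }
    }
}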

Validation Bottom Line
Ultimately the value and robustness of a web service, whether you use REST for its simplicity or SOAP for the niceties of automated tools, will be determined by the code you write to validate the request, execute it, and return an appropriately formed response.

Consider Your Audience
Back in my poor old days as a technical writer, I always had to keep my audience in mind and understand it. It really does matter. For example, my mother would not understand a single word of this post. If you are publishing web services, you must consider who will consume them. Will they be a hodgepodge of PHP, JSP, ASP, and many other forms of "server pages" technology? Or will they be hard-driving Visual Studio SOAP users who would rather have the tool do the heavy lifting and eliminate the need to parse?

Give your users what they want. And to do that, you may have to give them both SOAP and REST. I guess that won't hurt us too much. After all, a hot shower is always a good combination with a good night's rest.

Frameworks in Their Place

A friend shared this Joel on Software post with me today. An absolutely hilarious read.

Excerpt:

"So you don't have any hammers? None at all?"

"No. If you really want a high-quality, industrially engineered spice rack, you desperately need something more advanced than a simple hammer from a rinky-dink hardware store."

"And this is the way everyone is doing it now? Everyone is using a general-purpose tool-building factory factory factory now, whenever they need a hammer?"

"Yes."

Frameworks have their place but time seems to be unkind to them. I'm not a Java guy but even the Java zealots I knew "back in the day" are now less bullish on the frameworks mess that exists on the Java stack. And no, for you remaining zealots, I don't want to fight about the point.

The fact is, the .NET Framework is only a few years behind. Will it bloat too? Has it already started with 2.0 and all the changes in ASP.NET and so forth? Will 3.0 see bloat, or a cohesive, more conservative growth pattern?

If we're all doomed to framework elephantiasis, what is the solution? My hope is that Microsoft will learn from the failures of its biggest competitor and work as hard to keep the framework tight as it works to sell its operating systems. That same aggressive and successful behavior is perhaps the only thing that can save us all from the doom suffered by our Java brethren, who are now escaping in droves to PHP.

Okay, I made that last part up. But it could be true. ;-)

Does Pay Per Click Work?

I"m not a marketing guru. Never have been. Never will be. You too? So how do we maximize traffic to our blog, our side project, or our main gig? Well, we tell our clients to hire us, the expert, when they need some coding done.

So hire an expert.

I know just the expert. I've watched these guys in action. They know what they're doing. Check them out at http://www.webevident.com/ppc-management.php.

They can handle all your pay per click campaigns. And you would be surprised how much traffic they can drive to your site on a very tight budget. They do a free analysis for you, so you have nothing to lose by at least checking them out.  

A Requirements Management Allegory

My wife won the lottery. Two hundred thousand dollars. Uncle Sam took half. She said, "I want a new car. Go buy me a new car."

So I took the checkbook and bought a brand new Honda Accord for $30,000. When I arrived home my wife said, "I didn't want an Accord. I want an SUV."

On the way back to the dealership, an accident occurred. I escaped with my life but the car was a total loss.

I still had the checkbook so I wrote a check for $40,000 and took home a nice, new Dodge Durango. I was so pleased with myself.

But my wife was not. She said, "The Durango is too small and I don't like the color red."

So I turned around and took it back to the dealer. I asked for my money back but he whipped out the magnifying glass and pointed out the small print: "absolutely, under no circumstances can you get a refund."

"Besides," said the salesmanager, "we've already spent the money and we can't take a new car in trade. It's just policy."

So I drove the Durango to the Ford dealership and on the way was rear-ended by a large truck. The Ford dealership gave me $10,000 in trade and I wrote a check for $30,000 more for the last of the new Ford Excursions.

I drove the Excursion home. Finally my wife was happy. "Now let's go buy the boat," she said.

"Sorry, honey," I said. "We're out of money."

So we have this giant SUV and we can't afford to put gas in it, and we have nothing to pull behind it.

But we do have an SUV that cost $100,000 and in three years will be worth less than $20,000. And as a compensating note, I can haul a ton of groceries with it, which saves trips to the grocery store and the gasoline to get there in the first place.

Now if only we could find another lottery to win.

Forget Fedora 5

Well, after struggling to get Fedora 5 to run on my machine and get the GUI up and running on an nVidia card, I've given up on this distribution after finding this bit of nasty news.

I think I'll try SUSE next. I've tried using the YUM updater and following a variety of instructions from a variety of posts to get my dual-monitor eVGA GeForce 7800 GT to work. All to no avail.

Once I've downloaded and installed it, I'll post the results of my attempts with SUSE 10.

Venturing into Mono

I've begun the journey into Mono. Fedora 5 is nearly completely downloaded. I've freed up a partition on which to install it. I've downloaded mono-1.1.13.6_0-installer.bin from the official site.

Why?

Because I'm building a system that must scale to many machines and we're considering using a virtual machine hosting system. And they only host virtual Linux boxes.

Will we definitely host the application on a virtual system? No, not definitely. But if the port to Mono goes well, it's certainly an option.

My concerns about going to Mono are, first, that I know very little about Linux, and second, that I'm using System.ServiceProcess.ServiceBase for my server, and that namespace, as far as I can tell, is not supported in Mono. These two items may pose a bit of a learning curve.
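For context, this is the shape of the ServiceBase code I'd be porting. It's a stripped-down sketch with a made-up service name; if Mono really doesn't support System.ServiceProcess, the likely fallback is a plain console process managed by the usual Linux daemon tools.

// A bare-bones Windows service built on System.ServiceProcess.ServiceBase.
using System.ServiceProcess;

public class WorkerService : ServiceBase   // hypothetical service name
{
    public WorkerService()
    {
        ServiceName = "WorkerService";
    }

    protected override void OnStart(string[] args)
    {
        // spin up the server's worker threads here
    }

    protected override void OnStop()
    {
        // signal the workers to shut down cleanly
    }

    static void Main()
    {
        ServiceBase.Run(new WorkerService());
    }
}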

After downloading some but not all of the Mono source files, I began wandering about and looking at how the Mono team has implemented various class libraries that we .NET developers take for granted every day. Talk about a wealth of code samples that will be extremely valuable in my daily work, regardless of whether I'm coding in Mono or in MS .NET.

I'll post more on my progress into the world of Mono and Linux in the future. In the meantime, if you have any words of wisdom for me, please feel free...