
Paid Links and the Tragedy of the Commons

Everyone knows about hyperlinks – the highlighted text and images that we can click on to take us from one page to the next on the web. In the case of a text link, there is a simple piece of HTML code behind the link. For example, consider this link to Angie’s site.

If you look at the source for this web page you will see the following:

<a href="http://youlookfab.com">this link to Angie's site</a>

Very simple, but this little fragment is the basis of the revolution that is the world wide web. Without it there would be no web sites, no web surfing, and no Google search.

There are essentially two benefits that Angie gets when I create a link like this to YLF (youlookfab.com). First, the obvious one: people will click on the link and end up on her page, so I am sending her additional direct traffic.

The second benefit is more subtle, but also more powerful. By creating this link to Angie’s page I am telling Google that YLF is important. As a result Google may place YLF closer to the top of their search results, and then send Angie additional search traffic. This second bump in traffic has the potential to have a far greater impact. There are only a few people today on TheBlogEasy, but there are millions of people doing Google searches.

In order to explain the implications of this I will now make some gross oversimplifications about the way Google search works.

Google Search and Pagerank

Google has a little piece of software called the crawler. This crawler goes from page to page, following all the links and building a giant database that captures all the links on the web. Another piece of Google software then uses this giant database to answer your search queries when you type them in on the Google homepage. But how does Google decide which search result goes on top? That’s where the links come in!

Google looks at the number of links coming in to a page and uses this as a reflection of the page’s importance. This makes sense – popular pages will have many incoming links. Unpopular pages that no-one cares about? They won’t have any incoming links at all. So by linking to YLF on this page, I have sent a little signal to Google that YLF has some importance. Google uses the term “pagerank” to describe this importance measurement that is based on incoming links.
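To make the oversimplification a little more concrete, here is a toy version of the published PageRank idea in Python. Each page shares its score equally among the pages it links to, and every page also gets a small baseline score. This is a textbook sketch, not Google's actual ranking code; the page names and the 0.85 damping factor are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.0.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus an equal share of the score of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: two sites link to YLF, and YLF links back to one of them.
web = {
    "theblogeasy": ["youlookfab"],
    "forum": ["youlookfab"],
    "youlookfab": ["theblogeasy"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

The page with the most incoming links ends up on top, while "forum", which nobody links to, is left with only the baseline score.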

Paid Links: An Industry is Born

People are smart, and when they realized that Google was sending sites a lot of traffic they started to think about ways that they could influence the Google search engine. They needed to convince it to put their site closer to the top of the search results so they could get more traffic. One obvious way to do this is to convince other sites to link to yours, so that the crawler finds these links, your pagerank increases and Google sends you more search traffic.

This worked so well that people started to pay each other for these links.

And therein lies a problem. Google’s algorithm is relying on the fact that the links are natural in order to use them as an indicator of a site’s importance. When people start buying links, they are buying importance. Even if their site is horrible to look at and contains unreliable information, with enough money they can trick Google’s search engine into thinking that the page is important.

Then Google starts sending people to crappy sites and they lose confidence in the search engine. Not good.

Policing Natural Links and Punishing those who Distort Them

Needless to say, Google is very concerned about anything that distorts the natural order of things. This includes paid links and also link networks, where everyone in a community links to each other in order to get more pagerank. So a few years ago they started to lay out guidelines about what webmasters should and shouldn’t do with links. And they started penalizing sites that they judged to be breaking these rules.

Google isn’t transparent about their penalties, but there are widely believed to be three main ones. The -30 penalty forces your site to the 30th place in the search results even if it would normally rank much higher than that. The page 99 penalty (or -950 penalty) relegates certain pages on your site to the very end of the results (page 99) for certain keywords. The third and most serious penalty is exclusion from the index.

One good example here is the company Text Link Ads. They provide a marketplace for people to buy and sell paid links. That is, they make money by distorting the natural order of things that Google cares about so much. If you search for “text link ads” in Google you will not find their website. It has been banished.

Interestingly, the topmost result when I searched for “text link ads” a few minutes ago was an article reviewing the Text Link Ads service. So they are managing to get good placement despite having been banished.

The Nofollow Exception

It is possible to create paid links that don’t anger Google. All you need to do is add the “nofollow” value to the rel attribute of any unnatural link. Modifying the link I gave above, this would look as follows:

<a rel="nofollow" href="http://youlookfab.com">this link to Angie's site</a>

The rel="nofollow" bit tells the Google crawler to ignore this link, so it won’t be counted when calculating the pagerank of YLF. Of course, this doesn’t solve the problem for people who buy and sell links, because they want links that do affect pagerank.
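A crawler that honours nofollow has to check each link's rel attribute before counting it. The sketch below uses Python's standard html.parser to split the links on a page into followed and nofollow links. It only illustrates the idea; it is not how Google's crawler is actually implemented, and the LinkCollector class is a made-up name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Illustrative parser: collects <a href> targets, split by rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated values, e.g. rel="nofollow noopener".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


page = """
<a href="http://youlookfab.com">this link to Angie's site</a>
<a rel="nofollow" href="http://youlookfab.com">this link to Angie's site</a>
"""
collector = LinkCollector()
collector.feed(page)
print("counted:", collector.followed)
print("ignored:", collector.nofollow)
```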

What this Means for You

I believe that if you care about getting traffic from Google, you need to care about their guidelines. For this reason we do not pay for links to YLF that count toward pagerank, nor do we accept money to place such links on YLF. When we do link to an advertiser in a post thanking our sponsors, or in a review article where the advertiser provided free merchandise, we use the “nofollow” attribute. We are also upfront with advertisers about this practice.

Note that while buying paid links is one example of Search Engine Optimization (SEO), the set of techniques one can use to make one’s site rank well in the search engines, I am not implying that all SEO is bad. There are many good SEO practices that make Google a better search engine by making it more knowledgeable about your site’s content (e.g. using a sitemap). What is bad is anything that distorts the natural link order.

Could you get away with buying and selling links? Probably. Is it worth it? If you are highly dependent on Google traffic, probably not. You might also agree with Google that the web is better off with people linking based on quality and relevance of content, rather than pure dollars. In fact, if paid links started to dominate the web, then Google would have a much harder time providing good search results. And then there would be no SEO-related reason to pay for links in the first place.

Tragedy of the Commons

This brings us to the tragedy of the commons. From Wikipedia:

The tragedy of the commons refers to a dilemma described in an influential article by that name written by Garrett Hardin and first published in the journal Science in 1968. The article describes a situation in which multiple individuals, acting independently, and solely and rationally consulting their own self-interest, will ultimately deplete a shared limited resource even when it is clear that it is not in anyone’s long-term interest for this to happen.

Will this be true of links on the web? Will commercial interests ultimately ruin the relevance that one can infer from a hyperlink?

BBPress and Encoded URLs with Uppercase Hex

I recently discovered that the Google crawler was throwing a redirect error on certain pages in our YLF bbPress forum (bbPress 1.0.2 at time of writing). These URLs had one thing in common – they were encoded to handle special characters. Here is an example:

http://youlookfab.com/welookfab/topic/your-favourite-80%E2%80%99s-music-bands

The “%E2%80%99” sequence in this URL is the percent-encoded UTF-8 form of the curly apostrophe in “80’s”. I investigated a little using Firebug and web-sniffer.net, and discovered that this URL does indeed cause a 302 redirect. But why?

bb_repermalink() is the problem

The answer is in a bbPress function that runs for most forum pages: bb_repermalink(). (You can find this function in “functions.bb-core.php” in the “/bb-includes” folder of the bbPress distribution.) I couldn’t find any documentation or discussion of this function, but it appears to check the permalink and do a redirect if it finds an error. It turns out that the “correct” permalink (i.e. the one based on the post slug in the database) encodes special characters with lowercase hex (i.e. “%e2” instead of “%E2” in our example above). When bb_repermalink() compares the URL we typed into the address bar (with uppercase hex characters) to the “correct” one (with lowercase hex characters), it finds a discrepancy that it treats as an error in the URL. So it redirects to the “correct” URL.
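The mismatch is easy to reproduce outside bbPress. The hex digits in a percent-encoded character are case-insensitive (RFC 3986 treats %E2 and %e2 as equivalent and recommends uppercase when normalizing), but a plain string comparison doesn't know that. This Python sketch compares the two forms of the example URL; the variable names are illustrative, and the comparison only mimics what bb_repermalink() appears to do.

```python
import re
from urllib.parse import quote, unquote


def uppercase_percent_escapes(url):
    """Normalize every %xx escape to uppercase hex, as RFC 3986 recommends."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), url)


requested = "/welookfab/topic/your-favourite-80%E2%80%99s-music-bands"
stored = "/welookfab/topic/your-favourite-80%e2%80%99s-music-bands"

# Python's own encoder emits uppercase hex for the curly apostrophe (U+2019).
print(quote("80\u2019s"))  # 80%E2%80%99s

# A naive string comparison treats the two URLs as different...
print(requested == stored)  # False
# ...but they are the same URL once the escapes are normalized or decoded.
print(uppercase_percent_escapes(requested) == uppercase_percent_escapes(stored))  # True
print(unquote(requested) == unquote(stored))  # True
```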

A plugin workaround

Fortunately there are some well-placed hooks in bb_repermalink(), so I was able to create a plugin that detects uppercase hex in the URL and then adjusts the “correct” permalink accordingly. The code for this plugin is at the end of this post.

Why the Google crawl error?

One question still remains: why did this 302 redirect cause an error in the Google crawler? I can’t say for sure, but my theory is as follows…

  1. The crawler converts hex to uppercase before crawling the URL. So even though my sitemap specifies the URL with lowercase hex, Google’s crawler converts this to uppercase.
  2. Then when the crawler visits this URL, bbPress detects the uppercase discrepancy and issues a 302 redirect to the lowercase URL.
  3. Google’s crawler takes the new lowercase URL, AGAIN converts the hex to uppercase and then re-crawls the URL.
  4. This of course leads to an infinite loop as the crawler repeatedly converts to uppercase and bb_repermalink() repeatedly redirects back to the lowercase URL.

The crawler probably detects that it is getting the same URL back repeatedly, and interprets this as an error.

Again, I can’t be certain that this is what’s happening, but it’s a theory. If you know more than me about this issue, I would love to hear about it in the comments.
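The theory can be played out as a toy simulation. Both behaviours below are assumptions taken from the numbered steps above: the fake crawler uppercases escapes before every request, and the fake server redirects any URL that doesn't exactly match its stored permalink. Neither function reflects real Googlebot or bbPress code, and all the names are hypothetical.

```python
import re


def crawler_normalize(url):
    """Assumed crawler behaviour (step 1): uppercase every %xx escape."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), url)


def server_response(url, permalink):
    """Assumed server behaviour (step 2): 302 unless the URL matches exactly."""
    if url == permalink:
        return 200, url
    return 302, permalink


def crawl(url, permalink, max_requests=10):
    """Follow redirects the way the theory describes and report the outcome."""
    requested = set()
    for _ in range(max_requests):
        url = crawler_normalize(url)
        if url in requested:
            return "redirect loop"  # step 4: the same URL keeps coming back
        requested.add(url)
        status, location = server_response(url, permalink)
        if status == 200:
            return "ok"
        url = location  # step 3: follow the redirect and try again
    return "too many redirects"


lowercase = "/welookfab/topic/your-favourite-80%e2%80%99s-music-bands"
uppercase = crawler_normalize(lowercase)

print(crawl(lowercase, permalink=lowercase))  # redirect loop
print(crawl(lowercase, permalink=uppercase))  # ok
```

The second call corresponds to what the plugin does: once the "correct" permalink uses the same uppercase form the crawler requests, the first request succeeds and the loop never starts.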

Here is the code for the plugin that will force bbPress to accommodate uppercase hex in encoded URLs without issuing a 302 redirect.

<?php
/*
 * bbPress plugin: stop bb_repermalink() from issuing a 302 redirect when the
 * requested URL uses uppercase hex in its encoded characters (%E2 vs %e2).
 * preg_replace_callback() is used instead of the preg_replace() /e modifier,
 * which was deprecated in PHP 5.5 and removed in PHP 7.
 */

/* Callback for preg_replace_callback(): uppercase one matched %xx escape */
function _permalink_fix_uppercase( $matches )
{
    return strtoupper( $matches[0] );
}

function _permalink_fix( $permalink, $location )
{
    /* are there any URL encoded hex characters with uppercase in the request URI? */
    if ( preg_match( '#%([0-9][A-F]|[A-F][0-9]|[A-F][A-F])#', $_SERVER['REQUEST_URI'] ) )
    {
        /* uppercase ALL URL encoded hex characters in the "correct" permalink */
        $permalink = preg_replace_callback( '#%[0-9a-fA-F]{2}#', '_permalink_fix_uppercase', $permalink );
    }
    return $permalink;
}
add_filter( 'bb_repermalink_result', '_permalink_fix', 10, 2 );

Primer: Web Servers, Web Apps, Databases and Hosting

Even as a technical person who has spent most of their career programming or managing programmers, I was pretty confused when I first dipped my toes into web development. I really wished I had a simple primer that put things like the web server, web applications, the role of the database and hosting into perspective. Now that I have a better understanding of it all, I thought I’d try to write the simple primer that I was missing. Here’s a stab…

What is the web server?

This is the PC that you connect to the Internet to “serve” web pages. Other PCs on the Internet then send it requests for a page (the home page, for example) and the web server sends that page back to them. People use a web browser to send the request, and then the browser also interprets your web server’s response and displays it nicely on their screen.

So the “web server” is this server PC, but when people say “web server” they also mean the software running on this PC that interprets incoming requests and sends back the response. In our case, we are using the “Apache” web server, but there are also other options, like Microsoft’s Internet Information Services (IIS).

What gets served?

Obviously the web server is not all you need. You also need some content for it to serve. This is as simple as adding the content to a folder on the server PC that the web server software knows about. Normally the content takes the form of files in the HTML format and the associated images that they use. In the early days of the web, it was *only* HTML text files and images that web servers “served”, but these days the web servers are way more sophisticated and can serve much more interesting content (videos, applications, etc.).
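Under the hood, “a folder the web server knows about” means the server maps the path part of each requested URL onto a file inside that folder (Apache calls this folder the DocumentRoot). Here is a simplified Python sketch of that mapping. The function name and the index.html default are assumptions, and real servers do a great deal more (content types, caching headers, access rules).

```python
import tempfile
from pathlib import Path


def resolve_static(document_root, url_path):
    """Illustrative helper: map a URL path such as "/about.html" to a file.

    Returns the file's Path, or None if there is nothing to serve.
    """
    root = Path(document_root).resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse paths like "/../etc/passwd" that escape the document root.
    if candidate != root and root not in candidate.parents:
        return None
    # A request for a folder serves that folder's index page.
    if candidate.is_dir():
        candidate = candidate / "index.html"
    return candidate if candidate.is_file() else None


# Build a throwaway "site" folder and see which file each URL would serve.
with tempfile.TemporaryDirectory() as site:
    Path(site, "index.html").write_text("<h1>Home</h1>")
    Path(site, "about.html").write_text("<h1>About</h1>")
    served = {}
    for path in ["/", "/about.html", "/missing.html", "/../etc/passwd"]:
        found = resolve_static(site, path)
        served[path] = found.name if found else None
        print(path, "->", served[path])
```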

What is a web application?

There was a time when web servers only served up static content. You put a bunch of text files, images and videos into that special folder that the server knows about, and then people could view that content with their web browsers. At some point someone added some functionality to the web server that allowed it to dynamically serve up different content under different conditions. So the web pages now included logic, not only content. They became “applications”, as opposed to mere “pages”.
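Python's WSGI interface is a compact way to see the difference: instead of returning a fixed file, the application is a function that builds the page when each request arrives. WordPress does the same job in PHP, so treat this only as an analogy; the hello_app function and its “name” query parameter are made up for illustration.

```python
import html
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny web application: the page it returns depends on the request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    body = f"<p>Hello, {html.escape(name)}!</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def fake_request(query_string):
    """Call the app directly, without a real web server, and return the body."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)
    return b"".join(hello_app(environ, lambda status, headers: None)).decode("utf-8")


print(fake_request("name=Angie"))  # <p>Hello, Angie!</p>
print(fake_request(""))            # <p>Hello, visitor!</p>
```

To serve the same function from a real (development) web server, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, hello_app).serve_forever() will do.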

Why the database?

This is easiest to explain with an example. Take a blog. You could make your blog by manually creating a page for the front page, and then a separate page for each individual entry. Each time you added a blog entry you would manually add it to the front page, move the oldest entry off the front page, and manually create a separate page for the new entry. Clearly this is a crazy amount of work for each entry.

With a database, you can store the entry text as a record in the database. Then you have a blog “application” running on your server PC, working together with your server software. It takes entries from the database and formats them nicely for visitors to your blog. This application can format the blog entries in an infinite number of ways, with no change to the entries themselves! So if you wanted your blog entries to appear in a grid instead of the normal chronological order, you would only need to change the logic in the application.
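Here is that idea in miniature, using Python's built-in sqlite3 module as a stand-in for the MySQL database a WordPress blog would use. The table layout, the entries and the two render functions are invented for illustration; the point is that both layouts read the same stored entries, and only the application logic differs.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, published TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO entries (published, title) VALUES (?, ?)",
    [
        ("2009-01-05", "First entry"),
        ("2009-02-10", "Second entry"),
        ("2009-03-15", "Third entry"),
        ("2009-04-20", "Fourth entry"),
    ],
)


def latest_entries(conn, limit=3):
    """Fetch the newest entries, so the front page never needs hand-editing."""
    return conn.execute(
        "SELECT title FROM entries ORDER BY published DESC LIMIT ?", (limit,)
    ).fetchall()


def render_list(rows):
    """Layout 1: the usual chronological list of entries."""
    return "\n".join(f"<h2>{html.escape(title)}</h2>" for (title,) in rows)


def render_grid(rows, columns=2):
    """Layout 2: the same entries arranged in a grid."""
    cells = [f"<td>{html.escape(title)}</td>" for (title,) in rows]
    table_rows = ["<tr>" + "".join(cells[i:i + columns]) + "</tr>"
                  for i in range(0, len(cells), columns)]
    return "<table>" + "".join(table_rows) + "</table>"


rows = latest_entries(conn)
print(render_list(rows))
print(render_grid(rows))
```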

What is Wordpress?

Wordpress is a blogging application that runs on a web server. It has a back-end part that makes it easy for you to add entries to your database, and it has a front-end part that displays these entries to your visitors. It is a big pile of PHP code, along with HTML, CSS, JavaScript and images, but you install it the same way you would install the simplest of HTML web pages – you put it in that special folder that your web server knows about. There is some additional configuration, most importantly setting up the database that it will use to store your blog entries, comments, etc., but it is essentially just a web application that runs on top of your web server.

What is hosting?

You could run your own web server at home. You would buy a PC, install Apache or Microsoft IIS, and install Wordpress. This would probably be unreliable though. You probably don’t have a proper cooled environment, with backup servers and a high bandwidth connection to your home. So people normally let a company that specializes in managing web servers do it on their behalf. This company “hosts” your web server. Logically, it is as if they put your server PC in their datacenter and managed it for you. You can log in remotely and add content to your web server, but they look after it and keep it running.

In my case Media Temple is hosting the web server. I installed Wordpress on the server, but they are hosting the server itself. You can take this hosting concept even further. Sites like Blogger.com also do hosting, but they host the web server AND the blogging application. There is a big difference, because in this situation you have way less control over your blog, and you don’t have the option of installing applications other than the blog (a forum, for example).

Single Sign-on for a Wordpress Blog and BBPress Forum

I recently experimented with single sign-on for an existing Wordpress blog (2.7.1) and BBPress forum (0.9.4). The two sites were already visually integrated, but behind the scenes everything, including user logins, was separate. With some help from BobbyH (webmaster at Weddingbee.com) and a lot of digging into the BBPress forums, things worked out well and now users have a single login across both blog and forum.

“Reverse” integration is the situation where you have an existing Wordpress blog and an existing BBPress forum that you would like to integrate. The (by far) simpler option, referred to as “normal” integration, is to set them up as integrated from the start, or to integrate an existing blog with a brand new forum. There is a great description of these options on the BBPress forum in this thread, and this screencast.

There is very little description of the more complicated reverse integration scenario, so I thought I would lay out the steps here. Most of them (all but the final two) are database operations. I used phpMyAdmin to do these operations (your host probably provides this tool), but you could use whatever tool you have for MySQL admin. You could even do it directly from the command line.

My basic approach here is to use the Wordpress database to hold one set of master tables for the user information, and to point BBPress to these tables. I use the BBPress tables as the starting point for these master tables because the forum has thousands of users. I chose to keep only the user tables in the same database. The rest of the BBPress data (e.g. the posts themselves) is still kept in the separate BBPress database.

So the high level steps are:

  • Copy the BBPress user tables into the Wordpress database
  • Replace the Wordpress user tables with these new user tables
  • Modify the new tables so that they are usable by Wordpress
  • Map the old Wordpress users to users in the new tables
  • Set things up to do cookie sharing correctly so people logged into the blog transition seamlessly into the forum and vice versa.

Here are the steps in more detail:

  1. I didn’t need to do backups because I was using a test site, but if you are trying this, don’t go any further without full backups of everything.
  2. Copied the “bb_users” and “bb_usermeta” tables from the BBPress database to the Wordpress database. I was a bit worried that the row sizes weren’t exactly the same in the destination copy, but as it turns out, that wasn’t an issue.
  3. Added a “user_activation_key” field to the copy of “bb_users” to ensure that the schema matched my current “wp_users” table in the Wordpress database.
  4. Renamed the “wp_users” and “wp_usermeta” tables to “wp_users_old” and “wp_usermeta_old” respectively.
  5. Renamed the new “bb_users” and “bb_usermeta” tables to “wp_users” and “wp_usermeta” respectively.
  6. Added Wordpress admin rights and metadata to the new admin user. To do this I ran the following query:
    INSERT INTO wp_usermeta (user_id, meta_key, meta_value) VALUES ('1', 'wp_capabilities', 'a:1:{s:13:"administrator";b:1;}');
  7. Added Wordpress user metadata for the rest of the users (since I’m using the BBPress user tables as the starting point, they know nothing about WP capabilities). Here is the query:
    INSERT INTO wp_usermeta (user_id, meta_key, meta_value) SELECT user_id, 'wp_capabilities' AS meta_key, 'a:1:{s:10:"subscriber";b:1;}' AS meta_value FROM wp_usermeta WHERE user_id NOT IN (SELECT user_id FROM wp_usermeta WHERE meta_key = 'wp_capabilities') GROUP BY user_id;
  8. Changed the user_id references in the “wp_posts” and “wp_comments” tables so that they pointed to the correct users (you need to do this because equivalent users don’t necessarily have the same ID in the previously separate Wordpress and BBPress databases). Here are the queries I ran:
    UPDATE wp_posts SET post_author=NEW_ID WHERE post_author=OLD_ID;
    UPDATE wp_comments SET user_id=NEW_ID WHERE user_id=OLD_ID;
    NOTE that you need to think carefully about the order in which you do these changes. You don’t want to change a user’s ID to one that already exists in the Wordpress table.
  9. Installed superann’s plugin to downgrade WP 2.7.1’s cookie handling for compatibility with BBP 0.9.4
  10. Told BBPress to find its user data in the new Wordpress user tables. This is done in the BBPress admin page for Wordpress integration: [...]/bb-admin/options-wordpress.php. You need to follow the instructions carefully. In particular, note that the Wordpress database secret should be copied from the current Wordpress setting and not vice versa.
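Two details in steps 6 to 8 are easy to get wrong, so here is a sketch for rehearsing them on a throwaway copy. First, the wp_capabilities values are PHP-serialized arrays, where s:10 and s:13 are the byte lengths of “subscriber” and “administrator”. Second, one way to sidestep the ordering problem in step 8 is a single UPDATE driven by an old-to-new mapping table, which rewrites each row at most once using its original value. Python's sqlite3 stands in for MySQL here; the helper names and the id_map table are hypothetical, and wp_comments.user_id would need the same treatment as wp_posts.

```python
import sqlite3


def php_serialize_role(role):
    """Illustrative helper: serialized wp_capabilities value for one role.

    PHP's s:N counts bytes, so the length comes from the UTF-8 encoding.
    """
    return f'a:1:{{s:{len(role.encode("utf-8"))}:"{role}";b:1;}}'


def remap_post_authors(conn, id_map):
    """Illustrative helper: rewrite wp_posts.post_author in ONE UPDATE.

    Each row is changed at most once, based on its value before the
    statement ran, so chained mappings like 2->3 and 3->4 can't pile up.
    """
    conn.execute("CREATE TEMP TABLE id_map (old_id INTEGER PRIMARY KEY, new_id INTEGER)")
    conn.executemany("INSERT INTO id_map VALUES (?, ?)", id_map.items())
    conn.execute(
        """UPDATE wp_posts
           SET post_author = (SELECT new_id FROM id_map WHERE old_id = post_author)
           WHERE post_author IN (SELECT old_id FROM id_map)"""
    )
    conn.execute("DROP TABLE id_map")


print(php_serialize_role("subscriber"))     # a:1:{s:10:"subscriber";b:1;}
print(php_serialize_role("administrator"))  # a:1:{s:13:"administrator";b:1;}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author INTEGER)")
conn.executemany("INSERT INTO wp_posts (post_author) VALUES (?)", [(1,), (2,), (3,)])
# Hypothetical mapping: old blog user 2 is now user 3, and old user 3 is now user 4.
remap_post_authors(conn, {2: 3, 3: 4})
authors = [row[0] for row in conn.execute("SELECT post_author FROM wp_posts ORDER BY ID")]
print(authors)  # [1, 3, 4]; running 2->3 and then 3->4 as separate UPDATEs would give [1, 4, 4]
```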

This is not something to do in a rush. I checked the results of each step carefully before moving on. This included browsing the database and, later in the process, logging in to the blog and forum to check whether things were working as expected.

And the job is not complete. I can now log in users on either blog or forum and they are automatically logged in when they transition to the other. If they log off on either side, they are logged off from both. What I still need to do is create a login panel that is visible from both blog and forum. Right now, users can only log in on the forum side.

NOTE:

  • This integration approach worked for me, but your mileage may vary. There are many permutations for the configuration of Wordpress and BBPress and I’ve made no attempt to present a completely general solution with the steps above.
  • Operating directly on your database is not for the faint of heart, so only tackle this if you are confident with phpMyAdmin, or if you are experimenting on a test site.

Wordpress Upgraded from 2.5 to 2.7 with No Fuss

Today I upgraded to Wordpress 2.7 on this blog. I followed the abbreviated instructions and everything ran smoothly.

YLF will stick with an older version of the platform until we do a major update sometime in 2009, but it is great to install 2.7 on theBlogEasy to see the awesome progress the Wordpress team has made in this release. The admin UI has taken a big step forward in aesthetics and usability. It is easy on the eyes, and things seem to fall to hand in a very convenient way.

Good job Matt and the team at Automattic!

Reset your Admin Password in the Wordpress Database

Like most systems that require password authentication, Wordpress will help you out if you forget your password. Clicking the “lost your password” link at the login page will take you through a sequence of steps that ultimately sends a new password to the email address associated with your Wordpress account.

But what if your Wordpress installation can’t send email? This is the case with my development server, so recently when I lost the password I had to do a manual reset in the Wordpress database using phpMyAdmin. Here are the steps:

  1. Start phpMyAdmin
  2. Select your Wordpress database in the sidebar on the left
  3. Select the “wp_users” table in the sidebar on the left
  4. Select the “Browse” tab at the top of the page
  5. Click the pencil icon to edit the admin entry
  6. Enter a new password in the “Value” field (don’t press “Go” yet)
  7. Set the “Function” field to “MD5”
  8. Press the “Go” button

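Done. Step 7 is the tricky one. MD5 is a one-way hash function: Wordpress stores the hash rather than the password itself, so people who somehow gain access to your database can’t simply read the passwords. (Wordpress 2.5 and later store stronger, salted hashes, but they still accept a plain MD5 hash in this field and replace it with the stronger format the next time you log in.)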
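If you would rather compute the hash yourself, for example to double-check what phpMyAdmin stored, MD5 produces a 32-character hex digest. A minimal Python sketch; the md5_hex name is made up, and “password” is only a placeholder, so please choose something stronger for your admin account.

```python
import hashlib


def md5_hex(password):
    """Return the 32-character lowercase hex MD5 digest of a password.

    For a plain ASCII password this matches what MySQL's MD5() function
    (phpMyAdmin's "MD5" option) would store.
    """
    return hashlib.md5(password.encode("utf-8")).hexdigest()


digest = md5_hex("password")
print(digest)       # 5f4dcc3b5aa765d61d8327deb882cf99
print(len(digest))  # 32
```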

Find Out how You Rank in Google’s Search Results

A lot of the traffic to YLF comes from search engines like Google, Yahoo and Live.com. This is easily the highest volume source of new readers right now. So it is important how YLF “ranks” for keywords that relate to YLF subject matter. For example, when someone goes to Google and searches for “smart casual style”, www.youlookfab.com is often the first result at the top of the page. This is excellent. For other keywords, like “body type”, the ranking is less impressive.

One way to find out how your site ranks is to do a search on Google and see where you end up. This is time consuming, however, and there are online tools that make things a lot easier. The SEOBook rank checker is one such tool. Just type in the keyword and your site URL (in my case, the URL is “www.youlookfab.com”) and click “Check Rank”. The result comes back pretty quickly.
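At its core, a rank checker does something simple: it walks the list of result URLs for a keyword and reports the position of the first one on your site. This Python sketch assumes you already have that list (fetching search results programmatically is a separate matter, and subject to the search engine's terms). The helper names and sample URLs are made up, and only an exact host match, ignoring a leading “www.”, counts as your site.

```python
from urllib.parse import urlparse


def bare_host(url_or_host):
    """Illustrative helper: lowercase host name without a leading "www."."""
    host = urlparse(url_or_host).hostname if "://" in url_or_host else url_or_host
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def rank_of(site, result_urls):
    """1-based position of the first result hosted on site, or None if absent."""
    target = bare_host(site)
    for position, url in enumerate(result_urls, start=1):
        if bare_host(url) == target:
            return position
    return None


# Hypothetical result list for one keyword.
results = [
    "http://www.example.com/a",
    "http://youlookfab.com/some-post/",
    "http://www.example.net/b",
]
print(rank_of("www.youlookfab.com", results))  # 2
print(rank_of("www.example.org", results))     # None
```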

Here are some examples this morning (rankings in parentheses):

  • smart casual style (1)
  • structured clothing (1)
  • what to wear to a black tie event (7)
  • pretty tops (9)
  • smart casual (10)
  • how to get rid of deodorant stains on darks (10)
  • dress over leggings (10)
  • clothes to make you look slim (28)
  • body type (44)

There are some mysteries about Google keyword ranks. For example, the keyword “what to wear” doesn’t rank at all for YLF. It only appears in the paid results on the right hand side of the page (because we pay to put it there – more on that in a future post).

There are many other tools like the one I used above. Some of them will also check other search engines, like RankChecker, which gives results for Google, Yahoo and MSN (Live). Note that most of these sites are covered in advertising for products and services that promise to improve your rankings in the search engines. Be very wary of these promises – there are a lot of good search engine optimization (SEO) professionals out there, but there are also a lot of snake oil salesmen.

Google Takes us Back into the Fold

During July our traffic from Google, which had been growing steadily since we launched YLF in 2005, suddenly dropped more than 50%. To combat this I installed the All in One SEO Pack plugin, removed meta descriptions altogether, and resubmitted the sitemap.

Well, something worked. Our traffic returned to pre-plunge levels a few days ago. Courtesy of Google Webmaster Tools, here is a chart of our traffic from Google over the period.

Replicate your bbPress Forum Locally for Development

One part of setting up the YLF forum development server did not run very smoothly: replicating the database. I wanted to get a full version of the current YLF forum with all 20,272 posts by 1214 users. There is no backup or import/export functionality currently built in to bbPress (or available as a plug-in), so I needed to (1) do a database backup and (2) upload the backup to the new local database on my development server.

Export your production database. There are many ways to do this, but the most common is probably the tool “phpMyAdmin”. Most hosting services provide access to phpMyAdmin, and it is also part of the WampServer installation that I use on my local webserver. Since Wordpress and bbPress have so much in common, I thought it would be wise to follow the Wordpress steps for creating a database backup using phpMyAdmin. In particular, steps 1 through 7 worked just fine.

Step 8 is where I had my first problem. Our bbPress database is too big to be saved to a file, and would get cut off after the first 256 posts. For some reason you can get the full database contents by unchecking “Save as File” in the phpMyAdmin “Export” form. The contents are then displayed in a text area.

Copy it into a text file. I got the idea of pasting the contents of the textarea into a text file from Rex. Since it is a text area with over 10MB of text, it is a little unwieldy, but usable nonetheless.

  • CTRL-A to select all in the text area, then wait a minute for this to register.
  • CTRL-C to copy all that text, then wait another minute or so.
  • CTRL-V in a text editor to paste all the text into the file.

So far, so good. Unfortunately, things didn’t go as well when I used phpMyAdmin on the local server to Import the file. There were several odd syntax errors. After some experimentation I started to suspect the text encoding of the file was off. I had been using Notepad++ (a great free editor). To cut a long story short, I switched to trusty Notepad and, when saving the file, selected “UTF-8” in the Encoding drop-down of the Notepad “Save As” dialog box.
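Before importing, it can save some trial and error to confirm that the saved file really is valid UTF-8. This Python sketch reports the byte offset of the first invalid sequence. It is a diagnostic only: the helper names and the filename in the comment are hypothetical, and the sample bytes are a made-up line from an SQL dump. Note too that Windows Notepad's UTF-8 option may write a byte-order mark (BOM) at the start of the file, which the second helper detects.

```python
import codecs


def first_utf8_error(data):
    """Return None if data decodes as UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def has_utf8_bom(data):
    """True if the bytes start with the UTF-8 byte-order mark."""
    return data.startswith(codecs.BOM_UTF8)


# With a real export you would read the raw bytes first, e.g.:
#   data = open("forum-backup.sql", "rb").read()   # hypothetical filename
good = "INSERT INTO bb_posts VALUES ('80\u2019s music');".encode("utf-8")
bad = b"INSERT INTO bb_posts VALUES ('80\x92s music');"  # Windows-1252 apostrophe

print(first_utf8_error(good))                # None
print(first_utf8_error(bad))                 # 32
print(has_utf8_bom(codecs.BOM_UTF8 + good))  # True
```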

Import the text file into your local database. With the database backed up in a UTF-8 encoded text file, the next step is to fire up phpMyAdmin on the local server, load the bbPress database and go to the “Import” form. Since we have a large database, I needed to take a few extra steps to ensure that phpMyAdmin loaded the file. These are the same steps that you need to take before importing a large Wordpress blog, namely, setting the following variables in your “php.ini” file:

upload_max_filesize = 20M
post_max_size = 20M
max_execution_time = 200
max_input_time = 200

Voila. The database contents were imported without any problems.
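The M suffix in those settings is PHP's shorthand for megabytes (K, M and G multiply by 1024, 1024² and 1024³). Since an upload has to fit under both upload_max_filesize and post_max_size, a quick check like this Python sketch tells you whether a given export file will get through. The helper names are made up, and the 12 MB file size is illustrative, not the real size of the YLF backup.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def php_shorthand_to_bytes(value):
    """Convert a php.ini size such as "20M" to bytes (K/M/G are powers of 1024)."""
    value = value.strip()
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def upload_fits(file_size, upload_max_filesize, post_max_size):
    """The file must fit under both limits (post_max_size also covers other form fields)."""
    limit = min(php_shorthand_to_bytes(upload_max_filesize),
                php_shorthand_to_bytes(post_max_size))
    return file_size <= limit


print(php_shorthand_to_bytes("20M"))              # 20971520
print(upload_fits(12 * 1024 ** 2, "20M", "20M"))  # True: a 12 MB dump fits
print(upload_fits(12 * 1024 ** 2, "2M", "8M"))    # False: PHP's default upload limit is 2M
```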

One small note: If your database had already been configured by the bbPress install, then you need to go into phpMyAdmin on your local server and delete the existing tables to make way for the tables that will be restored from your backup. Please, please be careful. It would be all too easy to get confused and delete the tables in the wrong phpMyAdmin window, that is, the one on your production database. Bye bye forum.

Database Errors with a Local Install of bbPress 0.8.3

This is filed under the heading “I don’t know why the solution works, but it does, and that’s good enough for me”.

The YLF forum is currently based on bbPress 0.8.3 (great forum system, more on that in future posts). I was setting up a development server to test some new features and hit a strange error. For the record, my server is using WampServer 2.0c on a Vista SP1 PC. WampServer is great if you have a LAMP (Linux OS, Apache webserver, MySQL database, and PHP) production server and want an equivalent setup on your local Windows PC.

During the bbPress install, which is normally just as simple as the Wordpress install, I got the following database warnings:

Warning: mysql_get_server_info() [function.mysql-get-server-info]: Access denied for user 'ODBC'@'localhost' (using password: NO) in C:\xampp\htdocs\bbpress\bb-includes\db-mysqli.php on line 80

Warning: mysql_get_server_info() [function.mysql-get-server-info]: A link to the server could not be established in C:\xampplite\htdocs\bbpress\bb-includes\db-mysqli.php on line 80

Who exactly is “ODBC”, and why is he trying to access my database without a password? After much futzing with the database settings and the BBPress config, I started looking around the web for a solution. Fortunately, it wasn’t long before I found one. No explanation for why it works, but it does.

In the file “bb-includes/db-mysqli.php”, change…

if ( !empty($this->charset) && version_compare( mysql_get_server_info(), '4.1.0', '>=') )
$this->query( "SET NAMES '$this->charset'" );

to…

if ( !empty($this->charset) && version_compare( mysqli_get_server_info( $this->$dbhname ), '4.1.0', '>=' ) )
$this->query( "SET NAMES '$this->charset'" );

Now, this does involve modifying a file in the bbPress distribution, which I would normally avoid. But in this case I just want the development server up and running asap, so I will live with the hack.