Links and Search Engines: The MSN edition

I’ve been promising for a while to complete this series with results relating to MSN (and, for the record, this has nothing to do with Scoble begging for it). I finally got around to cleaning up the HTML output of Excel and can now present the third (and probably final) installment in my analysis of search engine link features.

To recap: I took the list of Top 100 blogs published by Technorati on May 19th, 2005 and started doing side-by-side comparisons. I first looked at the distribution of links among the top 100, then followed up with an analysis of Technorati against Google, which led to a subsequent chapter on Technorati against Google and Yahoo! (and then to comparing Google and Yahoo! with each other). All this created a fair amount of buzz in the search world, with some people finding it interesting and others saying I was way off the mark. Either way, it’s time to take a look at MSN to complete this round-up.

So, to create some benchmarks, let’s start by looking at the distribution of Technorati links against MSN’s:

Technorati Top 100 MSN Links Technorati Links Technorati/MSN Links
Boing Boing 407172 22532 5.53378%
InstaPundit 241472 15190 6.29058%
Daily Kos 184666 15833 8.57386%
Gizmodo 252869 12278 4.85548%
Fark 352289 10216 2.89989%
EnGadget 198584 15051 7.57916%
Davenetics 3334 7571 227.08458%
Eschaton 138241 8713 6.30276%
Dooce 118385 6797 5.74144%
Andrew Sullivan 96315 7680 7.97384%
The Best Page In The Universe 92232 6333 6.86638%
Talking Points Memo: by Joshua Micah Marshall 193438 7592 3.92477%
lgf: anti-idiotarian 6067 8275 136.39360%
kottke.org 159861 7278 4.55271%
WIL WHEATON DOT NET 148587 6314 4.24936%
Metafilter 136052 7591 5.57948%
Doc Searls 95781 5690 5.94064%
(In)formacao e (In)utilidade 3272 6040 184.59658%
Wonkette 96768 5877 6.07329%
Scripting News 183067 5728 3.12891%
Power Line 92069 7477 8.12108%
Balmasque 409 4544 1111.00244%
Corante 23107 7686 33.26265%
A List Apart 220584 5536 2.50970%
Something Awful 97908 4512 4.60841%
Megatokyo 112902 4154 3.67930%
Michelle Malkin 72190 6091 8.43746%
Arts and Letters Daily 94718 3983 4.20511%
Gawker 72773 4453 6.11903%
Afterall it was the best I ever had 922 3591 389.47939%
The Volokh Conspiracy 88818 5873 6.61240%
Scobleizer 68282 5524 8.08998%
Jeffrey Zeldman 149539 4134 2.76450%
This Modern World 79038 3913 4.95078%
The Web Standards Project 211917 3810 1.79787%
Joel on Software 133853 4514 3.37236%
Media Matters for America 64867 6809 10.49686%
Television without pity 46391 3859 8.31842%
Kuro5hin 130549 4208 3.22331%
Lileks 50706 3824 7.54151%
Hugh Hewitt 64118 4573 7.13216%
Joel Veitch 23302 3774 16.19603%
Truthout 42693 6528 15.29056%
Baghdad Burning 51647 3519 6.81356%
Buzz machine 72649 4145 5.70552%
fleugel 201995 3670 1.81688%
Informed Comment 62822 3905 6.21598%
Doppler: redefining podcasting 12512 3040 24.29668%
geek and proud 714 3166 443.41737%
loadmemory (Asian site) 198 3324 1678.78788%
Photojunkie 3721 2860 76.86106%
Ross Rader 4830 2976 61.61491%
The Truth Laid Bear 51806 4127 7.96626%
Joi Ito 62642 5165 8.24527%
ScrappleFace 49953 3480 6.96655%
LexText 1741 2671 153.41758%
Google Blog 42967 3688 8.58333%
Xbox 86021 4221 4.90694%
My life in a Bush of Ghosts 12 2519 20991.66667%
Astronomy picture of the day 33625 3498 10.40297%
Crooked Timber 60675 3617 5.96127%
Vodka Pundit 58205 3085 5.30023%
Captain’s Quarters 45609 3671 8.04885%
A small victory 54767 3223 5.88493%
Gato Fedorento 2294 2574 112.20575%
Mezzoblue 99511 2952 2.96651%
PostSecret 30794 2707 8.79067%
Samizdata.net 1712 2872 167.75701%
Lawrence Lessig 81047 2949 3.63863%
Counterpunch 52642 3278 6.22697%
Democratic Underground 35595 3913 10.99312%
Right Wing News 61379 2967 4.83390%
StopDesign 86165 3037 3.52463%
iBiblio 32301 3105 9.61271%
Samizdata.net (mistake?) 61443 2743 4.46430%
Abrupto 2698 2935 108.78428%
gene7299 (Asian MSNSpaces site) 28 3215 11482.14286%
Where is Raed? 24848 2409 9.69495%
B3TA: We love the web 38386 2614 6.80977%
Talkleft 60169 2901 4.82142%
Wizbang 60259 3358 5.57261%
m1net (MSN spaces site) 22 3548 16127.27273%
Hoder 1620 5422 334.69136%
CTRL+Alt+Del 32277 2315 7.17229%
Brad DeLong 48403 2715 5.60916%
Blogs for Bush 50820 3560 7.00512%
Neil Gaiman 71916 2194 3.05078%
Gothamist 47848 2729 5.70348%
Thought Mechanics 60736 2197 3.61729%
IMAO 45822 2905 6.33975%
Dan Gillmor (old weblog) 36369 2600 7.14895%
HINAGATA 176519 2186 1.23839%
Dean’s World 53150 2985 5.61618%
Defamer 49132 2372 4.82781%
USS Clueless 64725 2570 3.97065%
Dive into Mark 54167 2540 4.68920%
Pandagon 51286 2822 5.50248%
Blogging.la 8495 3061 36.03296%
Why are you worshipping the ground I blog on? 3481 2238 64.29187%
Daring Fireball 52381 2573 4.91209%
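As a sanity check on the table above, the last column can be recomputed from the two raw counts: it is the Technorati link count expressed as a percentage of the MSN link count, so any value above 100% means Technorati reported more links than MSN did. The Python sketch below assumes exactly that formula (it matches the published rows) and uses only four rows copied from the table; the helper names are hypothetical, not anything from the original spreadsheet.

```python
# Recompute the "Technorati/MSN Links" column: the Technorati link count
# expressed as a percentage of the MSN link count for the same blog.
# Sample rows copied from the table above: (blog, msn_links, technorati_links).
ROWS = [
    ("Boing Boing", 407172, 22532),
    ("Davenetics", 3334, 7571),
    ("Balmasque", 409, 4544),
    ("Daring Fireball", 52381, 2573),
]


def technorati_share(msn_links: int, technorati_links: int) -> float:
    """Technorati links as a percentage of MSN links (hypothetical helper)."""
    if msn_links == 0:
        raise ValueError("MSN reported no links; the ratio is undefined")
    return 100.0 * technorati_links / msn_links


def outliers(rows, threshold: float = 100.0):
    """Blogs whose ratio exceeds the threshold, i.e. Technorati reports more links than MSN."""
    return [name for name, msn, tech in rows if technorati_share(msn, tech) > threshold]


if __name__ == "__main__":
    for name, msn, tech in ROWS:
        print(f"{name:<20} {technorati_share(msn, tech):12.5f}%")
    print("Technorati > MSN:", outliers(ROWS))
```

With the sample rows, Davenetics and Balmasque are flagged, matching the 227% and 1111% entries in the table.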

Of course, no big surprise here. This is consistent with what I found with Google and Yahoo!: Technorati does a good but not complete job of indexing link-backs. What’s interesting, however, is that Technorati seems to show a different pattern against MSN than it does against Yahoo! or Google. Let me show you what I’m talking about. Here is the pattern of the Technorati differential with MSN:

[Chart: Technorati vs. MSN]

…and here is the differential between Technorati and Yahoo!:

[Chart: Technorati vs. Yahoo]

…and finally, the same graph for Technorati and Google:

[Chart: Technorati vs. Google: Averages]

I’ve been trying to understand why this is and, to be honest, I still have no clear answer. It could be something; it could be nothing. That uncertainty is, in large part, what made this entry frustrating to work on.

Comparing the Search Engines

However, the picture gets more interesting when you put the three search engines side by side. Here’s a quick spreadsheet of the results:

Technorati Top 100 Google Links Yahoo Links MSN Links MSN Links/Google Links MSN Links/Yahoo Links
Boing Boing 45200 1880000 407172 900.8230% 21.6581%
InstaPundit 75000 2160000 241472 321.9627% 11.1793%
Daily Kos 59800 1690000 184666 308.8060% 10.9270%
Gizmodo 39300 1970000 252869 643.4326% 12.8360%
Fark 43600 1420000 352289 808.0023% 24.8091%
EnGadget 46800 2820000 198584 424.3248% 7.0420%
Davenetics 1780 66400 3334 187.3034% 5.0211%
Eschaton 62400 1400000 138241 221.5401% 9.8744%
Dooce 23600 653000 118385 501.6314% 18.1294%
Andrew Sullivan 41100 1260000 96315 234.3431% 7.6440%
The Best Page In The Universe 656 62000 92232 14059.7561% 148.7613%
Talking Points Memo: by Joshua Micah Marshall 74600 563000 193438 259.3003% 34.3584%
lgf: anti-idiotarian 14700 49300 6067 41.2721% 12.3063%
kottke.org 32000 1200000 159861 499.5656% 13.3218%
WIL WHEATON DOT NET 16900 564000 148587 879.2130% 26.3452%
Metafilter 34500 1160000 136052 394.3536% 11.7286%
Doc Searls 33600 1150000 95781 285.0625% 8.3288%
(In)formacao e (In)utilidade 1780 110000 3272 183.8202% 2.9745%
Wonkette 28800 1370000 96768 336.0000% 7.0634%
Scripting News 39400 1470000 183067 464.6371% 12.4535%
Power Line 7510 344000 92069 1225.9521% 26.7642%
Balmasque 24 40500 409 1704.1667% 1.0099%
Corante 6770 265000 23107 341.3146% 8.7196%
A List Apart 21100 620000 220584 1045.4218% 35.5781%
Something Awful 9020 372000 97908 1085.4545% 26.3194%
Megatokyo 7310 361000 112902 1544.4870% 31.2748%
Michelle Malkin 17300 537000 72190 417.2832% 13.4432%
Arts and Letters Daily 23900 866000 94718 396.3096% 10.9374%
Gawker 23500 1060000 72773 309.6723% 6.8654%
Afterall it was the best I ever had 95 34900 922 970.5263% 2.6418%
The Volokh Conspiracy 42000 1190000 88818 211.4714% 7.4637%
Scobleizer 21800 937000 68282 313.2202% 7.2873%
Jeffrey Zeldman 22500 528000 149539 664.6178% 28.3218%
This Modern World 32100 813000 79038 246.2243% 9.7218%
The Web Standards Project 1850 59800 211917 11454.9730% 354.3763%
Joel on Software 22400 966000 133853 597.5580% 13.8564%
Media Matters for America 24800 536000 64867 261.5605% 12.1021%
Television without pity 13300 356000 46391 348.8045% 13.0312%
Kuro5hin 17300 866000 130549 754.6185% 15.0749%
Lileks 39700 n/a 50706 127.7229% n/a
Hugh Hewitt 26700 929000 64118 240.1423% 6.9018%
Joel Veitch 2830 135000 23302 823.3922% 17.2607%
Truthout 8780 371000 42693 486.2528% 11.5075%
Baghdad Burning 22700 552000 51647 227.5198% 9.3563%
Buzz machine 30600 1010000 72649 237.4150% 7.1930%
fleugel 1890 201000 201995 10687.5661% 100.4950%
Informed Comment 27900 787000 62822 225.1685% 7.9825%
Doppler: redefining podcasting 4420 607000 12512 283.0769% 2.0613%
geek and proud 355 9110 714 201.1268% 7.8375%
loadmemory (Asian site) 83 1550 198 238.5542% 12.7742%
Photojunkie 1540 51200 3721 241.6234% 7.2676%
Ross Rader 1070 48200 4830 451.4019% 10.0207%
The Truth Laid Bear 23900 717000 51806 216.7615% 7.2254%
Joi Ito 23400 1050000 62642 267.7009% 5.9659%
ScrappleFace 31100 807000 49953 160.6206% 6.1900%
LexText 1970 31200 1741 88.3756% 5.5801%
Google Blog 46 297000 42967 93406.5217% 14.4670%
Xbox 6600 237000 86021 1303.3485% 36.2958%
My life in a Bush of Ghosts 6 903 12 200.0000% 1.3289%
Astronomy picture of the day 5020 113000 33625 669.8207% 29.7566%
Crooked Timber 3560 67500 60675 1704.3539% 89.8889%
Vodka Pundit 4520 169000 58205 1287.7212% 34.4408%
Captain’s Quarters 27100 730000 45609 168.2989% 6.2478%
A small victory 16700 460000 54767 327.9461% 11.9059%
Gato Fedorento 1630 126000 2294 140.7362% 1.8206%
Mezzoblue 12000 278000 99511 829.2583% 35.7953%
PostSecret 5790 202000 30794 531.8480% 15.2446%
Samizdata.net 1050 18000 1712 163.0476% 9.5111%
Lawrence Lessig 30600 959000 81047 264.8595% 8.4512%
Counterpunch 11700 295000 52642 449.9316% 17.8447%
Democratic Underground 14900 417000 35595 238.8926% 8.5360%
Right Wing News 27900 794000 61379 219.9964% 7.7304%
StopDesign 10200 255000 86165 844.7549% 33.7902%
iBiblio 9730 197000 32301 331.9733% 16.3964%
Samizdata.net (mistake?) 25500 697000 61443 240.9529% 8.8154%
Abrupto 550 44700 2698 490.5455% 6.0358%
gene7299 (Asian MSNSpaces site) 58 764 28 48.2759% 3.6649%
Where is Raed? 10100 232000 24848 246.0198% 10.7103%
B3TA: We love the web 12000 839000 38386 319.8833% 4.5752%
Talkleft 7170 221000 60169 839.1771% 27.2258%
Wizbang 21000 634000 60259 286.9476% 9.5046%
m1net (MSN spaces site) 104 579 22 21.1538% 3.7997%
Hoder 1480 20900 1620 109.4595% 7.7512%
CTRL+Alt+Del 2310 171000 32277 1397.2727% 18.8754%
Brad DeLong 30100 882000 48403 160.8073% 5.4879%
Blogs for Bush 16200 824000 50820 313.7037% 6.1675%
Neil Gaiman 13700 319000 71916 524.9343% 22.5442%
Gothamist 15200 491000 47848 314.7895% 9.7450%
Thought Mechanics 4400 190000 60736 1380.3636% 31.9663%
IMAO 23800 407000 45822 192.5294% 11.2585%
Dan Gillmor (old weblog) 10800 298000 36369 336.7500% 12.2044%
HINAGATA 10100 21100 176519 1747.7129% 836.5829%
Dean’s World 30600 784000 53150 173.6928% 6.7793%
Defamer 9310 725000 49132 527.7336% 6.7768%
USS Clueless 8470 264000 64725 764.1677% 24.5170%
Dive into Mark 14600 235000 54167 371.0068% 23.0498%
Pandagon 27300 743000 51286 187.8608% 6.9026%
Blogging.la 3200 67700 8495 265.4688% 12.5480%
Why are you worshipping the ground I blog on? 1430 85000 3481 243.4266% 4.0953%
Daring Fireball 12000 221000 52381 436.5083% 23.7018%
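The two ratio columns in the table above can be reproduced the same way, as the MSN count divided by the Google or Yahoo count. The one wrinkle is the Lileks row, which has no Yahoo figure. The minimal sketch below, which uses a handful of rows copied from the table and hypothetical helper names, treats that empty cell as missing rather than as zero, so no ratio is invented for it.

```python
from typing import Optional

# Sample rows copied from the table above:
# (blog, google_links, yahoo_links, msn_links); None marks an empty cell.
ROWS = [
    ("Boing Boing", 45200, 1880000, 407172),
    ("Lileks", 39700, None, 50706),  # Yahoo count missing in the source table
    ("lgf: anti-idiotarian", 14700, 49300, 6067),
    ("HINAGATA", 10100, 21100, 176519),
]


def pct(numerator: int, denominator: Optional[int]) -> Optional[float]:
    """numerator / denominator as a percentage; None if the denominator is missing or zero."""
    if not denominator:
        return None
    return 100.0 * numerator / denominator


def msn_ratios(rows):
    """Map each blog to its (MSN/Google, MSN/Yahoo) percentages."""
    return {name: (pct(msn, google), pct(msn, yahoo)) for name, google, yahoo, msn in rows}


def count_msn_above_google(rows) -> int:
    """Number of blogs for which MSN reports more links than Google."""
    return sum(1 for _, google, _, msn in rows if msn > google)


def fmt(value: Optional[float]) -> str:
    """Render a percentage, or 'n/a' for a missing value."""
    return "n/a" if value is None else f"{value:.4f}%"


if __name__ == "__main__":
    for name, (vs_google, vs_yahoo) in msn_ratios(ROWS).items():
        print(f"{name:<22} {fmt(vs_google):>12} {fmt(vs_yahoo):>12}")
    print("MSN > Google:", count_msn_above_google(ROWS), "of", len(ROWS))
```

For these four sample rows, MSN reports more links than Google for three of them; lgf is the exception.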

The most interesting thing here is that MSN seems to support the assertion I made earlier that Google does not report as many links as Yahoo! does: MSN, too, reports more links than Google for nearly all of these blogs, while still reporting fewer than Yahoo! in most cases. There were, however, a few surprises here, as far as I’m concerned:

  • Sites located in the United States seem to fare better on MSN than other sites do. Google and Yahoo seem to have a stronger indexing presence outside the US than MSN does.
  • MSN Spaces sites are not particularly well represented in MSN Search compared with its competitors. I was surprised by this, since they are part of the same service.

Conclusions and more!

So there you have it: no great insight here, apart from the fact that this linking stuff is interesting and that even a small-scale analysis can bring up some interesting trends. As I mentioned before, I am not an expert on this; I simply wanted to put together the numbers and start an analysis. Enjoy!
