Tuesday, March 31

Your Blog or Twitter Could Become Your Primary Identity

Via: Marshall Kirkpatrick’s Site About How to Use the New Internet

Add One Line To Your Blog or Twitter Could Become Your Primary Identity

03.30.09

OpenID community leader Scott Kveton noticed this morning that his Twitter profile page is now the #1 search result in Google for his last name, not his blog. This is something TechCrunch reported on earlier this month, but people are just starting to wrap their heads around it. I know I want this blog to remain the #1 search result for my name, not my Twitter profile.

In a conversation on FriendFeed, Ben Hedrington pointed out that, in addition to the page title change that TechCrunch reported on, Twitter also uses the rel="me" markup and Kveton’s blog does not. I looked and realized that my blog here doesn’t either!

So the long and short of this story is that if you want to make sure that Google understands your blog to be your primary beacon on the web, then you should add the attribute rel="me" to a relevant link on your blog. I’ve added it to the link on my sidebar that goes to my feedback page, because that’s a good page for me. It’s as simple as making the link tag read <a href="http://marshallk.com/feedback" rel="me">.
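To check whether a page already carries the markup described above, something like the following might do. It is only a minimal sketch using Python's standard html.parser module; the class name RelMeFinder, the function find_rel_me_links and the sample sidebar HTML are made up for illustration and are not taken from any real blog template.

```python
from html.parser import HTMLParser


class RelMeFinder(HTMLParser):
    """Collect the href of every <a> or <link> tag whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.rel_me_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, e.g. rel="me nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "me" in rel_tokens and href:
            self.rel_me_links.append(href)


def find_rel_me_links(html):
    """Return the hrefs of all rel="me" links found in an HTML string."""
    finder = RelMeFinder()
    finder.feed(html)
    return finder.rel_me_links


# Hypothetical sidebar markup, for illustration only.
sample_sidebar = """
<ul>
  <li><a href="http://marshallk.com/feedback" rel="me">Feedback</a></li>
  <li><a href="http://example.com/elsewhere">Some other link</a></li>
</ul>
"""

print(find_rel_me_links(sample_sidebar))
```

An empty list would mean the page has no rel="me" links yet. Whether search engines actually weigh the attribute is a separate question that this check cannot answer.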

That may not solve the entire problem, but it should help and it’s good form. Machine-readable microformats like rel="me" are likely to be an increasingly important part of the web in the future. Would readers here suggest otherwise? If I’m reading too much into this, let me know.



A sociologist (watching) on Twitter

Continuing with the qualitative research I published yesterday in "Twitter, uses and utilities: first qualitative study", and also with the post "Life in a Twittinstant", I set out here some additional reflections that the research has suggested to me.

These ideas, obviously, could not be included as conclusions of that research, so I present them separately, as a way of preparing for the round table on Twitter tomorrow, April 1, at 17:00 in the bloggers' space at OME.


The ideas the research has suggested to me are the following:

  • It seems to follow that Twitter users do not use it exclusively; they may also be heavy users of other social media or web tools, whether as mere readers or as content creators (blogs, sites, forums, etc.).
  • Social media, with Twitter possibly playing a role as an accelerator of information, seem to present themselves as a complement to the classic news media and, in part, as an alternative that cannibalizes attention from them.
  • Despite the great reach and breadth of social networks, and of Twitter in particular, interaction seems to be limited to a small group of people with whom personal or direct messages are exchanged, and it is not related to the number of people “followed” or to the number of “followers”.
  • On Twitter the exchange of attention, as a scarce resource, is highly selective and may be inversely related to the number of people “followed” and of “followers”.
  • Actual use of Twitter seems to suggest that this small number of people to whom direct or personalized messages are sent is, in reality, each Twitter user's true social network, more than the number of people “followed” and of “followers”. The same happens, for example, with phone address books, of which only a small number of contacts are used regularly, and those form the core relational base.
  • On Twitter there seems to be a relationship between receiving more attention and posting more. Or, put another way, a lower posting frequency may be related to receiving less attention (number of followers).
  • On Twitter there may be a phenomenon that we could call spamfollowing or social networking spam, which we can understand as following other people intensively and indiscriminately with the sole aim of getting more “followers”. According to an experiment I ran for eight weeks with an ad hoc profile, a high percentage, between 40% and 45%, of those who receive the follow notification return the “favor” (postscript: 42% final result). So a network of followers can be built by mass-following others, without necessarily contributing any value to the interaction.
  • Finally, it seems a tenable hypothesis that on Twitter the number of followers has more to do with audience than with influence. A larger audience does not necessarily mean greater influence, since influence seems to lie in the quality and interest of the content, and a good part of the content on Twitter seems to be mere noise without information.

What is your experience and opinion of Twitter?
Feel free to share your experience in a comment. Thanks.

Monday, March 30

2009 Social Media Optimization: Back to Basics?

Via: aimClear Search Marketing Blog


Posted by Matt Peterson on March 27th 2009 in Google, SES New York 2009, Social Media

Social media has crept into nearly every aspect of SEO, to the point where it’s now quite difficult to imagine them disentangled from each other. Keyword research traditional to SEO is being used to advise every tag, tier, and title in brand social media efforts. Go ahead and find me an SEO who says “I don’t do social media”: they’re either speaking figuratively, are blissfully unaware, or have an extremely rare vendor relationship.

What do we now know to be true about SMO that we didn’t know 1 year ago or even 3 months ago? What social media principles have stood the test of time? Should we reignite a serious dialogue on ethics in social media optimization?

Search Engine Strategies New York 09 brought some of the best and most outspoken search & social media marketing figures together for the session “An Update on Social Media Optimization”, moderated by Search Engine Watch’s managing editor Kevin Newcomb.

Leading the session was Liana Evans, director of internet marketing at KeyRelevance.

Li asks, how did we get here with social media, how did it become a powerhouse? We used to market with TV & Radio, throwing out messages while consumers sat and took it. The internet has now allowed consumers to express feelings, share experiences and much more. This is what social media is about, connecting those who share similar feelings,  interests and experiences.

Social media does not mean dropping your PPC and SEO; it should complement them. It’s time consuming, not a fast process, and it’s not about putting any article on Digg and automatically getting 10,000 visits. It’s different for everybody.

The biggest thing to remember is that people hate to be marketed to. The minute you start marketing to them, they will cut you off.

It’s also about the end user and the new signals of search. Search engines are looking at reviews, product tags, chatter in social media sites. It’s not so much about links, not to say that links are going away.

It’s also about where your audience is. There’s photo sharing on Flickr, video sharing on YouTube. There are all different types of people in social media: creators, critics, collectors, joiners, spectators and inactives as archetypes among them. Joiners join Facebook, creators create blogs, critics review. What are these people doing in social media? Watching videos? Start looking at video sharing communities if you haven’t.

Start out by defining your goals. Some useful social media metrics include comments, ratings, links, Twitter followers, retweets, number of reviews, and number of people that ask questions.

From Social Media Rockstar to President
Barack Obama is the prime social media success story. He needed to reach college kids, African Americans, women, blue-collars, and independent voters. He created the conversation in his own community and it facilitated amazing  buzz right out of the gate.

He went full force into video sharing networks; he uploaded over 1,000 videos to his YouTube channel and had over 19 million views and 133,000 subscribers. He put photos on Flickr and moved other users to upload their own Obama-inspired photos as well. He had a LinkedIn profile, with Q & A and groups. He had a profile on Black Planet, one of the largest social networking sites, and he had over 480,000 friends. He twittered, and though it was not technically him, it led to heavy discussion.

When you searched for Obama, his profiles showed up all over; they completely dominated the SERPs. As for the end results: he of course won the female, African American, young, blue collar and independent vote.

To sum up Li’s advice: be social, start a conversation, be transparent and remember the end user.

Next was Dave Snyder, co-founder of Search and Social, presenting on the problem of “cookie cutter social media” approaches. Dave warned us in advance that his speaking style is stream of consciousness, but I found his message to be actually very linear and natural to absorb.

The biggest problem Dave sees in social media is that it’s mostly approached from a high level. “Cookie cutter process people” are delving into social media in the wrong way. Don’t keep using the same social media tactics client to client, platform to platform, expecting similar results.

When a Cookie Cutter Goes Wrong
Don’t just use templated social media examples like things you’ve heard at conferences. Don’t apply these social media tactics to ridiculous products that don’t match the format.

His prime example of a social media offender was Overstock.com:

  1. Overstock.com put a social community on their site that nobody used; it became a haven for spam, advertising teeth whitening kits and the like.
  2. They decided to take their “success” on the road by creating a Facebook page, ending up with near zero human engagement.
  3. They tried integrating product purchases into the Facebook stream. So the guy who buys an engagement ring on Overstock for his fiancee ends up tipping her off before the big surprise.
  4. Overstock tried Twitter, and their early tweets were about birds pooping on their head.

Why have a Facebook business page for a motor oil company? Think about why anyone would want to interact with your company in that channel. Pizza Hut has a Facebook application to help you order pizza directly on Facebook. Why?

Just because “motor oil” doesn’t play in Facebook, it doesn’t mean that social media isn’t for you. Find niche sites, understand what content works on which platforms. Digg and Reddit are both social news sites, but the content is completely different.

Set measurable goals; failing to do so is the biggest flaw in social media (I thought the cookie cutter templates were?). We’re in this business to make money, so don’t inflate your job. We all want to think of ourselves as creative, but who cares if you aren’t making money?

Know how each community can actually benefit you and what they can bring you. If your end goal is links, focus on social news sites like Digg and Reddit. If your goal is conversation, focus elsewhere.

Use an analytical approach; there are lots of ways to measure social media. Think outside of the box.

Number 1 rule: you get what you give. You give content back to the community that they really like. If all that you’re giving a network just bothers users and doesn’t add value, you’ll get no value in return.

Speaking next was president of Milestone Internet Marketing, Benu Aggarwal. Benu spoke on the agency and customer side of managing a brand online.

You need to tell your client what is important. Explain to them why it is important to have a Facebook profile. Understand that you really need to do it all, you have to kind of do everything together to impact your universal search.

User Generated Content - Find out what the top 3 most important UGC sites related to your client are. You of course need to enhance your profile, tag the most important keyword phrases, and even add video. If your client’s customers are going there, you need to create a cross-awareness.

Video Content - How do you create relevant video content and video on the fly? It may be as simple as a Flip camera that can shoot raw and compelling video that you can upload to YouTube in 10 minutes. Make videos your customers are looking for and that have value. You can advise the video content and tags by keyword research, as you would with traditional SEO efforts.

Photo Sharing - Yes you can create a Flickr account, but go even further. One thing that works well is to upload pictures to Flickr, then create a community map, and reference your pictures across social communities you’re active in. Link to your Flickr pictures from your blog or Facebook profile. Go ahead and integrate links to your social media profiles in your local listings. Tag up every one of your properties with consistent but community-relevant tags from keyword research.

Personal Social Networks - Devise how you can make your business profile(s) personal. Add widgets, Twitter feeds, blog feeds, and be sure to join groups and associated networks, add special offers, give away white papers; these are just a few of the potential avenues you can pursue.

Twitter (microblogging) - If you’re writing a blog post, go on Twitter and tweet about it. Use TweetDeck or TweetBeep to alert yourself to people who you should care about and who would care about you.

*At this point Benu’s phone goes off and scares Marty Weintraub.

Blog Architecture - Before you create the blog, define what you’re going to talk about, advise by keyword research, take a holistic approach.

Putting it all Together
Put links to your social media profiles across each other; integrate them all. Your social media properties with higher PageRank will help pull up your profiles that may not rank as well.

Don’t forget the pitch, what are you going to say, show or do that would make your customers actually want to be your friend in a social media network?

Presenting next was Marty Weintraub, president of aimClear. Marty let the crowd know that he was mainly sharing the social media stuff that “rocked the most” to him.

Twitter is to virality as SEO is to PPC - We now have access to the ultimate focus groups to prove marketing messages. I would have died for that kind of demographic research in the past.

Use PPC to SEO multivariate testing - PPC moves much faster and is precise/controllable. You’re using paid search to prove organic success in funnels, conversion, design etc. You can then turn off PPC… or not.

With linkbait, it’s all about the idea - Use Twitter to find out really fast whether something is cool or sucks; it validates marketers’ instincts. Use Twitter as a massive research tool. Look back at the tweets for “Zombie Dating”, for example.

Use TweetDeck for stunning demographic filtering - You can set up searches by # (unique hashtags) or keywords and receive on-the-fly updates.

Publishing properly means more in 2009 - Content enters the grid at the intersection of your content management system and social media. Ideally, as soon as you hit “publish” on a blog post, it’s prewired to be pushed automatically through your Twitter feeds, Facebook feeds etc. You can touch millions of users quickly.

Take an inside-out content promotion strategy - Audit your inner circle of marketing team members and identify which insiders are already active in communities. They are pushing content through Twitter and Facebook and quickly expanding this inner circle.

Sock puppets are dead, long live avatars! - Pseudonyms make business sense sometimes, but you should use only corporate brand ambassadors that are genuinely engaged. No fake LinkedIn or Facebook “people”. Whether you choose to use your actual name or not, always be authentic and holistic.

Our closing speaker was the affable Chris Winfield, president of 10e20.

Chris says to get back to the basics, forget most of what you hear about the “newest thing”, there are so many new things that suck, like Plurk (aimClear does not necessarily share or refute this opinion). Tune out the noise and focus on figuring out what works for you. This stuff is simple, and people try to make it more complicated than it is.

Experiment, test and try new things. Don’t get caught up in the hype of Twitter, Facebook,  Digg, whatever it is; they’re not the end all be all. Be suspect when someone says “this community is all we do to market our company.”

Balance what works for you.  Should you do a Facebook Page or a Facebook Group? Or don’t do a brand group at all; make a group about a subject that your customer would talk about.

Don’t forget about forums just because they’re not sexy; they were social media before social media (Amen, man). Bigdashboards.com is a site that ranks forums across different criteria; check it out to find a forum where your customers are.

You have to get involved, not just to give back and be a good person, but to understand what people like, what content is successful and what offers people will react to and get business for you. Never forget the end goal. Don’t use Twitter just to tweet.

Large Social Sites vs Targeted Niches
Everyone hears about Digg, and the bigger blogs are going on social news sites looking for content. The sad part for most people is that Digg won’t work for them. At this point the bigger blogs are really looking for the “big magazine” stuff you read every day.

One of the most powerful things to do is to find targeted niche social news sites and Chris is going to give us every single one that works! Out of over 18,000 that exist, Chris whittles the list down to about 40 useful sites. Though they may drive just 500 visitors rather than 50,000,  it’s 500 very focused visitors. Often because of the niche site’s smaller community, it’s easier to make things go hot if your content lines up.

How a Niche Social News Site Works:
Stories that go hot on these sites are likely to get picked up by the larger social networks, then on to 2nd tier blogs and so on.

Chris gives away his good list of 40 sites, too many to list during the course of a presentation. Check back soon at aimClearblog to get the full list. Some sites he mentioned offhand include Tip’d, The Motley Fool, Dealigg, ThisNext, BallHype, and N4G.

Chris Winfield’s Final Tips
Test to find what works, scrap what doesn’t, be active and helpful in communities by giving back, leverage all of the different social sites, and go niche.



It's official: Skype for the iPhone available tomorrow

As we mentioned on Friday, it is confirmed that Skype for the iPhone will be available from tomorrow, allowing free VoIP calls to other Skype contacts or very cheap calls to regular phones over a Wi-Fi connection.

Skype for the iPhone will also, supposedly, allow text chat with other contacts using the phone's keyboard, as if it were a “normal” instant messaging system.

It almost goes without saying, but voice calls will not be possible over a 3G or EDGE connection, not because the speed is insufficient to maintain decent voice quality, but because the phone carriers weep at the thought of users using their cellular network to communicate with others at costs far lower than normal calls. A shame, but it's something.

Skype for iPhone will be free and available in the App Store.

Via: CrunchGear



Twitter, uses and utilities: first qualitative study in Spain

A few days ago I put the following question on Twitter to my network of contacts there, with the aim of carrying out a first piece of qualitative research on Twitter among heavy users of this social network: “Can you give me one or two reasons why you use Twitter?” [I am not going to write up the methodology here because I don't think it is essential, but if anyone is interested I can provide the information.]

The main conclusions of the research, based on analysis of the responses received about the uses and utilities of Twitter among heavy users, are:

  • Twitter presents itself as a kind of social accelerator that adds value as a means of identifying other people with common interests.

  • The process of identifying peers who add value to the interaction through Twitter is used to spread ideas, tastes, information, knowledge and personal relationships, as well as marketing messages and commercial information. It thus presents itself as a very effective medium for communicative practices: for learning about, sharing and forwarding information, knowledge, opinions, suggestions and ideas.

  • The arguments given point to the creation of informational value through reciprocity and, therefore, to the value of the real social network on Twitter being built with a limited number of people rather than through the total value of the network that exists on Twitter.

  • The most prominent values associated with Twitter's usefulness, besides its possibilities for dissemination and content, lie in its immediacy, high availability and simplicity. The restriction to 140 characters is not considered a limitation but rather something that makes messages more effective.

  • The greatest usefulness of Twitter seems to lie in the process of selecting the people who add the most value through the content users generate while interacting on the social network. This process requires some learning in order to detect those who are useful or valuable to each person. Maximizing the usefulness of Twitter therefore requires a period of time and a certain maturity in use and practice.

  • Ties on Twitter seem to be permanently provisional: it is extremely easy to create and dissolve relationships, since the ties are weak and based only on affinity or on sharing information or opinions, whether or not there has been any prior connection or interaction. Rather than a weakness, Twitter users would see this provisional quality as a strength: an advantage of flexibility and freedom of choice.

  • Twitter also offers value as a channel for leisure, for learning about new things, for calling events, for simple fun and nothing more; in general terms, as an online way of socializing and widening one's circle of acquaintances; and, ultimately, as a way of taking part in a general conversation.

  • Another stated use of Twitter points to a form of personal self-promotion: putting a personal blog, company site, promotional communication, wikis, etc. within reach of others.

  • To some extent, it is suggested that the use of Twitter may be cannibalizing media consumption, because of its speed and greater effectiveness. Twitter seems to offer the advantage of being a channel for accessing and consulting many sources of formal and informal information, filtered by other people who are considered credible and relevant enough to do that filtering.

  • Without doubt, Twitter also presents itself as a tool for expressing oneself and one's identity, for quick thoughts (limited to 140 characters), for making complaints and for opinions on social issues, where effectiveness without needless distractions appears again, thanks to the limit on expression.

    What is your experience and opinion of Twitter?
    Feel free to share your experience in a comment. Thanks.

Spotify and the music industry


The success of Spotify's expansion and adoption among thousands of users all over the world shows that the main players in the music market neither listen to nor have understood consumers, mistaking the anecdote for the category: we are not music downloaders.

Downloading music on P2P networks is not the goal, nor even owning the music; the goal is to be able to access it quickly and at a reasonable cost (price), satisfying a musical choice that is absolutely personal and private.


What is your experience and opinion of Spotify?
Feel free to share your experience in a comment. Thanks.


PS: We'll see how long it takes them to kill off Spotify. I still have invitations for Spotify if anyone needs one; just email me.

Saturday, March 28

Twitter and brands

Why Brands ABSOLUTELY DO Belong on Twitter

Lon S. Cohen is a writer and social media strategist. He is @obilon on Twitter.

A Mashable article by Dr. Mark Drapeau was passed around on Twitter this Friday, calling for a ban on brands on Twitter. I respectfully disagree.

1. Twitter is Opt-In

Drapeau said that Twitter was for people to talk to people and not brands to project their message. Particularly distasteful to Drapeau was a humanless brand dumping useless information or worse, some SEO company marketing in a company’s Twitter account.

Fundamentally, I agree with what Drapeau says about the spammy Twitter accounts that are used just to get one more silly site link out there by an SEO company or brands that totally misunderstand and therefore misuse Twitter. It undermines Twitter’s usefulness in a small part. But since, as the author himself points out, Twitter is an opt-in service (meaning I can follow who I want and not follow advertisers) the impact is minimal.

2. Twitter is the New Phone Company

The debate did not rage on Twitter so much as simmer, mostly with brands themselves coming to their own defense. This is by no means the only debate out there on the usefulness of Twitter either as a form of communication or as a marketing tool. Many purists will probably cringe to hear me mention Twitter as a “marketing tool” and I sympathize with them.

Look, I am a centrist. Sorry to sound so wishy-washy about it but I believe that there is room for both brands and for person-to-person communication on Twitter. In fact, I would argue that is the thing that makes Twitter so great. I believe it was Chris Brogan who recently Tweeted that he follows so many people because he thinks of Twitter as the new phone company. It is certainly a useful utility that might even grow up to be too useful and powerful to ever be meaningfully monetized. Not that it can’t happen but Twitter has become such an extremely dynamic form of communication that it may transcend that simplistic, “where is your business model” mentality.

3. Brands Can Have Personalities Too

Like snowflakes, no two Tweeple (as some call Twitter users) are alike. It’s like a geek version of the Breakfast Club: there’s the shy lurker follower that follows everyone but rarely Tweets. The social butterfly who just @ replies to everyone all day. The loudmouthed soapboxer who just likes to talk about what is best for other people. The intellectual sharer who provides useful links and retweets. The big mouth that just goes around starting trouble with random Tweeple. Or the egoist Twitterer who can only talk about themselves or their newest, greatest vidcast.

In the end, we follow who we follow for our own reasons. On the TWIT podcast someone said that we shape our own stream on Twitter. Nothing could be truer. My personal strategy is to keep the people I follow to around 100 people or under. For that, I must be selective. I have people I just like. People who are big time influencers. Others are loudmouths that entertain me with their Tweets. And others whose intellectualism I respect. In there are some brands. I actually have a lot of respect for people who Tweet under a brand. Brands can have personalities too.


Twitter Tips for Brands in 140 Characters

So what’s a brand to do? My tips, in 140 characters or less:

Brands have to be more than just faceless organizations online. They need to offer value added content about their brand/industry/sector.

I hope that we contribute to the Twitter conversation by bringing news and info not only about our cause but related topics as well.

Each brand can represent more than its product or service. It represents a whole industry and related content attached to that industry.

You don’t have to talk about your competitors but you should talk about what your customers come to you for.


Creative Ideas for Brands on Twitter

I would also add that a brand has to use every marketing tool according to the players already in the game. Don’t come to Twitter as a new brand and expect people to follow you just because you are well known. You need to offer more. I believe every brand can offer more, especially on Twitter because of the nature of the conversations that go on, not in spite of it. I would love to hear from some of my beloved brands like Coca Cola, Procter & Gamble, Sam Adams, and on and on.

P&G

P&G can tell about the old days of sponsoring Soap Operas and how that went. Or they could talk about some of the staple food items in a historical context. I was fascinated by a show on TV once that traced the history of ketchup of all things. Can’t P&G or Heinz give me that for free on Twitter? Intersperse Tweets with links and Twitpics and blog posts that craft a whole story. Speaking of craft, Kraft could talk about cheese all day long and you CAN make it interesting.

Coca Cola

Coca Cola has millions of ways to go with this, from showing old ads, to trivia to history and answering questions about the product. I see many ways that staple brands—ones that people would think would be boring online—can be exciting. Not all brands need to reinvent the wheel with their own Social Networking sites. Some of the best tools like Twitter are out there for free to let people know all this great stuff about you.

Sam Adams

Getting back to it, Sam Adams or any other wine or beer company can hire a great writer to craft a campaign where they tell the story of their brand across multiple platforms over time. Twitter is a great place to start. I love beer. I can think of many ways to really engage an audience online just with Twitter, a blog, an RSS feed and a few well-placed Social Networking groups. The brand brings the recognition and the power to Twitter, not the other way around. Brands need to learn to use it wisely by supplying people (Tweeple are people too, you know) with content that engages and informs.


Brands on Twitter? Absolutely Yes!

It’s funny but every time a new technology comes around like Twitter, people scramble to figure out how their brand can market to it. But in reality the people who are using it every day already know how. In a new media space, new media rules still apply. What I mean is don’t revert back to the tried and true methods to market whenever a new media technology comes along. Brands should watch the space and learn how others are effectively using it on a personal level and then just play along. It is, quite simply, watch and learn.

The group will take care of the spammers and insincere brands on Twitter. Nobody will follow them back. They will get reported. They will be ridiculed into submission, eventually. There is no reason to call for a wholesale ban of brands on Twitter. I for one, want to hear what they have to say.



The rise and rise of Twitter

The rise and rise of Twitter | Technology | guardian.co.uk


Could Twitter really become 'the consciousness of the planet', or is it merely 'this year's Facebook'?

In November 2008 a total of 40 articles appeared in British local and national newspapers that included the word "Twitter". Though a quarter of them were published by the Guardian, this paper's technology correspondent nonetheless found himself explaining to general readers that "Twitter, a mobile social network, has generated lots of buzz". The Daily Telegraph, quaintly, was still using the word to describe a way of talking.

The following month, 85 articles appeared on the subject. By January 2009, it was 206. But those were still the dark ages. Hot on the heels of the Twitter plane crash came the site's first live action celebrity lift catastrophe, when the actor Stephen Fry, a tweeter so prolific that one hopes he still eats, offered breathless updates from the stationary elevator in which he briefly found himself marooned. (His followers total is now 350,000).

Before long the mainstream media had spotted that countless celebrities were taking time and effort on Twitter to reveal themselves every bit as witless as their PRs took time and effort to conceal. Jennifer Aniston dumped her boyfriend over his Twitter habit. Ashton Kutcher posted a picture of Demi Moore's bum. Madonna announced on Twitter she was no longer dating a man called Jesus who was young enough to be her son. (Though signs are now emerging that even this is too arduous for the celebrities themselves to undertake without assistance.)

But it was, perhaps, the Guardian's revelation this week of government plans to teach young children about new media tools, including Twitter and Wikipedia, that tipped the site into the mainstream consciousness. "Over the centuries, mankind has developed thousands of ways to communicate eternal truths," the shadow education minister Michael Gove wrote in the Daily Mail. "The complex interplay of voice and orchestra in classical opera gave full rein to Mozart's genius. The delicate rhyme scheme of the 14-line sonnet, in Shakespeare's hands, produced some of the most sublime poetry ever written ... [But] instead of teaching our children the glories of the past, or introducing them to the best that has been thought and written, ministers want our children to 'Twitter'."

Gove, who helpfully described the service as "a new form of texting", appears not to have noticed that the Daily Mail itself is now an enthusiastic Twitterer, using the site to offer an automatic running feed of headlines and links. "Attorney General orders unprecedented Met police probe into Guantánamo prisoner's allegations of torture against," it offered today - those 140 characters are a challenge. The Conservative party is also, as it happens, sending its 4,600 followers a link to Gove's article this morning, in case any of them need a Twitter reminder of what Twitter is.

In an act to rival in tastefulness the US paper which offered minute-by-minute updates of a three-year-old's funeral, Sky News recently Tweeted from inside the court in which Josef Fritzl was being tried for murder, rape and incest:

9:29 juliareid21: [Austrian TV] now have a shot of the back of his head - thinning white grey hair but he won't speak

9:29 juliareid21: huge moment for ORF reporter. Watch this live on Sky

9:31 juliareid21: Interviewed LIVE in a courtroom before a trial! Imagine!

The network has also appointed a Twitter correspondent, a position one hopes will last longer than Reuters's dedicated bureau in Second Life.

So is the Daily Mail's Twitter feed the equivalent of your dad dancing in public to your favourite nu-acid-crunk band? Does a government decision to "teach Twitter" represent the site's ultimate shark-jump into banal unfashionability?

"All new technologies hit this point," says Mike Butcher, editor of the new media blog TechCrunch Europe, who has been using Twitter for almost three years. "You always have these old crusties who have been on it for a while, and then a generation of 'newbies' turn up as if it's something they have just invented." The scale of the exponential boom in Twitter's popularity, however, is "really unusual", he says. Far from killing off the site's popularity among early adopters, he argues that "the power of any network grows exponentially as the number of people using it grows." A world in which many more people are tweeting, and those tweets are fully searchable, would potentially allow a real-time search facility of "the consciousness of the planet".

In the meantime, though, we will have to be satisfied with the knowledge that Andy Murray is having steak for his tea, North Lincolnshire council has been presented with Member Development Charter status, and one of Jonathan Ross's dogs threw up last night.

Meanwhile, so far in March, 684 articles have been published about Twitter. Truly it is, as the Bristol Evening Post put it in a helpful explanatory note to readers, "this year's Facebook".



When Famed Twitter Friend Proves Faux

That Famous Twitter Feed Could Be a Lot of Baloney - washingtonpost.com
Behind Some Celeb Feeds Lie Only Tweet Nothings


"I spoke to a lovely reporter today," wrote cwalken on his (or her) Twitter account this week. "I don't know if she was really who she said she was but that's fine. I secretly used an ironic tone."

Sounds about right. But does anybody know who anybody really is anymore?

The popular cwalken Twitter feed, stocked with oddball observations that seem as if they could've popped out of the mouth of actor Christopher Walken, is read by more than 90,000 users. It is not, reportedly, written by Walken -- though his picture is parked atop the page. (Late yesterday afternoon, the page appeared with a notice that the account has been "suspended due to strange activity.")

Things have gotten a little confusing for fans. Thanks to the democratizing powers of the Web and the rapid rise in popularity of Twitter, the very famous and the only slightly famous are finding themselves with virtual doppelgangers.

Already, a Web site has been launched to try and resolve such important questions of online celebrity identity. The U.K.-based Valebrity.com seeks to verify that the famous folks you're following online really are who they say they are.

"Nobody knows who's who on these social networking sites," said Valebrity's founder, Steven Livingstone. "Even the celebrities themselves are coming to us now and saying, 'Is this one real?' "

Livingstone's site identifies personalities like Ashton Kutcher and Ryan Seacrest on its list of real Twitter users, but for many Twitter users, authenticity may be beside the point. A few weeks ago, a Twitter feed supposedly belonging to "30 Rock" star Tina Fey was identified as fake. At the time, the faux Fey's feed had 50,000 readers. Today, it has more than 200,000.

Typically, social networking sites pull down fake accounts if there are complaints or if the site suspects fraud. But sometimes that can backfire: Facebook temporarily deleted actress Lindsay Lohan's page in December, under the impression that it was bogus. The move became news after the actress complained in a letter posted to her MySpace page.

Ronald R. Snider, an Alexandria lawyer who sometimes handles copyright issues, said that the matter is "uncharted territory" from a legal standpoint. "As far as whether it's legal or not, that's a big issue," he said.

But Snider said he would be disinclined to pursue a case against such Internet impostors. "People like this are assured to be judgment-proof," he said. "They don't have any money."

You don't even have to be all that famous to attract an impersonator, it seems. Livingstone said most people assume -- wrongly -- that people want to impersonate globally famous celebrities. But he spends just as much time trying to verify the online identities of tastemakers who are experts in their field but aren't household names.

A Twitter feed supposedly run by political consultant Frank Luntz scored 2,000 followers before the joke, or whatever it was, was revealed earlier this month. That feed, which was written by one of Luntz's former employees, has since been taken down.

Washington Post art critic Blake Gopnik recently attracted a Twitter impostor of his own. As with the fake Luntz feed, the impostor generally posted non-malicious comments that likely seemed plausible to the casual observer. But after the fake Gopnik posted a dismissive comment about a museum, the real Gopnik received some snarky remarks on an art blog at the Seattle Post-Intelligencer. The fake Twitter feed has been removed.

Not surprising, said Livingstone. "When it comes to the more niche markets, you'd think, 'Why would anybody bother?' But if you have 1,500 people following you and you're in a niche market, those people are all focused on what you're going to say. The people who are in it are much more likely to do something if you tell them to. They'll act on your every word."

What does Twitter make of this identity confusion?

"Doesn't happen too often," Twitter co-founder Biz Stone wrote in an e-mail that was short enough to be a Twitter post. "Impersonation is against our terms."

Christopher Walken, the real one, could not be reached for comment.



Online Attacks Are Very Real Crises

Communicating Through a Crisis: Online Attacks Are Very Real Crises

Tuesday, March 10, 2009


Larry Smith, president of the Institute for Crisis Management, sent me an e-mail that made me chuckle. “I have been following the internal crisis upon crisis at Holland & Knight for a couple of years and they keep shooting themselves in the feet...and now there is an 'enemy' that has ties inside the giant law firm and knows how to use digital media to launch almost daily attacks that even the Taliban would envy.” Holland and Knight is a worldwide law firm based in Florida.

Larry linked me to Kara Smith’s blog (http://blog.karasmamedia.com/2009/01/legal-firms-dont-allow-outside-parties.html). She wrote, “Since the end of December, @hklaw has been sending out Twitter Tweets at the rate of approximately 20 per day. The tweets contain links that lead readers back to (HKLaw Investigator) whose profile lists six other sites, each describing its contents as ‘Information, Articles and Complaints involving Holland & Knight Attorneys.’… The problem is, none of the blogs or the @hklaw Twitter page belong to the Holland & Knight law firm, whose URL is http://www.hklaw.com/.”

A knock-off URL! Apparently, some savvy insider has decided to get even over some real or imagined hurt. The @hklaw site links to articles such as Five lawyers leave Holland & Knight in part because of conflicts of interest and The Disappearing Associate... Valentine’s Day Massacre: Holland & Knight fired 70 lawyers and 173 staff.

“This blog and the many links in it,” Larry concluded in his note to me, “tell the story about why every law practice needs its own ‘crisis communication plan’ as well as crisis communications counsel on call and ready to help clients, as well.”

Who could carry a grudge? In Chicago several years ago, H&K defended the billing practices of a partner, Edward Ryan, who was alleged, with other lawyers, to have overcharged a client by 450 hours. H&K said no one did anything wrong. But just last November, “the Illinois Attorney Registration and Disciplinary Commission decided to file charges against Ryan, accusing him of falsifying time on client invoices.” Guess what? Ryan left the firm last year. (http://blogs.wsj.com/law/2008/12/22/former-holland-knight-partner-accused-of-the-perfect-crime/)

In 2005, nine female lawyers accused another partner, Arthur Wright, of sexual harassment. Wright received a private reprimand – and a promotion the following March. The women blew the whistle to reporters, and Wright later “voluntarily” returned to his old job. (http://www.law.com/jsp/article.jsp?id=1112618116450)

(Interestingly, H&K has a page on its web site on How to Conduct a Sexual Harassment Investigation, written by a female in the firm’s Litigation Section. Do as I say, not as I do, I guess.) (http://www.hklaw.com/content/whitepapers/RoyalSection.pdf)

If I searched longer, I'll bet I could find many more people in a company this size who have plenty of axes to grind. The moral is: if someone in your organization messes up, investigate in good faith and take the proper corrective action, even if it’s your favorite brother-in-law. Our stakeholders deserve and expect us to do the right thing. Sometimes ethics are a shade of gray. But usually we know what to do for the best of the organization, and that’s to prevent smoldering crises from bubbling into embarrassing and expensive crises.


The weekend video

Friday, March 27

Web 2.0 is a buzzword

Key differences between Web 1.0 and Web 2.0 by Graham Cormode and Balachander Krishnamurthy



Abstract
Web 2.0 is a buzzword introduced in 2003–04 which is commonly used to encompass various novel phenomena on the World Wide Web. Although largely a marketing term, some of the key attributes associated with Web 2.0 include the growth of social networks, bi–directional communication, various ‘glue’ technologies, and significant diversity in content types. We are not aware of a technical comparison between Web 1.0 and 2.0. While most of Web 2.0 runs on the same substrate as 1.0, there are some key differences. We capture those differences and their implications for technical work in this paper. Our goal is to identify the primary differences leading to the properties of interest in 2.0 to be characterized. We identify novel challenges due to the different structures of Web 2.0 sites, richer methods of user interaction, new technologies, and fundamentally different philosophy. Although a significant amount of past work can be reapplied, some critical thinking is needed for the networking community to analyze the challenges of this new and rapidly evolving environment.

Contents

1. Introduction
2. What is Web 2.0?
3. Analysis issues
4. Web 2.0 substrate and enabling technologies
5. Measurement issues
6. Technical and external issues
7. Summary of metrics of interest
8. Beyond Web 2.0

 


 

1. Introduction

“Web 2.0” captures a combination of innovations on the Web in recent years. A precise definition is elusive and many sites are hard to categorize with the binary label “Web 1.0” or “Web 2.0.” But there is a clear separation between a set of highly popular Web 2.0 sites such as Facebook and YouTube, and the “old Web.” These separations are visible when projected onto a variety of axes, such as technological (scripting and presentation technologies used to render the site and allow user interaction); structural (purpose and layout of the site); and sociological (notions of friends and groups).

These shifts collectively have implications for researchers seeking to model, measure, and predict aspects of these sites. Some methodologies which have grown up around the Web no longer apply here. We briefly describe the world of Web 2.0 and enumerate the key differences and new questions to be addressed. We discuss specific problems for the networking research community to tackle. We also try to extrapolate the current trends and predict future directions. Our intended audience consists of technical readers familiar with some of the basic properties of the Web and its measurement, and who seek to understand the new challenges presented by recent shifts in Web technology and philosophy.

At the outset we need to distinguish between the concepts of Web 2.0 and social networks. Web 2.0 is both a platform on which innovative technologies have been built and a space where users are treated as first class objects. The platform sense consists of various new technologies (mashups, AJAX, user comments) on which a variety of popular social networks, such as Facebook, MySpace, etc. have been built (we adopt the convention of referring to sites by name when their URL can be formed by appending .com to the name). Inter alia, in all these social networks participants are as important as the content they upload and share with others.

However, the essential difference between Web 1.0 and Web 2.0 is that content creators were few in Web 1.0, with the vast majority of users simply acting as consumers of content, while any participant can be a content creator in Web 2.0 and numerous technological aids have been created to maximize the potential for content creation. The democratic nature of Web 2.0 is exemplified by the creation of a large number of niche groups (collections of friends) who can exchange content of any kind (text, audio, video) and tag, comment, and link to both intra–group and extra–group “pages.” A popular innovation in Web 2.0 is “mashups,” which combine or render content in novel forms. For example, street addresses present in a classified advertisement database are linked with a map Web site to visualize the locations. Such cross–site linkage captures the generic concept of creating additional links between the records of any semi–structured database and another database.
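The address-to-map mashup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ads` records, the `geocodes` lookup table, and the helper names `normalize` and `mash_up` are all hypothetical, and a real mashup would query a map site's service rather than a local dictionary.

```python
# Hypothetical classified ads; in a real mashup these would come from an ads database.
ads = [
    {"id": 1, "title": "Bike for sale", "address": "12 Main St"},
    {"id": 2, "title": "Sofa, nearly new", "address": "99 Unknown Rd"},
]

# Hypothetical geocoding table standing in for a map Web site's lookup service.
geocodes = {
    "12 main st": (40.7128, -74.0060),
}

def normalize(address):
    """Lowercase and collapse whitespace so equivalent addresses compare equal."""
    return " ".join(address.lower().split())

def mash_up(ads, geocodes):
    """Attach coordinates to each ad whose address appears in the geocoding table."""
    linked = []
    for ad in ads:
        coords = geocodes.get(normalize(ad["address"]))
        linked.append({**ad, "coords": coords})  # coords is None when no match exists
    return linked

for row in mash_up(ads, geocodes):
    print(row["title"], "->", row["coords"])
```

The join key here is a normalized address string. Records without a match keep a `None` coordinate rather than being dropped, so the linkage stays an addition to the original database rather than a filter on it.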

There is a significant shift in Internet traffic as a result of a dramatic increase in the usage of Web 2.0 sites. Most of the nearly half a billion users of online social networks continue to use Web 1.0 sites. However, there is an increasing trend in trying to fence social network user traffic to stay within the hosting sites. Intra–social network communication traffic (instant messages, e–mail, writing on shared boards etc.) stays entirely within the network, and this has a significant impact on the ability to measure such traffic from without. There is also a potential for “balkanization” of users, as the key reason to join a particular online social network is the presence of one’s friends. If a subset of friends is not present there, then communication across social networks would be needed, but currently this is not a feature. Such balkanization impacts future applications such as search engines that could span social networks.

We do not study social aspects of how users’ interaction with each other in real life might change as a result of online social networks. Nor do we speculate on the lifetimes of some of the currently popular Web 2.0 applications; for example, the constant stream of short messages that are sent to interested participants detailing minutiae of daily life. Instead we concentrate on technical issues and how work done earlier in Web 1.0 can benefit the ongoing work in Web 2.0. At least one important aspect — user privacy — is left for future analysis.

Contributions

The contributions of this paper are as follows:

  • We describe the tell–tale features of Web 2.0 and highlight the broad differences between Web 2.0 and Web 1.0. We illustrate this with a detailed case analysis, where we evaluate a number of Web sites and show which of the observable features they exhibit that make them either Web 1.0 or Web 2.0.
  • We describe issues of structure of Web 2.0 sites, which tend to resemble social networks more than the hierarchical model of Web 1.0. We pose challenges of connecting users across multiple sites, and measuring the impact and scope of group membership. We identify site features that lead to ‘stickiness,’ and formulate problems of measuring this adhesion. We discuss connections across sites, in the form of ‘para’sites which provide additional functionality for specific hosts, and through embeddings and mashups.
  • We identify new problems of measurement in Web 2.0, specifically related to the new models of interaction given by: clicking, connecting, commenting, and content creation. Each of these requires new techniques to measure. We also describe the challenge of crawling and scraping Web 2.0, and of building tools and new techniques to help this data collection.
  • We cover technical issues such as performance and latency, and the prospect of flash crowds in Web 2.0, not just in the traffic flood sense, but also as floods of comments and links. The user–created content common to Web 2.0 creates new distributions of access patterns, leading to re–evaluations of the value of object caching.
  • We conclude by looking beyond Web 2.0 to connections to P2P, and examine future trends.

 

++++++++++

2. What is Web 2.0?

“Web 2.0” is a term that is used to denote several different concepts: Web sites based on a particular set of technologies such as AJAX; Web sites which incorporate a strong social component, involving user profiles, friend links; Web sites which encourage user–generated content in the form of text, video, and photo postings along with comments, tags, and ratings; or just Web sites that have gained popularity in recent years and are subject to fevered speculations about valuations and IPO prospects. Nevertheless, these various categories have significant intersections, and so it is meaningful to talk broadly about the class of Web 2.0 sites without excessive ambiguity about which definition is being used (from now on, we use Web2 and Web1 respectively for brevity).

Deciding whether a given site is considered Web2 or Web1 can be a difficult proposition. This is not least because sites are dynamic, rolling out new features or entire redesigns at will, without the active participation of their users. In particular, there is no explicit version number and active upgrade process as there is with a piece of software or a communication protocol, and many sites are referred to as being in “permanent beta.” Some sites are easy to classify [1]: social networking sites such as Facebook and MySpace are often held up as prototypical examples of Web2, primarily due to their social networking aspects which include the user as a first–class object, but also due to their use of new user interface technologies (Facebook in particular). Other sites are resolutely Web1 in their approach: Craigslist, for example, emulates an e–mail list server, and has no public user profiles, or fancy dynamic pages.

Many sites are hard to categorize strictly as Web1 or Web2. For example, Amazon.com launched in the mid–1990s and has gradually added features over time. The principal content (product descriptions) is curated rather than user–created, but much of the value is added by reviews and ratings submitted by users. Profiles of users do exist, but social features such as friend links, although present, are not widely adopted. Each product has a wiki page associated with it, but these are little used. Other sites also contain a mixture of the old and the new; we focus our discussion on the new aspects.

Another heuristic to aid distinguishing Web2 and Web1 can be based on time: the term “Web 2.0” was coined around 2004, and many of the first truly Web2 sites began emerging in late 2003 and early 2004. So sites which have changed little in structure since the early 2000s or before may safely be considered Web1 (such as IMDB). A definition of Web2 by O’Reilly (2005) emphasizes viewing the Web as a platform. It is fair to say that many of the ideas that are now called Web2 were seen in earlier forms in the efforts of AOL and GeoCities. While AOL brought the Internet to the masses, it also emphasized the notion of contained communities within which people could interact. GeoCities initially operated with an enforced metaphor of ‘neighborhoods.’ These are precursors of current notions of groups and communities finding new and larger audiences in Web2. However, most Web2 sites differ by more forcefully making the user a first class object in their systems, and by employing new technology to make interaction easier for the user.

Some of the important site features that mark out a Web2 site include the following:

  • Users as first class entities in the system, with prominent profile pages, including such features as: age, sex, location, testimonials, or comments about the user by other users.
  • The ability to form connections between users, via links to other users who are “friends,” membership in “groups” of various kinds, and subscriptions or RSS feeds of “updates” from other users.
  • The ability to post content in many forms: photos, videos, blogs, comments and ratings on other users’ content, tagging of own or others’ content, and some ability to control privacy and sharing.
  • Other more technical features, including a public API to allow third–party enhancements and “mash–ups,” and embedding of various rich content types (e.g., Flash videos), and communication with other users through internal e–mail or IM systems.

Some additional explanation is required. Testimonials are comments from other users posted directly on a user’s profile. These can be general approbation (as in Flickr), or more for chatting in public (Facebook’s “wall”). These are common in Web2 but missing in the less user–centric Web1. Other data can often be added on the user’s profile page: in Web2 this is information such as job, favorite music, education, etc., whereas in Web1 this is more often contact details (e–mail addresses). Our category of subscriptions means the ability to “subscribe” to a feed of news or updates from select other users; this is handled internally, in contrast to RSS feeds which are publicly visible. Some sites offer many RSS feeds, per–user/group, whereas others like Slashdot only have feeds for a handful of broad categories. In contrast to this public sharing of information, ‘Friends only’ means the ability to make some or all information visible only to “friend” users. One can quickly verify that a site such as Facebook provides many of the above features, whereas Craigslist provides few, with many being inapplicable.

On the technical side, some of the common presentation technologies associated with Web2 sites include AJAX (Asynchronous JavaScript and XML), in particular the use of XMLHttpRequest to dynamically update a page without explicit reload actions, and embedded Flash objects, e.g., to play music or videos without additional browser plug–ins. Likewise, we have not discussed issues like the ability to “remix” or mash–up content, embed, reference or annotate; we will mention all these issues in later sections.

Analysis of popular sites

A set of examples is analyzed in Table 1, based on the above list of site features. The first five listed (Facebook, YouTube, Flickr, LiveJournal and MySpace) are Web 2.0 sites, while Slashdot and Craigslist are Web 1.0. Amazon, Digg, eBay and Friendster fall in between. These assignments are debatable: Friendster seems to have many of the ‘social’ features in common with Facebook, but we consider it ‘Web 1.5’ since it fails to offer sufficient ways for users to interact with the content.

 

Table 1: Features of some popular Web sites.
Note: + Only content creators are allowed to assign tags.
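Sites compared (table columns): Facebook, YouTube, Flickr, LiveJournal, MySpace, Digg, Friendster, Amazon, eBay, Craigslist, Slashdot
(Per-site marks are not shown.)

Feature class: Profile details
  Features: Age, Location, Gender, Testimonials, Other data

Feature class: Connectivity
  Features: Friends, Subscriptions, Groups

Feature class: Content
  Features: Main content, Other content, Tagging (two sites marked +), Friends only, Comments, Editable content, Rateable content, Viewing statistics
  Main content by site: Facebook: profiles; YouTube: videos; Flickr: photos; LiveJournal: blogs; MySpace: profiles, blogs, video; Digg: links; Friendster: profiles; Amazon: products; eBay: products; Craigslist: ads; Slashdot: articles
  Other content: photos, listed for six of the eleven sites (in parentheses for one of them)

Feature class: Technical
  Features: Public API, Embedding allowed, Many RSS feeds, Private messages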

 

Although much touted, features such as the ability to collaboratively edit content (i.e., Wikis) are insignificant amongst the sites considered here. The notion of “tagging” is widely discussed, but only Flickr and, to a lesser degree, Facebook and Amazon, support tagging of other people’s content; in other cases, tagging is limited to the content creator assigning tags to their content. Assigning ratings to content, or seeing statistics such as number of views, is also far from ubiquitous. “Social” features, such as identifying friends, are integral to many of the Web2 sites. These are present on other sites, but less prevalent: Amazon does have friends, but this feature seems little used; Slashdot also has friends, primarily to adjust the importance of comments submitted by friends. These sites are quite functional if no friends are listed. In contrast, Facebook requires the user to add friends in order to access most of its functionality.

 

++++++++++

3. Analysis issues

We now examine the various analytical properties of interest in Web 2.0 and contrast them against the properties that have been studied extensively in Web 1.0. These properties deal with how the Web2 sites interact with individual users. Some are relatively new and do not have a Web1 counterpart but many do, and we can compare the methodologies used to study them earlier in Web1.

3.1. Site structure

Studies of Web1 highlighted a distinctive ‘bow–tie’ structure (Broder, et al., 2000), with three distinct pieces of a massive connected component. Individual sites typically adopted an approximately hierarchical structure, with a front page leading to various subpages, augmented by cross–links and search functions. Web2 sites are often more akin to real–world social networks (Milgram, 1967), which show somewhat different structures, due in part to implicit bi–directionality of links. There are some tattered remains of a bow–tie still visible (Kumar, et al., 2006). Studying a Web2 site in detail can be inherently harder than studying the Web1 ecosystem, since it requires crawling deep inside the particular Web2 site. Some sites enforce a very user–centric view of the site, meaning that each account can only see detailed information about explicit ‘friends’ (see Facebook and other examples detailed in Table 1), in comparison to Web1 which is typically stateless. In particular, the trend is towards an increasingly customized ‘front page’ so that no two users have the same experience. In the Web1 case the crawling could be done externally without a login using a generic crawler. Increased use of a variety of server–side and browser–side technologies, in particular Javascript, can give further challenges for crawling Web2 sites.

The nature of a ‘page’ in a Web2 site is different from a Web1 site and the rate of change is likely to be significantly different due to increased interactive features. In commercial Web1 sites, content is centrally updated at somewhat predictable intervals. Individual users edit Web1 sites at differing frequencies. An early study (Douglis, et al., 1997) showed that many resources were not modified while some were modified quite frequently. There was a direct correlation between the popularity of a site and its rate of change: popular sites tend to change frequently. In Web2, with a lot of user generated content, it is not uncommon to have small incremental additions to the site. The changes do not have to be done by the content ‘owner’–friends can write comments (e.g., on their Facebook ‘wall’) which would constitute a change. A page is more a shared space in Web2 while in Web1 it is often a single–user writing medium.

Web2 often involves dynamically generated pages from multiple sources of information. It is thus harder to come up with a clean definition of a resource and determine when the resource has changed. This has implications for how often contents in Web2 need to be re–examined and how frequently contents could be fetched by a crawler, as well as implications for any caching (examined further in Section 6.3). A Web2 site is live in the sense that it can be updated while a user is examining it. The content within a Web2 page is a broader mixture of audio, video, text, and images, compared to a typical Web1 site. The content types have additional implications for the rate of change but are probably similar to Web1. A more frequently changing entity on a Web2 page might be links to friends etc., which do not have a Web1 counterpart.
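One simple way to make "has the resource changed?" concrete is to fingerprint successive snapshots of a page and count how often the fingerprint differs. The sketch below is an assumption for illustration, not a method taken from the paper: the function names and the sample snapshots are hypothetical, and whole-page hashing will overcount changes on pages that embed dynamic fragments such as timestamps or rotating friend links.

```python
import hashlib

def fingerprint(content):
    """Hash a page snapshot so snapshots can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def change_rate(snapshots):
    """Fraction of consecutive snapshot pairs whose content differs."""
    if len(snapshots) < 2:
        return 0.0
    prints = [fingerprint(s) for s in snapshots]
    changes = sum(1 for a, b in zip(prints, prints[1:]) if a != b)
    return changes / (len(prints) - 1)

# Hypothetical snapshots of one profile page, fetched at regular intervals.
snapshots = [
    "<p>Profile</p>",
    "<p>Profile</p>",
    "<p>Profile</p><p>Comment from a friend</p>",
    "<p>Profile</p><p>Comment from a friend</p><p>Another comment</p>",
]
print(change_rate(snapshots))
```

A crawler could use such a rate to decide how soon to revisit a page, though for Web2 pages it would probably need to fingerprint individual components (profile, comments, friend list) rather than the whole page.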

Recent work is starting to study the underlying “graph” structure of the social networks embedded in Web2 sites. So far, these studies typically look at a single site and measure properties such as degree distribution, clustering coefficient, connected components and so on. Initial work has plotted the degree distributions of various social networking sites, fitted them to power laws, and explored other properties of the induced graphs (Mislove, et al., 2007). There will be interesting differences between the sampling needed to compare within Web2 sites and that needed for the link structure in Web1 sites. It is a given that many Web2 sites will have intra–linkage, but blogs (Web 1.5?) do not fall in this category: prior analysis shows that there are a lot of links to other blogs and non–blog sites (Bhagat, et al., 2007).
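As a rough illustration of the properties named above, the following sketch computes a degree distribution and an average clustering coefficient from a list of friend pairs. The edge list and user names are invented for the example, and links are assumed to be bi–directional; a real study would obtain them by crawling a site.

```python
from collections import Counter
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) friend pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_distribution(adj):
    """Map each degree value to the number of nodes with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical friend links, not data from any real site.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
graph = build_graph(edges)
print(dict(degree_distribution(graph)))
print(round(average_clustering(graph), 3))
```

Fitting the resulting distribution to a power law would be a separate step and is not attempted here.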

Issues

This leads to many questions about how individuals use Web2 sites which simply do not arise in the Web1 world. For example, do users “live” on one site, or are they spread across multiple sites? This is hard to quantify from the researcher’s perspective, since they do not have access to logs from any of the sites. Site owners may only be able to find some information via third–party aggregators (e.g., outsourced advertisers can match up user visits). But there is likely to be very little outsourced from a Web2 site and thus third–party aggregation may not have much resonance. This is a key difference between Web1 and Web2. Instead, one can look for explicit application level indicators of the same user across multiple sites — instances of the same username, profile links between sites, syndication of content (e.g., Flickr streams) and so on.

Other questions that arise include:

  • Can we match individual users across multiple sites (and hence learn more of their attributes)? This is a more challenging exercise requiring more machine–learning techniques, bringing with it a higher potential for false–positives.
  • Does the same user on different sites show the same behavior/connectivity? This seems much harder than simply determining if it is the “same” user.
  • Given that a user is on one site, what is the probability that they are on another (affinity)? On Web1, visitors to cnn.com may also visit nytimes.com, but on Web2 a user may typically spend time examining photos regularly on a single photo site. The amount of time spent on Web2 sites may also differ, as the reasons are more social, whereas content is the focus in Web1 sites.
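One of the application level indicators mentioned above, instances of the same username, can be sketched as follows. The normalization rule, the site names and the usernames are all assumptions made for illustration; matching on usernames alone will produce false positives, which is exactly the risk the first question raises.

```python
def normalize(username):
    """Lower-case and drop separators so 'Jane_Doe' and 'janedoe' compare equal."""
    return "".join(ch for ch in username.lower() if ch.isalnum())

def candidate_matches(site_a, site_b):
    """Return normalized usernames that appear on both sites."""
    a = {normalize(u) for u in site_a}
    b = {normalize(u) for u in site_b}
    return a & b

def affinity(site_a, site_b):
    """Estimate P(user on site B | user on site A) from username overlap."""
    a = {normalize(u) for u in site_a}
    if not a:
        return 0.0
    return len(candidate_matches(site_a, site_b)) / len(a)

# Hypothetical username lists for two sites.
photo_site = ["Jane_Doe", "bob-k", "carol", "sam99"]
video_site = ["janedoe", "BobK", "alice"]
print(sorted(candidate_matches(photo_site, video_site)))
print(affinity(photo_site, video_site))
```

A fuller approach would combine several indicators (profile links, syndicated content) before declaring two accounts to be the same user.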

3.2. Advanced structure

In Web1, all links and pages can be treated essentially equally, whereas trying to understand a Web2 site in detail requires looking at different link types (friend links, navigation links, etc.) and page types (user pages, content pages, etc.), which are rarely explicitly marked as such in a machine–readable fashion.

Other structures are often present on Web2 sites beyond generalized links, such as groups, subscriptions to feeds or message streams. These generalize and enrich features offered by Web1 sites which were originally just glorified e–mailing lists (egroups/Yahoo! groups), and make them more integrated parts of a Web2 site. The importance of such features is still to be determined. Many Web2 sites have no notion of groups. A common way in which the group feature is used is simply to make additional statements about the individual user — people join groups to express views in support of or against politicians, site features, movies, activities, etc. [2]

Issues

Similar to links, many natural measurement questions arise:

  • How widely used are groups, and how active are they?
  • What is the distribution of group membership (power laws, size distribution, membership distribution)? What is the distribution of duration of membership (joining to leaving)?
  • How important are groups, subscriptions etc. in engaging users?

Such questions do not seem to have arisen or attracted much study in Web1 (although there is some analysis of usenet groups [Smith, 1999]). Yet these questions affect our understanding of how to provision and serve Web2 sites.

3.3. Site mechanisms and incentives

A key difference in Web2 is that many sites encourage users to spend as much time as possible on their site. There are strong incentives for increasing such stickiness: opportunities for higher advertising revenue. Further, the more the users interact with the site, the more can be learnt about their interests and habits.

In Web1 most sites have links to external sites and users may easily follow links to other sites. The main reason for this is that most Web1 sites tend to cover a single topic and do not require users to log in to access them. Web2 sites promote intra–site activities, often requiring users to log in and build links to others on the site. When users have logged in, sites can more easily track individuals’ browsing habits and serve up personalized content. Users are encouraged to create an account in order to engage more fully with the site: some sites require accounts to post comments, others require accounts before any content is visible. Navigation links are often directed solely within the site, and where user content is allowed, external links may be made difficult or impossible to add. The mix of content in a Web2 site is typically more diverse than in a Web1 site, reflecting the mix of interests of its user base and increasing the probability that users stick around. Web1 sites that do not allow user participation in a visible manner can only compete on the basis of content. User generated content, even in the form of comments, has been rare until recently on Web1 sites.

Explicit attempts to create stickiness for a Web2 site lead to ‘portalization’: trying to build every possible feature into the site, so that once the user signs in, they never need to leave. This echoes the attempt of Web1 sites to become portals, with many features (news, weather, sports) accessible from a single front page. However, Web2 instead relies mostly on its users to bring content. Web2 examples of this trend include MySpace, which now provides hosting for users’ photos and videos, and has intermittently blocked external content from being included on MySpace pages; and the opening of the Facebook API, which allows many features to be added to users’ pages, all within the Facebook domain.

Such portalization leads to a large amount of duplication of features: almost every Web2 site gives its users an ‘inbox’, essentially creating an internal e–mail system which recalls the pre–Internet world of many non–interoperable local e–mail systems. In order to ensure that users see their messages, often a (standard) e–mail alert is sent to the registered e–mail address of the user alerting them to the fact that a new message has arrived in their (Web2) inbox: Table 1 shows that this is common in most Web2 sites. Other sites are creating their own parallel Instant Messaging networks, allowing pairs of online users of the site to chat through their browsers, etc.

Issues

The following questions may need some careful modeling or innovative measurement studies to address:

  • Will portalization efforts succeed? What is the equilibrium state as new ideas emerge? How many inboxes can one person cope with? Will the rate of active interactions with their inboxes vary across different social networks?
  • What are the various technologies that can go inside the portal (IM, VoIP, P2P)? In other words, will all/most of a member’s interaction with others be done through or as part of the social network, and how much through ‘interoperable’ e–mail? Will such distinctions erode due to open standards?
  • What are the non–economic incentives which keep users coming back to the same sites (e.g., casual games, announcements, active presence of friends, status updates)? How can the effectiveness of such incentives be measured?

 

++++++++++

4. Web 2.0 substrate and enabling technologies

We next provide an overview of the underlying communication model and technologies in Web2. Since users are first level objects in Web2 they are both producers and consumers of content. The role of the Web2 substrate is to help in the production of such content, host it, and allow interested users to consume it while interacting with other like–minded users.

4.1. Web 2.0 viewed as Publish/Subscribe model

We consider the ways in which Web2 sites move content between creators and consumers. Web2 widens ways to view content: on the site associated with the publishing site, syndicated to other sites, aggregated to RSS readers and e–mail, and short content or alerts directed to cell phones. Figure 1 shows some of the possible pathways from users who create content through “publishers” to users who subscribe to content and view it on “displays.” The data travels from publishers to displays by a mixture of push and pull: Twitter (which supports publishing status updates of up to 140 characters) can push these to a cell phone as an SMS, or have the content pulled by an RSS reader.

 

Figure 1: Paths from content creator to consumer in Web2.

 

There is no inherent technological reason why any particular connection from a publisher to a display is not currently possible. Yet, many pairings are not currently supported, and others are addressed on an ad hoc pairwise basis: many applications are being written solely to move content from publishers into Facebook using public APIs, for example. An obvious challenge is to encourage common interchange formats and protocols to eliminate redundant work. RSS is a candidate, but is limited since it does not allow authentication (some transfers should only be allowed to authorized subscribers). In some cases, publishers restrict which displays their content is permitted on, via artificial barriers.
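The mixture of push and pull described in this section can be sketched in a few lines. The class names, the “status-site” publisher and the list standing in for a phone are hypothetical; the sketch ignores authentication and any restrictions a publisher might place on displays.

```python
class Publisher:
    """Holds short updates and delivers them by push or pull."""

    def __init__(self, name):
        self.name = name
        self.items = []          # everything published so far
        self.push_displays = []  # callables invoked on each new item

    def subscribe_push(self, deliver):
        self.push_displays.append(deliver)

    def publish(self, text):
        self.items.append(text)
        for deliver in self.push_displays:
            deliver(self.name, text)  # push: the publisher initiates delivery

    def pull(self, since=0):
        """Pull: a display asks for the items it has not yet seen."""
        return self.items[since:]


class PollingReader:
    """A feed-reader-like display that polls a publisher on its own schedule."""

    def __init__(self):
        self.seen = 0
        self.inbox = []

    def poll(self, publisher):
        new_items = publisher.pull(self.seen)
        self.seen += len(new_items)
        self.inbox.extend(new_items)
        return new_items


phone = []  # stands in for a phone receiving pushed text messages
status = Publisher("status-site")
status.subscribe_push(lambda source, text: phone.append((source, text)))
reader = PollingReader()

status.publish("at the conference")
status.publish("talk starts at 5pm")
reader.poll(status)  # the reader only learns of updates when it asks
print(phone)
print(reader.inbox)
```

The push display sees each update as soon as it is published, whereas the polling display sees both at once, which is one reason the same content can reach different displays with different delays.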

Some routes from publishers to displays involve intermediate steps, such as gateways for SMS to e–mail. Other routes create mash–ups by combining information from multiple sources, via open APIs, scraping, and structured information (RSS and other XML sources). This notion simply does not exist in Web1, and reasoning about such dynamic objects in Web1 terminology makes little sense, since they are not hosted or owned by any one site. Note that Table 1 shows that most popular Web2 sites offer a powerful open API to address their content.

Issues

It remains to study the pub/sub view of Web2, and analyze the extent to which information can be channeled from publishers to displays, and which routes are blocked. Various sites are more or less permissive of what kinds of objects (e.g., Javascript or Flash) are allowed to be embedded in their pages, and it seems inevitable that the large number of possible interactions will lead to interesting security holes in the vein of cross–site scripting.

4.2. Web 2.0 as a platform

A recent trend, driven by Facebook, is to view Web 2.0 as a platform supporting other applications. This is enabled by the opening of APIs and allowing users to add applications to their account, and share some information (such as their neighbors in a social graph) with the application. This development is quite recent (Facebook launched this feature in May 2007) and so there has been little formal study, and only a few months of history in comparison to the fifteen plus years of history of the Web as a whole.

The simplest enhancements allow users to include content from a large variety of sources, which is often explicitly encouraged by the publishers: each YouTube video page (by default) includes the code required to embed the video into another page. Because of this flexibility, we start to observe a new class of site, a “para” site which provides services and features designed around a single host site. For example, many sites offer page designs, layouts, and other graphical embellishments for MySpace (e.g., WhateverLife [Salter, 2007], Pimp–My–Profile). In Web1, many sites offered generic enhancements (such as the one–time ubiquitous “Web counter”) which were suitable for adding to any page; in contrast, these para sites offer functionality only for a single targeted host at a time.

Richer applications make more extensive use of more recently opened APIs, and several of the most successful applications seem to derive from the para sites mentioned above. Within Facebook, the current most popular applications are provided by RockYou and Slide, which add additional “flair” to profiles in the form of photo slide shows and embedded video, and extend the capabilities for user interaction via posting richer messages and drawings, and giving virtual gifts. Such applications are somewhat akin to Firefox extensions, in that they may be displayed in a central repository with a tepid endorsement, but the actual application is maintained and executed at the external site. A Facebook application in a user’s profile is rendered by calls from Facebook to the application hosting site. The external site can thus get accurate metrics of usage, but the access information is split between the various external sites and the host which provided the distribution channel and enabled the downloading. Unlike Firefox extensions, where communication is mostly local to the user’s browser once the extension has been downloaded, external applications trigger a considerable amount of inter–site traffic in Web2.

The benefit to placing an application within a Web2 site with a social networking component (compared to directly hosting it on the Web) is the ability to leverage the existing network of friends of the user, and grow in popularity by viral spread. The disadvantage is that applications are at the mercy of the host, which can change its API or acceptable use policy at any moment, and can block any applications at will. Applications also compete for the scarce resource of screen real estate: installed applications or other plugins are typically shown one after another on the user’s profile page.

Issues

Just as with adoption of Web2 sites as a whole, the spread of application usage within Web2 sites is open to study along with factors affecting the speed of infection and the duration of popularity. With the possibility of such viral spread comes the possibility of flash crowds which may knock over the application servers, while the hosting platform remains up and running. We have to model the ecosystem of sites and applications/para sites, and understand what populations evolve. A more fundamental question is whether such applications represent a quantum leap in the evolution of Web2, or merely a brief fad. Do these add–ins represent missing functionality from the host site, and what happens when the host site adds this functionality to its core set of functions?

4.3. Key underlying Web 2.0 technologies

Ajax stands for asynchronous Javascript and XML, and is one of the key visible building blocks in popular Web 2.0 technologies. Ajax is a mixture of several technologies that integrate Web page presentation, interactive data exchange between client and server, client side scripts, and asynchronous updates of server response. The Ajax intermediary sits on the client side, sending requests to a server and updating the page asynchronously. A key component of the open standards–based Ajax is the Application Programmer Interface called XMLHttpRequest (XHR) that scripting languages use to exchange data between a client and a Web server. The data is often in XML format but can be HTML, text, Javascript arrays, or even a few customized formats. Likewise, the scripting language does not have to be Javascript. XHR is not a protocol extension and was not introduced in any formal manner; rather, it was a feature of Microsoft’s ActiveX extended to other platforms.

The key purpose of Ajax is to let scripts act as HTTP (or HTTPS) clients and send/receive data from Web servers using a variety of common HTTP methods (GET, HEAD, POST, PUT, DELETE, and OPTIONS are supported currently). Thus, Ajax can be used for dynamic layout and reformatting of a Web page, reduce the amount of reloading needed by sending a request for just a small portion, and interact on demand with the server. The responses from the server are handled asynchronously by the browser without having to keep the user’s attention frozen. Numerous popular dynamic Web applications, such as maps, use XHR.

Flash objects can offer similar functionality, in that once downloaded they can communicate asynchronously with a server. Consequently, YouTube videos can begin playing before the whole movie has been received: the user downloads a compact Flash object which downloads a small prefix of the video and begins playing it out while asynchronously fetching the remainder of the video. Supporting Flash requires an appropriate Adobe plug–in to be installed, although user penetration of this plug–in is in the high nineties in percentage terms. Toolkits exist allowing Internet applications to be written in a high level language and then rendered either as Flash objects or pages with Ajax components, meaning that it may be helpful to sometimes think of Flash and Ajax merely as object code. Ajax apps are typically easier for the researcher to reverse–engineer and understand for measurement purposes than the Shockwave Flash (SWF) format. Currently, Flash is mostly used for rendering rich embedded objects (video, audio, games): few entire applications which store and recall data are implemented in Flash.

Issues

For both Ajax and Flash, it remains a challenge to develop tools and general techniques to be able to parse and analyze client/server communications. What are the implications of Ajax, asynchronous transfer, etc., for servers? What is the distinction between the “auto–refresh” feature in Web1, whereby a page is automatically reloaded, and Web2 sites that contact a server to update part of the page on a regular basis?

 

++++++++++

5. Measurement issues

We examine how data can be collected from and about Web2 sites. Much can be learnt from prior work in Web1 and reused.

5.1. Traffic measurements

In Web1 traffic measurement was based on precise, comparable metrics. The click count and page view defined quantities which could be measured through site traffic logs, and compared. More generally, Web1 measurements include popularity of sites, fraction of traffic on Internet, number of clients, servers, proxies (number of clients behind a proxy), and so on.

The shift in technologies that has accompanied the rise of Web2, in particular asynchronous transfers, has weakened the precision and comparability of these measures. A user can spend a significant amount of time interacting with a single page without ever triggering an explicit “page load” event (a ‘click’ in Web1 world): for example, consider a user scrolling and zooming in and out of an interactive map to plan a route. Thus there is a gradual shift to a less technologically driven metric in order to rate popularity (and hence set ad rates): measuring the amount of time a user spends on a site, instead of the number of discrete pages they view (Bausch, 2007) and moving from pay–per–click advertising to pay–per–action (Kniaz, 2007). This still leaves some uncertainty: just because a page is open in a tab of a browser, it does not mean it has the user’s attention. Indeed, due to the asynchronous push–based technology, users are encouraged to leave tabs open in the background, so that they can quickly scan the page later for updates (new messages, status updates, etc.).

The metric of hits on a Web site becomes problematic if a page sends out multiple XHR requests in Ajax for small updates to the page. A site can easily inflate hit counts based on the micro–requests and responses. Are the additional scripts or responses sent back as a result of an invoked client–side script viewed as part of the contents of the page and made available to search engines? From a traffic point of view, the number of HTTP requests is potentially larger, as users trigger dynamic requests by interacting with the application. However, in many cases the requests can be in the form of a Javascript call that is handled locally at the client end, avoiding a round trip to the server. If the requests are sent to the server, the responses are typically small and only a small portion of the page needs to be updated. If the user does not interact with certain parts of the downloaded page, additional data/scripts need not be downloaded, thus reducing overall response size.
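The gap between hits and page views can be made concrete with a small sketch. The log records and the `is_xhr` flag are hypothetical: how a server would recognize an asynchronous request is an assumption here (some Javascript libraries add an `X-Requested-With` header, but nothing guarantees it).

```python
def summarize_traffic(requests):
    """Count raw hits and explicit page loads in a list of request records."""
    hits = len(requests)
    page_views = sum(1 for r in requests if not r["is_xhr"])
    return {
        "hits": hits,
        "page_views": page_views,
        "hits_per_page_view": hits / page_views if page_views else None,
    }

# Hypothetical log: one article page load, then one map page load followed by
# five asynchronous tile requests triggered by scrolling and zooming.
log = [
    {"path": "/article/42", "is_xhr": False},
    {"path": "/map", "is_xhr": False},
] + [{"path": f"/map/tile/{i}", "is_xhr": True} for i in range(5)]

summary = summarize_traffic(log)
print(summary)
```

In this invented log the map accounts for six of the seven hits but only one of the two page views, so ranking pages by hits and ranking them by page views would give different pictures of the same traffic.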

Given some metric, in Web2 it is still easy for a site to measure its own audience: all the necessary information is available to the site. However, to measure audience from outside requires different strategies. Moreover, Web2 is meant to shift its users from being passive viewers of Web1 content to being active creators of Web2 content. This brings new questions of measurement of activity which did not arise in Web1: how to measure the various actions of users? We consider the following classes of actions:

  • “Clicks and connections”: simple activities which only require a single click to complete, such as rating a movie, voting in a poll or voting for a story (as in Digg), or adding a semantic link, such as adding a friend.
  • “Comments”: adding a short response, comment or tag to existing content, such as a news story, blog entry, photo, etc.
  • “Casual communication”: sending a message to another user, either via an e–mail–like system or via instant messaging. These are typically short, a sentence or two per communication.
  • “Communities”: interacting in larger groups or communities by joining a group or posting a message to a group.
  • “Content Creation”: uploading or entering some entirely new content, such as a webcam movie, digital photo, or blog posting.

For a given site, it is relevant to measure the frequency of these actions, and to measure what fraction of users participates in each one. Much of the hype surrounding Web2 focuses on content creation as being a key element. However, while many users have created and shared some content (Madden and Fox, 2006), on any given site new content is only created by a few users. Similarly, the most popular content may be dominated by a small clique (e.g., perceptions of bias in Digg front page [Carr, 2007]), or by professionally produced content (e.g., YouTube’s most viewed list is dominated by music videos [YouTube, 2007a]). Other activities are also key to the success or failure of a Web2 site, and so it is also important to understand the less active interactions (comments, clicks and links), and also the virtually passive (views). Finding the relative distributions of these activities (distributions of comments per posting, votes per poll, ratings per movie) is key to understanding the more complex interactions. Simply taking a ratio of uploads to views for YouTube (2007b) gives a 1:1500 ratio of active to passive. But this is overly simplistic, and does not capture the fact that a few videos get millions of views, while a large fraction (the long tail) receive only a handful. It also does not answer how many users of YouTube only ever view content.
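A short sketch shows why a single uploads–to–views ratio hides the long tail. The view counts below are invented for illustration and are not measurements of any site.

```python
from statistics import median

def view_summary(views):
    """Summarize per-video view counts (expects a non-empty list)."""
    ordered = sorted(views, reverse=True)
    total = sum(ordered)
    return {
        "views_per_upload": total / len(ordered),  # the single aggregate ratio
        "median_views": median(ordered),           # what a typical video receives
        "top_video_share": ordered[0] / total,     # concentration in the head
    }

# Hypothetical view counts for eight videos: one hit and a long tail.
views = [1_000_000, 500, 40, 12, 5, 3, 2, 1]
stats = view_summary(views)
print(stats)
```

Here the aggregate ratio is above 100,000 views per upload, yet the median video has fewer than ten views, so the ratio describes almost none of the individual videos.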

Issues

In Web1 these questions do not arise, since the typical model is that all visitors are read–only, and all content and comment is provided by the site owners. In Web2 there is a co–mingling of commentators and creators, and every visitor has the opportunity to click, comment, create, etc. Extracting data on these actions is challenging, but possible, since the quantification and presentation of these actions is an expected part of the information delivered to the users: number of comments, number of views, number of ratings are all often presented publicly as metadata for each piece of content. Scraping these values currently requires significant effort, and is addressed in the next section. The following specific challenges present themselves:

  • What is the correct metric for measuring Web2 traffic? How can this be accurately measured from outside a site?
  • How to measure and calibrate the different levels of user interaction (clicks, comments, content creation) across different sites?

5.2. Crawling and scraping

Underlying many of the issues discussed above is the need to be able to crawl a large Web2 site and the induced social network, and from the retrieved information extract rich metadata from the pages. As Web2 sites contain a variety of information, and each is structured in a different way, this typically requires building a crawler which is capable of parsing a page into different semantic elements (navigation links, friend links, group links, other links) in order to extract the social network and associated data. Some sites, such as Flickr and YouTube, offer APIs to extract comments, friends, views, and tags, which simplifies the extraction of data and links, but still requires appropriate custom code in order to allow large scale extraction of data. Recent presentation technologies (Javascript and XML) further complicate the crawling task, since in addition to parsing returned pages, crawlers also have to simulate user clicks in order to extract some additional data. Exploration by crawling can be seriously hindered if the site presents each logged in user only information about their reciprocated friends, since most crawlers have few friends. Finding “backlinks” to nodes is also a challenge, especially if the inward–pointing nodes have few or no incoming links themselves (Mislove, et al., 2007). For similar reasons, it is much harder to passively sniff such traffic and extract the pattern of user behavior, marking a move away from stateless interactions.
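The parsing step described above, separating a page's links into semantic types, might look roughly like the following. The URL patterns (`/user/`, `/group/`) and the host name `social.example` are assumptions for illustration; every real site structures its links differently, which is why such code currently has to be customized per site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkClassifier(HTMLParser):
    """Sort the links on one page into friend, group, navigation and external."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.links = {"friend": [], "group": [], "navigation": [], "external": []}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        if parsed.netloc and parsed.netloc != self.site_host:
            kind = "external"
        elif parsed.path.startswith("/user/"):    # assumed profile URL pattern
            kind = "friend"
        elif parsed.path.startswith("/group/"):   # assumed group URL pattern
            kind = "group"
        else:
            kind = "navigation"
        self.links[kind].append(href)

# Hypothetical profile page fragment.
page = (
    '<a href="/user/bob">Bob</a> '
    '<a href="/group/hiking">Hiking</a> '
    '<a href="/help">Help</a> '
    '<a href="http://example.org/post">A post</a>'
)
classifier = LinkClassifier("social.example")
classifier.feed(page)
print(classifier.links)
```

This handles only links present in the returned HTML; links that appear after a script runs or a user clicks would need a crawler that simulates those interactions.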

Thus far, studies have looked at individual or small collections of sites in isolation, such as YouTube, blog sites (LiveJournal, blogger), MySpace etc. (Backstrom, et al., 2006; Gu, et al., 2006; Kumar, et al., 2004; Kumar, et al., 2005; Kumar, et al., 2006; Lento, et al., 2006). These give detailed views of certain popular sites, or some comparison between a couple of similar sites. Other studies have looked at wider settings, but only studying Web1 properties such as server properties and numbers of links, and not the additional Web2 semantics of friends, etc., e.g., in studying collections of blogs (Cohen and Krishnamurthy, 2005). Detailed, large scale comparison of sites currently requires significant effort in data collection, and so has not been realized. A more common approach is to look at the traffic of specific Web2 sites at the edge of a campus network and obtain a measure of the popularity of resources (Zink, et al., 2008).

YouTube in particular has attracted significant recent study. These studies have tried to analyze several aspects, such as the number of views and rankings, and local (geographic) popularity of videos over time (Cha, et al., 2007; Zink, et al., 2008); object sizes and access patterns (Gill, et al., 2007); and properties of the embedded social network such as degree and cluster coefficient (Mislove, et al., 2007). Simply crawling a single large Web2 site such as YouTube brings out differences (Gill, et al., 2007): Web2 sites are a moving target with a significantly higher rate of change than popular Web1 sites. The volume of data to be fetched can be significantly higher for a single Web2 site than for the most popular Web1 site or even a collection of popular Web1 sites: CNN will only add at most a few hours of its video output per day, whereas YouTube has claimed over 65,000 new videos per day [3].

Issues

Professional–level crawling bandwidth would be needed to fetch even small portions of YouTube–like sites. In some cases this cost can be reduced by only “indexing” the site and its content, i.e., by only collecting statistics on large video and graphical objects, rather than fetching them. Indexing necessarily limits further analysis that can be done (e.g., analyzing bit–rate choices, other encoding features [Gill, et al., 2007]). Finally, one expects that Web2 site owners such as YouTube should be protective of their bandwidth and the content stored on their sites, and so would prevent excessive crawling.

  • A challenge to the community is to start developing general purpose tools for crawling and parsing Web2 sites, which can be quickly customized for a particular site. Initial attempts to create such tools may have the side benefit of exposing commonalities across specific Web2 sites, and highlighting generic technical and data presentation issues.
  • What techniques can be designed to probe “closed” sites such as Facebook, which only reveal information on friends of the user? One approach is to design plug–in applications via the API which collect (anonymized) data about users who use the application.

 

++++++++++

6. Technical and external issues

We now explore issues related to understanding and measuring Web2 sites as part of the Internet ecosystem. This is based on externalities imposed by users’ presence and interaction with a site.

6.1. Performance and latency

This is probably the best studied aspect of Web1 and largely irrelevant to Web2. Load is significantly easier to provision for in Web2, and most popular Web2 sites are significantly over–provisioned. Except for rare cases like cross–site scripting worm attacks, they do not experience significant daily latency fluctuation. Part of the reason is that the number of individual end users is largely fixed during small intervals of time, and users can only redirect their interest to different parts of the site. They can move from chatting to sending e–mail to uploading audio/video; none of the individual actions can cause significant ripples on the overall network. However, external events such as a large number of users simultaneously applying operating system patches can have an impact. Many Web 2.0 sites impose a variety of restrictions on users and enforce them to prevent viral spreading of communication and data.

In Web2 there is considerable data about expected load based on the number of subscribed users. For example, Facebook’s 55 million users provide a reasonable bound on the expected load. In Web1 the potential for flash crowds may be slightly higher. The key difference is that most social networks require a registration phase which effectively provides them with an upper bound on the number of users at any given time (and a control mechanism to slow down admission in the event of a mass influx of new signups). Sites that do not mandate logging in, such as MySpace and YouTube, can experience sudden increases in load but use CDNs to relieve surges.

Flash crowds, where a large number of users simultaneously and unexpectedly try to access a Web site, have been documented since the mid–nineties. Popular Webcasts like that of Victoria’s Secret, special events like the Olympics, and popular sites such as Slashdot highlighting a site can all cause flash crowds, directing a large amount of new traffic to a site. A natural question is what the equivalents are in Web2, and how they differ from a Web1 flash crowd. If they occur, what are their durations and distributions? Is there a single peak, or multiple waves?

A first observation is that a flash crowd in a large well–provisioned Web 2.0 system has no observable impact on the system performance. For example, the sudden popularity of a particular video clip on YouTube, or a band on MySpace will draw a lot of traffic to a specific page, but without significantly increasing the aggregate overall traffic to the site. Flash crowds consist of a large number of individual visitors. One can imagine the equivalents for different actions in Web2: a piece of content suddenly attracting a large number of ratings or comments, or a group gaining many new members. Again, since sites offering such facilities are typically large and overprovisioned, there is no technical reason why the impact on service should be visible. There may be reasons to try to ‘fake’ a large number of comments or ratings: to attract interest, or to drive traffic (directly, or via ‘comment–spam’ intended to raise search engine ranking of the target). This issue frequently arises in discussions of Digg, a user–voted “cool links” list, which can be considered a Web2 counterpart of Slashdot. Manipulation of voting is a concern, but has not been disruptive enough to lead to mass desertion of users. Successful manipulation appears to require exploits such as the MySpace worm that managed to generate a very large number of friends in a few minutes, or the involvement of the site owner (for example, MySpace automatically adds founder Tom Anderson as a “friend” to all new accounts; Digg’s creators initially removed all stories including an HD–DVD processing key).

Issues

The questions of interest here include: how can new Web2 sites scale with increasing popularity? Are there generic services that will help them to scale quickly while not increasing latency to all users? How can one independently measure the popularity of applications within Web2 sites, e.g., Facebook or MySpace applications such as “Where I’ve been”? Although the host Web2 sites typically scale well with load, external embedded sites and application servers may not cope so well, especially when their popularity spreads “virally” through the user community. Load can certainly be high, and skewed: 87 percent of the usage of the thousands of Facebook applications goes to just 84 applications (O’Reilly, 2007). Measuring and predicting these trends is a new challenge for Web2.
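The skew described above can be quantified as the share of total usage captured by the most–used applications. The following is a minimal Python sketch; the function name and the usage counts are hypothetical and are not taken from the O’Reilly (2007) data.

```python
def top_k_share(usage_counts, k):
    """Return the fraction of total usage captured by the k most-used applications."""
    total = sum(usage_counts)
    if total == 0:
        return 0.0
    top = sorted(usage_counts, reverse=True)[:k]
    return sum(top) / total


# Hypothetical usage counts for six applications (illustrative only, not real data).
usage = [500, 300, 100, 50, 30, 20]
share_of_top_two = top_k_share(usage, 2)  # 800 of 1000 uses, i.e. 0.8
```

Applied to per–application usage logs, the same calculation would show how concentrated the load on external application servers is.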

6.2. Configuration, distribution, and location

Software used for Web2 is quite different as the set of requirements and expectations are different. YouTube does not have the same load mix as CNN. Redirection rates may be somewhat similar but only with highly popular Web1 sites. Redirections are likely to be at the HTTP–level in Web2 rather than at a lower level in the protocol stack. If a particular Web2 site has wide client distribution and large load potential then a CDN might be used (e.g., the Joost video delivery service and YouTube’s reliance on LimeLight). However, the decision to update content and possibly redistribute it is still carried out centrally.

It is easy to locate Web2 sites as they are fewer in number, more concentrated, with limited need for replication. For example, location–indicative subdomains such as foo.myspace.com and bar.facebook.com exist (where foo is one of a number of countries, and bar one of many universities). Here, the intended location of the page is advertised, even if this does not correspond to the actual physical location. Given the “weight” of a Web2 site (much heavier than even popular Web1 sites) the bytes are expected to be closer to the users of the site and thus geographically constrained at country level. The differences between Web1 and Web2 are expected to be pronounced here. Tools used to narrow down locations in Web1 can be reused with a higher hit rate. Clients of a Web2 site are likely to be less widely spread out than those of a highly popular Web1 site, due to the emphasis on social aspects and linking to ‘friends.’

Issues

Web2 compliance is entirely unstudied, as are client connectivity issues. Given the significant differences in user demographics of Web2 sites (mostly younger with internal differences between MySpace and Facebook of a few years), there is an expectation of better connectivity for the more active younger users. Increasingly, however, mobile connections to Web2 sites bring up connectivity issues and ways to send data down a thin pipe similar to earlier work in Web1 (e.g., WebExpress, delta mechanisms [Mogul, et al., 1997], etc.). In Web1, alternate sites were created and content was tailored for devices that had bandwidth limitations. In the case of Web2, given the dynamism with which the site changes, sending content to mobile devices with constraints can be significantly harder. However, short updates can be easily handled, e.g., as text messages.

6.3. User workload models

Reference patterns, object size distributions and so on are just starting to be studied (Gill, et al., 2007; Mislove, et al., 2007). Initial studies indicate that size distributions approximately follow the familiar heavy–tailed size distributions, even when limitations are enforced (e.g., most YouTube users are restricted to uploading 10 minutes worth of video). The manner in which data is gathered to carry out such studies shows an important difference: crawling across Web1 is much easier as the load imposed on a particular Web site is rather low and back–off strategies can be used; however, crawling a heavy Web2 site will impose a significant load on that site and may have to be staggered over a much longer period. Early indications are that there is a spread of content types; the aggregate contribution of individual content types to a Web2 site, and the potential impact on caching, remain to be studied. Studies (Gill, et al., 2007) have found, for example, that video clips were longer on YouTube than on the overall Web, and at higher bit rates.
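A back–off strategy of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name, the doubling factor, and the delay values are assumptions, not the procedure used in any of the cited studies.

```python
def backoff_delays(base_delay, factor, max_delay, attempts):
    """Return the wait (in seconds) before each retry, growing geometrically up to a cap."""
    delays = []
    delay = base_delay
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


# Hypothetical policy: start at 1 second, double each time, never wait more than 10 seconds.
schedule = backoff_delays(base_delay=1, factor=2, max_delay=10, attempts=5)
# schedule is [1, 2, 4, 8, 10]
```

A crawler of a heavy Web2 site would pair such a schedule with a low steady request rate, which is why the crawl may stretch over a much longer period.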

Recent work (Gill, et al., 2007; Zink, et al., 2008) examining popularity of YouTube in campus environments showed that local and global popularity of video clips are significantly different. Thus, proxy caching can be beneficial but modeling workload based on local conditions may be problematic. Different access patterns in Web2 vs. Web1 affect the efficacy of cache deployment: skewed distributions with long tails of object popularity mean fewer cache hits, which can make caching not worthwhile. In the case of static video streams, it might make sense for ISPs to deploy caches if a significant number of videos are popular locally, as Zink, et al. (2008) indicated. However, if videos being streamed are modified with advertisements and constantly changing annotations, then caching becomes harder.
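The effect of a fatter popularity tail on cache hits can be explored with a small simulation. The Python sketch below assumes a Zipf–like popularity model and an LRU replacement policy; both are modeling assumptions made here for illustration, as are the object counts, exponents, and cache size. None of these values come from the cited measurement studies.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_requests(num_objects, alpha, num_requests, seed=0):
    """Draw object ids where the object of rank r has weight 1 / r**alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    cumulative = list(accumulate(weights))
    return rng.choices(range(num_objects), cum_weights=cumulative, k=num_requests)


def lru_hit_ratio(requests, cache_size):
    """Replay a request sequence through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)       # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / len(requests) if requests else 0.0


# Hypothetical workloads: a smaller alpha gives a fatter tail of rarely requested objects.
concentrated = lru_hit_ratio(zipf_requests(10000, 1.2, 20000), cache_size=100)
fat_tailed = lru_hit_ratio(zipf_requests(10000, 0.6, 20000), cache_size=100)
```

Under these assumptions the fat–tailed workload yields the lower hit ratio for the same cache size, which is the effect described above. Real traces would be needed to decide whether a deployment is worthwhile.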

Although not solely a Web2 application, Instant Messaging (IM) shares characteristics with developing Web2 applications, which often offer IM capabilities within the browser (e.g., Meebo). A session in Web1 could be largely determined by examining connections between a client and a server site whereas in IM, there is a significant variance. This stems from different IM clients which impose different quiescent periods before a timeout. Protocol and ‘keep alive’ messages exceed chat traffic in IM (Xiao, et al., 2007).
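The dependence of session counts on the quiescent period can be seen in a minimal Python sketch. The function name, the event times, and the two timeout values are hypothetical, chosen only to show how the same traffic yields different sessions under different client timeouts.

```python
def split_sessions(timestamps, timeout):
    """Group event times (in seconds) into sessions separated by gaps longer than timeout."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # gap is within the quiescent period: same session
        else:
            sessions.append([t])     # gap exceeds the timeout: start a new session
    return sessions


# Hypothetical message times for one user, in seconds.
events = [0, 10, 50, 400, 420, 1000]
short_timeout = split_sessions(events, timeout=60)   # three sessions
long_timeout = split_sessions(events, timeout=600)   # one session
```

The same six events form three sessions with a 60–second timeout and a single session with a 600–second timeout, so session statistics are only comparable when the timeout is reported.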

Issues

The questions in the context of workload models include: is in–network caching of Web2 objects worth the effort and cost? Answering this requires modeling growth in object size, object access distributions, and disk vs. bandwidth cost. It also requires predicting trends in access distributions: is the long tail of access getting fatter? How should one deal with newer applications, such as Twitter, that have frequently updated micro–content, as compared to YouTube, where macro–content is much larger but almost never updated? How is the performance of a Web site measured in the presence of Ajax? Is there a contribution to visible latency, given that the updating is done asynchronously?
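One way to frame the disk vs. bandwidth trade–off is as a simple monthly balance: bandwidth cost avoided by cache hits minus the cost of the cache storage. The Python sketch below is a deliberately simplified model; the function name, the cost structure, and every number in the example are assumptions for illustration, not measured values.

```python
def cache_net_saving(requests_per_month, mean_object_bytes, hit_ratio,
                     bandwidth_cost_per_gb, cache_size_gb, disk_cost_per_gb_month):
    """Return monthly bandwidth cost avoided by cache hits minus monthly disk cost."""
    saved_gb = requests_per_month * mean_object_bytes * hit_ratio / 1e9
    return saved_gb * bandwidth_cost_per_gb - cache_size_gb * disk_cost_per_gb_month


# Hypothetical inputs: one million requests for 10 MB objects, 20 percent hit ratio,
# bandwidth at 0.05 per GB, and a 1,000 GB cache at 0.02 per GB per month.
net = cache_net_saving(1_000_000, 10_000_000, 0.2, 0.05, 1000, 0.02)
# About 2,000 GB saved, worth roughly 100, against 20 of disk cost: net of about 80.
```

A positive result suggests the cache pays for itself under the stated assumptions. The hit ratio is the critical input, and it depends on the access distribution discussed above.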

 

++++++++++

7. Summary of metrics of interest

Comparing the various metrics computed over the recent years in Web1 and what might be of interest in Web2, we see some obvious overlap but there are also several new metrics of interest. We follow the set of issues raised in Chapter 7 of Crovella and Krishnamurthy (2007) to identify these metrics.

Web 1.0 metrics of similar relevance in Web2 include the overall share of Internet traffic, number of users and servers, and share of various protocols. Around half a billion users are present in a few tens of social networks, with the top few responsible for most of the users and thus traffic. These sites work hard to keep traffic within their own network via their own versions of e–mail, instant messaging, etc. Traffic inside a Web2 site is harder to measure without help from the site itself. For example, a user writing on a board (‘wall’) of a friend may result in notifications generated to other friends who have expressed interest. The notifications may only result in actual traffic when other friends log in, view the message and possibly respond. With a large fraction of users returning to the site more than once a day, there is bound to be considerable internal communication. However, such messages are short and human–generated and are likely to be fairly bounded in overall bytes.

Growth patterns have been similar to some popular Web sites. Since there are registration requirements and a fairly quick drying up of close friends for each user, there is some tailing off effect. Almost all the popular Web2 sites are accessed over the Web implying HTTP and thus TCP connections. Facebook is the seventh most visited Web site (currently just behind Google). Traffic generated by some popular applications (such as Twitter) is mostly UDP, and the people requesting such notifications are pre–registered. If there is a steady increase of external feeds into the growing volume of users there could be an explosion in the number of connection setups. This will lead to pressure to streamline feeds to reduce the overhead much like persistent connections and pipelining introduced in HTTP/1.1. Given the control that individual Web2 sites have over their interface, streamlining can be much more rapid than the years taken for HTTP/1.1 adoption.

As discussed earlier in Section 4.2, the set of external applications (almost 6,000 in Facebook alone) and widgets introduce a very different kind of challenge; one that is unique to Web2. Facebook has claimed that at least one application has been installed and used by virtually every user. Applications allow new interactions between friends, and trigger internal notifications after actions are taken (such as a move in a turn–based game). The overall traffic as a result of a Web2 site is thus the product of the set of interactive applications and the set of participating friends.

Web1 went through considerable effort to streamline popular sites for mobile users; in Web2 the challenges are slightly different. The fact that most communications are short and episodic allows for instant notification to users via mobile devices. Context is generally deprecated and any potential accompanying rich media can be deposited at the site for later perusal. Real–time requirements in Web1 only mattered for certain classes of Web sites (stock tickers and game score updates). In Web2, there is a class of communications (IM) that is time–immediate, but writing on a user’s board does not carry the same urgency. In fact, there is an explicit attempt to allow both kinds of communication in support applications: Twitter, for example, allows followers to get the notification (‘tweet’) on their board or on their cellphone.

In Web1 the communication between a client and a site is fairly limited and highly restrictive: a request is sent and the site responds. If there are too many requests some requests can be dropped and others can be delayed. A site can choose between classes of users. In Web2 since most communication is between users, the site has no easy way to select during overload. However, the sites can (and do) impose varying limitations to ensure that overall load and thus latency is maintained at a reasonable level. With increasing users any lack of scalability will result in additional restrictions. Decisions made by the Web2 site affect all users uniformly as there is no incentive in prioritizing classes of users.

 

++++++++++

8. Beyond Web 2.0

We conclude by going beyond Web 2.0, to draw connections to other application types, such as P2P and Skype, and analyze the impact of wider adoption of Web 2.0 paradigms.

Implicit social applications

Skype, which offers voice calling over the Internet, now has over 80 million users globally, and is constantly adding new features (conference calls, voice mail). There is no reason not to think of it as a social network that allows people to exchange voice bits and text and form a community of interest (call list). As such, it can be modeled and viewed through the same lens as the Web2 sites discussed in this paper, and many of the same questions apply (some initial measurements are in [Chen, et al., 2006]). It differs from other examples we have identified in the main content (voice conversations) and hence the volume of bits involved.

Peer–to–peer and Web 2.0

A P2P peer who supplies content of interest need not be a friend in the social networking sense. Friends in real life may share interests in similar content (books, music, etc.) but often they share pointers in the form of recommendations. In the P2P sense, friends in real life act as .torrent files. There may well be interest in consuming the bits simultaneously and interacting as people do now over the phone while watching a sports event.

The Web 2.0 electronic fence

Web 2.0 may lead to a kind of balkanization — people in one social network may not communicate frequently with some of their friends who spend more time on other social networks. Artificial separation into tribes is encouraged by some of the Web2 sites who want to maximize and retain the set of members inside their “electronic fence.” However, there is a counter–current due to the prevalent link–based nature of Web users constantly linking to sites outside the fence. This activity is sufficient to prevent complete balkanization.

Web2.0ification and sharing friends

More sites are inviting users to “add friends,” but there are only so many times that a user wants to find out which of their friends are on the same site. If this is not necessary to use the site, then users can ignore this, or use a ‘bugmenot’ equivalent. But for some sites (such as Facebook), all value comes from connecting to friends. Sites currently offer the highly dubious (in terms of both security and accuracy) technique of users sharing their e–mail address books in order to find contacts via e–mail address matching. One proposal is to allow users to record their “social graph” (encoded in XML formats such as FOAF) once, and allow different sites to access this information, essentially linking up all the currently isolated graphs (the MySpace graph, the Facebook graph, the Flickr graph). More insidiously, third party sites can tap into a user’s social connection via open APIs and cookie–sharing agreements with a Web 2.0 site acting as an identity manager, akin to a widened notion of the Microsoft Passport.
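Linking up isolated graphs amounts to taking the union of their friendship edges, provided that users carry the same identifier on every site. The Python sketch below makes that assumption explicitly; the function name, the identifiers, and the sample graphs are hypothetical, and no FOAF parsing is shown.

```python
def merge_social_graphs(*graphs):
    """Union several friendship graphs, each a dict mapping a user id to a set of friend ids.

    Assumes the same person has the same id in every graph and that friendship is symmetric.
    """
    merged = {}
    for graph in graphs:
        for person, friends in graph.items():
            merged.setdefault(person, set()).update(friends)
            for friend in friends:
                merged.setdefault(friend, set()).add(person)
    return merged


# Hypothetical per-site graphs keyed by a shared identifier.
myspace_graph = {"alice": {"bob"}}
flickr_graph = {"alice": {"carol"}, "bob": {"carol"}}
combined = merge_social_graphs(myspace_graph, flickr_graph)
```

In practice the hard part is the identifier matching itself, which is exactly where the e–mail address book technique raises security and accuracy concerns.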

Privacy and security

We reiterate that there are significant challenges in allowing users to understand privacy implications and to easily express usage policies for their personal data. Privacy is typically not well understood by Web2 users, resulting in unintended consequences. Many teenagers accept that their posted data may unintentionally identify them (Lenhart and Madden, 2007). Simultaneously, dynamic presentation technologies also raise security concerns (Stamos and Lackey, 2006). Both privacy and security in Web2 demand explicit study and analysis.

 

About the authors

Graham Cormode is a Principal Member of Technical Staff at AT&T Shannon Laboratories in New Jersey. Previously, he was a researcher at Bell Labs, after postdoctoral study at the DIMACS center in Rutgers University from 2002–2004. His PhD was granted by the University of Warwick in 2002. He works on social network analysis, large–scale data mining, and applied algorithms, with applications to databases, networks, and fundamentals of communications and computation.
Web:
http://dimacs.rutgers.edu/~graham/
E–mail: graham [at] research [dot] att [dot] com

Balachander Krishnamurthy of AT&T Labs — Research has focused his research of late in the areas of online social networks, unwanted traffic, and Internet measurements. He has authored and edited ten books, published over 70 papers, and holds twenty patents. He has collaborated with over 75 researchers worldwide. His most recent book — Internet measurements: Infrastructure, traffic and applications (525 pp, John Wiley, with M. Crovella) — was published in July 2006. His earlier book — Web protocols and practice (672 pp, Addison–Wesley, with J. Rexford) — has been translated into Portuguese, Japanese, Russian, and Chinese. Bala is homepage–less but many of his papers can be found at http://www.research.att.com/~bala/papers.
E–mail: bala [at] research [dot] att [dot] com

 

Acknowledgements

We thank Dave Kormann, Hal Purdy, Walter Willinger, and Craig Wills for their helpful comments.

 

Notes

1. Our discussion is based on the structure of sites at the time of writing, Fall 2007, unless otherwise specified.

2. There is even an (utterly boring) SIGCOMM Facebook group.

3. This statistic originally appeared on YouTube’s (2007b) fact sheet page, but has since been removed. This highlights the need for independent methods of measuring activity.

 

References

L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, 2006. “Group formation in large social networks: Membership, growth, and evolution,” Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia), pp. 44–54; version at http://www.cs.cornell.edu/~lars/kdd06-comm.pdf, accessed 30 May 2008.

S. Bausch, 2007. “Nielsen/netratings adds ‘total minutes’ metric” (July), at http://www.nielsen-netratings.com/pr/pr_070710.pdf, accessed 30 May 2008.

S. Bhagat, G. Cormode, S. Muthukrishnan, I. Rozenbaum, and H. Xue, 2007. “No blog is an island: Analyzing connections across information networks,” International Conference on Weblogs and Social Media (26–28 March, Boulder, Colo.), at http://www.icwsm.org/papers/paper20.html, accessed 30 May 2008.

A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, 2000. “Graph structure in the Web,” Computer Networks, volume 33, numbers 1–6, pp. 309–320; version at http://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/broder.pdf, accessed 30 May 2008.

B. Carr, 2007. “Is digg broken beyond repair?” (April), at http://www.thenewbusinessblog.com/miscellaneous/is-digg-broken-beyond-repair/, accessed 30 May 2008.

M. Cha, H. Kwak, P. Rodriguez, Y.–Y. Ahn, and S. Moon, 2007. “I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system,” Proceedings of the Seventh ACM SIGCOMM Conference on Internet Measurement (San Diego), pp. 1–14; version at http://www.imconf.net/imc-2007/papers/imc131.pdf, accessed 30 May 2008.

K.–T. Chen, C.–Y. Huang, P. Huang, and C.–L. Lei, 2006. “Quantifying Skype user satisfaction,” ACM SIGCOMM Computer Communication Review, volume 36, number 4, pp. 399–410; version at http://www.sigcomm.org/sigcomm2006/discussion/getpaper.php?paper_id=36, accessed 30 May 2008.

E. Cohen and B. Krishnamurthy, 2005. “A short walk in the Blogistan,” Computer Networks, volume 50, number 5, pp. 615–630; version at http://www.research.att.com/~bala/papers/chablis.pdf, accessed 30 May 2008.

M. Crovella and B. Krishnamurthy, 2007. Internet measurement: Infrastructure, traffic, and applications. Hoboken, N.J.: Wiley.

F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul, 1997. “Rate of change and other metrics: A live study of the World Wide Web,” USENIX Symposium on Internet Technologies and Systems, at http://www.usenix.org/publications/library/proceedings/usits97/douglis_rate.html, accessed 30 May 2008.

P. Gill, M. Arlitt, Z. Li, and A. Mahanti, 2007. “YouTube traffic characterization: A view from the edge,” Proceedings of the Seventh ACM SIGCOMM Conference on Internet Measurement (San Diego), pp. 15–28; version at http://www.imconf.net/imc-2007/papers/imc78.pdf, accessed 30 May 2008.

L. Gu, P. Johns, T.M. Lento, and M.A. Smith, 2006. “How do blog gardens grow? Language community correlates with network diffusion and adoption of blogging systems,” AAAI Symposium on Computational Approaches to Analyzing Weblogs; abstract at http://www.aaai.org/Library/Symposia/Spring/ss06-03.php, accessed 30 May 2008.

R. Kniaz, 2007. “Pay–per–action beta test” (March), at http://adwords.blogspot.com/2007/03/pay-per-action-beta-test.html, accessed 30 May 2008.

R. Kumar, J. Novak, and A. Tomkins, 2006. “Structure and evolution of online social networks,” Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia), pp. 611–617.

R. Kumar, J. Novak, P. Raghavan, and A. Tomkins, 2005. “On the bursty evolution of blogspace,” Proceedings of the Twelfth International Conference on World Wide Web (Budapest), pp. 568–576.

R. Kumar, J. Novak, P. Raghavan, and A. Tomkins, 2004. “Structure and evolution of blogspace,” Communications of the ACM, volume 47, issue 12, pp. 35–39.

A. Lenhart and M. Madden, 2007. “Teens, privacy & online social networks,” Pew Internet and American Life Project (18 April), at http://www.pewinternet.org/pdfs/PIP_Teens_Privacy_SNS_Report_Final.pdf, accessed 30 May 2008.

T. Lento, H.T. Welser, and L. Gu, 2006. “The ties that blog: Examining the relationship between social ties and continued participation in the wallop weblogging system,” Third Annual Workshop on the Weblogging Ecosystem.

M. Madden and S. Fox, 2006. “Riding the waves of ‘Web 2.0’,” Pew Internet and American Life Project (5 October), at http://www.pewinternet.org/pdfs/PIP_Web_2.0.pdf, accessed 30 May 2008.

S. Milgram, 1967. “The small–world problem,” Psychology Today, volume 1, number 1, pp. 61–67.

A. Mislove, M. Marcon, K.P. Gummadi, P. Druschel, and B. Bhattacharjee, 2007. “Measurement and analysis of online social networks,” Proceedings of the Seventh ACM SIGCOMM Conference on Internet Measurement (San Diego), pp. 29–42; version at http://www.imconf.net/imc-2007/papers/imc170.pdf, accessed 30 May 2008.

J.C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy, 1997. “Potential benefits of delta encoding and data compression for HTTP,” Proceedings of the ACM SIGCOMM ’97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (Cannes), pp. 181–194.

T. O’Reilly, 2007. “Good news, bad news about Facebook application market: Long tail rules” (5 October), at http://radar.oreilly.com/archives/2007/10/facebook_long_tail_report.html, accessed 30 May 2008.

T. O’Reilly, 2005. “What is Web 2.0” (30 September), at http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html, accessed 30 May 2008.

C. Salter, 2007. “Girl power” (September), at http://www.fastcompany.com/magazine/118/girl-power.html, accessed 30 May 2008.

M.A. Smith, 1999. “Invisible crowds in cyberspace: Mapping the social structure of the Usenet,” In: M.A. Smith and P. Kollock (editors). Communities in cyberspace. New York: Routledge.

A. Stamos and Z. Lackey, 2006. “Attacking AJAX Web applications” (3 August), at http://www.isecpartners.com/files/iSEC-Attacking_AJAX_Applications.BH2006.pdf, accessed 30 May 2008.

Z. Xiao, L. Guo, and J. Tracey, 2007. “Understanding instant messaging traffic characteristics,” Proceedings of the 27th International Conference on Distributed Computing Systems (Toronto), p. 51.

YouTube, 2007a. “YouTube: Most viewed videos,” at http://www.youtube.com/browse?s=mp&t=a&c=0&l=, accessed 30 May 2008.

YouTube, 2007b. “YouTube fact sheet,” at http://web.archive.org/web/20070221115744/http://youtube.com/t/fact_sheet, accessed 30 May 2008.

M. Zink, K. Suh, Y. Gu, and J. Kurose, 2008. “Watch global, cache local: YouTube network traffic at a campus network — Measurements and implications,” at http://gaia.cs.umass.edu/networks/papers/MMCN08-0.2.pdf, accessed 30 May 2008.