Aug 22 2009

Free: a new rule in the digital economy

Published by Jiwei under Economics

I’ve always been fascinated by the new economics of the digital economy. The most amazing thing is that the old economic principles have to be rewritten entirely. Cases in point are the substantial startup capital, the amount of physical materials, and the human labor that the old economy required.

And new principles are being invented/discovered.  There are quite a few of them.

After unearthing the “Long Tail” principle, Chris Anderson has just summarized another one: “Free: The Future of a Radical Price”. To prove the power of Free, he stated:

FREE will be available in all digital forms–ebook, web book, and audiobook–for free shortly after the hardcover is published on July 7th (exact dates will be announced in the posts at left as each form is released).

And he’s serious! The audio version is now free to download: http://www.wired.com/techbiz/it/magazine/17-07/mf_freer

Since I spend only a marginal share of my time with paper books, I just cancelled my order from Amazon. I guess Mr. Anderson had accounted for me in the small percentage. I do have his Long Tail book on my bookshelf and will almost certainly buy his future books.

You can also read his earlier article to understand how the work evolved here:

http://www.wired.com/print/techbiz/it/magazine/16-03/ff_free


Jul 18 2009

Is the Mobile Future in Apps or Browsers?

Published by Jiwei under Internet, Mobile, Mobile device, Wireless

In a post entitled “App stores are not the future, says Google”, Chris Nuttall reported:

App stores do not represent the future of the mobile industry according to Google’s vice president of engineering Vic Gundotra, who maintains consumers will instead turn to web browsers to fill their information and entertainment needs. Speaking Thursday at the Mobilebeat conference in San Francisco, Gundotra said no one, including Google, is rich enough to support all of the myriad mobile platforms in existence, a circumstance that mandates a shift in thinking away from the fragmented app store model.

“What we clearly see happening is a move to incredibly powerful browsers,” Gundotra said. “Many, many applications can be delivered through the browser and what that does for our costs is stunning. We believe the web has won and over the next several years, the browser, for economic reasons almost, will become the platform that matters and certainly that’s where Google is investing.” Gundotra added that Apple CEO Steve Jobs proclaimed “Build for the web” with the initial launch of the iPhone, a statement that met with resistance from developers: “I think Steve really did understand that, over the long term, it would be the web, and I think that’s how things will play out.”

There’s certainly merit in Gundotra’s arguments. However, we have to remember that mobile bandwidth is still obscenely expensive, especially when you’re roaming. Also, users have relatively little patience when they are on the move and in need of information, and latency on mobile networks is intrinsically long, regardless of the bandwidth.

Imagine a dictionary offered both as a website and as an app stored locally on the phone. The iPhone WordWeb app is far better than even a beautiful dictionary server out on the web, because a lookup needs no network round trip.

So, I’d draw a different conclusion. Unless mobile bandwidth becomes as cheap as ADSL and roaming charges are banished, there’s a very high probability that apps will beat the browsers. That pre-condition is very unlikely to change in the next 10 years.

We shall see in 5 years.


Dec 16 2008

Speedy ride on a VPS with Lighttpd

Published by Jiwei under Internet, Software

Apache is not adequate in a restricted environment

As I said earlier, on a HostIcan VPS, my applications were running unbearably slowly and erratically. I was determined to get it right even with the hard RAM limitation. BTW, I found that VPSLink offers the most affordable package and reliable servers; use the referral code CSR42P to get a 10% lifetime discount.

Most Wordpress blogs would get an F rating on Yahoo’s YSlow, a Firefox plug-in, for making too many HTTP requests (averaging 20~30 images, CSS files and Javascript files). Given the loosely collaborative nature of open-source development, the number of HTTP requests will only increase with more plug-ins and prettier themes. The Javascript-driven Web 2.0 demands even more HTTP requests.

Apache’s default setup performs particularly badly in such a situation. If your applications are heavily PHP based, running a thread-based MPM (Multi-Processing Module, such as the Worker MPM) is not recommended, because not all PHP modules are thread-safe. You’re left with the multi-process prefork MPM. Each prefork process can grow to 250MB in size while using 15~70MB of RAM. The PHP opcode cache occupies some 64MB. 15~20 Apache processes will take up over 300MB of RAM. And you want to leave room for MySQL and have spare RAM for other memory-hungry scripts. Remember I only have 512MB RAM on my VPS. If Apache runs in HTTP/1.1 keep-alive mode, the server becomes very unresponsive when there are many concurrent accesses, because every idle keep-alive connection ties up a whole process.
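To make the squeeze concrete, here is a rough back-of-the-envelope sketch using the figures quoted above. The function name is hypothetical, and treating the opcode cache as a single segment shared by all prefork processes is my assumption, not something measured here.

```python
def prefork_ram_mb(processes, ram_per_process_mb, opcode_cache_mb=64):
    """Rough estimate: per-process RAM times process count, plus one opcode cache.

    Assumes the opcode cache is one shared segment (an assumption of this sketch).
    """
    return processes * ram_per_process_mb + opcode_cache_mb


VPS_RAM_MB = 512  # RAM available on the VPS described in the post

# Low end of the quoted 15~70MB range, with 20 prefork processes.
low_estimate = prefork_ram_mb(20, 15)
# High end of the quoted range, with the same 20 processes.
high_estimate = prefork_ram_mb(20, 70)

print(f"low estimate:  {low_estimate} MB of {VPS_RAM_MB} MB")
print(f"high estimate: {high_estimate} MB of {VPS_RAM_MB} MB")
print(f"left for MySQL and scripts (low estimate): {VPS_RAM_MB - low_estimate} MB")
```

Even the low estimate leaves little headroom on a 512MB VPS, and the high estimate does not fit at all.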

“So I’m doomed!” thought I.

Lighttpd rocks with keep-alive

It was at that moment of desperation that I found Lighttpd (pronounced “Lighty”). Lighty has a single-threaded asynchronous I/O architecture. Because of its asynchronous nature, an idle connection does not lock down a process or a thread as it does in Apache. This is perfect for keep-alive connections. In fact, Lighttpd scales very well with a large number of concurrent requests.

I did a little benchmarking with a PHP/MySQL script that generates a 40KB HTML page, using 50 concurrent clients ( ab -n 5000 -c 50 [-k] ):

  • Without keep-alive:
      • Apache prefork MPM: 200 requests/second
      • Lighttpd: 204 requests/second
  • With keep-alive:
      • Apache prefork MPM: 244 requests/second
      • Lighttpd: 256 requests/second

HTTP/1.1 keep-alive makes a 22~25% performance difference.
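The keep-alive gain can be recomputed from the benchmark numbers above. This is only a small sketch; the function and variable names are mine.

```python
def speedup_percent(before_rps, after_rps):
    """Relative throughput gain, in percent, going from before_rps to after_rps."""
    return (after_rps - before_rps) / before_rps * 100.0


# (requests/second without keep-alive, requests/second with keep-alive)
results = {
    "Apache prefork MPM": (200, 244),
    "Lighttpd": (204, 256),
}

for server, (without_ka, with_ka) in results.items():
    gain = speedup_percent(without_ka, with_ka)
    print(f"{server}: {gain:.1f}% more requests/second with keep-alive")
```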

Lighty is good, leaky is bad

With Lighty in hand, it seemed I had found a solution. Migrating from Apache to Lighttpd is pretty simple. It’s easier to run Lighttpd as User: apache and Group: apache, so that php-cgi won’t have permission problems with the existing PHP session directory. The only problem is with scripts that rely on .htaccess to define mod_rewrite rules, such as Wordpress. Lighty’s mod_rewrite rules can only be defined in the configuration file. For Wordpress, you need to define the following:


$HTTP["host"] =~ "(^|\.)MyDomain\.com$" {
    server.document-root = "/var/www/html/MyDomain"
    server.error-handler-404 = "/index.php"
}

My FastCGI configuration is as follows:

fastcgi.server = ( ".php" =>
    ( "localhost" =>
        (
            "socket" => "/var/run/lighttpd/php-fastcgi.socket",
            "bin-path" => "/usr/bin/php-cgi",
            "bin-environment" =>
            (
                "PHP_FCGI_CHILDREN" => "32",
                # terminate each PHP process after it has handled this
                # number of requests, in case PHP leaks memory
                "PHP_FCGI_MAX_REQUESTS" => "1000"
            ),
            "bin-copy-environment" => ( "PATH", "SHELL", "USER" ),
            "min-procs" => 1,
            "max-procs" => 1,
            "max-load-per-proc" => 8,
            "idle-timeout" => 50,
            # Fix PATH_INFO for PHP scripts that rely on it (like Wordpress).
            "broken-scriptfilename" => "enable"
        )
    )
)

The number of PHP processes = max-procs * (PHP_FCGI_CHILDREN + 1).
There are max-procs watcher processes, which do not handle requests, and max-procs * PHP_FCGI_CHILDREN real PHP backends, which serve requests. If you are using an opcode cache such as eAccelerator, XCache or APC, it’s advisable to keep max-procs at a very low number (1 is perfectly fine) and raise PHP_FCGI_CHILDREN instead. Otherwise, those opcode caches will create a separate memory space for each parent process: if you leave max-procs at 4, you’ll end up with four separate opcode cache memory segments.
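The process-count formula above can be checked with a few lines of arithmetic. This is a sketch only. The function name and the dictionary keys are hypothetical, and the 4 x 8 comparison is an illustrative setting, not one I ran.

```python
def php_process_counts(max_procs, php_fcgi_children):
    """Apply the formula: total = max-procs * (PHP_FCGI_CHILDREN + 1)."""
    watchers = max_procs                      # parent processes, serve no requests
    backends = max_procs * php_fcgi_children  # children that actually serve requests
    return {
        "watchers": watchers,
        "backends": backends,
        "total": watchers + backends,
        # one opcode cache memory segment per parent (watcher) process
        "opcode_cache_segments": max_procs,
    }


# The configuration shown above: max-procs = 1, PHP_FCGI_CHILDREN = 32.
print(php_process_counts(1, 32))

# Illustrative alternative with the same number of backends but four parents,
# and therefore four separate opcode cache segments.
print(php_process_counts(4, 8))
```

Both settings give 32 backends, but only the first keeps a single opcode cache segment.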

I also bumped up the APC cache size from 32MB to 64MB (apc.shm_size=64 in /etc/php.d/apc.ini) and can run 32 PHP FastCGI processes using 210MB RAM. After that, my sites were running at lightning speed.

While everything looked rosy, I was dismayed to see Lighty bloat from 3MB to 100MB overnight. Well, there’s a memory leak in Lighttpd 1.4.18. It took over a year for someone to fix it in 1.4.20. That’s probably why Redhat EL5 doesn’t include Lighty.

I took the lighttpd-1.4.20-1.fc10.src.rpm from Fedora 10 and rebuilt it on Fedora 8. Now everything runs smoothly. Download the RPMs if you want.

How to enjoy this labour of love?

Looking back on this quest to run busy websites on a budget VPS, it was not an easy route. Lighty’s documentation is rather, well, “lighty”, in drastic contrast to Apache’s. Every time there’s a glitch, you have to Google for other people’s pearls of wisdom.

When I turn on Lighty RRDTool performance graphs, it is rather enjoyable knowing that my server can handle a substantial load at a fast speed.

Here are two useful commands. To find out the RAM usage on your VPS:

  • % free

Look for “-/+ buffers/cache: 179724 344740”
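If you want to read that line from a script, a minimal sketch follows. It assumes the older `free` output format shown above, where the first number is the memory used by applications and the second is what remains available, both in kilobytes. The function name is hypothetical.

```python
def parse_buffers_cache_line(line):
    """Return (used_kb, free_kb) from a '-/+ buffers/cache:' line of `free`."""
    label, _, values = line.partition(":")
    if label.strip() != "-/+ buffers/cache":
        raise ValueError(f"not a buffers/cache line: {line!r}")
    used_kb, free_kb = (int(value) for value in values.split())
    return used_kb, free_kb


used_kb, free_kb = parse_buffers_cache_line("-/+ buffers/cache: 179724 344740")
print(f"used by applications: {used_kb / 1024:.0f} MB")
print(f"still available:      {free_kb / 1024:.0f} MB")
print(f"total:                {(used_kb + free_kb) / 1024:.0f} MB")
```

The two figures add up to roughly the 512MB of the VPS, which is a quick sanity check that you are reading the right line.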

To find out the size of a process:

  • % top -bn 1 | grep lighttpd

Read Part 1: Choose the right VPS to host your busy websites


Dec 16 2008

Choose the right VPS to host your busy websites

Published by Jiwei under Internet, Software

I have a few busy websites running Wordpress, phpBB, phpList, eTicket, Magento and other PHP/MySQL based scripts. I get close to 1M hits a month, with an observed peak of 9 hits/second. Shared hosting won’t work for me anymore. A dedicated server is too expensive and perhaps overkill. So I set off to find a VPS (Virtual Private Server) 8 months ago. It has been a quest full of frustrations and learning. I’ll write my experience down here in the hope that it can be of some use to others about to start a similar quest. I’ll cover two primary issues:

  • Picking the right VPS, and
  • Running a light system

VPSLink referral/promotion code: CSR42P 10% lifetime discount.

How to pick the right VPS

I first started with a HostIcan VPS-Rage (512MB RAM) using Virtuozzo virtualization. It came with a reseller package which I didn’t need, and it gave me a lot of headaches in setting things up. The performance of the applications was unpredictable, in the sense that response latency could be unbearably long (>15 seconds) or reasonably fast (~1 second). The Apache log showed many child processes being terminated. I wasn’t sure whether this was because my VPS was being deprived of CPU or because of hiccups in Apache.

The last straw came when I changed the MySQL root password and the monitoring software kept restarting MySQL every 10~15 mins. I was then asked to pay $30 to reset the MySQL password. No, I wouldn’t give in to this type of bullying behavior, and I decided to move on. To make the experience even worse, HostICan won’t refund the fee for the remaining 6 months. Their “Risk Free Money Back Guarantee” only applies to the first 30 days. Buyer beware!

Virtuozzo/OpenVZ gives you a bit of speed but not stability

After realizing that life with a VPS is a bumpy ride, I tried to understand virtualization better. Virtuozzo is the commercial version of OpenVZ by Parallels. It runs a single instance of the OS and claims to offer the highest levels of density, performance and manageability. However, a fatal drawback is that a Virtuozzo VPS does not have virtual memory, only some additional burstable RAM. The burstable RAM is shared by all the VPSes on the same server.

When RAM on a server is exhausted, malloc() returns an error and it is up to applications to handle the error gracefully. In practice, applications simply terminate, with or without a core dump. When memory leaks are still common (e.g. in PHP) and server setup is done by amateurs (e.g. configuring 256 Apache processes), the chance of RAM exhaustion is pretty high. This may explain why I saw so many Apache child processes terminated on my HostIcan VPS.

Xen is the right choice

With stability being an issue on Virtuozzo or OpenVZ based VPSes, the alternative is Xen. Xen is fundamentally different from Virtuozzo or OpenVZ. Xen provides hardware emulation and each VPS runs its own instance of the OS, with virtual memory. When the allotted RAM is used up, the OS starts swapping. This slows down the system, but no sudden death occurs. This is the same predictable behaviour as a dedicated server. The following diagrams from VPSLink.com explain the differences pretty well.

After much searching and research, I found VPSLink, which has good reviews and reasonable pricing. BTW, when checking out hosting companies, it’s useful to read about their facilities and look at the server room pictures. Pay particular attention to the rack structure, heat vents, raised floor and neatness of the wiring. I managed to filter out a couple of hosting companies showing Wal-mart racks on wheels, bare floors, hairy wiring and Linksys routers.

I signed up for the VPSLink Link-4 Xen package ($30/month, 512MB RAM) and installed Fedora 8. Fedora 8 is in the same bloodline as the vetted Redhat Enterprise Linux 5 (Fedora 11 will feed into EL6, to be released in 2010). I have faith in its stability, as I have previously deployed multi-million dollar servers running Redhat EL5 to top-tier mobile operators around the world.

So far, VPSLink’s service is excellent. I asked for IP address reverse mapping and it was done in 10 minutes. If you want to sign up with a VPSLink package, use the referral/promotion code CSR42P.  You’ll get a 10% lifetime discount and I’ll get some credit to my package. Don’t feel obliged if there’s a better deal.

Read Part 2 of this quest: Speedy ride on a VPS with Lighttpd


Mar 19 2008

The Fallacy in our Search for Information

Published by Jiwei under Internet, Philosophy

Jan. 8th, 2008 marked the official 10 year anniversary of Google’s famous PageRank algorithm. The idea was conceived earlier, but Larry Page filed the PageRank algorithm patent (U.S. Patent 6,285,999) on Jan. 8th 1998. PageRank, together with the business acumen of the Google folks, propelled Google to the throne of the Internet. Is this worth celebrating? Maybe not. Here’s why.

[Figure: aol-search-distribution-combined.jpg]
Clicks on Internet search results follow a power law (Pareto distribution), i.e. listings on the first few pages of search results get all the traffic. Based on AOL search data covering 20 million searches by 650,000+ users over a 3-month period, the 1st search result was chosen 42.3% of the time, and a result on the 1st page (the first 10 results) 89.6% of the time. The percentage drops drastically after the 1st page, and nobody ever picked a search result beyond the 502nd position. See the diagram.

The order of search results on Google, the planet’s most popular search engine, is governed by the PageRank algorithm. The PageRank algorithm assigns a numerical weight to each HTML page based on the number and the weight of other pages referring to it (called incoming links).
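To illustrate the idea of weighting a page by the number and weight of its incoming links, here is a minimal textbook-style sketch. It is a simplified power iteration with the commonly quoted damping factor of 0.85, not Google’s actual ranking code. The toy link graph and all names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its weight on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, weight in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {weight:.3f}")
```

In this toy graph, page c ranks highest because it has the most incoming links, and page a outranks b and d only because its single incoming link comes from the heavily weighted c.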

It’s of utmost importance to Internet marketers that their websites are listed on the first page of results for their chosen keywords. And this requires a lot of incoming links with high PageRank to those websites. Tragically, much of the creativity of our generation of netizens has been consumed by paid listings, link exchanges, SEO, spam comments, etc. Sites that do not engage in self-promotion for incoming links disappear into oblivion in the search space. Plenty of SEO marketers promise a 1st page listing within a week, for a fee. And they fulfill the promise most of the time.

Can true knowledge compete with the assault of SEO marketers armed with an excessive array of modern weapons (tools, bots and phony web sites)? The competition is like that between the European settlers and the American Indians in the New World.

What Google search results present are, at best, our popular beliefs. And a large collection of Google Bombs is the living proof of how easily and regularly popular beliefs are manipulated. Try searching for “French Victory”, “Dangerous cult” and “Miserable failure”. The No.1 Google results point to pages about “French defeat”, “Scientology” and “George Bush” respectively. Google defused some of the bombs in Jan. 2008. But God knows how many more are still at large or in the making.

Pages filled with disinformation swamp the ones with useful information, pushing them down in the ranking. To see this effect, you only need to search for the term “credit card”. You won’t find a credit card provider on the first page. To get noticed, the credit card providers have to pay, normally over $2 per click. One may speculate that it is precisely this effect which sustains the growth of Google’s revenue.

Alas, we have created the great Internet in our own intellectual image. Are we aware of how fallible and distorted this image has become? The image was shaped not by our understanding and truth, but rather by the commercial value of keywords. Most of us will come to think homogeneously because we read exactly the same information in our Google searches. I wish Google had a “research philosopher” on staff to understand the philosophical impact of its search engine.

Back to PageRank. The PageRank idea is similar to citation ranking in academic papers, where the most cited papers are considered the most original. Researchers in the early 70’s showed that a high citation ranking often does not reflect importance, but rather a phenomenon called the Matthew Effect (famous authors get disproportionately more citations than less famous ones). As Nassim Taleb put it in “The Black Swan”,

“Academic success is partly (but significantly) a lottery” (page 217).

PageRank has shown its age. To prolong the life of the aging PageRank algorithm, Google proposed the “nofollow” link attribute in HTML in 2005. Google’s recent crackdown on link farms and paid links is another attempt to maintain the meaning of PageRank. Will paid links be completely stamped out? Betting on it would mean admitting that our generation of netizens does not have any ingenuity.

That being said, I believe Google only wants to prolong the life of PageRank until the next search ranking algorithm is ready.

The PageRank is dead; Long live the PageRank!

What will the next algorithm look like? Well, for the sake of brevity, I’ll leave it to the next post. And here’s a hint. Why was the Google toolbar aggressively promoted? Why do other companies, e.g. Yahoo and Alexa/Amazon, promote their toolbars equally aggressively?

Until next time, please write down your praise, disapproval, criticism and suggestions.
