Tuesday, July 7, 2009

Lies, damned lies, and statistics: EE edition

I came across some figures regarding downloads of Glassfish and JBoss AS that really puzzled me. Basically, Glassfish was downloaded 700,000 times a month (end of '08) while JBoss AS was downloaded only around 115,000 times a month over the same period. My first reaction was "Well done to you, Sun!" and then I realized that the gap was too good to be true. Let's have a look at these numbers.

For JBoss AS, the Sun team has only counted the direct number of downloads out of SourceForge. That's public knowledge by the way, go there for the JBoss stats.

For Glassfish (according to this), download numbers (not public BTW) come from:

  • JDK bundles
  • Java EE SDK
  • NetBeans

Surprisingly, no stats on the direct number of downloads from the Glassfish project page. Let's analyze that a bit.

JDK bundles are the kind of thing even my grandma downloads by accident. I can tell you right now, she has never ever used Glassfish (nor JBoss ;) ). How many times did I download the whole enchilada when I just wanted the plain old JDK!

Java EE SDK. This one is hard. Of course, people interested in EE will go download this package. Are they interested in GF? Hard to say. On the other hand, providing an SDK without a runtime would do no good.

NetBeans. I'm surprised at the popularity of NB, but there it is. Are all NB users actually GF users? BTW NB also comes with a JDK bundle, but I don't know if this bundle also bundles GF :)

Let's call this the Russian doll statistics generation strategy. Frankly, it is not very honest toward your users and customers to use this strategy and then compare apples to oranges with your competition. When someone downloads JBoss AS, for sure he did not do it by accident (thanks to SourceForge's sense of UI :) )

So we have two strategies here. JBoss could start playing the Russian doll statistics generation game too, and we could be pretty good at it:

  • include JBoss AS in Fedora and count it
  • include JBoss AS in RHEL and count it (Dell, IBM and the like are pretty good at delivering RHEL on their hardware)
  • include JBoss AS in IcedTea and count it
  • include JBoss AS in JBoss Tools and count it
  • include JBoss AS Core in JBoss ESB, Portal etc etc and count it
  • include the number of downloads from our Maven repository (with the number of times you have to nuke your local repo, that will be a big hit :) )
  • I've only added a few ideas but for sure our marketing guys can be more productive

Of course we won't do that. An alternative strategy would be for the Glassfish team to display only their direct download numbers (the ones they fail to display) and stop using bogus charts in their public and private slides.

On a side note, I find it disappointing that open source projects don't keep their stats open and preferably via a third party provider like SourceForge. Granted the SF stats are somewhat flaky but it keeps everyone honest with their own (lack of) success.

Disclaimer: I am not saying Glassfish is not a success, I am quite happy to have them as coopetitors on the server market in general and Java EE in particular.

14 comments:

Vinicius Carvalho said...

Well, in the end all that matters is that JBoss is the most widely used AS in the world. And after the Oracle acquisition I would be damn afraid of using Glassfish in my environment.

Alexis MP said...

Salut,

Check the NetBeans download page to see the options. People with the appropriate bandwidth will go for the whole enchilada which does include GlassFish.

The JDK bundle doesn't account for much (non-developer audiences get their install via java.com, not java.sun.com), but I agree there may be people getting GlassFish when they only wanted the JRE...

We don't track GlassFish downloads in OpenSolaris, Ubuntu, Debian or Maven.

Overall I think the trend is what matters most. Also, we acknowledge that download numbers are not the best metric; that's why we have the "pink dot maps" (people using the admin console on a regular basis, so they installed it and use it) and registered user data. In all cases those indicators are looking darn good.

I'm curious though, what triggered you to spend time writing this?

Kango_V said...

Err, the fact that you have not revealed where all your download stats come from?

Post your direct download numbers please so that a fair comparison can be made.

prakash said...

The EE SDK plays a major role in these download numbers. People go to java.sun.com to download Java EE technology, which is only delivered by Sun through Glassfish. Think about all the 5M+ Java developers out there and about 1M+ Java EE developers.
I agree with your comment: do they all go to java.sun.com for the luv of Glassfish? Most of them wouldn't even know what it is. They are looking for a reference implementation of Java EE, which is delivered as Glassfish. Smart move by Sun.
Any sensible engineer knows that these numbers are skewed.
No doubt Glassfish has gained momentum in recent years. Is it at the expense of JBoss? That's what Sun thinks and that's what they want to let the world know. During the same time, JBoss has grown much more in terms of download numbers as well as number of deployments.
Where did this growth come from?
Partly from the expansion of technology adoption, and partly from consolidation in the marketplace.

pelegri said...

Hi Emmanuel - I'll comment in two parts, for readability.

I agree that it is difficult to translate d/l numbers to actual usage and that's why we post a number of other indicators including the (optional) registration data, the Admin Pings and the Update Center Pings - see [1] and [2].

All methodologies have limitations. The IPS-based connected pings will probably deliver the best metric, but they won't be fully effective until all the GlassFish users have switched to v3. The (GFv2) UC pings are also useful, and so are the Admin pings, but they all have some limitations. Our data is posted at [1]; I don't know of an equivalent for JBoss.

- eduard/o

[1] http://blogs.sun.com/pelegri/tags/adoption
[2] http://blogs.sun.com/theaquarium/tags/adoption+glassfish

Unknown said...

Hi all, thanks for the comments

@Kango_V I did publish the stats I am referring to as links in the blog:
JBoss AS
http://sourceforge.net/project/stats/detail.php?group_id=22866&ugn=jboss&mode=12months&type=prdownload

Glassfish
http://blogs.sun.com/pelegri/entry/glassfish_download_stats_jan_2009

@Alexis I wrote this blog because I've seen slides from Sun showing the graph comparing downloads of apples to downloads of oranges.

@pelegri We don't track people using our admin console so we don't have such stats.

pelegri said...

Now, back to d/l numbers - despite their limitations they do provide some ballpark data and trends.

There are many sources of variability around d/ls. Some you mention; one you didn't mention is the frequency of GA releases. Since December JBoss has released 5.0, 5.0.1 and 5.1. All these add up, and that is totally appropriate.

Another source of variability is download completion rates. That varies a lot depending on the network infrastructure and size of download. I know ours but don't know those at SourceForge - so I don't try to compare that.

You emphasize intent. Although some of the d/ls will be unintended, I believe most are intentional. We provide some basic grouping so people can understand where the d/ls are coming from; I'll see if we can refine the categories in the future.

I'm pretty confident in our numbers. Even if one removed a fraction of them because of "unintended downloads", the resulting number is still very big. The May 09 numbers are at [2]; they show 172,190 d/ls for runtime bundles, plus 680,361 via NetBeans bundles.

We try to be careful counting our d/ls. We only count the NBs that include GF and we have other data showing healthy linkage between NB users and GF users but we separate the two categories.

Incidentally, the JDK SDK you mention is the same as the JavaEE SDK; they are just links in two different pages to the same bundle.

BTW, as historical background, I started publishing our d/l numbers last year after JBoss posted [1]. I went through SF's numbers and could not figure out where those 20M came from. We had been tracking our own downloads for years but never provided monthly details; that post convinced me to give more visibility to our own numbers.

[1] http://www.fnokd.com/2008/02/13/20-million-downloads/
[2] http://blogs.sun.com/pelegri/entry/glassfish_adoption_stats_may_2009

Unknown said...

@epl The 20 million (probably closer to 30 million right now) is an undercount if anything. I have a spreadsheet on my laptop that tracks all the downloads from all the JBoss projects for as long as there have been stats to track.

I checked some of the larger numbers and they match those reported by SF. There are some projects that don't use SF, but they don't have a big impact on the total. JBoss' huge adoption is due in large part to the technology that JBoss acquired, typically de facto implementations like Hibernate, Cache, Drools, Tools, etc. Many of these projects' downloads match or even exceed the downloads of something like AS.

But I don't think we have the same obsession as Sun with downloads - we don't need them to justify our existence to our new fiducially responsible overlords in Redwood Shores :)

pelegri said...

@ richs

re: 30M -- would love to see the pointer to the SF links.

re: d/l stats and oracle -- there is no relation between those; as you know, we have been tracking d/ls for years. Unfortunately, I can't tell you anything about our conversations with Oracle; like everybody else, stay tuned.

re: focus w/ d/l numbers -- That was certainly the case for a number of years, but for the last couple of years I've been more interested in metrics of actual use like the UC data and admin pings.

Note that my stats always track multiple metrics. D/Ls are only one, but it seems we always end up talking about them w/ you. I will be very happy to talk about other public adoption data; that's why I asked Emmanuel about other metrics you track.

Tristan said...

Personally, I know I have downloaded NetBeans with an unwanted GlassFish because, according to the matrix, it had most of what I wanted (EE + JavaFX - GlassFish). Also note that the "Java" NetBeans bundle includes GF 2 and 3.

Also, I know that I or one of my coworkers has downloaded JBoss, made the usual config changes and placed it on the LAN.

1 Gbps is much faster for redistribution than 10 Mbps.

I would expect that a 'ping' from JBoss, if available, would yield a much higher number.

Anonymous said...

I don't quite see where the lies are. Sun publishes some stats that everybody is free to interpret.

I doubt it is the best indication of an AS's popularity. Especially after reading this post.

If there are no lies, what is this buzz all about... JBoss PR?

demetrio.it said...

I often end up downloading Glassfish as part of a bundle, so I don't think that *some* downloads are unintended but *a lot*...

Demetrio

Anonymous said...

EE edition = Enterprise Edition edition .. :P

Martina Tycova said...

Hi,
I am surprised that they are giving us false stats. I like GF for its good server features, but they should be honest with their stats.
