TL;DR 1: We must eliminate the “mini” threshold (the group’s minimum number of interactions), because influence is not based on how much a person expresses themselves, but on how relevant and trustworthy their expression is to others. E.g. one person may speak only once in a month yet be considered a genius by their peers for their creativity, assertiveness, etc., while another may talk a lot in the same month yet be considered a clown for their senseless contributions.
TL;DR 2: There is no need to divide the rating score by “(cc+pc)”. If one person made a comment in a group and obtained 10 likes, and another made a similar comment in a smaller group and got only 3 likes, the person who got the 10 likes is in fact more influential: they affected the opinion of at least 7 more people. There is no need to dilute their voice because they spoke in a crowded group.
TL;DR 3: The final rating formula should be something simpler, like 7pl + 7mpc + mmc + cl, because when somebody posts something relevant and gets likes, or gets mentioned in the root of a discussion (the post itself), that is in my opinion at least seven times as influential as being mentioned in a comment or having a comment liked. I propose a weight of 7 preliminarily because that is the average number of comments per post I can see after a quick skim of dev.do. We may make it more precise by computing it on a per-group basis, as “(ccg/pcg)”. Finally, we should be incentivizing content creation, shouldn’t we?
The long tale, for avid readers only
pc = Posts authored by the user
cc = Comments authored by the user
mini = Authoring threshold to exclude non-prolific users
cl = Likes obtained in comments authored by the user
pl = Likes obtained in posts authored by the user
mpc = Mentions in other users’ posts
mmc = Mentions in other users’ comments
pcg = Group publications in a given period (e.g. monthly)
ccg = Group comments in a given period (e.g. monthly)
This morning, I was pleased to discover a very interesting project that rates the influence of individual developers in our country. You can read about the project and see it in production here. Nonetheless, I found some issues that considerably hurt the precision of the algorithm, and I would like to propose some modifications for the sake of fairness.
First, let’s talk about influence and the variables the algorithm considers to compute it. After that, we will define an improved arrangement of those variables. Even if it is not perfect, it will give a much better snapshot of reality.
What’s influence anyway?
Years ago, Larry Page, then a Stanford student, developed a method to rate the importance of web pages based on their mentions in other web pages, taking the idea from one of the ways the influence of scientific papers was measured at the time. After all, if lots of academics cite a particular paper, something relevant must have been published in it. He called his algorithm PageRank.
Conceptually, a mention by a vastly popular page (e.g. the NY Times) gives a not-so-popular page (e.g. my blog) more “influence” than a mention by another not-so-popular page (e.g. one of my relatives’ blogs). Please note that I used this example to introduce the idea of “influence”. This is relevant because, in my opinion, a user who is mentioned in a post (where somebody is proposing an idea) has more “influence” than a user who is mentioned in a comment (where the idea is being discussed). Consider also the likelihood of spurious references in comments, created by trivial conversations or parodies.
Also, consider hundreds of references to a specific user from users who are not very influential. We may consider this user “influential” too, because even if those users are not very influential individually, in aggregate they represent an influential group.
Finally, consider a brilliant scientist who publishes only one paper but changes forever the way we think about a certain topic. That publication is constantly cited by millions of academics around the world, even as it gets old. This scientist is, of course, very influential. This idea matters later, so please remember it.
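To make the NY Times example above concrete, here is a minimal sketch of a PageRank-style iteration in Python. It is a simplified textbook version, not Google’s actual implementation and not part of the rating algorithm discussed in this post. The page names, the damping factor of 0.85, and the number of iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it mentions."""
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it mentions.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing mentions spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: three readers mention the NY Times, which mentions
# my blog; a relative's blog (mentioned by nobody) mentions another blog.
links = {
    "reader_1": ["nytimes"],
    "reader_2": ["nytimes"],
    "reader_3": ["nytimes"],
    "nytimes": ["my_blog"],
    "relative_blog": ["other_blog"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, my blog ends up ranked above the other blog even though each received exactly one mention, because the page mentioning my blog is itself widely mentioned.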
The Status Quo
The idea of formulating a Dominican influence rating is very interesting, and the work Carlos Vasquez has put into implementing it is admirable. Although a more precise rating should take extra considerations into account (like the fact that influencing a prolific developer is far more important than influencing a very sporadic enthusiast), this is a very good start.
I won’t go on at length; the initial modifications to the algorithm are described above. They give us:
((ccg / pcg) * (pl + mpc)) + mmc + cl
as the simplest form of the algorithm.
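Here is a small sketch of how the proposed formula could be computed, using Python. The names `UserStats`, `rating` and `group_post_weight` are hypothetical, the sample numbers are invented, and falling back to the preliminary weight of 7 when a group has no posts is my own assumption, since the formula does not say what to do in that case.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    pl: int   # likes obtained in posts authored by the user
    mpc: int  # mentions in other users' posts
    mmc: int  # mentions in other users' comments
    cl: int   # likes obtained in comments authored by the user


def group_post_weight(ccg, pcg, default=7.0):
    """Average comments per post in the group, i.e. (ccg / pcg)."""
    if pcg == 0:
        # Assumption: with no posts in the period, use the preliminary weight.
        return default
    return ccg / pcg


def rating(stats, post_weight=7.0):
    """Proposed rating: post_weight * (pl + mpc) + mmc + cl."""
    return post_weight * (stats.pl + stats.mpc) + stats.mmc + stats.cl


# Invented sample data for one user.
user = UserStats(pl=10, mpc=2, mmc=3, cl=5)

# Fixed preliminary weight of 7: 7 * (10 + 2) + 3 + 5
fixed = rating(user)

# Per-group weight for a group with 120 comments and 20 posts: 120 / 20 = 6
per_group = rating(user, post_weight=group_post_weight(ccg=120, pcg=20))

print(fixed, per_group)
```

With the fixed weight this sample user scores 92, and with the per-group weight of 6 the score is 80. Note that there is no “mini” threshold and no division by (cc+pc) anywhere in the computation.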
Conceptually, I’m starting from the assumption that we should use the same set of variables to compute the rating. I’m sure there is a more precise way using more or fewer variables, but that would need some thought. Also, I want to reiterate that it would be awesome to take into account the quality of a given person’s “likes”.
Great start, Carlos. For the sake of curiosity, I would love to run this algorithm on the same dataset you used, to see how much the national rating may vary.