Scraping member stats?

daniel_gudman

KING (In Land of Blind)
Staff member
#1
So there have been a few posts in the General Fanfiction section recently, by people doing a report on "fanfiction" as a subculture for their Anthropology 101 course or whatever.

They tend to ask for demographic information, but that's a pretty piecemeal way to get data, and it's not going to be very useful. I was thinking about that as a data problem, and realized that most of that information already exists in the auto-completed member accounts.

I think it would be interesting to scrape the information off the "user profiles" for stuff like age / gender / location, and then other stuff like join date, post count, and time spent online. Then, once that information is slapped into a spreadsheet or an MDB or whatever, run some stats to see what kind of patterns and correlations emerge. The more I think about it, the more interesting that's starting to sound to me, as a member of the forum.

So I guess there are two questions I have:

1) I don't have the technical savvy to grab the data (once it's in a single file I can run from there though). Is this something that someone else in the audience could do, or is it super-hard for reasons I'm ignorant of, or what?

2) Would people be okay with this? I mean, are there any objections, like it makes you feel uncomfortable (ideally for reasons you can articulate)?
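
For the "run some stats" step mentioned above, here is a minimal sketch of what that could look like once the data is in a single file. Everything in it is an assumption made for illustration: the column names (`age`, `post_count`, `days_since_join`), the sample rows, and the helper names `load_rows` and `pearson` are all made up, not taken from the forum's actual profile fields.

```python
import csv
import io
from math import sqrt

# Hypothetical export: column names and rows are invented for illustration only.
SAMPLE_CSV = """age,post_count,days_since_join
21,120,400
33,15,900
27,300,1200
45,0,60
"""

def load_rows(text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rows = load_rows(SAMPLE_CSV)
r = pearson([row["days_since_join"] for row in rows],
            [row["post_count"] for row in rows])
print(f"days since join vs. post count: r = {r:.2f}")
```

With a real export, the same `pearson` call could be pointed at any pair of numeric columns. A spreadsheet would do the same job.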
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#2
I would have little issue with this (note that they should have mod approval, not my approval), as long as whoever did it did one thing to the collected data before any distribution or collating:

Remove all indicators of user names.

It is a privacy concern. Showing that an individual user is 33 years old and from Germany is *not* the same as saying that specific user XXX is 33 and from Germany. Yes, most do not use real-life names, but it is still a concern that specific users could potentially be tracked down by looking at the stats.
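
A minimal sketch of that "remove the user names" step, assuming the scraped data ends up as a list of records with a `username` field. The field name, the sample records, and the `anonymize` helper are hypothetical, not anything the forum software provides.

```python
import random

def anonymize(records, id_field="username", seed=None):
    """Return copies of the records without the identifying field, in shuffled order.

    Shuffling keeps row order (e.g. member-list order) from hinting at who is who.
    """
    stripped = [
        {key: value for key, value in record.items() if key != id_field}
        for record in records
    ]
    random.Random(seed).shuffle(stripped)
    return stripped

# Made-up sample records for illustration only.
records = [
    {"username": "user_a", "age": 33, "location": "Germany"},
    {"username": "user_b", "age": 21, "location": "Canada"},
]
anonymous = anonymize(records)
print(anonymous)
```

Dropping the name field is only the first step. A rare combination of age and location could still point at one person, so coarser buckets (age ranges, continents) might be worth considering before anything is shared.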



Note there is a separate issue for whoever collects the data and whoever uses it. If *all* users are in the list, then the data is going to be seriously skewed, because if the statistics from my last check still hold, only some 22% of forum members actually had posts.

EDIT: A fresh check of the member page shows:
- 2,025 or so members with 10 or more posts
- 1,447 or so members with 1-9 posts
- 3,472 or so members with any posts
- 11,531 or so members with 0 posts

A check of the front page shows 15,003 members. So if true, then in percentages that becomes:
~13.5% or so members with 10 or more posts
~9.6% or so members with 1-9 posts
~23.1% or so members with any posts
~76.9% or so members with 0 posts
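
The percentage arithmetic can be reproduced with a few lines, using the member counts quoted in this post. This is just a sketch; the `counts` dictionary and the `pct` helper are names chosen for illustration.

```python
# Member counts as quoted from the member page check above.
counts = {
    "10 or more posts": 2025,
    "1-9 posts": 1447,
    "0 posts": 11531,
}

total = sum(counts.values())
any_posts = counts["10 or more posts"] + counts["1-9 posts"]

def pct(n, whole=total):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * n / whole, 1)

for label, n in counts.items():
    print(f"{label}: {n:,} members, ~{pct(n)}%")
print(f"any posts: {any_posts:,} members, ~{pct(any_posts)}%")
```

Someone running stats on the scraped data could use the same split to decide whether to leave the zero-post accounts out before looking for patterns.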
 