Alright! So, here's the deal. About a week ago in Critical Failures I made a wild claim that I was working on some program to replace this dumb guy who everybody loves. They think he's great because he can count votes in Phalla. Like, super good. The man is essentially a machine.
However, I made this claim that I was working on a vote counting program so that I could hang that over his head, to show the capabilities of a real algorithm! Unfortunately, I'm stumped.
I have no clue as to what coding language would be best suited for this project. PHP with XML? Java? I didn't even consider those, and I've already begun testing some things out using ASP.NET with VB.NET for the sake of laziness.
Basically all this is for naught without two questions being answered:
1.) How do I make a program that allows the user to enter the URL they want scanned for votes?
2.) Is this even kosher for the forum? It's been pulled off before with a program that scanned a single thread for pictures and downloaded them, but that was a while ago.
Essentially I'm looking for some insight on either what language I should be going for if this sort of thing isn't possible using ASP.NET, or, if I'm on the right track, some guidance with using Google's API service would be greatly appreciated.
Thanks to anybody takin' the time to look at this. I'm scourin' around lookin' for some good tutorials, templates, and other things that can get me started on this project and I'm makin' a small bit of headway, just not enough to really tell if MrBlarney's days are numbered.
Posts
Which environments/languages are you familiar with?
I was thinking about doing something similar in response to Hylianbunny's repeated changing of votes in day 1 of thorgot's current Phalla. Sort of an escalation of vote changing vs vote counting tools, if you will. I'm still tossing up if it's a good thing to do or not, to be honest.
As far as what this project entails, it'd have to track a lot of info for it to go down smooth, more than it can handle without some complications. On top of having it search for !* (where * is any name on the player roster the user imports for the session), it'll have to keep track of who voted for whom so that retractions can be accounted for appropriately.
That, and it'd need an array of sorts to let nicknames redirect to the actual user name (Zot redirecting to Cold Salmon and Hatred).
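For what it's worth, the nickname redirect could be as simple as a lookup table. Here's a rough sketch in Python, assuming votes show up as !name right in the post text. The ALIASES table and find_vote_targets are made-up names for illustration, not anything the forum provides.

```python
import re

# Hypothetical alias table: nickname -> roster name.
ALIASES = {"zot": "Cold Salmon and Hatred"}


def find_vote_targets(text, roster, aliases=ALIASES):
    """Return the roster names that appear as !name tokens in text."""
    # Map every accepted spelling (roster names plus nicknames) to its roster name.
    lookup = {name.lower(): name for name in roster}
    lookup.update({nick.lower(): name for nick, name in aliases.items()})
    # Try longer spellings first so a long name wins over a shorter one it contains.
    spellings = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"!(" + "|".join(re.escape(s) for s in spellings) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [lookup[m.group(1).lower()] for m in pattern.finditer(text)]
```

Retractions aren't handled here; this only covers turning whatever someone typed into the name on the roster.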
but all that's just theorizing if I can't get the damn thing off the ground first. I've tested out importing Google's search engine and all I've managed to do so far is make a local version of Google Fight.
So basically I'm just kinda hoping for somebody who's done this before to kinda give me a push in the right direction as far as the initial web referencing thing goes.
Shell scripting isn't a bad idea, but you may want to make the program more robust than what was posted. If the task is "given a thread, count how many people say 'vote'" then something like Perl or Ruby might work well too. You can basically write regexes to scrape the HTML in the page and break it into a list of {post #, poster, text} and then have a reasonable way to scan the text part for the word vote.
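To make that concrete, here's a minimal sketch in Python (Perl or Ruby would look much the same). The markup in SAMPLE_PAGE is invented for the example, since the real page source isn't shown here, so POST_RE would need rewriting against the actual HTML.

```python
import re

# Hypothetical, simplified markup. The real forum HTML will differ, so the
# pattern below has to be adapted after looking at the actual page source.
SAMPLE_PAGE = """
<div class="post" id="post1"><span class="poster">Alice</span>
<div class="text">I think it's Bob. !vote Bob</div></div>
<div class="post" id="post2"><span class="poster">Bob</span>
<div class="text">That's ridiculous.</div></div>
"""

POST_RE = re.compile(
    r'<div class="post" id="post(?P<number>\d+)">\s*'
    r'<span class="poster">(?P<poster>.*?)</span>\s*'
    r'<div class="text">(?P<text>.*?)</div>',
    re.DOTALL,
)


def parse_posts(html):
    """Break a page into a list of {number, poster, text} dicts."""
    return [
        {
            "number": int(m.group("number")),
            "poster": m.group("poster"),
            "text": m.group("text").strip(),
        }
        for m in POST_RE.finditer(html)
    ]


def posts_with_vote(posts):
    """Keep only the posts whose text contains the word 'vote'."""
    return [p for p in posts if re.search(r"\bvote\b", p["text"], re.IGNORECASE)]
```

Calling posts_with_vote(parse_posts(SAMPLE_PAGE)) keeps only Alice's post.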
Explain what you're doing a bit more and tell us what languages and platforms you've used.
Phalla is a forum-based game that goes on often on these boards. The main idea is that there's a "mafia" hidden within the playerbase, and they need to be removed from the game in order for the good side to win. That works by having users declare a !vote in the thread, with the person with the most votes getting kicked out of the game.
Generally it doesn't become a huge hassle dealing with votes when it's a small game, but there are times when the playerbase goes past 70 or so and when you're short on time to go through all the votes, it gets a bit taxing. I figure a program that can count votes for you could be a big help.
Basically the program has to boil down like this:
In short:
That's essentially what I'm trying to pull off. Right now I'm attempting this using ASP.NET with VB.NET and by importing Google's API service to try and customize it into the sort of search engine I need.
Why not just write a simple screen scraping app that looks for instances of each Player's Name in each page?
Aren't most phalla votes green or red (or some other color)? Subtract from their count with red and add with green. That way you don't have to worry about who is voting for who, just the tally.
This way you have the start of a thread as input, it scrapes the page searching for each user, adding on green and subtracting on red. Then it goes to the next page, and does the same.
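A rough sketch of the color approach in Python, assuming colored text shows up in the page source as font tags with a color attribute (I haven't checked the forum's actual markup) and that a name appearing anywhere inside the colored span counts for that player:

```python
import re
from collections import Counter

# Assumption: colored text reaches the page as <font color="...">...</font>.
COLORED_RE = re.compile(
    r'<font color="(?P<color>green|red)">(?P<body>.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)


def tally_by_color(html, roster):
    """Add one per green mention of a player and subtract one per red mention."""
    tally = Counter()
    for m in COLORED_RE.finditer(html):
        delta = 1 if m.group("color").lower() == "green" else -1
        body = m.group("body").lower()
        for name in roster:
            if name.lower() in body:
                tally[name] += delta
    return tally
```

Run it once per page and merge the results with total.update(page_tally) rather than +, since adding two Counters with + drops any entry that isn't positive.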
If this is going to be a public application, you should probably cache local copies of each page so you don't need to keep hitting the server every time someone wants to check their thread's tally.
edit: I guess if the colors aren't always accurate you could still do one pass per person on each page, just make sure you're not re-downloading the page for each player or you're going to add a fuckload of unnecessary strain on your own server and PA's.
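Something like this is what I mean by caching, as a minimal Python sketch. The function names and the cache layout (one file per URL, named by a hash of the URL) are just my choices for the example.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_from_web(url):
    """Download a page and return its text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def get_page(url, cache_dir, fetch=fetch_from_web):
    """Return the page for url, downloading it only if no local copy exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it is safe to use as a file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    cached = cache_dir / name
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    html = fetch(url)
    cached.write_text(html, encoding="utf-8")
    return html
```

Every per-player pass then reads the same local copy. The last page of an active thread keeps changing, so a real version would need to skip or expire the cache for that page.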
I'll look into that, anything to get me somewhere with this can be a big help. I was originally hoping to get something that can count using the colors voted with. The biggest problem I have with that is it would require the retract color (lime), whereas if you're able to track the user who's voting, you don't have to deal with players who forget to retract, or even users who try to screw with the program by putting a bunch of !votes in their post.
but yeah, I'll look into screen scraping. That should help a lot, thanks!
Using web services implies that the forums supply you with some sort of web service to read the data.
But I don't think these forums do that.
The problem you're going to have is that in order to have any program that is worth a damn you're going to have to code against fraud.
I'm not totally familiar with Phalla or how it works at all but if you're doing Regexes for something simple like !name, then it will be easy to game.
How easy it is depends on how consistent the HTML source is for the forum pages (I don't know).
It will require multiple layers of regex and a lot of thought to prevent people from doing sneaky things.
You'll probably spend more time trying to remove exploits than actually writing the base code.
we also talk about other random shit and clown upon each other
Once you have that data, you can use regular expressions to pull out names and see whether the text has the word vote or not in it. I don't know how good of a programmer you are, but then you want to use something like a hashmap (or a database if you want this persistent) to map names --> vote counts. You may also want to have a system to prevent people from voting twice.
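The hashmap idea might look like this in Python, as a sketch. It assumes you've already turned the posts into (voter, target) pairs in posting order, and count_votes is just a name I picked.

```python
from collections import Counter


def count_votes(votes):
    """votes is a list of (voter, target) pairs in posting order."""
    current = {}  # voter -> the player they are currently voting for
    for voter, target in votes:
        # A later vote overwrites the earlier one, so nobody counts twice.
        current[voter] = target
    return Counter(current.values())
```

Keying the first map on the voter is what stops double voting: repeating a vote or changing it just replaces that voter's single entry.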
If you want the whole thing posted online so that you can access it at any time to see vote counts, you'll want to put everything on the web and do it as a MySQL-backed site. Any scripting language can provide you with a simple front-end. I would then write a scraper that rolls through the post history and counts votes, and set that scraper up as a cron job that runs every hour or whatever and updates the db.
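Here's a rough idea of the database side in Python. To keep the example self-contained I'm using the built-in sqlite3 module as a stand-in for MySQL, and the table layout and function names are assumptions of mine.

```python
import sqlite3


def open_db(path):
    """Open (or create) the vote database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS votes ("
        "voter TEXT PRIMARY KEY, "
        "target TEXT NOT NULL, "
        "post_number INTEGER NOT NULL)"
    )
    return db


def record_vote(db, voter, target, post_number):
    """Store a vote, replacing whatever the voter had on record before."""
    db.execute(
        "INSERT OR REPLACE INTO votes (voter, target, post_number) VALUES (?, ?, ?)",
        (voter, target, post_number),
    )
    db.commit()


def current_tally(db):
    """Return (target, count) rows, highest count first."""
    return db.execute(
        "SELECT target, COUNT(*) FROM votes "
        "GROUP BY target ORDER BY COUNT(*) DESC, target"
    ).fetchall()
```

The scraper script would call record_vote for each vote it finds, the cron job would just run that script on a schedule, and the front-end only ever needs current_tally.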
I think this gets solved by just having a one-vote-per-person policy. If someone games the system, they've just changed their vote (assuming you can do that) and they haven't really gamed anything at all. As long as they can't affect other people's choices, there's no real way to mess with stuff. The HTML for pages is EXTREMELY consistent-- it's being generated by a computer program that's sending out the same output every single time.
Yeah but not necessarily, you could have things like div or cell IDs that vary, which makes the regex that much more complex.
You are right about the 1 vote per person per post thing, and that works as long as your post detection is reliable. I think I was headed in that direction mentally but just didn't type it out for him
e.g. the post above mine, and only the post above mine:
http://forums.penny-arcade.com/showpost.php?p=5932384&postcount=14
Maybe, but that would force you to hit the servers a lot, potentially hundreds of times, even thousands in a longer thread.
I'm not sure how the dbase works but that possibly entails not only calling post data up from it, but also querying user data like avatar and sig hundreds and hundreds of times.
It would be slow and stressful to the forum servers.
Collecting the entire page and parsing it would be more efficient I think.
The extension would scrape the page the user was reading on a button press or something.
This is because in Phallas, the user should hopefully be reading the thread anyway. So an extension that works on the page being read will not increase server load at all since, well, the user would have loaded that page in their browser anyway.
<!-- message -->
<!-- / message -->
and then after that is <!-- sig --> , etc
so it should be pretty simple to make sure only one vote is counted per post.
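A quick Python sketch of that, using the two message markers quoted above. The "!vote name" format with a single-word name, and counting the last !vote in a post rather than the first, are my assumptions.

```python
import re

# The markers are the ones quoted from the page source above.
MESSAGE_RE = re.compile(r"<!-- message -->(.*?)<!-- / message -->", re.DOTALL)
# Assumed vote format: "!vote" followed by a single-word name.
VOTE_RE = re.compile(r"!vote\s+(\w+)", re.IGNORECASE)


def message_bodies(html):
    """Return the text between each pair of message markers."""
    return [body.strip() for body in MESSAGE_RE.findall(html)]


def one_vote_per_post(html):
    """Return at most one vote target per post: the last !vote in the body."""
    votes = []
    for body in message_bodies(html):
        targets = VOTE_RE.findall(body)
        votes.append(targets[-1] if targets else None)
    return votes
```

Anything outside the markers, like a sig, never reaches VOTE_RE, and spamming !vote inside one post still yields a single entry.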
You guys have been incredibly helpful, I owe you one. If something comes up, or if I have an alpha version of the program available, I'll probably bring it up to see if there's anything I can do to make it more efficient.
Mostly it's just a matter of looking at the HTML source of the given pages until you can figure out a reliable way to extract the needed data out of every post, and then just hardcode your extraction methods. The only problem is if the showthread.php script changes how it generates HTML your program is instantly useless (most likely).
I don't know anything about Phallas, but DeathPrawn and Monoxide made good suggestions.