
Tryin' to make a semi-complex program, input desired

Obbi Registered User, ClubPA regular
edited June 2008 in Help / Advice Forum
Alright! So, here's the deal. About a week ago in Critical Failures I made a wild claim that I was working on some program to replace this dumb guy who everybody loves.

They think he's great because he can count votes in Phalla. Like, super good. The man is essentially a machine.

However, I made this claim that I was working on a vote counting program so that I could hang that over his head, to show the capabilities of a real algorithm! Unfortunately, I'm stumped.

I have no clue as to what coding language would be best suited for this project. PHP with XML? Java? I didn't even consider those, and I've already begun testing some things out using ASP.NET with VB.NET for the sake of laziness.

Basically all this is for naught without two questions being answered:

1.) How do I make a program that allows the user to enter the URL they want scanned for votes?
2.) Is this even kosher for the forum? It's been pulled off before with a program that scanned a single thread for pictures and downloaded them, but that was a while ago.

Essentially, I'm looking for some insight on what language I should be going for if this sort of thing isn't possible using ASP.NET, or, if I'm on the right track, some guidance with using Google's API service. Either would be greatly appreciated.

Thanks to anybody takin' the time to look at this. I'm scourin' around lookin' for some good tutorials, templates, and other things that can get me started on this project, and I'm makin' a small bit of headway, just not enough to really tell if MrBlarney's days are numbered.


Posts

    ecco the dolphin Registered User regular
    edited June 2008
    Since it's a small project and you're the sole developer, I'd recommend that, if you go ahead with this, you pick whichever language/platform/environment you are familiar with. The end user(s) won't care what language it's written in, as long as it runs.

    Which environments/languages are you familiar with?

    I was thinking about doing something similar in response to Hylianbunny's repeated changing of votes in day 1 of thorgot's current Phalla. Sort of an escalation of vote-changing vs. vote-counting tools, if you will. I'm still tossing up whether it's a good thing to do or not, to be honest.

    Penny Arcade Developers at PADev.net.
    Obbi Registered User, ClubPA regular
    edited June 2008
    I'm most familiar with the .NET environment, so I'm looking into importing service references to use as a customized search engine.

    As for what this project entails, it'd have to gather a lot of info for it to go down smoothly, a lot more than can be handled with no complications whatsoever. On top of having it search for !* (where * is any name on the player roster the user imports for the session), it'll have to keep track of who voted for whom so that retractions are accounted for properly.
    That, and it'll need some sort of array that lets nicknames redirect to the actual username (Zot redirecting to Cold Salmon and Hatred).

    But all that's just theorizing if I can't get the damn thing off the ground first. I've tested out importing Google's search engine, and all I've managed to do so far is make a local version of Google Fight.

    So basically I'm just kinda hoping for somebody who's done this before to kinda give me a push in the right direction as far as the initial web referencing thing goes.

    ecco the dolphin Registered User regular
    edited June 2008
    You might also want to try this thread. It's the programming help thread in Moe's Tech Tavern. I'm not sure if the guys who check that thread also read H/A, so you might get some responses there too.

    Legionnaired Registered User regular
    edited June 2008
    You can one-line this on the Unix command line with something like:
    wget -q -O - '<URL>' | grep -i '!vote'    # dump the page to stdout, keep lines containing !vote in any case
    

    CrystalMethodist Registered User regular
    edited June 2008
    Sorry, can you explain exactly what you're doing? I don't do RPG stuff so I don't quite understand what the program needs to do (and I'm betting that I'm not alone).

    Shell scripting isn't a bad idea, but you may want to make the program more robust than what was posted. If the task is "given a thread, count how many people say 'vote'", then something like Perl or Ruby might work well too. You can basically write regexes to scrape the HTML in the page and break it into a list of {post #, poster, text}, and then have a reasonable way to scan the text part for the word vote.
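    A minimal Python sketch of that last step, assuming the HTML has already been broken into posts. The `Post` tuple, the `find_votes` helper, and the `!vote <name>` syntax (the name running to the end of the line) are illustrative assumptions, not the forum's actual markup.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class Post(NamedTuple):
    number: int   # post number within the thread
    poster: str   # username of whoever wrote the post
    text: str     # post body with the HTML already stripped


# Assumed vote syntax: "!vote" followed by a name that runs to the end of the line.
VOTE_RE = re.compile(r"!vote[ \t]+([^\n]+)", re.IGNORECASE)


def find_votes(posts: Iterable[Post]) -> Iterator[Tuple[int, str, str]]:
    """Yield (post number, poster, target) for every !vote in every post."""
    for post in posts:
        for match in VOTE_RE.finditer(post.text):
            yield post.number, post.poster, match.group(1).strip()


posts = [
    Post(1, "Alice", "Morning all.\n!vote Bob\nhe's been quiet"),
    Post(2, "Bob", "No votes from me yet."),
    Post(3, "Carol", "!VOTE Alice"),
]
print(list(find_votes(posts)))  # [(1, 'Alice', 'Bob'), (3, 'Carol', 'Alice')]
```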

    Explain what you're doing a bit more and tell us what languages and platforms you've used.

    Obbi Registered User, ClubPA regular
    edited June 2008
    Ah, sorry, I didn't want to bury the first post in every detail of what I'm tryin' to accomplish here.
    Phalla is a forum-based game that runs often on these boards. The main idea is that there's a "mafia" hidden within the playerbase, and they need to be removed from the game for the good side to win. The core mechanic is that users declare a !vote in the thread, and the person with the most votes gets kicked out of the game.

    Generally, dealing with votes isn't a huge hassle in a small game, but sometimes the playerbase goes past 70 or so, and when you're short on time to go through all the votes, it gets a bit taxing. I figure a program that can count votes for you could be a big help.

    Basically, the program boils down to this:
    • Able to scan multiple pages of a thread that the user submits: for example, the user submits page 30 of a specific thread (page 30 being the page that holds the announcement that a new round has started and new votes are being placed), and the program goes to that page, scans it for votes, and continues to the next page in the thread until there are no pages left. There may be a way for it to scan only from page X to page Y, but that's not necessary.
    • Able to store the list of players as a local data set for the game: that would let the program search with a wildcard that matches any name in the stored array of player names. On top of this, there needs to be a way for "known nicknames" to redirect to the proper player name so the program does not pass over someone. This could be handled differently, but then the user would have to do some sorting on their own to get it formatted properly.
    • Able to make notes of who voted for whom: this would give the user a decent vote-tracking record when the time comes to try and figure out who could be "mafia". On top of this, since players are allowed to change their votes at any time before the allotted time runs out, the program will have to be able to subtract a vote from one player and add it to the new target.
    • Able to format the visible data for readability: simply put, the program would need to be able to make a list of everyone who was voted on, how many votes each player has on them, and everyone who voted for these players. This isn't strictly necessary for the program to function, but the program would be a lot less useful if you couldn't make sense of what you're looking at.
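    The roster-plus-nicknames idea from the second bullet could be sketched in Python as a single case-insensitive lookup table. The function names and sample players below are hypothetical. In practice the roster would come from the user's copy-and-paste of the player list.

```python
def build_lookup(roster, nicknames):
    """Map every casefolded player name and nickname to the real player name."""
    lookup = {name.casefold(): name for name in roster}
    for nick, real in nicknames.items():
        if real not in roster:
            raise ValueError(f"nickname {nick!r} points at unknown player {real!r}")
        lookup[nick.casefold()] = real
    return lookup


def resolve(raw_name, lookup):
    """Return the real player name for a vote target, or None if unrecognized."""
    return lookup.get(raw_name.strip().casefold())


roster = ["Alice", "Bob", "Carol"]
nicknames = {"Al": "Alice", "Bobby": "Bob"}  # hypothetical "known nicknames"
lookup = build_lookup(roster, nicknames)
print(resolve("bobby", lookup), resolve(" ALICE ", lookup), resolve("Zed", lookup))
# prints: Bob Alice None
```

    Returning None instead of guessing lets the caller list unrecognized names separately for the user to sort out by hand.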

    In short:
    1. The user imports the player list from the Phalla thread (via copy & paste, most likely).
    2. The user submits the URL where they want to begin tracking votes.
    3. The program attempts the connection and, if successful, begins looking for votes.
    4. If votes are found, it adds one to the count of whoever was the target of each vote. If it finds a vote placed by someone who has already voted, it subtracts from their previous target and adds to the new target.
    5. If no further votes are found on the page, it attempts to move on to the next page in the thread and continues the sequence.
    6. If it is impossible to move on to another page (presumably because there are no more pages to move to), the program stops the search and makes a list of the players who were voted on, ranked from most votes to least.
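    Steps 4 to 6 could look roughly like this in Python, assuming some earlier code has already turned each page into a list of (voter, target) pairs. The `tally` function and the sample names are hypothetical, and the page fetching of steps 3 and 5 is left out.

```python
from collections import Counter


def tally(pages):
    """Count votes page by page, moving a voter's vote when they change it.

    `pages` is any iterable of pages, each a list of (voter, target) pairs
    in thread order. A real version would build it by fetching page after
    page until the thread has no next page (steps 3 and 5).
    """
    current = {}        # voter -> who they are voting for right now
    counts = Counter()  # target -> number of votes on them
    for page_votes in pages:
        for voter, target in page_votes:
            previous = current.get(voter)
            if previous is not None:
                counts[previous] -= 1   # step 4: take the old vote back...
            current[voter] = target
            counts[target] += 1         # ...and give it to the new target
    # Step 6: most votes first, dropping anyone whose count fell back to zero.
    return [(target, n) for target, n in counts.most_common() if n > 0]


pages = [
    [("Alice", "Bob"), ("Carol", "Bob")],    # page 30
    [("Dave", "Alice"), ("Alice", "Dave")],  # page 31: Alice switches to Dave
    [("Bob", "Dave")],                       # page 32
]
print(tally(pages))  # [('Dave', 2), ('Bob', 1), ('Alice', 1)]
```

    Keeping a voter-to-target map alongside the counts is what makes vote changes work without relying on retraction colors. The same map doubles as the "who voted for whom" record from the third bullet.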

    That's essentially what I'm trying to pull off. Right now I'm attempting this using ASP.NET with VB.NET, importing Google's API service and trying to customize it into the sort of search engine I need.

    Monoxide Registered User, ClubPA regular
    edited June 2008
    I think you're kind of making this way more difficult than it has to be.

    Why not just write a simple screen scraping app that looks for instances of each Player's Name in each page?

    Aren't most phalla votes green or red (or some other color)? Subtract from their count with red and add with green. That way you don't have to worry about who is voting for who, just the tally.

    This way you have the start of a thread as input, it scrapes the page searching for each user, adding on green and subtracting on red. Then it goes to the next page, and does the same.
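    A rough Python sketch of that color-based tally. The markup it matches (`<font color="green">!vote Name</font>`) is a guess at how colored votes appear in the page source, so check a real page before relying on it. A running `Counter` is passed from one page to the next.

```python
import re
from collections import Counter

# Guessed markup for a colored vote; the "!vote" prefix inside the tag is optional.
COLORED_VOTE_RE = re.compile(
    r'<font color="?(green|red)"?>\s*(?:!vote\s+)?([^<]+?)\s*</font>',
    re.IGNORECASE,
)


def color_tally(html, counts=None):
    """Add one for each green name and subtract one for each red name."""
    counts = Counter() if counts is None else counts
    for color, name in COLORED_VOTE_RE.findall(html):
        counts[name] += 1 if color.lower() == "green" else -1
    return counts


page = ('<font color="green">!vote Bob</font> ... '
        '<font color="red">!vote Bob</font> ... '
        '<font color="green">!vote Alice</font>')
print(color_tally(page))
```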

    If this is going to be a public application, you should probably cache local copies of each page so you don't need to keep hitting the server every time someone wants to check their thread's tally.
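    A minimal sketch of that local cache in Python, keyed on the page URL. The cache directory name and the `cached_fetch` helper are hypothetical, and `fetch` can be swapped out for testing.

```python
import hashlib
import pathlib
import urllib.request


def download(url):
    """Fetch one page from the forum (one HTTP request)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def cached_fetch(url, cache_dir="page_cache", fetch=download):
    """Return the page for `url`, downloading it only if no local copy exists."""
    cache = pathlib.Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL so query strings like ?t=1&page=2 become safe file names.
    path = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

    One caveat: the last page of a live thread keeps growing, so that page should be refetched rather than served from the cache.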

    Monoxide Registered User, ClubPA regular
    edited June 2008
    You know, what you should actually do is just search for any terms that match the vote colors instead of checking each page against the player list, so you're only making one pass through each page. Then at the end you could check those tallies against the player list and display the extraneous ones separately, so if someone misspells someone's name, you can manually add it to the tally.

    edit: I guess if the colors aren't always accurate you could still do one pass per person on each page; just make sure you're not re-downloading the page for each player, or you're going to add a fuckload of unnecessary strain on your own server and PA's.
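    The "check the tallies against the player list at the end" step might look like this in Python. The counts would come from one pass over each downloaded page, and `split_by_roster` and the sample names are hypothetical.

```python
def split_by_roster(counts, roster):
    """Split tallies into names on the roster and names that need a manual look."""
    known_names = {name.casefold() for name in roster}
    known, unknown = {}, {}
    for name, n in counts.items():
        if name.casefold() in known_names:
            known[name] = n
        else:
            unknown[name] = n   # likely a typo or nickname; show it to the user
    return known, unknown


known, extraneous = split_by_roster({"Bob": 2, "Bbo": 1, "alice": 1}, ["Alice", "Bob"])
print(known)       # {'Bob': 2, 'alice': 1}
print(extraneous)  # {'Bbo': 1}
```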

    Obbi Registered User, ClubPA regular
    edited June 2008
    I'll admit I've made this far more complicated than it has to be. Screen scraping was definitely something I didn't think of, as anything I've looked up now is all like "Use Web Services, dawg!"

    I'll look into that; anything to get me somewhere with this can be a big help. I was originally hoping to get something that can count using the colors voted with. The biggest problem I have with that is it would require the retract color (lime), whereas if you're able to track the user who's voting, you don't have to deal with players who forget to retract, or even users who try to screw with the program by putting a bunch of !votes in their post.

    But yeah, I'll look into screen scraping. That should help a lot, thanks!

    Jasconius sword criminal mad online Registered User regular
    edited June 2008
    Obbi wrote: »
    as anything I've looked up now is all like "Use Web Services, dawg!"

    Using web services implies that the forums supply you with some sort of web service to read the data.

    But I don't think these forums do that.

    The problem you're going to have is that in order to have any program that is worth a damn you're going to have to code against fraud.

    I'm not totally familiar with Phalla or how it works at all, but if you're doing regexes for something simple like !name, then it will be easy to game.

    How easy it is depends on how consistent the HTML source is for the forum pages (I don't know).

    It will require multiple layers of regex and a lot of thought to prevent people from doing sneaky things.

    You'll probably spend more time trying to remove exploits than actually writing the base code.
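    One layer of that defense, sketched in Python: instead of accepting any text after `!vote`, build the pattern from the roster so that only real player names match. `roster_vote_pattern` is a hypothetical helper, and it does nothing about other tricks such as votes hidden inside quoted posts.

```python
import re


def roster_vote_pattern(roster):
    """Compile a regex that only accepts '!vote <player>' for players on the roster."""
    # Longest names first, so "Bobby" is tried before "Bob".
    names = sorted(roster, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # (?!\w) stops "Bob" from matching the start of "Bobcat".
    return re.compile(r"!vote\s+(" + alternation + r")(?!\w)", re.IGNORECASE)


pattern = roster_vote_pattern(["Bob", "Bobby", "Alice"])
print(pattern.findall("!vote Bobby, !vote bob. !vote Bobcat !vote Mallory"))
# ['Bobby', 'bob']
```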

    CrystalMethodist Registered User regular
    edited June 2008
    If you don't know about them, learn about regular expressions. Using those bad boys, you can find the URLs of the rest of the pages, the number of pages in the thread, the separate posts, etc.
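    For example, a regex can read the page count and build the remaining page URLs. This Python sketch assumes two things to verify against the real HTML: each page shows text like "Page 30 of 32", and thread pages are addressed as `showthread.php?t=<id>&page=<n>`. The thread ID below is a placeholder.

```python
import re
from urllib.parse import urlencode

# Assumed pagination text on each page, e.g. "Page 30 of 32".
PAGE_COUNT_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")
BASE = "http://forums.penny-arcade.com/showthread.php"


def page_urls(html, thread_id, start_page):
    """List URLs from start_page through the last page the HTML reports."""
    match = PAGE_COUNT_RE.search(html)
    last_page = int(match.group(2)) if match else start_page
    return [f"{BASE}?{urlencode({'t': thread_id, 'page': page})}"
            for page in range(start_page, last_page + 1)]


urls = page_urls("<td>Page 30 of 32</td>", thread_id=12345, start_page=30)
print(urls[0])    # http://forums.penny-arcade.com/showthread.php?t=12345&page=30
print(len(urls))  # 3
```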

    Once you have that data, you can use regular expressions to pull out names and see whether the text has the word vote or not in it. I don't know how good of a programmer you are, but then you want to use something like a hashmap (or a database if you want this persistent) to map names --> vote counts. You may also want to have a system to prevent people from voting twice.

    If you want the whole thing posted online so that you can access it at any time to see vote counts, you'll want to put everything on the web and do it as a MySQL-backed site. Any scripting language can provide you with a simple front-end. I would then write a scraper that rolls through the post history and counts votes, and set that scraper up as a cron job that runs every hour or whatever and updates the db.
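    As a rough sketch of the storage side, here is the same idea with Python's built-in `sqlite3` standing in for MySQL. This is a swap made to keep the example self-contained, and the table layout is hypothetical. Keying on (thread, voter) makes the hourly rerun harmless: an older post never overwrites a newer vote. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the vote store; one row per voter per thread."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS votes (
        thread_id INTEGER, voter TEXT, target TEXT, post_no INTEGER,
        PRIMARY KEY (thread_id, voter))""")
    return db


def record_vote(db, thread_id, voter, target, post_no):
    """Insert a vote, replacing the stored one only if this post is newer."""
    db.execute("""INSERT INTO votes VALUES (?, ?, ?, ?)
        ON CONFLICT(thread_id, voter) DO UPDATE
        SET target = excluded.target, post_no = excluded.post_no
        WHERE excluded.post_no > votes.post_no""",
               (thread_id, voter, target, post_no))
    db.commit()


def totals(db, thread_id):
    """Vote counts per target, most votes first."""
    return db.execute("""SELECT target, COUNT(*) AS n FROM votes
        WHERE thread_id = ? GROUP BY target ORDER BY n DESC, target""",
                      (thread_id,)).fetchall()
```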

    CrystalMethodist Registered User regular
    edited June 2008
    Jasconius wrote: »
    Obbi wrote: »
    as anything I've looked up now is all like "Use Web Services, dawg!"

    Using web services implies that the forums supply you with some sort of web service to read the data.

    But I don't think these forums do that.

    The problem you're going to have is that in order to have any program that is worth a damn you're going to have to code against fraud.

    I'm not totally familiar with Phalla or how it works at all but if you're doing Regexes for something simple like !name, then it will be easy to game.

    How easy it is depends on how consistent the HTML source is for the forum pages (I don't know).

    It will require multiple layers of regex and a lot of thought to prevent people from doing sneaky things.

    You'll probably spend more time trying to remove exploits than actually writing the base code.

    I think this gets solved by just having a one-vote-per-person policy. If someone games the system, they've just changed their vote (assuming you can do that) and they haven't really gamed anything at all. As long as they can't affect other people's choices, there's no real way to mess with stuff. The HTML for pages is EXTREMELY consistent-- it's being generated by a computer program that's sending out the same output every single time.

    Jasconius sword criminal mad online Registered User regular
    edited June 2008
    CrystalMethodist wrote: »
    I think this gets solved by just having a one-vote-per-person policy. If someone games the system, they've just changed their vote (assuming you can do that) and they haven't really gamed anything at all. As long as they can't affect other people's choices, there's no real way to mess with stuff. The HTML for pages is EXTREMELY consistent-- it's being generated by a computer program that's sending out the same output every single time.

    Yeah, but not necessarily: you could have things like div or cell IDs that vary, which makes the regex that much more complex.

    You are right about the 1 vote per person per post thing, and that works as long as your post detection is reliable. I think I was headed in that direction mentally but just didn't type it out for him ;)
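    A sketch of "one vote per person per post" in Python: if a post contains several `!vote` lines, only the last one counts, so stuffing a post with votes gains nothing. It assumes a vote is written as `!vote` followed by a name on the same line, and `vote_from_post` is a hypothetical helper.

```python
import re

# Assumed syntax: "!vote" then a name, stopping at a line break or HTML tag.
VOTE_RE = re.compile(r"!vote[ \t]+([^\n<]+)", re.IGNORECASE)


def vote_from_post(post_text):
    """Return the target of the last !vote in a post, or None if it has none."""
    targets = VOTE_RE.findall(post_text)
    return targets[-1].strip() if targets else None


print(vote_from_post("!vote Alice\n!vote Bob\n!vote Carol"))  # Carol
print(vote_from_post("Just chatting."))                        # None
```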

    ecco the dolphin Registered User regular
    edited June 2008
    Post detection can be made extremely reliable by using the showpost.php URL that these forums have.

    e.g. the post above mine, and only the post above mine:

    http://forums.penny-arcade.com/showpost.php?p=5932384&postcount=14

    Jasconius sword criminal mad online Registered User regular
    edited June 2008
    eecc wrote: »
    Post detection can be made extremely reliable by using the showpost.php URL that these forums have.

    e.g. the post above mine, and only the post above mine:

    http://forums.penny-arcade.com/showpost.php?p=5932384&postcount=14

    Maybe, but that would force you to hit the servers a lot, potentially hundreds of times, even thousands in a longer thread.

    I'm not sure how the dbase works but that possibly entails not only calling post data up from it, but also querying user data like avatar and sig hundreds and hundreds of times.

    It would be slow and stressful to the forum servers.

    Collecting the entire page and parsing it would be more efficient I think.

    ecco the dolphin Registered User regular
    edited June 2008
    I agree - when I was thinking about implementing something similar, I followed a similar chain of thought, and came to the conclusion that a Firefox/Opera/IE extension would be the best solution.

    The extension would scrape the page the user was reading on a button press or something.

    This is because in Phallas, the user should hopefully be reading the thread anyway. So an extension that works on the page being read will not increase server load at all since, well, the user would have loaded that page in their browser anyway.

    Monoxide Registered User, ClubPA regular
    edited June 2008
    If you take a look at the page source, each post is wrapped in HTML comments, like so:

    <!-- message -->
    <!-- / message -->

    and after that comes <!-- sig -->, etc.

    So it should be pretty simple to make sure only one vote is counted per post.
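    Using those comment markers, splitting a page into posts takes one non-greedy regex. A Python sketch, assuming every post body sits between exactly those two comments. Signatures after `<!-- sig -->` fall outside the match, so a vote in someone's signature is not counted.

```python
import re

# re.DOTALL lets ".*?" run across the line breaks inside a post body.
MESSAGE_RE = re.compile(r"<!-- message -->(.*?)<!-- / message -->", re.DOTALL)


def post_bodies(html):
    """Return the body of each post on the page, in page order."""
    return [body.strip() for body in MESSAGE_RE.findall(html)]


page = ("<!-- message -->\nFirst post\n<!-- / message -->\n<!-- sig -->!vote Nobody\n"
        "<!-- message -->\n!vote Bob\n<!-- / message -->")
print(post_bodies(page))  # ['First post', '!vote Bob']
```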

    DeathPrawn Registered User regular
    edited June 2008
    In terms of watching out for people who vote multiple times or change their vote, just store the votes in a hash table / associative array / whatever it's called in your language of choice. When you get to what you recognize as a vote, add it to the array with the key being that person's username (i.e. votes["John Smith"] contains John Smith's vote). If a person posts more than one vote for whatever reason, the most recent vote will automatically override the others. It's easy to get a vote total, and if you do it right it shouldn't be tough to get the voting record of any individual.
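    That suggestion in Python, where a dict is the hash table. The input format (a list of (voter, target) pairs in thread order) is an assumption, and the names are placeholders.

```python
from collections import Counter


def count_votes(votes_in_order):
    """Keep each voter's latest vote, plus their full voting record."""
    latest = {}   # voter -> current target; a later vote overwrites an earlier one
    record = {}   # voter -> every target they voted for, in order
    for voter, target in votes_in_order:
        latest[voter] = target
        record.setdefault(voter, []).append(target)
    return Counter(latest.values()), record


totals, record = count_votes([
    ("John Smith", "Alice"),
    ("Jane Doe", "Bob"),
    ("John Smith", "Bob"),
])
print(totals)                # Counter({'Bob': 2})
print(record["John Smith"])  # ['Alice', 'Bob']
```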

    Signature not found.
    Obbi Registered User, ClubPA regular
    edited June 2008
    This is excellent. I can pick things like this up easily enough; it's just hard to ask the right questions in programming.

    You guys have been incredibly helpful, I owe you one. If something comes up, or if I have an alpha version of the program available, I'll probably bring it up to see if there's anything I can do to make it more efficient.

    Sushisource Registered User regular
    edited June 2008
    I've done a few things like this using C# in combination with regexes.

    Mostly it's just a matter of looking at the HTML source of the given pages until you can figure out a reliable way to extract the needed data from every post, and then just hardcoding your extraction methods. The only problem is that if the showthread.php script changes how it generates HTML, your program is (most likely) instantly useless.
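    One cheap guard against that, sketched in Python: check that the markers your extraction code depends on are actually present, and stop with an error when they are not, instead of silently reporting zero votes. The markers listed here are the `<!-- message -->` comments mentioned earlier in the thread. Swap in whatever your own regexes rely on.

```python
REQUIRED_MARKERS = ("<!-- message -->", "<!-- / message -->")


def check_layout(html, markers=REQUIRED_MARKERS):
    """Raise instead of silently counting zero votes if the page layout changed."""
    missing = [marker for marker in markers if marker not in html]
    if missing:
        raise RuntimeError(f"page layout may have changed; missing {missing}")


check_layout("<!-- message -->hi<!-- / message -->")  # passes silently
try:
    check_layout("<div class='post'>hi</div>")
except RuntimeError as err:
    print(err)
```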

    I don't know anything about Phallas, but DeathPrawn and Monoxide made good suggestions.

    Some drugee on Kavinsky's 1986