
Word clouds

AntoineAntoine __BANNED USERS regular
edited November 2010 in H.Q. Reception Desk
When you mouse over a thread title, a little word cloud should pop up (similar to how the first post's text pops up) to help you see what the hell the thread is really talking about at the moment.

Also makes it easy to see if a thread has derailed hard, without really reading the posts.

There's just too many threads to read these days.

Antoine on

Posts

  • AldoAldo Hippo Hooray Registered User regular
    edited November 2010
    I don't get it. Don't the alt-text pop-ups on hover already show the first few words of the OP? That's usually an indication of what the thread is about. A way to see if a thread has gone off-topic without having to read it sounds like dark majick.

    Aldo on
  • BahamutZEROBahamutZERO Registered User regular
    edited November 2010
    do you mean a word cloud sort of like what the comic archive search page does? http://www.penny-arcade.com/archive/

    BahamutZERO on
  • AntoineAntoine __BANNED USERS regular
    edited November 2010
    Allow me to illustrate:

    [image: wordcloud.png]

    This is what you would see, as you moused over a thread.

    Just by looking at those words, and their popularity indicated by their relative sizes, could you guess what the thread is talking about?

    Yup.

    But would you have to read a single post to figure that out?

    Nope.

    What does this mean for PA?

    It means mods can better identify the threads that are going way too off topic. For instance, that Olbermann thread is probably going off topic. There's some shit in there that doesn't make sense. This saves time, and thus, saves money.

    It also helps us, the posters. Many times, we don't even read posts. We just skim them, and we skim them for a while, looking for threads that look interesting. Wouldn't it be great if we could just "taste" a thread before we even open it? It would allow us to find good threads quicker, without reading a bunch of bullshit first.

    This is especially useful too for stuff like chat threads or pretty much any thread in SE++, where the topic may be very volatile and subject to change from post to post. A word cloud would give us an indication of where the thread is at.
    Aldo wrote: »
    I don't get it. Don't the alt-text pop-ups on hover already show the first few words of the OP? That's usually an indication of what the thread is about.

    Ha, that's incredibly naive.

    We are in the year of two thousand ten, in the third millennium of human history. Threads and conversations are no longer moving at mere internet speeds, they are moving at mobile speeds.

    Thread posts are like stars: the earliest posts in the thread are like the light of distant stars just reaching us. They look up to date, but in fact, those thoughts and conversations died long ago, and we are now talking about new things.

    Antoine on
  • FyreWulffFyreWulff YouRegistered User, ClubPA regular
    edited November 2010
    That's really database intensive though, because

    1) you'd have to iterate through every word in the thread - this is just as intensive as running a search

    2) and just like search, you'd have to drop indexing 2 letter and 3 letter words

    3) to keep it from killing the forum, you'd have to cache the 'tag cloud', and in fast moving threads this would make the Tag Cloud of Relevance lag too far behind to be useful.
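The caching trade-off in point 3 can be sketched roughly like this. It is only an illustration, not how the forum software works: `CloudCache`, `build_cloud`, and `max_age_seconds` are hypothetical names, and the five-minute default is an arbitrary assumption. A cached cloud is reused until it is older than the limit, which is exactly the lag that hurts in fast-moving threads.

```python
import time


class CloudCache:
    """Hypothetical per-thread cache: rebuild a cloud only once it is too old."""

    def __init__(self, build_cloud, max_age_seconds=300.0, clock=time.monotonic):
        self._build = build_cloud        # expensive function: thread_id -> cloud
        self._max_age = max_age_seconds  # how stale a cloud is allowed to get
        self._clock = clock              # injectable so the sketch can be tested
        self._entries = {}               # thread_id -> (built_at, cloud)

    def get(self, thread_id):
        now = self._clock()
        entry = self._entries.get(thread_id)
        if entry is not None and now - entry[0] < self._max_age:
            # Still fresh enough: serve the cached cloud, even if the
            # thread has moved on since it was built.
            return entry[1]
        cloud = self._build(thread_id)
        self._entries[thread_id] = (now, cloud)
        return cloud
```

A shorter `max_age_seconds` keeps the cloud closer to the live thread at the cost of more rebuilds; a longer one is cheaper but lags further behind.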

    We DID have a manual tagging feature that came as part of the new forum upgrade, but it was disabled. Maybe it can come back some day? (It probably won't, though)

    FyreWulff on
  • Captain CarrotCaptain Carrot Alexandria, VARegistered User regular
    edited November 2010
    Not on this forum. Some threads move very quickly, but most don't.

    Captain Carrot on
  • AntoineAntoine __BANNED USERS regular
    edited November 2010
    FyreWulff wrote: »
    That's really database intensive though, because

    1) you'd have to iterate through every word in the thread - this is just as intensive as running a search

    2) and just like search, you'd have to drop indexing 2 letter and 3 letter words

    3) to keep it from killing the forum, you'd have to cache the 'tag cloud', and in fast moving threads this would make the Tag Cloud of Relevance lag too far behind to be useful.

    We DID have a manual tagging feature that came as part of the new forum upgrade, but it was disabled. Maybe it can come back some day? (It probably won't, though)


    Not really that intense.

    See, what you do is, every time someone makes a new post or edits a post, you iterate through every word in their post and use the word as a "key" for a hash table of values. Basically, every time a word is found in the hash table dictionary, you see if it has a stored count value, and if it does, increment it by 1, and if not, set the value to 1. Or something.

    Later, you would take the values and display the word cloud words based on some parameters, for instance, words would start showing up after a certain count value (which may or may not depend on how long the thread is), and the font sizes of the words would increase as the value gets higher.
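A minimal sketch of the counting and display steps described above, assuming one in-memory counter per thread. All the names here (`add_post`, `edit_post`, `cloud`, `MIN_WORD_LENGTH`) and the pixel sizes are made up for illustration. One detail the description glosses over: on an edit, the old text's words have to be subtracted before the new text's words are added, or the counts drift.

```python
import re
from collections import Counter

MIN_WORD_LENGTH = 4  # assumption: skip 2- and 3-letter words, as search does


def words_in(post_text):
    """Lowercase the post and keep only words long enough to be interesting."""
    found = re.findall(r"[a-z']+", post_text.lower())
    return [w for w in found if len(w) >= MIN_WORD_LENGTH]


def add_post(counts, post_text):
    """New post: bump the count of every word in it."""
    counts.update(words_in(post_text))


def remove_post(counts, post_text):
    """Undo a post's contribution and drop words that fall to zero."""
    counts.subtract(words_in(post_text))
    for word in [w for w, c in counts.items() if c <= 0]:
        del counts[word]


def edit_post(counts, old_text, new_text):
    """Edited post: subtract the old words, then add the new ones."""
    remove_post(counts, old_text)
    add_post(counts, new_text)


def cloud(counts, min_count=2, min_px=10, max_px=32):
    """Map each word seen at least min_count times to a font size in pixels."""
    visible = {w: c for w, c in counts.items() if c >= min_count}
    if not visible:
        return {}
    span = max(max(visible.values()) - min_count, 1)
    return {
        w: min_px + (c - min_count) * (max_px - min_px) // span
        for w, c in visible.items()
    }
```

The work per post is proportional to the length of that post, not the length of the thread. Whether that is cheap enough next to everything else the forum does is a separate question this sketch does not answer.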

    Antoine on
  • FyreWulffFyreWulff YouRegistered User, ClubPA regular
    edited November 2010
    Yes, but forum features must be throttled/designed for worst case scenarios, otherwise fast threads like chat threads or live-event threads would impact the rest of the forum negatively.

    FyreWulff on
  • FyreWulffFyreWulff YouRegistered User, ClubPA regular
    edited November 2010
    Antoine wrote: »
    FyreWulff wrote: »
    That's really database intensive though, because

    1) you'd have to iterate through every word in the thread - this is just as intensive as running a search

    2) and just like search, you'd have to drop indexing 2 letter and 3 letter words

    3) to keep it from killing the forum, you'd have to cache the 'tag cloud', and in fast moving threads this would make the Tag Cloud of Relevance lag too far behind to be useful.

    We DID have a manual tagging feature that came as part of the new forum upgrade, but it was disabled. Maybe it can come back some day? (It probably won't, though)


    Not really that intense.

    See, what you do is, every time someone makes a new post or edits a post, you iterate through every word in their post and use the word as a "key" for a hash table of values. Basically, every time a word is found in the hash table dictionary, you see if it has a stored count value, and if it does, increment it by 1, and if not, set the value to 1. Or something.

    Later, you would take the values and display the word cloud words based on some parameters, for instance, words would start showing up after a certain count value (which may or may not depend on how long the thread is), and the sizes of the words would increase as the value gets higher.

    The PA forums are really big and really intensive, and they have to live alongside the money-making front page of the site without impacting it in any way, shape, or form.

    There's a reason why the search timer is almost two minutes long (well, the last time I actually used in-forum search anyway - a lot of people just use Google search with parameters set to just search the forums).

    If you could figure out how to make indexing faster than what exists out there without needing a Google datacenter, you would win a Nobel easily.

    FyreWulff on
  • AntoineAntoine __BANNED USERS regular
    edited November 2010
    FyreWulff wrote: »
    Yes, but forum features must be throttled/designed for worst case scenarios, otherwise fast threads like chat threads or live-event threads would impact the rest of the forum negatively.


    Keeping count of words used in a thread would be very fast I suspect, especially with a hash table. Increasing the count of a word is roughly O(1) on average.

    I think indexing for search requires a shitload more work.

    The only bottleneck I see is then iterating through the entire hash table and finding the words with the highest values.

    I suspect though, there may be a good way to do that shit. Maybe throw each word entry into a binary heap where the count value is its weight. That would let you peek at the highest-count word in O(1) time, though actually popping it out costs O(log n).

    But shit, then you'd have to get all the words in ABC order...
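One way the "highest counts, then ABC order" step might look, as a sketch rather than a recommendation. `top_words_alphabetical` and the default of 30 words are hypothetical. It uses the standard library's `heapq.nlargest`, which keeps a small heap of size k while scanning the counts (about O(n log k)), and then sorts only those k words alphabetically.

```python
import heapq


def top_words_alphabetical(counts, k=30):
    """Pick the k most-used words, then order them A to Z for display."""
    # Selection: heap-based, so the whole table is never fully sorted.
    top = heapq.nlargest(k, counts.items(), key=lambda item: item[1])
    # Display order: sorting k words is cheap when k is small.
    return sorted(top, key=lambda item: item[0])
```

For example, `top_words_alphabetical({"news": 5, "olbermann": 4, "zebra": 1, "apple": 3}, k=3)` keeps `apple`, `news`, and `olbermann` and returns them in that order.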

    Antoine on
  • TubeTube Registered User admin
    edited November 2010
    Antoine wrote: »
    Ha, that's incredibly naive.

    Don't be rude in my forum. I don't appreciate it.

    We have no plans to implement your suggestion.

    Tube on
This discussion has been closed.