[Programming] Kafkaesque rabbits in the queue at the pub

Posts

  • Options
    ecco the dolphinecco the dolphin Registered User regular
    Monday Morning.

    Get latest from source control.
    Build.
    Run.
    Exceptions thrown from failed db migrations.
    Hm. Maybe I accidentally forked the db schema during development on Friday? Oh well, no big deal.
    Restore db from a full copy a month ago.
    Run.
    Exceptions thrown from failed db migrations, again.
    Wot?
    Manually apply migrations, because I need to finish my own dev, darn it.
    Yes, able to get to the start screen.
    Log in to test changes.
    Db connection failures - everything is broken oh my god Monday whhhyyyy

    Penny Arcade Developers at PADev.net.
  • Options
    OrcaOrca Also known as Espressosaurus WrexRegistered User regular
    Sounds like someone's got a case of the Mondays!

  • Options
    ecco the dolphinecco the dolphin Registered User regular
    Okay.

    I think it was a terrible case of cascaded failures.

    From a quick and cursory glance, a POCO was updated to the latest schema.

    However, this POCO was used in a DB transaction before the migrations were run for some reason?

    Of course, the DB didn't contain the columns necessary for the POCO, so an exception was thrown.

    The exception handler caught it, and... I suspect that by changing flow control to the exception handler, it meant that the migrations weren't run.

    But it let the server continue running, which hit the login screen okay, because nothing on the front page used the updated db schema.

    Once I logged in though, all hell broke loose due to missing columns etc.
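
    A minimal sketch of a startup order that avoids this kind of cascade, assuming a SQLite database and a hypothetical two-step migration list (neither comes from the actual project): apply every pending migration before any code touches the updated model, and let a migration failure stop startup instead of being caught and ignored.

```python
import sqlite3

# Hypothetical migrations, applied in order; the table and columns are illustrative only.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN last_login TEXT",
]


def run_migrations(conn):
    """Apply migrations not yet recorded in PRAGMA user_version; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(sql)
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def start_server(conn):
    # Migrate first. No try/except here: a failed migration should abort startup
    # rather than leave the server running against an old schema.
    run_migrations(conn)
    # Only now use the updated model, which expects the last_login column.
    conn.execute(
        "INSERT INTO users (name, last_login) VALUES (?, ?)", ("ecco", "never")
    )
    return conn.execute("SELECT name, last_login FROM users").fetchall()
```

    Calling run_migrations a second time is a no-op, because the schema version is stored in the database itself.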

  • Options
    EchoEcho ski-bap ba-dapModerator mod
    Nogs wrote: »
    Are you just using setState for state management?

    You might have hit that spot where things like MobX or redux will finally be worth looking into.

    I actually just used MobX for a small little personal thing and was amazed at how simple it was in comparison to redux.

    So I had a quick look at MobX now that I had some time and oh boy, this feels seriously amazing.

  • Options
    templewulftemplewulf The Team Chump USARegistered User regular
    So... working at Generic MegaCorp, the only guy who had any kind of real web experience in my extended team-family left. We went many years without one and I suspect it will be many years before we get another. Which basically means I need to learn to crank out a simple internal-use-only web UI without really having the time to learn heavy frameworks or npm or node or stuff like that. If I can't do it with a
    <script src="https://some-cdn/nice-lib.js" />
    
    tag, I don't want to hear about it.

    I don't need to do fancy things. 95% of the time I'm just going to be dumping database tables into sortable, filterable tables. I might have a d3.js chart occasionally. It will usually be coming from some kind of REST API because I at least know enough to do it that way.

    I tried to check out Angular 2, since I have once written a tiny bit of simplistic Angular 1 stuff, probably badly, but it makes wild assumptions about everything before it even gets to hello world, like having or wanting npm or node.js.

    So at this point I'm thinking Bootstrap and Angular 1, since that's what I (barely) know. Does anyone else have any good ideas for "Web UI for people who don't Web"?

    I prefer zurb foundation to bootstrap, but that's largely a matter of taste. I picked it at a stage in development when it had more semantic class composition support, but I think they're at parity now.

    Bootstrap's community is much larger, though.

    Twitch.tv/FiercePunchStudios | PSN | Steam | Discord | SFV CFN: templewulf
  • Options
    gavindelgavindel The reason all your software is brokenRegistered User regular
    Large-scale performance improvements are satisfying, if rare. I managed to cut the time to delete for a subset of financial plan entries by 60 percent.

    Even after a year here, it is strange to think about the sheer number of people my work can hit. I may have shaved five minutes off the task for 10,000 people four times a year.
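
    A back-of-the-envelope version of that estimate, taking the figures above at face value (five minutes, 10,000 people, four times a year); the function name is made up for illustration.

```python
def hours_saved_per_year(minutes_per_run, people, runs_per_year):
    """Total hours saved per year across everyone who runs the task."""
    return minutes_per_run * people * runs_per_year / 60


saved = hours_saved_per_year(5, 10_000, 4)
print(f"{saved:.0f} hours saved per year")
```

    That works out to roughly 3,333 hours a year.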

    Book - Royal road - Free! Seraphim === TTRPG - Wuxia - Free! Seln Alora
  • Options
    [Michael][Michael] Registered User regular
  • Options
    TofystedethTofystedeth Registered User regular
    You don't know that! What if it took Gav 3500 hours to fix it? It may take 2 years to see the savings!

  • Options
    AngelHedgieAngelHedgie Registered User regular
    XBL: Nox Aeternum / PSN: NoxAeternum / NN:NoxAeternum / Steam: noxaeternum
  • Options
    gavindelgavindel The reason all your software is brokenRegistered User regular
    You don't know that! What if it took Gav 3500 hours to fix it? It may take 2 years to see the savings!

    Two days. 90% of which was "Why was it built this way?!" and 10% of which was "wait, why isn't that set-based execution code firing?...oooooh."

  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    So I'm trying to search for examples of a technology and I'm stumped because I don't know what the correct terminology is, so if you could chime in that would be great.

    Basically, I presume it is a branch of machine learning or some other data-munging type of tech.

    Given Data:
    I have an input (say an object representing a search on a price comparison website) and an example output (a URL to hit a third-party supplier's website with all the GET/POST parameters and headers filled in for the given input object).

    What I Want:
    What I'm looking for is the tech that would automatically take the example input/output pair and work out the generalised mapping/transformation between them, so that for any other new flights search I present, it spits out the correctly formatted URL. Whether the internals of that mapping are an opaque black box or a nicely formatted script is not that important to me.

    This seems like something that must already exist but I don't know the right magic words to type into Google to make them appear.

    I have a thoughtful and infrequently updated blog about games http://whatithinkaboutwhenithinkaboutgames.wordpress.com/

    I made a game, it has penguins in it. It's pay what you like on Gumroad.

    Currently Ebaying Nothing at all but I might do in the future.
  • Options
    bowenbowen How you doin'? Registered User regular
    Can you give an example? It sounds like you more want to write your own intermediary RESTful service that does the transposing of the data from A to B.

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    bowen wrote: »
    Can you give an example? It sounds like you more want to write your own intermediary RESTful service that does the transposing of the data from A to B.

    Sure

    A user has come to my Widget price comparison website because they want to compare the price of widgets. They've entered the search terms Colour=Blue, MinSize=48, MaxSize=123, Volume=1000000, so my web front end packages that all up as a JSON object (whatever) and sends it to my backend PriceFetcher service.

    What the PriceFetcher service does is take the object that represents the user query
    {
      "Colour": "Blue",
      "MinSize": 48,
      "MaxSize": 123,
      "Volume": 1000000
    }
    

    and call the APIs of a dozen different widget suppliers, who all have their own unique APIs.

    So Widgets'R'Us would expect to see
    GET http://www.widgestrus.io/api/search?c=blue&min=48&max=123&vol=1000000&auth=supersecretpassword
    
    while WigStore uses a hideous XML interface and wants something like
    POST https://www.wigstore.com/interaface.asp
    <xml>
     <blah
       <seventeen
         <levels
             <deep
               <ohgod
                 <pleasemakeitstop
                    <isoStandardColours type="isostandard">Blue</isoStandardColours>
                    <moreThanOneHundredWidgets>Yes</moreThanOneHundredWidgets>   
    ...
    etc
    
    with a bunch of terrible headers

    So I need to write a dozen different pieces of code that format the request for each supplier. Individually, each formatter is a mostly trivial piece of code (although never underestimate the absurdness of people's APIs). But say you have 1000 suppliers; then it becomes a really tedious, error-prone process to write the formatters.

    Effectively, what I want is two boxes where I can paste the query object into one and the example output URL into the other, and the system automatically calculates and spits out the transformations required to create the output from the input, with no human involvement. In the case of Widgets'R'Us that's a really simple set of operations; for WigStore it is more complicated and might require more than one example to work out the transforms required.

    This seems like a really classic computer science problem but I just can't put my finger on the right term to Google it.

  • Options
    mightyjongyomightyjongyo Sour Crrm East Bay, CaliforniaRegistered User regular
    There's a non-zero chance I'm going to end up becoming the DevOps manager in the next few months.

    There's also a good chance I'm going to end up doing outside hires instead of internal for the team.
    AHHHHHHHHH god I have no clue what I'm doing

  • Options
    AngelHedgieAngelHedgie Registered User regular
    This is where interfaces come in to play. Basically, you want to have an IWidgetFormatter interface that defines the WidgetFormat function. Then, you implement each WidgetFormatter with the code to generate the formatted code. Finally, you'll need a WidgetFormatterFactory to dynamically get the proper WidgetFormatter based on where you're making the request.
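
    A rough Python sketch of that structure, under stated assumptions: an abstract base class stands in for the IWidgetFormatter interface, widget_format for the WidgetFormat function, and the factory is a plain registry. The query-string field mapping is read off the example URL in the thread; the class names for the supplier and the auth handling are hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class WidgetFormatter(ABC):
    """Plays the role of IWidgetFormatter: one method that formats a query."""

    @abstractmethod
    def widget_format(self, query: dict) -> str:
        ...


class WidgetsRUsFormatter(WidgetFormatter):
    BASE_URL = "http://www.widgestrus.io/api/search"

    def __init__(self, auth: str):
        self.auth = auth  # placeholder credential, as in the example URL

    def widget_format(self, query: dict) -> str:
        params = {
            "c": query["Colour"].lower(),
            "min": query["MinSize"],
            "max": query["MaxSize"],
            "vol": query["Volume"],
            "auth": self.auth,
        }
        return "GET " + self.BASE_URL + "?" + urlencode(params)


class WidgetFormatterFactory:
    """Looks up the formatter for a supplier by name."""

    def __init__(self):
        self._formatters = {}

    def register(self, supplier: str, formatter: WidgetFormatter) -> None:
        self._formatters[supplier] = formatter

    def get(self, supplier: str) -> WidgetFormatter:
        return self._formatters[supplier]
```

    Each new supplier is then one more subclass plus one register call; the calling code only ever sees widget_format.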

  • Options
    bowenbowen How you doin'? Registered User regular
    Looks like just simple data transformation/translation.

    You're likely not going to find a prebaked one for you. Should be pretty trivial.

    You've got keys you can work with in both JSON and XML.

  • Options
    bowenbowen How you doin'? Registered User regular
    edited July 2016
    Some places call it 'mapping' instead of translation or transformation though, since you're doing a lot of extra work.

    bowen on
  • Options
    OrcaOrca Also known as Espressosaurus WrexRegistered User regular
    If both sides were under your control, the classic approach would be to write a compiler for a domain-specific language: take the DSL, transform it into an IR, and use different backends to generate the code for each of the targets. What you want is more complicated: given an input/output pair (and presumably an appropriate intermediate representation), you want something that can generate the generator for your output target. You want to code generate your code generator.

    Yo dawg, I herd u like code generation, so I made a code generator to generate your code generator.

    You may want to look up lexing, parsing, compilers, and code generation.

    The unfortunate thing is I don't think it's going to help you much since the output formats can be so different. I'm not sure there's a way around manually writing that glue layer. How, after all, can a tool tell that Volume : 1000000 maps onto <moreThanOneHundredWidgets>Yes</moreThanOneHundredWidgets>? Unless you're going to set up a learning system that can identify the tokens, given enough input/output pairs. I won't say it's impossible, but it does seem like it would be difficult. On the other hand, manually writing 1000 of these sounds tedious at best, and maintenance will be a pain.

    At minimum though, you should be able to come up with an IR that makes sense for all of the output formats--leaving just the output side to worry about.

    It makes me wonder how Google does it for things like Google News. Surely they're not just maintaining a separate parser for each of the tens of thousands of news sources they're scraping.

    Disclaimer: this isn't the space I normally work in, so there might be something I'm missing.

  • Options
    AngelHedgieAngelHedgie Registered User regular
    Basically, in the "real world", the solution to having multiple situation-dependent transformations/mappings is to create a system that compartmentalizes the actual transform into a black box (usually using interfaces and factories), then build and test each transform as a discrete unit. That way, the majority of the code is implemented once, and the transformations, once built, just slot in and run. Since everything is standardized, writing unit tests is simple and most can be reused.

  • Options
    OrcaOrca Also known as Espressosaurus WrexRegistered User regular
    Pluggable backends. Just like LLVM!

  • Options
    EchoEcho ski-bap ba-dapModerator mod
    That sounds like something I did where I pulled data from a bunch of streaming site backends to display in a unified format.

    Wrote one interface, implemented transformations for each site. Doable since it was just 4-5 sites.

  • Options
    OrcaOrca Also known as Espressosaurus WrexRegistered User regular
    edited July 2016
    The way I would do it (and again, not an expert in this space) would be to follow the typical compiler model.
    1. Separate the front-end. Maybe it makes sense or not to have separate lexing and parsing phases, but either way, have a distinct step that transforms the input into an intermediate representation.
    2. Ensure the IR is flexible enough to account for all of your known outputs and all expected changes to your inputs. Ideally, if your input format changes, you only change the front-end; nothing else needs to change. This may not be realizable, depending on how much the input format changes.
    3. Separate transformation stage. Consider omitting, but at worst it's just a passthrough.
    4. Separate backend for each target. The backend takes the IR and transforms it into the specific website's DSL (its query).

    Each of the stages is a distinct entity that can be swapped out and replaced.

    So the flow looks like: input data -> front-end (preprocessing, lexing, parsing, whatever else is appropriate) -> intermediate representation -> optimization/other transformations (if necessary) -> back-end (code generation)

    Ideally, the front-end and back-end can be changed independently. Changes to the IR may mean changes to everything else.
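
    The flow above can be sketched in a few lines of Python. This is only an illustration under assumptions: the IR is a small dataclass, the front-end parses the JSON query from earlier in the thread, and the two back-ends loosely mimic the supplier examples. The root element name "search" and the "more than 100" threshold are guesses, not anything a real supplier specified.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class WidgetSearch:
    """Intermediate representation shared by every back-end."""
    colour: str
    min_size: int
    max_size: int
    volume: int


def front_end(raw_json: str) -> WidgetSearch:
    """Parse the input format into the IR; only this changes if the input changes."""
    data = json.loads(raw_json)
    return WidgetSearch(data["Colour"], data["MinSize"], data["MaxSize"], data["Volume"])


def query_string_backend(ir: WidgetSearch) -> str:
    return urlencode(
        {"c": ir.colour.lower(), "min": ir.min_size, "max": ir.max_size, "vol": ir.volume}
    )


def xml_backend(ir: WidgetSearch) -> str:
    root = ET.Element("search")  # assumed root element name
    colour = ET.SubElement(root, "isoStandardColours", type="isostandard")
    colour.text = ir.colour
    bulk = ET.SubElement(root, "moreThanOneHundredWidgets")
    bulk.text = "Yes" if ir.volume > 100 else "No"
    return ET.tostring(root, encoding="unicode")


BACKENDS = {"query_string": query_string_backend, "xml": xml_backend}


def compile_query(raw_json: str, target: str) -> str:
    """input data -> front-end -> IR -> back-end for the chosen target."""
    return BACKENDS[target](front_end(raw_json))
```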

    My two cents.

    Orca on
  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    AngelHedgie wrote: »
    This is where interfaces come in to play. Basically, you want to have an IWidgetFormatter interface that defines the WidgetFormat function. Then, you implement each WidgetFormatter with the code to generate the formatted code. Finally, you'll need a WidgetFormatterFactory to dynamically get the proper WidgetFormatter based on where you're making the request.

    I must have been a bit unclear in my explanation, I'm not looking for a way to structure the code to perform the task of creating the API calls. I'm looking for a way to not write the code at all.

    We already have 1000 third-party websites to connect to where we've written the code to go from query object to API call by hand, and we want to go to 10000 websites. It just isn't scalable to write and maintain the code manually - we want to automate everything.

    I've just realised that I want to be looking at some variant of an Inference Engine or possibly a Planning System variant.

  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    edited July 2016
    Orca wrote: »
    How, after all, can a tool tell that Volume : 1000000 maps onto <moreThanOneHundredWidgets>Yes</moreThanOneHundredWidgets>? Unless you're going to set up a learning system that can identify the tokens, given enough input/output pairs.

    That is totes what I am aiming for. An AI system that does this automatically. I know there are systems that do similar things; I was just struggling to work out the correct terminology to allow me to search for what is out there.

    Alistair Hutton on
  • Options
    bowenbowen How you doin'? Registered User regular
    I don't think such a thing exists, is the problem.

    However, there's no reason you couldn't do a transformation yourself by providing a templated XML and coding a system that takes JSON data and places it in the templated XML.

    That's the gist of what you're trying to do. There's no real prebuilt package for this.
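
    A minimal sketch of the templated-XML idea using only the Python standard library. The template fragment reuses the two elements from the WigStore example; the placeholder names, the function name, and the "more than 100" rule are assumptions for illustration.

```python
import json
from string import Template
from xml.sax.saxutils import escape

# Hand-written template for one supplier; $colour and $bulk are placeholder names.
WIGSTORE_TEMPLATE = Template(
    '<isoStandardColours type="isostandard">$colour</isoStandardColours>'
    "<moreThanOneHundredWidgets>$bulk</moreThanOneHundredWidgets>"
)


def fill_template(template: Template, raw_json: str) -> str:
    """Take the JSON query and place its values into the templated XML."""
    data = json.loads(raw_json)
    values = {
        "colour": escape(str(data["Colour"])),  # escape so odd values cannot break the XML
        "bulk": "Yes" if data["Volume"] > 100 else "No",
    }
    return template.substitute(values)
```

    The per-supplier work shrinks to writing the template and the small values mapping, but a human still has to write both.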

    Of course as programmers we're going to tell you to program it.

  • Options
    bowenbowen How you doin'? Registered User regular
    edited July 2016
    Oh learning AI for translating XML? Good luck then.

    bowen on
  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    Yeah, it's totally a Planning System I want, it was a real tip of my tongue situation.

    Define a few operations and then let the planner determine the sequence of operations.
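
    A toy illustration of "define a few operations and let the planner determine the sequence", using a brute-force search over short operation sequences rather than a real planning system. The operation set, function names, and rule format are all hypothetical. A single example can be ambiguous (for instance if MinSize and MaxSize happen to be equal in it), so anything serious would need several input/output pairs.

```python
from itertools import product
from urllib.parse import parse_qsl, urlencode, urlsplit

# The "few operations" the search is allowed to chain together.
OPS = {
    "str": str,
    "lower": lambda v: v.lower() if isinstance(v, str) else v,
    "upper": lambda v: v.upper() if isinstance(v, str) else v,
}


def find_plan(value, target, max_len=2):
    """Return the shortest list of op names turning value into target, or None."""
    for length in range(max_len + 1):
        for names in product(OPS, repeat=length):
            result = value
            for name in names:
                result = OPS[name](result)
            if result == target:
                return list(names)
    return None


def learn_mapping(query, example_url):
    """Infer one rule per URL parameter from a single input/output example."""
    parts = urlsplit(example_url)
    rules = []
    for param, target in parse_qsl(parts.query):
        for field, value in query.items():
            plan = find_plan(value, target)
            if plan is not None:
                rules.append((param, "field", field, plan))
                break
        else:
            # No input field explains this value, so treat it as a constant.
            rules.append((param, "const", target, []))
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, rules


def apply_mapping(base, rules, query):
    """Replay the learned rules on a new query object."""
    params = []
    for param, kind, source, plan in rules:
        if kind == "const":
            params.append((param, source))
            continue
        value = query[source]
        for name in plan:
            value = OPS[name](value)
        params.append((param, value))
    return base + "?" + urlencode(params)
```

    This handles the simple query-string case; a mapping like Volume to moreThanOneHundredWidgets would need richer operations (comparisons, thresholds) and more examples.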

  • Options
    durandal4532durandal4532 Registered User regular
    edited July 2016
    Alright so I've been at this job for a bit over a year, and basically my boss uses STATA and I use R

    We both kind of came out of academics, hence academic-ish stats programs, that's just what we're used to.

    But like... it is becoming very clear that what we're doing is not the right way to do what we do long term. We wind up with poorly built rickety scripts that are really hard to keep track of and really difficult to show other people or keep up to date effectively. Just lots and lots of individual scripts that rely on specific prepping to run.

    So what I'm trying to look for is a change in language and/or approach for the both of us to leap into and try to focus on doing this stuff properly.

    A task we often do:
    - Take information in the form of .csv and .xls files, usually tables with some mixture of data types, saved locally
    - Alter that information, table operations and reshaping and all that, making new variables out of old ones, correcting errors
    - Report data from these tables/reshaped things
    - Most of the data reporting winds up being done in Tableau or like just some raw numbers like "20 people blah, 40 people blahed".

    But like basically I feel like the sort of thing we were doing in academia isn't coming up too often here. We're not doing complex stats all that often so much as just some straightforward math and fussing about with graphing in a separate much prettier program because we can actually buy a thing to make nice graphs. We're doing data manipulation more often than we're doing stats, because often the .csv and .xls files MUST come in exactly that layout, and we've got to fuss with them a lot before they work right for our purposes.

    And I've definitely run into some issues where we were doing operations on such giant data sets that R would chug for like 20-40 minutes to complete operations. There was one moment with literal millions of rows of thousands of columns where it just was not possible to perform certain operations, even after making the code as efficient as I could manage.

    Is there another language that people seem to enjoy for this sort of work? I've had a bit of experience with SQL, a little C#, a bit of Python. I'm open to learning whatever, though, my boss is also eager to pick up something other than STATA scripting. Also would help to have a nice strong style guide/tutorial so we don't get into bad habits again!

    durandal4532 on
    Take a moment to donate what you can to Critical Resistance and Black Lives Matter.
  • Options
    PhyphorPhyphor Building Planet Busters Tasting FruitRegistered User regular
    bowen wrote: »
    I don't think such a thing exists, is the problem.

    However, there's no reason you couldn't do a transformation yourself by providing a templated XML and coding a system that takes JSON data and places it in the templated XML.

    That's the gist of what you're trying to do. There's no real prebuilt package for this.

    Of course as programmers we're going to tell you to program it.

    It pretty much doesn't exist. A template/formatted string system will get you 90% of the way there, but you will need tens of thousands or more examples and counterexamples to properly train any NN/GA.

    As for how things like Google News work, HTML is very structured and everyone uses the tags more or less properly. Just identify the div containing the most text, locate either large font size text or HTML header elements to pull out the title, bam.
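
    A rough sketch of that heuristic with Python's built-in html.parser, purely as an illustration: it takes the first h1/h2 as the title and the div with the most direct text as the body. The class and function names are made up, font sizes are ignored, and nothing here claims to be how Google News actually works.

```python
from html.parser import HTMLParser


class ArticleGuesser(HTMLParser):
    """Guess a title (first h1/h2) and a body (the div holding the most text)."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one text buffer per currently open <div>
        self.best = ""        # longest div text seen so far
        self.title = None
        self.in_heading = False
        self.heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_stack.append([])
        elif tag in ("h1", "h2") and self.title is None:
            self.in_heading = True
            self.heading_parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            text = " ".join(self.div_stack.pop())
            if len(text) > len(self.best):
                self.best = text
        elif tag in ("h1", "h2") and self.in_heading:
            self.title = "".join(self.heading_parts).strip()
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.heading_parts.append(data)
        elif self.div_stack and data.strip():
            # Text is credited to the innermost open div only.
            self.div_stack[-1].append(data.strip())


def guess_article(html: str):
    parser = ArticleGuesser()
    parser.feed(html)
    return parser.title, parser.best
```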

    Also, google parses the web for a living, so that's sort of their thing anyway

  • Options
    zeenyzeeny Registered User regular
    edited July 2016
    durandal4532 wrote: »
    Is there another language that people seem to enjoy for this sort of work? I've had a bit of experience with SQL, a little C#, a bit of Python. I'm open to learning whatever, though, my boss is also eager to pick up something other than STATA scripting. Also would help to have a nice strong style guide/tutorial so we don't get into bad habits again!

    Conceptually, the problems we needed to solve were:
    - feeding data
    - processing
    - computation
    - presentation
    - maintainability

    Feeding data is something you just need to be flexible with. If you use large datasets, HDFS/Cassandra are excellent. If you are using a stream, Kafka in front of either is nice. If your data arrives as directly prepared dumps, then just HDFS is adequate.
    Processing languages are a bit of a hot topic. People love R (I don't), but most of the CS (not academic) crowd is deeply into Python for ML/stats. IPython is their god and Jupyter is the prophet. I have no practical experience with either, but you should at least take a look at the Python options, especially if you have previous experience. We use Scala and Clojure and are happy with both.
    Computation - If it's heavy, you'll need something for distributed computing. Spark does it for us.
    Presentation - Notebooks are most likely what you are after. Try out the above-mentioned Jupyter. We recently deployed Zeppelin (http://zeppelin.apache.org/) and we're very impressed by the progress they are making on the project. We also use Gorilla REPL and the Clojure REPL in general.
    Maintainability - This is kind of the beauty of notebooks again. Anything you create is out there; it can be consulted, rerun, cloned, adjusted. It's excellent for the stuff we want, which in most cases is testing hypotheses and getting answers easily, as well as building a bit of a dashboard.
    As the domain is quite new to me, I cannot make any more concrete suggestions, but the combination of [notebook/REPL, Spark, HDFS/Cassandra, Scala/Clojure] seems to be working very well for us, and Python is certainly worth a look.

  • Options
    durandal4532durandal4532 Registered User regular
    That's extremely helpful, thank you! I'll try setting this up and let you know how it goes.

    Take a moment to donate what you can to Critical Resistance and Black Lives Matter.
  • Options
    AkimboEGAkimboEG Mr. Fancypants Wears very fine pants indeedRegistered User regular
    Alright so I've been at this job for a bit over a year, and basically my boss uses STATA and I use R

    We both kind of came out of academics, hence academic-ish stats programs, that's just what we're used to.

    But like... it is becoming very clear that what we're doing is not the right way to do what we do long term. We wind up with poorly built rickety scripts that are really hard to keep track of and really difficult to show other people or keep up to date effectively. Just lots and lots of individual scripts that rely on specific prepping to run.

    So what I'm trying to look for is a change in language and/or approach for the both of us to leap into and try to focus on doing this stuff properly.

    A task we often do:
    - Take information in the form of .csv and .xls files, usually tables with some mixture of data types, saved locally
    - Alter that information, table operations and reshaping and all that, making new variables out of old ones, correcting errors
    - Report data from these tables/reshaped things
    - Most of the data reporting winds up being done in Tableau or like just some raw numbers like "20 people blah, 40 people blahed".

    But like basically I feel like the sort of thing we were doing in academia isn't coming up too often here. We're not doing complex stats all that often so much as just some straightforward math and fussing about with graphing in a separate much prettier program because we can actually buy a thing to make nice graphs. We're doing data manipulation more often than we're doing stats, because often the .csv and .xls files MUST come in exactly that layout, and we've got to fuss with them a lot before they work right for our purposes.

    And I've definitely run into some issues where we were doing operations on such giant data sets that R would chug for like 20-40 minutes to complete operations. There was one moment with literal millions of rows of thousands of columns where it just was not possible to perform certain operations, even after making the code as efficient as I could manage.

    Is there another language that people seem to enjoy for this sort of work? I've had a bit of experience with SQL, a little C#, a bit of Python. I'm open to learning whatever, though, my boss is also eager to pick up something other than STATA scripting. Also would help to have a nice strong style guide/tutorial so we don't get into bad habits again!

    Coming from a very similar background, I cannot recommend Python enough. Python with NumPy, Pandas, SciPy and matplotlib is incredibly powerful for data science and is super easy to learn.
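
    As a rough illustration of the workflow in the quoted post (load a CSV, correct errors, make a new variable, reshape, report counts), a minimal pandas sketch might look like the following. The column names, the data and the "negative score is an error" rule are all made up for the example, and the CSV is read from an in-memory string rather than a local file.

```python
import io

import pandas as pd

# Hypothetical data standing in for a locally saved .csv file.
RAW_CSV = """person_id,site,visit,score
1,north,1,10
1,north,2,14
2,south,1,-1
2,south,2,9
3,south,1,12
"""


def load_and_clean(source):
    """Read the CSV and treat negative scores as data-entry errors."""
    df = pd.read_csv(source)
    # Correct errors: replace impossible values with missing values.
    df["score"] = df["score"].where(df["score"] >= 0)
    # Make a new variable out of an old one.
    df["high_score"] = df["score"] >= 12
    return df


def scores_wide(df):
    """Reshape long data (one row per visit) to one row per person."""
    return df.pivot(index="person_id", columns="visit", values="score")


def people_per_site(df):
    """Count distinct people per site, for '20 people blah' style reports."""
    return df.groupby("site")["person_id"].nunique()


df = load_and_clean(io.StringIO(RAW_CSV))
wide = scores_wide(df)
counts = people_per_site(df)
for site, n in counts.items():
    print(f"{n} people at {site}")
```

    Keeping each step in a small named function, rather than one long script, is one way to make this kind of work easier to rerun and to show other people.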

    Give me a kiss to build a dream on; And my imagination will thrive upon that kiss; Sweetheart, I ask no more than this; A kiss to build a dream on
  • Options
    InfidelInfidel Heretic Registered User regular
    GRAHAHHHHHHHHHHHHH MOBILE PERFORMANCE

    Our react site works pretty great on modern phones but previous generations are sluggish as hell.

    So many places to optimize, not enough time...

  • Options
    Alistair HuttonAlistair Hutton Dr EdinburghRegistered User regular
    Phyphor wrote: »
    bowen wrote: »
    I don't think such a thing exists, is the problem.

    However, there's no reason you couldn't do a transformation yourself by providing a templated XML and coding a system that takes JSON data and places it in the templated XML.

    That's the gist of what you're trying to do. There's no real prebuilt package for this.
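
    The templated-XML idea can be sketched in a few lines of Python using only the standard library. The template, the element names and the JSON fields here are hypothetical, and the sketch assumes a flat JSON object; nested data would need more work.

```python
import json
from string import Template
from xml.sax.saxutils import escape

# Hypothetical XML template; $name placeholders mark where JSON values go.
ORDER_TEMPLATE = Template(
    "<order>"
    "<customer>$customer</customer>"
    "<minWidgetSize>$min_widget_size</minWidgetSize>"
    "</order>"
)


def json_to_xml(json_text, template=ORDER_TEMPLATE):
    """Fill the XML template with values taken from a flat JSON object."""
    data = json.loads(json_text)
    # Escape values so characters like & and < cannot break the XML.
    safe = {key: escape(str(value)) for key, value in data.items()}
    # substitute() raises KeyError if the JSON lacks a field the template needs.
    return template.substitute(safe)


xml_text = json_to_xml('{"customer": "Smith & Sons", "min_widget_size": 3}')
print(xml_text)
```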

    Of course as programmers we're going to tell you to program it.

    It pretty much doesn't exist. A template/formatted string system will get you 90% of the way there, but you will need tens of thousands or more examples and counterexamples to properly train any NN/GA.


    Wouldn't be using anything as general purpose as a NN or GA. I've been working on this some more and for the simple cases, the widgetsRus of the world, you'd need a max of two examples. The first example might have ambiguities (maybe the API token has a string of numbers in it that could be considered the minimum widget size); the second example would clear up the ambiguities, as the unchanging API token would be eliminated as a potential place the changed minwidget size field gets entered.
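
    To make the two-example idea concrete, here is a minimal sketch, not the actual system: it assumes the requests are plain query strings, and the parameter names and values are invented. Each example yields a set of parameters whose value contains the field value, and intersecting those sets drops anything that did not change with the field.

```python
from urllib.parse import parse_qsl


def candidate_params(query, field_value):
    """Return names of query parameters whose value contains field_value."""
    return {name for name, value in parse_qsl(query) if str(field_value) in value}


def locate_field(examples):
    """Intersect candidates across (query, field_value) examples.

    A parameter that does not change between examples, such as an API
    token, drops out as soon as the field value changes.
    """
    remaining = None
    for query, field_value in examples:
        found = candidate_params(query, field_value)
        remaining = found if remaining is None else remaining & found
    return remaining


# Hypothetical requests: the token happens to contain "10".
first = ("token=ab10cd&minsize=10&colour=red", 10)
second = ("token=ab10cd&minsize=25&colour=red", 25)

print(sorted(candidate_params(*first)))
print(sorted(locate_field([first, second])))
```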

    I've roughed up a system this morning that automatically detects which parts of the query parameters could be parts of a date and what strftime parameters are needed to generate them, which is a subcomponent of what I need to do.
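
    A simplified sketch of that kind of detection, under assumptions of my own: the date the request was made for is known, and only a small hand-picked list of strftime formats is tried. The query string and parameter names are hypothetical.

```python
from datetime import date
from urllib.parse import parse_qsl

# Formats to try; a small, assumed subset of what strftime supports.
DIRECTIVES = ["%Y", "%y", "%m", "%d", "%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]


def detect_date_params(query, known_date):
    """Map each query parameter to the strftime formats that reproduce it."""
    matches = {}
    for name, value in parse_qsl(query):
        formats = [fmt for fmt in DIRECTIVES if known_date.strftime(fmt) == value]
        if formats:
            matches[name] = formats
    return matches


# Hypothetical request known to have been made for 5 July 2016.
query = "year=2016&month=07&day=05&stamp=20160705&page=2"
print(detect_date_params(query, date(2016, 7, 5)))
```

    A date such as 7 July leaves month and day ambiguous (both "%m" and "%d" produce "07"), so a second example with a different date would be needed to tell them apart.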

    This is proving to be both fun and satisfying now that I know I want a Planning System. It gets more complicated when I start considering XML documents, but for now I'm solving the widgetsRus of the world.

    I have a thoughtful and infrequently updated blog about games http://whatithinkaboutwhenithinkaboutgames.wordpress.com/

    I made a game, it has penguins in it. It's pay what you like on Gumroad.

    Currently Ebaying Nothing at all but I might do in the future.
  • Options
    zeenyzeeny Registered User regular
    Infidel wrote: »
    GRAHAHHHHHHHHHHHHH MOBILE PERFORMANCE

    Our react site works pretty great on modern phones but previous generations are sluggish as hell.

    So many places to optimize, not enough time...

    Just forget it. Even if you optimize for performance it will keep breaking. iOS 7 is a nightmare. Android's default browser is a close 2nd.

  • Options
    NogsNogs Crap, crap, mega crap. Crap, crap, mega crap.Registered User regular
    Infidel wrote: »
    GRAHAHHHHHHHHHHHHH MOBILE PERFORMANCE

    Our react site works pretty great on modern phones but previous generations are sluggish as hell.

    So many places to optimize, not enough time...

    https://github.com/garbles/why-did-you-update

    PARKER, YOU'RE FIRED! <-- My comic book podcast! Satan look here!
  • Options
    jaziekjaziek Bad at everything And mad about it.Registered User regular
    edited July 2016
    I've been tasked with investigating moving all our stuff off of SQL Server onto a cheaper DB server, and redesigning our database interaction in general.

    Anybody got any good resources for designing data access APIs or database agnostic data access layers? I have some ideas, and I've been reading white papers and articles all week, but I'd love to see some kind of information on how large scale companies have migrated from one database to another.
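
    For context, the rough shape usually meant by a "database agnostic data access layer" is an interface the application codes against, with one implementation per backend. The sketch below is illustrative only: the UserStore interface and both backends are hypothetical, and SQLite stands in for whatever server is chosen.

```python
import sqlite3
from abc import ABC, abstractmethod


class UserStore(ABC):
    """What the application needs, with no mention of any database."""

    @abstractmethod
    def add(self, name): ...

    @abstractmethod
    def get(self, user_id): ...


class SqliteUserStore(UserStore):
    """One backend; SQL and driver details stay inside this class."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name):
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, user_id):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryUserStore(UserStore):
    """A second backend, handy for tests; same interface, no SQL at all."""

    def __init__(self):
        self.users = {}

    def add(self, name):
        user_id = len(self.users) + 1
        self.users[user_id] = name
        return user_id

    def get(self, user_id):
        return self.users.get(user_id)


def register(store, name):
    """Application code depends only on the UserStore interface."""
    user_id = store.add(name)
    return store.get(user_id)


for store in (SqliteUserStore(), InMemoryUserStore()):
    print(type(store).__name__, register(store, "jaziek"))
```

    Swapping database servers then means writing one new class rather than touching every call site, at least for queries simple enough to fit behind such an interface.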

    Steam ||| SC2 - Jaziek.377 on EU & NA. ||| Twitch Stream
  • Options
    bowenbowen How you doin'? Registered User regular
    sounds like you want a RESTful service

    Check out any of the sinatra-based frameworks like sinatra itself, nancyfx, node.js with express/loopback/etc, servicestack (not really sinatra based but it's a neat concept)

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    bowenbowen How you doin'? Registered User regular
    Pick a language and I can probably point you in the right direction!

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Options
    SpawnbrokerSpawnbroker Registered User regular
    So my new job has finally stopped being slow.

    All of a sudden, I need 5 Visio diagrams and to prepare for 3 meetings with company higher-ups to help determine the direction the company will go for its intranet migration.

    I've basically become a half developer, half SharePoint evangelist/explainer to people who don't know what SharePoint is or does. I don't mind it, it's just different.

    Steam: Spawnbroker
This discussion has been closed.