[Programming] Kafkaesque rabbits in the queue at the pub
Get latest from source control.
Build.
Run.
Exceptions thrown from failed db migrations.
Hm. Maybe I accidentally forked the db schema during development on Friday? Oh well, no big deal.
Restore db from a full copy a month ago.
Run.
Exceptions thrown from failed db migrations, again.
Wot?
Manually apply migrations, because I need to finish my own dev, darn it.
Yes, able to get to the start screen.
Log in to test changes.
Db connection failures - everything is broken oh my god Monday whhhyyyy
I think it was a terrible case of cascaded failures.
From a cursory glance, a POCO was updated to the latest schema.
However, this POCO was used in a DB transaction before the migrations were run for some reason?
Of course, the DB didn't contain the columns necessary for the POCO, so an exception was thrown.
The exception handler caught it, and... I suspect that because control jumped to the exception handler, the migrations never ran.
But it let the server continue running, and the login screen came up okay, because nothing on the front page used the updated db schema.
Once I logged in though, all hell broke loose due to missing columns etc.
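If I had to reconstruct it, the shape of the bug was probably something like this minimal sketch (all names invented):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("startup")

def run_pending_migrations():
    # stand-in for the real migration runner; the POCO/schema mismatch
    # surfaces here as an exception mid-transaction
    raise RuntimeError("column 'new_field' does not exist")

def start_server():
    try:
        run_pending_migrations()
    except Exception as exc:
        # the catch-all swallows the failure, so the remaining migrations
        # never run...
        log.error("startup task failed: %s", exc)
    # ...but the server comes up anyway on the old schema, and nothing
    # breaks until a page actually touches the new columns
    log.info("server up, front page works fine")

start_server()
```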
So I had a quick look at MobX now that I had some time and oh boy, this feels seriously amazing.
I prefer Zurb Foundation to Bootstrap, but that's largely a matter of taste. I picked it at a stage in development when it had more semantic class composition support, but I think they're at parity now.
Bootstrap's community is much larger, though.
Even after a year here, it is strange to think of the sheer number of people my work can reach. I may have shaved five minutes off the task for 10,000 people four times a year.
I thiiiink you came out ahead.
Two days. 90% of which was "Why was it built this way?!" and 10% of which was "wait, why isn't that set-based execution code firing?...oooooh."
Basically I presume it is an offshoot of machine learning of some sort, or some other data-munging type of tech.
Given Data:
I have an input (say an object representing a search on a price comparison website) and an example output (a URL to hit a third-party supplier's website with all the GET/POST parameters/headers filled in for the given input object).
What I Want:
What I'm looking for is the tech that would automatically be able to take the example input/output pair and work out the generalised mapping/transformation between them, so that for any new flight search I present, it spits out the correctly formatted URL. Whether the internals of that mapping are an opaque black box or a nicely formatted script is not that important to me.
This seems like something that must already exist but I don't know the right magic words to type into Google to make them appear.
Sure
User has come to my widget price comparison website as they want to compare prices of widgets. They've entered the search terms Colour=Blue, MinSize=48, MaxSize=123, Volume=1000000, so my web front end packages that all up as a JSON object (whatever) and sends it to my backend PriceFetcher service.
What the PriceFetcher service does is take the object that represents the user query and call the API of a dozen different widget suppliers, who all have their own unique APIs.
So Widgets R Us would expect to see a simple parameterised URL, while WigStore uses a hideous XML interface and wants an XML payload with a bunch of terrible headers.
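To invent a concrete pair (both completely made up, the real ones are worse):

```python
# The same query, rendered two completely different ways (hypothetical).
query = {"Colour": "Blue", "MinSize": 48, "MaxSize": 123, "Volume": 1000000}

# Widgets R Us style: one parameterised GET URL
widgetsrus_url = (
    "https://api.widgetsrus.example/search"
    "?colour=blue&minsize=48&maxsize=123&qty=1000000"
)

# WigStore style: an XML body plus custom headers
wigstore_headers = {"Content-Type": "text/xml", "X-Wig-Token": "abc123"}
wigstore_body = (
    "<quoteRequest><colour>blue</colour>"
    '<size min="48" max="123"/><qty>1000000</qty></quoteRequest>'
)
```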
So I need to write a dozen different pieces of code that format the request for each supplier. Individually, each formatter is a mostly trivial piece of code (although never underestimate the absurdity of people's APIs). But say you have 1,000 suppliers; then it becomes a really tedious, error-prone process to write the formatters.
Effectively what I want is two boxes where I can paste the query object into one and the example output URL into the other, and the system automatically calculates and spits out the transformations required to create the output from the input, with no human involvement. In the case of Widgets R Us that's a really simple set of operations; for WigStore it is more complicated and might require more than one example to work out the transforms required.
This seems like a really classic computer science problem but I just can't put my finger on the right term to Google it.
There's also a good chance I'm going to end up doing outside hires instead of internal for the team.
This is where interfaces come into play. Basically, you want to have an IWidgetFormatter interface that defines the WidgetFormat function. Then, you implement each WidgetFormatter with the code to generate the formatted request. Finally, you'll need a WidgetFormatterFactory to dynamically get the proper WidgetFormatter based on where you're making the request.
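In Python terms, a minimal sketch of that shape might look like this (the names above are C#-style; everything below is invented for illustration):

```python
from abc import ABC, abstractmethod

class WidgetFormatter(ABC):
    """One implementation per supplier API."""
    @abstractmethod
    def format_request(self, query: dict) -> str: ...

class WidgetsRUsFormatter(WidgetFormatter):
    def format_request(self, query: dict) -> str:
        return ("https://api.widgetsrus.example/search"
                f'?colour={query["Colour"].lower()}'
                f'&minsize={query["MinSize"]}&maxsize={query["MaxSize"]}')

class WigStoreFormatter(WidgetFormatter):
    def format_request(self, query: dict) -> str:
        return (f'<quoteRequest><colour>{query["Colour"].lower()}</colour>'
                f'<size min="{query["MinSize"]}" max="{query["MaxSize"]}"/>'
                '</quoteRequest>')

# the "factory": look up the right formatter per supplier
FORMATTERS: dict[str, WidgetFormatter] = {
    "widgetsrus": WidgetsRUsFormatter(),
    "wigstore": WigStoreFormatter(),
}

query = {"Colour": "Blue", "MinSize": 48, "MaxSize": 123}
print(FORMATTERS["widgetsrus"].format_request(query))
```

Of course, this structures the code nicely but doesn't remove the need to hand-write one formatter per supplier.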
Looks like just simple data transformation/translation.
You're likely not going to find a prebaked one for you. Should be pretty trivial.
You've got keys you can work with in both JSON and XML.
Wrote one interface, implemented transformations for each site. Doable since it was just 4-5 sites.
I must have been a bit unclear in my explanation: I'm not looking for a way to structure the code to perform the task of creating the API calls. I'm looking for a way to not write the code at all.
We already have 1,000 third-party websites to connect to where we've written the code to go from query object to API call by hand; we want to go to 10,000 websites. It just isn't scalable to write and maintain the code manually - we want to automate everything.
I've just realised that I want to be looking at some variant of an Inference Engine, or possibly a Planning System.
That is totes what I am aiming for. An AI system that does this automatically. I know there are systems that do similar things; I was just struggling to work out the correct terminology to allow me to search for what is out there.
However, there's no reason you couldn't do a transformation yourself by providing a templated XML and coding a system that takes JSON data and places it in the templated XML.
That's the gist of what you're trying to do. There's no real prebuilt package for this.
Of course as programmers we're going to tell you to program it.
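A rough sketch of that JSON-into-XML-template idea (template and field names invented):

```python
import json
from string import Template

# hypothetical WigStore-style template; field names are made up
XML_TEMPLATE = Template(
    "<quoteRequest><colour>$colour</colour>"
    '<size min="$minsize" max="$maxsize"/><qty>$qty</qty></quoteRequest>'
)

payload = json.loads(
    '{"Colour": "Blue", "MinSize": 48, "MaxSize": 123, "Volume": 1000000}'
)

print(XML_TEMPLATE.substitute(
    colour=payload["Colour"].lower(),
    minsize=payload["MinSize"],
    maxsize=payload["MaxSize"],
    qty=payload["Volume"],
))
```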
Define a few operations and then let the planner determine the sequence of operations.
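As a toy sketch of what that planner could look like, assuming the operations are simple value transforms (everything below is made up):

```python
from collections import deque

# Toy planner: given a few primitive operations, breadth-first search for
# the shortest sequence that turns an example input into the example output.
OPS = {
    "str": str,
    "lower": lambda v: v.lower() if isinstance(v, str) else v,
    "quote": lambda v: f'"{v}"' if isinstance(v, str) else v,
}

def plan(start, goal, max_depth=4):
    queue = deque([(start, [])])
    seen = {repr(start)}
    while queue:
        value, steps = queue.popleft()
        if value == goal:
            return steps
        if len(steps) >= max_depth:
            continue
        for name, op in OPS.items():
            nxt = op(value)
            if repr(nxt) not in seen:
                seen.add(repr(nxt))
                queue.append((nxt, steps + [name]))
    return None

# e.g. learn that Colour="Blue" shows up in the URL as "blue"
print(plan("Blue", "blue"))  # ['lower']
print(plan(48, "48"))        # ['str']
```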
We both kind of came out of academics, hence academic-ish stats programs, that's just what we're used to.
But like... it is becoming very clear that what we're doing is not the right way to do what we do long term. We wind up with poorly built rickety scripts that are really hard to keep track of and really difficult to show other people or keep up to date effectively. Just lots and lots of individual scripts that rely on specific prepping to run.
So what I'm trying to look for is a change in language and/or approach for the both of us to leap into and try to focus on doing this stuff properly.
A task we often do:
- Take information in the form of .csv and .xls files, usually tables with some mixture of data types, saved locally
- Alter that information, table operations and reshaping and all that, making new variables out of old ones, correcting errors
- Report data from these tables/reshaped things
- Most of the data reporting winds up being done in Tableau or like just some raw numbers like "20 people blah, 40 people blahed".
But like basically I feel like the sort of thing we were doing in academia isn't coming up too often here. We're not doing complex stats all that often so much as just some straightforward math and fussing about with graphing in a separate much prettier program because we can actually buy a thing to make nice graphs. We're doing data manipulation more often than we're doing stats, because often the .csv and .xls files MUST come in exactly that layout, and we've got to fuss with them a lot before they work right for our purposes.
And I've definitely run into some issues where we were doing operations on such giant data sets that R would chug for like 20-40 minutes to complete operations. There was one moment with literally millions of rows by thousands of columns where it just was not possible to perform certain operations, even after making the code as efficient as I could manage.
Is there another language that people seem to enjoy for this sort of work? I've had a bit of experience with SQL, a little C#, a bit of Python. I'm open to learning whatever, though, my boss is also eager to pick up something other than STATA scripting. Also would help to have a nice strong style guide/tutorial so we don't get into bad habits again!
It pretty much doesn't exist. A template/formatted-string system will get you 90% of the way there, but you will need tens of thousands or more examples and counterexamples to train any NN/GA properly.
As for how things like Google News work, HTML is very structured and everyone uses the tags more or less properly. Just identify the div containing the most text, locate either large-font text or HTML header elements to pull out the title, bam.
Also, Google parses the web for a living, so that's sort of their thing anyway.
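A toy version of that heuristic with BeautifulSoup (the HTML is made up):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <div id="nav">Home | News | Sport</div>
  <div id="story"><h1>Widget prices soar</h1>
    <p>Lots and lots of article text goes here, far more than the nav.</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# the div with the most text is probably the article body
body = max(soup.find_all("div"), key=lambda d: len(d.get_text()))
# a header element inside it is probably the title
title = body.find(["h1", "h2", "h3"])

print(title.get_text(), "//", body.get_text(" ", strip=True)[:60])
```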
Conceptually, the problems we needed to solve were:
- feeding data
- processing
- computation
- presentation
- maintainability
Feeding data is something you just need to be flexible with. If you use large datasets, HDFS/Cassandra are excellent. If you are using a stream, Kafka in front of either is nice. If your data arrives as directly prepared dumps, then HDFS alone is adequate.
Processing languages are a bit of a hot topic. People love R (I don't), but most of the CS (non-academic) crowd is deeply into Python for ML/stats. IPython is their god and Jupyter is the prophet. I have no practical experience with either, but you should at least take a look at the Python options, especially if you have previous experience. We use Scala and Clojure and are happy with both.
Computation - If it's heavy, you'll need something for distributed computing. Spark does it for us.
Presentation - Notebooks are what you are most likely after. Try out the above-mentioned Jupyter. We recently deployed Zeppelin (http://zeppelin.apache.org/) and we're very impressed by the progress they are making on the project. We also use Gorilla REPL and the Clojure REPL in general.
Maintainability - This is kind of the beauty of notebooks again. Anything you create is out there; it can be consulted, rerun, cloned, adjusted. It's excellent for the stuff we want, which in most cases is testing hypotheses and getting answers easily, as well as building a bit of a dashboard.
As the domain is quite new to me, I cannot make any more concrete suggestions, but the combination of [notebook/repl, spark, hdfs/cassandra, scala/clojure] seems to be working very well for us, and Python is certainly worth a look.
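To make the computation layer concrete, a minimal Spark sketch (path and column names invented):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark chewing through a CSV that would make a single R session cry.
spark = SparkSession.builder.appName("survey-counts").getOrCreate()

df = spark.read.csv("hdfs:///data/survey.csv", header=True, inferSchema=True)

(df.filter(F.col("age") >= 18)       # work is distributed across the cluster
   .groupBy("region")
   .agg(F.count("*").alias("respondents"))
   .show())
```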
Coming from a very similar background, I cannot recommend Python enough. Python with NumPy, Pandas, SciPy and matplotlib is incredibly powerful for data science and is super easy to learn.
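For the take-a-csv, reshape, report loop described above, a minimal Pandas sketch might look like this (file and column names invented):

```python
import pandas as pd

# load: pd.read_excel works the same way for the .xls files
df = pd.read_csv("responses.csv")

# fix/derive variables
df["age"] = df["age"].clip(lower=0)   # correct obvious data-entry errors
df["adult"] = df["age"] >= 18

# reshape: wide -> long
long = df.melt(id_vars=["respondent_id", "adult"],
               value_vars=["q1", "q2", "q3"],
               var_name="question", value_name="answer")

# report: the "20 people blah, 40 people blahed" numbers
print(long.groupby(["question", "answer"]).size())
```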
Our react site works pretty great on modern phones but previous generations are sluggish as hell.
So many places to optimize, not enough time...
Wouldn't be using anything as general-purpose as a NN or GA. I've been working on this some more, and for the simple cases, the WidgetsRUs of the world, you'd need a max of two examples. The first example might have ambiguities (maybe the API token has a string of numbers in it that could be considered the minimum widget size); the second example would clear up the ambiguities, as the unchanging API token would be eliminated as a potential place the changed min widget size field gets entered.
I've roughed up a system this morning that automatically detects what parts of the query parameters could be parts of a date and what strftime parameters are needed to generate them, which is a subcomponent of what I need to do.
This is proving to be both fun and satisfying now that I know I want a Planning System. It gets more complicated when I start considering XML documents, but for now I'm solving the WidgetsRUs of the world.
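The two-example idea as a toy sketch (URLs and fields invented; the real thing has to handle more than query strings):

```python
from urllib.parse import parse_qsl, urlsplit

def candidates(query: dict, url: str) -> set[tuple[str, str]]:
    """(query field, url param) pairs whose values match in one example."""
    params = dict(parse_qsl(urlsplit(url).query))
    return {(f, p) for f, fv in query.items()
                   for p, pv in params.items() if str(fv) == pv}

ex1 = ({"MinSize": 48}, "https://x.example/s?min=48&token=48")
ex2 = ({"MinSize": 60}, "https://x.example/s?min=60&token=48")

# one example is ambiguous: 48 appears in both `min` and the API token
print(candidates(*ex1))                     # {('MinSize','min'), ('MinSize','token')}
# intersecting with a second example kills the token false positive
print(candidates(*ex1) & candidates(*ex2))  # {('MinSize', 'min')}
```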
Just forget it. Even if you optimize for performance it will keep breaking. iOS 7 is a nightmare; the Android default browser is a close second.
https://github.com/garbles/why-did-you-update
Anybody got any good resources for designing data access APIs or database agnostic data access layers? I have some ideas, and I've been reading white papers and articles all week, but I'd love to see some kind of information on how large scale companies have migrated from one database to another.
Check out any of the Sinatra-based frameworks: Sinatra itself, NancyFX, Node.js with Express/LoopBack/etc., ServiceStack (not really Sinatra-based, but it's a neat concept).
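On the database-agnostic layer itself, the usual shape is a small repository interface the rest of the app codes against; a minimal sketch (all names invented):

```python
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """Callers only ever see this interface, never the database."""
    @abstractmethod
    def get(self, user_id: int) -> dict | None: ...
    @abstractmethod
    def add(self, user: dict) -> None: ...

class InMemoryUserRepository(UserRepository):
    def __init__(self):
        self._rows: dict[int, dict] = {}
    def get(self, user_id):
        return self._rows.get(user_id)
    def add(self, user):
        self._rows[user["id"]] = user

# A PostgresUserRepository or MongoUserRepository would implement the same
# interface, so migrating databases means adding one class, not touching callers.
repo: UserRepository = InMemoryUserRepository()
repo.add({"id": 1, "name": "Ada"})
print(repo.get(1))
```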
All of a sudden, I need 5 Visio diagrams and to prepare for 3 meetings with company higher-ups to help determine the direction the company will go for its intranet migration.
I've basically become a half developer, half SharePoint evangelist/explainer to people who don't know what SharePoint is or does. I don't mind it, it's just different.