Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.
You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".
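For the simple case where the file sits at a fixed URL, that call is roughly the sketch below; the URL and auth header are hypothetical placeholders, not anything from the actual site.

import requests

url = "https://example.com/reports/export.csv"    # hypothetical
headers = {"Authorization": "Bearer <token>"}      # whatever the site actually requires

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()                        # fail loudly on 4xx/5xx
with open("export.csv", "wb") as f:
    f.write(response.content)                      # use stream=True for huge files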
So I'm way overdue for a job change and between general distaste for what my work is becoming (less software engineering, more configuring software another person in my division installed) and some people at work becoming particularly goosey, I know I want to do so within the next year or two.
However, I've been working primarily with Perl the last 10 years (the product I was hired to work with involves a lot of text processing) and would really prefer to learn/brush up on other languages before I start submitting resumes. What are some good resources to learn/relearn languages when you've already been in the field for quite a while? I'm specifically leaning towards learning Python and maybe refreshing my Java and C++ knowledge, but I'm very open to other things people would recommend learning.
I'm going to shout out again for Exercism. You can grab a track and follow the exercises as normal, but you can also request feedback from other coaches on your attempts. This way you can have an actual human go over your code and tell you different ways you could've tackled it given the language, which is especially useful when you're not that familiar with it.
The problem is that the file is dynamically generated when the button is pressed, since it's exporting live information to a .csv file (It even generates a timestamp for the name of the document).
My instinct is you'll need to replicate whatever the JS is doing, then. Which shouldn't be awful, assuming you have access to it and/or it's not been minified. If it has been and you don't have source access, that's a headache.
Then requests by itself won't cut it. You can try using requests-html (or other similar libraries) to programmatically handle the human interaction bits.
TBH I'd still go for "replicate what the JS is doing", assuming it's not like, taking in form data or whatnot. May be some extra work, but probably (IME) more robust than messing with the html or the actual button.
Note this could also be utterly impractical, depending on how much stuff Just Happens when you hit that button.
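If the button turns out to just fire an XHR/fetch at some export endpoint, replicating it usually looks like the sketch below. The endpoint, form fields, and cookie are all made up here; the real values come from watching the browser's Network tab while clicking the button.

import re
import requests

session = requests.Session()
# Reuse whatever auth the page normally has (cookie value copied from the browser, hypothetical)
session.cookies.set("sessionid", "<copied-from-browser>")

# Hypothetical endpoint and parameters observed in the dev tools
resp = session.post(
    "https://example.com/api/export",
    data={"format": "csv", "range": "today"},
    timeout=60,
)
resp.raise_for_status()

# The server names the file with a timestamp, so take it from the header if present
cd = resp.headers.get("Content-Disposition", "")
match = re.search(r'filename="?([^";]+)', cd)
filename = match.group(1) if match else "export.csv"
with open(filename, "wb") as f:
    f.write(resp.content)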
I'll give this library a try, but I'm gonna switch to another project and just ask a coworker with more network experience for more help on Monday. Thanks for the feedback!
For actually doing stuff productively, I prefer having a nice ORM to wrap it up, but I do love writing some deep SQL queries to figure out bugs
This is a hot take...but:
ORMs are poison. They hide the complexity of data storage in the worst possible way and they give engineers a false sense of security about their data access layers. They rob engineers of really important knowledge. When they attempt to abstract away what database you're connecting to, they almost always do it in the worst, most generic way possible that doesn't leverage the actual features of the database you're using.
One of the big epiphanies I've had since getting to the principal engineer level is that engineers were basically lied to for most of my career about databases and how much we needed to understand them. Maybe in the before times it made sense, when every company had a cadre of DBAs...but that's really not true anymore at anything except the biggest shops. When engineers don't understand databases, and how to design good data models, you get things like databases with zero foreign key relationships (and not because they made a performance trade off), UUIDs stored as strings instead of the native UUID type the database supports, and timestamps as big integers instead of actual timestamps. All things I've seen recently that were reinforced by the ORM they were using because they were generic and "simple".
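To make those examples concrete, here is a minimal sketch of a table that uses the database's actual types: a native UUID primary key, a real foreign key, and a timezone-aware timestamp. Postgres and SQLAlchemy are assumed purely for illustration; neither is mentioned above.

import uuid
from sqlalchemy import Column, DateTime, ForeignKey, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)

class Order(Base):
    __tablename__ = "orders"
    # Native uuid column, not a 36-character string
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    # Actual foreign key relationship, enforced by the database
    customer_id = Column(UUID(as_uuid=True), ForeignKey("customers.id"), nullable=False)
    # timestamptz, not a bigint of epoch milliseconds
    created_at = Column(DateTime(timezone=True), server_default=func.now(), nullable=False)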
As a data analyst type, you're 100% right that most engineers don't understand data and data modeling at all. The modern era has made it even worse with storage and compute being cheap and available so there are even fewer penalties to bad data modeling.
Then they wonder why no one can pull insights out.
The Go language developers have a lot of explaining to do about the god-awful abomination that is the default Go templating language.
The Hugo static site generator uses Go templates, and I miss Ruby's ERB syntax so badly every time I have to make a change to my blog theme. It's not even just that it's confusing, it's that all the docs around it are so weirdly awful.
As someone who primarily comprehends computer time as "units since epoch", what is an "actual timestamp"? A more structured format like specific bit spans for the various denominations of time (year, month, day, hour, minute, second, etc.)?
I mean, a unix timestamp or an actual timestamp type in SQL are both fine, but I've seen it stored as 'other' before in a similar scenario to the above, and it's not great! Especially when you have to start translating it to other applications and have to figure out what the fuck they were trying to store it as when the people who made the original mess were long gone.
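For what it's worth, "units since epoch" and a proper timestamp type describe the same instant; the difference is whether the database (and everyone reading it later) knows what it is. A plain-Python conversion for reference:

from datetime import datetime, timezone

epoch_seconds = 1_700_000_000
ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(ts.isoformat())        # 2023-11-14T22:13:20+00:00
print(int(ts.timestamp()))   # 1700000000 again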
All things I've seen recently that were reinforced by the ORM they were using because they were generic and "simple".
Overall I've been liking Ent for Go, but yeah, these are things I need to investigate before I want to use it for anything of value. It's maintained by Facebook engineers though, so I expect some level of quality.
I agree and I "learned" in an ass-backwards way...I started with the object relational mapper and worked backwards. More and more it does seem knowledge of databases is important, like how to choose the right one for a specific application and modeling as you said. It does seem a bit overwhelming at times...how much of it do I need to know, I personally don't know (although the zero foreign key in a relational database example is surprising). Me personally, I think this is the problem with a lot of improvements. We keep getting abstraction layers on top of abstraction layers and the base knowledge gets lost, but you really need it. I'm finding this myself because I am frequently starting at the top with all the abstraction layers already applied and don't understand the basics (this is especially true with networking and cloud). That's great for people who've been around 20+ years and understood the leap, but if you're just getting to it, it's a mountain of shit.
Time to post this one again (though I guess in this case it's 'starting in the middle'). Also, the author of the original post made the screenshot to say "look how much work is getting done for me by all the frameworks", so in his case it's a positive thing, interestingly.
No amount (high or low) of abstraction is going to keep people from writing bad code. You can build horrifying data structures directly in SQL -- easily!
Both low abstraction and high abstraction require onboarding effort for new developers. Low abstraction demands it -- a new developer will not be able to build anything quickly, for a long time. High abstraction doesn't -- a new developer will be able to build stuff immediately... but you may not want them to. Both of those have the same solution, which is actually teaching developers shit, but since most companies won't allocate time for that, we pay for it.
tl;dr abstraction isn't the problem, it's (lack of) developer education.
i think ORMs are fine as long as you have enough self knowledge to recognize what you know and dont know
i do not know sql. at all. zero. under pressure i might be able to write a simple join.... with autocomplete hints.
but i keep it simple, trend toward normalization, never pre-optimize, if i step out of my comfort zone I always hit the docs (django has really good docs for their orm that digs into "ok this is what postgres is doing actually")
you can definitely have this lustful view of an orm as a magic api that just makes data stuff happen but like... that is so naive that you'd probably not fare much better at the SQL prompt anyway
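On the "what is postgres doing actually" point: most ORMs will show you the SQL they are about to run, which is a cheap way to close that gap. A tiny sketch with SQLAlchemy (the post above uses Django; this is just the same idea in another ORM):

from sqlalchemy import column, select, table

users = table("users", column("id"), column("email"), column("active"))
stmt = select(users.c.email).where(users.c.active == True).order_by(users.c.id)
print(stmt)   # prints the SELECT ... WHERE users.active = :active_1 ... it will send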
We have one legacy service we really want to replace (crossed fingers it happens soon...) that started with "oh, we'll just make a bunch of string constants for SQL queries!" and now it has a horribly convoluted query builder with partial queries concatenated together based on whatever logic and it's just godawful to try to follow the logic to see what the final query will end up being.
The actual final queries themselves aren't that complex, but they did need a chunk of non-SQL logic in between slamming the various parts of it together.
I'd definitely prefer an ORM with a fluent-ish syntax where it's way easier to read what that particular chunk will build before you do some intermediate logic and then continue building your query with the ORM.
The only ORM I've liked is Entity Framework. Mongoose for Mongo is OK, but I have always preferred database-first development instead of code-first. But I cut teeth on dBase IV and MSSQL 5 back in the 90s, so I have a pretty good handle on the do's and don'ts of database development.
Let me take you on a journey of C++20 constant evaluation context insanity. Seriously egregious use of constexpr, macros and templates within
It all starts simply
TRACE_EVENT("category1,category2", "name");
This is chromium tracing, now replaced by perfetto and it has a nice UI that I wouldn't mind using, https://ui.perfetto.dev/
Originally it dumped JSON but the updated version dumps protocol buffers instead and trace events take about 300ns
Unacceptable! Perfetto needs more perf. It turns out the only variable information in a typical trace packet is the track ID which is more or less the thread ID and the current timestamp
And very importantly the fast path requires both the category string and name string be literals, and not runtime strings. Additionally all valid categories have to be defined in some common header using literals and are guaranteed to be visible. This is already required by the library to get the fast path, so I can use it too
Begin with a compile time std::vector. This can be initialized with any string, byte aggregate or static_array pair, and we can add them. But we also need a compile time switch to enable or disable these things
So if condition is true we return the array, otherwise we return an empty array
Protocol buffers are really just a series of varints (and some strings): 7-bit chunks where you keep decoding as long as the high bit is set. Importantly, the official libraries will decode sequences with trailing zeros, so 81 80 80 00 and 01 both encode 1
We can construct a varint and its size like so
template<size_t N>
constexpr static_array<N> pb_varint(uint64_t v)
{
static_array<N> s;
for(size_t i = 0; i < N - 1; i++, v >>= 7) {
s.arr[i] = (uint8_t)v | 0x80;
}
s.arr[N - 1] = (uint8_t)v;
return s;
}
consteval size_t pb_varint_len(uint64_t v)
{
size_t n = 0;
do {
n++;
v >>= 7;
} while(v != 0);
return n;
}
Unfortunately parameters/intermediates inside a consteval/expr function aren't compile time constants (even though it can compute compile time constants), so parameterization has to be done in templates. So now the field construction yields (effectively) {8, 0xE8, 7}, which is the correct encoding for a field with id 1 and value 1000
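Quick cross-check of that encoding in plain Python (nothing to do with the C++ above, just verifying the bytes): the key byte for field 1 with wire type 0 is 0x08, and 1000 varint-encodes to E8 07.

def pb_varint(v):
    out = bytearray()
    while v >= 0x80:
        out.append((v & 0x7F) | 0x80)   # low 7 bits plus continuation bit
        v >>= 7
    out.append(v)
    return bytes(out)

key = pb_varint((1 << 3) | 0)                     # field 1, wire type 0 (varint)
print([hex(b) for b in key + pb_varint(1000)])    # ['0x8', '0xe8', '0x7']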
But wait, there's more!
For space saving reasons perfetto allows you to either include the data directly in the trace packet or just include an ID that references another data packet
Normally these IDs are allocated starting from 1 so you get small values, but we can just sacrifice some space and generate unlikely to collide constants that will fit in 5 bytes
constexpr uint64_t intern_string(const char *s, uint32_t line, uint32_t lineshift)
{
uint32_t crc32 = 0xFFFFFFFFu;
for(; *s; s++) {
uint32_t lookupIndex = (crc32 ^ *s) & 0xff;
crc32 = (crc32 >> 8) ^ crcdetail::table[lookupIndex]; // crcdetail::table is an array of 256 32-bit constants
}
// Finalize the CRC-32 value by inverting all the bits
crc32 ^= 0xFFFFFFFFu;
return crc32 ^ ((uint64_t)line << lineshift);
}
#define INTERN_STRING(str) intern_string(str, std::source_location::current().line(), 28)
#define INTERNED_FIELD(id, str) PB_CONTAINER(id, INTERN_STRING(str))
Note that __LINE__ (& friends) are not available in a constant evaluation context, but std::source_location::current is
It's not possible to 100% guarantee no collisions at compile time, but you can truncate the hashes down and easily check at startup if any collisions exist (as part of writing the giant info packet)
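The same trick sketched in Python, with zlib.crc32 standing in for the constexpr CRC table and the same line-number mixing; the startup check just asserts that no two interned strings landed on the same ID. The interned list here is hypothetical.

import zlib

def intern_id(s, line, lineshift=28):
    return zlib.crc32(s.encode()) ^ (line << lineshift)

# Hypothetical set of (string, line) pairs gathered at startup
interned = [("category1", 12), ("category2", 12), ("my_event", 57)]
seen = {}
for name, line in interned:
    key = intern_id(name, line)
    assert seen.setdefault(key, name) == name, f"intern ID collision on {name!r}"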
Next, categories (this is broadly similar to how perfetto actually handles it as well). We parse the provided string as a comma separated list of up to 4 categories
It gets slightly more complicated as this is the contents of a protobuf that is itself nested in a containing protobuf
Finally we can optimize the track ID write by making the ID we use just be the varint-encoded version of the counter instead so it can be written directly. Functionally we then execute
All that's left is to stuff all the interned raw data into some custom section, parse that section at dump thread startup (and ensure the compiler is confused enough that it can't strip it)
Unfortunately there seems to be no clever way to speed up writing the timestamp (and I default to writing a large version too), and it ends up being a good chunk of all the instructions executed
Currently everything is inlined for each trace, but it should be easy to reduce code bloat and pass only the custom precomputed array to a common function and lose little (if any) efficiency. Could also move the proper formatting to a different thread as well and only write some pointers and lengths in the trace event.
I haven't timed it yet, but it's gotta beat 300ns, I'm eliding almost all of the awkward protobuf stuff
The only ORM I've liked is Entity Framework. Mongoose for Mongo is OK, but I have always preferred database-first development instead of code-first. But I cut teeth on dBase IV and MSSQL 5 back in the 90s, so I have a pretty good handle on the do's and don'ts of database development.
In Ruby I like Sequel a lot. ActiveRecord gets all the love here, and it's fine for the vast majority of common database operations. But when you need to drop abstractions and write something close to raw SQL, Sequel is so much less painful.
I came across ULID again and started pondering orbs things. We do a lot of cursor pagination on things that have a UUID as a PK, which means we need another unchanging field to sort on for cursor paginating, which would be the insertion timestamp since it's there anyway.
ULID would be sortable on that alone, since it's a timestamp+entropy in one field you could sort on. There wouldn't be any major benefits to bother migrating existing stuff to that, but I'll consider it for future things.
tl;dr the problem with UUID v4 is that they're randomly generated, so if you order by them, you'll get new insertions placed at random places so the "ORDER BY uuid LIMIT 100" would not be an immutable result, which is why you add that second constant field (like an auto-incrementing ID, or an insertion timestamp which also happens to give you sorting by the timestamp) so the UUIDs aren't the sole ordering factor.
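The query shape that tl;dr implies is keyset pagination on (timestamp, id); a sketch with made-up table and column names, using Postgres-style row comparison:

# Cursor = (created_at, id) of the last row already returned. Both values are
# immutable and the pair is unique, so later inserts can't reshuffle old pages.
PAGE_SQL = """
SELECT id, created_at, payload
FROM events
WHERE (created_at, id) > (%(cursor_ts)s, %(cursor_id)s)
ORDER BY created_at, id
LIMIT 100
"""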
Also UUIDs look ugly which is clearly my primary reason here.
While I understand the need to prevent leaking information to potential attackers, man I hate staring at the same generic auth error for six hours straight. The only part of this that's even right is the authority!*
*(Spoiler: The authority was the problem)
Hmm it looks like constexpr strings and vectors did make it into 20, I thought that got bumped. That simplifies things dramatically! No longer having to change type to resize means I can actually build proper structs
I have a presentation I'm giving today to some BA/PM/QA folks. My topic is about Jira Point Estimation and how it's a headache for devs. I already have a ton of talking points but I'm curious if anyone here has anything to say on the topic? If not that's okay but I figured I'd open it up here to see what others have to say about it.
I mean - don't overthink it? I'm a person who does agile measurement and consulting as part of my job. Points are just a tool to help size stories. They cannot generally be compared across teams and should just be proxies for relative effort within a team to help make planning more consistent. They will sometimes be wrong. Generally I recommend people start with some basic straw man framework - 1 point - half day to one day of one person. 3 points - 3 days effort or so. 5 points - likely a whole 2 week sprint or close to. Adjust as needed.
A common trap people fall into is "hey here's a story, and each role gets a subtask". You want to be really careful about that, as every story should fit into a sprint. So your five point story that needs QA to test things (and note, generally you want people to be trained by the QA person to test their own shit) will need a separate story (NOT SUB TASK) for testing as it's unlikely that fits into a sprint. More appropriate would be to see where you can break up the five point story, as those should be relative unicorns if you're iterating properly.
A lot of other people get the bright idea to track pts/person/sprint as well, which is stupid because it discourages pairing and teaming while preventing easy transfer of work items. So with that you end up with more stories with inflated points and more admin overhead.
EDIT: Also - unless you're assigning when you're pointing with the people in the room, point for the average dev. Many senior devs point as if they were doing it rather than if a random person their team did it, which leads to lots of carryover as the junior dev cannot execute as quickly as the senior.
Can anyone with experience using CryptoJS explain to me why the input and output strings to this are different?
var ciphertext = CryptoJS.enc.Base64.parse(inData);
var outData = ciphertext.toString(CryptoJS.enc.Base64);
Followup question that's been plaguing me for an entire day: How do I get the IV of an AES encrypted string when all I have is the key it was encrypted with? Do I even need it? It seems like if it can be derived from the encrypted data then the library should handle it without additional input.
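On the IV: it cannot be derived from the key. The usual convention is a fresh random IV per message that travels with the ciphertext (often prepended), so decryption only needs the key plus the blob. (If the CryptoJS side was given a passphrase rather than a raw key, it derives both key and IV from the passphrase and a salt embedded in its output, which is why no explicit IV is needed there.) A sketch of the prepend-the-IV convention, in Python with the cryptography package rather than CryptoJS:

import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)                    # AES-256 key (hypothetical)
iv = os.urandom(16)                     # fresh random IV for this message

padder = padding.PKCS7(128).padder()
padded = padder.update(b"hello") + padder.finalize()
enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
blob = iv + enc.update(padded) + enc.finalize()   # IV shipped with the ciphertext

# The receiver only needs the key; the IV is sitting at the front of the blob
iv2, ct = blob[:16], blob[16:]
dec = Cipher(algorithms.AES(key), modes.CBC(iv2)).decryptor()
unpadder = padding.PKCS7(128).unpadder()
print(unpadder.update(dec.update(ct) + dec.finalize()) + unpadder.finalize())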
EDIT: Also - unless you're assigning when you're pointing with the people in the room, point for the average dev. Many senior devs point as if they were doing it rather than if a random person their team did it, which leads to lots of carryover as the junior dev cannot execute as quickly as the senior.
I like the method of having each person point it as if they were doing the work, and then take the average. This can help a lot when there's a big experience disparity.
Also, make sure people factor uncertainty into their scores. If there's still requirements not fully defined, or if people haven't read or don't understand part of the codebase, that should make the points go up, even if it seems like a simple task on the surface.
edit:: I've been at some places where they want everyone to agree on a score, and I don't think that's a good process. Leads to arguments that I think are unnecessary, when the differences are valid. Just take the average and move on. Take an initial vote, then have discussion, see if people's opinions change, but don't force it.
I'd also say - the biggest issue I usually see is poorly defined stories and acceptance criteria, so you could assign point values of 1-500 based on the vagueness and be "right". Fixing that usually fixes all the other symptoms.
edit:: I've been at some places where they want everyone to agree on a score, and I don't think that's a good process. Leads to arguments that I think are unnecessary, when the differences are valid. Just take the average and move on. Take an initial vote, then have discussion, see if people's opinions change, but don't force it.
I disagree a bit, while sometimes the difference is just because of experience making it seem easier/harder, most of the time the arguments can help expose hidden requirements and complexities that the pm didn't think of
We do pointing where everyone gives a number, then as long as they're all close enough we just take the average. If some estimates are significantly higher/lower than the rest, we'll ask why, maybe that person thought of something that got missed, or maybe that person already knows how to do the work and it'll go quicker. (but then we have to make sure that the low-estimator gets the story -- sometimes it makes sense to assign work to someone that _doesn't_ know how to do it yet, so that they get to learn about that bit of the code along the way)
Yeah, "why did you say 3 and why did he say 8 for this?" exposes some Stuff, let me tell you.
Yep, doing planning poker the exercise is more important than the actual values, ime.
We do it, and we call out outliers, as a means of prompting further discussion.
Had a very memorable ticket where every developer said 3 and the QA said 13. That was a very vital conversation and why I just use estimation as a tool to get people to engage with tickets, I only pretend to care about the sprint reports.