
[Programming] Page overflow to new thread


Posts

  • ZythonZython Registered User regular
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    Switch: SW-3245-5421-8042 | 3DS Friend Code: 4854-6465-0299 | PSN: Zaithon
    Steam: pazython
  • dporowskidporowski Registered User regular
    Zython wrote: »
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".
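    A minimal sketch of that pattern; the endpoint, params, and headers below are invented, and a real script needs whatever the server actually expects (cookies, auth tokens, etc.):

```python
import requests  # third-party: pip install requests

# Hypothetical endpoint and parameters -- stand-ins for the real
# host/path/headers dporowski mentions.
req = requests.Request(
    "GET",
    "https://example.com/export",
    params={"format": "csv"},
    headers={"Accept": "text/csv"},
)
prepared = req.prepare()  # build the request without sending it
print(prepared.url)  # https://example.com/export?format=csv

# Sending and saving would then be roughly:
#   with requests.Session() as s:
#       resp = s.send(prepared, stream=True)
#       with open("export.csv", "wb") as f:
#           for chunk in resp.iter_content(8192):
#               f.write(chunk)
```

    Preparing the request separately is handy for exactly this kind of debugging: you can inspect the final URL and headers before anything goes over the wire.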

  • DrovekDrovek Registered User regular
    So I'm way overdue for a job change and between general distaste for what my work is becoming (less software engineering, more configuring software another person in my division installed) and some people at work becoming particularly goosey, I know I want to do so within the next year or two.

    However, I've been working primarily with Perl the last 10 years (the product I was hired to work with involves a lot of text processing) and would really prefer to learn/brush up on other languages before I start submitting resumes. What are some good resources to learn/relearn languages when you've already been in the field for quite a while? I'm specifically leaning towards learning Python and maybe refreshing my Java and C++ knowledge, but I'm very open to other things people would recommend learning.

    I'm going to shout out again for Exercism. You can grab a track and follow the exercises as usual, but you can also request feedback from coaches on your attempts. That way you can have an actual human go over your code and tell you different ways you could've tackled it given the language, which is especially useful when you're not that familiar with it.

  • ZekZek Registered User regular
    i hate SQL

  • ZythonZython Registered User regular
    dporowski wrote: »
    Zython wrote: »
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".

    The problem is that the file is dynamically generated when the button is pressed, since it's exporting live information to a .csv file (It even generates a timestamp for the name of the document).

    Switch: SW-3245-5421-8042 | 3DS Friend Code: 4854-6465-0299 | PSN: Zaithon
    Steam: pazython
  • SpoitSpoit *twitch twitch* Registered User regular
    Zek wrote: »
    i hate SQL

    For actually doing stuff productively, I prefer having a nice ORM to wrap it up, but I do love writing some deep SQL queries to figure out bugs

  • dporowskidporowski Registered User regular
    Zython wrote: »
    dporowski wrote: »
    Zython wrote: »
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".

    The problem is that the file is dynamically generated when the button is pressed, since it's exporting live information to a .csv file (It even generates a timestamp for the name of the document).

    My instinct is you'll need to replicate whatever the JS is doing, then. Which shouldn't be awful, assuming you have access to it and/or it's not been minified. If it has been and you don't have source access, that's a headache.
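    One wrinkle with the timestamped filename: when the server names the export, the name usually arrives in a Content-Disposition response header, which can be parsed with the stdlib. A small helper (the header value shown is made up):

```python
import re

def filename_from_content_disposition(value, default="export.csv"):
    """Extract the server-chosen filename from a Content-Disposition
    header value; fall back to a default if there isn't one."""
    m = re.search(r'filename="?([^";]+)"?', value)
    return m.group(1) if m else default

# Hypothetical header of the kind a timestamped CSV export would send:
print(filename_from_content_disposition(
    'attachment; filename="report-2022-09-16T12-00-00.csv"'))
```

    That way the script saves the file under the same timestamped name the button click would have produced.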

  • AkimboEGAkimboEG Mr. Fancypants Wears very fine pants indeedRegistered User regular
    Zython wrote: »
    dporowski wrote: »
    Zython wrote: »
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".

    The problem is that the file is dynamically generated when the button is pressed, since it's exporting live information to a .csv file (It even generates a timestamp for the name of the document).

    Then requests by itself won't cut it. You can try using requests-html (or other similar libraries) to programmatically handle the human interaction bits.

    Give me a kiss to build a dream on; And my imagination will thrive upon that kiss; Sweetheart, I ask no more than this; A kiss to build a dream on
  • dporowskidporowski Registered User regular
    TBH I'd still go for "replicate what the JS is doing", assuming it's not like, taking in form data or whatnot. May be some extra work, but probably (IME) more robust than messing with the html or the actual button.

    Note this could also be utterly impractical, depending on how much stuff Just Happens when you hit that button.

  • ZythonZython Registered User regular
    AkimboEG wrote: »
    Zython wrote: »
    dporowski wrote: »
    Zython wrote: »
    Does anyone have any familiarity with the Python requests library? I'm trying to use it to download a file from a web server, but the file is downloaded by clicking a JS button, and I can't figure out how to craft the request to get the file. All attempts at intercepting the request have yielded bupkis. It's not a big deal, since I'm working on a Selenium-based solution, but I would like one using requests, since it would otherwise be more straightforward.

    You just need to know host/path/etc of wherever the file lives, plus any required headers/whatnot on the request. Requests is... So simple/straightforward that "this cannot be the right answer" comes out of my mouth every time I use it. It is basically networking cheating. Then it's literally "response = requests.iforgetexactsyntax(with: all, the: arguments)".

    The problem is that the file is dynamically generated when the button is pressed, since it's exporting live information to a .csv file (It even generates a timestamp for the name of the document).

    Then requests by itself won't cut it. You can try using requests-html (or other similar libraries) to programmatically handle the human interaction bits.

    I'll give this library a try, but I'm gonna switch to another project and just ask a coworker with more network experience for more help on Monday. Thanks for the feedback!

    Switch: SW-3245-5421-8042 | 3DS Friend Code: 4854-6465-0299 | PSN: Zaithon
    Steam: pazython
  • This content has been removed.

  • GnomeTankGnomeTank What the what? Portland, OregonRegistered User regular
    edited September 2022
    Spoit wrote: »
    Zek wrote: »
    i hate SQL

    For actually doing stuff productively, I prefer having a nice ORM to wrap it up, but I do love writing some deep SQL queries to figure out bugs

    This is a hot take...but:

    ORMs are poison. They hide the complexity of data storage in the worst possible way and they give engineers a false sense of security about their data access layers. They rob engineers of really important knowledge. When they attempt to abstract away what database you're connecting to, they almost always do it in the worst, most generic way possible, one that doesn't leverage the actual features of the database you're using.

    One of the big epiphanies I've had since getting to the principal engineer level is that engineers were basically lied to for most of my career about databases and how much we needed to understand them. Maybe in the before times it made sense, when every company had a cadre of DBAs... but that's really not true anymore at anything except the biggest shops. When engineers don't understand databases, and how to design good data models, you get things like databases with zero foreign key relationships (and not because they made a performance trade-off), UUIDs as strings instead of the native UUID type the database supports and timestamps as big integers instead of actual timestamps. All things I've seen recently that were reinforced by the ORM they were using because they were generic and "simple".
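    The string-vs-native-UUID point is easy to quantify with nothing but the stdlib:

```python
import uuid

u = uuid.uuid4()
print(len(str(u)))   # 36 -- the hex-and-dashes text form
print(len(u.bytes))  # 16 -- what a native UUID/binary column stores
```

    More than double the storage per key, before index bloat even enters into it.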

    GnomeTank on
    Sagroth wrote: »
    Oh c'mon FyreWulff, no one's gonna pay to visit Uranus.
    Steam: Brainling, XBL / PSN: GnomeTank, NintendoID: Brainling, FF14: Zillius Rosh SFV: Brainling
  • schussschuss Registered User regular
    As a data analyst type, you're 100% right that most engineers don't understand data and data modeling at all. The modern era has made it even worse with storage and compute being cheap and available so there are even fewer penalties to bad data modeling.
    Then they wonder why no one can pull insights out.

  • FremFrem Registered User regular
    edited September 2022
    The Go language developers have a lot of explaining to do about the god-awful abomination that is the default Go templating language.

    The Hugo static site generator uses Go templates, and I miss Ruby's ERB syntax so badly every time I have to make a change to my blog theme. It's not even just that it's confusing, it's that all the docs around it are so weirdly awful.

    Frem on
  • NaphtaliNaphtali Hazy + Flow SeaRegistered User regular
    GnomeTank wrote: »
    and timestamps as big integers instead of actual timestamps

    HISSSSS

    Steam | Nintendo ID: Naphtali | Wish List
  • KupiKupi Registered User regular
    As someone who primarily comprehends computer time as "units since epoch", what is an "actual timestamp"? A more structured format like specific bit spans for the various denominations of time (year, month, day, hour, minute, second, etc.)?

    My favorite musical instrument is the air-raid siren.

    I'm "kupiyupaekio" on Discord.
  • NaphtaliNaphtali Hazy + Flow SeaRegistered User regular
    Kupi wrote: »
    As someone who primarily comprehends computer time as "units since epoch", what is an "actual timestamp"? A more structured format like specific bit spans for the various denominations of time (year, month, day, hour, minute, second, etc.)?

    I mean, a unix timestamp or an actual timestamp type in SQL are both fine, but I've seen it stored as 'other' before in a similar scenario to the above, and it's not great! Especially when you have to start translating it to other applications and have to figure out what the fuck they were trying to store it as when the people who made the original mess are long gone.
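    Both sane options decode unambiguously, while a packed "other" integer forces the guessing game. A quick demo (values invented):

```python
from datetime import datetime, timezone

# A plain unix timestamp is unambiguous: seconds since 1970-01-01 UTC.
epoch_seconds = 1663200000
ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(ts.isoformat())  # 2022-09-15T00:00:00+00:00

# An ad-hoc encoding, e.g. a yyyymmddHHMMSS integer, needs tribal
# knowledge to decode -- you have to guess the format before strptime works.
packed = 20220915000000
guessed = datetime.strptime(str(packed), "%Y%m%d%H%M%S")
print(guessed == ts.replace(tzinfo=None))  # True
```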

    Steam | Nintendo ID: Naphtali | Wish List
  • EchoEcho ski-bap ba-dapModerator, Administrator admin
    GnomeTank wrote: »
    All things I've seen recently that were reinforced by the ORM they were using because they were generic and "simple".

    Overall I've been liking Ent for Go, but yeah, these are things I need to investigate before I want to use it for anything of value. It's maintained by Facebook engineers though, so I expect some level of quality.

  • TelMarineTelMarine Registered User regular
    edited September 2022
    GnomeTank wrote: »
    Spoit wrote: »
    Zek wrote: »
    i hate SQL

    For actually doing stuff productively, I prefer having a nice ORM to wrap it up, but I do love writing some deep SQL queries to figure out bugs

    This is a hot take...but:

    ORMs are poison. They hide the complexity of data storage in the worst possible way and they give engineers a false sense of security about their data access layers. They rob engineers of really important knowledge. When they attempt to abstract away what database you're connecting to, they almost always do it in the worst, most generic way possible, one that doesn't leverage the actual features of the database you're using.

    One of the big epiphanies I've had since getting to the principal engineer level is that engineers were basically lied to for most of my career about databases and how much we needed to understand them. Maybe in the before times it made sense, when every company had a cadre of DBAs... but that's really not true anymore at anything except the biggest shops. When engineers don't understand databases, and how to design good data models, you get things like databases with zero foreign key relationships (and not because they made a performance trade-off), UUIDs as strings instead of the native UUID type the database supports and timestamps as big integers instead of actual timestamps. All things I've seen recently that were reinforced by the ORM they were using because they were generic and "simple".

    I agree, and I "learned" in an ass-backwards way: I started with the object relational mapper and worked backwards. More and more it does seem knowledge of databases is important, like how to choose the right one for a specific application, and modeling, as you said. It does seem a bit overwhelming at times; how much of it I need to know, I personally don't know (although the zero foreign key in a relational database example is surprising). Personally, I think this is the problem with a lot of improvements. We keep getting abstraction layers on top of abstraction layers and the base knowledge gets lost, but you really need it. I'm finding this myself because I am frequently starting at the top with all the abstraction layers already applied and don't understand the basics (this is especially true with networking and cloud). That's great for people who've been around 20+ years and understood the leap, but if you're just getting to it, it's a mountain of shit.

    TelMarine on
    3ds: 4983-4935-4575
  • djmdjm Registered User regular
    TelMarine wrote: »
    We keep getting abstraction layers on top of abstraction layers and the base knowledge gets lost, but you really need it. I'm finding this myself because I am frequently starting at the top with all the abstraction layers already applied and don't understand the basics (this is especially true with networking and cloud).

    Time to post this one again (though I guess in this case it's 'starting in the middle'). Also, the author of the original post made the screenshot to say "look how much work is getting done for me by all the frameworks", so in his case it's a positive thing, interestingly.
    jtrac-callstack1.png

  • admanbadmanb unionize your workplace Seattle, WARegistered User regular
    No amount (high or low) of abstraction is going to keep people from writing bad code. You can build horrifying data structures directly in SQL -- easily!

    Both low abstraction and high abstraction require onboarding effort for new developers. Low abstraction demands it -- a new developer will not be able to build anything quickly, for a long time. High abstraction doesn't -- a new developer will be able to build stuff immediately... but you may not want them to. Both of those have the same solution, which is actually teaching developers shit, but since most companies won't allocate time for that, we pay for it.

    tl;dr abstraction isn't the problem, it's (lack of) developer education.

  • This content has been removed.

  • JasconiusJasconius sword criminal mad onlineRegistered User regular
    admanb wrote: »
    No amount (high or low) of abstraction is going to keep people from writing bad code. You can build horrifying data structures directly in SQL -- easily!

    Both low abstraction and high abstraction require onboarding effort for new developers. Low abstraction demands it -- a new developer will not be able to build anything quickly, for a long time. High abstraction doesn't -- a new developer will be able to build stuff immediately... but you may not want them to. Both of those have the same solution, which is actually teaching developers shit, but since most companies won't allocate time for that, we pay for it.

    tl;dr abstraction isn't the problem, it's (lack of) developer education.

    i think ORMs are fine as long as you have enough self knowledge to recognize what you know and dont know

    i do not know sql. at all. zero. under pressure i might be able to write a simple join.... with autocomplete hints.

    but i keep it simple, trend toward normalization, never pre-optimize, if i step out of my comfort zone I always hit the docs (django has really good docs for their orm that digs into "ok this is what postgres is doing actually")

    you can definitely have this lustful view of an orm as a magic api that just makes data stuff happen but like... that is so naive that you'd probably not fare much better at the SQL prompt anyway

    this is a discord of mostly PA people interested in fighting games: https://discord.gg/DZWa97d5rz

    we also talk about other random shit and clown upon each other
  • EchoEcho ski-bap ba-dapModerator, Administrator admin
    We have one legacy service we really want to replace (crossed fingers it happens soon...) that started with "oh, we'll just make a bunch of string constants for SQL queries!" and now it has a horribly convoluted query builder with partial queries concatenated together based on whatever logic and it's just godawful to try to follow the logic to see what the final query will end up being.

    The actual final queries themselves aren't that complex, but they did need a chunk of non-SQL logic in between slamming the various parts of it together.

    I'd definitely prefer an ORM with a fluent-ish syntax where it's way easier to read what that particular chunk will build before you do some intermediate logic and then continue building your query with the ORM.
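    A toy, stdlib-only sketch of that fluent style (all names invented) shows why it reads better than concatenating scattered string constants: each step returns the builder, so the query reads top to bottom.

```python
class Query:
    """Minimal fluent query builder -- an illustration, not a real ORM."""

    def __init__(self, table):
        self.table = table
        self.conditions = []
        self.order = None

    def where(self, cond):
        self.conditions.append(cond)
        return self  # returning self is what makes chaining work

    def order_by(self, col):
        self.order = col
        return self

    def sql(self):
        parts = [f"SELECT * FROM {self.table}"]
        if self.conditions:
            parts.append("WHERE " + " AND ".join(self.conditions))
        if self.order:
            parts.append(f"ORDER BY {self.order}")
        return " ".join(parts)

q = Query("users").where("active = 1").where("age > 21").order_by("name")
print(q.sql())  # SELECT * FROM users WHERE active = 1 AND age > 21 ORDER BY name
```

    You can stop mid-chain, inspect what's been built so far, run your intermediate logic, and keep chaining -- which is exactly what the concatenated-constants approach makes painful.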

  • Ear3nd1lEar3nd1l Eärendil the Mariner, father of Elrond Registered User regular
    The only ORM I've liked is Entity Framework. Mongoose for Mongo is OK, but I have always preferred database-first development instead of code-first. But I cut teeth on dBase IV and MSSQL 5 back in the 90s, so I have a pretty good handle on the do's and don'ts of database development.

  • PhyphorPhyphor Building Planet Busters Tasting FruitRegistered User regular
    Let me take you on a journey of C++20 constant evaluation context insanity. Seriously egregious use of constexpr, macros and templates within
    It all starts simply
    TRACE_EVENT("category1,category2", "name");
    
    This is Chromium tracing, now replaced by Perfetto, and it has a nice UI that I wouldn't mind using: https://ui.perfetto.dev/
    Originally it dumped JSON, but the updated version dumps protocol buffers instead, and trace events take about 300 ns

    Unacceptable! Perfetto needs more perf. It turns out the only variable information in a typical trace packet is the track ID (which is more or less the thread ID) and the current timestamp

    Logically this is
    if(any_enabled(categories)) {
      produce_pbuf_packet(get_thread_id(), get_timestamp(), ...);
    }
    

    And very importantly the fast path requires both the category string and name string be literals, and not runtime strings. Additionally all valid categories have to be defined in some common header using literals and are guaranteed to be visible. This is already required by the library to get the fast path, so I can use it too

    Begin with a compile time std::vector
    template<size_t N>
    struct static_array
    {
    	consteval size_t size() const { return N; }
    	consteval const uint8_t *data() const { return arr; }
    	consteval uint8_t operator[](size_t n) const { return arr[n]; }
    
    	consteval static_array() ...
    	consteval static_array(const char *s) ...
    	consteval static_array(const uint8_t *s) ...
    	consteval static_array(const uint8_t(&s)[N]) ...
    
    	template<size_t N1, size_t N2>
    	consteval static_array(const static_array<N1> &s1, const static_array<N2> &s2)
    	{
    		static_assert(N1 + N2 == N);
    		for(size_t i = 0; i < N1; ++i)
    			arr[i] = s1[i];
    
    		for(size_t i = 0; i < N2; ++i)
    			arr[N1 + i] = s2[i];
    	}
    
    	uint8_t arr[N == 0 ? 1 : N] = {};
    };
    
    template<size_t N1, size_t N2>
    consteval auto operator+(const static_array<N1> &s1, const static_array<N2> &s2) -> static_array<N1 + N2>
    {
    	return static_array<N1 + N2>(s1, s2);
    }
    
    This can be initialized with any string, byte aggregate or static_array pair, and we can add them. But we also need a compile time switch to enable or disable these things
    template<bool condition>
    struct maybe_include
    {
    	template<typename T, typename = std::enable_if_t<condition>>
    	static consteval T test(T v) { return v; }
    
    	template<typename T, typename = std::enable_if_t<!condition>>
    	static consteval static_array<0> test(T v) { return static_array<0>(); }
    };
    
    So if condition is true we return the array, otherwise we return an empty array

    Protocol buffers are really just a series of varints (and some strings), 7-bit integers and you keep decoding as long as the high bit is set. Importantly the official libraries will decode sequences with trailing zeros, so 81 80 80 00 and 01 both encode 1
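    That trailing-zeros property can be sanity-checked in a few lines of Python; this is a sketch of the same base-128 encoding, not the library's code:

```python
def encode_varint(v, min_size=1):
    """Base-128 varint; min_size pads with redundant continuation
    bytes, which official protobuf decoders accept."""
    out = []
    while v >= 0x80 or len(out) < min_size - 1:
        out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        v >>= 7
    out.append(v & 0x7F)  # final byte: high bit clear
    return bytes(out)

def decode_varint(data):
    v = shift = 0
    for b in data:
        v |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:  # high bit clear ends the varint
            break
    return v

print(encode_varint(1).hex())     # 01
print(encode_varint(1, 4).hex())  # 81808000 -- also decodes to 1
print((encode_varint(8) + encode_varint(1000)).hex())  # 08e807
```

    The last line is the field-1/value-1000 example from further down: tag byte 8 followed by 1000 as 0xE8 0x07.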

    We can construct a varint and its size like so
    template<size_t N>
    constexpr static_array<N> pb_varint(uint64_t v)
    {
    	static_array<N> s;
    	for(size_t i = 0; i < N - 1; i++, v >>= 7) {
    		s.arr[i] = (uint8_t)v | 0x80;
    	}
    	s.arr[N - 1] = (uint8_t)v;
    
    	return s;
    }
    
    consteval size_t pb_varint_len(uint64_t v)
    {
    	size_t n = 0;
    	do {
    		n++;
    		v >>= 7;
    	} while(v != 0);
    	return n;
    }
    

    Unfortunately parameters/intermediates inside a consteval/constexpr function aren't compile-time constants (even though it can compute compile-time constants), so parameterization has to be done in templates
    template<size_t x, size_t min_size = 1>
    constexpr auto pb_make_varint()
    {
    	return pb_varint<std::max(min_size, pb_varint_len(x))>(x);
    }
    
    consteval uint32_t pb_type(uint32_t field, uint32_t type = 0)
    {
    	return (field << 3) | type;
    }
    

    So now
    pb_make_varint<pb_type(1, 0)>() + pb_make_varint<1000>()
    
    yields (effectively) {8, 0xE8, 7} which is the correct encoding for a field with id 1 and value 1000

    A string (or protobuf) would be
    #define PB_CONTAINER(id, contents) pb_make_varint<pb_type(id, 2)>() + pb_make_varint<contents.size()>() + contents
    

    where contents is a static_array, and we can construct static_arrays from strings with
    constexpr size_t immediate_strlen(const char *s)
    {
    	size_t sz = 0;
    	while(*s) s++, sz++;
    	return sz;
    }
    static_array<immediate_strlen(str)>(str)
    

    But wait, there's more!
    For space saving reasons perfetto allows you to either include the data directly in the trace packet or just include an ID that references another data packet

    Normally these IDs are allocated starting from 1 so you get small values, but we can just sacrifice some space and generate unlikely to collide constants that will fit in 5 bytes
    constexpr uint64_t intern_string(const char *s, uint32_t line, uint32_t lineshift)
    {
    	uint32_t crc32 = 0xFFFFFFFFu;
    
    	for(; *s; s++) {
    		uint32_t lookupIndex = (crc32 ^ *s) & 0xff;
    		crc32 = (crc32 >> 8) ^ crcdetail::table[lookupIndex];  // crcdetail::table is an array of 256 32-bit CRC constants
    	}
    
    	// Finalize the CRC-32 value by inverting all the bits
    	crc32 ^= 0xFFFFFFFFu;
    
    	return crc32 ^ ((uint64_t)line << lineshift);
    }
    #define INTERN_STRING(str) intern_string(str, std::source_location::current().line(), 28)
    #define INTERNED_FIELD(id, str) PB_CONTAINER(id, INTERN_STRING(str))
    
    Note that __LINE__ (& friends) are not available in a constant evaluation context, but std::source_location::current is

    It's not possible to 100% guarantee no collisions at compile time, but you can truncate the hashes down and easily check at startup if any collisions exist (as part of writing the giant info packet)
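    That startup check can be sketched in Python; zlib.crc32 implements the same standard table-driven CRC-32 as the loop above, and the strings and line numbers here are invented:

```python
import zlib

def intern_id(s, line, lineshift=28):
    # CRC-32 of the string, mixed with the defining source line,
    # mirroring the intern_string scheme above.
    return zlib.crc32(s.encode()) ^ (line << lineshift)

# Hypothetical interned strings with their defining line numbers.
interned = [("category1", 10), ("category2", 11), ("main.cpp", 12)]

seen = {}
for s, line in interned:
    i = intern_id(s, line)
    other = seen.setdefault(i, s)
    assert other == s, f"intern ID collision between {s!r} and {other!r}"
print(len(seen))  # one entry per string if no collisions
```

    A collision just means renaming one string or nudging a line, so failing loudly at startup is cheap insurance.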


    Next, categories (this is broadly similar to how perfetto actually handles it as well). We parse the provided string as a comma separated list of up to 4 categories
    template<typename... Args>
    consteval size_t varadic_size(Args&&... args)
    {
    	return sizeof...(args);
    }
    
    consteval bool is_equal(const char *a, const char *b)
    {
    	while(*a && *a == *b) a++, b++;
    	return *a == *b || *a == ',';
    }
    
    template<typename A0>
    consteval const char *ith(size_t i, A0 &&a0)
    {
    	return a0;
    }
    
    template<typename A0, typename... Args>
    consteval const char* ith(size_t i, A0 &&a0, Args&&... args)
    {
    	return i ? ith(i - 1, args...) : a0;
    }
    
    consteval uint32_t count_categories(const char *categories)
    {
    	uint32_t n = 1;
    	for(; *categories; categories++) {
    		if(*categories == ',') {
    			n++;
    		}
    	}
    	return n;
    }
    
    #define DEFINE_STATIC_CATEGORIES(...) \
    	inline alignas(std::hardware_destructive_interference_size) std::atomic<uint8_t> flags[varadic_size(__VA_ARGS__)] = {}; \
    	consteval uint32_t get_trace_index(const char *name) { \
    		for(uint32_t i = 0; i < varadic_size(__VA_ARGS__); i++) { \
    			if(is_equal(name, ith(i, __VA_ARGS__))) \
    				return i; \
    		} \
    		return invalid_category_string_category_not_in_global_list(); \
    	} \
    	consteval std::array<uint32_t, 4> parse_categories(const char *cat) { \
    		uint32_t e[4] = {0, 0, 0, 0}; \
    		for(uint32_t i = 0; i < 4 && *cat; i++) { \
    			e[i] = get_trace_index(cat); \
    			while(*cat && *cat != ',') cat++; \
    			if(*cat == ',') cat++; \
    		} \
    		return std::array<uint32_t, 4>{e[0], e[1], e[2], e[3]}; \
    	} \
    	inline std::vector<const char*> static_categories = {__VA_ARGS__};
    
    // some_common_header.h
    DEFINE_STATIC_CATEGORIES("category1", "category2", ...);
    

    We can check if our event is enabled
    #define CHECK_ENABLED(categories) \
    	flags[parse_categories(categories)[0]].load(std::memory_order_relaxed) || \
    		(count_categories(categories) > 1 && flags[parse_categories(categories)[1]].load(std::memory_order_relaxed)) || \
    		(count_categories(categories) > 2 && flags[parse_categories(categories)[2]].load(std::memory_order_relaxed)) || \
    		(count_categories(categories) > 3 && flags[parse_categories(categories)[3]].load(std::memory_order_relaxed))
    


    So we can produce the guts of a perfectly valid packet like so
    #define TRACK_BEGIN_EVENT(categories, name) \
    	VARINT_FIELD(9, 1) /* optional Type type = 9; */ \
    	+ INTERNED_FIELD(10, name) /* uint64 name_iid = 10; */ \
    	+ INTERNED_FIELD(34, std::source_location::current().file_name()) /* uint64 source_location_iid = 34; */ \
    	+ VARINT(pb_type(3, 2)) /* repeated uint64 category_iids = 3; */ \
    	+ VARINT(count_categories(categories)) \
    	+ maybe_include<(count_categories(categories) > 0)>::test(static_array({1 + parse_categories(categories)[0]})) \
    	+ maybe_include<(count_categories(categories) > 1)>::test(static_array({1 + parse_categories(categories)[1]})) \
    	+ maybe_include<(count_categories(categories) > 2)>::test(static_array({1 + parse_categories(categories)[2]})) \
    	+ maybe_include<(count_categories(categories) > 3)>::test(static_array({1 + parse_categories(categories)[3]}))
    
    It gets slightly more complicated as this is the contents of a protobuf that is itself nested in a containing protobuf

    Finally we can optimize the track ID write by making the ID we use just be the varint-encoded version of the counter instead so it can be written directly. Functionally we then execute
    if(CHECK_ENABLED(categories)) [[unlikely]] {
    	uint8_t *mem = rb_get();
    	constexpr size_t sz = TRACK_BEGIN_EVENT_CONTAINER(categories, name).size();
    	memcpy(mem, TRACK_BEGIN_EVENT_CONTAINER(categories, name).data(), sz);
    	mem += sz;
    	memcpy(mem, &trackid, 4);
    	mem += 4;
    	TRACE_WRITE_TIMESTAMP(mem);
    	rb_advance((uint32_t)sz + 14);
    }
    

    All that's left is to stuff all the interned raw data into some custom section, parse that section at dump thread startup (and ensure the compiler is confused enough that it can't strip it)

    Unfortunately there seems to be no clever way to speed up writing the timestamp (and I default to writing a large version too)
    *mem++ = pb_type(8, 0); /* optional uint64 timestamp = 8; */
    for(int i = 0; i < 8; i++, ts >>= 7) *mem++ = 0x80 | (uint8_t)ts;
    *mem++ = (uint8_t)ts;
    
    and it ends up being a good chunk of all the instructions executed


    Currently everything is inlined for each trace, but it should be easy to reduce code bloat and pass only the custom precomputed array to a common function and lose little (if any) efficiency. Could also move the proper formatting to a different thread as well and only write some pointers and lengths in the trace event
    ; check for 4 categories. Maybe make uint64s and test for bitpatterns? Though multiple categories are less common anyway
    00007FF754941453 0F B6 05 26 E9 00 00 movzx       eax,byte ptr [trace::details::flags (07FF75494FD80h)]  
    00007FF75494145A 84 C0                test        al,al  
    00007FF75494145C 75 34                jne         main+52h (07FF754941492h)  
    00007FF75494145E 0F B6 05 1C E9 00 00 movzx       eax,byte ptr [trace::details::flags+1h (07FF75494FD81h)]  
    00007FF754941465 84 C0                test        al,al  
    00007FF754941467 75 29                jne         main+52h (07FF754941492h)  
    00007FF754941469 0F B6 05 12 E9 00 00 movzx       eax,byte ptr [trace::details::flags+2h (07FF75494FD82h)]  
    00007FF754941470 84 C0                test        al,al  
    00007FF754941472 75 1E                jne         main+52h (07FF754941492h)  
    00007FF754941474 0F B6 05 08 E9 00 00 movzx       eax,byte ptr [trace::details::flags+3h (07FF75494FD83h)]  
    00007FF75494147B 84 C0                test        al,al  
    00007FF75494147D 75 13                jne         main+52h (07FF754941492h)  
    
    ; [[unlikely]] branch
    00007FF754941492 E8 19 00 00 00       call        `main'::`2'::internal_trace_struct83::trace (07FF7549414B0h)  
    
    
    ; entry
    00007FF7549414B0 40 53                push        rbx  
    00007FF7549414B2 48 83 EC 40          sub         rsp,40h  
    ; access thread buffer
    00007FF7549414B6 65 48 8B 04 25 58 00 00 00 mov         rax,qword ptr gs:[58h]  
    00007FF7549414BF 0F 10 44 24 20       movups      xmm0,xmmword ptr [rsp+20h]  
    00007FF7549414C4 BB 38 01 00 00       mov         ebx,138h  
    00007FF7549414C9 F2 0F 10 4C 24 30    movsd       xmm1,mmword ptr [rsp+30h]  
    00007FF7549414CF 48 8B 08             mov         rcx,qword ptr [rax]  
    00007FF7549414D2 8B 44 24 38          mov         eax,dword ptr [rsp+38h]  
    00007FF7549414D6 48 03 D9             add         rbx,rcx  
    ; calculate write ptr
    00007FF7549414D9 4C 8B 0B             mov         r9,qword ptr [rbx]  
    00007FF7549414DC 45 8B 41 04          mov         r8d,dword ptr [r9+4]  
    00007FF7549414E0 4D 03 C1             add         r8,r9  
    ; write the precomputed packet
    00007FF7549414E3 41 0F 11 40 0C       movups      xmmword ptr [r8+0Ch],xmm0  
    00007FF7549414E8 F2 41 0F 11 48 1C    movsd       mmword ptr [r8+1Ch],xmm1  
    00007FF7549414EE 41 89 40 24          mov         dword ptr [r8+24h],eax  
    00007FF7549414F2 0F B6 44 24 3C       movzx       eax,byte ptr [rsp+3Ch]  
    00007FF7549414F7 41 88 40 28          mov         byte ptr [r8+28h],al  
    ; write track ID
    00007FF7549414FB B8 30 01 00 00       mov         eax,130h  
    00007FF754941500 8B 04 08             mov         eax,dword ptr [rax+rcx]  
    00007FF754941503 41 89 40 29          mov         dword ptr [r8+29h],eax  
    ; get timestamp
    00007FF754941507 0F 31                rdtsc  
    00007FF754941509 48 C1 E2 20          shl         rdx,20h  
    00007FF75494150D 48 0B D0             or          rdx,rax  
    00007FF754941510 48 2B 15 29 EA 00 00 sub         rdx,qword ptr [trace::details::global_clock_zero (07FF75494FF40h)]  
    ; encode timestamp.... /sigh
    00007FF754941517 41 C6 40 2D 40       mov         byte ptr [r8+2Dh],40h  
    00007FF75494151C 0F B6 CA             movzx       ecx,dl  
    00007FF75494151F 48 C1 EA 07          shr         rdx,7  
    00007FF754941523 80 C9 80             or          cl,80h  
    00007FF754941526 41 88 48 2E          mov         byte ptr [r8+2Eh],cl  
    00007FF75494152A 0F B6 CA             movzx       ecx,dl  
    00007FF75494152D 48 C1 EA 07          shr         rdx,7  
    00007FF754941531 80 C9 80             or          cl,80h  
    00007FF754941534 41 88 48 2F          mov         byte ptr [r8+2Fh],cl  
    00007FF754941538 0F B6 C2             movzx       eax,dl  
    00007FF75494153B 0C 80                or          al,80h  
    00007FF75494153D 48 C1 EA 07          shr         rdx,7  
    00007FF754941541 41 88 40 30          mov         byte ptr [r8+30h],al  
    00007FF754941545 0F B6 C2             movzx       eax,dl  
    00007FF754941548 0C 80                or          al,80h  
    00007FF75494154A 48 C1 EA 07          shr         rdx,7  
    00007FF75494154E 41 88 40 31          mov         byte ptr [r8+31h],al  
    00007FF754941552 0F B6 C2             movzx       eax,dl  
    00007FF754941555 0C 80                or          al,80h  
    00007FF754941557 48 C1 EA 07          shr         rdx,7  
    00007FF75494155B 41 88 40 32          mov         byte ptr [r8+32h],al  
    00007FF75494155F 0F B6 C2             movzx       eax,dl  
    00007FF754941562 0C 80                or          al,80h  
    00007FF754941564 48 C1 EA 07          shr         rdx,7  
    00007FF754941568 41 88 40 33          mov         byte ptr [r8+33h],al  
    00007FF75494156C 0F B6 C2             movzx       eax,dl  
    00007FF75494156F 0C 80                or          al,80h  
    00007FF754941571 48 C1 EA 07          shr         rdx,7  
    00007FF754941575 41 88 40 34          mov         byte ptr [r8+34h],al  
    00007FF754941579 0F B6 C2             movzx       eax,dl  
    00007FF75494157C 0C 80                or          al,80h  
    00007FF75494157E 48 C1 EA 07          shr         rdx,7  
    00007FF754941582 41 88 40 35          mov         byte ptr [r8+35h],al  
    00007FF754941586 41 88 50 36          mov         byte ptr [r8+36h],dl  
    ; check for buffer full
    00007FF75494158A 41 83 41 04 2B       add         dword ptr [r9+4],2Bh  
    00007FF75494158F 41 8B 41 04          mov         eax,dword ptr [r9+4]  
    00007FF754941593 41 3B 41 08          cmp         eax,dword ptr [r9+8]  
    00007FF754941597 77 06                ja          `main'::`2'::internal_trace_struct83::trace+0EFh (07FF75494159Fh)  
    00007FF754941599 48 83 C4 40          add         rsp,40h  
    00007FF75494159D 5B                   pop         rbx  
    00007FF75494159E C3                   ret  
    ; [[unlikely]] flush path
    00007FF75494159F 48 8D 15 5A FA 00 00 lea         rdx,[intern83 (07FF754951000h)]  
    00007FF7549415A6 49 8B C9             mov         rcx,r9  
    00007FF7549415A9 E8 92 29 00 00       call        trace::details::rb_flush (07FF754943F40h)  
    00007FF7549415AE B9 00 00 01 00       mov         ecx,10000h  
    00007FF7549415B3 FF 15 EF 8C 00 00    call        qword ptr [__imp_malloc (07FF75494A2A8h)]  
    00007FF7549415B9 33 C9                xor         ecx,ecx  
    00007FF7549415BB 48 89 03             mov         qword ptr [rbx],rax  
    00007FF7549415BE C7 40 08 F4 FE 00 00 mov         dword ptr [rax+8],0FEF4h  
    00007FF7549415C5 48 89 08             mov         qword ptr [rax],rcx  
    00007FF7549415C8 EB CF                jmp         `main'::`2'::internal_trace_struct83::trace+0E9h (07FF754941599h)
    

    I haven't timed it yet, but it's gotta beat 300ns since I'm eliding almost all of the awkward protobuf stuff

  • FremFrem Registered User regular
    Ear3nd1l wrote: »
    The only ORM I've liked is Entity Framework. Mongoose for Mongo is OK, but I have always preferred database-first development instead of code-first. But I cut my teeth on dBase IV and MSSQL 5 back in the 90s, so I have a pretty good handle on the do's and don'ts of database development.

    In Ruby I like Sequel a lot. ActiveRecord gets all the love here, and it's fine for the vast majority of common database operations. But when you need to drop abstractions and write something close to raw SQL, Sequel is so much less painful.

  • EchoEcho ski-bap ba-dapModerator, Administrator admin
    I came across ULID again and started pondering things. We do a lot of cursor pagination on things that have a UUID as a PK, which means we need another unchanging field to sort on for cursor paginating, which would be the insertion timestamp since it's there anyway.

    ULID would be sortable on that alone, since it's a timestamp+entropy in one field you could sort on. There wouldn't be any major benefit in migrating existing stuff to that, but I'll consider it for future things.

    tl;dr the problem with UUID v4 is that they're randomly generated, so if you order by them, new insertions land at random places and "ORDER BY uuid LIMIT 100" is not a stable result. That's why you add the second constant field (like an auto-incrementing ID, or an insertion timestamp, which also happens to give you sorting by timestamp) so the UUIDs aren't the sole ordering factor.

    Also UUIDs look ugly which is clearly my primary reason here.
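    The sortable-ID idea is pretty compact to sketch. This is ULID-shaped (48-bit millisecond timestamp + 80 random bits, Crockford base32) but hand-rolled for illustration, not a spec-exact library — make_ulid and the bit layout details are just a sketch:

    ```cpp
    #include <cstdint>
    #include <random>
    #include <string>

    // ULID-style ID: big-endian 48-bit ms timestamp, then 80 random bits,
    // encoded as 26 Crockford base32 chars. Lexicographic string order
    // matches timestamp order, so you can cursor-paginate on the ID alone.
    static std::string make_ulid(uint64_t unix_ms, std::mt19937_64 &rng)
    {
        static const char alphabet[] = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
        uint8_t bytes[16] = {};
        for (int i = 5; i >= 0; i--) { bytes[i] = (uint8_t)unix_ms; unix_ms >>= 8; }
        uint64_t r1 = rng(), r2 = rng();
        for (int i = 0; i < 5; i++) bytes[6 + i] = (uint8_t)(r1 >> (8 * i));
        for (int i = 0; i < 5; i++) bytes[11 + i] = (uint8_t)(r2 >> (8 * i));

        // extract 5 bits at a time, most significant first (first char takes
        // only 3 bits: 3 + 25*5 = 128)
        auto bit = [&](int idx) { return (bytes[idx >> 3] >> (7 - (idx & 7))) & 1; };
        std::string out(26, '0');
        int pos = 0;
        for (int i = 0; i < 26; i++) {
            int width = (i == 0) ? 3 : 5;
            int v = 0;
            for (int j = 0; j < width; j++) v = (v << 1) | bit(pos++);
            out[i] = alphabet[v];
        }
        return out;
    }
    ```

    The first 10 characters encode the timestamp exactly (48 bits), which is what makes "ORDER BY id" stable across insertions.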

  • gavindelgavindel The reason all your software is brokenRegistered User regular
    While I understand the need to prevent leaking information to potential attackers, man I hate staring at the same generic auth error for six hours straight. The only part of this that's even right is the authority!*

    *(Spoiler: The authority was the problem)

    Book - Royal road - Free! Seraphim === TTRPG - Wuxia - Free! Seln Alora
  • PhyphorPhyphor Building Planet Busters Tasting FruitRegistered User regular
    edited September 2022
    Hmm it looks like constexpr strings and vectors did make it into 20, I thought that got bumped. That simplifies things dramatically! No longer having to change type to resize means I can actually build proper structs
    struct SourceLocation
    {
    	consteval size_t contents_size() const
    	{
    		return iid.size() + file_name.size() + function_name.size() + line_number.size();
    	}
    	consteval void append(std::vector<uint8_t> &v)
    	{
    		iid.append(v);
    		file_name.append(v);
    		function_name.append(v);
    		line_number.append(v);
    	}
    
    	Varint<1> iid;
    	String<2> file_name;
    	String<3> function_name;
    	Varint<4> line_number;
    };
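    The payoff of constexpr containers is the usual two-pass trick: evaluate a consteval builder once just to learn the size, then again to copy the bytes into a fixed std::array that can actually live in the binary. A hand-rolled sketch (names and the fake payload are made up; needs a compiler with constexpr std::vector, e.g. GCC 12+ or Clang 15+):

    ```cpp
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Pass 1: build() runs at compile time just to report its size.
    // It fakes a tiny length-delimited field (tag 0x0A, length 3, "foo").
    consteval std::vector<uint8_t> build()
    {
        std::vector<uint8_t> v;
        v.push_back(0x0A); // field 1, length-delimited
        v.push_back(3);
        for (char c : {'f', 'o', 'o'}) v.push_back((uint8_t)c);
        return v;
    }

    // Pass 2: copy into a fixed array; the vector never escapes constant
    // evaluation, so the allocation is legal in C++20.
    template <std::size_t N>
    consteval std::array<uint8_t, N> to_array()
    {
        std::array<uint8_t, N> a{};
        std::vector<uint8_t> v = build();
        for (std::size_t i = 0; i < N; i++) a[i] = v[i];
        return a;
    }

    constexpr auto packet = to_array<build().size()>();
    ```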
    

    Phyphor on
  • KakodaimonosKakodaimonos Code fondler Helping the 1% get richerRegistered User regular
  • urahonkyurahonky Cynical Old Man Registered User regular
    I have a presentation I'm giving today to some BA/PM/QA folks. My topic is about Jira Point Estimation and how it's a headache for devs. I already have a ton of talking points but I'm curious if anyone here has anything to say on the topic? If not that's okay but I figured I'd open it up here to see what others have to say about it.

  • schussschuss Registered User regular
    edited September 2022
    urahonky wrote: »
    I have a presentation I'm giving today to some BA/PM/QA folks. My topic is about Jira Point Estimation and how it's a headache for devs. I already have a ton of talking points but I'm curious if anyone here has anything to say on the topic? If not that's okay but I figured I'd open it up here to see what others have to say about it.

    I mean - don't overthink it? I'm a person who does agile measurement and consulting as part of my job. Points are just a tool to help size stories. They cannot generally be compared across teams and should just be proxies for relative effort within a team to help make planning more consistent. They will sometimes be wrong. Generally I recommend people start with some basic straw man framework - 1 point - half day to one day of one person. 3 points - 3 days effort or so. 5 points - likely a whole 2 week sprint or close to. Adjust as needed.
    A common trap people fall into is "hey here's a story, and each role gets a subtask". You want to be really careful about that, as every story should fit into a sprint. So your five point story that needs QA to test things (and note, generally you want people to be trained by the QA person to test their own shit) will need a separate story (NOT SUB TASK) for testing as it's unlikely that fits into a sprint. More appropriate would be to see where you can break up the five point story, as those should be relative unicorns if you're iterating properly.
    A lot of other people get the bright idea to track pts/person/sprint as well, which is stupid because it discourages pairing and teaming while preventing easy transfer of work items. So with that you end up with more stories with inflated points and more admin overhead.
    EDIT: Also - unless you're assigning when you're pointing with the people in the room, point for the average dev. Many senior devs point as if they were doing it rather than if a random person their team did it, which leads to lots of carryover as the junior dev cannot execute as quickly as the senior.

    schuss on
  • DelzhandDelzhand Agrias Fucking Oaks Registered User, Transition Team regular
    edited September 2022
    Can anyone with experience using CryptoJS explain to me why the input and output strings to this are different?
    var ciphertext = CryptoJS.enc.Base64.parse(inData);
    var outData = ciphertext.toString(CryptoJS.enc.Base64);
    

    Follow-up question that's been plaguing me for an entire day: how do I get the IV of an AES-encrypted string when all I have is the key it was encrypted with? Do I even need it? It seems like if it can be derived from the encrypted data then the library should handle it without additional input.

    Delzhand on
  • SageinaRageSageinaRage Registered User regular
    edited September 2022
    schuss wrote: »
    urahonky wrote: »
    I have a presentation I'm giving today to some BA/PM/QA folks. My topic is about Jira Point Estimation and how it's a headache for devs. I already have a ton of talking points but I'm curious if anyone here has anything to say on the topic? If not that's okay but I figured I'd open it up here to see what others have to say about it.

    I mean - don't overthink it? I'm a person who does agile measurement and consulting as part of my job. Points are just a tool to help size stories. They cannot generally be compared across teams and should just be proxies for relative effort within a team to help make planning more consistent. They will sometimes be wrong. Generally I recommend people start with some basic straw man framework - 1 point - half day to one day of one person. 3 points - 3 days effort or so. 5 points - likely a whole 2 week sprint or close to. Adjust as needed.
    A common trap people fall into is "hey here's a story, and each role gets a subtask". You want to be really careful about that, as every story should fit into a sprint. So your five point story that needs QA to test things (and note, generally you want people to be trained by the QA person to test their own shit) will need a separate story (NOT SUB TASK) for testing as it's unlikely that fits into a sprint. More appropriate would be to see where you can break up the five point story, as those should be relative unicorns if you're iterating properly.
    A lot of other people get the bright idea to track pts/person/sprint as well, which is stupid because it discourages pairing and teaming while preventing easy transfer of work items. So with that you end up with more stories with inflated points and more admin overhead.
    EDIT: Also - unless you're assigning when you're pointing with the people in the room, point for the average dev. Many senior devs point as if they were doing it rather than if a random person their team did it, which leads to lots of carryover as the junior dev cannot execute as quickly as the senior.

    I like the method of having each person point it as if they were doing the work, and then take the average. This can help a lot when there's a big experience disparity.

    Also, make sure people factor uncertainty into their scores. If there's still requirements not fully defined, or if people haven't read or don't understand part of the codebase, that should make the points go up, even if it seems like a simple task on the surface.

    edit:: I've been at some places where they want everyone to agree on a score, and I don't think that's a good process. Leads to arguments that I think are unnecessary, when the differences are valid. Just take the average and move on. Take an initial vote, then have discussion, see if people's opinions change, but don't force it.

    SageinaRage on
    sig.gif
  • schussschuss Registered User regular
    I'd also say - the biggest issue I usually see is poorly defined stories and acceptance criteria, so you could assign point values of 1-500 based on the vagueness and be "right". Fixing that usually fixes all the other symptoms.

  • SpoitSpoit *twitch twitch* Registered User regular
    edited September 2022
    schuss wrote: »
    urahonky wrote: »
    I have a presentation I'm giving today to some BA/PM/QA folks. My topic is about Jira Point Estimation and how it's a headache for devs. I already have a ton of talking points but I'm curious if anyone here has anything to say on the topic? If not that's okay but I figured I'd open it up here to see what others have to say about it.

    I mean - don't overthink it? I'm a person who does agile measurement and consulting as part of my job. Points are just a tool to help size stories. They cannot generally be compared across teams and should just be proxies for relative effort within a team to help make planning more consistent. They will sometimes be wrong. Generally I recommend people start with some basic straw man framework - 1 point - half day to one day of one person. 3 points - 3 days effort or so. 5 points - likely a whole 2 week sprint or close to. Adjust as needed.
    A common trap people fall into is "hey here's a story, and each role gets a subtask". You want to be really careful about that, as every story should fit into a sprint. So your five point story that needs QA to test things (and note, generally you want people to be trained by the QA person to test their own shit) will need a separate story (NOT SUB TASK) for testing as it's unlikely that fits into a sprint. More appropriate would be to see where you can break up the five point story, as those should be relative unicorns if you're iterating properly.
    A lot of other people get the bright idea to track pts/person/sprint as well, which is stupid because it discourages pairing and teaming while preventing easy transfer of work items. So with that you end up with more stories with inflated points and more admin overhead.
    EDIT: Also - unless you're assigning when you're pointing with the people in the room, point for the average dev. Many senior devs point as if they were doing it rather than if a random person their team did it, which leads to lots of carryover as the junior dev cannot execute as quickly as the senior.

    I like the method of having each person point it as if they were doing the work, and then take the average. This can help a lot when there's a big experience disparity.

    Also, make sure people factor uncertainty into their scores. If there's still requirements not fully defined, or if people haven't read or don't understand part of the codebase, that should make the points go up, even if it seems like a simple task on the surface.

    edit:: I've been at some places where they want everyone to agree on a score, and I don't think that's a good process. Leads to arguments that I think are unnecessary, when the differences are valid. Just take the average and move on. Take an initial vote, then have discussion, see if people's opinions change, but don't force it.

    I disagree a bit: while sometimes the difference is just experience making it seem easier or harder, most of the time the arguments help expose hidden requirements and complexities that the PM didn't think of

    Spoit on
    steam_sig.png
  • djmdjm Registered User regular
    We do pointing where everyone gives a number, then as long as they're all close enough we just take the average. If some estimates are significantly higher/lower than the rest, we'll ask why, maybe that person thought of something that got missed, or maybe that person already knows how to do the work and it'll go quicker. (but then we have to make sure that the low-estimator gets the story -- sometimes it makes sense to assign work to someone that _doesn't_ know how to do it yet, so that they get to learn about that bit of the code along the way)

  • dporowskidporowski Registered User regular
    Yeah, "why did you say 3 and why did he say 8 for this?" exposes some Stuff, let me tell you.

  • InfidelInfidel Heretic Registered User regular
    dporowski wrote: »
    Yeah, "why did you say 3 and why did he say 8 for this?" exposes some Stuff, let me tell you.

    Yep, doing planning poker the exercise is more important than the actual values, ime.

    We do it, and we call out outliers, as a means of prompting further discussion.

    Had a very memorable ticket where every developer said 3 and the QA said 13. That was a very vital conversation and why I just use estimation as a tool to get people to engage with tickets, I only pretend to care about the sprint reports.

This discussion has been closed.