00:04 jekstrand: Woo! Progress!
00:06 jenatali: Ugh the CL CTS is really annoying with how precise it wants rounding to be
00:06 jekstrand: I've got it crashing on an unsupported load_input intrinsic.
00:06 karolherbst: jenatali: well, that's good, no?
00:06 karolherbst: :p
00:06 jenatali: For some definition of good
00:07 karolherbst: jenatali: load_kernel_input I guess?
00:07 karolherbst: ... jekstrand
00:07 jekstrand: karolherbst: Yeah
00:07 jekstrand: karolherbst: It's progress. At least the loader is loading the ICD, recognizing that it can do compute, and trying to run stuff.
00:07 karolherbst: :)
00:09 jenatali: karolherbst: 0xffffff7fffffffff => *18446742974197923840.000000 vs 18446744073709551616.000000
00:09 jenatali: :(
00:09 karolherbst: :D
00:09 karolherbst: wait until you get to NaN handling
00:09 jenatali: Nah, that's easy. Flush to 0
00:10 karolherbst: heh
00:10 karolherbst: if you think so :p
00:10 karolherbst: wait.. conversion to int probably that's fine actually
00:10 jenatali: The only conversion failures we have left are fp32 <-> long/ulong, and not due to NaNs, but due to the fact we're not really supporting int64 so we're lowering it
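For reference, the value jenatali quotes at 00:09 is a round-to-nearest case: the top 24 bits of 0xffffff7fffffffff are all ones and the discarded low bits fall just below the halfway point, so the expected float is 2^64 - 2^40 rather than 2^64. Below is a minimal sketch of a ulong to fp32 conversion with round-to-nearest-even using only integer operations. The helper name `u64_to_f32_rte` is hypothetical and this is not the lowering any driver actually uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 64-bit unsigned integer to float with
 * round-to-nearest-even, using integer operations only. */
static float u64_to_f32_rte(uint64_t v)
{
    if (v == 0)
        return 0.0f;

    /* Index of the highest set bit (0..63). */
    int msb = 63;
    while (!(v >> msb))
        msb--;

    uint32_t exponent = 127 + (uint32_t)msb;  /* biased exponent */
    uint32_t mantissa;                        /* 24 bits, including the implicit 1 */

    if (msb <= 23) {
        /* The value fits exactly in the 24-bit significand. */
        mantissa = (uint32_t)(v << (23 - msb));
    } else {
        int shift = msb - 23;
        uint64_t rest = v & ((UINT64_C(1) << shift) - 1);
        uint64_t half = UINT64_C(1) << (shift - 1);
        mantissa = (uint32_t)(v >> shift);
        /* Round up above the halfway point, or exactly at it when the
         * significand is odd (ties go to even). */
        if (rest > half || (rest == half && (mantissa & 1)))
            mantissa++;
        /* A carry out of the significand bumps the exponent. */
        if (mantissa >> 24) {
            mantissa >>= 1;
            exponent++;
        }
    }

    uint32_t bits = (exponent << 23) | (mantissa & 0x7fffff);
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

With this sketch, 0xffffff7fffffffff rounds down to 18446742974197923840, while 0xffffff8000000000 (an exact tie with an odd significand) rounds up to 2^64.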
00:12 karolherbst: ufff
00:12 karolherbst: sounds annoying
00:12 airlied: okay nailed the last CTS 4.6.0 issue
00:12 jenatali: karolherbst: Isn't that exactly what I said? :P
00:13 airlied: hopefully once that is fixed I can submit GL 4.5 for llvmpipe
00:13 jekstrand: Ok, at this point, the next stage is figuring out how to do inputs again. I'm going to put it down and make supper.
00:13 jekstrand: I can figure out inputs tomorrow.
00:13 jekstrand: Then I can start hacking on trying to clean up the unstructured SPIR-V path a bit.
00:13 jekstrand: I think I can cut it at least in half.
00:14 airlied: jekstrand: I thought inputs was the shove them in cbuf0 hack
00:14 jekstrand: airlied: They are. I need to write the hack. :)
00:14 airlied: https://cgit.freedesktop.org/~airlied/mesa/commit/?h=nonya&id=b468b05d3c65c3d6f2dffbe82ca8c2c4cc60142f
00:14 airlied: it even has HACK written on it :-P
00:14 airlied: oh and your name
00:15 jekstrand: airlied: I know. I was planning to do it differently this time
00:15 jekstrand: With actual cbuf0 rather than push constants
00:15 jekstrand: Though I do kind-of like doing it with push constants.......
00:16 airlied: how I mapped that to vulkan was pretty horrible
00:16 jekstrand: hehe
00:16 jekstrand: I can imagine
00:16 karolherbst: jekstrand: "always aligned" ... hehe
00:17 karolherbst: you wish
00:17 karolherbst: you are aware that you can pass in structs by value and those structs can be packed? :p
00:17 airlied: karolherbst: did you ever see any robustness fails on the 4.6.0 GL CTS?
00:17 karolherbst: airlied: don't think so, why?
00:19 airlied: the last bug I had was 4.6.0 only and it looks like the test result is inverted
00:19 airlied: no idea though how anyone would pass it
00:20 karolherbst: patches I guess? dunno :p
00:20 airlied: of course it could be the new code in master is broken
00:20 karolherbst: might be
00:20 airlied: I should probably work out where it goes wrong
00:21 karolherbst: soo... now I have to store the local size...
00:21 karolherbst: but where
00:23 jenatali: karolherbst: isn't there a local_size in the shader_info for cs?
00:23 karolherbst: jenatali: I meant in clover
00:23 jenatali: Ah :)
00:23 karolherbst: sadly reading it out was the easy part :p
02:33 cmarcelo: jekstrand: by now you figured, but for iris we initially did a "special case" push (since it was the only thing pushed), but then moved it to be with iris system values (which means cbuf0), with the idea that we could optimize cbuf0 handling to pick things to push one day.
03:19 jekstrand: cmarcelo: yup
03:20 jekstrand: cmarcelo: I think I'm going to go the same direction with shamrock
03:20 jekstrand: I just need to write the code
03:22 jekstrand: Wait....
03:22 jekstrand: I thought we put system values in cbuf0
03:22 jekstrand: But, no, we put them in cbufN-1
03:23 jekstrand: Oh, that's aggravating
03:24 jekstrand:wonders how hard it would be to make clover upload cbuf0
03:24 jenatali: jekstrand, karolherbst: There has to be a better way to find places where NIR wasn't updated for 16-component vectors than just running until you hit heap corruption
03:24 jekstrand: jenatali: :-(
03:25 jekstrand: jenatali: Search for 4
03:25 jenatali: Just debugged a problem that only reproduced on binaries built by our automation, and didn't repro when built locally
03:25 jekstrand: jenatali: :-(
03:25 jekstrand: jenatali: I fix them whenever I come across them.
03:25 airlied: asan would probably catch them
03:25 jenatali: jekstrand: Thanks. Same :)
03:25 jekstrand: But, yeah, we should have parameterized it from the start rather than just assuming 4
03:25 jenatali: airlied: For stack variables?
03:25 jekstrand: asan can catch OOB on stack variables
03:26 jenatali: Ah, cool
03:26 jekstrand: But that still requires hitting that path with a vec8 or vec16
03:26 airlied: jekstrand: I think we did, then just wrote 4 anyways :-P
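The "parameterize it rather than assuming 4" point can be sketched as follows. `MAX_VEC_COMPONENTS` and `identity_swizzle` are hypothetical stand-ins written for this illustration (NIR has its own `NIR_MAX_VEC_COMPONENTS` constant); this is not code from the tree.

```c
#include <stdint.h>

/* Hypothetical stand-in for a project-wide constant; 16 matches the
 * vec16 case discussed above. */
#define MAX_VEC_COMPONENTS 16

/* Fill an identity swizzle (0, 1, 2, ...) for num_components entries.
 * Returns 0 on success, -1 if the request would overflow the buffer. */
static int identity_swizzle(uint8_t swiz[MAX_VEC_COMPONENTS],
                            unsigned num_components)
{
    if (num_components > MAX_VEC_COMPONENTS)
        return -1;
    for (unsigned i = 0; i < num_components; i++)
        swiz[i] = (uint8_t)i;
    return 0;
}
```

Had the caller's buffer been declared with a literal 4, a vec8 or vec16 would write past its end. Building with `-fsanitize=address` reports that kind of stack overflow, but as jekstrand notes, only when a test actually reaches the path with a wide vector.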
03:27 jenatali: The most recent one: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/compiler/nir/nir_range_analysis.c#L90
03:32 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6275
05:07 tomeu: airlied: vendoring today's libLLVMSPIRVLib won't be as much fun now that it's not a fork of LLVM
07:05 linusw: Can someone help me merge v5.8 final into drm-misc-next or should I try this crazy thing myself, or should I maybe not do it because bad idea?
08:39 MrCooper: tpalli: why did you push to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6240 again? Just reassigning to Marge again (as I did) should have done the trick, no need for another CI pipeline run
09:53 danvet: airlied, [PATCH] drm/ttm: revert "make TT creation purely optional v3" <- I'll leave this for you to ack
10:23 karolherbst: jenatali: ehh.. for design reasons I can't implement the local_size thing.... well except I rework how clover manages/compiles kernels :/
10:23 karolherbst: but that's a huge pain already anyway
11:23 karolherbst: jekstrand: mhh. I think we can actually share most of the switch handling code. I moved things around a bit and now I only have one difference: vtn_case vs util_dynarray as the hash table Value type, everything else is the same
11:24 karolherbst: well.. and with unstructured I also save the default block, but that's a minor issue
11:30 karolherbst: I can do a callback, but that mildly sucks
11:34 karolherbst: https://gist.github.com/karolherbst/8a600be0be557d5fe4c0ccb37f52c04a
11:36 karolherbst: mhh.. maybe I really should just parse the literals out and the structured code would do a bit of copying the dynarrays... mhh
11:37 karolherbst: but dynarrays can also just be assigned...
11:37 karolherbst: yeah...
11:37 karolherbst: and then just do a loop for structured over the parsed literals and create the vtn_case objects
11:56 tpalli: MrCooper I thought marge maybe failed to rebase so I rebased manually
11:59 karolherbst: ehhh... "const"..
12:07 mlankhorst: skeggsb: can I merge https://patchwork.freedesktop.org/patch/362898/?series=76431&rev=1 through drm-misc-next?
12:10 karolherbst: jekstrand: sooo.. the only annoying thing left to solve is that "warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]" warning when dealing with hash table keys
12:13 karolherbst: but I think the hash_table_entry struct is actually wrong
12:13 hakzsam: freedreno CI flakes: https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/4035758
12:14 karolherbst: shouldn't it be "void const *key;" instead?
12:14 karolherbst: ehh wait
12:14 karolherbst: that's the same
12:14 karolherbst: mhhh
12:15 karolherbst: void * const key;
12:15 karolherbst: const void* key; is a bit pointless isn't it as consumers can just overwrite the key...
13:12 jenatali: karolherbst: Hash table keys should be immutable. It should probably be const void* const key;
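The three spellings being compared differ in what they freeze. A small sketch, with hypothetical struct names chosen only for this illustration:

```c
/* Pointee is const, but the member itself can be reassigned.
 * "void const *key" is the same type as "const void *key". */
struct entry_ptr_to_const { const void *key; };

/* The member cannot be reassigned, but the pointee is writable. */
struct entry_const_ptr { void *const key; };

/* Neither the member nor the pointee can be changed through it. */
struct entry_both { const void *const key; };

static int demo(void)
{
    static const int a = 1, b = 2;
    static int c = 3;

    struct entry_ptr_to_const e1 = { &a };
    e1.key = &b;              /* allowed: consumers can overwrite the key */

    struct entry_const_ptr e2 = { &c };
    *(int *)e2.key = 4;       /* allowed: the pointee is not const */
    /* e2.key = &c;              rejected: assignment of read-only member */

    struct entry_both e3 = { &a };
    /* e3.key = &b;              rejected: assignment of read-only member */

    return e1.key == &b && c == 4 && e3.key == &a;
}
```

The price of a const member is that such a struct can only be given its value by an initializer, which is the difficulty pq raises next.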
13:14 karolherbst: well.. right
13:25 pq: C makes it fundamentally difficult to have struct members that should be const after the first initialization.
13:26 karolherbst: pq: that's true for like all languages
13:27 karolherbst: most C developers just don't initialize in one go, but declare the variable, then assign
13:27 karolherbst: but you can always do var = { ... };
13:28 karolherbst: you could even write constructors and copy by value the result and the compiler will constant fold it
13:28 karolherbst: soo.. really not different to other languages
13:28 pq: isn't that assignment illegal wrt. any members that are const?
13:28 karolherbst: why would it?
13:29 karolherbst: the thing is, you do it when declaring
13:29 karolherbst: then it's fine
13:29 karolherbst: but most C developers are just declaring at the top still :p
13:29 pq: because you are assigning to a const member for 'var', not initializing 'var'
13:29 karolherbst: goes a bit against what most used to do
13:29 karolherbst: doesn't matter
13:30 pq: I mean, I assume the object to initialize is allocated dynamically, which means you can't use an initializer.
13:31 karolherbst: mhhh, for dynamic allocations it's more annoying right.. mhh
13:32 karolherbst: okay.. for dynamic allocations it's busted :)
13:33 karolherbst: you can probably play dumb tricks with unions and private declarations and shit
13:33 karolherbst: but yeah
13:33 karolherbst: mhhh
13:34 pq: yeah
13:34 karolherbst: I guess you could have an allocator taking an initial value
13:34 karolherbst: and just const cast internally
13:34 karolherbst: that's probably the easiest way out
13:35 karolherbst: or well.. memcpy
13:35 karolherbst: MALLOC_STRUCT(T, val) void *ptr = malloc(sizeof(T)); memcpy(ptr, &val, sizeof(T));
13:36 karolherbst: and use it like MALLOC_STRUCT(struct point, { 5, 6})
13:36 karolherbst: not too bad actually
13:37 pq: I wonder if something in that could still explode wrt. compiler optimizations. :-)
13:37 karolherbst: I think C guarantees that it doesn't do dumb shit with const members of structs :p
13:37 karolherbst: but yeah.. might be?
13:38 pq: something something aliasing rules?
13:39 karolherbst: I doubt it as at this point you probably left scope already and C still has to treat it like memory... but yeah.. also don't know the details :p
13:39 karolherbst: I just assume enough projects rely on that to behave sanely
13:43 pq: well, put it in a function, not a macro, and maybe you're safe? :-)
13:43 karolherbst: mhh, but then it needs to be typed :p
13:43 pq: maaaaybe not
13:44 emersion: what about not doing weird things?
13:44 karolherbst: well, you can have a macro wrapper casting the initializer
13:44 karolherbst: and then have a void* init
13:45 pq: emersion, it's not that weird if you look at the goal of exposing an API where the struct members you're not allowed to change are actually marked as const.
13:45 karolherbst: emersion: it's not weird
13:46 emersion: well, that's adding your own micro-framework people need to use to contribute
13:46 emersion: instead of doing plain idiomatic C
13:46 pq: not really
13:46 emersion: also macro magic
13:47 pq: your users have to call create_foo() like things anyway, all the macro magic is hidden inside that.
13:47 emersion: i mean, inside the lib
13:47 karolherbst: pq: ehh. the preprocessor kills my idea :(
13:49 pq: oh hey, one more idea: struct con { int this_is_const; }; struct api_obj { const struct con *con; }; struct api_obj *o = malloc(sizeof(struct api_obj) + sizeof(struct con)); o->con = (const struct con *)(o + 1);
13:49 karolherbst: ehhh
13:50 pq: hmm, can that be compacted...
13:50 pq: nah, just leads to the same old pointer cast tricks
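pq's idea, written out as a compilable sketch. The struct names come from the message above; the constructor name `api_obj_create` is hypothetical. The library writes through a private non-const pointer into the trailing storage, and API users only ever see the `const struct con *` view.

```c
#include <stdlib.h>

struct con { int this_is_const; };
struct api_obj { const struct con *con; };

/* Allocate the public object and its read-only part in one block. */
static struct api_obj *api_obj_create(int value)
{
    struct api_obj *o = malloc(sizeof(struct api_obj) + sizeof(struct con));
    if (!o)
        return NULL;

    /* The const part lives directly after the public struct. */
    struct con *c = (struct con *)(o + 1);
    c->this_is_const = value;   /* writable here, before it is published */
    o->con = c;                 /* users get a pointer-to-const */
    return o;
}
```

This costs one extra pointer per object and an indirection on every read, and as pq says, it still relies on a pointer cast inside the constructor.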
13:51 karolherbst: pq: https://gist.github.com/karolherbst/81eb510caa32eabe78abc6bc009c9faa
13:51 bnieuwenhuizen: pq: IIRC casting a const-pointer to a non-const pointer is valid though? (Unless the const pointer actually pointed to const storage?)
13:51 karolherbst: the only problem are those , inside the initializer :/
13:51 pq: I think I may have thought about this many years ago, and the conclusion was "can't bother, just live without const"
13:52 karolherbst: because cpp is stupid :/
13:53 karolherbst: ohhh
13:53 jenatali: karolherbst: Wrap it in another macro, which takes variadic args and just expands to its args
13:53 karolherbst: yeah :D
13:53 jenatali: Cpp needs to use it for template macro args all the time unfortunately...
13:54 karolherbst: jenatali: no need for a second macro
13:54 karolherbst: pq, jenatali: https://gist.github.com/karolherbst/81eb510caa32eabe78abc6bc009c9faa
13:54 jenatali: Ah, sure
13:55 karolherbst: requires C11 though :p
13:56 karolherbst: ohh.. c99 is fine
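The gist is not reproduced in this log. The following is a guess at the general shape, consistent with the compiler messages quoted afterwards (compound literals, taking an address): a variadic macro so the preprocessor does not split the initializer at its commas, plus a C99 compound literal that is copied into freshly allocated storage. `alloc_init` is a hypothetical helper, and `struct point` follows the usage suggested at 13:36.

```c
#include <stdlib.h>
#include <string.h>

struct point {
    const int x;
    const int y;
};

/* Hypothetical helper: allocate size bytes and copy an initial value in. */
static void *alloc_init(size_t size, const void *init)
{
    void *ptr = malloc(size);
    if (ptr)
        memcpy(ptr, init, size);
    return ptr;
}

/* __VA_ARGS__ rejoins an initializer such as { 5, 6 } that the
 * preprocessor split at the comma; the compound literal (C99) gives
 * the initial value an address that can be passed to alloc_init. */
#define MALLOC_STRUCT(T, ...) \
    ((T *)alloc_init(sizeof(T), &(T)__VA_ARGS__))
```

Usage would look like `struct point *p = MALLOC_STRUCT(struct point, { 5, 6 });`, and designated initializers such as `{ .x = 5, .y = 6 }` work the same way. Whether every optimizer treats const members set this way as pq hopes is the open question raised at 13:37; this sketch does not settle it.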
13:56 karolherbst: c++11 wasn't I think
13:56 pq: oh my, what a trick :-D
13:56 karolherbst: or something weird
13:56 pq: C is not valid C++, what else is new ;-)
13:56 karolherbst: I mean, those initializers got added in c99
13:57 karolherbst: and c++11 is dumb
13:57 jenatali: Disagree :)
13:57 karolherbst: ehh c++ complains about other stupid shit
13:58 karolherbst: "error: taking address of rvalue [-fpermissive]"
13:58 karolherbst: ehhh
13:58 karolherbst: I guess that's valid, but matters not?
14:00 pq: If you're compiling that as C++, what's the size of struct point?
14:00 karolherbst: ahhh
14:00 karolherbst: "C++ designated initializers only available with ‘-std=c++2a’ or ‘-std=gnu++2a’"
14:00 karolherbst: so you need C++20 anyway
14:00 pq: 0 maybe?
14:00 karolherbst: "warning: ISO C++ forbids compound-literals" ...
14:00 karolherbst: I give up :D
14:01 karolherbst: pq: why would it be 0?
14:01 jenatali: karolherbst: Even with designated initializers, the C++ spec requires that they be initialized in the same order they're declared in the struct
14:01 jenatali: Since constructors and side effects and blah blah blah
14:01 karolherbst: I know...
14:01 karolherbst: it's annoying
14:01 karolherbst: C++ needs this "this is a C struct" thing :p
14:02 karolherbst: struct == class was a mistake
14:02 pq: Does the compiler actually have to allocate memory in *every* instance for const members of a class? :-p
14:02 karolherbst: pq: :D I think so
14:02 jenatali: pq: Unless it's static const - maybe that's what you're thinking?
14:02 karolherbst: you still need to be able to pass it around and stuff, but yeah.. I guess it could optimize it later on
14:03 pq: oh yes, static indeed
14:03 karolherbst: dunno if it would reduce the allocation
14:03 karolherbst: it just would remove the allocation altogether :p
14:03 karolherbst: in C++ it's legal to optimize constructor calls away with side effects :D
14:03 karolherbst: so I guess you have other options
14:04 karolherbst: anyway
14:04 karolherbst: it doesn't work in C++ because of the rvalue address
14:04 karolherbst: it's fine at runtime though
14:04 pq: don't C++ constructors give a trivial solution to the problem of initializing const members from constructor arguments?
14:04 karolherbst: yes
14:04 karolherbst: but you might still deal with C structs
14:04 pq: so you only need the tricks for C :-)
14:05 pq: did you try declaring the struct as C?
14:05 pq: extern "C" and so on?
14:06 karolherbst: same
14:06 pq: d'oh
14:06 karolherbst: oh well...
14:06 pq: "can't bother"
14:06 karolherbst: but you can do other magic in C++ :p
14:06 karolherbst: operator overloading eg
14:07 jenatali: pq: extern "C" just affects name mangling for functions
14:07 karolherbst: yeah.. but I think that's the biggest flaw in C++ that it doesn't treat struct as C structs, but as C++ classes
14:08 karolherbst: but yeah.. irrelevant here
14:08 karolherbst: taking addresses of rvalues is indeed a bad idea, it just happens to be fine in this case
14:08 karolherbst: I think....
14:08 karolherbst: mhh
14:08 karolherbst: I have an idea
14:18 karolherbst: mhhh
14:19 karolherbst: oh well
14:19 karolherbst: there are multiple ways of making it better, but either it requires GNU extensions or weird stuff
14:26 jekstrand: karolherbst: Cool. Yeah, I think we can probably mostly unify. I was going to take a look at that myself but I've been rebasing shamrock
14:26 karolherbst: jekstrand: already pushed an update and left a comment describing the ugly bits
14:27 karolherbst: now wondering what to replace the set with, but lists are also so annoying :/
14:27 karolherbst: do we have a list thing, which doesn't require a wrapper struct? :p
14:27 jekstrand: We already have a list node in vtn_cf_node. Just use that.
14:28 jekstrand: I don't think your blocks are ever being placed in any other CF node
14:28 karolherbst: right..
14:29 karolherbst: yeah.. I guess that should be fine then
14:30 jenatali: Would someone be willing to Marge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6275 for me? Assuming you think it's good to go
14:30 karolherbst: ehhh
14:30 karolherbst: feels like we want to have a global constant default_swizzle thing :p
14:31 jekstrand: jenatali: Add an rb tag from me and I'll marge it
14:31 karolherbst: I am sure we have that one like hundreds of time spread around
14:31 jekstrand: karolherbst: I don't think it's as simple as that
14:32 jekstrand: most cases modify their swizzle so a global constant won't work.
14:32 karolherbst: ahh.. right
14:33 jenatali: jekstrand: Done
14:43 karolherbst: jekstrand: done
14:55 karolherbst: jekstrand: maybe it would be a fun idea to force the unstructured path for Vulkan and see how much breaks :D
14:55 jekstrand: karolherbst: We should do that, actually
14:55 jekstrand: It would be a very good test
14:56 karolherbst: yeah.. maybe I trigger it and see how the intel CI does :)
14:57 karolherbst: jekstrand: maybe even make it an env var option?
14:59 jenatali: jekstrand: Mind trying Marge again? Looks like freedreno had an unexpected pass, but I doubt I fixed a gles3.1 issue by increasing that array size :)
15:32 karolherbst: jekstrand: heh.. the CI was telling me after 15 minutes it's done.. I think it broke :D
15:32 karolherbst: or maybe a stupid compilation error I didn't catch
15:36 MrCooper: tpalli: nah, 'Branch cannot be merged' sometimes happens because marge-bot doesn't properly wait for the MR to be in mergeable state yet; just re-assign to Marge again and she'll merge it (unless she already moved on to another MR in her queue)
15:37 jekstrand:has an OpenCL kernel executing again \o/
15:37 jekstrand: Now to get correct results. :)
15:37 jenatali: Woo :)
15:42 jekstrand: Woo! Correct results even!
15:42 jenatali: Huzzah
15:45 karolherbst: jekstrand: and how many patches do you still need?
15:46 jekstrand: 12
15:46 jekstrand: But they're a pretty clean 12 patches
15:46 karolherbst: :)
15:46 jekstrand: And some of them are stupid-obvious
15:47 jenatali: jekstrand, karolherbst: I think we'd like to start developing the "correct" approach for implementing CL's conversions with round/sat modifiers. Right now we're handling all of this in vtn, which is definitely not right. I think options we discussed were:
15:47 jenatali: 1. Add new alu op types for all the conversion ops
15:47 jenatali: 2. Add rounding/saturation modifiers to all ALU ops
15:47 jenatali: 3. Add a whole new instruction type for conversions, to house rounding/saturation modifiers
15:47 jenatali: Thoughts?
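Whichever of the three options is chosen, the instruction has to express the same semantics: a rounding mode applied before a clamp to the destination range. A minimal sketch for float to 8-bit signed int, with hypothetical names (`round_mode`, `f32_to_i8_sat`) that are not NIR or vtn identifiers. Mapping NaN to 0 is an assumption here, based on my reading of the OpenCL C rule for saturated conversions.

```c
#include <stdint.h>

/* Hypothetical names for the four CL rounding modes. */
enum round_mode { ROUND_RTE, ROUND_RTZ, ROUND_RTP, ROUND_RTN };

/* Float -> int8 with saturation and an explicit rounding mode. */
static int8_t f32_to_i8_sat(float x, enum round_mode mode)
{
    if (x != x)                 /* NaN: assumed to convert to 0 */
        return 0;
    if (x <= -128.0f)
        return INT8_MIN;        /* saturate low */
    if (x >= 127.0f)
        return INT8_MAX;        /* saturate high */

    int t = (int)x;             /* the cast truncates toward zero */
    float frac = x - (float)t;  /* remaining fraction, same sign as x */

    switch (mode) {
    case ROUND_RTZ:
        break;
    case ROUND_RTP:             /* toward +infinity */
        if (frac > 0.0f)
            t++;
        break;
    case ROUND_RTN:             /* toward -infinity */
        if (frac < 0.0f)
            t--;
        break;
    case ROUND_RTE:             /* nearest, ties to even */
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))
            t++;
        else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
            t--;
        break;
    }
    return (int8_t)t;
}
```

For example, 2.5 and 3.5 convert to 2 and 4 under round-to-nearest-even, -2.5 converts to -3 under round-toward-negative-infinity, and 300.0 saturates to 127 in every mode.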
15:47 karolherbst: I prefer 3 to be honest
15:48 karolherbst: the painful bit with alu ops is, that it's annoying to group stuff
15:48 karolherbst: so you have those switch cases with 16 cases before the block, because you handle all int to float the same
15:48 jekstrand: cwabbott: ^^
15:48 karolherbst: new instruction type allows for a more natural code like if (cvt->dest == float) { ... }
15:49 karolherbst: at least that's what annoys me about the current solution
15:50 jekstrand: There's a part of me that wants 3.
15:50 jekstrand: There's another part of me that really doesn't want to think about plumbing NIR for a new instruction type.
15:50 karolherbst: yeah..
15:50 karolherbst: I think it will be a lot of work
15:50 karolherbst: but I think it would be worth it
15:51 jenatali: We could have a new instruction type, with lowering passes that convert it back to existing alu ops, for drivers that don't handle the new type - that way we can at least stage the transition without having to update ALL nir drivers at once?
15:52 jenatali: Only CL-supporting drivers would need to handle the new type explicitly
15:52 jekstrand: I've been burned by that approach in the past. The problem is that none of them ever switch and so we end up supporting both paths forever.
15:52 jenatali: Fair
15:54 jekstrand: One sort-of terrible middle-ground option would be to do it with an intrinsic
15:54 jekstrand: And put the source and destination type in the const indices
15:54 karolherbst: mhhh
15:54 karolherbst: and have flags for sat?
15:54 jekstrand: Along with rounding mode and sat
15:54 jenatali: One nice property of #2 btw is that some ops, e.g. uadd_sat, could be dropped and replaced by just having the modifier
15:54 karolherbst: uhm.. a flag
15:54 jekstrand: Yeah, #2 also has some advantages, TBH.
15:55 karolherbst: I think we might want to do 2 anyway
15:55 jekstrand: Some HW out there can support rounding modes on virtually everything.
15:55 jekstrand: And we can SAT basically anything
15:55 karolherbst:hides
15:55 jekstrand: We just have to be very careful
15:55 karolherbst: it just makes opt algebraic a bit annoying, but maybe we can make it work
15:55 jekstrand: Because, with integers, sat isn't a second operation like the current saturate modifier.
15:55 jekstrand: Yeah, it does fairly well make hash of opt_algebraic
15:56 jenatali: So does #3 though
15:56 karolherbst: but honestly..
15:56 jekstrand: They all do
15:56 karolherbst: now I want #2, not #3
15:56 karolherbst: :D
15:56 jenatali: True
15:56 karolherbst: I could use it for other optimizations
15:56 karolherbst: _and_
15:56 karolherbst: at some point we add float flushing as well
15:56 jekstrand: The problem with #2 is that it leaks modifiers and things everywhere.
15:56 karolherbst: and could fix pow lowering :p
15:56 jekstrand: As opposed to keeping them contained to conversions.
15:56 karolherbst: because pow lowering in nir is just busted
15:57 jekstrand: And that means opt_algebraic has to reason about them.
15:57 jekstrand: karolherbst: What's busted about it?
15:57 karolherbst: jekstrand: denormals need to be flushed to zero for the MUL
15:57 jekstrand: karolherbst: And how do modifiers help?
15:57 jenatali: jekstrand: We could add rules to nir validation to prevent it from leaking *everywhere* at least?
15:57 karolherbst: there is a piglit test failing due to that even
15:57 karolherbst: some weird corner case
15:57 jekstrand: jenatali: Possibly
15:58 jekstrand: But we have to figure out how to reason about it in nir_opt_algebraic and that isn't going to be easy
15:58 jenatali: Yeah
15:58 jenatali: For starters, you could just say if there's rounding/saturation, you can't touch it
15:58 jekstrand: We've been able to avoid it so far because the only "modifiers" we have right now are abs/neg/sat and they aren't allowed during most of the compile process.
15:58 karolherbst: so.. in codegen we have a few bits for floats
15:58 jenatali: Since again, these modifiers are probably just going to come from CL for now
15:58 karolherbst: rounding mode, denormal flushing and sat
15:58 jenatali: Well, except for f32->f16
15:59 karolherbst: jenatali: GL needs some of them :p
15:59 karolherbst: as I said, mul lowering is wrong at the moment
15:59 karolherbst: uhm
15:59 karolherbst: pow lowering
15:59 jekstrand: karolherbst: Yes, but most people don't have a "flush denorms" bit they can set on instructions.
15:59 jekstrand: karolherbst: That's a weird NV thing
15:59 jekstrand: karolherbst: AMD has a separate fmul that flushes
15:59 jekstrand: And on Intel, it's a shader-wide bit in a status register.
15:59 karolherbst: what about intel?
16:00 jenatali: DXIL just got a whole-program mode
16:00 karolherbst: ahh
16:00 karolherbst: annoying
16:00 karolherbst: the good thing about those things is, that ignoring it in GL is fairly safe
16:00 karolherbst: except for sat
16:00 jekstrand: So while having a "denorm flush" bit on nir_alu_instr is the most general thing possible, it maps to, as far as I know, exactly one bit of hardware and is terrible for everyone else.
16:00 imirkin: gotta flush denorms in GL
16:00 glennk: radeons have both a global round/denorm state register and some ops that ignore that and always apply a specific mode
16:00 imirkin: otherwise you get render fail.
16:00 imirkin: in heaven and maybe others
16:01 karolherbst: jekstrand: I see :/
16:01 karolherbst: but I guess for rounding modes and sat it still might make sense for most GPUs?
16:01 jekstrand: We should probably have an fmul_flush in NIR or something
16:01 karolherbst: mhhhh
16:02 karolherbst: not sure
16:02 karolherbst: because for compute it's all different again
16:02 jekstrand: We also don't have rounding mode bits on instructions.
16:02 jekstrand: We have a state register and we have to whack it back and forth
16:02 jekstrand: It's a lot like Intel CPUs
16:02 karolherbst: ehhh
16:02 karolherbst: why...
16:02 karolherbst: ...
16:02 karolherbst: I know why, but it's even more sad
16:03 jekstrand: Why is very simple. In reality, everyone wants round-to-even except for a few cases where they're doing a float -> int and they want round-down or round-to-zero and then we flip it back.
16:03 jekstrand: Being able to set the rounding mode per-instruction is pretty useless for most programs.
16:03 jekstrand: Unless those programs are CTS tests
16:04 jekstrand: And it's extra bits in the instruction encoding that can be used for something else.
16:04 jenatali: Agreed
16:04 karolherbst: I am sure having a state for that just wastes transistors :p
16:04 karolherbst: because then you have state
16:04 karolherbst: and it's not just part of the instruction itself
16:04 jekstrand: Yes, but we don't have a lot of extra bits on the instructions themselves to spend on this sort of thing.
16:04 karolherbst: right
16:05 jekstrand: But, also, us arguing the point here is useless. There are no HW engineers from either company in this channel AFAIK. :-)
16:05 karolherbst: but having an 96/128 bit based ISA is soooooo nice
16:05 karolherbst: seriously
16:05 karolherbst: every hw should do it :p
16:05 jekstrand: We have a 128-bit ISA
16:05 karolherbst: ufff
16:05 karolherbst: and no space?
16:05 karolherbst: how is that even possible
16:06 jekstrand: I've not looked at the encodings in a while.
16:06 jenatali: Honestly I'm leaning back towards #1 now... just encode the specific ops that can have rounding/sat modifiers, and use nir_op_infos[].is_conversion if you need to target all of them, otherwise use a lowering pass that converts them to a rounding/clamping algorithm before converting
16:06 jekstrand: We have all those register regions though and those take space
16:06 karolherbst: mhh
16:06 karolherbst: with volta it's very simple, you have some bits flipping what register "types" you have and change the second/third source from gpr to something else
16:06 karolherbst: and that's it
16:07 jekstrand: jenatali: Yeah, that should work. We can add more fields to nir_op_info to provide the extra bits you're looking for if needed.
16:07 karolherbst: and you can just reuse stuff
16:07 jenatali: jekstrand: Yeah, and then algebraic just works after the lowering pass
16:07 jekstrand: karolherbst: For us, we have regions on all the sources, a simplified region on the destination, types on everything, and register numbers. It starts to burn a lot of space.
16:07 karolherbst: yeah...
16:07 jenatali: Or if we want, we can write new algebraic rules that operate on the round/sat variations of conversions to take that into account
16:08 jekstrand: Oh, and abs/neg bits on all sources and .sat on the destination
16:08 jekstrand: And conditional modifiers
16:08 karolherbst: right.. but you still have 128 bits
16:08 karolherbst: those things take like.. 5?
16:08 jekstrand: And we use them all.....
16:08 karolherbst: ohh maybe your condition stuff is a bit weird
16:08 karolherbst: we need 6 or 7 bits for predication
16:08 jekstrand: 7 for a register number, 5 or 6 for the region, probably 4 for the type
16:09 jekstrand: SO that's 16-17 bits per source at a minimum
16:09 bl4ckb0ne: could I have a review on this please https://gitlab.freedesktop.org/mesa/waffle/-/merge_requests/77
16:09 jekstrand: Oh, and two source modifiers so 28-19
16:09 karolherbst: ohh.. you have a type per source?
16:09 jekstrand: *18
16:09 karolherbst: that's insane
16:09 jekstrand: Yeah
16:09 karolherbst: ... uff
16:10 karolherbst: we have 8 bits per register, 2 bit for mods :) done
16:10 karolherbst: 8 for the dest, 7 for predicate and 12 for the op encoding
16:10 karolherbst: op encoding contains register types, instruction type, and everything else related to layout
16:10 jekstrand: Destination is a little simpler because it only has horizontal stride.
16:11 jekstrand: But we also have 4 bits for a predicate, 2 to specify which flag, probably 4 for the conditional modifier
16:11 karolherbst: so an iadd3 needs like .. 50 bits or so
16:12 karolherbst: including carry predicate
16:12 jekstrand: Oh, we have a different encoding for 3src because they don't fit in 128 bits. :-) So they have a simplified encoding with fewer region parameters and restricted types.
16:12 karolherbst: ....
16:12 karolherbst: that all is way too complicated :o
16:12 karolherbst: maybe you should simplify the encoding a lot :D
16:12 jekstrand: Each instruction can do quite a bit
16:12 jekstrand: I'm not sure it's worth it but there are a lot of neat tricks you can pull
16:13 karolherbst: I bet
16:13 karolherbst: we can only have one non gpr source eg
16:13 karolherbst: so iadd $r0 $r1 c2[0x100] $r2 is legal
16:13 karolherbst: but two ubo accesses are not
16:13 karolherbst: also can't mix immediate + ubo
16:13 karolherbst: and so on
16:13 karolherbst: but usually it doesn't matter all that much
16:14 karolherbst: but this slims down the encoding a lot
16:14 karolherbst: with the Maxwell ISA we only got 64 bits
16:14 karolherbst: was still good enough for most stuff
16:14 karolherbst: we even got a fma with a 32 bit immediate variant :)
16:14 karolherbst: in 64 bits
16:16 jekstrand: We've got a compaction scheme which reduces a lot of instructions to 64 bits
16:17 jekstrand: airlied, Kayden: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280
16:17 jekstrand: There's my entire shamrock branch sans the final commit to add the CAPs and turn it on
16:17 jekstrand: ^^
16:19 karolherbst: that really doesn't look that bad :)
16:19 jekstrand: It's not
16:19 jekstrand: The worst part is uploading inputs
16:19 jekstrand: I really wish clover would just stuff them in cbuf0 for us
16:21 jekstrand: It doesn't handle images, of course.
16:21 jekstrand: But meh
16:21 jenatali: jekstrand: Clover doesn't handle images yet
16:21 jekstrand: hehe
16:21 jenatali: CL images are a bit crazy
16:22 jekstrand: There's at least 3 of the patches in there which are trivial "Yeah, we want this" patches
16:22 jekstrand: Even without clover
16:22 jekstrand: We'll see when Kayden wakes up. Maybe we'll land the whole lot. Maybe he'll have better ideas about handling system values.
16:23 jekstrand: Or maybe karolherbst can implement cbuf0 for us and I can just drop the last 3 :)
16:23 karolherbst: mhhh
16:23 karolherbst: not quite sure about it actually.. but maybe
16:23 karolherbst: how are uniforms handled?
16:23 jekstrand: cbuf0
16:23 karolherbst: and those still end up as push constants?
16:23 karolherbst: or a real ubo?
16:24 jekstrand: My thought was to have clover look at the MAX_INPUTS cap and, if it's 0, emulate with cbuf0
16:24 jekstrand: cbuf0 will end up as a real UBO. I don't care.
16:24 karolherbst: I am also more thinking about other hw doing something similar
16:24 jekstrand: If that's what nouveau would like anyway, it's "good enough" for us.
16:24 karolherbst: the kernel input buffer _can_ be smaller than the constant buffer one afaik
16:24 karolherbst: I think it also matters for freedreno? dunno
16:24 jekstrand: Yeah, to accommodate Intel's push constants
16:25 jekstrand: It's pretty clear those limits are expressly designed so Intel can reasonably push the whole thing.
16:25 jekstrand: But, again, this isn't a production OpenCL driver so I don't care.
16:25 karolherbst: I don't mind the cbuf0 thing being special, but maybe we could change a few bits around and make cbuf0 special in the sense it can also be smaller
16:25 karolherbst: but still making the API simpler
16:26 jekstrand: As long as everything's at a constant offset, our pulls are pretty fast. Certainly, once you start scratching around on images etc., you won't notice a UBO pull or two.
16:26 jenatali: karolherbst: CL_DEVICE_MAX_PARAMETER_SIZE: Minimum value of 1024
16:26 karolherbst: jekstrand: do push constants allow indirect access?
16:26 jekstrand: karolherbst: Yes
16:26 karolherbst: mhhh
16:26 jekstrand: karolherbst: Which is to say that we can indirect our entire register file at least for reads.
16:26 jekstrand: Yet more stuff shoved into the encoding. :)
16:26 karolherbst: right..
16:26 karolherbst: we can't :)
16:26 karolherbst: soo.. mhh
16:27 karolherbst: I am also thinking at this to improve GL a bit
16:27 jekstrand: But, again, for the purposes of getting stuff up-and-running on iris, I really don't care about push
16:27 karolherbst: jekstrand: push constants are "faster" than ubos?
16:27 jekstrand: karolherbst: Yes, they're quite a bit faster.
16:27 karolherbst: so pushing uniforms as push constants would be a general benefit?
16:27 jekstrand: But we can push cbuf0 for all shader stages except compute
16:27 jekstrand: So we really don't care about the "general" benefit.
16:28 jekstrand: We've never noticed a perf problem in iris from not being able to push cbuf0 for compute
16:28 jekstrand: compute does so much other memory access that it gets dwarfed
16:28 karolherbst: ohh, you can't do it for compute?
16:28 jekstrand: Where it really matters is fragment shaders
16:28 jekstrand: We can push in compute but, due to hardware stupidity, we can't push UBOs
16:28 jekstrand: So we can push inputs if we get them as a CPU pointer
16:28 jekstrand: But I don't think it's worth caring
16:28 karolherbst: right
16:28 karolherbst: but we have that in place already :p
16:29 jekstrand: And the MR I just posted doesn't do that. It puts them in a UBO.
16:29 karolherbst: I am wondering if we could make cbuf0 special indeed and it's up to the driver to do something useful with it or not
16:29 jekstrand: It would be a pretty small patch to switch to push but I'm trying to keep the iris delta small.
16:29 karolherbst: generally I mean
16:30 jekstrand: The way I imagine this working is that if the driver advertises it can support kernel inputs, it gets them that way. If it doesn't, clover advertises some reasonable amount (say 1024B) and emulates with cbuf0.
16:30 jekstrand: That way you still have the option to get them special and push or whatever.
16:30 jekstrand: But it reduces the barrier to entry for being a clover driver because it's less special-sauce.
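The fallback jekstrand describes can be sketched in C roughly as follows. This is only an illustration: `fake_driver_caps`, `choose_input_path`, and the enum names are hypothetical stand-ins, not gallium or clover API. The only things taken from the discussion are the rule "if the driver reports no kernel-input support, emulate with cbuf0" and the 1024-byte figure.

```c
#include <stdint.h>

/* Hypothetical stand-in for what a driver advertises (not the real API).
 * A max_input_size of 0 is taken to mean "no native kernel inputs". */
struct fake_driver_caps {
   uint32_t max_input_size;
};

enum input_path {
   INPUT_PATH_NATIVE, /* driver receives kernel inputs its own way */
   INPUT_PATH_CBUF0,  /* state tracker packs inputs into constant buffer 0 */
};

struct input_plan {
   enum input_path path;
   uint32_t max_size; /* what would be reported as the parameter size limit */
};

/* Size advertised when emulating; 1024 bytes is the number from the chat. */
#define EMULATED_INPUT_SIZE 1024u

static struct input_plan
choose_input_path(const struct fake_driver_caps *caps)
{
   struct input_plan plan;

   if (caps->max_input_size > 0) {
      /* Driver opted in: pass inputs through and use its limit. */
      plan.path = INPUT_PATH_NATIVE;
      plan.max_size = caps->max_input_size;
   } else {
      /* Driver did not opt in: fall back to cbuf0 with a fixed limit. */
      plan.path = INPUT_PATH_CBUF0;
      plan.max_size = EMULATED_INPUT_SIZE;
   }
   return plan;
}
```

The point of the split is that a driver with a dedicated mechanism (push constants, for instance) keeps that option, while a driver that does nothing special still works.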
16:30 karolherbst: yeah
16:30 jekstrand: Right now, clover and st/mesa use the API almost completely differently.
16:31 karolherbst: didn't we have a cap for ubo size?
16:31 jekstrand: It's a bit nuts if I'm honest.
16:31 jekstrand: We do have a cap for UBO size and I guess clover could use that.
16:31 karolherbst: it is
16:31 karolherbst: ohh it's a shader cap
16:31 karolherbst: or not?
16:31 karolherbst: I can't find it :D
16:31 jekstrand: And everything I'm seeing indicates to me that OpenGL uniforms and OpenCL inputs should be pretty much the same thing.
16:31 jekstrand: With slightly different alignment rules, most likely.
16:31 karolherbst: PIPE_COMPUTE_CAP_MAX_INPUT_SIZE is the one for the kernel input at least
16:32 jekstrand: Which means that the "right" way to pass them through is cbuf0
16:32 karolherbst: jekstrand: you saw my remark on packed structs?
16:32 jekstrand: I looked at doing the emulation but I really don't know what I'm doing.
16:32 jekstrand: karolherbst: Yeah, I know. :-(
16:32 imirkin: fwiw the DX10-class nvidia hw likes to receive kernel inputs separately from uniforms
16:32 karolherbst: imirkin: heh? but still as constant buffers
16:32 jekstrand: imirkin: How does it want to get them?
16:33 imirkin: as far as the shader is concerned, they magically appear in shared memory
16:33 karolherbst: ohh
16:33 karolherbst: tesla
16:33 karolherbst: yes...
16:33 imirkin: they get there via pushbuf commands
16:33 karolherbst: tesla is a bit weird in this regard
16:33 jekstrand: Dumping them in shared is interesting...
16:33 karolherbst: well, it's L2 cache
16:34 imirkin: anyways, it's not like cbufs wouldn't *work*
16:34 imirkin: but they have this dedicated mechanism too
16:34 karolherbst: I guess there must be some benefit
16:34 karolherbst: how many ubos do we have on nv50 for compute? :D
16:35 imirkin: i assume 16, same as other shader stages
16:35 imirkin: but i haven't verified.
16:35 karolherbst: you mean 8 :p
16:35 imirkin: no, i mean 16
16:35 karolherbst: what?
16:35 karolherbst: why does nv50 have 16?
16:35 karolherbst: unfair
16:35 imirkin: fermi also has 16, no?
16:35 karolherbst: no, 8 :p
16:35 imirkin: it's just kepler+ that's weird...
16:35 karolherbst: really?
16:35 karolherbst: ohh
16:35 imirkin: pretty sure.
16:35 karolherbst: maybe fermi had 16 then :p
16:35 imirkin: i could be wrong.
16:36 jekstrand: Inconceivable!
16:36 karolherbst: sounds odd to remove them though
16:36 karolherbst: I am sure they had good reasons for it though :p
16:36 imirkin: laziness?
16:36 imirkin: better use of gates?
16:36 imirkin: it's weird coz they alias
16:36 karolherbst: yeah
16:36 imirkin: i.e. the high half of the 16 == the low half
16:37 karolherbst: or maybe the other 8 are used as caches?
16:37 imirkin: why would you do that? dunno. ran out of bits i guess.
16:37 karolherbst: ldg _has_ a const modifier
16:37 karolherbst: to declare the memory being constant
16:37 jenatali: So, not to derail, but I'd like to at least get some amount of consensus so we don't do a ton of work and then have to redesign it after we change our minds - thoughts about just adding alu opcodes for the crazy CL round/sat conversions?
16:53 jekstrand: jenatali: Adding a pile of opcodes and using is_conversion seems like the least disruptive thing.
16:53 jekstrand: jenatali: I don't know if it's the best long-term solution though.
16:53 jekstrand: But it's at least the least disruptive solution short-term.
16:54 jenatali: Yeah, the biggest downside I see is having to implement all the constant folding expressions
16:54 jekstrand: And the types/rounding -> conversion_op stuff is codegen so it should be easy enough to add a million ops
16:54 jekstrand: I think a bunch of that could be generalized.
16:54 jekstrand: And we should constant-fold regardless of which solution we pick.
16:55 jenatali: Well, it's easy enough to generate lowering passes which do the round/sat implementation, and running the lowered ops on a constant would naturally fold
16:56 jekstrand: True
16:56 jenatali: But putting them as native alu opcodes means they have to immediately have a fold expression
16:56 jekstrand: Sure but we can codegen the expression
16:57 jenatali: So, we have to implement the rounding algorithm twice :D once in nir, and once in C
16:57 jekstrand: So while it'll be a bit painful, it should be tractable.
16:57 jekstrand: Yeah.....
16:57 jekstrand: That is the rub
16:57 jenatali: Yep
16:58 jenatali: Unless there was some way to opt out of constant folding, and require a lowering pass to be able to fold them
16:58 jekstrand: But any solution where we actually put them into NIR as opcodes rather than lowering in vtn would, I think, have that problem.
16:58 jekstrand: Because we don't want drivers that choose to support most of them in their back-end to not get constant folding. That would just be mean.
16:59 jenatali: Sure, we could just add an option to the lowering pass to only run it on constant sources though
16:59 jekstrand: I suppose.....
16:59 jekstrand: How bad is the rounding, though?
16:59 jenatali: Ehhh for float -> float or int -> float it's a bit rough
17:00 jenatali: float -> float we have an algorithm which does a round trip, with nextafter to round up/down
17:01 jenatali: For int -> float, I just wrote something yesterday to do the rounding on arbitrary bit-widths, e.g. 32bit int -> 24bit int so you can round the mantissa before float conversion
17:01 jenatali: Super fun
17:01 jekstrand: int -> float is mostly a matter of figuring out how many bits you lose off the bottom of the int and then doing the right thing with them.
17:01 jenatali: Yeah
17:02 jekstrand: float -> int is fairly straightforward
17:02 jekstrand: float -> float is annoying
17:02 jenatali: Yep, floor/ceil/trunc
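The int -> float case ("figure out how many bits you lose off the bottom and do the right thing with them") can be sketched in C for an unsigned 64-bit source and a 32-bit float destination. This is a sketch under stated assumptions, not the code either participant wrote: the function and enum names are made up for illustration, and only three rounding modes are shown. For unsigned inputs, rounding toward negative infinity is the same as rounding toward zero.

```c
#include <stdint.h>

/* Hypothetical rounding-mode names, for illustration only. */
enum round_mode {
   ROUND_TO_ZERO,
   ROUND_TO_NEAREST_EVEN,
   ROUND_TO_POS_INF,
};

static float
u64_to_f32(uint64_t x, enum round_mode mode)
{
   /* Count the significant bits in x. */
   int bits = 0;
   for (uint64_t t = x; t != 0; t >>= 1)
      bits++;

   /* A float has a 24-bit significand, so narrower values are exact. */
   if (bits <= 24)
      return (float)x;

   /* Split x into the 24 bits we keep and the bits that fall off. */
   int shift = bits - 24;
   uint64_t kept = x >> shift;
   uint64_t lost = x & ((UINT64_C(1) << shift) - 1);
   uint64_t half = UINT64_C(1) << (shift - 1);

   switch (mode) {
   case ROUND_TO_ZERO:
      break; /* simply drop the lost bits */
   case ROUND_TO_POS_INF:
      if (lost != 0)
         kept++;
      break;
   case ROUND_TO_NEAREST_EVEN:
      if (lost > half || (lost == half && (kept & 1)))
         kept++;
      break;
   }

   /* kept is at most 2^24, which a float holds exactly, and scaling by
    * powers of two is exact, so no further rounding happens here. */
   float result = (float)kept;
   for (int i = 0; i < shift; i++)
      result *= 2.0f;
   return result;
}
```

With the input 0xffffff7fffffffff, the low 40 bits are just under half a unit in the last place, so rounding toward zero and rounding to nearest both give 18446742974197923840, while rounding toward positive infinity gives 18446744073709551616 (2^64).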
17:16 karolherbst: ehhh
17:16 karolherbst: why is the intel CI result not showing up? annoying
17:17 karolherbst: jekstrand: can you check what went wrong with my build 154?
17:18 jekstrand: build failed
17:19 jekstrand: 08:15:19 /home/jenkins/workspace/Leeroy_3/repos/mesa/build_mesa_m64/../src/gallium/state_trackers/xvmc/context.c:71: undefined reference to `XvQueryAdaptors'
17:19 jekstrand: That looks like not your problem
17:19 jekstrand: craftyguy ^^
17:20 ajax: /home/jenkins/workspace/Leeroy_3
17:21 ajax: luls
17:21 jekstrand: Yes, our master is named Leeroy
17:21 jekstrand: As are the builders, apparently
17:21 jekstrand: Or something
17:26 craftyguy: I'm looking into that failure, it's not seen building mesa master
17:27 karolherbst: strange
17:27 craftyguy: https://bpa.st/5D7A
17:28 karolherbst: not even touching any of this.. mhhh
17:29 karolherbst: uhm.. wait
17:29 karolherbst: I mess up my push
17:29 karolherbst: *messed
17:29 karolherbst: yeah.. I pushed the wrong commit
17:29 karolherbst: thanks for looking into it though
17:29 karolherbst:pushed a state from 2018 or so
17:29 craftyguy: that branch looks ooooold
17:30 karolherbst: yeah...
17:30 karolherbst: I didn't touch the jenkins branch for a _very_ long time locally
17:30 jekstrand: 2018 was a long time ago
17:30 karolherbst: I just push explicitly into it normally
17:30 karolherbst: "git coup karolherbst nouveau_nir_spirv_opencl_unstructured:jenkins"
17:30 karolherbst: I guess I messed it up today :)
17:30 craftyguy: heh. glad this was a simple one to figure out :D
17:31 craftyguy: ajax: 'leeroy' is the name of the jobs that run in jenkins that run tests, build mesa, etc :P
17:32 danvet: jekstrand, I just read code from the naughties, what are you talking about :-)
17:33 karolherbst: craftyguy: I assume you also use jenkins as the CI software :p
17:37 craftyguy: karolherbst: yep :)
18:09 jekstrand: karolherbst: for/karolherbst in my tree
18:09 jekstrand: karolherbst: I reworked switch handling some more
18:10 jekstrand: karolherbst: Among other things, I got rid of the one remaining hash table foreach
18:10 jekstrand: With that, I'm really happy with it
18:10 jekstrand: vtn_emit_cf_func_unstructured almost fits on my screen without scrolling
18:10 karolherbst: heh :D
18:11 imirkin: jekstrand's home setup: https://edtechmagazine.com/higher/sites/edtechmagazine.com.higher/files/styles/cdw_hero/public/articles/%5Bcdw_tech_site%3Afield_site_shortname%5D/201808/Untitled-1_3.png.jpg?itok=7VnI7TRo
18:12 karolherbst: jekstrand: mhh.. I kind of wanted to get around having to allocate a vtn_case object though if it's not needed, but I guess it doesn't hurt?
18:14 karolherbst: yeah... maybe it's fine like that :)
18:17 jekstrand: karolherbst: We need half the data in it anyway
18:17 jekstrand: And you're rallocing a util_dynarray now
18:17 jekstrand: It's like 32 more bytes
18:18 jekstrand: And this lets us use a list of cases which isn't going to have any problems with not being in a deterministic order
18:20 karolherbst: right
18:21 jekstrand: karolherbst: If you're alright with those two and I didn't mess up and break the universe, feel free to squash them and add my R-B to the patches they're squashed into.
18:22 karolherbst: yeah.. will do
18:22 jekstrand: Now to actually review the structurizer... Oof...
18:25 imirkin: just assume it works until proven otherwise?
18:31 jekstrand: karolherbst: Mind if I re-order a few things?
18:32 jekstrand: karolherbst: Nah, I'll just give you more squash-ins
18:32 karolherbst: :D
18:44 jekstrand: I gave up and just did the re-order
18:44 jekstrand: give me a minute and I'll push it to my for/karolherbst branch
18:46 jpark: Does Mesa still support Windows XP? I'm looking at include/c11/threads_win32.h, and I wish it was using a real condition variable.
18:47 imirkin: even if it does, i'd say it's safe to drop
18:48 jpark: I'd just like to make sure before creating a merge request.
18:50 imirkin: esp if there are tangible benefits to dropping, i'd say it's quite reasonable
18:50 imirkin: but i'm also not one of the windows users...
18:51 ajax: wow. vista EOL'd in 2011.
18:51 Lyude: airlied, skeggsb: do you think it would be possible to maybe start including the nouveau repo airlied usually pulls from in drm-tip?
18:52 imirkin: the branch airlied pulls from only starts existing about 1hr before airlied pulls from it...
18:52 ajax: excuse me. win7 EOL'd in 2011. vista only in 2017. and xp in 2008.
18:52 jpark: I also don't like that it null checks the parameters, but an optional build macro to remove might be overkill. It's more grossing me out than me knowing that it's a perf drain.
18:52 imirkin: ajax: win7 could not have been EOL'd in 2011. it was just killed off recently...
18:52 Lyude: yeah i'm aware, I'm mostly asking though because it'd make working on stuff that goes across multiple drivers (mainly DP stuff) a lot easier
18:53 imirkin: Lyude: yeah, i'd be in support of having a "do it like everyone else does" approach to nouveau kernel development
18:53 ajax: imirkin: i'm just going by what wiki says for "latest release"
18:53 imirkin: but i've learned to live with the current situation
18:53 imirkin: ajax: ah yeah. maybe latest release. but EOL means no more updates/etc
18:53 ajax: but yeah, i guess that's not the same thing.
18:53 imirkin: they stopped doing SP's like they used to
18:54 imirkin: win7 was EOL'd Jan 14, 2020
18:54 imirkin: (and tons of companies have extended support plans, i assume)
18:55 Lyude: other thing is too it'd be really nice if I could just push stuff to whatever upstream nouveau repo we use so we don't need to wait until the next nouveau pull for patches that rely on other r-b'd patches to apply cleanly when I send them to the ml
18:56 jpark: even Blizzard dropped XP/Vista support in WoW 3 years ago, so I feel like it should be safe to drop XP.
18:57 ajax: jpark: i suspect the primary consumer of mesa on windows is vmware's guest driver, so (if we can't just ask the vmware peeps about it) a reasonable proxy might be what guest OSes they still support in their guest driver
18:57 jenatali: I'm honestly surprised it's been supported this long
18:57 Lyude: like, right now i've got a couple of runpm fixes that aren't in master yet, and I've got a bunch of displayport/general HPD related work that doesn't apply cleanly without those fixes, but said fixes also pull a bunch of code out of i915 and into dp helpers so I can't just submit the patches to the ml without CI failing (I can use trybot if I just want to see if they'll pass intel's CI or not,
18:57 Lyude: but it'd be nice to have the patches just apply by default when I send them out)
18:58 ajax: not that that'd be much fun to research...
18:59 Lyude: ...I'd also probably be willing to help with maintaining and such, seeing as the nouveau ML isn't really that busy for the most part
18:59 airlied: Lyude: get acks and rbs from skeggsb and smash into misc next
18:59 airlied: esp if there are cross driver deps
19:00 airlied: then myself and skeggsb get to sort out the merge later if needed
19:02 alyssa: I'm almost afraid to ask but - any way to detect the active display driver in a kmsro setup?
19:02 alyssa: Need to workaround a kernel bug in the rockchip display driver. Easy workaround but would break !rockchip setups
19:03 Lyude: airlied: alright-I'm mainly asking though because I think we already actually have some of those runpm fixes upstream, but they're in https://github.com/skeggsb/nouveau/commits/master which isn't present in drm-tip, so i'd have to include those runpm fixes in my already-pretty-large DP related series for it to apply and get run by CI, or just resend the patches and pull them out of that branch
19:03 Lyude: if that's even possible.
19:03 Lyude: would just be kinda nice if this was a bit more like developing on other drivers :)
19:04 jpark: ajax: It looks like current VMware VMs still support Windows 2000 as guest, but I don't know if that ties to Mesa.
19:05 ajax: jpark: would be a question of if that support includes an accelerated gl driver (based on sufficiently recent mesa), i suppose?
19:05 jenatali: My hunch is that it probably does
19:05 jpark: ajax: is this the driver? it requires DX11 (Windows 7): https://docs.mesa3d.org/vmware-guest.html
19:05 airlied: Lyude: are the runpm fixes necessary for the dp stuff?
19:06 airlied: or just for testing?
19:06 ajax: jpark: that's the driver for a linux guest, i think
19:06 Lyude: airlied: i mean, if y'all are fine with a merge conflict otherwise
19:07 Lyude: since both series modify the same bit of code in nouveau_connector_hotplug() and nouveau_display_acpi_ntfy()
19:07 airlied: Lyude: merge conflicts are fine, getting the same patch through two trees isn't
19:08 airlied: merge conflicts are normal in kernel dev
19:08 Lyude: alright
19:08 jpark: ajax: the page says the host OS needs to support DX11 (Windows).
19:08 Lyude: I will just go that route then, as long as drm-misc-next gets merged after nouveau gets pulled that should be fine
19:10 ajax: jpark: right, but your question (afaict) was about building mesa for windows. which, for the linux driver, you wouldn't be doing; you'd be building mesa for linux, and the dx11 requirement would be of the host OS.
19:11 karolherbst: craftyguy: it seems to take much longer until results are posted on the public site, can that be sped up a little or isn't that that easy to do?
19:11 ajax: to be clear, vmware's _windows guest_ driver isn't in mesa proper, it's in their internal mesa fork
19:14 jpark: ajax: i see. i could just do it, and their XP build should fail pretty loud. then they can just undo the change locally.
19:14 jpark: assuming they have one
19:14 craftyguy: karolherbst: yeah that's a known issue, unfortunately. it seems that the IO on the public server isn't enough to keep up with the large amount of data in CI results. I've been trying to get that system replaced with a much better one, but that's not an easy process to navigate @ my employer
19:14 karolherbst: right...
19:15 craftyguy: I had an unofficial mirror up and running that was much faster (but no https and no domain name url), but it seems to be down right now
19:33 mattst88: craftyguy: if only we knew of a hardware company that made lots of money on server CPUs!
19:35 karolherbst: jekstrand: do you still want to push something?
19:36 jekstrand: karolherbst: Yeah, I'm playing around
19:37 jekstrand: karolherbst: I've made pretty good hash of the first two patches and will be adding a couple. :-)
19:37 karolherbst: okay :D
19:37 jekstrand: karolherbst: Trying to assert the crap out of unstructured NIR
19:37 airlied: craftyguy: use amazon graviton2 for a while, send in the bill, use that to justify getting a server :-P
19:37 karolherbst: sounds reasonable
19:41 craftyguy: airlied: lol
19:42 craftyguy: mattst88: if only..
19:42 craftyguy: though this time I have a fast server, but the red tape around getting dns updated + tls certs is the problem. there's a path forward, it's just long and tortuous
19:43 karolherbst: craftyguy: letsencrypt?
19:43 karolherbst: + DNS api
19:43 craftyguy: karolherbst: yeah that's what I use extensively for my personal stuff, but my employer doesn't allow it
19:44 craftyguy: no idea why. and asking might cause me more trouble. sigh
19:44 karolherbst: ...
19:44 karolherbst: sounds painful to work there
19:49 jekstrand: karolherbst: Just pushed my for/karolherbst branch
19:49 jekstrand: karolherbst: I've squashed my stuff into the last two patches and added R-B tags.
19:49 jekstrand: karolherbst: I've modified the first two NIR patches and added my RB. You probably want to re-read
19:50 jekstrand: karolherbst: I also added a NIR patch of my own
19:51 jekstrand: karolherbst: Time to read the actual structurizer pass :)
19:51 jekstrand: karolherbst: I did tweak the structurizer pass just a tiny bit but it's not substantial.
19:51 jekstrand: Mostly just to deal with "structured" being per-impl rather than per-shader.
20:01 karolherbst: jekstrand: mhh, did you push for real?
20:04 jekstrand: karolherbst: Didn't add --force
20:04 jekstrand: it's pushed now
20:06 airlied:overuses + with git pushes
20:06 imirkin: what does + do?
20:06 airlied: it's like force but simpler
20:06 airlied: git push airlied +master
20:06 imirkin: nice
20:06 airlied: though I normally do +HEAD:wherever
20:07 karolherbst: airlied: ehh.. just add an git alias :p
20:07 jekstrand: karolherbst: And now is the point at which I start to wonder how lower_goto_ifs isn't segfaulting everywhere.... :-(
20:07 karolherbst: :)
20:11 jekstrand: karolherbst: Who wrote this pass? I'm not familiar with the name.
20:12 karolherbst: yeah.. don't know that person either. just appeared out of nowhere and opened an MR against my fork
20:12 karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/mesa/-/merge_requests/1
20:12 airlied: then disappeared :-P
20:12 airlied: I thought it was jekstrand in disguise
20:12 karolherbst: probably because I pestered about documentation too much
20:13 jekstrand: Wasn't me
20:13 karolherbst: yeah.. looking at the code it sounds more like a person with a research background than programming
20:14 jekstrand: yeah
20:16 karolherbst: I still think it was very fortunate that somebody like that came around just doing the stuff for us, that was a pleasant surprise because I already saw myself digging into papers and shit
20:17 jekstrand: Yeah, I'll happily take it.
20:17 jekstrand: If I can figure out how it works. :-)
20:17 jekstrand: I have a feeling I've read the paper at some time or another. It seems somewhat familiar.
20:18 karolherbst: well, the MR was opened 6 months ago
20:18 jekstrand: I'm guessing it was someone's master's project :)
20:19 karolherbst: could be
20:20 sravn: Lyude: code like this "for_each_old_connector_in_state(state, connector, connector_state,<line-break><Nx tab>i) {" is hardly readable. We have increased acceptable line length to 100 chars recently so maybe utilize this?
20:20 sravn: Lyude: Spotted in "drm/nouveau/kms: Search for encoders' connectors properly"
20:21 airlied: karolherbst: does it link to the paper?
20:22 jekstrand: airlied: No, not that I can see. :-/
20:22 airlied: oh he does link to a html explanation in the MR
20:23 airlied: might be useful for review
20:23 karolherbst: yeah
20:23 jekstrand: airlied: Thanks for pointing that out!
20:24 karolherbst: fun
20:34 Vanfanel: Hi! Currently I am passing gbm_bo pointers between functions, but I guess it's best to pass gbm bo handles. However, is there a function to get a GBM BO from its handle? I have been looking at gbm.h without much luck.
20:35 Lyude: sravn: oooh, wasn't aware of that actually! yeah-I can totally fix that up in the next respin
20:36 Lyude:fixes her .lvimrc...
20:36 Kayden: jekstrand: looking at your set_global_binding patch, and ... it returns the bottom 32 bits of the 48-bit address as the handle?
20:36 jekstrand: Kayden: It should return all 48 bits
20:37 karolherbst: jekstrand: does it require !6064 then?
20:37 karolherbst: Kayden: for context: 6064
20:37 karolherbst: ...
20:37 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6064
20:37 karolherbst: I assume jekstrand patches might be based on that being there
20:38 jekstrand: karolherbst: No, I fill out both uint32_ts
20:38 jekstrand: I thought
20:38 karolherbst: both?
20:38 jekstrand: Yeah, without that patch, clover works just fine, it's just an uint32_t[2]
20:39 karolherbst: mhhhh
20:39 jekstrand: I'm happy for it to be a uint64_t
20:39 jekstrand: But I've got it working without that
20:39 karolherbst: you know what is stupid about the handles pointer?
20:39 karolherbst: or rather the array of pointers?
20:39 jekstrand: what?
20:40 karolherbst: it points into the kernel input buffer directly
20:40 karolherbst: now imagine devices doing 32 bit shit because they can't do 64 bit
20:40 karolherbst: and advertise 32 bit addresses
20:41 karolherbst: hence my MR to make it.. more sane
20:42 karolherbst: jekstrand: also, handles contains offsets you don't want to overwrite
20:42 karolherbst: no idea where it matters
20:42 karolherbst: but clover can set offsets
20:42 jekstrand: karolherbst: I think I take the offsets into account
20:43 jekstrand: Maybe not
20:43 karolherbst: yeah...
20:43 karolherbst: I somehow don't like the interface anyway :/
20:43 jekstrand: Oh, that's totally fine. Like I said, I'm happy to see it change.
20:43 karolherbst: it feels like people tried to be smart, but now it's an annoying to get right interface
20:43 jekstrand: But I did get it working
20:44 karolherbst: no idea how I want it to look like though
20:46 jekstrand: I've got a bug somewhere..... It doesn't cache my shaders properly :-(
20:46 jekstrand: Fails with the cache enabled
20:47 karolherbst: ehh :/
20:50 jekstrand: My MEDIA_INTERFACE_DESCRIPTOR_LOAD is bogus....
20:51 karolherbst: jekstrand: the results are in... it's.... better than I thought it would be
20:51 karolherbst: but something is still up
20:51 karolherbst: https://mesa-ci.01.org/karolherbst/builds/155/group/63a9f0ea7bb98050796b649e85481845
20:51 Vanfanel: emersion: is there any way to get a GBM BO from its handle? I don't think passing a GBM BO ptr between functions is a good idea...
20:52 karolherbst: ehh. seems like it aborted midthrough..
20:52 karolherbst: oh well
20:52 karolherbst: let's see
20:56 karolherbst: jekstrand: ... ehh.... guess what?
20:56 jekstrand: karolherbst: what?
20:57 karolherbst: huh.. something is funky
20:58 karolherbst: I am sure some spirv CFG ops are causing assert
20:58 karolherbst: but mhh
20:58 karolherbst: something is broken locally here anyway
20:59 karolherbst: mhhh
21:00 karolherbst: it works locally
21:00 karolherbst: maybe memory corruptions? mhh
21:01 jekstrand: I think I found my caching bug
21:01 jekstrand: maybe
21:01 jekstrand: I found something naywya
21:01 jekstrand: *anyway
21:01 karolherbst: ohh.. maybe this breaks CI?
21:03 karolherbst: ahh GL46 spirv.. that's probably easy to reproduce and fix
21:05 Kayden: jekstrand: uh, you're copying 64-bits into the uint32_t[] array at handles[i], but then incrementing i+1 and overwriting it
21:06 jekstrand: Kayden: No, it's an array of arrays
21:06 Kayden: oh, crud, right
21:06 Kayden: yeah, this works, yeesh
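The "array of arrays" point can be illustrated with a small C sketch. It assumes, as the exchange suggests, that `handles` is an array of pointers and that each pointer leads to its own storage of at least two `uint32_t`, so writing 64 bits through `handles[i]` never touches `handles[i + 1]`. The function name is hypothetical, and treating the pre-existing low word as a 32-bit offset to add is an assumption based on karolherbst's remark about offsets, not something confirmed in the log.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: for each binding, combine the GPU address with the
 * offset already stored in the slot, then write the full 64-bit result. */
static void
write_global_handles(uint32_t **handles, const uint64_t *gpu_addrs,
                     unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      /* Assumption: the slot initially holds a 32-bit byte offset. */
      uint32_t offset = handles[i][0];
      uint64_t va = gpu_addrs[i] + offset;

      /* handles[i] points at separate storage for this binding, so this
       * 8-byte copy cannot overwrite the slot for binding i + 1. */
      memcpy(handles[i], &va, sizeof(va));
   }
}
```

If `handles` were instead a flat `uint32_t[]` indexed by `i`, the same loop would clobber the high half of each address on the next iteration, which is the bug Kayden initially suspected.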
21:08 karolherbst: the heck
21:08 karolherbst: it works locally
21:11 karolherbst:tries again
21:15 jekstrand: Kayden: Just fixed my caching bug. It required a couple more patches. MR updated.
21:17 jekstrand: Now that I have a repeatable setup, time to keep hacking on structurizing
21:20 anholt: nir to tgsi is now passing more tests on softpipe than master, and has higher fps in glmark2.
21:20 airlied: anholt: win
21:21 ajax: niiiice
21:22 jekstrand: glmark2: Also, not really a benchmark.
21:22 airlied: for softpipe if it runs it's a benchmark
21:22 jekstrand: hehe
21:22 anholt: seriously, tgsi_exec.c is so very bad.
21:23 jekstrand: nir_exec.c :)
21:23 anholt: jekstrand: I have literally dreamed about writing that
21:23 jekstrand: hehe
21:23 jekstrand: Honestly, with the constant folding logic in place, it might not be terrible.
21:23 imirkin: i think it was more designed to be easy to write than fast to run :)
21:24 imirkin: but it fails at even that
21:24 imirkin: since it tries to maintain its vector-ness
21:24 imirkin: (i guess it has to for derivatives and such)
21:25 karolherbst: the goal to vectorize is to not vectorize :p
21:25 anholt: jekstrand: basically my motivation for nir_exec would be to get sp perf higher than classic swrast perf so we can make progress on deleting that.
21:25 imirkin: anholt: maybe make a JIT? :) based on translate. heh.
21:26 anholt: imirkin: nope, if you want a jit use llvmpipe
21:26 jekstrand: I thought that was called LLVMpipe :P
21:26 jekstrand: too slow
21:26 imirkin: a simpler jit :p
21:26 anholt: this is the "if your platform is so bad you don't have llvmpipe" which was the objection to me deleting swrast
21:26 jekstrand: ksim!
21:26 karolherbst: maybe we should just do the same as with radv and come up with our own stuff which compiles faster and runs faster than llvmpipe :p
21:26 jekstrand: Having something that doesn't involve a JIT does have advantages.
21:26 imirkin: the translate stuff was nice for vertex transformations, but yeah, not really usable for generic i think
21:27 ajax: i'm willing to wager there's an efficient bytecode representation of nir that would be faster than the existing softpipe
21:27 jekstrand: Why bytecode?
21:27 jekstrand: Just walk the linked lists
21:27 ajax: linked lists are shit for your d$
21:27 jekstrand: That's true
21:27 ajax: like the actual worst thing you could do
21:28 karolherbst: just execute the serialized nir :p
21:28 anholt: ajax: I think you could massively win with merely "bake a table at shader creation of the slightly-specialized (bit size, maybe ssa-only vs regs) interpreter functions to call per block."
21:30 karolherbst: jekstrand: why are you asking a question and you get an answer? :D
21:30 ajax: anholt: emit those s13s as static functions in a single CU and computed-goto your way to success.
21:30 ajax: (s(pecialization)s assuming i counted right)
21:30 anholt: ajax: yep. but again, the bar is so very low I don't know if I even need computed goto
21:31 airlied: anholt: softpipe did have some sse2 support at one time, but hey if you have sse2 just use llvmpipe
21:33 Kayden: personally, I think it would be cool if someone hooked up a nir exec based on https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/
21:33 Kayden: it would probably not be all that bad to do
21:34 ajax: hah. mesa could finally get mir support.
21:34 jekstrand: You know.... You could compile the NIR down to a table of "ops" where one possible op is "execute this actual nir_instr" and then you have specialized versions for most ALU and a few intrinsics like load/store_input/output.
21:34 Kayden: it looks like a good fit with GL's 'compile shaders on the fly' needs
21:34 jekstrand: Where most of it is just "exec a function pointer to evaluate ALU"
21:34 Kayden: and seems to generate decent quality code
21:34 jekstrand: That might be pretty fast
21:34 Kayden: while also being massively smaller than llvm
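The "table of ops" interpreter idea can be sketched in a few lines of C. Nothing here uses real NIR types: `exec_op`, `alu_fn`, and the two sample opcodes are hypothetical, and the sketch covers only 32-bit two-source ALU operations on a flat array of values. It shows only the shape of the idea, where each instruction is baked once into a function pointer plus operand indices and then executed by walking a contiguous array instead of linked lists.

```c
#include <stddef.h>
#include <stdint.h>

/* One pre-baked ALU operation: a specialized function plus value indices. */
typedef uint32_t (*alu_fn)(uint32_t a, uint32_t b);

struct exec_op {
   alu_fn fn;
   uint16_t dst;
   uint16_t src0;
   uint16_t src1;
};

/* Two sample specialized evaluators (32-bit integer add and multiply). */
static uint32_t op_iadd(uint32_t a, uint32_t b) { return a + b; }
static uint32_t op_imul(uint32_t a, uint32_t b) { return a * b; }

/* Walk the table in order; the table is contiguous, so it is friendlier
 * to the data cache than chasing instruction list pointers. */
static void
run_ops(const struct exec_op *ops, size_t count, uint32_t *values)
{
   for (size_t i = 0; i < count; i++) {
      const struct exec_op *op = &ops[i];
      values[op->dst] = op->fn(values[op->src0], values[op->src1]);
   }
}
```

A fuller version would need an escape hatch op ("execute this actual instruction") for anything without a specialized evaluator, plus control flow between blocks; both are left out here.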
21:35 karolherbst: being faster with CPU code than llvm would be fun indeed :p
21:36 ajax: i would be entirely happy with not linking llvm into my GL drivers, yes.
21:36 jekstrand: ajax: If you want that, you're going to have a long talk with the AMD folks....
21:36 karolherbst: I thought aco is already faster and faster?
21:37 imirkin: i don't think there are any current plans to make use of aco in radeonsi though
21:37 mareko: not really... feel free to add ACO support into radeonsi, but it'll be quite a project
21:37 karolherbst: ohh I see
21:38 anholt: Kayden: my fantasy llvm replacement is cranelift :)
21:38 karolherbst: but why would it be much work? it's just consuming the nir and be done with it?
21:38 karolherbst: but I guess long term stability and stuff
21:38 jekstrand: karolherbst: The AMD drivers are more closely tied to the back-end than some others.
21:38 imirkin: mareko: would the "difficult" bits be all the places where radeonsi generates "custom" shaders, like merged stuff, primitive culling, etc?
21:38 jekstrand: They don't lower as much in NIR as, say, Intel does
21:38 karolherbst: mhhh
21:38 mareko: the hard stuff is the code linker
21:38 karolherbst: right.. guess that's done in llvm
21:40 mareko: the code linker is implemented both at the binary level (shader binary concatenation) and LLVM level (each part is a function and you inline them into one big function)
21:40 Kayden: anholt: huh, also looks neat
21:41 anholt: Kayden: if it's going to be fast enough at both compile and execute for mozilla's js, it's probably fast enough for me.
21:41 Kayden: anholt: Yeah :)
21:41 danvet: Lyude, airlied I guess another reason why mlankhorst needs to do the backmerge to drm-misc-next asap
21:41 danvet: mripard seems on vacation I think
21:41 jekstrand: anholt: Just transcode NIR to JS and bake spidermonkey (or whatever the latest one is) into the driver. :-P
21:42 danvet: Lyude, airlied for the nouveau stuff into -misc-next I mean
21:42 anholt: jekstrand: something something wasm
21:43 jekstrand: Just make sure you minify the JS or it might run slow. 🤣
21:43 glennk: gotta exercise those fp64 ops somehow...
21:43 imirkin: (in case some people don't know .. the only numbers natively representable in JS are 64-bit floats.)
21:43 jekstrand: fp64 for free!
21:44 imirkin: it's esp great when you need to compare large integers, and all of a sudden random things become equal
21:44 karolherbst: imirkin: sounds like a JS JIT lowers precision to fp32 if it thinks it's safe :p
21:44 imirkin: karolherbst: no, like very large integers.
21:44 imirkin: ones > 2^53
21:44 karolherbst: ahh...
21:44 karolherbst: fun
21:45 imirkin: yeah. long debugging session, that one.
21:47 karolherbst: mhh
21:47 karolherbst: python gets this right
21:47 imirkin: lol no, py3 messed it up pretty bad
21:47 imirkin: along with many things
21:48 karolherbst: at least "2**100 + 2**10 == 2**100" is false in python but true in javascript :p
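The difference can be reproduced in Python alone, because Python's float is the same IEEE 754 binary64 type that JS uses for its numbers. A small sketch (the variable names are illustrative):

```python
# Python ints are arbitrary precision, so this comparison is exact.
exact_equal = (2**100 + 2**10 == 2**100)

# Converting to float (binary64) keeps only 53 significant bits,
# so the small addend is rounded away, as in JS.
as_double_equal = (float(2**100 + 2**10) == float(2**100))

# 2**53 is where consecutive integers stop being representable.
first_collision = (float(2**53) == float(2**53 + 1))
below_limit_distinct = (float(2**53 - 1) != float(2**53))
```

The explicit float() calls matter: comparing a Python int against a float directly is done exactly and would not show the effect.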
21:50 imirkin: karolherbst: right. there's just automatic type messings-about in py3 which i think are wrong
21:50 imirkin: whereas JS is just ... highly annoying.
21:50 jekstrand: This is why I don't like languages that mess about with types at all
21:50 ajax: bring back B
21:50 anholt: and now for the obligatory link to https://www.destroyallsoftware.com/talks/wat
21:51 ajax: you get exactly one data type: the machine word
21:51 jekstrand: I guess well-defined automatic casts are ok but I tend to get very annoyed by the rules.
21:51 ajax: go ahead, dereference that integer
21:52 jekstrand:wants to program in Rust
21:52 jekstrand: Where integer overflow automatically assert-fails in debug builds
21:52 FreeFull: And you know your references are always valid
21:53 jekstrand: Except that you replace most pointers with array indices that can just as easily be out-of-bounds.
21:54 FreeFull: Well, array indexing gets bounds checked
21:54 jekstrand: Sure but my point is that yes, your references are all valid, but you end up with other invalid stuff.
21:54 jekstrand: In my brief bit of rust programming, I didn't find that the compiler actually fixed that many of that sort of bug.
21:54 jekstrand: I still had bad pointers, they were just OOB array indices instead.
21:55 jekstrand: Don't get me wrong. There's still a lot to like about the language
21:55 jekstrand: But the "you'll never have an invalid reference" promise falls a bit flat, IMO.
21:55 anholt: working on parallel-deqp-runner especially made me wish I was writing rust.
21:56 jekstrand: They have a constness model which might actually mean something unlike C and C++'s differently busted models.
21:57 ajax: i really just want C minus the sharp edges
21:58 imirkin: jekstrand: JS doesn't mess with types. it just doesn't have an integer type :)
21:58 Sachiel: Comic Sans C
21:58 ajax: find all the UB and uninitialized variable crap and define something sensible
21:58 imirkin: ajax: you may be interested in Go
21:59 karolherbst: imirkin: well.. given the amount of builtin types JS has, it does mess around with them heavily
21:59 ajax: Sachiel: if i ever did try to make that language "C Sans" is a good name
21:59 ajax: as in, C, without (the bullshit)
21:59 karolherbst: so like C11 = C89?
21:59 karolherbst: *C11 - C89
22:00 karolherbst:considers everything before C99 just broken C
22:00 imirkin: ajax wants C77 back.
22:00 imirkin: heh
22:00 ajax: the hard bit is fixing integer promotion without making you want to murder things, i think
22:00 karolherbst: C11 is the first one actually making it pleasant to write C :p
22:01 karolherbst:still thinks we should just flip to C11 for mesa
22:01 karolherbst: not doing at least C11 is a mistake
22:01 ajax: does The Compiler Which Shall Not Be Named support C11 yet?
22:02 karolherbst: I don't care about broken compilers
22:02 ajax: (not that you're wrong, just there's practical concerns sometimes)
22:02 jekstrand: No, it doesn't, sadly. :-(
22:02 karolherbst: now that MS even uses mesa, maybe it would be a big enough motivation to just fix it
22:02 karolherbst: :D
22:02 FreeFull: jekstrand: Most rust code I've written doesn't actually do much in the way of indexing
22:02 ccr: hah :D
22:02 karolherbst: hey.. let's flip over to C11 :D
22:02 FreeFull: But I can see it coming into play if you are dealing with graphs
22:02 karolherbst: and see what happens
22:03 FreeFull: At least indexes out of bounds aren't exploitable, beyond a denial of service
22:03 karolherbst: jenatali: I assume you just ack it, right?
22:03 jekstrand: FreeFull: I was trying to write a compiler.
22:03 ajax: FreeFull: oob _writes_ are absolutely exploitable.
22:04 ajax: reads... can be. or at least can make other exploits feasible.
22:05 karolherbst: if there is one thing we've learned from intel the past years, it's that everything is exploitable if you are just creative enough
22:06 jenatali: karolherbst: Unfortunately I don't have any sway over the MSVC folks
22:07 jekstrand: I don't know if there are any compelling reasons today why you can't just compile w/ clang
22:07 jenatali: Reasons? Sure. Compelling... eh
22:08 ajax: clang ain't without its issues
22:09 FreeFull: ajax: I was talking in the context of Rust
22:09 jekstrand: Oh, sure. But I think at one point in time there were actual reasons having to do with which libc you were using or something like that why things that looked like drivers had to be compiled with MSVC.
22:09 jenatali: Actually, I do have a compelling reason. As part of ARM64 support, we've got a crazy special binary format (that clang will probably never emit) which is ARM64 code for x86 processes (e.g. 32bit pointers, x86 calling conventions, etc)
22:09 jenatali: We haven't yet compiled Mesa into that format, but I wouldn't be surprised if we do someday
22:09 jekstrand: jenatali: Ugh
22:10 jekstrand: Pardon me while I commit some C11 code to make that impossible.....
22:11 FreeFull: jenatali: I bet once Linux gets running on Apple's new ARM Macs, that might become relevant
22:11 karolherbst: :D
22:11 karolherbst: FreeFull: ahh, never then
22:11 FreeFull: =P
22:11 karolherbst:gave up on linux on apple hw entirely
22:11 karolherbst: I am sure they don't even allow you to run anything at all
22:12 karolherbst: :p
22:12 FreeFull: Maybe the Apple hardware will inspire other manufacturers
22:12 FreeFull: We'll see
22:12 jenatali: FreeFull: FYI I was talking about Windows, not Linux
22:13 glennk: so apple m68k calling conventions, translated to ppc, then x86, now arm...
22:13 dcbaker[m]: jekstrand: vs2019 appears to support c11...
22:13 FreeFull: I wonder if ARM Windows will run on the Apple hardware at all
22:13 jekstrand: dcbaker[m]: Oh, really? Full C11 or just C11 when using C++17?
22:14 FreeFull: jenatali: Yeah, I should have caught that
22:14 jekstrand: Because MSVC does implement all the C stuff in the context of C++ compatibility.
22:14 dcbaker[m]: They have a /STD:c11 option
22:14 jekstrand: But if you compile a C program, you get most of C99 last I knew
22:14 jekstrand: Ok, then maybe it works and we can finally move forward another C version.
22:15 jenatali: dcbaker[m]: Last I saw was this: https://developercommunity.visualstudio.com/idea/387315/add-c11-support.html
22:15 jenatali: Most recent MSFT comment in July saying it's on the roadmap
22:16 karolherbst: I think it's just not complete
22:16 karolherbst: and only supports the bits c++ requires
22:16 dcbaker[m]: jenatali: this is all I know https://github.com/mesonbuild/meson/issues/7554
22:16 jenatali: Huh, cool. Sounds like it's in development, but I'm surprised it's so hard to find any details about it
22:25 HdkR: jenatali: I have zero reason to believe that Linux on ARM64 Macbooks will be a thing. Then even more questionable will be OSS drivers for Apple's GPU
22:27 jenatali: Not sure why you're telling me :) my comments about ARM64 were all about Windows on ARM64
22:27 jenatali: Though, I honestly wouldn't be surprised to see it happen at some point. The Linux community is certainly persistent and talented
22:27 HdkR: er yes, I derped thinking about Linux as well. Same problem space :D
22:27 karolherbst: well, first you need to find an exploit to the bootloader anyway :p
22:27 HdkR: Apple supporting Bootcamp on it? nah
22:28 karolherbst: HdkR: give it time :P
22:28 jekstrand: karolherbst: Sure that hasn't happened yet?
22:28 karolherbst: jekstrand: are the devices even out yet?
22:28 jenatali: I mean, we ship ARM64 devices too - I really wasn't talking about the new ARM Macs at all :P
22:28 jekstrand: karolherbst: You can buy the devkit today
22:28 karolherbst: ohh, right
22:28 HdkR: jenatali: Is it a device I own? :P
22:28 karolherbst: "buy" wasn't it more of a rent/loan?
22:28 jenatali: HdkR: Doubtful :)
22:29 jekstrand: karolherbst: Not sure. I have seen reports of people having them in the wild though.
22:29 jekstrand: Someone commented on the Khronos Slack about MoltenVK working fine on it.
22:29 karolherbst: jekstrand: it's also ridiculously expensive
22:29 HdkR: karolherbst: Surely there has been enough time with iOS devices on the market that people could have also pushed Linux on those...
22:29 karolherbst: I doubt anybody does that just to hack up the bootloader :D
22:29 karolherbst: HdkR: there was linux on iPods in the past.. close enough :p
22:30 HdkR: There was also Linux on some /really/ old iPhone
22:30 HdkR: 3GS era or something
22:31 chrisf: i think you'll find they've got more hostile since then
22:31 HdkR: Definitely, that was very early on
22:38 mareko: how does b2f work in nir_opt_algebraic (and why is it not b2f32?)
22:40 karolherbst: mareko: b2f either gives 0 or 1.0, no?
22:40 karolherbst: based on that assumption you can do many opts
22:41 mareko: does it work with all bit sizes?
22:42 karolherbst: ohh, that you mean.. mhh
22:43 karolherbst: I see we have the same for f2i as well
22:43 karolherbst: and others
22:43 karolherbst: seems like it's for all bit sizes
22:43 mattst88: mareko: yes, it produces a float value of the correct size for the destination, IIRC
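A sketch of why rewrites built on b2f can ignore the bit size: 0.0 and 1.0 are exactly representable in fp16, fp32 and fp64, so the result is the same value at any destination size. The b2f helper and the identity checked below are illustrative assumptions, not copied from nir_opt_algebraic.

```python
import struct

def b2f(b, bit_size=32):
    """Model of b2f: a boolean becomes 0.0 or 1.0 at the given float size."""
    fmt = {16: "<e", 32: "<f", 64: "<d"}[bit_size]
    value = 1.0 if b else 0.0
    # Round-trip through the packed representation to model the bit size.
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# The result does not depend on the destination bit size.
sizes_agree = all(
    b2f(b, 16) == b2f(b, 32) == b2f(b, 64) for b in (False, True)
)

# An identity that holds only because b2f yields 0.0 or 1.0:
# b2f(a) * b2f(b) == b2f(a and b)
identity_holds = all(
    b2f(a, size) * b2f(b, size) == b2f(a and b, size)
    for size in (16, 32, 64)
    for a in (False, True)
    for b in (False, True)
)
```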
23:31 karolherbst: jekstrand: ahh, now I can reproduce issues :)
23:31 karolherbst: "Unhandled opcode SpvOpKill" :)
23:31 karolherbst: ehh
23:36 karolherbst: jekstrand: I guess I could add handling for at least the trivial ones
23:38 karolherbst: jekstrand: mind checking what's up with dEQP-VK.glsl.discard.basic_always?
23:39 karolherbst: with the updated branch at least
23:39 karolherbst: it fails, but no idea why :)
23:39 karolherbst: trying to quickly fix trivial issues first so I can push the CI again and check next morning :p
23:40 karolherbst: ohhh, I think I know
23:40 karolherbst: ehhh
23:40 karolherbst: need to emit discard.. mhh
23:42 karolherbst: passes now :)
23:44 karolherbst: mhhhh... "goto block_1 if ssa_8 else block_1"
23:44 karolherbst: I guess I could short cut that
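The shortcut can be sketched as follows. emit_branch and the tuple encoding are hypothetical, not the vtn or NIR builder API; the point is only that a conditional branch whose two targets are the same block carries no information and can be emitted as a plain goto.

```python
def emit_branch(cond, then_block, else_block):
    """Return ('goto', target) or ('goto_if', cond, then, else)."""
    if then_block == else_block:
        # Degenerate case: both edges lead to the same block,
        # so the condition does not matter.
        return ("goto", then_block)
    return ("goto_if", cond, then_block, else_block)

degenerate = emit_branch("ssa_8", "block_1", "block_1")
regular = emit_branch("ssa_8", "block_1", "block_2")
```

With this in place, later code can keep asserting that the two targets of a goto_if differ.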
23:46 karolherbst: SpvOpUnreachable :D the hell
23:47 karolherbst: what should I do with that one? mhhh
23:47 karolherbst: don't want to connect it to the end block? or maybe I should?
23:49 jekstrand: Uh...
23:49 jekstrand: That one's annoying.
23:49 jekstrand: I'd say connect to the end
23:49 jekstrand: It's got to go somewhere.
23:49 karolherbst: deqp-vk is fine if I do that, yeah
23:49 jekstrand: We don't have a goto_unreachable
23:49 jekstrand: Nice!
23:49 karolherbst: okay.. so 3 bugs fixed already :)
23:49 jekstrand: If we can run deqp-vk through it, gives me a lot of confidence in the lowering pass.
23:50 jekstrand: Not 100% confidence, of course. For that, I'd want to see a NIR pass that contracts edges like mad and then see it sort it out. :-)
23:50 karolherbst: jekstrand: https://mesa-ci.01.org/karolherbst/builds/156/group/b368d07ebed0a2f4703296d5dfb763b0
23:50 jekstrand: Because GLSLang produces code that's trivially structurizable.
23:50 karolherbst: not too bad
23:50 karolherbst: just random shit wrong
23:51 jekstrand: Yeah, fixing OpKill will help
23:51 jekstrand: That's probably a lot of the fail.
23:51 karolherbst: yeah
23:51 karolherbst: I think I fixed all of them now :D
23:52 karolherbst: well.. 99% at least
23:52 karolherbst: so.. another run
23:52 karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/3dab99a9f905a90e003bc66ff8a4dbc24ed3b894
23:53 karolherbst: I think all of that is reasonable
23:53 karolherbst: or any comments? otherwise I just merge it in
23:53 karolherbst: *squash
23:54 jekstrand: karolherbst: What's wrong with a degenerate goto_if?
23:54 karolherbst: I assert on the targets to be different
23:55 karolherbst: and.. I don't want to risk tripping up the structurizer :D
23:55 karolherbst: and we already do a lot of that stuff in vtn anyway
23:56 jekstrand: If we ever do any sort of unstructured control-flow optimizations in NIR, we're going to end up with degenerate ifs
23:56 jekstrand: Or we're going to have to have code to remove them which gets run constantly.
23:57 karolherbst: pro tip: don't :p
23:57 jekstrand: Don't what?
23:57 karolherbst: no idea if it really makes sense to even care about unstructured control flow
23:57 jekstrand: Do unstructured control-flow optimizations? But those are the most fun kind. :-)
23:57 karolherbst: it just makes stuff... hard in other places
23:57 jekstrand: Oh, I'm well aware.
23:57 jekstrand: We may go there eventually but it'll have to be done very carefully.
23:57 karolherbst: we have to insert convergency points and that's needlessly hard on unstructured already
23:58 jekstrand: Yes, it is.
23:58 karolherbst: so we do it based on the structured nir/TGSI :p
23:58 karolherbst: anyway.. for now I'd just do what the structured path does as well
23:58 jekstrand: nh has some ideas for trying to do it in LLVM but, last I heard, he's been having trouble gaining traction.
23:58 jekstrand: karolherbst: Yeah, for now we just need structurizing to parse SPIR-V
23:59 jekstrand: That's the whole point at the moment
23:59 karolherbst: okay, so I just squash it in for now. We can always revisit later :p
23:59 jekstrand: WFM