00:00 jekstrand: I've got a patch to squash into the structurizer which fixes some potential memory issues and simplifies some things.
00:00 jekstrand: Let me push it to my branch
00:00 karolherbst: anyway, triggered a new run, it will be #157 in case you get bored in one or two hours :D
00:00 karolherbst: ahh
00:00 karolherbst: yeah... will check it out tomorrow
00:00 karolherbst: I repushed mine at least
00:00 jekstrand: karolherbst: Grab my "FIXUP structurizer" and squash it in
00:01 karolherbst: k
00:01 jekstrand: karolherbst: I'm going to see if I can actually wrap my head around the structurizer tomorrow. If I give up before EOD, let's land the lot.
00:01 jekstrand: karolherbst: Oh, did you apply my suggestion to the hash set patch?
00:01 jekstrand: karolherbst: I'd like to at least see the asserts added.
00:02 jekstrand: The swap-to-shortest I care about a lot less
00:02 karolherbst: uhh, not yet
00:02 karolherbst: that's all for tomorrow :p
00:02 jekstrand: Ok, I left a comment on the MR
00:02 karolherbst: wanted to get the CI to accept the branch at least
00:03 jenatali: Exciting, it's actually going to land :D
00:03 karolherbst: anyway, the results are quite promising
00:03 karolherbst: just... the run before that tripped for unknown reasons
00:03 karolherbst: probably crashed the systems a lot
00:04 jenatali: Then we just need to land libclc, and we're not too far from a workable full CL implementation
00:04 karolherbst: :) finally
00:05 jenatali: We've still got lots of minor patches that need to go in, but nothing quite as significant as the structurizer and libclc
00:05 karolherbst: after the structurizer I only have small things on my list, like fixing the api :D
00:05 karolherbst: yeah.. libclc will be fun
00:05 karolherbst: before that it's pointless to care much about the CTS
00:06 jenatali: Yeah, that's fair
00:06 karolherbst: I bet I also need to support more CF instructions, but I doubt those will cause many fails
00:07 karolherbst: are there even more?
00:07 jenatali: We're not failing any CTS tests due to missing CF instructions on your old version of the structurizer
00:07 jekstrand: jenatali: Yeah, it's going to be good to get this in finally
00:07 karolherbst: jekstrand: I meant vulkan ones
00:07 karolherbst: ...
00:07 karolherbst: jenatali:
00:07 jenatali: :)
00:07 jekstrand: Various unstructured things have been in my review queue for far too long....
00:07 karolherbst: jenatali: this was vulkan on top of the structurizer: https://mesa-ci.01.org/karolherbst/builds/156/group/b368d07ebed0a2f4703296d5dfb763b0
00:08 jenatali: Yeah, not bad
00:08 karolherbst: the fails are mostly fixed locally :D
00:08 karolherbst: just have to check what is left now
00:08 jekstrand: Looks like my shamrock branch breaks some GL cts tests... :-?
00:08 jekstrand: :-/ rather
00:08 karolherbst: jenatali: OpKill and OpUnreachable were missing
00:09 jenatali: karolherbst: Psh, those don't exist in CL, who needs 'em ;)
00:09 karolherbst: jenatali: I didn't pull in your iris patches
00:09 karolherbst: ...
00:09 karolherbst: ehhh
00:09 jekstrand: OpUnreachable doesn't?
00:09 karolherbst: how can that be :D
00:09 jenatali: I don't think so?
00:09 karolherbst: jekstrand: let's hope llvm doesn't emit them :D
00:09 jenatali: Eh, ok it's not defined as shader, so I guess theoretically CL could have them, I haven't seen one yet though
00:10 karolherbst: because.. let's face it.. everybody doing CL with spir-v will use llvm :p
00:10 karolherbst: and llvm rather connects everything before trying to just give up
00:10 jekstrand: karolherbst: Yeah, but that doesn't mean they won't let LLVM optimize it a bit first. :)
00:10 jenatali: OpKill is shader though, so none of that in CL
00:10 karolherbst: jenatali: right, but needed to run vulkan tests on top
00:10 jekstrand: I'd be really interested to see what would happen if you stuck some LLVM control-flow optimizations in the middle of one of these pipelines. :)
00:11 karolherbst: 8k fails are caused by OpKill
00:11 karolherbst: probably
00:11 jekstrand: Yeah, not having OpKill is going to be a lot of fail.
00:11 karolherbst: jekstrand: you know the reason the CL CTS if tests cause switches to get emitted?
00:11 jekstrand: karolherbst: Do I want to?
00:11 karolherbst: do you think there is a switch in the clc source?
00:11 airlied: the LLVM-> SPIRV convertor is very specific about only ealing with unoptimised llvm
00:12 airlied: dealing with
00:12 karolherbst: jekstrand: also wondered how the default case could already be visited and why I even added the check for that? :D
00:13 karolherbst: turns out.. llvm is very smart
00:13 jekstrand: "smart"
00:13 jenatali: That one's not even in the CTS :(
00:14 karolherbst: jekstrand: if (...) { switch(...) { ... default: break; } } ended up causing the switch default _and_ the else branch to have the same target :)
00:14 karolherbst: well.. the default _was_ the else branch
00:14 karolherbst: uhm.. block
00:14 jekstrand: karolherbst: Yeah, makes sense.
00:14 karolherbst: right..
00:14 karolherbst: just annoying
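[Note: the source shape karolherbst describes can be sketched in plain C as below. This is only an illustration of the pattern, with a hypothetical function name and values; it is not the Adobe repro kernel or anything from the CTS. The point is that a `default: break;` inside an `if` has nothing to do, so a compiler is free to make the switch's default edge and the if's "else" edge both jump to the same merge block.]

```c
/* Hypothetical example: a switch with an empty default nested in an if.
 * After the compiler simplifies the control flow, the "cond is false" edge
 * and the "sel hits default" edge can both target the block that starts at
 * the return statement, which is the situation discussed above. */
int classify(int cond, int sel)
{
    int r = 0;

    if (cond) {
        switch (sel) {
        case 0:
            r = 1;
            break;
        case 1:
            r = 2;
            break;
        default:
            /* Nothing to do: control simply leaves the switch. */
            break;
        }
    }

    /* Merge point shared by the else edge and the default edge. */
    return r + 10;
}
```

[A structurizer that walks the CFG therefore has to cope with reaching the default target through a different edge before it handles the switch itself.]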
00:15 jekstrand: karolherbst: I deleted some of your special-casing there, FYI
00:15 karolherbst: you did?
00:15 jekstrand: Not sure if I deleted the one you're thinking of or not
00:15 jekstrand: Yeah, default handling is a bit different now
00:15 jekstrand: But it should be ok, I think.
00:15 karolherbst: vtn_add_unstructured_block is what fixed it :p
00:16 jenatali: Worth retrying that kernel
00:16 karolherbst: mhhh.. "vtn_assert(def != NULL)" mhh... you are lucky that spirv demands a default block :D
00:16 jekstrand: karolherbst: Not lucky. It's designed that way.
00:16 jekstrand: It's literally impossible to encode a valid OpSwitch without one.
00:16 karolherbst: yeah, because llvm would have abused it otherwise, I am sure
00:16 jekstrand: hehe
00:17 karolherbst: the new code still looks fine
00:17 karolherbst: jenatali: I forgot which test tripped it
00:17 karolherbst: ohh, it was the custom kernel
00:17 karolherbst: ehhh
00:17 karolherbst: should still work
00:17 jenatali: Yeah, some custom stripped down repro kernel from Adobe I think
00:17 karolherbst: yeah
00:18 karolherbst: once there are no test regressions and we dealt with the code you are free to retest everything :p
00:18 jenatali: :P sounds good
00:19 jekstrand: jenatali: Yeah, I'd really appreciate one final test from you before we land it.
00:19 jekstrand: Running the Vulkan CTS through it gives me some confidence
00:19 jenatali: jekstrand: How much of a test? A full CL CTS run is... days
00:19 jekstrand: But that's easier to handle than CL kernels.
00:19 jekstrand: jenatali: Whatever gives you comfort in us landing it. :)
00:20 jenatali: Heh, fair enough. Probably basic + wimpy bruteforce to validate libclc
00:20 jekstrand: jenatali: I don't think we need to test all the rounding modes with the CF series. :)
00:20 jekstrand: Oh... that actually reminds me....
00:20 jekstrand: karolherbst: Do we emit mixed structured+unstructured?
00:20 jekstrand: I made that illegal
00:20 karolherbst: jekstrand: I hope not
00:20 karolherbst: why?
00:20 jekstrand: karolherbst: I'm thinking some weird built-ins might emit control-flow.
00:21 jenatali: jekstrand: I tried to emit an if from vtn today and it blew up in the structurizer
00:21 jenatali: So, I don't think so
00:21 karolherbst: jekstrand: in vtn?
00:21 jekstrand: Yeah, it doesn't look like we do. All uses of nir_push_if in vtn are in vtn_cfg.c
00:21 jekstrand: jenatali: Cool
00:21 karolherbst: that's a relief
00:21 jekstrand: jenatali: We could theoretically make nir_builder emulate ifs with goto_if when unstructured.
00:22 daniels: jenatali: const-folding is literally why I went away from separate insns per mode, despite that being a pretty clear win - the reason why I ended up doing it in vtn was that, despite how unpleasant it is, there is likely not going to be any non-CL consumer, so at that point containing the damage seemed better than a perpetual audit of how every single driver deals with unexpected modifiers etc
00:22 karolherbst: ehhhh
00:22 jenatali: daniels: Yeah, makes sense - glad to have your input on this discussion :)
00:22 daniels: still alive! :)
00:22 jenatali: Not sure if you've finished catching up - it was a lot of discussion earlier today :P
00:23 jenatali: But I'm interested in your opinions too if you've got any
00:23 daniels: yeah, it's too hot to sleep here, but I've just finished catching up
00:24 karolherbst: ahh.. it fell below 20, I guess I can sleep now :D
00:24 daniels: my opinion is exactly as above - separate insns would be the most technically correct solution but const-folding is a nightmare to get correct and condemns you to having two separate lowering paths (one for const in C, one for non-const in NIR) which are complex enough to be painful to reconcile as well as one of those 'here be dragons' bits of the source no-one dares touch
00:24 karolherbst:still wants to use native hw support
00:24 jenatali:nods
00:25 daniels: modifiers seem like a pragmatic halfway house, but then you're into the realm of making sure every driver and every pass (run in essentially random order) deals with them correctly, which is an endless trudge
00:25 karolherbst: yeah.. I added all ops explicitly in some very old patches and that worked quite fine, but then algebraic opt is annoying again
00:25 daniels: doing it in vtn is definitely the 'worst' solution, but otoh, given I don't expect to see anyone other than CL ever dealing with it ... probably the most sustainable solution? means you only need to write it on one language rather than two, and the damage is very clearly confined to CL
00:25 karolherbst: I think it's either work or work :p
00:26 karolherbst: you just have to choose which pile of work you want to work on
00:26 daniels: karolherbst: I saw your branch, but ... does it actually pass the entire CTS? :P
00:26 karolherbst: daniels: sure
00:26 karolherbst: it passed all convert tests
00:26 karolherbst: our hw is just mildly dumb, like you can't do 64 to 16 bit conversions directly
00:26 karolherbst: etc..
00:26 karolherbst: but besides that it just works
00:27 jenatali: I'm thinking... maybe intrinsics is the right way to go, with a lowering pass to convert them to a series of ALU ops
00:27 jenatali: And a pipe cap so Clover can either lower them for the driver, or in the case of NV, send them to the driver
00:27 karolherbst: I honestly like the idea of a new instruction type still... even though it's super annoying
00:28 karolherbst: jenatali: I am sure you are the only ones needing the lowering
00:28 daniels: yeah, I think if you can subvert NIR enough that you can have un-const-fold-able intrinsics, and make sure that you commit very early to either a) lowering through NIR ops, or b) pushing all the way through to the backend, that would make sense
00:28 karolherbst: so far all hw I know of supports all of that
00:28 karolherbst: in some form
00:29 karolherbst: be it having to use two instructions or maybe 3 or 4
00:29 karolherbst: but lowering is more of a 50 instruction fun, no?
00:29 karolherbst: at least for some variants
00:29 jenatali: Yeah some variants end up being closer to ~10
00:29 daniels: karolherbst: incorrect
00:30 karolherbst: daniels: wait until you do fp64
00:30 karolherbst: or was that more to the all hw supports it?
00:30 karolherbst: :p would like to know what hw doesn't
00:30 jenatali: karolherbst: We won't support fp64 unless there's native fp64 from the D3D driver
00:30 karolherbst: jenatali: heh, don't trust nvidia
00:31 karolherbst: they can't do int64, nor fp64 properly
00:31 karolherbst: it's all lowered anyway
00:31 daniels2: bad time for my IRC host to drop out ...
00:31 karolherbst: so if you lower rounding modes on fp64 you are getting massive shaders
00:31 jenatali:shrugs
00:32 karolherbst: and perf will be toast
00:32 daniels2: karolherbst: missed a bit, but anyway, lots of other hardware just cannot do it at all - bear in mind that NV is special and a little more compute-oriented than most vendors :P
00:32 jenatali: That's okay. I doubt anybody actually uses these features anyway...
00:32 karolherbst: but yeah.. maybe nobody uses those
00:32 karolherbst: daniels2: still, I am not aware of any hw not supporting it :p
00:32 jenatali: At this point, I'm just looking for the rubber stamp of "pass" on the CTS :P
00:32 karolherbst: even panfrost can support that natively
00:32 karolherbst: so...
00:33 karolherbst: maybe the older SoC won't.. dunno
00:33 daniels2: karolherbst: 'it' being native ALU ops for all {int,float}{8,16,32,64}<->{int,float}{8,16,32,64} with all round/sat modes? because AFAIK that is not true for any gen of Panfrost
00:33 karolherbst: daniels2: afaik, yes
00:33 daniels2: (largely true yes, but with corner-cases you need to open-code in shaders)
00:33 karolherbst: maybe not _all_ combination directly
00:34 karolherbst: but without requiring actual lowering
00:34 karolherbst: just do two or three converts and be done with it
00:34 daniels2: jenatali: next week, CL2.0!
00:34 karolherbst: like for nv we have to convert 64 to 32 bit and then to 16/8
00:34 karolherbst: still just two converts
00:34 karolherbst: not real alu lowering
00:34 jenatali: daniels2: Nah, 2.0 is never going to happen. Skipping straight to 3.0 ;)
00:34 daniels2: jenatali: \o/
00:35 karolherbst: my point was: no hw actually needs the proper alu lowering
00:35 airlied: not much skipping there :-P
00:35 daniels2: karolherbst: I don't think that's true
00:35 karolherbst: I think it is until I hear about hw which can't do it :p
00:35 jenatali: karolherbst: Regardless - having all of it implemented means we can constant fold it by lowering to separate ALU ops
00:35 jenatali: It also means we can have each backend pick which conversions they can do and which need to be lowered... somehow :)
00:36 karolherbst: yeah...
00:36 karolherbst: I'd like to push more lowering into nir for nouveau
00:36 karolherbst: and splitting the converts is one of those
00:37 jenatali: So... convert intrinsic with constant indices for rounding and saturation?
00:38 karolherbst: mhhhh
00:38 jenatali: With a lowering pass to convert to ALU ops, which is *always* run for constant values (by Clover), and optionally run based on pipe caps?
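[Note: a minimal scalar sketch of what "convert with a rounding mode and saturation, lowered to plain ALU ops" could compute, here for float to int8. This is not NIR and not Mesa code; the enum and function names are hypothetical, and mapping NaN to 0 in the saturating case is an assumption based on my reading of the OpenCL C conversion rules. It deliberately avoids the C library's rounding functions and the global FP rounding state, using only compares, a truncating cast and integer adds.]

```c
#include <stdint.h>

/* Hypothetical rounding-mode selector, mirroring the OpenCL suffixes
 * _rte (nearest even), _rtz (toward zero), _rtp (toward +inf), _rtn (toward -inf). */
enum rnd_mode { RND_RTE, RND_RTZ, RND_RTP, RND_RTN };

/* Saturating float -> int8 conversion with an explicit rounding mode. */
int8_t convert_f32_to_i8_sat(float x, enum rnd_mode mode)
{
    if (x != x)                 /* NaN: assumed to convert to 0 */
        return 0;
    if (x >= 127.0f)            /* saturate high (also handles +inf) */
        return INT8_MAX;
    if (x <= -128.0f)           /* saturate low (also handles -inf) */
        return INT8_MIN;

    /* x is now strictly inside (-128, 127), so the cast is well defined.
     * A float -> int cast in C truncates toward zero. */
    int32_t t = (int32_t)x;
    float frac = x - (float)t;  /* exact here, same sign as x, |frac| < 1 */

    switch (mode) {
    case RND_RTZ:
        break;
    case RND_RTP:
        if (frac > 0.0f)
            t += 1;
        break;
    case RND_RTN:
        if (frac < 0.0f)
            t -= 1;
        break;
    case RND_RTE:
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))
            t += 1;
        else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
            t -= 1;
        break;
    }

    return (int8_t)t;
}
```

[Because the same sequence could be run by a constant folder and on the GPU, constant and non-constant inputs would produce the same value, which is the property jenatali and daniels argue for above.]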
00:38 daniels2: karolherbst: in the past I've had interest for implementing CL on Freedreno (stopped only by Qualcomm stopping production of the particular SoC) - are you confident that Qualcomm hardware properly implements all of the rnd+sat combinations including the inf/denorm corner cases? :)
00:38 karolherbst: jenatali: honestly? being that a special instruction type feels more natural... but I fear the amount of work that will bring upon us
00:39 karolherbst: daniels2: I think I even saw patches for that at some point :p
00:39 jenatali: Yeah, I dunno that I'm ready to rock the boat quite *that* hard
00:39 daniels: ah, my actual host is back
00:39 jenatali: I can add dozens of intrinsics and lowering passes, but a whole instruction type is maybe a bit out of my comfort zone :P
00:39 jenatali: Hell I'm not even a Mesa developer yet :P
00:39 daniels: karolherbst: yeah, but like ... CTS
00:40 daniels: jenatali: you might regret saying that!
00:40 jenatali: Uh oh, saying what?
00:40 karolherbst: daniels: well, floats are well defined and messing that up would be.. well... right..
00:40 karolherbst: I guess there could be some denormal stuff going wrong or so
00:40 karolherbst: but even then.. handling that is still simpler than full lowering
00:41 jenatali: karolherbst: Let the backend say they "support" it, then write their own more minimal lowering
00:41 karolherbst: full lowering involves nextafter and that means it's all toast anyway
00:41 jenatali: Worst case, backends that don't support it at all (*cough* DXIL *cough*) can take advantage of the full lowering for slow, correct implementations
00:42 karolherbst: right
00:42 jenatali: Which we'll still want for constant folding anyway
00:42 karolherbst: right
00:42 karolherbst: constant folding ... mhhh
00:44 daniels: jenatali: that you aren't a Mesa developer - someone might just make you responsible ;)
00:44 daniels: karolherbst: 'floats are well-defined' != 'floats are easy to implement' != 'floats are easy to implement in bounded floorplan/TDP'
00:46 karolherbst: well.. would be interesting to see how the closed source adreno stuff implements those
00:46 karolherbst: it's even surprising they support CL 2.0
00:46 karolherbst: so.. can't be that bad
00:48 karolherbst: jekstrand: https://mesa-ci.01.org/karolherbst/builds/157/group/63a9f0ea7bb98050796b649e85481845#subgroups
00:48 karolherbst: what can I say? :D
00:48 karolherbst: why are there still fails
00:48 karolherbst: :D
00:48 daniels: karolherbst: are you genuinely saying 'how hard can it be to implement all of CL in a conformant manner' ... ?!
00:49 daniels: I mean, Mali implements all of ES3.2, but they implement GS/TS by translating to CS
00:49 karolherbst: sure
00:49 karolherbst: but CS is... not that complex
00:50 imirkin: daniels: not such a horrible plan, imho
00:50 daniels: at some point eating the cost of paying too many software engineers + burning half of them out, is far better than increasing your silicon floorplan by however much to do things which are not what you design & optimise for
00:50 karolherbst: if you compare it to the graphics pipeline, compute is trivial
00:50 imirkin: daniels: plus they get mesh shaders for free :)
00:50 jekstrand: karolherbst: https://mesa-ci.01.org/karolherbst/builds/157/results/25529579
00:51 karolherbst: daniels: I am sure broken hw burns them out even more, because those are the really annoying issues to deal with
00:51 jekstrand: karolherbst: Looks like something strange going on with jumps in functions or something
00:51 karolherbst: jekstrand: deqp-vk: ../src/compiler/nir/nir.c:861: nir_instr_insert: Assertion `cursor.instr->type != nir_instr_type_jump' failed.
00:51 karolherbst: yeah...
00:51 daniels: karolherbst: if it's trivial, then by the end of next week you can send me a link to the MR accompanied by the bill for a restaurant + bar of your choice :)
00:51 karolherbst: first you find a bug, hw engineers say: no, it's all fine
00:51 jenatali: So, I'm still not hearing a strong reason not to go with intrinsics + lowering
00:51 karolherbst: then you discuss for months and finally it's a hw bug :p
00:52 daniels: luckily software has no bugs
00:52 karolherbst: they are less annoying to find out
00:52 karolherbst: hw bugs are just evil if the people involved are blocking
00:52 jenatali: Not always :)
00:52 karolherbst: happened often enough to me and you can just give up
00:52 daniels: jenatali: is that intrinsics + no C-lang const expr + early choice to lower immediately or preserve right through to backend, or something else?
00:53 jenatali: daniels: That'd be my proposal, yeah
00:53 daniels: jenatali: ++ from me, but I think we need jekstrand/cwabbott buy-in
00:53 jenatali: Maybe some more nuanced lowering choices, but either way, constants should always be lowered by any CL-supporting frontend
00:54 karolherbst:still wants to find this magic C math lib doing it for us
00:54 daniels: karolherbst: my favourite errata was the early OMAP2420 rev we got at Nokia - after two months of banging our head on why we got SIGILL on anything from executing /bin/sh (kernel mostly but not always fine, unless you waited about an hour for it to run with all cache disabled), we got TI to admit there was a problem with the cache!
00:54 jenatali: In theory libclc supports it... in practice it doesn't :)
00:54 karolherbst: :D
00:54 daniels: Problem: SD-RAM controller randomly discards writeback from L3 cache
00:54 daniels: Workaround: None known.
00:54 karolherbst: daniels: yeah, sounds like how it always goes...
00:55 daniels: thankfully that one was enough to make them spin a new rev :P
00:55 karolherbst: I still have this nouveau runpm bug I am sure it's a hw bug
00:55 karolherbst: but do you think _anybody_ wants to look into it at intel?
00:55 karolherbst: nope
00:55 karolherbst: "all is fine" is what I get :p
00:55 karolherbst: it's still broken
00:55 karolherbst: even with the nvidia driver
00:56 karolherbst: I am even able to reproduce by banging pci controller regs
00:56 karolherbst: "using undocumented regs can always mess things up".. yes thank you for your input :p
00:57 karolherbst: we have a workaround now, but still...
00:57 karolherbst: anyway.. sw bugs are fine to figure out, hw is just plain evil if the people are annoying
00:59 karolherbst: but honestly? I'd just like to do conversions the proper way
00:59 karolherbst: I already wrote most of the C lowering, it's just annoying
01:00 karolherbst:doesn't see why we should shortcut here if hw actually benefits from doing it the right way
01:00 jenatali: karolherbst: Not sure why intrinsics are shortcut vs alu ops?
01:01 karolherbst: jenatali: I was more refering to the lowering of constants
01:01 karolherbst: but yeah..
01:01 karolherbst: maybe that would be fine
01:01 jenatali: Why would you not lower a constant?
01:01 jenatali: Trading GPU time every execution for CPU time compiling
01:02 karolherbst: jenatali: I meant lowering conversion with a constant argument, so we can constant fold the lowered code feels... odd
01:02 jenatali: IMO it's better than having two implementations, one in C, one in NIR
01:02 jenatali: At least the value will be the *same*, regardless of whether it's a constant or not
01:02 karolherbst: yeah..that's why I wish there were a C math lib supporting that :p
01:02 karolherbst: I am sure stuff like MKL does support it
01:02 karolherbst: it's just not open source
01:03 karolherbst: but yeah.. I totally see the benefits
01:04 karolherbst: jekstrand: ehh.. the fail is an actual bug in the structurizer mhhh
01:04 jenatali: :(
01:04 karolherbst: https://gist.githubusercontent.com/karolherbst/f0a544b26fa51098420dcfa050f3609b/raw/06cb16d75efdcbd8a722c05773c5a3ffd3b1ad24/gistfile1.txt
01:04 karolherbst: have fun, I'll sleep now :p
01:06 daniels: karolherbst: I get that you wrote the separate path, and I'm super glad that you did and it's a great effort - but the thought of having two complex and borderline-unreviewable paths doing the same thing in two different languages, selected between them by being const or not - genuinely makes me itchy
01:07 jenatali: Especially since the CTS only tests one of them :D
01:07 karolherbst: daniels: yeah... I blame x86 FPU being broken and other archs not learning from the mistakes :D
01:07 karolherbst: jenatali: the CTS has a CPU implementation though :p
01:08 karolherbst: so it compares that to whatever the gpu does
01:08 jenatali: Fair - there's your math lib that you were asking for ;)
01:08 karolherbst: it's.. x86/arm only I think
01:08 karolherbst: :D
01:08 jenatali: Isn't it plain C?
01:08 karolherbst: they might use the saner sse variants
01:08 karolherbst: let me check
01:09 daniels: more divergence \o/
01:09 airlied: then someone runs on ppc64 with 80-bit floats
01:10 airlied: or 128-bit or whatever the other crazy things is
01:10 jenatali: ... Is that a thing?
01:10 karolherbst: yep
01:10 karolherbst: jenatali: have fun: https://github.com/KhronosGroup/OpenCL-CTS/blob/master/test_conformance/conversions/basic_test_conversions.cpp
01:10 karolherbst: :D
01:10 karolherbst: fplib.cpp is also fun
01:11 airlied: https://en.wikipedia.org/wiki/Long_double
01:11 jenatali: Ugh... you all are making me sad now :(
01:11 karolherbst: jenatali: x86 even supports doing 80 bit internally and giving 64 bit results or something?
01:11 karolherbst: that stuff was always super weird
01:11 mattst88: what is this weird arm FPU type?
01:12 daniels: mattst88: ... which one?
01:12 airlied:had one ppc64 bug in that area, I've blocked out the details from memory
01:12 karolherbst: lrintf_clamped :)
01:12 karolherbst: mattst88: rounding modes
01:12 mattst88: daniels: it was mentioned in...
01:12 mattst88: jenatali> | Actually, I do have a compelling reason. As part of ARM64 support, we've got a crazy special binary format (that clang will probably never emit) which is ARM64 code for x86 processes (e.g. 32bit pointers, x86 calling conventions, etc)
01:12 karolherbst: I am convinced SSE is the only sane FPU implementation so far
01:13 karolherbst: no global rounding mode state flag
01:13 mattst88: I assumed it was an FPU format being referred to, but maybe I just totally misunderstood
01:13 jenatali: mattst88: It's not an FPU format, it's a PE binary format
01:13 imirkin: mattst88: like x32?
01:13 mattst88: karolherbst: but... SSE does have a global rounding mode state flag...?
01:13 imirkin: but for arm :)
01:14 karolherbst: mattst88: afaik you can pass it along the intrinsics? no?
01:14 daniels: mattst88: oh yeah, that was interesting. I thought I might find the answer by opening Karol's link to the CL conversions test, upon which I discovered that ARM + GCC == proprietary QCOM FPU lib. nice.
01:14 mattst88: imirkin: okay, cool. I did misunderstand, and I'm glad there's not another dumb 80-bit FPU format or something :)
01:14 mattst88: daniels: doh!
01:14 daniels: srs.
01:15 mattst88: karolherbst: some later SSE version (4.1, I think) gives the ability to choose the rounding mode in the instruction word, yeah
01:15 karolherbst: daniels: as I said: FPUs are stupid :p
01:15 karolherbst: mattst88: ahh, yeah..
01:15 mattst88: but before that, you're still stuck with flipping the bit in the SSE FP control word :|
01:15 karolherbst: sse2 doesn't seem to have it
01:15 karolherbst: yeah..
01:15 karolherbst: it sucks
01:15 daniels: mattst88: re ARM64/x86, I take it from what jenatali's saying is that the answer is 'bizzaro-land x32'
01:15 mattst88: daniels: right. that sounds... unfun
01:16 jenatali: :D
01:16 jenatali: We call it CHPE
01:16 karolherbst: we really need this open source math library doing that floating point stuff in a sane way :p
01:16 imirkin: mattst88: not sure if your comment was misdirected, but i have no idea about 80-bit fp formats
01:16 karolherbst: but I guess nobody would use it, as it's slow and they rather break libraries/applications by flipping global state without unflipping
01:17 imirkin: (besides knowing they exist, esp in x87, but perhaps others got similarly bad ideas)
01:17 daniels: (fun story about flipping bits on SSE control words: the only reason we can't run CTS on radeonsi atm is because dEQP burns >18% of CPU time within fesetround)
01:17 karolherbst: ... lol
01:18 mattst88: imirkin: yeah, sorry. when he said 'binary format' my mind went to 'binary32' / 'binary64' / 'binary80' rather than ELF/PE/a.out
01:18 jenatali: daniels: That sounds like a test sorting issue to me :)
01:18 mattst88: daniels: lol, wtf is it doing that for?
01:18 daniels: jenatali: hm?
01:18 karolherbst: daniels: I discovered yesterday that compiling with Ofast breaks things :) I know, big surprise
01:18 bnieuwenhuizen: daniels: if you think that is bad, consider the ARM on x86 emulation for running ARM Android CTS on an x86 device
01:18 jenatali: Seems like you should be able to re-sort the tests so that you don't have to reset rounding modes so frequently
01:18 mattst88: FWIW, in Mesa we decided a few years ago that it's okay to do bogus shit if you call into libGL with a non-standard rounding mode
01:18 daniels: bnieuwenhuizen: thanks, but I'd rather not :)
01:19 karolherbst: mattst88: I compiled mesa with Ofast :)
01:19 karolherbst: :D
01:19 karolherbst: so mesa was breaking the CTS rather
01:19 daniels: bnieuwenhuizen: at least you have infinite money to pay for test farms - we have this idiotic business model where our revenue is a direct factor of the amount of time we spend working on things
01:19 daniels: (weird, I know)
01:20 imirkin: daniels: you should switch to the other one
01:20 mattst88: "oh, okay" :)
01:20 karolherbst: anyway.. I am inclined to do whatever works as long I get my native rounding modes :D
01:20 daniels: imirkin: SoftBank is not without its downsides
01:21 karolherbst: also.. I am quite happy that this global rounding mode flag thing in CL just disappeared without a trace :)
01:21 HdkR: Oh, CHPE? Fun stuff, reminds me that I should test x87 performance on my ProX :D
01:22 jenatali: HdkR: Oh, you do have one of our arm64 devices?
01:22 daniels: karolherbst: look, I don't disagree with you at all, but I really do think you're overestimating the extent to which non-NV hardware has the silicon floorplan + TDP to do the correct thing in 1-3 insns
01:23 karolherbst: as I said, I can be convinced with counter examples
01:23 HdkR: jenatali: I own a /lot/ of ARM64 devices
01:23 karolherbst: I just heard that panfrost could make use of native instructions, no idea how precise they are, and I think AMD and intel also can make use of some native stuff
01:23 karolherbst: at least it would be good to have an overview on what hw can do what
01:24 HdkR: Sadly the ProX has been one of the less utilized ARM64 devices since WSL2 doesn't work with reclocking and getting DeviceTree booting on it is a nightmare
01:26 HdkR: (ACPI booting Linux has its own challenges there)
01:26 daniels: karolherbst: yeah, I mean, there are insns, and they will work in a lot of cases - but not in corner cases
01:27 karolherbst: daniels: I guess for mali devices you are able to dump what the closed source CL implementation does, right?
01:28 karolherbst: would be interesting to compare it against potential lowering
01:28 airlied: what does llvm do with conversions?
01:28 karolherbst: but I think I remember somebody did this and it was quite simple code in the end?
01:28 karolherbst: dunno
01:28 daniels: airlied: intrinsics
01:29 daniels: karolherbst: yep
02:22 jekstrand: karolherbst: Uh... Yeah, that's probably a bug.
02:25 jekstrand: karolherbst: Looks like it's inserting a break or something and then code after it. I'll have to dig in after I get done trying to grok it.
03:14 hanetzer: agd5f_: https://gitlab.freedesktop.org/drm/amd/-/issues/1257 filed the bug.
05:58 airlied: definitely last llvmpipe bug tracked down for 4.5 CTS with 4.6.0 branch
06:55 danvet: airlied, lol that diffstat
06:55 danvet: not sure what git is doing tbh
06:57 airlied: danvet: yeah I've had it get confused in that way before
07:30 MrCooper: anholt: doubtful making softpipe faster than swrast is possible without compromising its raison d'être as a simple reference Gallium driver; might be better to make it a separate driver instead
09:29 pinchartl: narmstrong: are you as thrilled as I am about "[PATCH 0/6] Support change dw-hdmi output color" ?
10:18 narmstrong: pinchartl: I haven’t got the pleasure yet, I’m back from vacation in 1,5w
10:40 MrCooper: anholt robclark: FYI, cheza job hitting 1h timeout due to some communications failure: https://gitlab.freedesktop.org/vlee/mesa/-/jobs/4040954
11:26 bbrezillon: karolherbst: regarding the CL round() implementation, can we opt for the second version (you said it was working)
11:26 bbrezillon: *was working on nvidia
11:37 karolherbst: bbrezillon: sure
13:00 Vanfanel: emersion: Any idea on why the sum of this changeset-> https://github.com/vanfanel/SDL/blob/6e81e406f3851f12acc272c8367bcc0fd199dcf0/src/video/kmsdrm/SDL_kmsdrmvideo.c#L414 and this other changeset-> https://github.com/vanfanel/SDL/blob/6e81e406f3851f12acc272c8367bcc0fd199dcf0/src/video/kmsdrm/SDL_kmsdrmvideo.c#L387 makes the corresponding atomic commit fail with EINVAL(-22)? I have verified that all
13:00 Vanfanel: values make sense, and there's no "fail" string in dmesg in DEBUG mode.
13:12 MrCooper: Vanfanel: gbm_bo pointers are best for simply passing around between functions; handles need to be ex/imported, which is more expensive
13:16 Vanfanel: MrCooper: then I am doing it fine. Thanks for the reply on yesterday's question :)
13:17 Vanfanel: MrCooper: any idea on why these changesets are failing, btw?
13:20 MrCooper: nope
13:21 Vanfanel: ok, thanks anyway
13:24 emersion: no idea either, sounds fine from a quick look
13:24 emersion: if an atomic commit fails, you really should be getting "failed" in the log
13:26 Vanfanel: emersion: I am doing "dmesg |grep fail" (after setting debug mode) with no results...
13:26 emersion: dmesg is a ring buffer
13:26 emersion: it means old messages get dropped
13:26 emersion: you need to obtain the full log with dmesg -w
13:27 emersion: (continuously recording the logs during the atomic commit)
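
A toy model of why a one-shot read at the end can miss the interesting line while a follower (as with `dmesg -w`) does not. The capacity and message names are made up for illustration; this is not how the kernel log is implemented, only a sketch of the ring-buffer behaviour described above.

```python
from collections import deque

# Fixed-size ring buffer; the capacity of 4 is an arbitrary illustrative value.
ring = deque(maxlen=4)
captured = []  # what a continuously attached reader would have recorded

for i in range(10):
    msg = f"msg {i}"
    ring.append(msg)      # once full, the oldest entry is silently dropped
    captured.append(msg)  # a follower sees each message as it arrives

snapshot = list(ring)     # what a single read after the fact would return
```

Here `snapshot` only holds the last four messages, while `captured` holds all ten.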
13:27 Vanfanel: emersion: oh, hadn't thought of that since the -22 is at the end of the program...
13:33 pq: Vanfanel, do you know what cursot hotspot means in general?
13:34 pq: *cursor
13:35 Vanfanel: pq: I think I do: it means the distance from the borders of the screen, right?
13:36 pq: Vanfanel, it looks to me like you are using just one instance of plane_props, but you have two planes, primary and cursor. The props are not guaranteed to be compatible to other planes.
13:36 emersion: it's the distance between the top-left corner of the buffer and the tip of the cursor
13:36 pq: Vanfanel, no. Cursor hotspot means the x,y offset inside the cursor image from the its top-left corner to the point that a user things is the "active" point.
13:37 pq: *thinks
13:37 emersion: "active point" is probably more accurate, yeah
13:37 emersion: tip of the pointer, middle of the loading circle, etc
13:38 pq: Vanfanel, CRTC_X/Y are the position of the plane on the output. So for a cursor, you'd have CRTC_X = cursor_position.x - hotspot.x
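
The hotspot arithmetic pq describes can be sketched as below. The `Point` type and `cursor_plane_position` helper are hypothetical names, and applying the same subtraction to Y is an assumption by analogy with the X formula given above.

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

def cursor_plane_position(cursor_position: Point, hotspot: Point) -> Point:
    """Return the (CRTC_X, CRTC_Y) pair for the cursor plane.

    The plane's top-left corner is placed so that the hotspot pixel inside
    the cursor image lands exactly on the pointer position.
    """
    return Point(cursor_position.x - hotspot.x,
                 cursor_position.y - hotspot.y)

# Pointer at (100, 50), hotspot 4 px right and 2 px down inside the image.
crtc = cursor_plane_position(Point(100, 50), Point(4, 2))
```

Note that the result can go negative near the top-left edge of the output, which is expected when the hotspot is not at (0, 0).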
13:40 pq: Vanfanel, re: props; reading your add_plane_property() function, you cannot use that on a cursor plane, because it assumes it is operating on the primary plane, most likely.
13:40 pq: you need plane props for each plane separately
13:41 pq: the same with props of all KMS objects
13:43 Vanfanel: pq: Thanks A LOT, that must be it! I had copied over add_plane_property() from examples, and I had not seen it's operating on a hard-coded plane props! It always operates on the same plane props! :(
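
A toy model of the fix, assuming nothing about the real SDL or libdrm code: each plane gets its own property table, and the lookup takes the plane as a parameter instead of reading one hard-coded table. All IDs and names below are invented for illustration; real IDs come from querying each KMS object.

```python
# Hypothetical per-plane property tables (name -> property ID).
PLANE_PROPS = {
    "primary": {"FB_ID": 17, "CRTC_X": 9, "CRTC_Y": 10},
    "cursor": {"FB_ID": 41, "CRTC_X": 31, "CRTC_Y": 32},
}

def add_plane_property(request, plane, name, value):
    """Append (plane, property ID, value) using the table of that plane."""
    prop_id = PLANE_PROPS[plane][name]
    request.append((plane, prop_id, value))
    return prop_id

request = []
add_plane_property(request, "primary", "CRTC_X", 0)
add_plane_property(request, "cursor", "CRTC_X", 96)
```

With a single shared table, the second call would have used the primary plane's ID for the cursor plane, which is the kind of mismatch described above.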
13:44 pq: you're welcome - I was lucky to see it :-)
13:47 Vanfanel: pq: I have been chasing this for days... VERY lucky to have your help and emersion's help too :)
13:48 Vanfanel: I have become somewhat obsessed with this SDL2-has-to-go-atomic thing! :D
13:58 apinheiro: does anyone know if the vulkan environment somehow implies some kind of restriction on the registers that should be available when doing the shader register allocation?
13:59 apinheiro: I'm fighting with a test that puts the register allocation pressure over 100, and even if we are able to reduce that somehow
13:59 apinheiro: all options I can think of are somewhat really specific for that CTS test
13:59 apinheiro: so not sure if that make sense
14:00 apinheiro: hmm, without context all what I said is somewhat hard to understand
14:00 apinheiro: ok, so entering into details
14:00 apinheiro: this is the test:
14:00 apinheiro: dEQP-VK.spirv_assembly.instruction.compute.opcopymemory.array
14:01 apinheiro: the test is about testing that opcopymemory with an array works
14:01 apinheiro: so, it copies from an ssbo to an array, and then from that array to another ssbo
14:01 apinheiro: so going to the specifics of the spirv, is basically this:
14:02 apinheiro: %var = OpVariable %f32arr100ptr_f Function
14:02 apinheiro: %inarr = OpAccessChain %f32arr100ptr_u %indata %zero
14:02 apinheiro: %outarr = OpAccessChain %f32arr100ptr_u %outdata %zero
14:02 apinheiro: OpCopyMemory %var %inarr
14:02 apinheiro: OpCopyMemory %outarr %var
14:02 apinheiro: the issue is that right now, the generated nir is a big chunk of load_const/load_ssbo, followed by a big chunk of store_ssbo
14:03 apinheiro: so all the registers used for the load_ssbo are live, because they are going to be used on that store_ssbo
14:03 apinheiro: also, nir_schedule can't just interleave the store_ssbo because, as it is, any load_ssbo creates a dep with any store_ssbo
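
The pressure difference between the two orderings can be illustrated with a small counting model. It is a simplification (one register per array element, hypothetical `max_live` helper), not what the register allocator actually computes.

```python
def max_live(schedule):
    """Peak number of loaded values still waiting for their store."""
    live, peak = set(), 0
    for op, index in schedule:
        if op == "load":
            live.add(index)
        else:  # "store" consumes the value loaded for the same index
            live.discard(index)
        peak = max(peak, len(live))
    return peak

N = 100  # array length used by the test discussed above

# All loads first, then all stores: every loaded value stays live.
bulk = [("load", i) for i in range(N)] + [("store", i) for i in range(N)]

# Load/store pairs interleaved: each value dies right after it is loaded.
interleaved = [step for i in range(N)
               for step in (("load", i), ("store", i))]
```

In this model `max_live(bulk)` is 100 and `max_live(interleaved)` is 1, which is why interleaving (or reducing the two copies to one) matters so much here.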
14:03 apinheiro: so
14:04 apinheiro: I guess that there are some things that we could do to get that working
14:04 apinheiro: but Im also wondering if it makes sense that in order to test that functionality, you need a 100-element array
14:05 apinheiro: so perhaps it makes sense to just ask to modify the test to something more reasonable like 50?
14:05 danvet: mlankhorst, tzimmermann did you see various backmerge request for drm-misc-next?
14:05 apinheiro: so that goes back to my original question, about whether the Vulkan/spir-v spec somehow implies something about the register allocation
14:05 tzimmermann: danvet, no. i've been away for two days
14:06 danvet: I think mripard is also away
14:06 danvet: so maybe mlankhorst?
14:06 tzimmermann: danvet, what do you need backmerged?
14:06 danvet: just backmerging all of drm-next (it's still outside of the merge window) should be good
14:06 bnieuwenhuizen: apinheiro: no, why would it?
14:06 danvet: I think Lyude needs it for nouveau, and linus w needs it for some panel stuff
14:07 apinheiro: bnieuwenhuizen, well, I was not expecting something totally explicit
14:07 danvet: also there's a conflict, so beware
14:07 danvet: (iirc there's a conflict at least)
14:07 apinheiro: but more like "shaders would need to support arrays of size X"
14:07 apinheiro: or something like that
14:07 apinheiro: I understand that going fully explicit to register allocate would not make sense
14:08 bnieuwenhuizen: apinheiro: closest is https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_limits_a_universal_limits
14:09 bnieuwenhuizen: but the minimum limit is only in number of variables and not their size
14:11 apinheiro: bnieuwenhuizen, interesting, that was more or less what I was thinking on
14:11 apinheiro: but now that we are talking and thinking about that, Im more and more realizing that my question was just silly
14:12 apinheiro: because the issue is about how many of those variables are live at a given moment, and that for sure can't be even implied on the spec
14:12 apinheiro: I guess that it was an attempt to bypass a somewhat annoying test
14:12 apinheiro: bnieuwenhuizen, thanks for the link, sorry for the noise
14:17 jekstrand: Hey, apinheiro! Yeah, there's no implicit restriction there, I'm afraid. :-(
14:17 jekstrand: apinheiro: The best thing we're likely to be able to do is maybe optimize it a bit so the load/store get interleaved better.
14:18 apinheiro: jekstrand, well, yes, that was the first thing I tried
14:18 apinheiro: using the nir_scheduler
14:18 apinheiro: but as far as I see
14:18 jekstrand: Right now, I think it does a load of the whole thing and then a store of the whole thing. :-(
14:18 apinheiro: when you set a dependency between intrinsics
14:18 apinheiro: it doesn't matter what the real source/dst are, so it just avoids the interleave
14:18 jekstrand: Yeah, nir_scheduler doesn't have enough information to sort it out, I'm afraid. :-(
14:18 apinheiro: jekstrand, yes, that's what it is happening now
14:19 apinheiro: on anv it works because it has enough registers
14:19 apinheiro: so lucky you that they didn't set a initial 200 or something like that array
14:19 bnieuwenhuizen: time to implement spilling as well?
14:20 apinheiro: bnieuwenhuizen, we support spilling, but current implementation doesn't get any of those nodes as spillable
14:20 jekstrand: Wait... Reading spirv_to_nir, OpCopyMemory should be interleaving
14:20 apinheiro: jekstrand, note that there are two opcopymemory
14:20 apinheiro: C&P again:
14:20 jekstrand: apinheiro: Oh... Yeah, sorry, you're toast. :-(
14:20 apinheiro: OpCopyMemory %var %inarr
14:20 apinheiro: OpCopyMemory %outarr %var
14:21 apinheiro: I think that you are thinking on OpCopyMemory %outarr %inarr
14:21 bnieuwenhuizen: apinheiro: that sounds like a problem with the spiller?
14:21 apinheiro: that btw that works
14:21 apinheiro: although is not what the test is working
14:21 apinheiro: *testing
14:21 bnieuwenhuizen: you should be able to spill the values between the load and the store
14:22 apinheiro: bnieuwenhuizen, yes, perhaps, I just checked a little the spiller and then moved to try nir_schedule
14:22 apinheiro: I guess that I would need to go back (and in depth) for the spilling
14:22 jekstrand: apinheiro: Here's a possibility: We've got a comment in there about emitting deref_copy. If the driver is using derefs for UBOs and SSBOs, that should be possible.
14:22 jekstrand: At which point, copy_prop_vars should be able to reduce it to one copy.
14:23 apinheiro: hmm, I think that we briefly talked about that recently
14:23 apinheiro: about moving from non-deref path
14:24 jekstrand: You should really move away from the non-deref path
14:24 apinheiro:checking TODO list
14:24 jekstrand: I want to kill that code so bad.....
14:25 jekstrand: Another dream I have, which I don't know how to do, is to have NIR analysis passes be able to produce a dependency graph for intrinsics. So things like the NIR scheduler or back-end schedulers can know which loads/stores actually interfere and which ones don't.
14:25 apinheiro: well, re-reading your email, you pointed that it shouldn0t be easy
14:25 jekstrand: But that's a dream and I have no idea how to do it today.
14:25 jekstrand: What shouldn't be easy?
14:25 apinheiro: sorry typo
14:26 apinheiro: shoulnd't be hard
14:26 jekstrand: Yeah, it should be pretty trivial.
14:26 jekstrand: It's a bit of a pain for RADV because they consume the vulkan resource index intrinsics directly in ACO
14:26 apinheiro: so taking into account that the spilling code is far from trivial, perhaps it makes sense to look at that
14:26 jekstrand: Which, IMO, is a bit weird but whatever.
14:27 apinheiro: if the plan is getting that out at some point, perhaps it makes sense to look now. Specially if we get this test fixed for free
14:27 apinheiro: that==non-deref paths
14:27 jekstrand: Yes, please!
14:28 apinheiro: my main concern is about the v3d compiler
14:28 apinheiro: because we are basically reusing it from the opengl driver, and we try to not touch it as much as possible
14:35 bnieuwenhuizen: jekstrand: where else would you consume them?
14:35 jekstrand: apinheiro: In ANV, we lower the vulkan resource intrinsics in NIR and the back-end never sees them. For us, it was a really easy transition.
14:36 jekstrand: bnieuwenhuizen: ^^
14:36 bnieuwenhuizen: what do you lower them to?
14:36 apinheiro: jekstrand, hmm, well, yes we do the same on v3dv
14:36 apinheiro: we lower the vulkan resource intrinsics
14:36 apinheiro: then I think that I somewhat misunderstood what you mean for non-deref paths
14:37 apinheiro: I thought that it was about the backend consume directly derefs
14:37 jekstrand: apinheiro: Both paths use vulkan_resource_index and friends
14:37 jekstrand: Oh, no, the back-end doesn't need to consume derefs!
14:37 jekstrand: You use nir_lower_explicit_io
14:38 apinheiro: we already do that with compute shaders
14:39 apinheiro: do that == call nir_lower_explicit_io
14:39 jekstrand: bnieuwenhuizen: Depends. For some, it's a load_ubo to pull a pointer/size out of the descriptor UBO and cast it to a deref. For others, we lower to a binding table index. For others, we lower to a load_ubo to pull a bindless image handle out of the descriptor buffer and pass it to a texture instruction or whatever.
14:39 jekstrand: apinheiro: Yeah, you probably do for shared. You just need to start using it for UBOs and SSBOs too.
14:39 jekstrand: apinheiro: If you ask for 32bit_index_offset, it'll give you the exact same intrinsics you have today.
14:40 jekstrand: apinheiro: The only difference will be that the vulkan_resource_descriptor intrinsics will be a tiny bit different.
14:40 apinheiro: jekstrand, yep for shared
14:40 jekstrand: Mostly because I also cleaned up the model there a bit. At least, I think it was a clean-up. :-) It seems to have confused some people....
14:40 apinheiro: jekstrand, so then I guess that we would need to modify a tiny bit the lowering we already have for vulkan resource descriptor?
14:43 jekstrand: apinheiro: Yup
14:43 jekstrand: That should be it
14:43 jekstrand: apinheiro: There are two differences to be aware of:
14:44 jekstrand: 1. vulkan_resource_index and vulkan_resource_reindex now return a pointer type (probably uvec2) instead of a scalar 32-bit integer
14:44 jekstrand: 2. You'll start seeing a load_vulkan_descriptor intrinsic between the last vulkan_resource_reindex and the deref chain. You can make that a no-op if you want.
14:46 jekstrand: My recommendation is that vulkan_resource_[re]index just return vec2(what_you're_doing_now, 0) and load_vulkan_descriptor return vec2(src, 0)
14:47 jekstrand: That should get you more-or-less what you have now.
14:47 jekstrand: We actually do interesting stuff in load_vulkan_descriptor but you can also make it a pass-through if you'd prefer.
14:50 apinheiro: jekstrand, thanks for the tips
14:50 jekstrand: yw
14:50 apinheiro: about what to do, I think that as a first step I would try to keep it simple
14:51 jekstrand: Yeah
14:51 apinheiro: right now we are on around 220/20 fails/crashes when running the 1.0 mustpass
14:51 jekstrand: That's not bad at all
14:51 apinheiro: so we are focusing on getting 1.0 done first, and then we can start to think on improvements later
14:51 apinheiro: as usual, the last fails/crashes are the trickiest ones, including flaky
14:52 robclark: MrCooper: yeah, that looks like tftpboot fail.. iirc anholt made some tweaks to max packet size or something like that, but I guess we still see it occasionally.. I guess it would be nice if we had a shorter timeout to detect when kernel doesn't start
14:52 apinheiro: in any case, the kids are awake now, but I have plenty of stuff to look later
14:52 apinheiro: jekstrand, again thanks
14:53 apinheiro: bnieuwenhuizen too, and I would add to the todo checking spilling
14:53 jekstrand: apinheiro: yw Always happy to help!
14:53 jekstrand: apinheiro: Once you have derefs, nir_opt_find_array_copies may be able to figure out the copy. You may not need to change spirv_to_nir.
14:54 jekstrand: But I'm not 100% sure on that one
14:54 jenatali: Does that mean it's my turn to bug jekstrand now? :)
14:55 jekstrand: lol
14:55 jenatali: We had some more discussion on conversions and I wanted to get your take on it
14:55 jekstrand: No, I'm going to go take a shower before the SPIR-V call. Back in half an hour.
14:55 jenatali: Heh, alright, later then
14:58 imirkin: jekstrand: after seems more appropriate... you'll feel all dirty
14:59 jekstrand: imirkin: You're not wrong....
14:59 jenatali: ;D
15:04 karolherbst: jekstrand: you don't have any local changes left for the structurizer, correct?
15:17 bl4ckb0ne: does vulkan run on ivy bridge cpus ?
15:18 imirkin: there's a partial implementation
15:18 imirkin: but it's non-conformant
15:18 imirkin: but it's enough to run some stuff
15:18 bl4ckb0ne: good to know
15:18 bl4ckb0ne: thanks!
15:19 imirkin: (presumably you meant ivy bridge *gpu*, not cpu...)
15:20 bl4ckb0ne: yes, that
15:20 bl4ckb0ne: looking for an upgrade for my x220 laptop
15:20 bl4ckb0ne: and x230 looks decent
15:21 imirkin: haswell is a much stronger gpu iirc
15:21 imirkin: i don't know that e.g. skylake is materially better than haswell (in terms of practical perf)
15:23 bl4ckb0ne: as long as i can run vulkan ill be happy
15:23 imirkin: afaik the anv on haswell is a supported product
15:23 imirkin: i.e. you get a conformant vulkan
15:28 jekstrand: karolherbst: There are two fixup patches in my for/karolherbst branch
15:29 bnieuwenhuizen: imirkin: I thought haswell wasn't complete?
15:29 karolherbst: jekstrand: yeah.. those I got already :)
15:30 imirkin: bnieuwenhuizen: ivb isn't complete
15:30 karolherbst: jekstrand: do you want to dig into the bug I've found or should I?
15:30 bnieuwenhuizen: imirkin: I thought haswell neither?
15:30 karolherbst: but I guess we want to rewrite most of the pass first anyway? mhhh
15:31 jekstrand: karolherbst: Also, all but two of the iris patches are upstream as of last night.
15:31 imirkin:can't remember the haswell official product names...
15:32 bnieuwenhuizen: imirkin: I can't find any 4th generation intel on the Khronos conformance page
15:32 jekstrand: bl4ckb0ne: IVB Vulkan support is very sketchy
15:32 bnieuwenhuizen: while 5th gen is on there (broadwell AFAIU)
15:32 jekstrand: And it's missing a bunch of perf features.
15:32 bl4ckb0ne: like it needs some love>
15:32 bl4ckb0ne: ?
15:32 jekstrand: HSW mostly works but it's not really 100% until Broadwell.
15:32 imirkin:will update internal database.
15:33 jekstrand: Everything left on IVB is really annoying or near impossible.
15:33 imirkin: sorry for the misinformation!
15:33 jenatali: That's more or less true for D3D12 too
15:34 bl4ckb0ne: i guess it would be enough to show a triangle
15:34 jekstrand: If all you want is a triangle, IVB can do that. It can even do more than one triangle. :D
15:35 bl4ckb0ne: perfect
15:35 bl4ckb0ne: and x230 are cheaper than x240 on ebay
15:35 jekstrand: x230s also have better keyboards
15:35 bl4ckb0ne: yep
15:36 jekstrand: I had an x240 work laptop for a while. I never did like that keyboard.
15:36 bl4ckb0ne: im in love with my x220
15:36 jekstrand: I've not been a fan of lenovo keyboards since the x230
15:36 bl4ckb0ne: but battery is failing and i feel it lacks a bit of power when it comes to having more than 2 browser tabs opened
15:37 bl4ckb0ne: also for my graphics shenanigans im stuck with opengl
15:37 jekstrand: Please don't turn into yet another IVB Vulkan bug reporter... :-P
15:37 bl4ckb0ne: iirc it's the compute shader stuff that's lacking for sandy bridge
15:37 bl4ckb0ne: i won't, promise
15:38 bl4ckb0ne: you can slap my finger if i do
15:38 jekstrand: Yeah, IVB is a massively more competent GPU than SNB
15:38 jekstrand: It can do compute, tessellation, and even fp64 if you look at it the right way.
15:38 jekstrand: And it's like 2x faster than SNB
15:38 imirkin: jekstrand: but HSW is way better too right?
15:38 imirkin: (than IVB)
15:38 jekstrand: Yeah,
15:39 jekstrand: HSW is sort of IVB done properly
15:39 imirkin: whereas SKL ... not so much better than HSW?
15:39 jekstrand: Except border color. They royally messed that one up.
15:39 imirkin: hehe
15:39 imirkin: the most important feature of all
15:39 imirkin: at least if you go by piglit test count
15:39 jekstrand: In terms of design, SKL is a lot better than IVB. There were a lot of things that got refined IVB -> HSW -> BDW -> SKL.
15:39 bl4ckb0ne: cant compute shaders be emulated?
15:40 imirkin: jekstrand: but in terms of "moar fps"?
15:40 karolherbst: bl4ckb0ne: on the CPU, yes
15:40 jekstrand: bl4ckb0ne: Not really, no. Not without massive pain for things like barriers and shared memory.
15:40 jekstrand: karolherbst: :-P
15:40 karolherbst: just being honest
15:40 imirkin: "compute shader" can mean a lot of things. e.g. vertex shader + xfb is a sort of poor-man's compute shader.
15:41 jekstrand: imirkin: In terms of MOAR FPS, I'm not sure. Comparing laptop GPU to laptop GPU, SKL is quite a bit faster than IVB but each individual jump isn't as big as ILK -> SNB -> IVB.
15:41 imirkin: but a conformant GL compute shader can't be "emulated" unless the hw supports it.
15:41 jekstrand: Until you hit ICL, that is. SKL -> ICL is a giant perf jump.
15:41 karolherbst: imirkin: I think it's easier to render a rectangle and just fragment shader it :p
15:41 karolherbst: you get even nice 2d layouts
15:41 imirkin: karolherbst: but the xfb thing allows you to feed it back into another vertex shader
15:41 karolherbst: why bother with that?
15:42 jekstrand: imirkin: Oh, sure it can. You just use a vertex shader, bind an SSBO for shared memory, and implement barriers by putting the whole shader pre-barrier in a loop.
15:42 jekstrand: We do have memory read/write instructions on SNB
15:42 jekstrand: The real kicker that makes it impossible is atomics.
15:42 jekstrand: We don't have those
15:43 karolherbst: ahh
15:43 karolherbst: not even cas I guess?
15:43 imirkin: jekstrand: you're just baiting me into adding compute shader support on SNB...
15:43 jekstrand: But, also, the barrier hack I mentioned is a really bad idea.
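
One reading of the barrier hack, as a sketch only: a single thread emulates the whole workgroup by looping over every invocation for the code before the barrier, then looping again for the code after it. The kernel body (squares, then read the neighbour) and the `run_workgroup` name are invented for illustration.

```python
def run_workgroup(size):
    """Emulate `size` invocations of a kernel containing one barrier."""
    shared = [0] * size  # stands in for shared memory (an SSBO in the hack)
    out = [0] * size

    # Code before the barrier, run for every invocation first.
    for i in range(size):
        shared[i] = i * i

    # The barrier is implicit: all writes above are done before any read below.
    for i in range(size):
        out[i] = shared[(i + 1) % size]

    return out

result = run_workgroup(4)
```

Every barrier splits the kernel into another loop, and any per-invocation state that lives across the barrier has to be stored somewhere, which hints at why this is called a really bad idea above.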
15:43 jekstrand: imirkin: You have a SNB...
15:43 imirkin: but it won't work!
15:43 jekstrand: imirkin: It's ok. If you add Vulkan support to ANV, I *will* NAK the patches.
15:43 karolherbst: is there even a way to implement atomics without CAS?
15:44 karolherbst: :D
15:44 imirkin: hehe
15:44 karolherbst:thinks we can always play the "driven by corporate interest" card
15:45 jekstrand: Nah, that one's entirely driven by my own personal interests. I don't want to have to pull out my SNB to address bug reports.
15:46 karolherbst: as if the truth even matters in that case :p
15:50 jekstrand: karolherbst: Where are we at on your unstructured branch?
15:50 karolherbst: jekstrand: what do you mean?
15:51 jekstrand: Oh, you were asking if I had patches.
15:51 jekstrand: I've got one that adds a few cleanups to the structurizer and another to add a copyright block.
15:51 jekstrand: Julian has acked the copyright block add
15:51 karolherbst: yeah, I already squashed them in :)
15:51 jekstrand: Cool
15:51 jekstrand: I don't have any others at the moment
15:51 karolherbst: cool
15:52 karolherbst: so right now we only have two fails in the vulkan cts
15:52 jekstrand: Have you re-pushed the branch with the force-unstructured for Vulkan? I'd be happy to look at that fail
15:52 karolherbst: but the fail sounds like something we don't want to ignore
15:52 karolherbst: jekstrand: https://mesa-ci.01.org/karolherbst/builds/157/group/63a9f0ea7bb98050796b649e85481845 :p
15:53 jekstrand: karolherbst: Agreed.
15:53 karolherbst: but yeah.. overall it looks surprisingly good
15:53 karolherbst: it's just the super complex shaders where it trips over
15:54 jekstrand: karolherbst: Those are the ones we should worry about. :)
15:54 karolherbst: yeah
15:54 jekstrand: karolherbst: I suspect it's a fairly simple unreachable case
15:54 karolherbst: probably
15:55 karolherbst: no idea if I will find time to look into it today though
15:55 jekstrand: I'm going to look at it now
15:56 karolherbst: cool
15:57 jekstrand: karolherbst: We should make your hack patch just add a MESA_SPIRV_FORCE_UNSTRUCTURED environment variable.
15:57 karolherbst:thought it could be avoided, but builds Xorg for the first time
15:57 karolherbst: yeah
15:57 jekstrand: karolherbst: Sounds like a useful thing to be able to test
15:57 karolherbst: with your rework it's even quite simple to add :)
16:15 jenatali: jekstrand: So, conversions? :)
16:17 jekstrand: jenatali: sure
16:17 jenatali: Not sure if you saw any of the discussion yesterday, but high level I'm leaning towards your approach of using intrinsics with rounding/saturation modes in the constant indices
16:17 jekstrand: That would certainly work.
16:17 jenatali: Just so we don't have to have potentially-divergent C implementations for constant folding, plus nir lowering for drivers that don't support all the modes
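
For reference, a minimal sketch of what a float to integer conversion parameterised by rounding mode and saturation computes. The mode names follow the OpenCL C suffixes (rte, rtz, rtp, rtn); the function name, the wrap-around for the non-saturating case, and the lack of NaN/infinity handling are assumptions of this sketch, not what the nir_builder code in question does.

```python
import math

ROUNDERS = {
    "rtz": math.trunc,  # toward zero
    "rtp": math.ceil,   # toward +infinity
    "rtn": math.floor,  # toward -infinity
    "rte": round,       # Python's round() rounds halves to even
}

def convert_float_to_int(value, bits, signed, rounding="rtz", saturate=False):
    """Convert a finite float to a `bits`-wide integer."""
    rounded = ROUNDERS[rounding](value)
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if saturate:
        return max(lo, min(hi, rounded))
    # Assumption: wrap out-of-range results; the real result is not
    # something this sketch tries to model.
    return (rounded - lo) % (1 << bits) + lo
```

For example, `convert_float_to_int(300.0, 8, signed=False, saturate=True)` clamps to 255, and `convert_float_to_int(2.5, 8, False, "rte")` gives 2.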
16:18 jekstrand: Are you planning to implement constant folding for them? :-P
16:18 jenatali: My thinking was we can just have Clover (or other CL frontends...) run a lowering pass which always lowers conversions on constants
16:18 jenatali: Then the nir algorithm does the constant folding for us
16:19 jekstrand: I don't know if I think that's a brilliant idea of a terrible one.
16:19 jekstrand: *or
16:19 jenatali: :)
16:19 jenatali: Hence why I'm asking, I'm not really sure either
16:19 jekstrand: I don't think it's a merely mediocre idea. :)
16:21 karolherbst: soo.. one thing I'd accept as a good alternative: using whatever math library implementing all those conversions _without_ messing with the fpu state or in a deterministic way so we can unmess after calling into
16:21 karolherbst: but yeah...
16:21 karolherbst: not thrilled about having to maintain the C code either
16:21 jekstrand: Here's a crazy idea: Since we're requiring CLC for OpenCL with NIR anyway.... Could we implement them in C and do something similar to what we do today for fp64 for the lowering pass?
16:21 karolherbst: jekstrand: the issue is, that hw usually supports most of it
16:21 karolherbst: so we really only want it for constants
16:21 jenatali: Also, we tried libclc's implementation of the rounding, and found it didn't actually work correctly
16:22 karolherbst: jenatali: it might work on AMD :p
16:22 jekstrand: Oh, sure, I'm not saying we should use libclc's implementation
16:22 jenatali: karolherbst: Yeah, I think that's correct
16:22 jekstrand: I'm saying we could write something in C and then compile it to SPIR-V at build-time, embed that in the driver, and use it for the lowering.
16:22 jekstrand: So we would have exactly the same code for lowering as constant folding.
16:23 jekstrand: I'm not sure how terrible of an idea that is.....
16:23 karolherbst: mhhhhhhh
16:23 jekstrand: It would introduce a build dep
16:23 karolherbst: pocl has a CLC implementation for all of that
16:23 jekstrand: We could also do it at runtime if we didn't want the build dep and just embed the source in the driver.
16:23 karolherbst: which I am sure works
16:23 jekstrand: If we're careful, we can use a subset of C and CLC that works for both.
16:23 jekstrand: That's the idea, anyway.
16:24 jekstrand: Maybe completely crazy. I'm not sure.
16:24 jenatali: Yeah, that's a possibility
16:24 karolherbst: mhhh
16:24 jekstrand: It seems to have worked ok for fp64
16:25 jekstrand: All the more reason for ajax and airlied to poke people to package CLC. :)
16:25 karolherbst: libclc is already packaged
16:25 jekstrand: Then why did I build it the other day. :-/
16:25 jenatali: You're talking about the SPIRV-LLVM-Converter?
16:25 karolherbst: because of airlied spirv patches
16:26 jekstrand: So what exactly is libclc?
16:26 karolherbst: compiles clc into llvm for builtins and clover links against it
16:26 jenatali: Libclc is CLC implementations of the CL library functions
16:26 jekstrand: Oh, right.
16:26 karolherbst: well.. clover links programs against it
16:27 jekstrand: Right
16:27 jekstrand: So I meant spirv-llvm-converter
16:28 jekstrand: or whatever it's called
16:28 jekstrand: But we have to link Mesa against it anyway so we may as well link a very tiny executable which we can invoke at build time to build the SPIR-V for the lowering pass.
16:28 jenatali: Yeah, we could have some code that can be compiled as C or CLC, and at build time feed the CLC through clang + spirv-llvm-converter, and embed that in the driver
16:29 jekstrand: I'm just trying to figure out what the most maintainable long-term solution is
16:29 jekstrand: Because I'm guessing the nir_builder code isn't exactly pleasant.
16:29 jenatali: It's not that bad actually
16:29 karolherbst: emit spirv :p
16:30 jenatali: jekstrand: 3 functions starting at https://gitlab.freedesktop.org/kusma/mesa/-/blob/552e89df42468025d13e9b3dde3ccf90ee7ad709/src/compiler/spirv/vtn_alu.c#L419
16:30 jenatali: Ignore the fact it's in vtn, obviously it'd move to a lowering pass
16:30 jenatali: Eh, 4 functions to include saturation
16:31 jenatali: I guess my point is - the nir_builder code is already written ;)
16:31 jekstrand: Yeah
16:31 jekstrand: It does have that virtue. :)
16:34 jekstrand: If we do go the intrinsic approach, I imagine we probably want the lowering pass to have three options:
16:34 jekstrand: 1. Lower conversions of constants
16:34 jekstrand: 2. Lower conversion which can be represented as normal NIR ALU conversions
16:34 jekstrand: 3. Lower all of them.
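
The three options could be modelled roughly as below. Every name here (`Convert`, `LowerOptions`, `should_lower`) is hypothetical, and the rule for what counts as a "normal" ALU conversion is an assumption made only to keep the sketch small.

```python
from dataclasses import dataclass

@dataclass
class Convert:
    src_is_const: bool
    rounding: str   # "undef", "rte", "rtz", "rtp" or "rtn"
    saturate: bool

@dataclass
class LowerOptions:
    lower_constants: bool = False          # option 1
    lower_alu_representable: bool = False  # option 2
    lower_all: bool = False                # option 3

def is_plain_alu(conv):
    # Assumption: no explicit rounding mode and no saturation means the
    # conversion maps onto an ordinary ALU conversion.
    return conv.rounding == "undef" and not conv.saturate

def should_lower(conv, opts):
    """Decide whether the pass touches this conversion."""
    if opts.lower_all:
        return True
    if opts.lower_constants and conv.src_is_const:
        return True
    return opts.lower_alu_representable and is_plain_alu(conv)
```

A frontend that only wants folding to work would set `lower_constants`, while a driver without native support for the modes would set `lower_all`.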
16:34 jenatali: Yep
16:35 jenatali: Maybe split out saturation + rounding to 2 separate sets of options, since they're relatively independent, but yeah
16:37 jenatali: But yeah, the nir_builder code is conformant to the CL CTS, at least if you ignore doubles -- it's written but untested since we aren't supporting those yet
16:37 jekstrand: It doesn't look like the current constant-folding pass is really set up to do the lowering all in one go but it wouldn't be too hard to make it lower first and then fold.
16:38 jenatali: And honestly, you're never going to have a kernel that runs one of these crazy conversion functions on a constant
16:38 jekstrand: Oh, I'm sure the CTS contains some. :)
16:38 jekstrand: But, yeah
16:38 jenatali: Nope
16:38 jekstrand: No?
16:38 jekstrand: That's a bit surprising
16:39 jekstrand: Ok, intrinsics it is then
16:39 jenatali: Cool, works for me
16:39 jekstrand: This seems like the best plan we've come up with so far
16:39 jekstrand: And we can figure out a folding mechanism if we want
16:39 jekstrand: But as long as we can switch the "normal" ones to nir_alu_op, I think that'll cover the 95% case
16:39 jenatali: Last question - one intrinsic with types in the constant indices? Or different intrinsics for the various types?
16:40 jenatali: Leaning towards one
16:40 karolherbst: we might want to have different lowering though
16:40 karolherbst: like hw supports it, but requires some fixes, like denormal handling or weird stuff
16:41 karolherbst: so not the full lowering is needed
16:41 karolherbst: but I guess we can deal with that later
16:41 jenatali: karolherbst: Seems fine - just write an alternate pass and stick it in the driver?
16:41 jekstrand: jenatali: Either one or one for each of int -> float, float -> int and float -> float
16:41 karolherbst: jenatali: yeah.. I just don't know what hw supports and what not
16:41 karolherbst: so it's more of a "might be the case"
16:41 jenatali: Yeah, we can cross that bridge when we come to it
16:41 jekstrand: karolherbst: Yeah, but let's have a use-case besides the CTS first. :)
16:42 jekstrand: This is over-engineering quicksand
16:42 jenatali: jekstrand: I'm thinking one, and just stick the sized nir_alu_type in the constant indices
16:42 karolherbst: I'd be happy to just consume the conversion intrinsics in the backend...
16:42 karolherbst: but how do we deal with opt_algebraic? just teach it to handle those?
16:42 jenatali: karolherbst: Do you really want opt_algebraic to touch them?
16:43 karolherbst: yes?
16:43 karolherbst: maybe not all
16:43 karolherbst: but we do have a few opts
16:43 karolherbst: and I am sure a subset of them are safe if you got conversion modifiers
16:43 karolherbst: and you probably also want to have sat specific optimizations
16:43 karolherbst: like convert_sat + min/max with constants or something
16:43 jekstrand: If it's a "easy" conversion, we can switch it to a nir_alu_instr
16:44 jekstrand: But, again, let's optimize actual shaders that exist
16:44 karolherbst: yeah...
16:44 karolherbst: I just see convert_sat to be used
16:44 jenatali: Agreed - worst-case we can write non-algebraic opt passes if we need to
16:44 karolherbst: that one actually makes sense :p
16:44 karolherbst: jenatali: yeah.. right
16:44 bbrezillon: karolherbst, jenatali, jekstrand: not sure I understand why the alu approach is a bad idea. I mean if the problem is only constant folding, and we're discussing implementing it using NIR passes, why not stick to one ALU per conv/sat/round combination and add lowering passes for each of them
16:45 bbrezillon: (and sorry to chime in only now :-/)
16:45 jenatali: bbrezillon: My take is that we have 2 options if we want to use alu ops: Implement C constant folding, or don't, which isn't currently supported by alu ops
16:46 jenatali: And I don't really want to do either of those :)
16:46 jekstrand: bbrezillon: Mostly to avoid a combinatorial explosion of opcodes that everyone has to put in switch statements.
16:47 jenatali: Yeah, that too
16:47 jekstrand: If it was just constant folding, I would just insist that jenatali get over it. :-P
16:47 jekstrand: But it's a *lot* of opcodes
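
A rough way to get a feel for the size of the opcode space is to enumerate the OpenCL C style `convert_<type>[_sat][_mode]` combinations per source type. The type list (no half, no vectors) and the rule that `_sat` only applies to integer destinations are assumptions of this sketch, and the result is not a count of actual NIR opcodes.

```python
INT_TYPES = ["char", "uchar", "short", "ushort", "int", "uint", "long", "ulong"]
FLOAT_TYPES = ["float", "double"]
ROUNDING = ["", "_rte", "_rtz", "_rtp", "_rtn"]  # "" is the default mode

names = set()
for src in INT_TYPES + FLOAT_TYPES:
    for dst in INT_TYPES + FLOAT_TYPES:
        # Saturation only makes sense when the destination is an integer.
        sats = ["", "_sat"] if dst in INT_TYPES else [""]
        for sat in sats:
            for rnd in ROUNDING:
                names.add(f"convert_{dst}{sat}{rnd}({src})")

count = len(names)
```

Under these assumptions `count` comes out at 900, which gives some idea of why one opcode per combination is unattractive.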
16:48 jenatali: At that point a switch would be out the window, you'd have to go by properties in the nir_op_infos
16:48 bbrezillon: in practice, assuming the lowering passes are implemented (which we'll need anyway for CL), it's just about making the lowering an opt-out
16:51 bbrezillon: well, it feels a bit weird to have a lowering pass that might be run only on constant, because the HW might support the intrinsic natively for non constants
16:52 bbrezillon: is that what we're heading to?
16:52 jekstrand: Why not?
16:52 jenatali: I don't disagree, I'm just not sure using alu ops is a better option
16:52 karolherbst: bbrezillon: the idea is, we only lower converts which do have constants and leave the other alone
16:52 karolherbst: but yeah..
16:53 bbrezillon: yes, that's the asymmetry I'm complaining about
16:53 karolherbst: I honestly think that converts deserve their own instruction type still :p
16:53 karolherbst: nir_instr_type_convert
16:53 jekstrand: We can make constant-folding fold using nir_builder.
16:53 jekstrand: karolherbst: That's basically what we're doing with an intrinsic but more painful
16:53 karolherbst: I know
16:54 karolherbst: jekstrand: ohhh, that would be a cool idea actually
16:54 jenatali: What would?
16:55 karolherbst: jenatali: constant folding, but nir_builder stuff instead of c code
16:55 karolherbst: ohh wait
16:55 karolherbst: it's called opt_algebraic :p
16:56 karolherbst: although maybe the syntax there would be super annoying for that kind of stuff
16:56 jenatali: Yeah, I think it would be
16:56 karolherbst: or maybe not?
16:56 bbrezillon: well, we could have an opt_convert
16:56 bbrezillon: :)
16:56 karolherbst: yeah...
16:56 jekstrand: karolherbst: It's mostly a matter of having nir_opt_constant_folding iterate more carefully
16:56 jekstrand: karolherbst: Like we do for system values where it can lower and then lower the lowered code
16:57 karolherbst: bbrezillon: but we also have opt_algebraic stuff using conversions :/
16:57 jekstrand: In a single pass
16:57 karolherbst: yeah
16:57 bbrezillon: karolherbst: right
16:57 jenatali: karolherbst: Most of the conversions I've seen in opt_algebraic are handling no-ops: upconvert then downconvert, or downconvert then up => masking
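[Editor's sketch] The two conversion patterns described above can be checked with plain integer arithmetic. The Python helpers below borrow the `u2u8`/`u2u32` naming for readability but are stand-ins written for this example, assuming unsigned conversions that truncate or zero-extend.

```python
def u2u8(x: int) -> int:
    """Stand-in for an unsigned conversion to 8 bits (truncation)."""
    return x & 0xFF


def u2u32(x: int) -> int:
    """Stand-in for an unsigned conversion to 32 bits (zero-extension)."""
    return x & 0xFFFFFFFF


# Upconvert then downconvert: a no-op for every 8-bit value.
for x in range(256):
    assert u2u8(u2u32(x)) == x

# Downconvert then upconvert: equivalent to masking the low 8 bits.
for x in (0, 0xFF, 0x1234, 0xDEADBEEF):
    assert u2u32(u2u8(x)) == (x & 0xFF)
```

Neither identity holds in this simple form once saturation or a rounding mode is involved, which is the point raised in the following lines.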
16:58 bbrezillon: in the end, the opcode explosion is only a problem if we force drivers to support all of them
16:58 karolherbst: jenatali: my point was, with sat that changes anyway
16:58 bbrezillon: or are there other issues?
16:58 karolherbst: bbrezillon: nouveau would
16:58 karolherbst: but yeah...
16:58 bbrezillon: ok
16:58 karolherbst: I think it's just annoying because it takes so much space
16:58 karolherbst: but we could also just generate helper functions
16:58 bbrezillon: but then embedding infos in alu_infos would help
16:59 jenatali: Yeah but then we're kind of already treating them as special, in that we never expect anybody to treat the individual opcodes as opcodes, and always to treat conversions as a class of thing
16:59 jenatali: I feel like an intrinsic actually matches that mindset more closely
16:59 bbrezillon: not necessarily
16:59 bbrezillon: it's up to the driver to decide how to handle them
17:00 jenatali: Sure, but we're talking about 800 opcodes
17:00 karolherbst: just?
17:00 jenatali: Sorry, 900
17:00 karolherbst: ehh.. I thought it's like 1.5k?
17:00 jenatali: CL's test_conversions has 899 subtests, at least that's what it looks like to me
17:01 karolherbst: right... but we don't need all of those
17:01 karolherbst: source is unsized eg
17:01 jenatali: Ah, true
17:01 karolherbst: I think it's around 100 actually
17:02 bbrezillon: the fact that the opt pass might be using ALU converters when others would use the intrinsic is kind of weird too
17:03 bbrezillon: s/opt/opt_algebraic/
17:03 karolherbst: yeah.. I don't think we should have a mix of alu ops and intrinsic variants tbh
17:03 bbrezillon: mixing the 2 is likely to bring even more confusion IMHO
17:03 karolherbst: or rather.. there is no real benefit of having them as alu
17:04 karolherbst: backend code will just set the dest and source type according to whatever it has
17:04 karolherbst: but all opcodes are already treated equally
17:04 karolherbst: would potentially just remove like 500 locs from mesa :p
17:04 jekstrand: Wait, what?
17:04 jekstrand: What are we suggesting now? Getting rid of the ALU ops?
17:04 karolherbst: yeah
17:05 jekstrand: No, we want to keep those
17:05 jekstrand: very much
17:05 karolherbst: why?
17:05 jekstrand: Because nir_opt_algebraic works on them.
17:05 jekstrand: And they constant fold and all the other good ALU stuff
17:05 karolherbst: we can teach opt_algebraic to work on the intrinsics
17:05 jekstrand: And then we get all the insanity
17:05 bbrezillon: can we?
17:05 jekstrand: I thought the objective was to get what we need without piles of insanity.
17:05 karolherbst: jekstrand: well.. then we just add more alu ops later as there is a benefit of optimizing them
17:06 karolherbst: like the _sat ones where I am sure those are used
17:06 jekstrand: Sure, we can add _sat ones
17:06 jekstrand: That's fine
17:06 jekstrand: If we need them
17:06 karolherbst: and then we find kernels using rounding modes :p
17:06 jekstrand: But I don't think you have any idea how painful it would be to teach nir_search about nir_instr_type_convert (or the relevant intrinsic)
17:06 jenatali: karolherbst: I'd hope that if a kernel is using the rounding modes, they're doing it because they need to, and the optimizations wouldn't be able to do anything about it anyway
17:06 karolherbst: every solution here is painful
17:07 jekstrand: And to maintain all the optimizations in the presence of that.
17:07 jekstrand: Sure, but there's a difference between "ouch" and "I'd rather have my leg cut off"
17:07 jenatali: Yeah, I really think just allowing optimizations to fall apart slightly when dealing with "special" conversions is fine
17:08 jekstrand: Part of what makes nir_search effective is that we have a mechanism where detecting a pattern is easy.
17:08 jekstrand: The moment you add more flags and bits and things that need to be part of that detection, it starts getting hairy fast.
17:10 jekstrand: This is the same reason it explicitly doesn't support source modifiers.
17:17 jenatali: Hm, it probably makes sense to remove the rtz/rtne variants of f2f16 (and the ru/rd ones I added) and just use the intrinsic for those too - though that'd have impact on Vulkan so maybe that can be a follow-on change
17:17 jekstrand: Yeah, that may make sense
17:35 ajax: ugggggh
17:35 jekstrand: ajax: ?
17:35 ajax: jekstrand: #3398 is making me sad
17:37 ajax: one of these days someone should port libGL the rest of the way to xcb
17:37 ajax: today is not that day
17:37 jekstrand: :-(
17:38 tzimmermann: danvet, mripard, mlankhorst, i'm doing the backmerge ATM. i'll push after test-building
17:45 anholt: jekstrand: so, looks like nir_lower_int_to_float rewrites the contents of load_consts that are ever used as an integer alu src. this ends up going badly when we rewrite 1u to 1.0 that happens to also be used as a deref_array's index.
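[Editor's sketch] Why rewriting a shared constant from 1u to 1.0 goes badly for an array index can be seen from the bit pattern alone. This assumes 32-bit IEEE-754 floats and that the index consumer still reads the constant as an integer; `float_bits` is a helper written for this example.

```python
import struct


def float_bits(value: float) -> int:
    """Return the 32-bit IEEE-754 bit pattern of value as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


original_index = 1                 # the load_const as an integer: 1u
rewritten_index = float_bits(1.0)  # the same constant after being rewritten to 1.0

assert rewritten_index == 0x3F800000
assert rewritten_index != original_index
```

An index of 0x3F800000 is far outside any realistic array, so the deref ends up pointing nowhere sensible.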
17:46 anholt: I guess this is also @anarsoul
17:47 jekstrand: anholt: Uh... Aren't your array indices also float?
17:47 jekstrand: Also, why is it being run while there are still derefs?
17:47 jekstrand: It should be basically the last pass run or else stuff like that gets badly messed up.
17:48 anholt: because lower_locals_to_regs has to happen after from_ssa
17:48 jekstrand: Right.....
17:48 anholt: but int_to_float has to be in ssa
17:48 jekstrand: And you really want locals_to_regs and not if-ladders?
17:49 anholt: probably not?
17:49 jekstrand: anholt: This is for softpipe, isn't it?
17:49 anholt: this is for the next test of nir-to-tgsi, i915g
17:49 jekstrand: Ah
17:49 jekstrand: I doubt you want actual indirecting on registers.
17:49 anholt: feels like the other end of the spectrum, if we can do both of these we're probably good
17:49 anholt: definitely don't
17:49 jekstrand: So run lower_indirect_derefs first
17:50 jekstrand: Then, after a bit of optimization, you should have zero derefs when you get to int_to_float
17:50 anholt: lololol i915g advertising CAP_INDIRECT_TEMP_ADDR=1
17:50 jekstrand: That seems... wrong.
17:50 mareko: it's right for VS
17:51 anholt: mareko: in FS
17:51 jekstrand: I don't know much about that hardware but I suspect that's not what you want.
17:51 mareko: ok
17:51 jekstrand: mareko: Why does 915 want it for VS?
17:51 anholt: jekstrand: software vs impl
17:51 jekstrand: right
17:51 jekstrand: That can do anything. :)
17:52 imirkin: does i915 just not have vertex shaders, or does it not have any fixed function t&l at all?
17:52 jekstrand: anholt: Trying to get rid of TGSI or do you just feel like hacking on i915?
17:52 anholt: imirkin: no tnl at all
17:52 mareko: anholt: do you plan to nuke glsl_to_tgsi or is it just for lulz?
17:52 anholt: jekstrand: trying to delete glsl_to_tgsi
17:52 imirkin: gah!
17:53 karolherbst: finally! :P
17:53 anholt: >10kloc to <3.5kloc
17:53 anholt: and better code gen
17:53 karolherbst: although we should at least wait for 21.0 or have nouveau do nir for 1 or 2 releases :p
17:53 ajax: now that's a diffstat we can believe in
17:53 mareko: r300_tgsi_to_rc.c has the supported TGSI instruction set at the beginning
17:53 karolherbst: although as far as the CTS is concerned nir == TGSI for nouveau
17:54 karolherbst: but I think there are more deqp regressions
17:55 mareko: svga is gonna be more complicated
17:55 anholt: mareko: in what way?
17:56 mareko: anholt: it can do GL3 or maybe even GL4
17:56 anholt: other than "ha ha you wanted to test it?"
17:56 anholt: softpipe's at 3.3 + fp64 + misc
17:56 anholt: and I'm passing more fp64 tests than master
17:56 mareko: nice
17:57 jekstrand: Of course softpipe has fp64
17:57 anholt: with some bananas tgsi instructions to do it!
17:57 mareko: svga has multiple shader model versions
17:57 jekstrand: How many software rasterizers do we need?
17:57 anholt: scalarizes fp64 then vectorizes back to vec2s.
17:58 anholt: jekstrand: deleting more of those is also on my list of "pay down mesa technical debt"
17:58 mareko: the vectorizer doesn't really vectorize if inputs are scalar
17:59 karolherbst: right.. we need to be smarter about alignments in nir
18:00 karolherbst: and maybe align inputs/outputs to 0x10 + n * 4
18:00 karolherbst: jekstrand: ^^?
18:00 karolherbst: and then it can be all vectorized, no?
18:00 jekstrand: karolherbst: I don't think that's the problem he's referring to but I'm not sure
18:01 mareko: jekstrand: the vectorizer doesn't work if loads don't return vectors already
18:01 karolherbst: well.. we do something similar in codegen, where we merge up to vec4 loads no matter if the vars are vec4 or scalar or whatever
18:01 jekstrand: Ah
18:01 karolherbst: like you can have float uniforms/inputs
18:01 jekstrand: We've got a vectorize_io pass
18:02 karolherbst: mhh st_nir_vectorize_i
18:02 mareko: vectorize_io helps somewhat, but constructors and reswizzling probably doesn't
18:02 karolherbst: *st_nir_vectorize_io
18:02 mareko: also vectorize_io breaks other stuff for me
18:04 karolherbst: nir_lower_io_to_vector is the magic pass, no?
18:04 karolherbst: mhhh
18:04 mareko: options->vectorize_io
18:04 karolherbst: yeah.. I just checked what get called later on
18:05 karolherbst: seems like dual source blending ins't handled
18:05 karolherbst: *isn't
18:05 karolherbst: with multiple render targets
18:06 mareko: it's not a solution, because a vectorizer should be able to vectorize anything, not just results of intrinsics used trivially
18:06 karolherbst: well...
18:06 karolherbst: I guess that's true
18:06 karolherbst: but hence my alignment idea
18:07 karolherbst: in codegen we even vectorize indirect loads as we always assume things are sanely aligned
18:07 karolherbst: more or less
18:07 karolherbst: or at least I think we do if we know it's a multiple of 2/4
18:08 mareko: alignment might matter to some backends, but not to amd
18:08 karolherbst: ahh
18:10 mareko: right now the only solution for tgsi-to-nir is to never scalarize
18:12 mlankhorst: tzimmermann: thanks
18:12 karolherbst: jekstrand: I think your merge wasn't as silent as you hoped for :p
18:38 anholt: and, after a morning of hacking, i915g is now compiling more shaders through nir-to-tgsi than without.
19:02 tzimmermann: danvet, mripard, mlankhorst, lyude, i backmerged drm-next into drm-misc-next. there was a conflict between ttm and nouveau
19:03 danvet: tzimmermann, did you double-check with the solution from sfr?
19:03 danvet: tzimmermann, maybe also send a mail to christian könig and bskeggs asking them to double check it's all ok
19:04 danvet: since it's a bit of a bigger thing
19:04 tzimmermann: danvet, no, who/what is sfr? i renamed ttm_mem_res to ttm_resource in nouveau and test-built
19:04 danvet: the linux-next maintainer
19:04 danvet: he sends conflict notices to dri-devel when there's a conflict among drm trees
19:04 tzimmermann: oh, ok
19:05 danvet: he's doing nothing else than resolve conflicts all day long, so pretty good at it
19:05 tzimmermann: didn't know his callsign
19:05 jekstrand: karolherbst: Yeah....
19:05 danvet: tzimmermann, it's the email address, I don't think he's on irc
19:05 tzimmermann: that must be depressing
19:05 jekstrand: karolherbst: But Larabel talked about NEO and Level0 at the end so I'm not too worried.
19:05 jekstrand: As long as he keeps toeing the company line on my behalf, I'm not worried.
19:05 tzimmermann: danvet, i also took a look at drm-tip to make sure the structure names match
19:05 tzimmermann: i'll send out that email
19:07 danvet: tzimmermann, yeah a big reason we do drm-tip with commit rights is that everyone has to resolve the conflicts they create themselves
19:07 danvet: so usually the resolution in drm-tip is good
19:07 danvet: it's indeed tedious work if that's all stuck on one maintainer
19:11 tzimmermann: ok, mail sent out
19:43 ajax: does anyone happen to know of a driver that implements EGL_NV_stream_consumer_eglimage ?
19:43 jekstrand: ajax: I imagine the Nvidia blob does
19:44 jekstrand: Though I don't even see that in our EGL xml
19:44 ajax: [ajax@haldol ~]$ strings /usr/lib64/libEGL_nvidia.so.0 | grep EGL_NV_stream_consumer
19:44 ajax: EGL_NV_stream_consumer_gltexture_yuv
19:44 ajax: [ajax@haldol ~]$
19:44 jekstrand: Then maybe not?
19:44 ajax: does not match the string "eglimage" at all, in fact
19:44 ajax: wondering if this is like some tegra-only thing
19:45 jekstrand: could be
19:47 imirkin: you can download their arm drivers if you want to check
19:52 ajax: does not seem to be in 32.4.3
19:53 ajax: what a tease
20:10 sravn: mripard, mlankhorst: I did a dim push and got
20:10 sravn: Merging drm-misc/drm-misc-next... Applying manual fixup patch for drm-tip merge... patching file drivers/gpu/drm/nouveau/nouveau_bo.h
20:10 sravn: Reversed (or previously applied) patch detected! Assume -R? [n]
20:11 sravn: 1 out of 1 hunk ignored -- saving rejects to file drivers/gpu/drm/nouveau/nouveau_bo.h.rej
20:11 sravn: Same for several nouveau files.
20:11 sravn: I am not sure if I have left drm-tip in some broken state...
20:12 sravn: My patches added a new binding and a new bridge driver - so no conflict material
20:12 sravn: dim status looks OK - so I hope things are good
20:24 airlied: sravn: I think it was prebroken
20:25 airlied: sravn: will try and fix it up
20:27 airlied: sravn: might be fixed now, but I'm just building tip to make sure
20:59 sravn: airlied: I did "dim status; dim update-branches; cd ../drm-tip; dim rebuild-tip" All looks good. Thanks for fixing this
21:00 sravn: I guess I was hit by some of the merging issues you and others had talked about
21:00 airlied: sravn: no worries, it even built
22:01 jekstrand: karolherbst: I've got a bunch of patches to the structurizer. I'll hand them off once I'm done.
22:01 jekstrand: karolherbst: I'm cleaning as I'm reviewing.
22:01 jekstrand: karolherbst: I've already found a couple places where hash set order matters. :-/
22:01 karolherbst: uff
22:03 karolherbst: already arrived at "how does it even work" or not that bad?
22:03 jekstrand: karolherbst: No, I'm starting to grok it, I think. It's tricky though.
23:26 jekstrand: karolherbst: I think I found the bug
23:26 jekstrand: karolherbst: Not quite sure how to fix it yet
23:43 alyssa: notices v3d's initialized_buffers is never used
23:55 jekstrand: karolherbst: Correction: I think I've roughly figured out the nature of the bug.
23:55 jekstrand: I think I need to make some helpers to dump state to figure this one out.
23:56 anarsoul: anholt: have you resolved your issue with int_to_float pass?
23:56 anarsoul: sorry for a late reply, I'm quite busy at work for the past few weeks...
23:57 jekstrand: anarsoul: There was quite a bit of chatter in the backlog