00:38 nyef: So, progress: It turns out that my Mac Mini can output sound to that one panel if run on the blob driver, even if it can't under nouveau.
00:38 nyef: This also means that the problem is in nouveau, and not the HDA audio driver.
00:42 nyef: Oops. Did a "sudo shutdown" on the wrong computer.
09:20 leberus__: mupuf: hi :)! Did you have some time to check out the patches?
10:12 mupuf: leberus: not yet, but I did not forget ;)
11:23 AndChat|499956: mupuf: thanks! Hopefully it could be merged if everything looks fine
11:23 mupuf: well, we'll make sure it is. Thanks again for doing this!
14:10 karolherbst: uhh
14:20 karolherbst: hakzsam: I think working on this input-output optimization seems to be really worth it. If I didn't do anything wrong this example should see a significant benefit from it: https://gist.github.com/karolherbst/1bf5cd11c94b8d22086e6025b531d4e3
14:21 karolherbst: sadly I don't know much about how the data is transferred between stages, so maybe I didn't respect something important
17:44 karolherbst: how did that happen "mov u32 $r42 $r42"
17:52 pmoreau: karolherbst: The blob also likes doing that, instead of having a nop.
17:59 Booti386: I wonder why nouveau does not expose GL 4.5 when all the extensions seem to be supported?
17:59 imirkin: we don't pass the conformance tests
17:59 imirkin: and it has been requested that we keep the version advertised to 4.3
18:00 Booti386: Oh, you pass the conformance tests for GL4.3?
18:00 imirkin: not at all
18:00 imirkin: (or s/at// if you prefer)
18:00 Booti386: Oh, ok.
18:00 imirkin: which is why it's a bit unclear to me why this matters
18:01 Booti386: Yes, it's weird...
18:01 imirkin: ultimately OpenGL (TM) (R) (SM) is owned by Khronos
18:01 imirkin: and i think the theory is that they started caring about conformance for GL 4.4 and 4.5 but not 4.3? dunno.
18:02 imirkin: i have no interest in creating annoying issues for distributions
18:04 Booti386: Yes, of course. I just wondered why. (and of course, all the well-written programs probe for extensions and do not check the GL version, so... It doesn't make sense)
18:05 imirkin: dunno man... i just work here
18:25 karolherbst: pmoreau: well... but the shader looks different
18:26 pmoreau: karolherbst: Weird… :-/
18:26 karolherbst: okay, maybe I should first show what I do
18:28 karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/247ff6aa7470308c5b1a5d27b3dcba76344a404c
18:29 karolherbst: this should fixup some silly things done by RA
18:29 karolherbst: like if you have "mov $r2 0x1" ... stuff ... "mov $r4 $r2" <-- mov inserted by RA for exports/stuff
18:30 karolherbst: so the immediate could be moved into the final mov and the former one deleted
18:30 imirkin: yeah, i've tried to defeat that issue
18:30 imirkin: it was painful though
18:30 karolherbst: yeah
18:30 karolherbst: I think such a simple pass could make it less bad
18:30 karolherbst: and easy to implement
18:30 pmoreau: Since we are post RA, we are in SSA form, right?
18:30 karolherbst: no SSA
18:31 pmoreau: So, stuff could be overwriting $r2, couldn’t it?
18:31 karolherbst: let's assume my code isn't _that_ stupid :p
18:31 karolherbst: the src thing should be still correct
18:31 karolherbst: even in post RA
18:32 karolherbst: but that's not the issue I have
18:32 pmoreau: Ah, getting the instruction setting the source should return the last instruction setting src, rather than a random one in between?
18:33 karolherbst: yes
18:33 karolherbst: I think I have to add a hard check to only set immediates though
18:34 karolherbst: ....
18:35 karolherbst: the crash is gone
18:35 karolherbst: good that we talked about it
18:36 karolherbst: I don't expect the change to be much above 1% but let's see what shader-db says
18:37 karolherbst: "total instructions in shared programs : 322031 -> 321762 (-0.08%)" for alien isolation
18:38 karolherbst: but also having more movs with immediates instead of register access should help
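A minimal sketch of the forwarding idea discussed above, written in Python on a toy instruction list rather than nouveau's actual codegen IR. `Insn` and `forward_immediates` are hypothetical names; the sketch assumes a single basic block and tracks overwrites explicitly, since registers after RA are not in SSA form.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    op: str        # e.g. "mov", "add"
    dst: str       # destination register name
    srcs: tuple    # register names (str) or immediates (int)

def forward_immediates(block):
    """Rewrite 'mov dst, reg' to 'mov dst, imm' when reg is known to hold imm.

    Toy model only: one basic block, no predication, no liveness analysis.
    """
    known = {}  # register -> immediate it currently holds
    out = []
    for insn in block:
        srcs = insn.srcs
        if insn.op == "mov" and isinstance(srcs[0], str) and srcs[0] in known:
            srcs = (known[srcs[0]],)
        out.append(Insn(insn.op, insn.dst, srcs))
        # Any write invalidates what we knew about the destination register.
        if insn.op == "mov" and isinstance(srcs[0], int):
            known[insn.dst] = srcs[0]
        else:
            known.pop(insn.dst, None)
    return out

block = [
    Insn("mov", "r2", (1,)),
    Insn("add", "r3", ("r0", "r1")),   # stand-in for "... stuff ..."
    Insn("mov", "r4", ("r2",)),        # the mov inserted by RA
]
for insn in forward_immediates(block):
    print(insn)
```

Deleting the first mov once it is dead would additionally need liveness information, which this sketch does not attempt.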
18:39 karolherbst: imirkin: would you mind such an optimization or would you rather see RA being fixed up directly?
18:39 imirkin: erm
18:40 imirkin: what optimization if you don't mind my asking?
18:40 karolherbst: this one: https://github.com/karolherbst/mesa/commit/eeb05f4a32bbe47971e55cfae3c517375dad7e1b
18:41 imirkin: oh, that won't work
18:41 imirkin: but i like it :)
18:41 karolherbst: why wouldn't it work?
18:41 imirkin: go ahead and try running something with that "optimization"
18:41 karolherbst: besides shader-db?
18:41 imirkin: [that's more complex than glxgears]
18:41 karolherbst: alien isolation it is
18:42 imirkin: shader-db doesn't actually test that the resulting shaders work
18:42 karolherbst: true
18:42 imirkin: if i had an opt that just removed all instructions, the shader-db thing would be wildly happy :)
18:42 karolherbst: I tried my opt looping with alien isolation today... broken shaders as well
18:42 karolherbst: :D
18:43 imirkin: yeah, shader-db isn't a test :) it's a nice little thing to check after you've performed a well-considered change
18:44 karolherbst: that opt touched 267 shaders, so I guess something should break for real if there is something wrong
18:44 imirkin: well
18:44 imirkin: let me just give you a thought exercise
18:44 karolherbst: ohhhh yeah
18:44 imirkin: let's say you have code like
18:44 imirkin: int x;
18:44 karolherbst: something broke
18:44 imirkin: if () { x = 1; } else { x = 2; }
18:44 karolherbst: I think?
18:44 karolherbst: or just CPU overload
18:44 imirkin: what will that opt do
18:45 karolherbst: nothing
18:45 imirkin: why not.
18:45 imirkin: so i have like
18:45 imirkin: if () { x = 1; } else { x = 2; }; use(x);
18:45 imirkin: the use(x) is effectively a phi node
18:45 imirkin: which becomes a mov
18:45 imirkin: if the two x's get alloc'd into different regs
18:45 imirkin: which they won't here, but very well might
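To make the thought exercise concrete, here is a small Python model (not nouveau code) of a register with two reaching definitions, one per branch. `naive_def` and `unique_def` are hypothetical helpers, and the assumption that both branches write the same register is made only for illustration.

```python
# Reaching definitions of the register read by use(x), keyed by branch.
# Assumption for illustration: both branches write the same register.
REACHING_DEFS = {"then": 1, "else": 2}

def run(branch):
    """Value that use(x) really sees when the given branch executes."""
    return REACHING_DEFS[branch]

def naive_def():
    """Pick 'the' defining value by taking the first one, ignoring control flow."""
    return next(iter(REACHING_DEFS.values()))

def unique_def():
    """Return the defining value only if every path agrees, else None."""
    values = set(REACHING_DEFS.values())
    return values.pop() if len(values) == 1 else None

for branch in ("then", "else"):
    print(branch, "actual:", run(branch), "naive fold:", naive_def())
print("safe fold:", unique_def())
```

The naive lookup folds the immediate from the first branch into the use, which is wrong whenever the other branch runs; requiring a single definition refuses to fold here.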
18:47 karolherbst: I don't see why my opt will touch those
18:47 imirkin: all the conditions fit
18:47 karolherbst: uhhh ohh meh
18:48 karolherbst: I forgot to use getUniqueInsn after my fixup
18:48 imirkin: anyways, you can't really do stuff like ->getInsn() post-ra
18:48 karolherbst: I know
18:48 karolherbst: I had getUniqueInsn there
18:48 karolherbst: but I cleaned stuff
18:48 karolherbst: and forgot to fix it up
18:49 karolherbst: also, alien isolation still runs
18:50 karolherbst: imirkin: so, with s/getInsn/getUniqueInsn/ is there still something wrong?
18:50 karolherbst: or doesn't getUniqueInsn actually check for such conditions?
18:56 karolherbst: https://github.com/karolherbst/mesa/commit/c1301ac03fa3f61f917bd348fed8bcf49ee53aff
19:04 imirkin: karolherbst: iirc it just asserts
19:04 imirkin: tbh i don't remember
19:06 karolherbst: yeah, it only asserts
19:08 karolherbst: what is the best way to find out? or should I limit this pass to only within one BB?
19:15 karolherbst: if the improvement is good enough, I think I would leave it with a bb check then
19:15 karolherbst: or is there another simple enough way to figure that out?
19:18 karolherbst: "total instructions in shared programs : 4251494 -> 4248560 (-0.07%)" good enough
19:44 karolherbst: those shaders are crazy
19:44 karolherbst: I have here like one with 18 phis in one BB
19:46 karolherbst: imirkin: ohh right, there was code I wanted to show you: "r1.x = (r1.x != 0.0) ? float(1.00000000f)/r1.x : uintBitsToFloat(uint(0x70000000)) * sign(float(1.00000000f));" any opinions on that?
19:46 imirkin: beyond "wow, that's a lot of letters to represent some simple concepts"?
19:46 karolherbst: that "0x70000000" part is meh
19:47 imirkin: what float value is that? maxfloat or so?
19:47 karolherbst: just because they don't want to run into a NaN
19:47 karolherbst: not maxfloat
19:47 karolherbst: but quite near maxfloat
19:47 karolherbst: it's 1.58456325E29
19:47 imirkin: "big number"
19:47 karolherbst: yes
19:47 karolherbst: sometimes they even have "small numbers"
19:48 imirkin: anyways... is there a question in there?
19:48 karolherbst: doing stuff like 1E-30 +0.5
19:48 karolherbst: because that one little fraction is important
19:48 karolherbst: the generated code just seems to be superfluous
19:48 imirkin: well, ideally sign(1.0) gets folded at least?
19:48 karolherbst: and I was thinking if we could make such things smarter in a way
19:48 imirkin: (by glsl ir)
19:49 karolherbst: that's not the important part though
19:49 karolherbst: couldn't we like handle that with one instruction if the result should be either maxFloat or 1/x?
19:49 imirkin: well, it's the logical equivalent of r1.x = 1/max(minfloat, r1.x)
19:50 imirkin: not the literal equivalent though
19:50 karolherbst: alien isolation uses something like this quite a lot
19:51 karolherbst: kept me thinking why 0x70000000
19:51 karolherbst: maybe some d3d11 silliness?
19:51 imirkin: unlikely
19:51 imirkin: just want to avoid divisions by 0
19:51 imirkin: and want to make sure the result is ~= 1/minfloat
19:51 karolherbst: yeah I know, but why 0x70000000
19:51 imirkin: why not
19:51 karolherbst: wouldn't 0x7f800000 make more sense?
19:51 imirkin: that's infinity iirc
19:52 karolherbst: yeah
19:52 imirkin: that'll upset other math
19:52 karolherbst: I see
19:52 imirkin: nobody likes dealing with infinities
19:52 imirkin: or nan's
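The bit patterns mentioned above can be checked by reinterpreting them as IEEE 754 single-precision values. A small Python sketch; `bits_to_float` is just a helper name chosen here, and the FLT_MAX pattern is added only for comparison.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit integer pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

big = bits_to_float(0x70000000)       # exponent field 0xE0, mantissa 0 -> 2**97
inf = bits_to_float(0x7F800000)       # exponent all ones, mantissa 0 -> +infinity
flt_max = bits_to_float(0x7F7FFFFF)   # largest finite single-precision value

print(big == 2.0 ** 97)               # True
print(math.isinf(inf))                # True
print(math.isnan(inf * 0.0))          # True: infinity turns into NaN downstream
print(math.isfinite(big * 0.0))       # True: the "big number" stays well-behaved
```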
19:53 imirkin: or zeros, really - the romans had it right :)
19:53 karolherbst: they check against 0 whenever they do a division
19:53 karolherbst: like always
19:53 imirkin: that's my point -- if 0 weren't a thing, they wouldn't have this problem
19:53 karolherbst: sure
19:54 imirkin: (and in roman math, zero was a non-existent concept)
19:54 karolherbst: I am just wondering if the compiler could do something smarter instead
20:18 karolherbst: especially because we end up with code like this: "mul ftz f32 %r2987 %r2985 158456325028528675187087900672.000000"
20:30 pmoreau: xexaxo1: The following seems to work (to link against SPIRV-Tools and access its headers) https://phabricator.pmoreau.org/rMESA9920f06e3e561fe9eed0abbe9c995a4cbd30244c, however I am not sure whether this is the proper way or not.
20:31 pmoreau: xexaxo1: Also, I don’t think there is a way to check the version without using pkg-config, which isn’t great, as in their current release, SPV_MSG_WARNING is spelled SPV_MSG_WARNINING but is fixed in master.
20:50 karolherbst: what does insbf do?
20:51 karolherbst: and could we do something smarter than this? "insbf u32 %r1413 %r1398 0x00000707 0x00000000"
20:54 karolherbst: ohh, that is insert bitfield
20:57 imirkin: aka BFI
20:58 karolherbst: yeah, found it already
21:01 karolherbst: uhm... this makes no sense
21:01 karolherbst: insbf u32 %r1413 %r1398 0x00000707 0x00000000 + insbf u32 %r1417 %r1397 0x00000700 %r1413
21:02 karolherbst: the latter could be insbf u32 %r1417 %r1397 0x00000700 0x00000000 as well
21:06 karolherbst: imirkin: does this look correct to you? if (e & b == e): insbf(d, e, insbf(a, b, c)) == insbf(d, e, c)
21:07 imirkin: well, remember that insbf takes an arg that's the merge of two bitfieldInsert args
21:07 imirkin: look at how TGSI_OPCODE_BFI is handled.
21:10 karolherbst: uhhh... one could make things simple but no...
21:13 karolherbst: do you know what the hardware instruction does or do I have to figure that out myself now?
21:14 imirkin: sure
21:14 imirkin: exactly what bitfieldInsert does
21:14 imirkin: however the last 2 args are merged into one
21:14 imirkin: look at the constant folding impl for it
21:14 karolherbst: ohh okay
21:14 imirkin: (and they are merged into 1 using none other than insbf...)
21:15 karolherbst: ahh, that helps
21:16 imirkin: lower 8 bits is one thing, next 8 bits is another
21:16 imirkin: naturally i have no recollection which is which
21:16 karolherbst: mhh as it seems that src0 and src2 are the data inputs and src1 is the "configuration"
21:16 imirkin: yes.
21:16 imirkin: low 8 bits of src1 are the width and next 8 bits are offset? or vice-versa.
21:16 karolherbst: low is offset
21:17 imirkin: iirc the low bits from src2 are moved into the relevant bits of src0. or vice-versa :)
21:17 karolherbst: so 0x707 basically means, insert 7 bits from src2 starting at bit 0x7?
21:17 imirkin: it means take the low 7 bits of src2
21:18 imirkin: and create a new value which takes src0, and replaces bits 7..13 with those low bits from src2
21:18 karolherbst: yeah, that's what I meant
21:18 imirkin: (and i might have src0 and src2 flipped in that description. i forget.)
21:19 karolherbst: yep, looks like it
21:19 karolherbst: src0 gets shifted
21:29 karolherbst: okay, got something else though
21:33 karolherbst: imirkin: if I understood this correctly, this should be correct: https://gist.githubusercontent.com/karolherbst/8147fa83941dab03e1dd44b14798ddd3/raw/fc2d7670bcb6d474efad2da7b6f3501094bc410b/gistfile1.txt
21:33 karolherbst: ohh
21:33 karolherbst: I need to add the offset
21:33 imirkin: doubtful.
21:34 karolherbst: fixed: https://gist.githubusercontent.com/karolherbst/8147fa83941dab03e1dd44b14798ddd3/raw/2932871ae4e7523986421a9e22709aada67ba600/gistfile1.txt
21:35 imirkin: unlikely. i'd have to glance at it
21:35 imirkin: width == 0 might mean width == 32
21:35 karolherbst: how would that make any sense?
21:36 karolherbst: ohhh
21:36 karolherbst: I see
21:36 karolherbst: still, makes no sense
21:36 karolherbst: not really
21:37 karolherbst: nope, the bitmask is 0x0 with width==0
21:38 imirkin: 0x700 means offset = 0, width = 7.
21:38 imirkin: right?
21:38 karolherbst: yes
21:39 karolherbst: so if you shift the value to the right by 8 it makes no sense to do that insbf before
21:39 imirkin: so that means "take 7 low bits of r1397 and merge them with high 25 bits of r1413"
21:39 karolherbst: yeah
21:39 imirkin: and then you shr by 8, then yeah
21:39 karolherbst: yes
21:39 karolherbst: except I misunderstood how the instruction works, but this would be the idea
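A Python sketch of insbf as described in this exchange: the low 8 bits of src1 are the offset, the next 8 bits the width, and the low `width` bits of src0 replace that bit range in src2. Which operand gets shifted follows the "src0 gets shifted" remark above and should be treated as an assumption, as should the handling of width 0 (an empty mask here). The register values are made up.

```python
MASK32 = 0xFFFFFFFF

def insbf(src0, src1, src2):
    """Toy 32-bit model of insbf; operand roles are an assumption from the chat."""
    offset = src1 & 0xFF
    width = (src1 >> 8) & 0xFF
    bitmask = (((1 << width) - 1) << offset) & MASK32
    return ((src0 << offset) & bitmask) | (src2 & ~bitmask & MASK32)

r1398, r1397 = 0x12345678, 0xCAFEBABE   # arbitrary example values
r1413 = insbf(r1398, 0x707, 0)          # bits 7..13 from the low 7 bits of r1398
r1417 = insbf(r1397, 0x700, r1413)      # bits 0..6 from the low 7 bits of r1397

# A right shift by 8 discards bits 0..7, so the second insbf cannot affect it.
assert (r1417 >> 8) == (r1413 >> 8)
print(hex(r1413), hex(r1417))
```

Under these assumptions, an shr by 8 of the second result equals an shr by 8 of the first, which is the folding opportunity being discussed.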
21:40 karolherbst: ConstantFolding or AlgebraicOpt?
21:40 karolherbst: it sounds more like the latter, but would be easier to implement in the former
21:41 imirkin: constantfolding i think
21:56 karolherbst: there is no ConstantFolding for SHR at all anyway
22:56 karolherbst: *sigh* no change in shader-db overview
23:40 karolherbst: imirkin: would be nice if you could look at the bug fix commit for alien isolation from hakzsam
23:41 karolherbst: ohh wait, he could add it too...
23:41 karolherbst: I don't mind who puts it upstream