00:28pmoreau: imirkin_: I *think* I found the issue with the u8 stores: the u8 Value has an id of -1, so it ends up overwriting part of the imm offset, address reg and store object reg.
00:29pmoreau: IIRC, it is expected to get an id of -1 for <4 bytes regs due to how the formulas work in RA.
00:38imirkin: karolherbst: i'll take a look when i've had less beer in my system
00:38imirkin: same goes for pmoreau
00:40pmoreau: imirkin: No worries! I’ll have a look as well, when I have more sleep in my system. ;-) I’d like to get more familiar with RA, and codegen in general. I don’t want to dump everything on you, especially as you don’t have that much time.
01:00imirkin: pmoreau: id = -1 means "register not allocated" btw
01:00imirkin: pmoreau: it's the flip between "fake reg" (i.e. %rxx) and "real reg" (i.e. $rxx)
07:56yusukesuzuki: imirkin_: i was working on these things, but i couldn't achieve what we want... somewhat context switching
12:53karolherbst: uhu tegra patches
12:57pmoreau: Well, there isn’t too much in them, but nice to see some patches again from NVIDIA.
13:00RSpliet: Ahh, good to hear from Mikko though!
13:19karolherbst: RSpliet: I just remembered something, maybe if Scotland leaves, con would have the majority back! awesome plot twist
13:20karolherbst: or Northern Ireland, could be fun
13:49RSpliet: Off-topic, but con teamed with NI unionists to form a majority. No such plot-twists are on the agenda
13:50RSpliet: The most-likely M. Night Shyamalanesque scenario is a BoJo powergrab if May loses trust from her own backbenches
13:52karolherbst: RSpliet: actually I expected this to happen, even without knowing anything about those parties
16:31Lyude: BTW karolherbst did you ever get those patches for fixing the power sensor on that gtx 560 I've got?
16:32Lyude: Sorry, did you get them upstream?
16:35karolherbst: Lyude: I need to check
16:35karolherbst: can't right now
16:37Lyude: Ah, gotcha
16:38Lyude: Just noticed my card wasn't reporting power usage without applying the patch by hand is all
16:53karolherbst: Lyude: it seems they got merged into Ben's tree
17:12Lyude: karolherbst: gotcha
17:19karolherbst: Lyude: are you on 4.11?
17:19karolherbst: Lyude: pro tip: use my master_4.11 branch ;)
17:20karolherbst: it's out of tree though, but I usually rebase skeggsb/master on the recent kernel release and make it work there
17:23Lyude: karolherbst: 4.12 rc, but I've also still got the patch you gave me a while back that fixed it
17:24karolherbst: not sure if it got into 4.12, I think it will land for 4.13
17:24karolherbst: but I won't do any nouveau stuff today, cause I got my new lens for my camera :)
17:25karolherbst: yep, the fix comes with 4.13
18:08karolherbst: or I think I'll go fix that compiler issue today
18:45karolherbst: imirkin_: I suppose you didn't find any time to look at the instructions? Because I don't see a reason why it breaks, assuming RA and the emitter are fine
18:47swedave: Hello . I have a nvidia 970 gtx . How can i manage to run wine games with better fps ?
18:47karolherbst: swedave: you need to kick nvidias ass to release PMU firmware images
18:48karolherbst: or somebody of us to find a way to extract those from the nvidia driver
18:50swedave: Ok . ;(
18:50karolherbst: we already have the code to reclock those GPUs, we just can't control the fan
18:50karolherbst: because for that you need signed firmware
18:52swedave: so in other words i can get the nvidia gpu to run faster but it will get too hot then? Excuse me for my bad english, it's not my native language
19:44karolherbst: imirkin: it looks most likely like a precision thing... I just multiplied a and b with 0.5 in the fmad and then the flicker disappears, but it looks kind of wrong, but this is to be expected I guess
19:46imirkin_: karolherbst: does the shader have any mention of "precise" or any sort of thing like that?
19:46imirkin_: (or "invariant")
19:46karolherbst: precise vec4 r0, r1;
19:46imirkin_: so that's why
19:47imirkin_: we don't support 'precise' properly
19:47imirkin_: are the mul/add in question assignments to r0/r1?
19:47karolherbst: we could turn off this opt when there is just one precise in the shader
19:47karolherbst: looks like it
19:47imirkin_: precise is a lot more than that
19:47imirkin_: like a + b + c != c + b + a
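The reordering point above can be reproduced with ordinary floats. This is a minimal sketch in Python doubles (a GPU would use float32, but the effect is the same); the values are chosen purely for illustration and are not taken from the shader being discussed.

```python
# Floating-point addition is not associative, so evaluating the same sum
# in a different order can give a different result.
# Illustrative values only (assumption, not from the shader in the log).
a, b, c = 1e16, -1e16, 1.0

left_to_right = a + b + c    # (a + b) + c: a and b cancel exactly, then c is added
reversed_order = c + b + a   # (c + b) + a: c + b must be rounded before a is added

print(left_to_right, reversed_order)
print(left_to_right == reversed_order)
```

A compiler that is free to reorder these additions can therefore change the value, which is the kind of freedom 'precise' is meant to take away.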
19:48karolherbst: maybe that's why the rendered result was so much off
19:48karolherbst: but we could at least start with the bug we encountered regarding that
19:48karolherbst: if we find more bugs related to that: fine
19:48karolherbst: but fixing the one we've found has priority I guess
19:48karolherbst: r0.x = fma(idx_uniforms2_vs.cb2.x, idx_uniforms4_vs.cb4.w, r0.x);
19:49karolherbst: imirkin_: I guess we need to populate which SSA values are tagged as precise after the tgsi -> nv50 conversion?
19:49karolherbst: and then check in the passes against it
19:50imirkin_: well, that information isn't in the tgsi
19:50imirkin_: so it's a lot more work than that :)
19:50karolherbst: well, somebody has to do this anyhow at some point
19:51karolherbst: so I just concentrate on fixing this first and then we look how it goes
19:52karolherbst: why isn't it inside the TGSI though?
19:52karolherbst: I can't imagine a better place to put it
19:54imirkin_: ... because no one has piped it through?
19:54karolherbst: does any driver make use of it at all?
19:54imirkin_: [and because it's not extremely clear how to do it]
19:54karolherbst: so AMD should have the same bug or they are less aggressive with optimisations
19:54imirkin_: there's something in the glsl -> nir converter that they did to hack it, dunno if they're happy with the hack
19:54imirkin_: or they could get lucky
19:54karolherbst: or that
19:55karolherbst: I'll ask in dri-devel, maybe somebody was working on it already
20:29karolherbst: imirkin_: I think the simplest way would be a global "contains precise stuff" flag, but I guess we should do it right from the start instead
20:53karolherbst: imirkin_: glennk just pointed out, that maybe we can deal with it by just setting the proper rounding mode for mad instructions in such cases. Would be worth it to figure those out with such a good example
21:02imirkin_: erm... the rounding mode should be "rn"
21:02imirkin_: aka "round nearest"
21:02imirkin_: which is the same as the rounding mode on the mul
21:02imirkin_: glsl doesn't really talk about rounding modes for floating point ops
21:04karolherbst: okay, rounding mode N seems to be the default
21:04karolherbst: but I think he actually referred to a flag for internal rounding of the fma instruction
21:04karolherbst: no idea if there is such
21:04glennk: imirkin_, the intermediate rounding between the mul and add, vs no rounding in fma
21:04imirkin_: well, "contains precise stuff" would have to turn off like 50% of the opts, including CSE and others
21:04imirkin_: glennk: that's not controllable on nvidia
21:04imirkin_: glennk: i.e. there's no fma vs muladd op
21:05glennk: right, figured as such
21:06karolherbst: imirkin_: well as I said: we could only disable those opts for now, where we know they do harm
21:06karolherbst: imirkin_: and maybe add an env variable NV50_AGGRESSIVE_PRECISE
21:06karolherbst: or so
21:06imirkin_: karolherbst: CSE does harm ;)
21:06karolherbst: well sure, but we don't know of anything broken by this, that's what I meant
21:07karolherbst: sure we could just disable like everything, but that's not the point really as well
21:07karolherbst: I wouldn't mind doing it in a proper way from the beginning, but this will just take me a while, cause I never ever looked into that tgsi stuff at all
21:18karolherbst: imirkin_: but it seems like precise is mostly used on nvidia GPUs for that mad thing ;)
21:35karolherbst: robclark: is there something like muladd vs fma/mad on adreno hardware?
21:42imirkin_: i believe r600 has both muladd and fma
21:43imirkin_: and nv50's float mad is muladd, not fma
21:44karolherbst: so it only affects nvc0+
22:36robclark: karolherbst, adreno (ir3) just has mad.. for open-cl (on a3xx at least) unless you give flags for reduced precision it will use separate mul+add instructions..
22:37robclark: then again, I guess they probably prioritize gl/vk perf which doesn't care about fma for mad
22:37imirkin_: you mean increased precision?
22:38robclark: umm, I'm pretty sure it was some compiler flag to say "go fast at expense of precision" but don't remember the name of the flag offhand
22:38robclark: (that *was* a bunch of years ago so memory a bit hazy)
22:38imirkin_: ok. well fma is meant to be more precise than mul+add.
22:39imirkin_: since it does a fused add, with an extra bit of precision
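The difference between a fused and an unfused multiply-add can be sketched as follows. This is a hedged illustration in Python doubles, not hardware behaviour: `fused_mul_add` is a hypothetical helper that emulates a fused operation by computing the exact result with `fractions.Fraction` and rounding once, and the input values are picked only to make the effect visible.

```python
from fractions import Fraction

def mul_add(a, b, c):
    # Separate mul and add: the product is rounded before the add sees it.
    return a * b + c

def fused_mul_add(a, b, c):
    # Emulated fused multiply-add (hypothetical helper): compute a*b + c
    # exactly as a rational number, then round to a float a single time.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which is not representable as a double
# and rounds to 1.0, so the unfused version loses the small term entirely.
print(mul_add(a, b, c))
print(fused_mul_add(a, b, c))
```

Here the unfused version returns zero while the fused one keeps the tiny residual, so the two forms are not interchangeable bit for bit.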
22:42karolherbst: robclark: I am wondering, because we just figured out a mul+add=>fma conversion breaking things
22:42karolherbst: and I was wondering how this may affect freedreno
22:43robclark: imirkin, ok, right, so mad (what ir3 has) is the less precise version
22:44robclark: karolherbst, for nir, I would assume there is a flag to enable that optimization (or if there isn't it should be quite easy to add to nir_compiler_options)
22:44karolherbst: ohhh right, freedreno uses nir... totally forgot about it
22:44imirkin_: the issue isn't mul+add -> fma
22:44imirkin_: the issue is lack of precise handling.
22:44karolherbst: yeah, i965 isn't affected at least
22:45imirkin_: which says that 2 expressions should be computed the same way
22:45imirkin_: so the issue is that in one place we end up doing mul+add->fma, and in another we don't
22:45imirkin_: and so calculations come out different
22:45imirkin_: when they should be the same
22:45karolherbst: and we have no means to identify those places I assume
22:46imirkin_: it's not piped through to tgsi
22:46imirkin_: and even the GLSL spec isn't extremely clear as to what 'precise' means, precisely.
22:47karolherbst: well, browsing through google results seems to indicate, that a lot use precise for that fma case
22:47karolherbst: or is used as an example most of the time
22:48karolherbst: imirkin_: I found a way to fix it, guess how
22:50imirkin_: that's a slightly questionable thing we do there with the refcount
22:50imirkin_: which is not at all a legit thing to look at there
22:50imirkin_: coz there's no DCE, no other opts, etc
22:50karolherbst: so for the sake of fixing it the easy way, we can just remove those refcount checks?
22:51karolherbst: or would you prefer a proper solution based on the precise modifier?
22:52karolherbst: well it kind of makes sense that the refcount thing introduces such issues, cause it might indeed change computations across stages or between different shaders or whatever
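How a use-count heuristic can make two shaders disagree on the same expression can be sketched with a toy model. Everything here is an assumption for illustration: `lower_mul_add` and `mul_use_count` are hypothetical names, the fusion is emulated in Python doubles with `fractions.Fraction`, and this is not the actual nouveau pass.

```python
from fractions import Fraction

def lower_mul_add(a, b, c, mul_use_count):
    """Toy model (hypothetical): fuse mul+add only if the mul has one use."""
    if mul_use_count == 1:
        # Fused path: exact a*b + c, rounded once.
        return float(Fraction(a) * Fraction(b) + Fraction(c))
    # Unfused path: the product is rounded before the add.
    return a * b + c

# Same inputs, same source expression, in two different shaders.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0

shader_a = lower_mul_add(a, b, c, mul_use_count=1)  # mul result used once
shader_b = lower_mul_add(a, b, c, mul_use_count=2)  # mul result reused elsewhere

print(shader_a, shader_b, shader_a == shader_b)
```

In this model the two shaders produce different values for an identical expression, purely because of how often the intermediate product is used elsewhere.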
22:53karolherbst: so what's the kind of fix you would prefer then?
22:53imirkin_: figuring out how to handle precise properly.
22:54karolherbst: well sure, I would plan to do that anyway
22:55karolherbst: the question is rather if that refcount thing is doing more harm or more good generally
22:55imirkin_: more good.
22:55karolherbst: even while risking non matching calculations and risking "tears" on borders and such alike?
22:56imirkin_: yeah, that's not really a thing in the vast majority of games.
22:56imirkin_: this is the first i've heard of it actually mattering
22:56karolherbst: me as well
22:56imirkin_: if you get rid of the refcount, it may end up increasing the operation count of a LOT of shaders
22:56imirkin_: check what happens with shader-db... i'm curious
22:56karolherbst: I am aware
22:57karolherbst: yeah, why not actually
22:57imirkin_: perhaps it doesn't play out that way, who knows
22:58karolherbst: I doubt it's more than 1% increase of instructions though, but 1% is still quite a lot
23:13karolherbst: imirkin_: mhhh instruction count doesn't matter
23:13karolherbst: gprs count changed though
23:13imirkin_: "doesn't matter"?
23:14imirkin_: and local usage went up too. makes sense.
23:27karolherbst: I've added a trello card