01:04 kisak: howdy, recently got my hands on a sandybridge/fermi optimus laptop. so far I've been happy with just running off the sandybridge and was wondering if I was getting a battery consumption penalty with the fermi idling with nouveau (vs the blob)
01:06 imirkin_: kisak: cat /sys/...../vgaswitcheroo
01:06 imirkin_: if it says DynOff, you're golden
01:06 imirkin_: if it says DynPwr, it's wasting power
01:07 kisak: cool, will check
01:08 imirkin_: cat /sys/kernel/debug/vgaswitcheroo/switch
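[Editor's sketch] The check described above (DynOff means the discrete GPU is powered down, DynPwr means it is still drawing power) could be scripted roughly as follows. The path is the one quoted in the log. The sample line format in the comment and the helper name `classify_switch` are assumptions for illustration, and reading the file normally requires debugfs to be mounted and root privileges.

```python
from pathlib import Path

# Path quoted in the log; needs debugfs mounted and usually root.
SWITCH_PATH = Path("/sys/kernel/debug/vgaswitcheroo/switch")


def classify_switch(text):
    """Return (line, verdict) pairs for each line of the switch file.

    Assumed line format (illustrative only), e.g.:
        0:IGD:+:Pwr:0000:00:02.0
        1:DIS: :DynOff:0000:01:00.0
    """
    verdicts = []
    for line in text.splitlines():
        if "DynOff" in line:
            verdicts.append((line, "powered down"))
        elif "DynPwr" in line:
            verdicts.append((line, "still powered, wasting power if idle"))
        else:
            verdicts.append((line, "other state"))
    return verdicts


if __name__ == "__main__":
    try:
        for line, verdict in classify_switch(SWITCH_PATH.read_text()):
            print(f"{line} -> {verdict}")
    except OSError as exc:
        # Missing file usually means the kernel lacks vga_switcheroo support.
        print(f"cannot read {SWITCH_PATH}: {exc}")
```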
01:10 kisak: gotta get that in my kernel real quick
01:10 imirkin_: oh heh. without that, i think it won't power off.
01:13 kisak: iirc, I read the module text a few days back, must have interpreted it as a display muxer option since it referenced a relatively tight year range
01:16 imirkin_: yeah, so that was the original purpose - to switch the mux
01:17 imirkin_: but i'm FAIRLY sure that it's also used to control whether the gpu is used or not
01:17 imirkin_: or at least nouveau depends on that option? i forget
01:20 kisak: oh well, so I'm on the new kernel and I can't get the temp off nouveau anymore, that's a good sign (that the card is off)
01:20 karolherbst: kisak: reading as reports N/A or something?
01:20 kisak: -0C on the system tray
01:20 karolherbst: I see.. I think it's still a bit buggy
01:21 karolherbst: but yeah, it means it can't read the real temperature
01:21 kisak: and, I've got DynOff, cool
01:21 imirkin_: so ideally it should be powered off by the system
01:21 imirkin_: at the platform level
01:22 imirkin_: i.e. no power
01:22 karolherbst: right, but that won't work for fermi GPUs
01:22 imirkin_: and then should you need to use nouveau, it will get powered on, the gpu will do its stuff, and then turn back off.
01:22 imirkin_: karolherbst: huh?
01:22 imirkin_: runpm works fine
01:22 karolherbst: right, but the GPU still has power, although just a little
01:22 imirkin_: reclocking - not so much. but that's different.
01:22 imirkin_: no. platform turns off the power
01:23 karolherbst: not entirely
01:23 imirkin_: it goes into D3cold
01:23 karolherbst: no
01:23 imirkin_: that's pretty much no power
01:23 karolherbst: D3cold is broken
01:23 karolherbst: you would notice
01:23 karolherbst: the best which works is D3hot
01:23 imirkin_: either way, it's controlled by the ACPI platform
01:23 imirkin_: so it does whatever it wants
01:23 karolherbst: more or less, yes
01:23 imirkin_: and we don't really control it
01:23 imirkin_: besides the "on/off" switch
01:23 karolherbst: well
01:24 kisak: DRI_PRIME=1 glxgears works and gets temp off the chipset while active. nice
01:24 karolherbst: actually for the no _PR3 stuff we do that within nouveau, but yeah, there is this _DSM on/off switch
01:24 karolherbst: *not
01:24 imirkin_: yeah, but _PR3 is way new
01:24 karolherbst: right
01:24 imirkin_: kisak: cool :)
01:24 karolherbst: and does D3cold if the firmware thinks it makes sense to do ;)
01:25 karolherbst: d3cold is not possible with the _DSM call and how that's implemented anyway
01:25 imirkin_: kisak: it's likely that the fermi gpu will be only mildly faster than the snb, coz no reclocking on fermi
01:25 imirkin_: however it does have GL 4.5, while SNB maxes out at 3.3
01:25 karolherbst: there was the benefit once that your desktop would stay smooth while offloading
01:25 karolherbst: but that's long gone
01:26 karolherbst: I would really like to dig into this issue at some point :/
01:26 karolherbst: in worst case it can also happen that the intel side displays slower than stuff gets rendered
01:26 kisak: indeed, this 525M isn't exactly a miracle worker to start with
01:26 karolherbst: and you get a <20 fps feeling although it gets rendered at 45 fps and so on :/
01:27 imirkin_: kisak: and you're likely getting it up at the lowest perf level
01:27 kisak: imirkin_: that's what I expected
01:28 karolherbst: even fully reclocked I would argue that the SNB one is faster :D
01:29 karolherbst: kisak: HD 3000?
01:29 karolherbst: or 2000?
01:29 kisak: I expect it's the GT2 variant with 12
01:30 karolherbst: CPU?
01:31 kisak: yeah, i7-2630QM -> HD 3000 (GT2)
01:31 karolherbst: *sigh*
01:31 karolherbst: that's the worst combination I've ever heard of :D
01:31 karolherbst: on paper the intel one is even faster
01:32 karolherbst: 259.2 GFLOPS (intel) vs 230.4 (NV)
01:32 karolherbst: doesn't mean much
01:32 HdkR: ow
01:32 karolherbst: but even fully reclocked I am not sure if we would be able to get even close with nouveau
01:33 karolherbst: usually the nvidia one is faster as the driver quality on windows is just that bad for intel
01:33 karolherbst: so the nvidia would be faster, even if just a little
01:34 karolherbst: mhh, benchmark say the nvidia one is around 40% faster though
01:35 HdkR: dedicated vram probably helps
01:35 karolherbst: probably
01:36 karolherbst: although it's still a joke on the 525m
01:36 karolherbst: well, at least 128 width
01:36 karolherbst: so that's something
01:37 karolherbst: HdkR: also, the nvidia driver is probably quite a lot better than the intel one under windows so I would say most of the perf comes from there actually
01:37 HdkR: That would help out in benchmarks on Windows yea
01:37 karolherbst: it's just quite sad why OEMs even choose to do such combinations :/
01:38 karolherbst: why even bother with the nvidia GPU
01:38 kisak: they probably put it in this model to make sure the bluray drive was usable
01:39 kisak: unless I'm terribly underestimating sandybridge's video decode
01:40 karolherbst: SNB can do VC-1
01:40 karolherbst: and H.264
01:41 karolherbst: should be enough for bluray
01:46 HdkR: I terribly overestimated this MX150's video decode prowess when I decided to install the binary blob for testing
01:46 HdkR: (its video decode capabilities don't show up in vainfo or vdpauinfo)
01:59 imirkin: HdkR: should work for vdpauinfo... or is all nvdec now?
01:59 HdkR: I think MX150 is just dumb actually
02:00 HdkR: I /think/ the expectation is that the user will use the video decode engine on the Intel side
02:00 HdkR: But there is no sane way to do that under Linux atm
02:01 imirkin: oh yeah, so some gpu's have them fused off.
02:01 imirkin: dunno about your particular one
02:02 HdkR: I'm guessing that's the case, otherwise theoretically it should just work on the blob
02:02 imirkin: which chip is this?
02:02 HdkR: I could jump in to Windows to see if it exposes video engines there but bleh
02:04 imirkin: aha - GP108... hrm, we don't have disable logic for the video decode engines, since we don't touch them =/
02:04 imirkin: not sure which bit it is.
02:05 imirkin: the other thing to look at is EXT ones with ARM people in the authors list. that's more fraught with peril though.
02:06 HdkR: Got distracted, yea. GP108
02:07 imirkin: gr, that last comment was for the other chan
02:16 imirkin: skeggsb: riddle me this... nv42... i run glretrace -w, it all renders fine. i move the cursor over the window in question, and immediately it looks like the LUT is messed up. i move the cursor off the window, LUT is back to normal. i move the cursor onto the window, it's messed up again. even if focus stays on the window (sloppy focus), as long as the cursor isn't over the window, it's all fine.
02:17 imirkin: W.T.F.
02:48 rhyskidd: pmoreau: a macbook pro 2009 (macbookpro5,3). will take a look
15:54 karolherbst: so, I hope this way I get more comments on the channel reset stuff *sigh*
15:59 imirkin: karolherbst: what's the problem with vblank?
15:59 karolherbst: imirkin: check the mesa code. Where I poll(fd) I get the vblanks as well
15:59 imirkin: karolherbst: btw, not a concrete comment on your approach, but not all kernel versions have nvif
15:59 imirkin: we don't necessarily have to support all kernels in our userspace code, but it should at least try to detect it
16:00 karolherbst: and I don't want to read() because this would mean we have to add an event dispatcher to libdrm and so on
16:00 karolherbst: would get really ugly
16:00 karolherbst: I basically wanted to have an fd I can register to listen to the "channel is dead" event and nothing else and just poll until I get something
16:00 karolherbst: and then do the magic
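[Editor's sketch] The idea of having a dedicated fd that is only polled for a single "channel is dead" event, rather than read() through a full event dispatcher, can be illustrated as below. This is not the nouveau or libdrm interface: an ordinary pipe stands in for the hypothetical event fd, and `wait_for_event` is an illustrative helper name. It assumes a POSIX system.

```python
import os
import selectors


def wait_for_event(fd, timeout):
    """Return True if fd becomes readable within timeout seconds."""
    sel = selectors.DefaultSelector()
    sel.register(fd, selectors.EVENT_READ)
    try:
        # select() returns a list of ready keys; empty means timeout.
        return bool(sel.select(timeout))
    finally:
        sel.close()


# Stand-in for the hypothetical "channel is dead" fd: a plain pipe.
read_fd, write_fd = os.pipe()

nothing_yet = wait_for_event(read_fd, 0.0)   # no event pending
os.write(write_fd, b"x")                     # pretend the event fired
signalled = wait_for_event(read_fd, 1.0)     # now the fd is readable

os.close(read_fd)
os.close(write_fd)
print(nothing_yet, signalled)
```

The point of the design is that the caller never has to parse or dispatch events: readiness of the fd is the whole message.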
16:00 karolherbst: works great for random applications
16:00 karolherbst: glamor just insta crashes
16:00 imirkin: ;)
16:01 karolherbst: but otherwise it works great
16:01 karolherbst: applications freeze X for like 2-3 seconds, then they get killed and life moves on
16:02 karolherbst: imirkin: regarding supporting all kernels. I think it's enough to support some of the stable kernels and not depend on 5.x+ just to handle dead channels correctly
16:02 imirkin: i mean you shouldn't just error out
16:02 karolherbst: if it works for 4.19 + 4.14 that would be quite helpful already (no idea where nvif was added, but 4.9 would work fine as well I imagine?)
16:02 imirkin: i dunno if you would
16:03 imirkin: but ideally i should be able to take a fresh mesa and run it on an old kernel
16:03 imirkin: and similarly with an old mesa and a new kernel
16:03 karolherbst: sure, that's something the code should be able to deal with
16:03 imirkin: exactly.
16:03 karolherbst: but I don't want users to depend on new kernels to get this issue fixed either
16:05 karolherbst: or let's say, I don't want to depend on "future features" we don't even know how those should look like now, although I could imagine we could add a new ioctl to nouveau just for that or something
16:05 karolherbst: but then it's hard to argue to get that backported to stable releases
16:27 john_cephalopoda: Hey. I found a freeze that I can reproduce.
16:28 john_cephalopoda: Linux 4.20.0, mesa 18.3.1, xorg-xf86-nouveau 1.0.15
16:29 john_cephalopoda: GPU is an NVIDIA Corporation GK106 [GeForce GTX 645 OEM] (rev a1)
16:31 john_cephalopoda: To reproduce it, I start the latest blender 2.80 beta, create an object (e.g. sphere) with at least 5000 vertices and switch to edit mode. The X11 session freezes.
16:33 john_cephalopoda: The kernel log says this: https://bpaste.net/show/508fe49ac79f
16:34 karolherbst: right
16:34 karolherbst: I just sent out an email to solve that issue
16:35 karolherbst: john_cephalopoda: it's a super stupid issue. If you are able you can SSH into the machine and just kill blender to move on
16:35 karolherbst: but it's good to know that blender triggers it that fast
16:37 john_cephalopoda: Ah, good to know.
16:38 john_cephalopoda: Thanks!
16:47 karolherbst: john_cephalopoda: well, by fixing I mean it shouldn't freeze your desktop ;)
16:47 karolherbst: no idea why blender crashes the gpu context
16:48 john_cephalopoda: When it switches into edit mode, instead of rendering an outline it renders the mesh with its points.
16:49 imirkin_: john_cephalopoda: GTX 660?
16:49 john_cephalopoda: imirkin_: GTX 645
16:49 imirkin_: oh wait, no, you said it's a GTX 645
16:49 imirkin_: is that the mac one?
16:50 imirkin_: [i.e. is this a mac?]
16:50 john_cephalopoda: Nope, it's a DELL PC.
16:50 imirkin_: heh ok
16:50 imirkin_: well if you feel like trying random things
16:50 imirkin_: something that's helped GTX 660 owners with fairly easily triggerable ctxsw fw hangs
16:50 imirkin_: is to use the blob firmware
16:50 karolherbst: imirkin_: *sigh*, with a debug build chromium somehow chooses to use glProgramBinary with GL_PROGRAM_BINARY_FORMAT_MESA as the format, sees that there is a gl error and like all tests just fail :/
16:51 karolherbst: debug build of mesa
16:51 karolherbst: or maybe just some recent master change
16:51 imirkin_: karolherbst: hrm, i don't remember having that. or perhaps i ran without cache
16:51 karolherbst: mhhh
16:52 karolherbst: shader-cache=false
16:52 karolherbst: the GL_PROGRAM_BINARY_FORMAT_MESA thing is just for the binary NIR though
16:52 karolherbst: afaik
16:53 karolherbst: something stupid required at least one binary format
16:57 imirkin_: karolherbst: i mean MESA_GLSL_WHATEVER
16:59 karolherbst: imirkin_: sure, but I don't have shader-cache enabled in the binary at all, mhh, but maybe that's causing it?
17:00 karolherbst: indeed
17:30 pmoreau: rhyskidd: Cool! I have the same and started using it again as my main laptop. So it’s not a bug, but if you feel like adding power/clock gating on Tesla, please feel free to do so. O:-)
17:33 pmoreau: If you are interested in runpm for that laptop, I have some patches around from l1k for auto-suspend/resume. Switching GPUs at runtime (though X needs to be restarted) used to work, but I think that is no longer the case.
18:13 imirkin_: skeggsb: any clue on the LUT issue i pointed out yesterday?
18:13 imirkin_: it makes no sense... but it's reproducible.
18:18 Lyude: 349061
18:18 Lyude: oh whoops, yubikey was in the wrong terminal D:
18:22 imirkin_: haven't tried across reboots though.
18:22 imirkin_: i wonder if we overwrite something in some cases, which just happens to do it. very weird.
18:23 imirkin_: but it all goes back to normal the second i move the cursor away from the window. and breaks immediately when i move the cursor over it.
18:23 imirkin_: even without changes in window focus
18:58 Lyude: skeggsb: so, I'm definitely going to need to implement some more control over what rates we use for link training in nouveau in order to fully fix this issue I think, but I'm having trouble figuring out where the appropriate place to actually pass the desired link rate/lane count from drm down to nvkm
18:58 Lyude: it looks like we currently determine what link rate and such we want to use there instead of in the higher-level DRM portions of nouveau?
19:02 imirkin_: Lyude: we might already have this, but remember it's also important to know the max capacity of the link for things like determining bpp-ness
19:02 imirkin_: [also for hdmi]
19:16 Lyude: imirkin_: we already have that yeah :p
19:17 imirkin_: Lyude: well, right now we auto-select the max bpp
19:17 imirkin_: but we need to have that info when validating modes
19:17 Lyude: imirkin_: yeah-that's on my todo list as well
19:17 imirkin_: pretty much all the drivers need SOMETHING around selecting all the "funky" stuff
19:17 imirkin_: however there's not been a ton of exploration into what that might look like
19:17 Lyude: we need to add a mode_valid_ctx hook most likely, add some helpers for doing mode_valid checks using the pbn of the topology
19:17 Lyude: etc
19:17 imirkin_: i believe vsyrjala has done some of that for intel
19:18 imirkin_: apparently intel has some very funky requirements
19:18 Lyude: yes it does!
19:18 imirkin_: like 100 = ok, 101 = bad, 102 = ok again
19:18 Lyude: imirkin_: we also do have better atomic selection for mst in nouveau now
19:18 imirkin_: so you can't just use limits
19:19 Lyude: imirkin_: although the patches for that landed in drm-misc first, since there was a lot of multi-driver changes
19:19 imirkin_: Lyude: i'm mostly looking at this from the hdmi perspective
19:19 Lyude: ahh
19:19 imirkin_: i want to get 12bpc support in
19:19 Lyude: that's the easy perspective ;)
19:20 imirkin_: since i have a TV that supports it
19:20 Lyude: imirkin_: feel free to fly patches for it by me (cc me though so I don't miss them)
19:20 imirkin_: and perhaps YUV420, although that's less pressing now that the high-speed modes are a thing
19:20 imirkin_: Lyude: well, i'm not nearly at the "patches" phase of this exercise
19:21 imirkin_: more like thinking "damn, all this code in the intel driver looks 99.99999973% copyable"
19:21 Lyude: probably is!
19:53 Lyude: Does anyone know if nvkm has some way of getting access to the atomic states of various modesetting objects?
20:27 karolherbst: imirkin_: okay... so one of those annoying CTS crashes: nouveau_bo has a handle of 0x65660a20 and inside cli_kref_set libdrm tries a realloc(pcli->kref, 16 * bo->handle * 2) and asserts/crashes
20:28 karolherbst: well, chromium reports out of memory which might be more or less reasonable here
20:29 karolherbst: if I calculated correctly, those are 50GB
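[Editor's sketch] The "50GB" estimate can be checked directly from the two numbers quoted above, the handle value 0x65660a20 and the size expression `16 * bo->handle * 2`. Nothing else about libdrm is assumed here.

```python
# Handle value and size expression as quoted in the log.
handle = 0x65660a20
requested_bytes = 16 * handle * 2

requested_gib = requested_bytes / 2**30
print(f"handle = {handle}")
print(f"realloc size = {requested_bytes} bytes (~{requested_gib:.1f} GiB)")
```

So the request is a little over 50 GiB, which matches the estimate and explains the out-of-memory report on a 40GB machine.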
20:31 karolherbst: why is it using bo->handle though?
20:37 karolherbst: skeggsb: ^^?
20:45 imirkin_: karolherbst: i used to ask such questions ...
20:46 imirkin_: now i just roll with it :)
20:46 karolherbst: :D
20:46 imirkin_: i do remember reading over that very confusing code
20:46 karolherbst: okay sure, but that just doesn't make sense
20:46 imirkin_: and deciding that it was 100% correct
20:46 karolherbst: since when is a "handle" anything remotely like a size?
20:46 imirkin_: check how it's set
20:46 imirkin_: i don't remember now, but iirc it's all confusingly correct
20:46 karolherbst: mhh
20:46 imirkin_: it's like the index in the kref array or something
20:46 karolherbst: okay, but still, libdrm tries to realloc 50GB
20:47 imirkin_: get more ram? :)
20:47 karolherbst: that's pcli->kref_nr
20:47 karolherbst: the index
20:47 karolherbst: imirkin_: that machine rolls with 40GB currently :/
20:47 imirkin_: i'm just going by memory
20:47 karolherbst: weird thing is, 2.14GB are actually used
20:49 karolherbst: mhh bo->handle is used in cli_kref_get :/
20:52 karolherbst: imirkin_: maybe it's some random memory corruption, but somehow I don't look forward to running valgrind on what needs 1 hour to run without valgrind
20:53 imirkin_: ;)
20:54 karolherbst: the most annoying part is that chromium basically gives up and all future tests fail due to the lack of a GL context :(
20:54 imirkin_: yeah, i commented about that too
20:54 imirkin_: you can deselect some problem tests first
20:55 Lyude: ugh, figuring out what drm stuff goes down to what disp engine stuff in nouveau is a pain
20:55 karolherbst: imirkin_: it appears to happen randomly though, or rather "after some time"
20:55 imirkin_: Lyude: do you have concrete questions? i may be able to answer.
20:55 Lyude: imirkin_: yes
20:55 imirkin_: [but no promises]
20:55 imirkin_: karolherbst: yeah ... i noticed that there was a group of tests where it happened deterministically
20:55 Lyude: imirkin_: I'm trying to figure out where exactly I should add something so that we can actually communicate an explicit link rate/lane count to nvkm for DP displays
20:56 karolherbst: imirkin_: maybe I should throw in some poison into the nouveau_bo object?
20:56 karolherbst: and just assert if _anything_ happens to it
20:56 imirkin_: Lyude: so to clarify, you want to say "i want 4 links @162, make it happen"?
20:56 Lyude: It looks like nvkm just kind of comes up with its own DP link training parameters
20:56 Lyude: imirkin_: yes
20:56 karolherbst: or just hash over all values
20:56 karolherbst: and just always in libdrm debug builds
20:56 karolherbst: *check
20:56 Lyude: and I want to be able to pass it down from nv50disp/disp.c
20:56 imirkin_: there's a dp.c somewhere which controls some but not all of this
20:57 Lyude: yeah that's the part that I found where it looks like nouveau comes up with the link rate parameters
20:57 imirkin_: right
20:57 imirkin_: no huge reason not to move it out
20:57 imirkin_: except there may be some bios init reasons
20:57 karolherbst: I think I will just xor all values of nouveau_bo together or something... there shouldn't be anything touching those except libdrm....
20:58 Lyude: imirkin_: I'm having trouble figuring out how we would even do that though, the way that this is supposed to map to an actual drm object of some sort is extremely unclear
20:58 karolherbst: or... I don't know
20:59 Lyude: it seems like there's endless layers of abstraction between "what is the atomic state" and "what is the actual hardware state"
20:59 imirkin_: Lyude: so ... this happens at the "or" level, i believe
21:00 imirkin_: Lyude: i think the moe is to be able to pass more parameters to acquire or whatever
21:00 imirkin_: move*
21:01 imirkin_: so that instead of guessing, it just takes what you told it
21:01 imirkin_: but ... obviously skeggsb knows a lot more about this, and has a clearer picture of it all
21:03 Lyude: mhm, they said they had plans to do what I'm trying to do now but haven't heard anything else since
21:03 Lyude: iirc I thought they had said that we shouldn't be adding stuff to nvif, but I don't see any other way to actually make the link train parameters explicit
21:03 karolherbst: mhh, the handle comes out of drm_gem_handle_create
21:04 karolherbst: ohhhh... I have a bad feeling about it
21:05 Lyude: we should have better documentation for how all of this disp stuff works, like, pretty badly :s
21:05 imirkin_: not just disp
21:05 imirkin_: i did try, early on in my nouveau career, to write up how everything worked
21:06 imirkin_: however as i was doing that, i realized there were giant holes in my understanding
21:06 imirkin_: and then nouveau was rewritten 3-4 more times since then
21:07 Lyude: also, because it's always kind of unclear and confusing ;_; even though I feel like I've asked this before, is nvif actually a hardware API that the GPU defines?
21:07 imirkin_: not in the least.
21:08 Lyude: alright then in the lt training stuff goes
21:08 Lyude: because i see no better way of doing this :p
21:08 imirkin_: it's a purely software api, which the "core" presents to both the linux glue, as well as, optionally, to userspace
21:08 Lyude: awesome
21:08 imirkin_: in theory you could take that core pretty much unmodified to another OS
21:08 imirkin_: and just rewrite the glue to plug into whatever they wanted
21:08 Lyude: do we need to increment the API version if I add something btw?
21:08 imirkin_: in practice, the glue is kinda important, since userspace interacts with the glue :)
21:09 imirkin_: rarely.
21:09 imirkin_: but in theory it supports versioned api's
21:09 imirkin_: obviously if userspace uses that particular nvif, you have to rev it
21:09 karolherbst: imirkin_: uff...
21:09 karolherbst: yes, the handle can be _any_ value indeed
21:09 karolherbst: :(
21:09 Lyude: imirkin_: good thing I'm fairly sure all of this evo nvif stuff is just used by the kernel
21:09 karolherbst: it basically comes from an idr_alloc with unlimited range
21:10 imirkin_: Lyude: yeah, i think that's right. i think ben has rev'd a few for other reasons too, not 100% sure though.
21:10 karolherbst: although it's still confusing why that value is that high
21:10 imirkin_: i'd start by not rev'ing, and if he says to do it, then do it :)
21:10 Lyude: cool
21:10 imirkin_: karolherbst: idr_alloc allocates small numbers first. if it's a huge number, you have a ton of handles
21:11 karolherbst: yeah... I will check in libdrm what numbers I get
21:11 imirkin_: and i think that array is meant to store all the handles anyways
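[Editor's sketch] The argument here is that an allocator which always hands out the lowest free ID only produces a huge handle when a huge number of handles are live, so an array indexed by handle stays proportional to the number of live objects. The toy class below illustrates that behaviour only. It is not the kernel's idr implementation, and the class name and the starting value of 1 are assumptions for illustration.

```python
class LowestFreeIds:
    """Toy allocator: always returns the smallest unused ID >= start."""

    def __init__(self, start=1):
        self.start = start
        self.used = set()

    def alloc(self):
        candidate = self.start
        while candidate in self.used:
            candidate += 1
        self.used.add(candidate)
        return candidate

    def free(self, ident):
        self.used.discard(ident)


ids = LowestFreeIds()
first_three = [ids.alloc() for _ in range(3)]   # lowest IDs come first
ids.free(2)
reused = ids.alloc()                            # the freed slot is reused
print(first_three, reused)
```

Under this model a handle as large as 0x65660a20 would imply well over a billion live handles, which is why the value looked suspicious.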
21:18 karolherbst: how can I enable stderr logging in chromium :/
21:19 karolherbst: that sandboxing thing is quite annoying for debugging
21:22 imirkin_: --no-sandbox?
21:22 imirkin_: use my launcher thing
21:24 karolherbst: mhh, that's not it :/
21:25 karolherbst: it doesn't use my libdrm
21:26 imirkin_: you have to do the LD_LIBRARY_PATH in the thing
21:26 karolherbst: I know
21:26 imirkin_: not in the global en
21:26 imirkin_: k
21:27 karolherbst: duh ... I wasn't using my local mesa right now
21:27 karolherbst: now it works
21:30 karolherbst: ufff, hopefully it wasn't one of the mt issues and I ran without my fixes
21:34 karolherbst: yay, the gles2 deqp tests: Failed: 2/14192 (0.0%)
21:34 karolherbst: dEQP-GLES2.functional.shaders.invariance.mediump.loop_3 and dEQP-GLES2.functional.rasterization.limits.points
21:35 karolherbst: " Fail (Detected variance between two invariant values)" *sigh*
21:36 karolherbst: that should be easy to fix
21:37 karolherbst: passes with OPTIMIZE=0, okay
21:37 karolherbst: nice
21:43 karolherbst: odd, it's just MUL/ADDs and there is no MAD in the shader
21:44 imirkin_: probably doesn't get marked as exact? i forget if we piped exact into tgsi or not?
21:44 karolherbst: it does
21:44 karolherbst: I fixed all that for tomb raider two years ago :p
21:44 karolherbst: and it still seems to get piped through
21:44 imirkin_: yeah, i remember you doing some fixes
21:45 imirkin_: but i just don't remember what they were :)
21:45 karolherbst: add _PRECISE to TGSI
21:45 karolherbst: and check that in codegen
21:45 imirkin_: and hopefully we avoid fusing mul+add in that case
21:45 karolherbst: exactly
21:45 imirkin_: not all our opts are exact
21:45 karolherbst: okay
21:45 imirkin_: in a floating point sense
21:45 karolherbst: it's constant folding
21:46 karolherbst: without constant folding it passes the test
21:46 imirkin_: doesn't necessarily mean it's constant folding
21:46 karolherbst: true
21:46 imirkin_: could be that constant folding exposes further opts :)
21:46 karolherbst: maybe
21:46 imirkin_: (that's happened to me before)
21:47 karolherbst: it's still constant folding :p
21:47 imirkin_: hehe ok
21:48 karolherbst: okay.. so what exactly in constant folding
21:56 karolherbst: imirkin_: 0x3a83126f * 0x3ff33333 might be the cause
21:56 imirkin_: float or int?
21:57 karolherbst: float
21:57 karolherbst: but...
21:57 imirkin_: hm. 0.001 * 1.9
21:58 imirkin_: do we get 0x3af9096c ?
21:58 karolherbst: of course codegen prints 0.001900 for the mov :)
21:58 imirkin_: heh
21:58 karolherbst: let me disassemble
21:59 karolherbst: yeah, we get 0x3af9096c
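[Editor's sketch] The bit patterns quoted above can be decoded and multiplied in single precision with the standard `struct` module. The helper names are illustrative. Rounding the double-precision product back to float32 is used as a stand-in for a native float32 multiply, which is exact here because the product of two 24-bit significands fits in a double.

```python
import struct


def bits_to_f32(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value):
    """Round a Python float to single precision and return its bits."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


a = bits_to_f32(0x3A83126F)   # nearest float32 to 0.001
b = bits_to_f32(0x3FF33333)   # nearest float32 to 1.9

product_bits = f32_to_bits(a * b)
print(a, b, hex(product_bits))
```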
21:59 imirkin_: hm
22:00 karolherbst: the other folds are 0 + constant
22:00 karolherbst: very unlikely to cause issues
22:00 imirkin_: yeah dunno
22:00 karolherbst: or rather 0 + 1 + 1 + ... + 1 chains
22:01 imirkin_: if you show me the shader, i might be able to tell (the glsl)
22:02 karolherbst: mhh, maybe it's the other shader...
22:02 imirkin_: i'm guessing it's not actually 0
22:02 imirkin_: and a + b + c != b + a + c
22:02 imirkin_: so one has to be careful.
22:02 imirkin_: but i think we are :)
22:02 imirkin_: [careful]
22:03 karolherbst: imirkin_: dumped shader_test files: https://gist.github.com/karolherbst/1401d2c991588a3b881dde8a9050f2c4
22:06 imirkin_: huh.
22:06 imirkin_: yeah, i think it's the (a + b) + c != a + (b + c) problem.
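[Editor's sketch] Floating-point addition is not associative, which is the property being referred to. A minimal demonstration with ordinary Python floats (doubles rather than the float32 values in the shader, chosen only because the example is well known):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # round after a + b, then add c
right = a + (b + c)   # round after b + c, then add a

print(left, right, left == right)
```

Each intermediate sum is rounded, so regrouping the operands changes which rounding errors occur. An optimizer that reassociates additions can therefore change the result of an expression that is supposed to be invariant.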
22:07 karolherbst: yeah.. looks like it a little
22:08 imirkin_: the 1.9 * 0.001 thing is actually ok
22:08 imirkin_: it's all the other stuff that's not.
22:09 karolherbst: IRs: https://gist.github.com/karolherbst/dfcb34c2e7c35c67c7da46f8c66bc4da
22:09 karolherbst: there is this one add with a rewritten source
22:11 imirkin_: i don't get it.
22:11 imirkin_: mov f32 %r709 7.000000 (0)
22:11 imirkin_: that's not used anywhere.
22:11 karolherbst: it is
22:11 karolherbst: in a future constant folding
22:11 karolherbst: mov f32 %r749 8.000000
22:11 karolherbst: was %709 + 1.0
22:11 imirkin_: but the value 709 is never used
22:12 imirkin_: or is this from the middle, pre-dce?
22:12 karolherbst: it was, but the use was constant folded ;)
22:12 karolherbst: add ftz f32 %r749 %r709 %r446 (0) -> mov f32 %r749 8.000000 (0)
22:12 imirkin_: 749 is also never used.
22:12 karolherbst: something lower with 9.0
22:12 karolherbst: constant folding is the last pass run
22:12 imirkin_: neither is 909.
22:13 karolherbst: mov f32 %r946 13.000000 is the last value
22:13 imirkin_: oh. so it is.
22:18 karolherbst: imirkin_: yeah.. so nothing really stands out apart from that "add ftz f32 %r940 %r926 %r456" -> "add ftz f32 %r940 %r915 %r456"
22:20 karolherbst: ohhh
22:20 karolherbst: that "mul ftz f32 %r926 %r915 %r448" mul gets folded in, interesting
22:21 karolherbst: + a rewrite of 915 to fold the constant "1.9 * 0.001" in
22:27 karolherbst: add(mul(mul(b, 1.9), 0.001), a) -> add(mul(b, 0.001900), a)
22:27 karolherbst: imirkin_: mul(mul(b, 1.9), 0.001) != mul(b, 0.001900) ?
22:27 karolherbst: makes sense, no?
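[Editor's sketch] Whether mul(mul(b, 1.9), 0.001) equals mul(b, 0.0019) in float32 depends on b, because the chained form rounds twice at run time while the folded form rounds once against a pre-rounded constant. The sketch below emulates float32 multiplication by rounding double-precision products back to single precision. The helper names and the choice of b = 7.0 are illustrative assumptions, not values from the test.

```python
import struct


def to_f32(value):
    """Round a Python float to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mul_f32(x, y):
    """Emulated float32 multiply: exact product, one rounding."""
    return to_f32(x * y)


K1 = to_f32(1.9)
K2 = to_f32(0.001)
FOLDED = mul_f32(K1, K2)   # the constant the optimizer would substitute


def chained(b):
    return mul_f32(mul_f32(b, K1), K2)


def folded(b):
    return mul_f32(b, FOLDED)


for b in (1.0, 7.0):
    print(b, chained(b), folded(b), chained(b) == folded(b))
```

For b = 1.0 both forms agree, but for b = 7.0 they differ, so collapsing the chain is not value-preserving in general even for unremarkable inputs.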
22:29 imirkin_: comment out the mul folding
22:29 imirkin_: see if it helps
22:29 imirkin_: except for crazy values of b, that shouldn't matter though
22:29 karolherbst: Passed: 1/1 (100.0%)
22:30 Lyude: skeggsb: mind giving me a poke when you get a chance? I would still like your opinion on the link training stuff for nouveua
22:30 Lyude: *nouveau
22:30 karolherbst: imirkin_: I guess we can't do that tryCollapseChainedMULs thing if the instruction is precise
22:30 imirkin_: i guess
22:31 karolherbst: still passes with all opts enabled now
22:33 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/aac689bda9389faa2b074ed71d603bd23eeb9c49
22:33 karolherbst: although... are we able to collapse if only the source is marked precise?
22:33 karolherbst: should we care as long as nothing complains?