00:00imirkin: airlied: ok, looks like his stuff is largely related to (surprise surprise) HDR. that's about 5 steps ahead of me.
00:02imirkin: i'm just trying to get things up and running. worrying about colorspaces / whitepoints is ... a lot fancier.
01:09Lyude: maxwell2 starts at nv120 right
01:11skeggsb: Lyude: yeah
01:33imirkin: wow, impressive patch on-list for a first contribution
03:08Lyude: imirkin: is that re: clockgating?
03:08imirkin: no, re some guy messing around with the disp sor assignment internals
03:08Lyude: ahh, ok
03:08imirkin: it'll ultimately probably not go upstream, but it shows a higher-than-usual level of competence :)
05:58Benau: imirkin would you be interested in looking at this trace (recorded on nouveau, gt240)
05:58Benau: seems that texture buffer object is not properly supported
08:59pmoreau: Lyude: Which branch to use on your GitHub for the latest clock gating version? I have been using wip/kepler+-clockgating-v1r5, but I don’t get anything printed saying clock gating is enabled, and that branch does not contain the 5th patch you were talking about earlier (that would enable the kernel param)
09:07pmoreau: Lyude: Nevermind, I switched to the series on the ML instead.
11:49imirkin: Benau: file a bug so i don't forget. that does seem odd. seems to work OK with a kepler board, so probably a nv50-era-specific bug
11:49imirkin: Benau: do you see any errors in dmesg perchance?
11:49imirkin: (like invalid opcode, or who knows)
11:49imirkin: of course you report this literally the day after i unplug my G92...
11:49imirkin: generically speaking, TBO's should work btw
11:50imirkin: so it's more that there's something odd going on here
11:50imirkin: than TBO's not working at all
11:52imirkin: my *guess*, without really doing any serious analysis, is that we're missing a flush
11:54Benau: well no shader compile error
11:55imirkin: Benau: are you able to apply a quick patch?
11:55imirkin: (to mesa)
11:55Benau: yes sure
11:56imirkin: actually ... hm. i don't even know what i'd flush.
11:56imirkin: TIC/TSC cache is still valid...
11:57imirkin: can you just replay the trace with ST_DEBUG=flush ?
11:57imirkin: i.e. ST_DEBUG=flush glretrace stk.trace
11:57imirkin: or wait. not ST_DEBUG. MESA_DEBUG
11:58imirkin: only works in a debug build though
12:00Benau: after 4 hours till i'm back home
12:01Benau: don't worry i have all the tools to build a debug mesa
12:02Benau: and btw i have gotten rid of all indirect draw calls in stk, so you can remove some related hacks
12:02Benau: (if any)
12:02imirkin: robclark will be happy
12:03imirkin: indirect draws should work fine on fermi+ though, and plain unsupported on tesla
12:04Benau: and i see u can close the related bug about stk-indirect hangs
12:04imirkin: those were fixed iirc
12:04imirkin: i was not aware of any outstanding issues with stk until just now
12:05imirkin: i did know that it did a bunch of *weird* stuff which tripped up freedreno
12:05imirkin: and there was a time when indeed it caused nouveau some grief, but that was like 2y ago
12:06imirkin: Benau: if you might also grab piglit, perhaps we just regressed TBO support. run some of the ARB_texture_buffer_object ( / ARB_texture_buffer_ranges) tests
12:06Benau: ok can do later
12:06imirkin: we have nothing in terms of CI, so it's all just manual testing, and older boards get less of it
12:07imirkin: (read: none)
13:58karolherbst: imirkin: the current gm107 emitter doesn't seem to like surface operations on MS images, but is something like "STORE IMAGE, TEMP.xyxw, TEMP, 2D_MSAA" legal in theory and we just have to work around something missing on the hardware or whatever?
13:58karolherbst: it doesn't seem like the PTX sust allows ms surfaces anyway
14:04imirkin_: karolherbst: we don't expose MSAA images on maxwell+
14:04imirkin_: it's legal in theory, we just don't support it
14:04karolherbst: imirkin_: uhh, mhh the CTS still uses those
14:04imirkin_: CTS bug
14:04karolherbst: I see
14:04imirkin_: or driver bug in exposing the wrong limits
14:05imirkin_: is this CL or GL?
14:07imirkin_: k. i know nothing about CL limits, so for all i know it's required there
14:07imirkin_: hmmm ... i wonder if it does something sneaky. like bind a 1x-msaa image to a image2DMS or something.
14:07karolherbst: I am sure that the CTS doesn't check for anything
14:07karolherbst: not really I think
14:08karolherbst: not quite sure though
14:09karolherbst: it does have a max_image_samples though
14:09imirkin_: so ideally max_image_samples == 1 for us.
14:09karolherbst: CTS expects 0
14:09imirkin_: (for maxwell+)
14:09imirkin_: oh. maybe it's supposed to be 0
14:09karolherbst: then it switches to 2D
14:09imirkin_: rtfs :)
14:10imirkin_: min value for MAX_IMAGE_SAMPLES == 0, so yeah, it's probably right
14:11imirkin_: change that to sample_count > 0
14:12imirkin_: and i sent a patch for the enhanced layouts fail
14:12karolherbst: yeah, I saw
14:12karolherbst: will test it later
14:12imirkin_: airlied was running into it on r600, figured i'd fix it on nvc0 :)
14:12imirkin_: but it actually needs testing on fermi
14:12imirkin_: since iirc fermi works slightly differently
14:12karolherbst: I was thinking about how to deal with the fp64 stuff, because I don't look forward having 5+ asm files for that :/
14:12imirkin_: could write the function with nv50 ir
14:13karolherbst: CFG will kill us here though
14:13karolherbst: but if we are careful enough, this might work
14:13imirkin_: like an actual Function
14:13karolherbst: and embed it whenever it is needed?
14:13imirkin_: no, called
14:13imirkin_: same as before. out-of-line.
14:13karolherbst: yeah right, but I meant the function itself
14:13imirkin_: oh yes.
14:14imirkin_: note that i've never tested this
14:14imirkin_: so ... start with a small function which just returns the orig value
14:14imirkin_: or something
14:14imirkin_: before you go to a lot of trouble
14:14karolherbst: or we wait until robclark finishes the fp64 emulation stuff and we just ask glsl to do that for us :p
14:15imirkin_: iirc we do better than glsl could
14:15karolherbst: ohh wait
14:15karolherbst: airlied wanted to do that
14:15imirkin_: i mean, more optimally
14:15karolherbst: yeah.... sure
14:15imirkin_: e.g. rsq64h
14:15karolherbst: but this seems to be not enough for the CTS afaik
14:15imirkin_: right, but ...
14:15imirkin_: the impl that DOES work still makes use of those
14:15imirkin_: as a better first guess.
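
[Editor's sketch] The "better first guess" idea above can be illustrated with a Newton-Raphson refinement of a reciprocal square root. This is only a minimal sketch under stated assumptions: `crude_rsqrt` and `refine_rsqrt` are hypothetical names, and rounding to float32 is merely a stand-in for a low-precision hardware estimate such as rsq64h. It is not the nouveau code or dboyan's implementation.

```python
import math
import struct


def crude_rsqrt(x: float) -> float:
    """Hypothetical low-precision estimate of 1/sqrt(x).

    Assumption: we fake the hardware approximation by rounding the
    exact result to float32 (about 24 bits of mantissa).
    """
    exact = 1.0 / math.sqrt(x)
    return struct.unpack("<f", struct.pack("<f", exact))[0]


def refine_rsqrt(x: float, guess: float, steps: int = 2) -> float:
    """Newton-Raphson on f(y) = 1/y^2 - x.

    Each step computes y <- y * (1.5 - 0.5 * x * y * y), which roughly
    doubles the number of correct bits when the guess is already close.
    """
    y = guess
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


if __name__ == "__main__":
    x = 2.0
    guess = crude_rsqrt(x)
    print("crude  :", guess)
    print("refined:", refine_rsqrt(x, guess))
    print("exact  :", 1.0 / math.sqrt(x))
```

The better the starting estimate, the fewer refinement steps are needed, which is presumably why a working fp64 implementation would still want the approximate instruction as its seed.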
14:16karolherbst: I see
14:16imirkin_: dboyan tested his impl extensively, and was within like 1 or 2 ULP's of the CPU result
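
[Editor's sketch] For readers unfamiliar with the "1 or 2 ULP's" measure: one common way to count units in the last place between two doubles is to compare their bit patterns as ordered integers. The helper names below (`ordered_bits`, `ulp_distance`) are hypothetical, and this is an assumption about how such a comparison could be done, not the test dboyan actually ran.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a finite double to an integer that grows monotonically with x."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles are sign-magnitude, so flip them onto the negative
    # side of the integer line; -0.0 and +0.0 both map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if equal)."""
    return abs(ordered_bits(a) - ordered_bits(b))


if __name__ == "__main__":
    # 1.0 + 2**-52 is the next representable double above 1.0.
    print(ulp_distance(1.0, 1.0 + 2.0 ** -52))
```

A GPU result would then be compared against the CPU reference with `ulp_distance(gpu_value, cpu_value) <= 2`. NaN and infinity are deliberately not handled here.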
14:16karolherbst: anyway, need to get ready for the plane
14:16imirkin_: kk. safe travels!
14:16karolherbst: I will think that through and maybe I come up with a good enough result
16:50karolherbst: got my devconf feedback today, seems like people were happy enough about it :)
16:55karolherbst: imirkin: uhm.. my tested-by is for Pascal...
16:56imirkin_: for what?
16:56karolherbst: "nvc0: collapse output slots to have adjacent registers"
16:57imirkin_: ok cool
16:57imirkin_: it worked on kepler too
16:57imirkin_: just need someone to test on fermi
16:57imirkin_: since it definitely does things different than kepler
16:57karolherbst: I see
16:57karolherbst: RSpliet might be able to test it?
16:57imirkin_: i never QUITE figured out how different
16:58imirkin_: see commit 39df725f731f75f488c75a4910169beb352213fb
16:58imirkin_: (that was a fun one to track down...)
16:58RSpliet: There is a gf119 in my desktop machine currently. Low on time though...
16:59karolherbst: imirkin_: ... meh
16:59RSpliet: Something something getting a PhD on track and maintaining sanity rah rah rah
17:00imirkin_: still at cambridge?
17:00karolherbst: RSpliet: give up on sanity, it is only useless most of the time anyway
17:00RSpliet: imirkin_: yep.
17:00RSpliet: karolherbst: "If you want to be number one, you need to be odd"
17:01imirkin_: or, to quote a horrible 80's movie, "If you want to be the best, you must lose your mind"
17:01imirkin_: (not actually an 80's movie, more like 80's-style-movie)
17:01karolherbst: RSpliet: At first I thought it was a random quote, but coming from you, there had to be a joke... which I just got a few seconds later
17:02RSpliet: imirkin_: Quite related to that horrible 90's quote "Of all the things I've lost, I miss my mind the most"
17:02imirkin_: that's a good one
17:06Benau: imirkin MESA_DEBUG=flush glretrace stk.trace doesn't seem to work...
17:08imirkin_: as in ... not at all, or just still misrendered?
17:09imirkin_: also, can you describe the misrender?
17:09imirkin_: i don't think you posted a screenshot
17:10Benau: just like in the issue ticket (only mesh rendered with non-skinned mesh shader is displayed)
17:10Benau: (which are the wheels)
17:10imirkin_: and can you confirm that you don't see anything odd in dmesg while the trace runs?
17:11imirkin_: can you point me to the comment in your bug which contains what you see?
17:11Benau: dmesg nor journalctl show anything useful
17:11imirkin_: ok - that's good
17:12imirkin_: do you just see what's in the very first screenshot in the github issue?
17:12imirkin_: that contains a trace, not what you see it render
17:14imirkin_: the link you gave points to your trace, not to what you see it render as
17:14imirkin_: are any of the screenshots representative of what you see, and if so, which ones
17:15Benau: you meant the link to the trace i recorded?
17:15imirkin_: the github bug has a number of screenshots
17:15imirkin_: are any of them representative of what you see when replaying the trace?
17:15imirkin_: (i don't have the hw plugged in, so i can't check easily)
17:16Benau_: seems that the motherboard doesn't like me running it naked without a computer case
17:16Benau_: hangs again
17:16Benau_: let me give you a screenshot of what i see
17:17imirkin_: careful about the stress the GPU puts on the PCIe slot
17:17imirkin_: if it has a big fan or whatever
17:17Benau_: nope the gt240 i used is a "cheap" card
17:17imirkin_: k. i have a gt240 with gddr5 which is 2-wide
17:22Benau_: this is what i see now
17:23imirkin_: nice, i like it ;)
17:23imirkin_: are you a developer working on stk? or just an unlucky user?
17:25Benau_: i'm one of the developers of stk
17:25imirkin_: oh cool
17:25imirkin_: if there's any way you could try to narrow down the issue, that'd be super helpful
17:25Benau_: and the kiki you see was made by me
17:26imirkin_: i.e. is it really related to TBO's?
17:27Benau_: i'm now trying to make it avoid uploading the skin matrix
17:27Benau_: every frame
17:29Benau_: i think it's related to tbo, mainly because if i use the gles build it works fine
17:29Benau_: (in gles it uses a 2d texture for the skinning matrices)
17:32Benau_: and it seems that even if i don't upload the skinning matrices every frame it still doesn't show the animated model
17:32Benau_: so maybe not flush related?
17:34Benau_: also if i edit the skinning shader to set joint_matrix = mat4(1.0); after the texelFetch, the static pose of kiki is shown
17:35Benau_: (texelFetch failed somehow?)
17:41Benau_: also even if i sample texelFetch(skinning_tex, 0) it still shows nothing
17:49imirkin_: Benau_: what if you have it just return vec4(1,1,1,1) instead of the texelFetch?
17:49imirkin_: (i.e. i dunno that texelFetch is the actual issue.)
17:49Benau_: well i think doing joint_matrix = mat4(1.0); after the texelFetch is the same thing?
17:50imirkin_: oh, i didn't read the shader
17:50imirkin_: sorry :)
17:50imirkin_: and with the joint_matrix = mat4(1) it "works"?
17:51imirkin_: (i.e. you get a white model or whatever it should be)
17:51Benau_: no it show a static pose
17:51Benau_: not animated
17:51imirkin_: should it be animated?
17:51imirkin_: (given that the joint_matrix == 1 always)
17:51Benau_: yes if you get the correct matrices from skinning_tex
17:51imirkin_: oh, what if you do
17:51Benau_: just like the correct apitrace replay
17:52imirkin_: joint_matrix = mat4(i_weight + i_weight + i_weight + i_weight)
17:53imirkin_: i.e. the equivalent of a vec4(1) being returned for each of the texelFetch's
17:54imirkin_: also ... can i assume that there's no funny float business in that RGBA32F texture? like no denorms that you expect to come out as 0's, no nan/inf, etc?
17:55Benau_: joint_matrix = mat4(i_weight + i_weight + i_weight + i_weight) shows a static pose
17:55Benau_: and i sure that no nan in tbo
17:56imirkin_: well phooey
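
[Editor's sketch] Why forcing the joint matrix to identity yields a static pose can be seen from a plain linear-blend-skinning calculation. The snippet below is a hedged CPU-side illustration only: `blend`, `mat_vec`, and `skin` are hypothetical helpers, the four-joint layout is an assumption, and none of this is the actual stk shader.

```python
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def blend(matrices, weights):
    """Weighted sum of 4x4 joint matrices (the blended joint matrix)."""
    return [
        [sum(w * m[r][c] for m, w in zip(matrices, weights)) for c in range(4)]
        for r in range(4)
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin(vertex, matrices, weights):
    """Transform one vertex by the weighted blend of its joint matrices."""
    return mat_vec(blend(matrices, weights), vertex)


if __name__ == "__main__":
    vertex = [1.0, 2.0, 3.0, 1.0]
    weights = [0.5, 0.25, 0.25, 0.0]

    # All joints identity: the vertex does not move, i.e. a static pose.
    print(skin(vertex, [IDENTITY] * 4, weights))

    # One joint translates x by 2; with weight 0.5 the vertex moves by 1.
    moved = [row[:] for row in IDENTITY]
    moved[0][3] = 2.0
    print(skin(vertex, [moved, IDENTITY, IDENTITY, IDENTITY], weights))
```

With identity (or any constant) joint matrices the output never changes from frame to frame, so a static pose only shows that the rest of the pipeline works; the animation has to come from the per-joint matrices fetched from the buffer texture.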
18:18Benau_: btw does mesa always unroll a for (int i = 0; i < 4; i++) loop the way my skinning shader is written?
18:18Benau_: (if 4 is constant)
18:23Benau_: time sleep see ya tmr
19:33dagb: imirkin_/Lyude: therm: Clockgating enabled
19:34Lyude: dagb: nice :)
19:57imirkin_: dagb: getting lower power usage?
20:16dagb: imirkin_: nothing major at idle, anyway. Not that I expected it.
20:17dagb: Powertop claims 19.8W at idle with 10% backlight.
20:23imirkin_: o well
20:23imirkin_: oh, but when the gpu's suspended, clockgating doesn't matter :)
20:25karolherbst: well at 20W you want to hope that your GPU is still on :p
20:25karolherbst: otherwise you have other serious problems