12:53 AndrewR: imirkin, hello! I saw your vdpau testing request on mesa-devel. Mplayer + vdpau (presentation) seems to work for me on mesa master (Mesa 17.3.0-devel (git-288621b1b7)) with dri3. Kernel is 4.12.0-x64; mesa and most libs are compiled with gcc 4.9.2/4.8.4. Not sure if I should just leave mplayer running for a few hours, but last night I was reviewing a 50-minute sequence and it worked just fine...
12:54 imirkin: AndrewR: with what hw?
12:54 imirkin: your G92?
12:56 AndrewR: imirkin, yes
12:56 imirkin: AndrewR: ok. for me it was with hw decoding.
12:56 imirkin: good to know that presentation's not totally fucked
12:56 imirkin: i meant to try that, but my box hung before i got the chance :)
12:59 AndrewR: imirkin, for me there was an issue with a really long X session/uptime and many s2ram's: the vdpau window just started black. It was fixed by running glxgears before starting the player... so, vblank(s) tend to stop working after some time? (sometimes I get unsynchronized glxgears after days of uptime, but right now it's synchronized by default to my vrefresh at 75 Hz)
13:00 imirkin: weird.
13:52 imirkin: is there a brave soul to get more info on https://bugs.freedesktop.org/show_bug.cgi?id=102337 ?
13:53 imirkin: [allegedly kills the box]
13:55 pmoreau: imirkin: I can have a try tonight.
14:31 karolherbst_: :/ I need access to a system with a 4.5 radeon GPU to test stuff on it
15:24 karolherbst: imirkin: are there any gallium switches to somehow change the behaviour regarding inputs for image_load_store related shaders? Because currently it either looks like some gallium magic or we just don't get all the information we actually need to properly support 3d images
15:24 karolherbst: or maybe it's completely missing in gallium, no idea
15:34 imirkin_: karolherbst: what don't we get?
15:35 imirkin_: karolherbst: most of the info is obtained from the image struct which is somewhere in constbuf-land
15:35 karolherbst: I don't know the technical details, but nvidia does something like "$r0 = $r0 * c0[0xf00] + c0[0xf08]" and "$r1 = $r1 * c0[0xf04] + c0[0xf0c]" and feeds both values into the suclamp calls
15:35 imirkin_: right
15:35 karolherbst: but nouveau does more like "$r0 = $r0 * c0[0xf00] + c0[0xf04]"
15:35 imirkin_: so you need to figure out what it sticks into 0xf0c
15:36 imirkin_: i THINK that nouveau tries to mirror nvidia's structure
15:36 karolherbst: and sticks that into suclamp, and the other value unmodified
15:36 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c#n811
15:36 karolherbst: ahhh okay
15:36 karolherbst: and that also influences the generated TGSI?
15:37 imirkin_: info[3] = (0x88 << 24) | (lvl->pitch / 64);
15:37 imirkin_: oh goodie.
15:37 imirkin_: sure am glad calim figured that out :)
15:37 karolherbst: :)
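The info[3] line pasted above packs two fields into one 32-bit word: a constant tag in the top byte and the level pitch in 64-byte units in the low bits. A minimal sketch of packing and unpacking it; the helper names are hypothetical, the meaning of 0x88 is treated as opaque, and the assumption that the pitch field is 24 bits wide is read off the expression itself rather than taken from nvc0_tex.c:

```c
#include <stdint.h>

/* Hypothetical helpers mirroring
 *   info[3] = (0x88 << 24) | (lvl->pitch / 64);
 * Assumption: the pitch in 64-byte units fits in the low 24 bits. */
static uint32_t info3_pack(uint32_t pitch_bytes)
{
    /* Unsigned constant: 0x88 << 24 does not fit in a 32-bit signed int. */
    return (UINT32_C(0x88) << 24) | (pitch_bytes / 64);
}

/* Opaque tag stored in the top byte. */
static uint32_t info3_tag(uint32_t info3)
{
    return info3 >> 24;
}

/* Pitch in bytes, recovered from the low 24 bits. */
static uint32_t info3_pitch_bytes(uint32_t info3)
{
    return (info3 & 0x00ffffffu) * 64;
}
```

For example, a 4096-byte pitch packs to 0x88000040. Because info is an array of 32-bit words, this value sits at byte offset 0xc of the surface descriptor.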
15:37 imirkin_: no, the TGSI just gets an image reference
15:37 karolherbst: mhhh
15:37 imirkin_: e.g. IMAGE[1]
15:37 karolherbst: because the TGSI is already borked afaik
15:37 imirkin_: and then the driver has to know wtf to do with it
15:37 imirkin_: well, if the TGSI is wrong, there's no way out ;)
15:37 karolherbst: yeah
15:37 imirkin_: however i doubt that
15:37 karolherbst: that's why I want to see what radeonsi is doing
15:38 karolherbst: well, I doubt that as well
15:38 imirkin_: pastebin the shader + tgsi you think are wrong?
15:38 karolherbst: but you know, I want to be sure about it
15:38 karolherbst: I don't have it with me right now, could do in around 3 hours
15:38 karolherbst: maybe 2
15:38 imirkin_: sure, whenever
15:39 karolherbst: I have the shader though: https://gist.github.com/karolherbst/190939c7f056c040a320dbaa01b4c1e8
15:39 karolherbst: well
15:39 karolherbst: the nvc0ir thing
15:39 karolherbst: vs nvidia: https://gist.github.com/karolherbst/d8a3e38d2a6d5ccd200486a71dd620da
15:39 karolherbst: although I changed the nvidia shader a lot to make it more structured
15:39 imirkin_: but all those loads from constbuf are entirely driver-side
15:39 imirkin_: not tgsi-side
15:39 imirkin_: i.e. the driver has to know how to convert the 3d coordinates
15:40 karolherbst: the fma thing is inside the TGSI
15:40 imirkin_: into something that the underlying hw understands
15:40 karolherbst: "fma ftz rn f32 $r1 $r1 c0[0x0] $r2"
15:40 karolherbst: this comes from TGSI
15:40 imirkin_: ok
15:40 karolherbst: and nvidia does something else there
15:40 imirkin_: which i'm sure is because there's something to that effect in the shader.
15:40 karolherbst: not sure if due to gallium magic
15:40 karolherbst: or not
15:41 imirkin_: almost certainly "not"
15:41 imirkin_: the shader has a mul+add...
15:42 karolherbst: mhhh
15:43 karolherbst: I will have another look later. Just hoped you had a few nice pointers I could follow, because a lot of it actually looks right. Well, except for the big else clauses nouveau doesn't "implement"
15:43 karolherbst: or rather implements differently
15:43 imirkin_: right, so ignore that entirely
15:43 imirkin_: i think it's an opt
15:43 karolherbst: yeah
15:43 karolherbst: me as well
15:43 imirkin_: hardly required for correctness
15:43 karolherbst: because the code seemed to be there
15:44 imirkin_: if the code is the same
15:44 karolherbst: ohh, I meant the outer else branches though
15:44 imirkin_: that would suggest that they de-tile the 3d image first
15:44 imirkin_: which is definitely a requirement on fermi
15:44 imirkin_: but it might also be the way that kepler is handled
15:44 karolherbst: what caught my eye as well are those prmt instructions
15:45 imirkin_: yeah i totally don't remember what they do (if i ever did)
15:46 imirkin_: might be something to do with swizzling (due to tiling), dunno
15:46 karolherbst: odd bit swapping things
15:46 imirkin_: swizzling is a lot of bit swapping btw
15:46 karolherbst: "Permute bytes from register pair."
15:46 imirkin_: ah fun
15:46 karolherbst: yeah
15:46 karolherbst: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-prmt
15:47 imirkin_: lol
15:47 imirkin_: not exactly intuitive, is it
15:47 karolherbst: not really
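For reference, a minimal C model of the generic (no-mode) form of PTX prmt.b32 as the linked PTX documentation describes it: the two sources form an 8-byte pool {b, a}, and each 4-bit nibble of the selector c picks one byte for the result, with the nibble's top bit replicating that byte's sign bit instead. This models the PTX instruction only; whether the SASS prmt in the nvidia dump uses the same operand order and modes is an assumption.

```c
#include <stdint.h>

/* Model of PTX "prmt.b32 d, a, b, c" in its default mode.
 * Source bytes 0..3 come from a, bytes 4..7 from b. */
static uint32_t prmt_b32(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t src = ((uint64_t)b << 32) | a;
    uint32_t d = 0;

    for (int i = 0; i < 4; i++) {
        uint32_t sel = (c >> (4 * i)) & 0xf;              /* selector for result byte i */
        uint32_t byte = (uint32_t)(src >> (8 * (sel & 7))) & 0xff;
        if (sel & 8)                                     /* msb set: replicate sign */
            byte = (byte & 0x80) ? 0xff : 0x00;
        d |= byte << (8 * i);
    }
    return d;
}
```

For example, prmt_b32(x, 0, 0x0123) reverses the byte order of x, the sort of byte shuffling imirkin_ suspects may be tiling-related.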
15:48 karolherbst: but well, the output looked right format-wise
15:48 karolherbst: just cut off, as if some pixels weren't transferred
15:54 karolherbst: and I think I need to understand what those instructions are doing anyway
15:55 imirkin_: right - the 3d tiling isn't being considered.
16:15 karolherbst: imirkin: who should I ping again regarding the format related patches I sent?
17:15 karolherbst: imirkin: that's the tgsi: https://gist.github.com/karolherbst/bbeda4fb3d4a92c8156f22686ae59f3d
17:18 karolherbst: and here the GLSL: https://gist.github.com/karolherbst/ce5aa1dfda111895b0e31574dc091a53
17:21 karolherbst: ohhh
17:21 karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c#n916 gets triggered
17:21 karolherbst: /* doesn't work if z passes z-tile boundary */
17:21 karolherbst: kind of makes sense
17:24 imirkin_: right.
17:24 karolherbst: okay, it looks like bad handling here is the problem
17:24 imirkin_: right :)
17:25 imirkin_: or the driver de-tiles
17:25 imirkin_: that thing should actually look at the tiling of the resource
17:25 imirkin_: and bail if it's z-tiled
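A minimal sketch of the check suggested above, under stated assumptions: each level records its z tile depth as a log2 shift (0 meaning a tile is one slice deep, i.e. not tiled in z). The struct and function names are hypothetical and are not nouveau's miptree API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-level tiling description. */
struct level_desc {
    uint32_t tile_shift_z; /* log2(tile depth in slices), 0 = not z-tiled */
};

/* True if slices z0..z1 (inclusive) do not all sit in the same z tile,
 * i.e. the case "doesn't work if z passes z-tile boundary". */
static bool crosses_z_tile(const struct level_desc *lvl, uint32_t z0, uint32_t z1)
{
    return (z0 >> lvl->tile_shift_z) != (z1 >> lvl->tile_shift_z);
}

/* Conservative rule: bail (fall back, or de-tile first) whenever the
 * resource is tiled in z at all. */
static bool simple_3d_path_ok(const struct level_desc *lvl)
{
    return lvl->tile_shift_z == 0;
}
```

Bailing on any z tiling is coarser than checking each access with crosses_z_tile, but it does not depend on knowing which slices a shader will touch.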
17:29 karolherbst: that #if 0 block also looks interesting
17:34 karolherbst: sadly nvidia doesn't document the internal suld and sust instructions
17:35 karolherbst: imirkin_: am I right in saying that this tex information is mapped into c15[0x460+] or is it the stuff put into c0 space?
17:35 imirkin_: well, they're pretty straightforward
17:35 imirkin_: it's in c15-space
17:35 karolherbst: okay
17:35 karolherbst: where does the data in c0 come from, the shader?
17:36 karolherbst: aka the application?
17:36 imirkin_: depends
17:36 imirkin_: application and/or higher layers
17:36 karolherbst: okay
17:37 imirkin_: btw, dunno if that's always the case on nvidia
17:37 karolherbst: I am really interested what ends up inside c0[0xf00] to c0[0xf0c] for nvidia
17:37 imirkin_: same stuff as for us
17:37 imirkin_: surface descriptor info
17:37 karolherbst: well okay
17:37 karolherbst: but we only use 0x0 and 0x4
17:37 imirkin_: well ... it's in the trace :)
17:37 karolherbst: :) true
17:38 imirkin_: also i suspect it's the line i pointed out
17:38 imirkin_: info[3] = whatever
17:38 karolherbst: ohh
17:38 karolherbst: it's 0
17:38 karolherbst: and 4 as well
17:38 karolherbst: ohh wait
17:39 karolherbst: no
17:39 imirkin_: well, info is an array of ints.
17:39 karolherbst: only for PIPE_BUFFER
17:39 imirkin_: so +0xc is really info[3]
17:39 imirkin_: aka the thing i pasted.
17:39 karolherbst: okay
17:40 karolherbst: info[3] maps to c0[0xc], or is there some special handling?
17:40 imirkin_: huh?
17:40 imirkin_: it's the image descriptor that lives in the driver constbuf, for nouveau
17:41 imirkin_: nvidia has clever logic for where to stick all that stuff
17:41 imirkin_: but that doesn't change the contents of the data
17:41 karolherbst: yeah sure, I just wanted to know where to find it inside the c0 space then
17:41 karolherbst: for nouveau
17:41 imirkin_: it's in c15
17:41 karolherbst: okay
17:41 imirkin_: PUSH_DATAh(push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(s));
17:42 imirkin_: PUSH_DATA (push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(s));
17:43 imirkin_: oh, i guess the relevant bit is
17:43 imirkin_: PUSH_DATA (push, NVC0_CB_AUX_SU_INFO(i));
17:43 imirkin_: in nve4_update_surface_bindings
17:43 imirkin_: for the position
17:44 imirkin_: in that constbuf
17:52 karolherbst: okay, now I need to understand how to figure out what is inside which constbuf in that mmt trace
17:52 imirkin_: look for uploads to the address you see in the shader
17:53 karolherbst: like if I see c0[0xf00] I should look for f00?
17:54 karolherbst: I was expecting something like: this is a constbuf with following data, and this constbuf gets mapped into cX
17:54 imirkin_: of c0
17:54 imirkin_: well, for constbufs to work well
17:54 imirkin_: you have to upload the data via the constbuf methods
17:54 imirkin_: look at how nouveau does it
17:54 imirkin_: and look for similar patterns
17:54 karolherbst: yeah, well I see those inside the mmt trace already
17:54 imirkin_: specifically CB_POS
17:54 karolherbst: something like this? "GK104_3D.CB_POS = 0x160"
17:55 imirkin_: yes
17:55 imirkin_: but 0xf00
17:55 imirkin_: ;)
17:55 karolherbst: okay
17:55 karolherbst: ahh
17:56 karolherbst: okay, and how do I figure out that it's c0 and not c1?
17:56 imirkin_: look above CB_POS
17:56 imirkin_: that will have a buffer's address
17:56 karolherbst: okay, 0x1005c0000 in this case
17:56 karolherbst: ohhh
17:56 karolherbst: okay
17:56 karolherbst: I see
17:56 imirkin_: and somewhere that address will be bound to an index.
17:58 karolherbst: okay, makes sense
18:00 karolherbst: I guess it's "GK104_3D.CB_BIND[0] = { VALID | INDEX = 0 }"
18:01 imirkin_: yes.
18:01 karolherbst: mhhh
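The lookup procedure described here (find the CB_POS write, look above it for the buffer address, then find where that address gets bound to an index) can be sketched as a tiny replay model. Assumptions, not checked against hardware documentation: the address methods select the buffer, CB_POS sets a byte offset, each CB_DATA word advances that offset by 4, and CB_BIND binds the currently selected buffer to the given index (the stage in CB_BIND[stage] is ignored). All names and limits are hypothetical.

```c
#include <stdint.h>

#define MAX_BUFS  4
#define BUF_WORDS 1024 /* 4 KiB per shadow buffer */
#define MAX_SLOTS 16

/* Zero-initialize before use (e.g. declare it static). */
struct cb_tracker {
    uint64_t addr[MAX_BUFS];            /* GPU address of each shadow buffer */
    uint32_t data[MAX_BUFS][BUF_WORDS]; /* shadow contents */
    int nbufs;                          /* shadow buffers in use */
    int cur;                            /* selected buffer, valid only if < nbufs */
    uint32_t pos;                       /* byte offset set by CB_POS */
    int bound[MAX_SLOTS];               /* slot -> buffer index + 1, 0 = unbound */
};

/* CB address methods: select (or start shadowing) the buffer at addr. */
static void cb_address(struct cb_tracker *t, uint64_t addr)
{
    t->cur = -1;
    for (int i = 0; i < t->nbufs; i++)
        if (t->addr[i] == addr)
            t->cur = i;
    if (t->cur < 0 && t->nbufs < MAX_BUFS) {
        t->addr[t->nbufs] = addr;
        t->cur = t->nbufs++;
    }
}

static void cb_pos(struct cb_tracker *t, uint32_t pos)
{
    t->pos = pos;
}

/* CB_DATA: store one word at the current offset, then advance it. */
static void cb_data(struct cb_tracker *t, uint32_t word)
{
    if (t->cur >= 0 && t->cur < t->nbufs && t->pos / 4 < BUF_WORDS)
        t->data[t->cur][t->pos / 4] = word;
    t->pos += 4;
}

/* CB_BIND[stage] = { VALID | INDEX = index } */
static void cb_bind(struct cb_tracker *t, unsigned index)
{
    if (t->cur >= 0 && t->cur < t->nbufs && index < MAX_SLOTS)
        t->bound[index] = t->cur + 1;
}

/* Value a shader would read from c<index>[offset]; 0 if unknown. */
static uint32_t cb_read(const struct cb_tracker *t, unsigned index, uint32_t offset)
{
    if (index >= MAX_SLOTS || offset / 4 >= BUF_WORDS || t->bound[index] == 0)
        return 0;
    return t->data[t->bound[index] - 1][offset / 4];
}
```

Replaying the writes seen in the trace (buffer 0x1005c0000, CB_POS = 0xf00, the following data words, then CB_BIND[0] = { VALID | INDEX = 0 }) lets cb_read(&t, 0, 0xf0c) answer what ends up in c0[0xf0c].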
18:21 karolherbst: *sigh* it would be much easier to actually debug these things if at least intel passed all those tests
18:21 karolherbst: and didn't crash
18:21 karolherbst: all the KHR-GL44.enhanced_layouts failures are bugs in mesa core
18:26 karolherbst: GL_ARB_enhanced_layouts, out: " layout (location = 2) vec4 gohan;" but in is "vec4 gohan;", mesa reports an error: definitions do not match, fun
19:32 karolherbst: mhh, KHR-GL44.robust_buffer_access_behavior.texel_fetch should be easy to fix. will work on this for now until I have time again to dig into images more deeply
20:39 karolherbst: imirkin: nouveau doesn't implement robustness for texfetch; should we add unconditional out-of-bounds checks for now, or should we actually check whether a robustness context was requested?
20:48 tobijk: karolherbst: the more fine-grained approach would be nice; what work is needed for that? only a context check in one place?
20:48 karolherbst: tobijk: I suppose a check inside the shader
20:49 tobijk: i'm not familiar with robustness, can that be requested at any time? or only at compile time?
20:49 karolherbst: we do it for other things as well
20:49 karolherbst: well
20:50 karolherbst: if you have inputs to an instruction such that you access something out of bounds, it has to return 0 (most of the time, or always)
20:50 karolherbst: and usually you don't know it before
20:51 karolherbst: I think
20:51 karolherbst: not quite sure
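As a rough model of the rule described here (an out-of-bounds fetch returns 0 rather than whatever happens to be in memory), a software texel fetch with explicit bounds checks might look like this. The texture layout (tightly packed 32-bit texels per level) and all names are assumptions for illustration; the hardware may handle this through a descriptor bit or a method instead of shader code.

```c
#include <stdint.h>

/* Hypothetical texture representation: one tightly packed array per level. */
struct mip_level {
    uint32_t width, height;
    const uint32_t *texels; /* width * height entries */
};

struct texture2d {
    uint32_t num_levels;
    const struct mip_level *levels;
};

/* texelFetch-style lookup: any out-of-range level or coordinate yields 0. */
static uint32_t texel_fetch_robust(const struct texture2d *tex,
                                   int32_t x, int32_t y, int32_t lod)
{
    if (lod < 0 || (uint32_t)lod >= tex->num_levels)
        return 0;
    const struct mip_level *l = &tex->levels[lod];
    if (x < 0 || y < 0 || (uint32_t)x >= l->width || (uint32_t)y >= l->height)
        return 0;
    return l->texels[(uint32_t)y * l->width + (uint32_t)x];
}
```

Doing the checks in the shader like this is the "unconditional" option; a per-context flag would only emit them when a robust context was requested.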
20:51 tobijk: we can crash if khr_robustness, or whatever it's called, is not wanted :>
20:51 karolherbst: I don't know how texfetch works to be honest
20:51 karolherbst: ... doesn't really help
20:51 karolherbst: it's about instructions like these: texfetch 2D $r8 $s0 f32 $r0q $r0d
20:51 tobijk: well i guess you are right, that is part of 4.4 or 4.5 so it's always there
20:51 karolherbst: I think $r and $s are the coordinates?
20:52 karolherbst: and $r0d the sampler?
20:52 karolherbst: I really don't know though
20:52 tobijk: me neither, sorry
20:53 karolherbst: ahh okay, coordinates in textures are s t r q
20:54 karolherbst: let's see what nvidia does
21:00 karolherbst: meh
21:00 karolherbst: how easy
21:02 tobijk: ?
21:02 karolherbst: weird shaders they generate
21:03 karolherbst: https://gist.github.com/karolherbst/165023dea346c7f203d8e611e3a25d3c
21:03 karolherbst: ohh wait
21:03 karolherbst: those are two shaders
21:03 karolherbst: silly grep
21:10 karolherbst: they do some kind of magic somewhere
21:10 tobijk: well i haven't seen the shader :D
21:10 karolherbst: the gist
21:11 karolherbst: it is nothing inside the shader most likely
21:11 tobijk: i thought that were 2 shaders and some bullshit :D
21:11 karolherbst: I am sure it's some kind of flag we can set somewhere
21:11 tobijk: ok will have a look
21:11 karolherbst: and then texfetch will return 0 at out of bound accesses
21:13 tobijk: what does our texfetch look like for this?
21:14 karolherbst: the same
21:14 karolherbst: the entire shaders are the same
21:14 karolherbst: except
21:14 tobijk: :o
21:14 karolherbst: we do all instead of live
21:15 tobijk: what does the $r63 do? that looks like it could be the switch
21:15 karolherbst: $r63 is 0
21:15 tobijk: $r63 = 0 if i'm correct?
21:16 imirkin_: karolherbst: see what blob does
21:16 imirkin_: karolherbst: afaik that's not necessary...
21:16 imirkin_: karolherbst: what do we return for texfetch?
21:16 karolherbst: some garbage value
21:16 karolherbst: 0x2000020 or something
21:16 imirkin_: hm ok
21:16 imirkin_: i wouldn't be surprised if there were some method we had to set to 0
21:16 karolherbst: sometimes 0x2000000
21:16 imirkin_: like the vertex data runout
21:16 karolherbst: yeah
21:17 karolherbst: I have a mmt trace
21:17 imirkin_: karolherbst: well check their shader
21:17 imirkin_: if they have robustness shit in there, then i guess we have to as well
21:17 imirkin_: if not, we have to figure out how to do it properly
21:18 karolherbst: at least I know we can set liveOnly also for OP_TXF
21:18 karolherbst: nope
21:18 karolherbst: no robustness shit in there
21:18 imirkin_: of course we can.
21:18 tobijk: imirkin_: yeah, either it's a per-context setting we don't see in the shader, or it's an addition to the texfetch thingy :>
21:19 imirkin_: could also be something in the descriptor
21:19 karolherbst: imirkin_: yeah, just saying because my initial opt was only for OP_TEX
21:20 karolherbst: imirkin_: https://gist.github.com/karolherbst/eb41cb07c25c9a5cf1da5a54f3a28781
21:20 imirkin_: right
21:21 karolherbst: so, it's a method I suppose
21:22 imirkin_: or a bit in the descriptor
21:22 karolherbst: I thought the header is a descriptor
21:22 imirkin_: the texture descriptor
21:22 imirkin_: TIC
21:22 karolherbst: ohh, right
21:22 tobijk: texfetch p lzero live $r0:$r1:$r2:$r3 t2d $t8 $s0 $r0:$r1 () vs texfetch p llod live $r0:$r1:$r2:$r3 t2d $t8 $s0 $r0:$r1 $r63
21:22 karolherbst: tobijk: first vs last shader from trace
21:22 tobijk: meh :/
21:22 imirkin_: tobijk: lzero has a lod of 0
21:23 imirkin_: tobijk: llod has an explicit lod
21:23 tobijk: k
21:23 imirkin_: avoids an unnecessary "0" argument, which is *extremely* common
21:23 karolherbst: I greped TIC from the mmt: https://gist.githubusercontent.com/karolherbst/4f9f2f2493b68542afd2ce75ab7777d3/raw/257b0fc74be587387bc6e6ff3712388cbf3071ea/gistfile1.txt
21:25 imirkin_: probably a runout thing...
21:25 imirkin_: grep for RUNOUT
21:25 imirkin_: should find a vertex thing
21:25 imirkin_: look for random addresses to unknown methods around it
21:25 karolherbst: GK104_3D.UNK1A2C[0] = 0
21:26 karolherbst: yeah
21:26 karolherbst: might be it
21:26 imirkin_: i doubt it'd be an explicit value
21:26 imirkin_: but i guess you never know
21:26 imirkin_: more likely an address which contains the value in question
21:27 karolherbst: https://gist.githubusercontent.com/karolherbst/3f060e59201027cfdf525f01bf47bc7c/raw/5da6051001bc802959e288312932b4343a931a8d/gistfile1.txt
21:27 karolherbst: I would try that unk1a2c thing for now
21:29 tobijk: does the [0] in GK104_3D.UNK1A2C[0] match at least?
21:29 karolherbst: now I have to figure out where to call that method
21:31 karolherbst: they write a lot of 0s into 0x100420000
21:32 imirkin_: that's the IB address
21:32 imirkin_: those 0's go into the vertex runout
21:32 imirkin_: although... that is more 0's than i expected
21:32 imirkin_: i think we only allocate 16 bytes towards it
21:33 karolherbst: there are a _lot_ of those
21:33 karolherbst: more like 50 in total
21:33 tobijk: karolherbst: always packs of 3?
21:33 karolherbst: no idea
21:43 karolherbst: imirkin_: maybe we don't clear the memory to 0?
21:44 imirkin_: what memory
21:44 imirkin_: i actually wonder if the coordinates are getting clamped
21:45 karolherbst: by what?
21:45 imirkin_: and so we're returning the texel value from where the coordinates are at
21:45 karolherbst: ohhh
21:47 karolherbst: but why should that happen
21:51 karolherbst: okay
21:51 karolherbst: we return random data
21:52 karolherbst: or more like whatever is inside memory now
21:54 karolherbst: uhm....
21:55 karolherbst: they do odd things
22:05 karolherbst: :(
22:05 karolherbst: imirkin_: okay, they simply use an uninitialized texture
22:12 karolherbst: imirkin_: okay, this is what the test does: glGenTextures, glBindTexture, glTexStorage2D(levels = 2, 32x32), glTexSubImage2D(level = 1)... rendering
22:16 karolherbst: fun test
22:16 karolherbst: basically intel passes this test, because all uninitialized memory is simply 0, cheating :D
22:17 karolherbst: so there is no out of bound access from the texture
22:17 karolherbst: it's just never filled with any values
22:18 karolherbst: checking what nvidia does
22:19 imirkin_: what lod do they fetch from?
22:21 karolherbst: I doubt they set any for the first tests
22:21 imirkin_: texelFetch() takes a lod
22:21 imirkin_: no way around it.
22:22 karolherbst: 1
22:22 karolherbst: last argument, right?
22:22 imirkin_: maybe
22:22 karolherbst: well, the others are called uni_texture and point
22:22 imirkin_: yes, last arg
22:22 karolherbst: mhh odd
22:22 karolherbst: fun
22:22 imirkin_: so it initializes lod 1
22:22 karolherbst: level 0 is uninit
22:22 karolherbst: and 1 is set to something useful
22:22 imirkin_: and fetches from it
22:23 karolherbst: and nvidia/intel end up with a color_attachment0 containing the data from level 1
22:23 karolherbst: but nouveau has only garbage there
22:23 karolherbst: as if taken from level 0
22:23 imirkin_: well, your shader *did* have lzero
22:23 imirkin_: which is lod 0
22:23 karolherbst: yeah, might be the issue
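A minimal sketch of why the lzero form would produce exactly this symptom, assuming the usual max(1, size >> level) mip sizing: glTexStorage2D with 2 levels at 32x32 gives a 32x32 level 0 (left uninitialized) and a 16x16 level 1 (filled by glTexSubImage2D). If the compiler emits lzero while the shader's texelFetch asks for lod 1, the fetch reads level 0. The enum and helper names are hypothetical, not nouveau's codegen API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of mip level `level` for a base dimension: max(1, base >> level). */
static uint32_t mip_dim(uint32_t base, uint32_t level)
{
    uint32_t d = base >> level;
    return d ? d : 1;
}

/* The two texfetch forms seen in the dumps: "lzero" implies lod 0 and drops
 * the lod operand; "llod" takes an explicit lod from a register. */
enum lod_mode { LOD_ZERO, LOD_EXPLICIT };

/* lzero is only a valid choice when the lod is known to be the constant 0. */
static enum lod_mode pick_lod_mode(bool lod_is_const, int32_t const_lod)
{
    return (lod_is_const && const_lod == 0) ? LOD_ZERO : LOD_EXPLICIT;
}

/* The level the fetch actually reads under each form. */
static int32_t effective_lod(enum lod_mode mode, int32_t shader_lod)
{
    return mode == LOD_ZERO ? 0 : shader_lod;
}
```

With the correct choice, pick_lod_mode(true, 1) selects the explicit form and the fetch reads level 1; forcing LOD_ZERO turns the same request into a read from the uninitialized level 0.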
22:24 karolherbst: nice, qapitrace shows the NVxp5.0 shaders
22:25 tobijk: imirkin_: first he showed me a shader with $r63 at the end
22:25 imirkin_: well, $r63 = 0 as well (for pre-GK110)
22:25 tobijk: maybe they pass in something else on _some_ shaders
22:26 karolherbst: anyway
22:26 tobijk: hard to tell without all the shaders :/
22:26 karolherbst: it's not an out of bound access
22:27 karolherbst: anyway, going to bed now. Maybe I figure something out tomorrow
22:27 tobijk: gn8
22:30 karolherbst: skeggsb: still waiting for reviews of my other series ;) and uhm... when I find more time for nouveau in the future, I might even finish all the work I started on the kernel side; dynamic reclocking is pretty high on my todo list too, and I would like to have at least most of the groundwork done by then.