00:01 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/5f1cb0378977a67e31be115fd354e63b71b8ccc6
00:03 imirkin: karolherbst: worksforme. should make that a helper though - that pattern is in a lot of places.
00:03 imirkin: like the whole "upload a bunch of stuff"
00:04 karolherbst: I kind of doubt we run into this issue somewhere else, but... maybe we do indeed?
00:05 imirkin: ok
00:36 mupuf: imirkin, rhyskidd: i ran igt on nouveau. Even on the blob
00:54 imirkin: mupuf: how'd it go?
00:55 annadane: https://blog.avast.com/avast-open-sources-its-machine-code-decompiler is what i meant earlier by free compiler
01:03 annadane: i'm sure you have your own methods of reverse engineering of course
01:04 karolherbst: annadane: it's not a compiler
01:04 annadane: i meant decompiler, but ok
01:05 karolherbst: but, mhh
01:05 karolherbst: well there is always llvm
01:05 karolherbst: but no idea how good that is
01:05 karolherbst: thing is
01:05 karolherbst: decompiling stuff in order to use the information you got from it is always legally questionable
01:05 karolherbst: because you can't really use the information you got from it
01:05 karolherbst: at least not for your own work
01:06 karolherbst: using it to publish about sec issues, sure, but that's more like citing
01:06 karolherbst: not actually writing your own code, which is supposed to do kind of the same as the thing you just decompiled
01:08 karolherbst: also, I don't trust any of those antivir firms anyway
01:09 karolherbst: and I am sure that online service is just used to collect binaries and feed their antivir AI or db or whatever plans they have with it
01:09 annadane: yeah that's fair
01:09 annadane: just passing on the message in case you guys found it useful
01:10 karolherbst: it is most likely useful, but we can't use it, so, that's kind of the situation
01:21 mupuf: imirkin: Most failures were in tests full of intelisms
01:23 imirkin: ah
01:41 Lyude: Does anyone know where that nikk person went? I would like to get in contact with them if they're interested
01:50 juri_: so, envytools contains dozens of executables, and none of them document what they do?
01:51 juri_: more fun is the complete lack of comments by the authors.
02:09 mupuf: juri_: well, the point of envytools is not to be understood by everyone, just the right kind of people!
02:10 imirkin: juri_: if it was hard to write, it should be hard to read!
02:14 imirkin: juri_: what kind of comments / etc were you hoping for?
04:12 juri_: i'll write some up, to show you.
04:57 rhyskidd: juri_: the main parts of envytools that i use are rnn/demmio (annotate mmio dumps) and nvbios/nvbios (parse vbios')
04:57 rhyskidd: but it depends on what you are looking to work on
04:58 rhyskidd: e.g. there's tools in nva to poke/prod various parts of the GPU
04:58 rhyskidd: python bindings that mwk used to RE various opcodes
04:58 rhyskidd: and some prior work on video decode engine
10:58 karolherbst: imirkin: I have a weird situation here, when calling div s32 on a lot of data (split by the invocation id), I get wrong results for random pairs of data
10:58 karolherbst: and for different pairs each run
11:08 karolherbst: imirkin: actually div s32 is fine, div u32 is not
11:09 karolherbst: hopefully I got something else wrong
11:27 karolherbst: imirkin: mhh, it starts producing wrong results when the block size becomes bigger than 256
11:30 karolherbst: funny though that it works for div s32
11:31 karolherbst: only change is really just a different builtin called
11:33 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/c98f2c3fa5d72eb6359f977505f5147e/raw/b834a664ee8aca8d8dbc2b0bee2a83969227a339/gistfile1.txt and this is the weird thing
11:33 karolherbst: index, a % b = expected != gotten
11:34 karolherbst: I let it make 288 * 1024 calculations (288 grid size) and the only errors are inside the calculations for elements 54912-54939
11:34 karolherbst: numbers are quite random though
11:35 karolherbst: but there always seem to be groups where the error rate is quite high
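
The failing range quoted above can be mapped back to work-group coordinates with plain arithmetic. The sketch below is only an illustration: it assumes the indices are flat global invocation ids, that the block size is 1024 (read off "288 * 1024"), and that a warp is 32 threads. The helper name `locate` is made up here. Under those assumptions all reported elements land in a single warp, which fits the "groups" observation but is not a diagnosis by itself.

```python
# Assumptions (not stated in the log): flat global invocation ids,
# a block size of 1024 and a warp size of 32 threads.
BLOCK_SIZE = 1024
WARP_SIZE = 32


def locate(index, block_size=BLOCK_SIZE, warp_size=WARP_SIZE):
    """Split a flat index into (block, warp within block, lane within warp)."""
    block, local = divmod(index, block_size)
    warp, lane = divmod(local, warp_size)
    return block, warp, lane


first, last = 54912, 54939  # failing range quoted in the log
print(locate(first))  # (53, 20, 0)
print(locate(last))   # (53, 20, 27)
```
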
13:17 pendingchaos: imirkin: the blob creates an aliased image with all sample locations at 0.5,0.5
13:18 pendingchaos: so something is probably missing in nvc0
13:31 imirkin: pendingchaos: yeah. could be an enable bit. who knows.
13:31 imirkin: check the other MS settings
13:32 imirkin: karolherbst: for what particular a % b are you not getting the right value?
13:32 karolherbst: imirkin: random ones
13:33 imirkin: karolherbst: literally random? like it works one time but not the next?
13:33 karolherbst: I am not 100% sure, but the output kind of tells me this
13:33 karolherbst: I get the values from rand()
13:33 imirkin: could be we got the sched values wrong
13:33 karolherbst: and sometimes it looks like that: https://gist.githubusercontent.com/karolherbst/23d3090015e4ff54a410d40c60d5a17e/raw/0fb05354efedb1499a1721cc652b604cf451c2b7/gistfile1.txt
13:33 karolherbst: unlikely
13:33 karolherbst: I get no wrong result with lower grid sizes
13:34 karolherbst: but maybe that's kind of related?
13:34 karolherbst: dunno
13:34 karolherbst: imirkin: or do you mean the scheds in the lib code?
13:34 imirkin: i do.
13:34 imirkin: interesting
13:34 imirkin: it does look group-related
13:34 karolherbst: yeah
13:35 karolherbst: with the CTS 32 is the only size where I never get an error
13:35 karolherbst: but below 256 it is getting super rare
13:35 karolherbst: 512 gets me tons of errors
13:35 imirkin: yeah dunno
13:35 imirkin: try flipping the lib code's sched to 0x7e0 everywhere
13:36 karolherbst: yeah, will do that after I have recompiled everything here (moving to the new llvm-spirv thing currently, which also means building llvm-7)
13:36 imirkin: k
13:47 imirkin: pendingchaos: render to a MS fb attachment, and then attach it as a texture and check what each sample's value is. The resolve could be getting messed up.
13:47 imirkin: pendingchaos: or you're using the winsys framebuffer ms - that could get messed up too. (not sure how, but ... it's a source of weirdness.)
15:18 pendingchaos: running a program which retrieves the sample locations (like https://mynameismjp.wordpress.com/2010/07/07/msaa-sample-pattern-detector/, so msaa resolve, glGetMultisample and gl_SamplePosition are not used)
15:18 pendingchaos: while having nvc0 call the method with 0x8..8
15:18 pendingchaos: seems to show that the sample locations are indeed all at the center of the pixel
15:18 pendingchaos: (I also tested the program under normal circumstances, it seems to work fine)
15:18 pendingchaos: imirkin: see above (forgot to include you)
15:33 imirkin: pendingchaos: interesting. so all is well? is the issue in the resolve somewhere then?
15:34 imirkin: pendingchaos: there's a chance that we determine gl_SamplePosition weird... iirc we use pixld which hopefully works out correctly.
15:34 pendingchaos: For setting sample locations? I guess so? I haven't looked much more into it yet
15:35 imirkin: pendingchaos: well, you said you were seeing something unexpected
15:35 imirkin: when setting all the sample positions to center
15:35 pendingchaos: I thought you determined gl_SamplePosition using the const buffer uploaded right before the sample positions were set?
15:35 pendingchaos: setting all the sample positions to the center still seems to create an anti-aliased image
15:36 imirkin: hmmmmm let's seeeeee here
15:36 imirkin: there is a PIXLD which we use for interpolateAtSample
15:36 imirkin: but perhaps not for gl_SamplePosition?
15:37 imirkin: gah!
15:37 imirkin: looks like we use the values in the constbuf
15:37 imirkin: which i guess will be wrong? or did you adjust the sample positions in that array that gets returned from get_sample_positions?
15:38 pendingchaos: I did uint32_t val[4] = {0x88888888, 0x88888888, 0x88888888, 0x88888888};
15:38 imirkin: ok
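
To see why 0x88888888 corresponds to "all sample locations at 0.5,0.5", the words can be unpacked nibble by nibble. This is a minimal sketch under an assumption: each 4-bit field is a coordinate in 1/16-pixel units, which matches 0x8 mapping to 0.5 as reported above. The function name `decode_positions` is hypothetical, and the order in which x and y nibbles are interleaved is deliberately not modelled.

```python
def decode_positions(words):
    """Unpack 32-bit words into 4-bit fields and scale them to [0, 1).

    Assumption: every nibble is one coordinate in 1/16-pixel units.
    """
    coords = []
    for word in words:
        for shift in range(0, 32, 4):
            nibble = (word >> shift) & 0xF
            coords.append(nibble / 16.0)
    return coords


val = [0x88888888, 0x88888888, 0x88888888, 0x88888888]
coords = decode_positions(val)
print(len(coords), set(coords))  # 32 {0.5}
```
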
15:39 imirkin: hold on, let me see if pixld has something nice for this
15:40 imirkin: oh, well i get the sample id anyways. but i'd have to convert it to float.
15:41 imirkin: pendingchaos: well, for now, make sure you adjust https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_context.c#n518 as well
17:37 pendingchaos: imirkin: with all sample locations set to the center:
17:37 pendingchaos: blitting onto a 4x msaa winsys framebuffer: looks anti-aliased
17:37 pendingchaos: blitting onto a normal winsys framebuffer: aliased
17:37 pendingchaos: using a custom resolve onto any winsys framebuffer: aliased
17:37 pendingchaos: I think the winsys framebuffer resolve might be doing something odd
17:38 pendingchaos: perhaps blurring it slightly vertically? (the blit was done with glBlitFramebuffer btw)
17:38 imirkin: mmmmkay.
17:39 imirkin: let's think for a minute here.
17:39 imirkin: the winsys fb isn't ACTUALLY multisampled
17:39 imirkin: it just says it is.
17:39 imirkin: in the background, we allocate a 4x (or whatever) fb, and then blit onto the winsys fb
17:39 imirkin: are you, perchance, using glReadPixels on such a framebuffer?
17:39 imirkin: because that's broken :)
17:40 imirkin: (we don't do the resolve. oops.)
17:40 pendingchaos: in the application I used to create the results I showed you? I'm not using glReadPixels
17:41 pendingchaos: yeah, by winsys fb I meant gl's fb 0 (not the actual image fed to the compositor)
17:43 imirkin: how are you passing in the sample locations?
17:43 imirkin: are you just jamming them for all fb's for now?
17:43 imirkin: or did you do it "for real"
17:43 pendingchaos: I hardcoded them in nvc0_validate_fb()
17:44 pendingchaos: set the array "val" to be full of 0x88888888
17:44 imirkin: ok
17:45 imirkin: so the question is ... what's the difference between blitting onto a regular winsys buffer, and the 4x msaa one
17:45 imirkin: can i see your sample code which produces the wrong results?
17:45 imirkin: [the GL program, that is]
17:46 pendingchaos: https://hastebin.com/zaxilihape.cpp
17:46 imirkin: thing is that if the sample positions are the same, no amount of resolve algorithms can mess it up.....
17:46 pendingchaos: sorry, have to go (I should be back in ~1 hour at most)
17:46 pendingchaos: it might be a small vertical blur or something
17:46 pendingchaos: I wouldn't be surprised if it weren't very noticeable in a typical situation
17:47 imirkin: well the way we resolve is we assume that the samples are laid out on a grid
17:47 imirkin: and then blit down
17:47 imirkin: so the blur makes sense
17:47 imirkin: we don't do a clever resolve thing
17:47 imirkin: just texture it down using regular filtering
17:48 imirkin: i don't see why there'd be a difference between blitting it one way or another.
17:48 imirkin: oh interesting. you call glBlitFramebuffer with GL_NEAREST
17:48 imirkin: i bet if you call it with GL_LINEAR you'll get the same result
17:51 imirkin: you know, blitting a MSAA fb to a winsys MSAA fb is a weird situation.
17:51 imirkin: might be interesting what shows up in the gallium api call trace (GALLIUM_TRACE=foo.xml)
18:38 pendingchaos: imirkin: here's the trace if you're interested: https://drive.google.com/open?id=1DBC93tpc2R1R9UGA-pN1Wsy5piS7_q6y
18:40 pendingchaos: it seems to crash at context destruction during tracing but I think everything before it is there
18:40 karolherbst: imirkin: yep, sched opcodes are responsible
18:40 karolherbst: imirkin: no wrong result after reverting
18:40 imirkin: pendingchaos: there's a tool to process it
18:41 imirkin: in mesa... dump_trace.py or something like that
18:41 karolherbst: imirkin: how common is u32 / or % though? might explain some super annoying bugs
18:42 imirkin: yeah, in src/gallium/tools/trace/dump.py
18:42 imirkin: karolherbst: somewhat, i think
18:42 imirkin: comes up in dolphin a lot iirc
18:42 karolherbst: ohh
18:42 karolherbst: but maybe it was just broken in compute shaders with big enough work groups, dunno
18:43 imirkin: pendingchaos: is that a trace with the custom resolve shader?
18:43 pendingchaos: should be the one with the 4x msaa winsys framebuffer and glBlitFramebuffer
18:46 HdkR: Dolphin doesn't have an overwhelming number of integer divides at least :)
18:46 imirkin: HdkR: used to. i think you got rid of them a while back?
18:46 imirkin: used to do a ton of / 255 stuff.
18:48 imirkin: which became >> 7 + >> 8 or something? i forget.
18:48 HdkR: Yea, it is mostly changed over to masks and shifts
18:48 HdkR: Loads of fdiv though
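
For reference, one well-known way to replace an integer `/ 255` with adds and shifts is shown below. This is not necessarily the form Dolphin ended up with (the exact expression is not recalled above); it is just an identity that is exact for inputs from 0 to 65534, which covers the product of two 8-bit values. The name `div255` is made up for this sketch.

```python
def div255(x):
    """Exact x // 255 for 0 <= x <= 65534, using only adds and shifts."""
    return (x + 1 + (x >> 8)) >> 8


# Largest product of two 8-bit values: 255 * 255 = 65025.
print(div255(255 * 255))  # 255
print(div255(254))        # 0
```

The identity breaks at 65535, so it only fits cases where the input range is known in advance.
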
18:49 karolherbst: uhh, well
18:49 imirkin: karolherbst: maxwell and pascal may have different latencies
18:50 karolherbst: imirkin: maybe
18:50 karolherbst: will test stuff on work
18:50 karolherbst: *at
18:52 imirkin: karolherbst: can you bisect the issue?
18:52 imirkin: i.e. only update half the scheds
18:52 imirkin: etc
18:53 karolherbst: imirkin: yeah, will do that later I suppose
18:53 karolherbst: currently doing a full CL CTS run with the updated llvm-spirv thing and that kind of takes a while
18:54 imirkin: k
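
The "only update half the scheds" idea is an ordinary bisection over a list of changes. A minimal sketch follows, assuming exactly one change is responsible and that applying no changes at all gives correct results. `bisect_changes`, `is_broken` and the `sched_N` labels are hypothetical names for illustration; in practice `is_broken` would rebuild with only the given sched updates applied and rerun the failing test.

```python
def bisect_changes(changes, is_broken):
    """Return the single change that makes is_broken() report a failure.

    Assumes exactly one culprit, and that applying no changes is good.
    """
    lo, hi = 0, len(changes)
    # Invariant: applying changes[:lo] is good, applying changes[:hi] is broken.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(changes[:mid]):
            hi = mid
        else:
            lo = mid
    return changes[hi - 1]


# Hypothetical stand-in for "rebuild with these sched updates and rerun the test".
changes = [f"sched_{i}" for i in range(10)]
culprit = bisect_changes(changes, lambda applied: "sched_6" in applied)
print(culprit)  # sched_6
```
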
19:14 imirkin: pendingchaos: https://hastebin.com/raw/esifatigud
19:14 imirkin: nothing too odd
19:15 imirkin: there *is* a msaa copy that includes a flip
19:15 imirkin: whereas i suspect the other way, it's a msaa resolve that includes a flip
19:15 imirkin: perhaps something gets weird in that logic?
19:21 pendingchaos: imirkin: "the other way"? could you rephrase that?
19:23 imirkin: when you render to a fb, blit the fb to winsys, that then gets blitted to the *real* winsys fb
19:23 imirkin: when you render directly to winsys "fb", that gets blitted to the backing resolved fb
19:23 imirkin: when you blit between regular fb and winsys fb, there's a fb flip involved
19:23 imirkin: since the origin is different
19:54 karolherbst: imirkin: it's a sched op for a mov
19:55 karolherbst: or maybe not directly..
19:58 karolherbst: hakzsam: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/lib/gm107.asm#n34
19:59 karolherbst: only putting "(st 0x0)" for the mov fixes the issue I see
19:59 karolherbst: (wrong results when using div u32)
19:59 karolherbst: any ideas, suggestions?
20:00 karolherbst: I am sure there is a proper solution, but I really don't know the meaning of those thingies
20:02 imirkin: karolherbst: iirc the number in st is the number of cycles to wait AFTER the mov
20:02 karolherbst: huh
20:02 imirkin: and st 0x0 == 0xf
20:02 karolherbst: ahh
20:03 karolherbst: maybe 0xe works
20:03 karolherbst: not even 0xf works
20:04 karolherbst: only st 0x0
20:04 karolherbst: I guess that means a bit more than just 0xf
20:06 imirkin: i can never remember wtf "wt" and "wr" are
20:06 imirkin: st = stall
20:08 imirkin: hakzsam: can you take a look, see if anything obvious pops out?
20:08 karolherbst: the st = 0x0 is the same as 0x00 on kepler I think
20:08 karolherbst: except that not everything works
20:08 imirkin: wr = write dep bar, rd = read dep bar, wt = wait dep bar
20:09 karolherbst: mhh
20:09 karolherbst: imirkin: maybe this and that mov+shr issue I found a few days ago is kind of related
20:09 karolherbst: I doubt it is a coincidence that mov is involved in both cases
20:16 imirkin: i wonder if imad needs a read barrier
20:17 karolherbst: I think it actually does
20:17 imirkin: i think it does too
20:17 imirkin: and the mov overwrites the previous imad's argument
20:17 imirkin: on line 30
20:18 imirkin: add a "rd 0x1" to the last group
20:18 imirkin: and "wt 0x2" to the mov that comes after it.
20:18 karolherbst: last group?
20:19 imirkin: on line 30
20:19 karolherbst: ohh of the next one
20:19 karolherbst: uhh
20:19 karolherbst: ok
20:19 imirkin: the last ()
20:19 imirkin: which references the 3rd op in the group, i.e. imad u32 u32 hi $r2 $r2 $r3 $r2
20:19 karolherbst: yeah
20:19 karolherbst: got it
20:19 imirkin: and then change (st 0x6) to (st 0x6 wt 0x2) for the mov right below it
20:20 karolherbst: that works
20:20 imirkin: yay
20:20 karolherbst: wt 0x1 seems to work as well
20:21 karolherbst: but not wt 0x0
20:21 imirkin: yes, it does
20:21 imirkin: but it would wait too long
20:21 imirkin: you want wt 0x2
20:21 karolherbst: ohh
20:21 imirkin: i.e. you want for the imad read to be done, not necessarily its write
20:21 karolherbst: I see
20:21 imirkin: the read dep bar is set to the second barrier, i.e. 0x1
20:21 imirkin: the wt takes a barrier mask
20:22 imirkin: so 0x1 corresponds to barrier 0x0, 0x2 corresponds to barrier 0x1
20:22 karolherbst: ahh
20:22 karolherbst: yeah, I have no clue about how any of that works :)
20:22 imirkin: no worries
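
The index-versus-mask distinction explained above (rd/wr take a barrier index, wt takes a bitmask of barriers) comes down to one shift. The helper below is only an illustration of that mapping; `wait_mask` is a made-up name and not part of envytools or Mesa.

```python
def wait_mask(*barrier_indices):
    """Build a wt mask with one bit set per dependency barrier index."""
    mask = 0
    for index in barrier_indices:
        mask |= 1 << index
    return mask


print(hex(wait_mask(0)))     # 0x1: wait on barrier 0x0
print(hex(wait_mask(1)))     # 0x2: wait on barrier 0x1, as set by "rd 0x1"
print(hex(wait_mask(0, 1)))  # 0x3: wait on both barriers
```
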
20:23 karolherbst: https://github.com/karolherbst/mesa/commit/8f13b848392b4c66359f6e22e6b9c8f16f653f47
20:24 imirkin: lgtm
20:24 imirkin: ship it
20:24 karolherbst: well, I'll give hakzsam a chance to respond
20:43 pendingchaos: imirkin: any idea how I should expose SAMPLE_LOCATION_PIXEL_GRID_*_ARB in gallium?
20:44 pendingchaos: its value depends on the number of samples used
20:44 imirkin: not sure. there's already a get_sample_locations() callback
20:44 imirkin: is that enough?
20:46 pendingchaos: do you mean I should extend get_sample_position so that it can be used to get the pixel grid size?
20:46 imirkin: not necessarily - perhaps another callback
20:46 imirkin: i haven't looked at the ext too carefully
20:46 imirkin: so i'm not 100% sure what's necessary
20:47 imirkin: the gallium api can be extended if it makes sense though
20:47 pendingchaos: I guess I'll add a get_sample_pixel_grid_size callback
20:48 imirkin: what are the inputs/outputs?
20:48 pendingchaos: (or maybe just get_sample_pixel_grid)
20:48 pendingchaos: the inputs: sample count
20:48 pendingchaos: the outputs: width and height
20:48 imirkin: right.
20:48 imirkin: well, you can also shop around and see what other hw's restrictions are
20:49 imirkin: but otherwise that makes sense
20:49 pendingchaos: intel seems to be 1x1 for all samples (though there's no gallium driver for it)
20:49 pendingchaos: amd seems to always be 2x2
20:49 imirkin: ah ok. and nvidia is variable.
20:49 pendingchaos: ^
21:12 imirkin: pendingchaos: so yeah, a callback can make sense
21:13 imirkin: can all be generalized when there's too many of them
21:19 imirkin: bbiab... going to try out the fermi reclock patches.
21:32 rhyskidd: does envytools have a way to be smart about variable names of a bitfield value, where the name changes whether it is a read or a write?
21:34 rhyskidd: i know there is "access='r'" and "access='w'", but it's not clear that allows for the changed variable name
21:40 imirkin: o well. no go.
21:45 imirkin: rhyskidd: no such distinction is available
21:45 rhyskidd: ok
21:45 imirkin: you could do something hacky
21:45 rhyskidd: i'll put it in a comment next to the line I'm changing
21:45 imirkin: like add a variable for whether it's a read or write
21:46 imirkin: and restrict it on varset
21:46 imirkin: but ... seems like overkill
21:46 rhyskidd: yeh, that'd be a pain for a large-ish bitfield
23:34 mwk: rhyskidd: access= works only on whole registers
23:34 mwk: can't do it on just a bitfield
23:34 mwk: but it's perfectly possible to have two registers at the same place, with access="r" and access="w"