11:55 karolherbst: imirkin: mhh, seems like on nvc0 we can just go ahead and enable support for atomic float min/max? (+ the required TGSI bits)
11:55 karolherbst: but from a hardware perspective that should be fine, right?
11:55 karolherbst: didn't check what nvidia does though
11:55 karolherbst: but the tests seem to just pass on my pascal
12:07 karolherbst: mhh, something up with the float atomcas
12:09 karolherbst: ahh, GL_NV_shader_atomic_float also adds a cas variant, nice
12:11 karolherbst: heh.. mhh, or maybe that doesn't work indeed
12:11 karolherbst: weird
12:11 karolherbst: mhh, meh
12:13 karolherbst: seems like the hw can deal with min/max though
12:28 karolherbst: but we might want to expose GL_NV_shader_atomic_float :)
12:28 karolherbst: ohh, fermi/kepler..
12:33 HdkR: Older stuff would be very sad :P
13:32 imirkin: karolherbst: we expose atomic float
13:32 imirkin: just not the intel min/max thing
13:33 karolherbst: just the add stuff, yes
13:33 karolherbst: and only for fermi/kepler
13:33 imirkin: huh?
13:33 imirkin: oh
13:33 imirkin: right
13:33 imirkin: on maxwell+ ... what was the issue ...
13:33 imirkin: i had figured it out, but it sometimes upset RA?
13:33 karolherbst: but the min/max thing seems to work on maxwell, apparently
13:33 karolherbst: could be a coincidence though
13:33 imirkin: uhhhh ... that just means it's bad tests
13:33 karolherbst: all tests passed besides the cas thing
13:33 imirkin: iirc there is no F32.MIN variant
13:34 karolherbst: mhh, hardware didn't complain
13:34 karolherbst: and at least nvdisasm is happy
13:34 imirkin: yeah, that's coz we emit like U32.MIN :)
13:34 karolherbst: I fixed that ;)
13:34 karolherbst: and we actually emit f32
13:34 imirkin: erm
13:34 karolherbst: but yeah.. no idea what's going on there
13:34 karolherbst: and pointless without f32 cas support
13:34 imirkin: well, cas is weird anyways
13:35 imirkin: and has fallback support
13:35 imirkin: did you test ssbo's? or only shared memory?
13:35 imirkin: ssbo's will take a fallback path on fermi and maybe kepler
13:35 imirkin: er
13:35 imirkin: sorry
13:35 imirkin: shared will take a fallback path
13:35 karolherbst: mhh, let me check
13:35 imirkin: which can be used to implement ~anything
13:35 imirkin: (at a cost)
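The fallback path mentioned above, where a compare-and-swap loop stands in for an atomic operation the hardware lacks, can be sketched on the CPU with C11 atomics. This is only an illustration of the general technique, not the lowering nouveau emits: the helper names `f2u`, `u2f` and `atomic_fmax_cas` are made up for this example, and the CAS is done on the raw 32-bit pattern so that NaN payloads compare reliably.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers (hypothetical names): reinterpret without aliasing issues. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/*
 * Hypothetical helper: atomic float max built from an integer CAS loop.
 * Returns the value that was stored before the update.
 */
static float atomic_fmax_cas(_Atomic uint32_t *addr, float val)
{
    assert(addr != NULL);
    uint32_t old = atomic_load(addr);
    for (;;) {
        float cur = u2f(old);
        /* Pick val if it is larger or if the stored value is NaN. */
        float next = (val > cur || cur != cur) ? val : cur;
        /* On failure, 'old' is refreshed with the current contents. */
        if (atomic_compare_exchange_weak(addr, &old, f2u(next)))
            return cur;
    }
}
```

The cost is the retry loop: under contention each thread may need several round trips before its CAS succeeds, which is why a native min/max opcode is preferable where one exists.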
13:36 karolherbst: piglit tests shared and ssbos
13:36 imirkin: hm ok
13:36 karolherbst: but.. some tests even passed when I was just emitting the u32 opcode
13:36 karolherbst: so it might be that the tests are just bad
13:36 imirkin: tests ain't perfect
13:36 karolherbst: _but_ emitting f32 fixed the nan tests
13:36 imirkin: what does nvdisasm produce?
13:36 karolherbst: for CAS: SUATOM.D.1D_BUFFER.CAS.F32.FTZ.RN.IGN RZ, [R4], R0, R3 ;
13:37 imirkin: SUATOM = maxwell+ no?
13:37 karolherbst: for max: SUATOM.D.1D_BUFFER.MAX.F32.FTZ.RN.IGN R0, [R4], R1, R0 ;
13:37 karolherbst: yeah
13:37 imirkin: yeah ok. maybe it works on maxwell+
13:37 karolherbst: but CAS caused a shader trap
13:37 imirkin: expected.
13:37 karolherbst: gr: GPC0/TPC0/TEX: 80000041
13:37 imirkin: check the patch i had to enable this stuff
13:38 karolherbst: ohh, good idea
13:38 imirkin: https://github.com/imirkin/mesa/commit/26a346b4716905e7d8c0556e28043cde11d9d788
13:39 imirkin: not 100% sure that's the latest, but at least it should give you an idea
13:39 karolherbst: heh.. weird
13:39 imirkin: the RA bit is unrelated, i think
13:39 karolherbst: shared-atomiccompswap-float passed though. the ssbo variant caused issues
13:39 karolherbst: or.. let me check that actually
13:40 imirkin: hm. did i push that change too then?
13:40 karolherbst: no.. I think I still ran piglit when the u32 cas opcode was emitted
13:40 imirkin: yeah, i didn't end up pushing that
13:41 imirkin: ok, i found the latest ...
13:41 karolherbst: k.. it traps
13:42 imirkin: karolherbst: https://github.com/imirkin/mesa/commits/tmp-for-karol
13:42 imirkin: have a look at the ATOM-related commits
13:42 karolherbst: mhhh
13:43 karolherbst: looking at the piglit tests.. they don't really test much
13:43 karolherbst: the nan does
13:44 karolherbst: mhhh
14:04 imirkin: karolherbst: this was the bogus change i needed to make it work: https://github.com/imirkin/mesa/commit/50fc29b7cced132dbabfb1dd28d172dc9f9b0ddf
14:04 karolherbst: ohh uhm..I think I remember that one
14:04 imirkin: and this is the change which has the *correct* lowering for shared atomic float: https://github.com/imirkin/mesa/commit/071c0fd6089c1fc21a33623c03430f4b0a070002
14:05 imirkin: obviously would need to be extended for min/max
14:05 imirkin: but should be fairly straightforward.
15:28 karolherbst: imirkin: btw, I am also wondering if we should enable PIPE_CAP_FBFETCH_COHERENT or not
16:32 karolherbst: imirkin: wanna take a look at my demmt patches to support newer chips/drivers or should I just go ahead and merge it? https://github.com/envytools/envytools/pull/192
17:51 imirkin_: karolherbst: go ahead and merge that demmt stuff
17:51 imirkin_: doesn't seem obviously wrong :)
17:51 imirkin_: if you're just adding support for new ioctl's, doesn't really need review
17:51 karolherbst: mhh, true
17:52 imirkin_: if you're changing something core, like ib-related stuff, probably worth a look
17:52 karolherbst: yeah.. I still have my UVM support stuff locally
17:52 karolherbst: that will be annoying
17:52 imirkin_: as for fbfetch coherent -- i don't think our hw supports that. or if it does, then not the way we're using it.
17:52 karolherbst: probably
17:53 imirkin_: fbfetch coherent means that we can access the current state of the fb without funny flushes in between
17:54 imirkin_: this is needed for advanced_blend_coherent as well as like EXT_framebuffer_something_fetch
18:14 imirkin_: gah! we still haven't merged pendingchaos's PIPE_CAP_IMAGE_LOAD_FORMATTED patches for nouveau?
18:15 imirkin_: need to re-check what the situation was. iirc i looked at them, conceivable i had some issues with them
18:20 karolherbst: mhh, currently wondering how I want to do this real nouveau shader cache stuff :/
18:20 karolherbst: or rather, what to cache...
18:24 karolherbst: mhh, I think essentially I would like to have a if(cached) else thing inside nvc0_program_translate?
18:24 karolherbst: any better idea?
18:24 imirkin_: ideally it wouldn't get that far... although maybe
18:24 imirkin_: i forget how the code is laid out
18:25 karolherbst: yeah...
18:25 imirkin_: and i don't really have time to look now, sorry
18:25 karolherbst: nvc0_program_translate calls into the compiler though
18:25 karolherbst: and manages that nv50_ir_prog_info struct
18:25 imirkin_: ok, well all the artifacts have to be recreated of course
18:25 karolherbst: sure
18:25 imirkin_: including all the fixups/etc
18:26 karolherbst: I think we can skip most of it there, like all those info assignments
18:26 imirkin_: yeah, a bunch can be skipped
18:26 karolherbst: and just set everything on prog we have to
18:26 imirkin_: i.e. a bunch of it is just helping the compiler
18:26 karolherbst: and then run through the header generation for cached and uncached
18:26 imirkin_: also note that in some cases, recompiles will be triggered
18:26 imirkin_: so some of the state has to be set up accurately
18:27 imirkin_: (since we don't support variants, and no one in their right mind would flip those settings)
18:27 karolherbst: right.. but I think the part between nv50_ir_generate_code and header generation is the critical stuff
18:27 karolherbst: mhhh, prog->tfb = nvc0_program_create_tfb_state
18:27 imirkin_: that's not real :)
18:27 karolherbst: but that's unrelated
18:27 imirkin_: that's just a token
18:27 karolherbst: ahh
18:27 imirkin_: we store the whole pointer
18:27 imirkin_: but we could just as well store a random id
18:27 karolherbst: still have to check what it reads out from prog->tfb = nvc0_program_create_tfb_state
18:27 karolherbst: ...
18:27 karolherbst: info
18:27 imirkin_: only ever compared for equality
18:28 imirkin_: (iirc)
18:28 karolherbst: mhh, it reads from the slots
18:28 imirkin_: oh hm. somewhere else it's only compared for equality ;)
18:28 karolherbst: uff
18:28 imirkin_: that should be separate though
18:28 imirkin_: i guess i dunno.
18:29 imirkin_: is xfb linked into the program state, logically?
18:29 imirkin_: i don't think it is.
18:29 imirkin_: it informs how the program is compiled
18:29 karolherbst: it is
18:29 karolherbst: prog->tfb is a struct nvc0_transform_feedback_state
18:29 imirkin_: well, that's an impl detail.
18:29 imirkin_: my point is that it doesn't have to be.
18:29 karolherbst: yeah... probably not
18:29 imirkin_: that tfb state is only needed in a handful of places
18:29 karolherbst: I think I have to write down all the dependencies and see what we can move around to make it less painful
18:30 imirkin_: and with a shader cache, it would come from the draw call anyways.
18:30 karolherbst: imirkin_: well, the idea would be that we can also kill all recompilations with caching.. that would be nice
18:30 imirkin_: it helps to have a clear picture in your head as to wtf all these things are
18:30 imirkin_: lmk if you run into things you're unsure of
18:30 imirkin_: i have a full understanding of all that stuff, so i can probably explain how it fits together
18:31 karolherbst: mhh, I think I would like to hash all the input to nvc0_program_translate
18:31 karolherbst: ...
18:31 karolherbst: but that's a nvc0_program
18:32 karolherbst: well.. the most expensive part is calling into nv50_ir_generate_code anyway
18:32 karolherbst: if we can skip that, then we already achieved what we want to have
18:33 karolherbst: so maybe first step is if (cached) nv50_ir_code_from_cache else nv50_ir_generate_code?
18:33 karolherbst: and just recreated the info object in nv50_ir_code_from_cache like nv50_ir_generate_code would do?
18:33 karolherbst: s/recreated/recreate/
18:33 karolherbst: that should be enough I think
18:33 imirkin_: something like that.
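The "if (cached) ... else ..." shape discussed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `compile_slow` stands in for the expensive compiler call, `translate` for the function doing the lookup, and the cache is a toy in-memory table keyed by an FNV-1a hash of the input bytes. A real shader cache would key on every compiler input and would need a stronger hash, since this sketch treats a matching key as a hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64
#define MAX_CODE    256

struct cache_entry {
    int      valid;
    uint64_t key;
    size_t   size;
    uint8_t  code[MAX_CODE];
};

static struct cache_entry cache[CACHE_SLOTS];
static int compile_count; /* counts how often the slow path ran */

/* 64-bit FNV-1a over the input bytes. */
static uint64_t hash_bytes(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Hypothetical stand-in for the expensive compile step: just copies. */
static size_t compile_slow(const uint8_t *src, size_t len, uint8_t *out)
{
    compile_count++;
    memcpy(out, src, len);
    return len;
}

/* Hypothetical translate step: take the cached binary if present. */
static size_t translate(const uint8_t *src, size_t len, uint8_t *out)
{
    assert(len <= MAX_CODE);
    uint64_t key = hash_bytes(src, len);
    struct cache_entry *e = &cache[key % CACHE_SLOTS];

    if (e->valid && e->key == key) {      /* cached: skip the compiler */
        memcpy(out, e->code, e->size);
        return e->size;
    }
    size_t size = compile_slow(src, len, out); /* uncached: compile, store */
    e->valid = 1;
    e->key = key;
    e->size = size;
    memcpy(e->code, out, size);
    return size;
}
```

Both branches leave `out` in the same state, so whatever runs afterwards (header generation in the discussion above) does not need to know which path was taken.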
18:36 karolherbst: ohh, I have an idea
18:36 karolherbst: we should split nv50_ir_prog_info
18:36 karolherbst: into nv50_ir_prog_info_in and nv50_ir_prog_info_out
18:36 karolherbst: that would make it clearer what is actually the input and what is the result of the compilation :)
18:36 karolherbst: that struct is only used in this function anyway
18:37 karolherbst: and well inside codegen
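A minimal sketch of the proposed in/out split, assuming made-up field names and a toy compile function (none of these are the actual members of nv50_ir_prog_info): the input struct is passed as `const`, so the compiler cannot write to it, and only the output struct carries results. That also makes it obvious which bytes would feed a cache key and which would be stored as the cached value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: everything the compiler reads. */
struct prog_info_in {
    uint32_t shader_type;
    uint32_t chipset;
    uint32_t opt_level;
};

/* Hypothetical: everything the compiler produces. */
struct prog_info_out {
    uint32_t code_size;
    uint32_t max_gpr;
};

/*
 * Toy stand-in for the code generator. The arithmetic is a placeholder;
 * the point is the signature: const input, writable output.
 */
static int toy_generate_code(const struct prog_info_in *in,
                             struct prog_info_out *out)
{
    assert(in && out);
    out->code_size = 8 * (in->opt_level + 1);
    out->max_gpr = in->shader_type + 4;
    return 0;
}
```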
21:31 karolherbst: cool, nvidia-uvm is actually MIT licensed, so we can just use their header files for demmt as well :)
21:31 karolherbst: just need a good strategy on how to track changes