11:55karolherbst: imirkin: mhh, seems like on nvc0 we can just go ahead and enable support for atomic float min/max? (+ the required TGSI bits)
11:55karolherbst: but from a hardware perspective that should be fine, right?
11:55karolherbst: didn't check what nvidia does though
11:55karolherbst: but the tests seem to just pass on my pascal
12:07karolherbst: mhh, something up with the float atomcas
12:09karolherbst: ahh, GL_NV_shader_atomic_float also adds a cas variant, nice
12:11karolherbst: heh.. mhh, or maybe that doesn't work indeed
12:11karolherbst: mhh, meh
12:13karolherbst: seems like the hw can deal with min/max though
12:28karolherbst: but we might want to expose GL_NV_shader_atomic_float :)
12:28karolherbst: ohh, fermi/kepler..
12:33HdkR: Older stuff would be very sad :P
13:32imirkin: karolherbst: we expose atomic float
13:32imirkin: just not the intel min/max thing
13:33karolherbst: just the add stuff, yes
13:33karolherbst: and only for fermi/kepler
13:33imirkin: on maxwell+ ... what was the issue ...
13:33imirkin: i had figured it out, but it sometimes upset RA?
13:33karolherbst: but the min/max thing seems to work on maxwell, apparently
13:33karolherbst: could be a coincidence though
13:33imirkin: uhhhh ... that just means it's bad tests
13:33karolherbst: all tests passed besides the cas thing
13:33imirkin: iirc there is no F32.MIN variant
13:34karolherbst: mhh, hardware didn't complain
13:34karolherbst: and at least nvdisasm is happy
13:34imirkin: yeah, that's coz we emit like U32.MIN :)
13:34karolherbst: I fixed that ;)
13:34karolherbst: and we actually emit f32
13:34karolherbst: but yeah.. no idea what's going on there
13:34karolherbst: and pointless without f32 cas support
13:34imirkin: well, cas is weird anyways
13:35imirkin: and has fallback support
13:35imirkin: did you test ssbo's? or only shared memory?
13:35imirkin: ssbo's will take a fallback path on fermi and maybe kepler
13:35imirkin: shared will take a fallback path
13:35karolherbst: mhh, let me check
13:35imirkin: which can be used to implement ~anything
13:35imirkin: (at a cost)
13:36karolherbst: piglit tests shared and ssbos
13:36imirkin: hm ok
13:36karolherbst: but.. some tests even passed when I was just emitting the u32 opcode
13:36karolherbst: so it might be that the tests are just bad
13:36imirkin: tests ain't perfect
13:36karolherbst: _but_ emitting f32 fixed the nan tests
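Some context on the U32 vs F32 point. Comparing IEEE-754 bit patterns as unsigned integers orders non-negative floats correctly, which may explain why some tests passed with the U32 opcode. It reverses the order of negative values, though, and does not treat NaN the way a float max does. Below is a minimal CPU-side C sketch. `max_as_u32` and `ref_fmax` are illustrative helpers. The NaN rule in `ref_fmax` (ignore a NaN operand, like C's `fmaxf`) is an assumption, not a statement of what the hardware F32 variant does.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float's bits as a 32-bit unsigned integer (and back). */
static uint32_t f2u(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float u2f(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* What a U32.MAX effectively computes when the operands hold float bits. */
static float max_as_u32(float a, float b)
{
    uint32_t ua = f2u(a), ub = f2u(b);
    return u2f(ua > ub ? ua : ub);
}

/* Reference float max: a NaN operand is ignored, as with C's fmaxf. */
static float ref_fmax(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a > b ? a : b;
}
```

With these helpers, `max_as_u32(-1.0f, -2.0f)` yields -2.0f while `ref_fmax` yields -1.0f. Also, `max_as_u32(NAN, 1.0f)` returns the NaN, whereas `ref_fmax` returns 1.0f.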
13:36imirkin: what does nvdisasm produce?
13:36karolherbst: for CAS: SUATOM.D.1D_BUFFER.CAS.F32.FTZ.RN.IGN RZ, [R4], R0, R3 ;
13:37imirkin: SUATOM = maxwell+ no?
13:37karolherbst: for max: SUATOM.D.1D_BUFFER.MAX.F32.FTZ.RN.IGN R0, [R4], R1, R0 ;
13:37imirkin: yeah ok. maybe it works on maxwell+
13:37karolherbst: but CAS caused a shader trap
13:37karolherbst: gr: GPC0/TPC0/TEX: 80000041
13:37imirkin: check the patch i had to enable this stuff
13:38karolherbst: ohh, good idea
13:39imirkin: not 100% sure that's the latest, but at least it should give you an idea
13:39karolherbst: heh.. weird
13:39imirkin: the RA bit is unrelated, i think
13:39karolherbst: shared-atomiccompswap-float passed though. the ssbo variant caused issues
13:39karolherbst: or.. let me check that actually
13:40imirkin: hm. did i push that change too then?
13:40karolherbst: no.. I think I still ran piglit when the u32 cas opcode was emitted
13:40imirkin: yeah, i didn't end up pushing that
13:41imirkin: ok, i found the latest ...
13:41karolherbst: k.. it traps
13:42imirkin: karolherbst: https://github.com/imirkin/mesa/commits/tmp-for-karol
13:42imirkin: have a look at the ATOM-related commits
13:43karolherbst: looking at the piglit tests.. they don't really test much
13:43karolherbst: the nan does
14:04imirkin: karolherbst: this was the bogus change i needed to make it work: https://github.com/imirkin/mesa/commit/50fc29b7cced132dbabfb1dd28d172dc9f9b0ddf
14:04karolherbst: ohh uhm..I think I remember that one
14:04imirkin: and this is the change which has the *correct* lowering for shared atomic float: https://github.com/imirkin/mesa/commit/071c0fd6089c1fc21a33623c03430f4b0a070002
14:05imirkin: obviously would need to be extended for min/max
14:05imirkin: but should be fairly straightforward.
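Extending the lowering to min/max comes down to the compare-and-swap retry loop behind the fallback path, the one that "can be used to implement ~anything (at a cost)". What follows is a hedged C11 sketch of what an emulated atomic float max computes. It assumes memory holds raw float bits and uses a NaN-ignoring max. The real lowering in the linked commits operates on codegen IR rather than C11 atomics, and the helper names here are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Float max that ignores a NaN operand (assumed semantics for this sketch). */
static float fmax_nan_ignoring(float a, float b)
{
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}

/* Emulated atomic float max: read, compute, and publish with CAS, retrying if
 * another thread changed the word in between. Returns the previous value,
 * matching the return convention of the GLSL atomic built-ins. */
static float atomic_fmax_via_cas(_Atomic uint32_t *addr, float val)
{
    assert(addr != NULL);
    uint32_t old = atomic_load(addr);
    for (;;) {
        uint32_t desired = f2u(fmax_nan_ignoring(u2f(old), val));
        /* On failure, 'old' is updated with the value currently in memory. */
        if (atomic_compare_exchange_weak(addr, &old, desired))
            return u2f(old);
    }
}
```

Min is the same loop with the comparison flipped. A real lowering could also skip the CAS when the stored value would not change.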
15:28karolherbst: imirkin: btw, I am also wondering if we should enable PIPE_CAP_FBFETCH_COHERENT or not
16:32karolherbst: imirkin: wanna take a look at my demmt patches to support newer chips/drivers or should I just go ahead and merge it? https://github.com/envytools/envytools/pull/192
17:51imirkin_: karolherbst: go ahead and merge that demmt stuff
17:51imirkin_: doesn't seem obviously wrong :)
17:51imirkin_: if you're just adding support for new ioctl's, doesn't really need review
17:51karolherbst: mhh, true
17:52imirkin_: if you're changing something core, like ib-related stuff, probably worth a look
17:52karolherbst: yeah.. I still have my UVM support stuff locally
17:52karolherbst: that will be annoying
17:52imirkin_: as for fbfetch coherent -- i don't think our hw supports that. or if it does, then not the way we're using it.
17:53imirkin_: fbfetch coherent means that we can access the current state of the fb without funny flushes in between
17:54imirkin_: this is needed for advanced_blend_coherent as well as like EXT_framebuffer_something_fetch
18:14imirkin_: gah! we still haven't merged pendingchaos's PIPE_CAP_IMAGE_LOAD_FORMATTED patches for nouveau?
18:15imirkin_: need to re-check what the situation was. iirc i looked at them, conceivable i had some issues with them
18:20karolherbst: mhh, currently wondering how I want to do this real nouveau shader cache stuff :/
18:20karolherbst: or rather, what to cache...
18:24karolherbst: mhh, I think essentially I would like to have an if (cached) ... else ... thing inside nvc0_program_translate?
18:24karolherbst: any better idea?
18:24imirkin_: ideally it wouldn't get that far... although maybe
18:24imirkin_: i forget how the code is laid out
18:25imirkin_: and i don't really have time to look now, sorry
18:25karolherbst: nvc0_program_translate calls into the compiler though
18:25karolherbst: and manages that nv50_ir_prog_info struct
18:25imirkin_: ok, well all the artifacts have to be recreated of course
18:25imirkin_: including all the fixups/etc
18:26karolherbst: I think we can skip most of it there, like all those info assignments
18:26imirkin_: yeah, a bunch can be skipped
18:26karolherbst: and just set everything on prog we have to
18:26imirkin_: i.e. a bunch of it is just helping the compiler
18:26karolherbst: and then run through the header generation for cached and uncached
18:26imirkin_: also note that in some cases, recompiles will be triggered
18:26imirkin_: so some of the state has to be set up accurately
18:27imirkin_: (since we don't support variants, and no one in their right mind would flip those settings)
18:27karolherbst: right.. but I think the part between nv50_ir_generate_code and header generation is the critical stuff
18:27karolherbst: mhhh, prog->tfb = nvc0_program_create_tfb_state
18:27imirkin_: that's not real :)
18:27karolherbst: but that's unrelated
18:27imirkin_: that's just a token
18:27imirkin_: we store the whole pointer
18:27imirkin_: but we could just as well store a random id
18:27karolherbst: still have to check what it reads out from prog->tfb = nvc0_program_create_tfb_state
18:27imirkin_: only ever compared for equality
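Here is an illustration of the "store a random id instead of the whole pointer" idea, covering the places where the token is only compared for equality. The following lines note that other code does read the slots, so this would not cover every use. The struct and function names are hypothetical. An integer id has one side benefit over a pointer: it could be serialized, for example into a shader cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical XFB state object: only 'id' matters for change detection. */
struct tfb_state {
    uint32_t id;   /* unique per distinct state object; 0 is reserved */
};

/* Hypothetical program: records which XFB state it was built against. */
struct program {
    uint32_t tfb_token;   /* 0 means "no transform feedback" */
};

static uint32_t next_tfb_id = 1;

static void tfb_state_init(struct tfb_state *s)
{
    assert(next_tfb_id != 0); /* the counter must not wrap into the reserved id */
    s->id = next_tfb_id++;
}

/* Returns true when the bound XFB config has to be re-emitted. Only equality
 * of tokens is used, so an id works exactly like a pointer would. */
static bool tfb_needs_validate(const struct program *prog, uint32_t *bound_token)
{
    if (prog->tfb_token == *bound_token)
        return false;
    *bound_token = prog->tfb_token;
    return true;
}
```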
18:28karolherbst: mhh, it reads from the slots
18:28imirkin_: oh hm. somewhere else it's only compared for equality ;)
18:28imirkin_: that should be separate though
18:28imirkin_: i guess i dunno.
18:29imirkin_: is xfb linked into the program state, logically?
18:29imirkin_: i don't think it is.
18:29imirkin_: it informs how the program is compiled
18:29karolherbst: it is
18:29karolherbst: prog->tfb is a struct nvc0_transform_feedback_state
18:29imirkin_: well, that's an impl detail.
18:29imirkin_: my point is that it doesn't have to be.
18:29karolherbst: yeah... probably not
18:29imirkin_: that tfb state is only needed in a handful of places
18:29karolherbst: I think I have to write down all the dependencies and see what we can move around to make it less painful
18:30imirkin_: and with a shader cache, it would come from the draw call anyways.
18:30karolherbst: imirkin_: well, the idea would be that we can also kill all recompilations with caching.. that would be nice
18:30imirkin_: it helps to have a clear picture in your head as to wtf all these things are
18:30imirkin_: lmk if you run into things you're unsure of
18:30imirkin_: i have a full understanding of all that stuff, so i can probably explain how it fits together
18:31karolherbst: mhh, I think I would like to hash all the input to nvc0_program_translate
18:31karolherbst: but that's an nvc0_program
18:32karolherbst: well.. the most expensive part is calling into nv50_ir_generate_code anyway
18:32karolherbst: if we can skip that, then we already achieved what we want to have
18:33karolherbst: so maybe first step is if (cached) nv50_ir_code_from_cache else nv50_ir_generate_code?
18:33karolherbst: and just recreate the info object in nv50_ir_code_from_cache like nv50_ir_generate_code would do?
18:33karolherbst: that should be enough I think
18:33imirkin_: something like that.
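Below is a minimal sketch of the "if (cached) nv50_ir_code_from_cache else nv50_ir_generate_code" flow. It assumes the key is built from a hash of the shader input plus whatever state can trigger a recompile. FNV-1a and the direct-mapped table stand in for a real content hash and cache store. `translate`, `generate_code`, and `key_state` are hypothetical names, not nouveau APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET 14695981039346656037ULL
#define FNV64_PRIME  1099511628211ULL
#define CACHE_SLOTS  64
#define MAX_WORDS    16

/* FNV-1a over a byte buffer; 'h' lets several inputs be chained into one key. */
static uint64_t fnv1a64(const void *data, size_t len, uint64_t h)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

struct compiled {
    uint64_t key;
    int valid;
    size_t num_words;
    uint32_t code[MAX_WORDS];   /* placeholder for binary + prog info out */
};

static struct compiled cache[CACHE_SLOTS]; /* direct-mapped, zero-initialized */
static int compile_count;                  /* how often the slow path ran */

/* Hypothetical stand-in for nv50_ir_generate_code(): the expensive step. */
static void generate_code(const uint32_t *tokens, size_t n, struct compiled *out)
{
    compile_count++;
    out->num_words = n < MAX_WORDS ? n : MAX_WORDS;
    for (size_t i = 0; i < out->num_words; i++)
        out->code[i] = ~tokens[i];   /* fake "compilation" */
}

/* key_state: hypothetical bits of state that would otherwise force a recompile. */
static const struct compiled *translate(const uint32_t *tokens, size_t n,
                                        uint32_t key_state)
{
    assert(tokens != NULL);
    uint64_t key = fnv1a64(tokens, n * sizeof *tokens, FNV64_OFFSET);
    key = fnv1a64(&key_state, sizeof key_state, key);

    struct compiled *slot = &cache[key % CACHE_SLOTS];
    if (!(slot->valid && slot->key == key)) {
        generate_code(tokens, n, slot);   /* miss: compile and store */
        slot->key = key;
        slot->valid = 1;
    }
    /* Header generation and other fixups would run here on both paths. */
    return slot;
}
```

Including the recompile-triggering state in the key follows from the point above: since there are no variants, that state has to be accurate for a cached entry to be reusable.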
18:36karolherbst: ohh, I have an idea
18:36karolherbst: we should split nv50_ir_prog_info
18:36karolherbst: into nv50_ir_prog_info_in and nv50_ir_prog_info_out
18:36karolherbst: that would make it clearer what is actually the input and what is the result of the compilation :)
18:36karolherbst: that struct is only used in this function anyway
18:37karolherbst: and well inside codegen
21:31karolherbst: cool, nvidia-uvm is actually MIT licensed, so we can just use their header files for demmt as well :)
21:31karolherbst: just need a good strategy on how to track changes