00:26 skeggsb: imirkin_: what exactly were the symptoms for that patch? you shouldn't need to do that if mesa is accounting for the BO correctly...
00:26 imirkin_: for what patch?
00:26 skeggsb: the code bo one
00:27 imirkin_: hang when trying to run Dirt Rally with bindless enabled
00:27 imirkin_: thus thinking that it was bindless's fault
00:27 imirkin_: nouveau_bo_deal just nukes the bo entirely
00:27 imirkin_: er, nouveau_bo_del, but i dunno if that's the function name. wtvr. when the refcount drops to 0
00:28 skeggsb: yes, that's supposed to happen, the GPU PTEs will stick around as long as there's a command stream still referencing it though
00:28 skeggsb: (if mesa hasn't screwed up)
00:28 imirkin_: right.
00:28 imirkin_: the patch makes mesa not screw up.
00:28 skeggsb: in the wrong way ;)
00:28 imirkin_: in the easy way.
00:29 skeggsb: still incorrect, it doesn't help under memory pressure when TTM decides to evict it for the fun of it
00:29 skeggsb: (because it doesn't know it's in-use)
00:29 imirkin_: uhm
00:29 imirkin_: hrm
00:29 imirkin_: did i mess that up?
00:29 imirkin_: yes, i probably did, didn't i
00:29 imirkin_: BUT
00:29 imirkin_: the situation can still occur
00:29 skeggsb: well, you wouldn't need the patch you added if it was being accounted for correctly
00:30 imirkin_: nah, i still would
00:30 imirkin_: bufctx doesn't hold a reference to the bo
00:30 imirkin_: so if you delete the bo before you submit the batch
00:30 imirkin_: when you go to submit the batch it says, "what bo you talkin' 'bout"
00:30 skeggsb: right, yes, this would help with that part of the problem
00:31 skeggsb: is that what you were seeing rather than GPU PTEs not being present?
00:31 imirkin_: but ... right - i need to do like a nouveau_something_validate()
00:31 imirkin_: to make sure it doesn't get evicted
00:31 imirkin_: no, i was seeing the PTE's not there
00:31 imirkin_: and this fixed the PTE's to be there :)
00:31 imirkin_: but you're right - i need to make sure that bo is part of the batch and it probably isn't
00:31 imirkin_: and then i could del it as soon as the batch was submitted
00:31 imirkin_: but we don't have a way of doign that
00:32 imirkin_: so ... i just stick it in the fence work :)
04:03 karolherbst: duh... I found another Ra issue
04:03 karolherbst: well or a CFG issue in my code
04:04 karolherbst: " 73: export b128 # o[0xa0] $r11q" + " 74: export b128 # o[0xb0] $r12q" :) usually that doesn't make much sense
04:44 imirkin: fuck. all those resizes are broken too.
04:45 imirkin: grrr. skeggsb. why did you have to notice that my patch was broken. i could have sat here blissfully ignorant. but noooo.
04:52 skeggsb: try making that resizing stuff work nicely with multithreading (without requiring a massive lock around EVERYTHING)
05:02 karolherbst: mhh, something odd is happening with those phi nodes :( weird
05:03 imirkin: skeggsb: no thanks :)
05:04 imirkin: looks like a PUSH_REFN should do the trick
05:04 imirkin: once nouveau_pushbuf_refn is called, the pushbuf structure acquires a ref on the bo
05:04 imirkin: (kinda)
05:10 karolherbst: imirkin: is there anything CFG related for simple if/then/else I have to take care of except using the proper edges? I get kind of weird RA for phis and I am sure that at least the edges are the same as with TGSI
05:11 imirkin: pastebin nvir, pre- and post-ssa
05:12 karolherbst: https://gist.githubusercontent.com/karolherbst/061713922dc6737df74df53a4ba74838/raw/d9478e47227820f13f76c28018f12f7bfd139805/gistfile1.txt
05:13 imirkin: exports at the very end got you down?
05:13 karolherbst: yeah
05:13 karolherbst: BB:16 looks weird as well
05:13 karolherbst: same regs written twice with different values :)
05:13 imirkin: 17: mov u32 %r192 c0[0x0] (0)
05:13 imirkin: where does that come from?
05:14 imirkin: also none of that is the pre-ssa stuff
05:14 karolherbst: ahh, right
05:15 karolherbst: https://gist.githubusercontent.com/karolherbst/061713922dc6737df74df53a4ba74838/raw/c41bbe58b69602183bb3c1acc42f5f0cd628de1f/gistfile1.txt
05:15 karolherbst: was thinking pre RA :( my bad
05:16 imirkin: wait a second ... what the F?! we don't ever even stick the new screen->text into the bufctx!!
05:17 imirkin: how can that even be
05:17 imirkin: oh no, ok, it's sane.
05:18 skeggsb: it's definitely not sane, but iirc it's ok until a resize
05:19 karolherbst: mhh, might be that I got something wrong related to nested ifs... would be kind of weird though
05:19 imirkin: skeggsb: well, it resizes and immediately dumps the new bo into bufctx
05:20 imirkin: so it's just broken the way i thought rather than being much more broken :)
05:32 karolherbst: imirkin: is there a problem if the same value is written to from more than two locations? especially if the blocks have a different depth?
05:33 imirkin: no
05:35 karolherbst: maybe the RA debug output helps here
05:39 imirkin: skeggsb: does this seem right? https://github.com/imirkin/mesa/commit/f4f1dbadc7827841842819b8aa4504cb4599d5f0
05:40 imirkin: it also doesn't crash with dirt rally, but that's not exactly a 100% accurate test ;)
05:40 imirkin: (although it did previously *reliably* cause the crash)
05:41 imirkin: karolherbst: so ... i have a question
05:41 imirkin: karolherbst: https://hastebin.com/jugihabera.cpp
05:42 karolherbst: yeah?
05:42 karolherbst: you can ignore those immediates here
05:42 imirkin: does it strike you as odd that you have "mov %rN, 0"; "export foo, %totally different reg"
05:42 karolherbst: no
05:42 imirkin: ok
05:42 karolherbst: nir doesn't make a difference between fixed access and relative ones
05:42 imirkin: just asking ;)
05:42 karolherbst: those 0x0 are indirects ;)
05:42 imirkin: gotcha
05:43 karolherbst: but I thought it is pointless to actually use indirects here :)
05:43 imirkin: wtvr, it's fine; code just seemed odd, wanted to make sure it wasn't a bigger error
05:43 karolherbst: yeah
05:45 imirkin: ok
05:45 imirkin: so
05:45 imirkin: there appears to be something wrong
05:45 imirkin: let's take one concrete example
05:45 imirkin: 197: export f32 # o[0xa4] %r252 (0)
05:46 imirkin: now, 252 comes from
05:46 imirkin: 172: phi u32 %r252 %r300 %r223 (0)
05:46 imirkin: let's look at the %r300 branch:
05:46 imirkin: 170: mov s32 %r300 %r292 (0) and previously 162: mov s32 %r292 %r192 (0)
05:46 imirkin: and we see that
05:46 imirkin: 56: ld u32 %r192 c0[0x0] (0)
05:47 imirkin: so ... it would stand to reason that the whole branch of a4 is == c0[0x0]
05:47 imirkin: however if you look at the code prior to opts
05:47 imirkin: er, prior to ssa
05:47 imirkin: 172: export f32 # o[0xa4] %r82 (0)
05:47 imirkin: and in that branch, 154: mov s32 %r82 %r108 (0); 146: mov s32 %r108 %r56 (0)
05:48 imirkin: er hm. crap. did i screw this up?
05:48 karolherbst: well I am quite sure, that if MemoryOpt is disabled, everything works as expected (kind of)
05:49 karolherbst: for this test it works, but maybe there is some serious issue somewhere
05:49 imirkin: yea dunno
05:49 karolherbst: I will take a look at the RA output, maybe this gives me some clues
05:50 imirkin: i wonder if some sizes are messed up ro something
05:50 imirkin: not everything ends up getting printed
05:51 karolherbst: yeah, maybe
05:53 imirkin: with the tgsi fe, what does the post-ssa-opt nvir look like?
05:53 imirkin: should be pretty similar i'd think
06:01 karolherbst: well "similar" https://gist.githubusercontent.com/karolherbst/468da80553ebeb67bf3e5943ee4ccb55/raw/f04d85f09340b17771fc627a12f126367bc2bd73/gistfile1.txt
06:11 imirkin: interesting
06:11 imirkin: nir avoids the lmem somehow?
06:11 imirkin: that seems very surprising.
06:13 imirkin: hm, looks like it could be legit though
06:14 imirkin: nope, i take that back
06:14 imirkin: do you have the glsl and nir src for this too?
06:17 karolherbst: https://gist.github.com/karolherbst/061713922dc6737df74df53a4ba74838
06:19 karolherbst: it somehow looks like nir does odd things here anyway
06:22 imirkin: right i see - so they assume that row/col < 3
06:22 imirkin: and do clever logic accordingly
06:22 karolherbst: yeah
06:23 karolherbst: also they do linking optimisations I think
06:23 karolherbst: the tgsi one also has much more outputs
06:26 karolherbst: ugh...
06:27 imirkin: yeah
06:27 imirkin: was about to ask
06:28 imirkin: there are 3 matrices on output
06:28 imirkin: do you only ever look at dst_matrix[1] or something?
06:28 imirkin: if so, you get link opts that perhaps nouveau doesn't get?
06:30 karolherbst: well, I do what nir tells me to do
06:31 karolherbst: nir uses up to VARYING_SLOT_VAR4
06:31 karolherbst: TGSI up to DCL OUT[6], GENERIC[5]
06:31 imirkin: yea
06:31 karolherbst: uhm
06:31 karolherbst: DCL OUT[7].xyz, GENERIC[6]
06:32 imirkin: right
06:32 karolherbst: so two more slots
06:32 imirkin: so it's not packed
06:32 imirkin: for some odd reason
06:32 karolherbst: no idea really
06:32 karolherbst: tgsi writes to w as well for all outputs
06:33 karolherbst: anyway, I think I found something in the RA output
06:33 imirkin: it feels like you did something REALLY quite dumb
06:33 imirkin: like forget to configure osmething basic
06:34 imirkin: i don't know what though
06:34 karolherbst: ahhhhh
06:34 karolherbst: I think I found it
06:35 karolherbst: RA doesn't merge the exports values
06:35 karolherbst: only merges
06:35 imirkin:is finally checking your code
06:35 karolherbst: uhm... wait
06:35 karolherbst: this looks odd/different
06:36 karolherbst: no, it is fine, I thought something else
06:37 imirkin: uhm
06:37 imirkin: unrelated, but...
06:37 imirkin: if (info->io.genUserClip > 0) {
06:37 imirkin: probably only want to do that for output == position
06:38 karolherbst: tgsi does it for all as well
06:38 karolherbst: last block in scanSource
06:38 imirkin: && dst.getIndex(0) == code->clipVertexOutput
06:38 imirkin: not quite.
06:39 karolherbst: ahh, you mean in storeDst
06:39 imirkin: yes.
06:39 karolherbst: ahh I see
06:39 imirkin: not your current problem. but a problem :)
06:39 karolherbst: :)
06:58 imirkin:&
06:58 imirkin: give bindless a shot with DOW3 or something if you get a chance.
07:00 karolherbst: yeah, I usually plan to do things like that at the weekend
07:09 karolherbst: wait a second
07:09 karolherbst: imirkin: is there a strict requiernment for wide regs to be aligned as well?
07:10 karolherbst: or are we just doing that, because it makes things simplier?
07:10 karolherbst: because with TGSI, we don't end up with stuff like this anyway: "export b128 # o[0xa0] $r11q"
07:13 karolherbst: I am currently trying to find the code in RA where this decision is made
07:24 karolherbst: okay :)
07:24 karolherbst: imirkin: there are only two calls to nv50_ir::RegisterSet::assign with size == 4 :)
07:25 karolherbst: I would have expected at least 5
07:25 karolherbst: so node->colors doesn't get filled or the wrong nodes remain
07:35 karolherbst: duh.. not all of those RA merges are pushed on the stack
10:10 karolherbst: mhh, n.livei.isEmpty() is true for some of the export nodes....
10:10 karolherbst: this sounds wrong
10:54 karolherbst: imirkin: !!!!!! DUH!!!1 the heck.... I could flip a table because of the issue now... I found the issue
10:54 karolherbst: apperantly RA doesn't like if an export consumes a phi value directly if it is a wide value
10:54 karolherbst: so
10:54 karolherbst: I need to insert silly moves
10:54 karolherbst: *movs
10:55 karolherbst: what a silly issue
10:57 karolherbst: of course I end up with silly moves now if MemoryOpt is enabled, but meh
11:00 karolherbst: this is no problem for nir SSA values, because they don't end up getting phi nodes
11:57 karolherbst: but adding those movs now leads to weird GPU freezes in other tests :(
13:06 karolherbst: allthough... it should work or TGSI is doing some magic here
13:06 karolherbst: ahh no, the exports always get mov in front
13:08 karolherbst: I really should write down all those RA bugs
13:09 karolherbst: so a "MOV OUT[0], TEMP[3]" results into 4 x mov + 4 x export, I am just not quite sure where this mov is coming from
13:10 karolherbst: ohhhhh
13:10 karolherbst: ohhhhh
13:10 karolherbst: ...
13:11 pmoreau: ihhhhh? :-p
13:11 karolherbst: the generated mov is the original opcode generated
13:11 karolherbst: and the export is just a fixup, because we can't move into o[] and so on
13:11 karolherbst: I guess I just add movs for every export I do then....
13:12 pmoreau: Yeah, you should write them down, possibly a Trello task + a bug report with more details.
13:12 karolherbst: I just add TODOs in the from_nir file... good enough for now :p
13:13 pmoreau: :-)
13:14 pmoreau: I think I’ll defer my sending of the clover series: I should test that I properly handle all possible arguments (for OpenCL 1.2 at least), which means supporting samplers and images in Nouveau as well, so that I can test it.
13:16 karolherbst: you mean in clover_
13:16 karolherbst: ?
13:16 karolherbst: ohhh
13:16 karolherbst: I see
13:16 karolherbst: well
13:16 karolherbst: I don't think this is really required if you just add support for consuming spir-v
13:16 karolherbst: I mean, no driver will be able to use it anyway
13:17 karolherbst: for now
13:22 pmoreau: True, I guess I could leave it as “not implemented”, but I’d like to have it complete.
13:29 karolherbst: pmoreau: I think it depends on how much interested other drivers are in adding OpenCL support
13:29 karolherbst: but I think all the other drivers have other priorities right now
13:29 pmoreau: If no one is really interested right now, then I can spend more time on it ;-)
13:30 karolherbst: ;)
14:57 imirkin: karolherbst: oh yeah. merge's don't like phi's as arguments.
14:57 karolherbst: :)
14:57 imirkin: i even knew that.
14:58 imirkin: but ... i forgot.
14:58 karolherbst: it isn't exactly something obvious
14:58 karolherbst: missing things are hard to spot ;)
14:59 imirkin: yes.
14:59 karolherbst: I still have some remaining variable-indexing fails left, but those are all related to tessellation shaders and input/output handling, no issue within the variable indexing stuff directly
15:00 karolherbst: tessellation inputs/outputs are odd in nir....
15:02 karolherbst: kind of have to read up on how that vec4[32] input things works out, or when is it an array for real or when is it just some tesselation stuff
15:05 karolherbst: I kind of still need to figure out if the outer dimension is per_vertex or not
15:05 karolherbst: my first guess is, that it only really affects gl_position
15:07 karolherbst: or more like everything which is related vertex input/output data
15:07 karolherbst: or all inputs in tess ctrl shaders...
15:07 karolherbst: I should read it up
15:09 karolherbst: ohh, my code was nearly correct already, just need to fixup tess eval outputs
15:13 imirkin: tess is a bit of a mindfuck
15:13 imirkin: note that while the IR probably says to write output[gl_InvocationID], in practice the hw can only write that output
15:14 imirkin: so you can skip the indirection there, i believe
15:14 imirkin: i forget if the tgsi backend does or not
15:14 karolherbst: well in nir I get a per_vertex intrinsic
15:14 imirkin: for storing?
15:14 karolherbst: and loading
15:14 imirkin: well i'm talking specifically about storing
15:14 imirkin: you can load any invocations' input
15:14 imirkin: or output for tha tmatter
15:14 imirkin: but you can only *store* to your invocation
15:14 karolherbst: I've read about it
15:14 karolherbst: ahh
15:15 karolherbst: yeah, makes sense
15:15 karolherbst: well in nir_intrinsic_store_per_vertex_output I don't read the invocation id
15:21 karolherbst: well
15:22 karolherbst: in the shader I have here, nir reads the invocation_id, adds 0 and feeds it into store_per_vertex_output
15:22 karolherbst: I guess they added it just in case something wants to know the id regardless
15:26 imirkin: yeah
15:26 imirkin: a future ext could relax the requirement, for example
15:27 karolherbst: I guess no current hw would be able to handle that without hacking that through patch outputs?
15:27 imirkin: dunno
15:27 imirkin: some of the mobile parts might - tess is a bit of a hack there
15:27 imirkin: on adreno it's just a gmem buffer :)
15:27 imirkin: nothing special about vertex attribs
15:27 karolherbst: :)
15:28 imirkin: and even on nvidia, i think maxwell+ have stopped using outputs and are accessing memory directly
15:28 imirkin: the old ways still work though
15:29 karolherbst: I guess the new way is... faster?
15:29 imirkin: seems like a reasonable guess :)
15:32 karolherbst: I don't think I should continue adding new features, like implementing load_patch_vertices_in
15:32 imirkin: well, that's just a sysval
15:32 karolherbst: uhh, okay, sounds easy enough
15:34 karolherbst: I doubt load_per_vertex_output is a sv value though ;)
15:35 karolherbst: that one sounds a bit annoying
15:38 karolherbst: uf ../src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp:1023:assignSlots: Assertion `vary < 80' failed.
15:53 imirkin: wtf is that assert btw?
15:53 imirkin: anyways, gtg
15:53 imirkin: so much work. so little time.
15:56 karolherbst: I put it :) to many input/outupts
15:57 karolherbst: *outputs
16:06 karolherbst: argh..
16:06 karolherbst: driver_location of 1862694993 makes sense :)
17:37 pmoreau: So, now I’m left with the hard part (I think) regarding textures support: say where to find the texture and the sampler.
17:39 pmoreau: I’ll study the from_tgsi a bit more :-)
17:42 karolherbst: pmoreau: have fun :)
17:43 pmoreau: :-)
17:43 karolherbst: imirkin: do you know if there was some super trick regarding gl_PrimitiveID? I don't get it to work at all :(
17:43 imirkin_: yeah, but not at the input ir level
17:43 imirkin_: you have to set bits in the SPH and whatnot of course
17:44 karolherbst: well I already found that in nir the prim_id isn't an input....
17:44 karolherbst: but I already fixed that up
17:44 imirkin_: it's a sysval
17:44 imirkin_: like in tgsi i think
17:44 imirkin_: with the appropriate cap set
17:44 karolherbst: in tgsi it is an input :)
17:44 imirkin_: oh yeah
17:44 imirkin_: something else is a sysval
17:45 imirkin_: PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL and PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL
17:45 karolherbst: the converter generates a rdsv, but it gets converted to vfetch
17:45 imirkin_: those were the 2 i was thinking of
17:46 karolherbst: mhh is there a reason why tgsi reports one input more than actually needed?
17:46 imirkin_: pastebin
17:47 karolherbst: well, the tgsi only has one, but I was looking at the info struct
17:47 imirkin_: oh
17:47 imirkin_: dunno
17:47 imirkin_: (don't even know which info struct)
17:47 karolherbst: https://gist.github.com/karolherbst/84cfc95c286211e0f0d95c6669aa16a2
17:48 imirkin_: werid.
17:48 karolherbst: nv50_ir_prog_info
17:48 karolherbst: yeah
17:48 imirkin_: i have no idea what computes that
17:48 imirkin_: it probably sees that there are lines
17:48 imirkin_: hence 2 vertices
17:48 imirkin_: and does 2 * 1
17:48 imirkin_: not realizing that primid is actually not indexed
17:48 imirkin_: but i dunno what numInputs is used for
17:49 karolherbst: well the only line setting numInputs is "info->numInputs = scan.file_max[TGSI_FILE_INPUT] + 1;"
17:49 karolherbst: size of info->in
17:50 karolherbst: well, the only difference in the shader I see is that "export u32 # o[%r9][0x80] %r7 (0)" after the last emit
17:51 karolherbst: uhm %r9 is the result of the emit...
17:51 imirkin_: yeah
17:51 imirkin_: that's how GS works
17:51 imirkin_: emit produces a value
17:51 imirkin_: and it's used as the "index" for the 2d of the next export(s)
17:52 imirkin_: (yay. you're learning.)
17:52 karolherbst: okay, but the exports are pointless as long as there is no additional emit?
17:52 imirkin_: yes.
17:52 karolherbst: mhhh
17:52 karolherbst: weird
17:52 imirkin_: so that second export is fairly pointless.
17:52 karolherbst: yeah, we do that in every gs with TGSI :)
17:53 karolherbst: the TGSI even says: GS_MAX_OUTPUT_VERTICES 1
17:53 imirkin_: yeah. it works out a little dumb for stupid tests
17:53 imirkin_: but for real GS's, it's fine
17:54 karolherbst: mhhhh
17:55 karolherbst: something is super odd then
17:55 karolherbst: I am sure the shaders are semantically equal and the nv50_ir_prog_info objects as well
17:56 imirkin_: check the header
17:56 karolherbst: the shader headers are equal as well
17:56 imirkin_: ok :)
17:56 imirkin_: well, if everything the same, that means it works right? :)
17:56 karolherbst: well
17:56 karolherbst: the only difference is that silly export
17:56 imirkin_: pastebin
17:56 karolherbst: afaik
17:57 karolherbst: https://gist.github.com/karolherbst/46c7156959ddc26a8dc9670305402b98
17:58 imirkin_: 3: emit u32 $r0 $r255 $r255 (8)
17:58 imirkin_: and let's see what does the tgsi shader do?
17:59 imirkin_: hm. the same thing. ok
17:59 imirkin_: i thought you had to specify imms, but RZ works too i guess
17:59 karolherbst: the imms buffer is just for TGSI loadImm stuff
18:00 karolherbst: it has no effect at runtime afaik
18:00 karolherbst: I don't fillt it anywhere and I doubt it would cause this test to fail :)
18:02 imirkin_: GS is type 2?
18:02 karolherbst: I think so
18:02 karolherbst: 3,4 are tess and 0,1 are vertex/fragment or so
18:03 imirkin_: yea dunno
18:03 imirkin_: so
18:03 imirkin_: case TGSI_SEMANTIC_PRIMID:
18:03 imirkin_: vp->hdr[5] |= 1 << 24;
18:03 imirkin_: i think that gets triggered...
18:04 karolherbst: looks like it
18:06 imirkin_: dunno. can't look more. sorry.
18:09 karolherbst: doing a mmt trace should still work, right?
18:38 imirkin_: you need to disable the "new" interfaces in the loader
18:41 imirkin_: nvif = false somewhere