02:02 imirkin: in case ReinUsesLisp reads logs ... look at what i did for the existing h* support wrt the H1 flag. iirc it goes before the reg.
02:05 imirkin: look at e.g. 5cc0_1/2/3
02:06 imirkin: or 50f0
03:26 imirkin: ReinUsesLisp: see chan logs for your h0 question
03:27 ReinUsesLisp: ok
03:29 ReinUsesLisp: in this case it's different, they are only available when src is F16
03:29 ReinUsesLisp: I could add them right after src's F16, but that's not how it looks in nvdisasm
03:30 imirkin: yeah, there are a lot of differences against nvdisasm
03:30 imirkin: we stick the neg/abs thing before
03:30 imirkin: i think h0 fits into that
03:30 imirkin: or b0/b1/b2/b3 for certain I2F cases
03:30 ReinUsesLisp: ok ^^
03:30 ReinUsesLisp: do you know if these are SM60?
03:31 ReinUsesLisp: because I'm not testing these in desktop
03:31 imirkin: no. just check with nvdisasm -b SM50 vs SM60
03:31 imirkin: this is what i do: https://hastebin.com/wowigabaci.java
03:32 ReinUsesLisp: oh, hastebin is alive
03:32 imirkin: have to do it in groups of 3 ops, otherwise nvdisasm bails
03:32 imirkin: yeah, it came back
03:33 ReinUsesLisp: btw nvdisasm detects 0ULL as NOP, I don't know if it's worth adding that one
03:33 imirkin: you sure?
03:33 imirkin: i get a segfault
03:33 imirkin: perhaps it doesn't like the 0x7e0 schedule for the NOP?
03:35 ReinUsesLisp: https://hastebin.com/zukitifako.css
03:35 imirkin: btw, thanks for improving envydis :)
03:35 imirkin: hmmmm
03:36 ReinUsesLisp: np, I've been assembling shaders by hand with envyas lately
03:37 imirkin: my nvdisasm disagrees.
03:37 imirkin: https://hastebin.com/qazedufipe.rb
03:37 ReinUsesLisp: I usually get more segfaults with Linux' nvdisasm
03:38 imirkin: that means something's amiss
03:38 imirkin: or a bug in nvdisasm
03:39 ReinUsesLisp: I could send you that shader but it's copyrighted :P
03:39 imirkin: well, note how it's after the EXIT
03:39 imirkin: so it's just padding
03:39 imirkin: the fact that nvdisasm decoded it as NOP could be a different manifestation of a bug
03:40 imirkin: how's your emu going?
03:40 ReinUsesLisp: I think those zeroes are added by my extractor
03:40 ReinUsesLisp: nicely
03:40 ReinUsesLisp: we are getting really complex shaders now
03:40 imirkin: what's the name of the project again?
03:41 ReinUsesLisp: yuzu or Ryujinx, but I've been working on yuzu
03:42 ReinUsesLisp: we added support for compute shaders recently so all those ugly instructions are starting to pop up
03:42 ReinUsesLisp: STG, VOTE, SHFL, ATOMS
03:43 imirkin: well, ATOMS isn't so bad - that's directly supported
03:43 imirkin: just atomics on shared memory
03:43 imirkin: and VOTE is available with ARB_something
03:43 ReinUsesLisp: yes, the problem is that GLSL doesn't let you reinterpret shared memory
03:43 imirkin: huh?
03:43 imirkin: intBitsToFloat?
03:44 ReinUsesLisp: not with an inout parameter :P
03:44 ReinUsesLisp: I emulate it with SSBOs because those can overlap in binding
03:44 imirkin: shared memory wouldn't be an inout parameter
03:45 ReinUsesLisp: how do you tell GLSL to do an integer atomic addition instead of a float atomic addition?
03:45 ReinUsesLisp: if shared memory is declared as float
03:45 imirkin: just do integer. nobody does float :)
03:46 imirkin: you can also fake it with compare-and-swap
03:46 imirkin: (+ loop)
03:46 ReinUsesLisp: yes, I want to cover all cases :P
03:46 ReinUsesLisp: wouldn't that be non-atomic?
03:46 imirkin: nah, it'd be atomic
03:46 imirkin: just inefficient
03:46 ReinUsesLisp: oh, ATOMS doesn't support floats, just signed/unsigned integers
03:47 ReinUsesLisp: it was SUATOM the evil one
03:47 imirkin: however shared memory DOES support float atomic ops
03:47 imirkin: which means they have to be emulated with something like this:
03:47 ReinUsesLisp: let's see what Nvidia emits
03:47 imirkin: oh, nevermind. i don't have an example handy
03:48 imirkin: but it's basically a while loop that tries to compare-and-swap until it works out
03:48 imirkin: (atomic compare and swap)
03:49 ReinUsesLisp: glslangValidator rejects atomic additions with float
03:49 imirkin: SUATOM is similar ... just with an image. esp if you require bindless images, should be a breeze.
03:49 imirkin: you forgot to #extension GL_NV_shader_atomic_float: enable ?
03:50 ReinUsesLisp: I think it doesn't affect shared memory
03:50 imirkin: the ext sure does
03:52 ReinUsesLisp: yup, now it built, time to extract the binary
03:52 imirkin: iirc there is an ATOMS.ADD.F32 variant though with GM107+
03:53 imirkin: on kepler it required a fallback
03:53 ReinUsesLisp: https://hastebin.com/enurokojux.php
03:53 imirkin: ah no, ok. it does the compare-and-swap
03:53 imirkin: (CAS = compare and swap)
03:54 imirkin: basically CAS returns the original value
03:54 imirkin: so it does like
03:54 imirkin: x' = x + 5
03:54 imirkin: t = cas(x, x')
03:54 imirkin: if t != x: repeat
03:55 ReinUsesLisp: interesting, I guess I'll do that, calculating the workgroup padding of shared memory is annoying
03:55 imirkin: and you can totally just do that in GLSL when emulating ATOM.E.ADD.F32
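The compare-and-swap loop described above can be sketched in C++ as a stand-in for the GLSL version. This is only an illustration under assumptions: the helper names (`float_to_bits`, `bits_to_float`, `atomic_add_f32`) are hypothetical, `std::atomic<std::uint32_t>` stands in for a shared-memory word declared as an integer, and the bit-cast helpers play the role of GLSL's `floatBitsToUint` / `uintBitsToFloat`. It is not the code Nvidia's compiler emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret the bits of a float as an integer and back.
std::uint32_t float_to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomically adds `value` to the float whose raw bits live in `cell`.
// Returns the float that was stored before the addition.
float atomic_add_f32(std::atomic<std::uint32_t>& cell, float value) {
    std::uint32_t expected = cell.load();
    for (;;) {
        // x' = x + value, computed on the float view of the word.
        const std::uint32_t desired =
            float_to_bits(bits_to_float(expected) + value);
        // CAS succeeds only if the word still holds `expected`; on failure
        // `expected` is refreshed with the current contents and we retry.
        if (cell.compare_exchange_weak(expected, desired)) {
            return bits_to_float(expected);
        }
    }
}
```

Each retry recomputes the sum from the freshly observed value, so the update as a whole stays atomic; the cost is extra iterations under contention.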
03:55 ReinUsesLisp: I'll use the NV ext when available too :P
03:56 imirkin: well that's just cheating!
03:56 imirkin: if you're going to do nv exts, might as well just use NV_gpu_shader5
03:56 imirkin: which basically lets you specify the various nasty nvidia ops directly
03:56 imirkin: er
03:56 imirkin: NV_gpu_program5
03:56 ReinUsesLisp: I thought of using ARB, but it'll get messy to maintain
03:57 imirkin: yeah, it's ARB-ish
03:57 ReinUsesLisp: I already use NV_gpu_shader5 to emulate votes and shuffles, it's the only way to ensure it works just like in NX without using SSBOs
03:57 imirkin: it'll effectively be a separate backend for you
03:57 ReinUsesLisp: err NV_whatever_shuffles
03:57 imirkin: right
03:57 ReinUsesLisp: when not available I emulate it as if the GPU had a warp size of 1
03:57 imirkin: NV_shader_thread_shuffle presumably
03:58 imirkin: https://www.khronos.org/registry/OpenGL/extensions/NV/
03:58 imirkin: have a browse through that :)
03:58 ReinUsesLisp: I've already watched it over and over, I can't find what emits VOTE.VTG :P
03:58 imirkin: what's VTG?
03:58 ReinUsesLisp: it's some clip state machine trash
03:59 imirkin: right, but ... i'm familiar with like VOTE.EQ
03:59 ReinUsesLisp: vertex, tessellation, geometry
03:59 imirkin: oh, i wonder if it's just vote for those stages?
03:59 ReinUsesLisp: you can vote on any stage afaik
03:59 ReinUsesLisp: it has a different encoding https://github.com/envytools/envytools/blob/master/envydis/gm107.c#L2043
04:00 imirkin: yeah, but what it does on non-frag stages is questionable
04:00 imirkin: huh. takes an immediate too...
04:00 imirkin: have you ever encountered it in an actual program?
04:00 ReinUsesLisp: yes
04:00 ReinUsesLisp: one second
04:01 imirkin: hopefully a non-frag stage?
04:01 ReinUsesLisp: vertex
04:01 ReinUsesLisp: https://hastebin.com/inafododaw.php
04:02 ReinUsesLisp: the immediate might be controlling something with the predicates
04:02 imirkin: looks like a shuffle of some sort
04:02 imirkin: i won't even ask wtf EXIT CC.FCSM_TR is
04:02 ReinUsesLisp: FLOAT CLIP STATE MACHINE (TR???)
04:02 imirkin: SR_WSCALEFACTOR_XY
04:03 imirkin: impressive that these get used. we never do.
04:03 ReinUsesLisp: looks like Nintendo paid extra to Nvidia on Smash Ultimate
04:04 imirkin: ok, so varyings 0x70..0x7c are gl_Position
04:04 ReinUsesLisp: now that I think about it I think FCSM_TR was used in a game that used double vertex shaders
04:04 imirkin: if gl_Position isn't set, that primitive effectively gets discarded
04:04 ReinUsesLisp: doesn't it default to zero?
04:04 ReinUsesLisp: 0,0,0,1
04:04 imirkin: ok, so there's some shady VP_A and VP_B shader business
04:04 imirkin: doubt it
04:04 imirkin: i never understood what it really was
04:05 imirkin: we just always use VP_B, iirc
04:05 ReinUsesLisp: it looks like a D3D10 era compute
04:05 ReinUsesLisp: so does NV
04:06 ReinUsesLisp: thanks for supporting GL_ARB_compute_variable_group_size, we use that to emulate compute for now ^^
04:07 imirkin: you could just do a lot of recompiles
04:07 ReinUsesLisp: that'd invalidate people's transferable cache, which gets them angry
04:07 imirkin: iirc we reduce the max allowed group size when it's variable
04:07 ReinUsesLisp: I usually stack some breaking changes and push them together
04:08 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#n546
04:08 imirkin: you only get 512 on fermi, 1024 on kepler+ though
04:08 imirkin: and iirc amd does something similar
04:08 imirkin: basically there aren't an infinite number of regs on the chip, sadly
04:08 imirkin: invocs * per-thread gpr usage <= N
04:11 imirkin: 32K on fermi, 64K on kepler1, and i guess it got bumped up to 256K on kepler2+
04:11 imirkin: or we limit to 64 regs on kepler2+ for compute with varying groups?
04:13 ReinUsesLisp: I didn't investigate it since desktop Nvidia limits are NX limits (as long as NVN is not getting premium access to something)
04:13 imirkin: hm. looks like 64k throughout, which means we limit compute to 64 regs on all gens if you use variable size
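The register budget arithmetic above (invocs * per-thread gpr usage <= N) can be checked with a small sketch. The function names are hypothetical, and the 64K budget and 1024-invocation group size are simply the figures quoted in the conversation, not values read from the driver source.

```cpp
#include <cassert>
#include <cstdint>

// Largest per-thread register count that keeps
// invocations * regs_per_thread <= register_budget.
std::uint32_t max_regs_per_thread(std::uint32_t register_budget,
                                  std::uint32_t invocations) {
    assert(invocations > 0);  // a work group always has at least one invocation
    return register_budget / invocations;
}

// True if a work group of `invocations` threads, each using
// `regs_per_thread` registers, fits into the budget.
bool fits_register_budget(std::uint32_t register_budget,
                          std::uint32_t invocations,
                          std::uint32_t regs_per_thread) {
    // Widen before multiplying so the product cannot overflow 32 bits.
    return static_cast<std::uint64_t>(invocations) * regs_per_thread
           <= register_budget;
}
```

With a budget of 64K registers and a maximum variable group size of 1024 invocations, `max_regs_per_thread(64 * 1024, 1024)` gives 64, which matches the 64-register limit mentioned for variable-size compute.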
04:14 ReinUsesLisp: when you support half floats, here's how HSETP2 seems to work https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/shader/decode/half_set_predicate.cpp
04:14 ReinUsesLisp: and an unmerged fixup https://github.com/yuzu-emu/yuzu/pull/2802/files#diff-fcf26e378dd9a62ff9a1127ccee983f8
04:15 imirkin: cool
04:16 ReinUsesLisp: that code has a bug if a shader looks like this https://hastebin.com/owofehedey.bash
04:16 ReinUsesLisp: but that's minor and doesn't matter on hardware
04:17 imirkin: i did the h* ops based on nvdisasm, i could have messed something up btw
04:17 imirkin: check carefully.
04:19 ReinUsesLisp: not my work so I can't talk much about it yet, I'm running those tests on NX so if something's broken it will pop up
04:25 imirkin: just looking randomly at the code...
04:25 imirkin: https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/shader/decode/shift.cpp
04:25 imirkin: doesn't seem to distinguish between SHR.U32 and SHR
04:25 imirkin: i.e. arithmetic shift vs logical shift or whatever
04:26 ReinUsesLisp: we don't implement SHL.W either
04:26 imirkin: yeah, wrap is missing too :)
04:26 imirkin: except glsl is wishy-washy on how it should work
04:26 imirkin: so you get what you get there
04:26 ReinUsesLisp: I skipped a beat when I saw FSETP missing a neg
04:26 imirkin: heh
04:26 imirkin: we might even emit it in codegen
04:26 ReinUsesLisp: not on envydis, it was missing on yuzu too
04:27 imirkin: envydis ain't perfect
04:27 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp#n1585
04:27 imirkin: emitNEG (0x06, insn->src(1));
04:28 imirkin: in there since the initial commit, so i guess ben just missed it
04:30 imirkin: alright. time for me to go
04:30 imirkin: lmk if you have any questions. even if i'm not around, i check logs.
04:31 ReinUsesLisp: ok, see you!
04:31 ReinUsesLisp: I'll be going to sleep too
04:52 HdkR: Shift mask versus clamp is easy enough to emulate with doing it manually in the shader
04:53 HdkR: Just mask or clamp the operand depending on instruction mode
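A rough sketch of the two points raised about shifts: logical versus arithmetic right shift, and masking versus clamping the shift amount. The exact hardware semantics are an assumption here: wrap mode is taken to mean "use the low 5 bits of the shift amount", and non-wrap mode is taken to mean "clamp the shift amount to 32". All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics: wrap masks the amount to 0..31, otherwise clamp to 32.
std::uint32_t effective_shift(std::uint32_t shift, bool wrap) {
    return wrap ? (shift & 31u) : (shift > 32u ? 32u : shift);
}

// Logical right shift (the unsigned case): vacated bits are filled with zero.
std::uint32_t shr_logical(std::uint32_t value, std::uint32_t shift, bool wrap) {
    const std::uint32_t s = effective_shift(shift, wrap);
    // Shifting a 32-bit value by 32 is undefined in C++, so handle it here.
    return s >= 32u ? 0u : value >> s;
}

// Arithmetic right shift (the signed case): vacated bits copy the sign bit.
std::int32_t shr_arithmetic(std::int32_t value, std::uint32_t shift, bool wrap) {
    const std::uint32_t s = effective_shift(shift, wrap);
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    const std::uint32_t sign_fill = value < 0 ? 0xFFFFFFFFu : 0u;
    if (s >= 32u) {
        return static_cast<std::int32_t>(sign_fill);
    }
    if (s == 0u) {
        return value;
    }
    // Shift logically, then fill the top `s` bits with copies of the sign bit.
    const std::uint32_t shifted = (bits >> s) | (sign_fill << (32u - s));
    return static_cast<std::int32_t>(shifted);
}
```

For example, `shr_logical(0x10, 36, true)` shifts by 4 after masking, while the same call with `wrap` set to `false` clamps to 32 and yields 0.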
05:31 PaulePanter: danvet: How should https://patchwork.freedesktop.org/series/65489/ be tested?
05:37 PaulePanter: But the oldest card I could find is nv50.
10:06 cosurgi: imirkin: !
10:06 cosurgi: imirkin: I have it.
10:06 cosurgi: It just crashed, and I have full backtrace from gdb
10:08 cosurgi: imirkin: this is gdb 'bt full': https://paste.ubuntu.com/p/cChBxgTGVr/
10:09 cosurgi: imirkin: this is Xorg.log: https://paste.ubuntu.com/p/rkzdSzDtdN/
10:09 cosurgi: imirkin: this is dmesg: https://paste.ubuntu.com/p/PCwMgjYRwR/
10:10 cosurgi: Same error: [3248211.193] nouveau_exa_upload_to_screen:380 - falling back to memcpy ignores tiling
10:10 cosurgi: skeggsb, pmoreau !
10:12 cosurgi: reconnects gdb to another Xorg session.
10:13 cosurgi: BTW: is that the correct repository to pull from: git://anongit.freedesktop.org/nouveau/xf86-video-nouveau ?
10:13 cosurgi: this is the one that I'm running right now.
10:13 cosurgi: and against which I have the gdb backtrace.
10:13 cosurgi: ec2b45d 'Bump version to 1.0.16'
13:48 imirkin: cosurgi: you still have it in bt?
13:49 imirkin: er, in gdb
13:49 imirkin: if so, "p *bo"
13:49 imirkin: anyways, the dst=0x7f1741a8a280 <error: Cannot access memory at address 0x7f1741a8a280> thing is ominous
13:49 imirkin: of the bus error :)
13:51 karolherbst: wild guess, the dst pointer got corrupted :p
13:51 karolherbst: or was never set in the first place
14:07 mooboat: Hi there. Is there a configuration option to change that will stop screen tearing?
15:15 cosurgi: imirkin: unfortunately no
15:16 cosurgi: next time I will try to not end it. I thought that my VTs got completely blocked. But I didn't check for sure.
15:16 cosurgi: imirkin: this happened when I was waking up the screens from 'xset dpms force off' after the night.
15:17 cosurgi: imirkin: the CPU cores were under heavy load for several days. So this happened (again) under heavy CPU load.
15:18 cosurgi: imirkin: next time I will do "p *bo" whatever it does, you will get the output.
20:33 cosurgi: imirkin: maybe this crash indeed is related to 'xset dpms force off' - something, some bitmap is freed when the screens are off. And somewhere else it is expected that it wasn't freed?