00:22 imirkin: orbea: i'm going to go with retroarch bug. without reading the details, can you point some retroarch kms backend person at this commit and see if they go "aha, didn't know i had to do that"? https://cgit.freedesktop.org/mesa/kmscube/commit/?id=56c3917ffd1f05942246e2532ca4a5707554a2fc
00:23 imirkin: karolherbst: remember - our firmware doesn't support those firmware methods
00:24 imirkin: karolherbst: those FIRMWARE calls call into the ctxsw firmware
00:25 imirkin: i looked into implementing those calls but it's a bit of a pain
00:25 imirkin: since it doesn't jibe with how the internal firmware data passing api works
00:25 imirkin: so i'd have to make more extensive changes. at least that's my recollection.
01:13 karolherbst: imirkin: I know, we could only use those for maxwell2+pascal for now, or just implement the same thing for the older one as well
01:13 karolherbst: imirkin: well, the only firmware call I do is to write into 0x419e10
01:14 imirkin: or use the nvidia firmware on kepler
01:14 karolherbst: that would be quite painful
01:14 karolherbst: I doubt it would be hard to actually implement those, or would it?
01:14 imirkin: why?
01:14 karolherbst: well, we can't redistribute nvidia ones, so users wouldn't get a trap handler by default
01:14 imirkin: use my script to extract from blob
01:14 imirkin: sure
01:15 imirkin: i just mean for testing
01:15 imirkin: we can implement. just requires a bit of a refactor iirc.
01:15 karolherbst: mhh
01:15 karolherbst: yeah maybe
01:15 imirkin: (in the asm code, so less-than-pleasant)
01:15 karolherbst: that firmware call nvidia has sounds quite dangerous actually
01:15 imirkin: hehe yeah
01:15 imirkin: iirc it masks the register
01:15 imirkin: so only graph regs can be hit
01:15 imirkin: but still
01:15 karolherbst: I would whitelist only the regs we want it to write into
01:15 karolherbst: ohh
01:15 karolherbst: yeah, still
01:16 imirkin: i.e. 0x4.....
01:16 karolherbst: mhhh
01:16 karolherbst: also non-per-channel regs?
01:16 karolherbst: wouldn't be so bad if only per-channel regs are accessible
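The two safeguards mentioned here (masking the address to the graph range, "0x4.....", versus an explicit whitelist) can be combined. The sketch below is hypothetical: nothing in it comes from NVIDIA or nouveau firmware, and the allow-list contains only the one register named later in this log.

```python
# Hypothetical guard for a firmware "write register" method. It illustrates
# the two ideas from the discussion: an address window of the form 0x4xxxxx
# and an explicit allow-list of registers the driver actually needs.

ALLOWED_REGS = frozenset({
    0x419E10,  # PGRAPH.GPC_BROADCAST.TPC_ALL.MP.BPT_CONTROL
})


def in_graph_window(addr: int) -> bool:
    """True for addresses of the form 0x4xxxxx."""
    return 0x400000 <= addr <= 0x4FFFFF


def fw_write_allowed(addr: int) -> bool:
    """Accept a write only if it is inside the window *and* allow-listed."""
    return in_graph_window(addr) and addr in ALLOWED_REGS
```

The window check alone still lets any graph register through, which is karolherbst's "yeah, still"; the allow-list is what narrows it to known-safe targets.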
01:17 imirkin: yeah, i never looked at what nvidia fw did there
01:17 karolherbst: anyway, I still didn't figure out why I can't get the trap handler to execute from real traps or "bpt trap" :(
01:17 imirkin: but right - there MIGHT be a reason that's not a method that exists in the regular hw :)
01:18 karolherbst: huh?
01:18 imirkin: writing random reg via class method
01:18 karolherbst: what do you mean
01:18 karolherbst: ahhh
01:18 karolherbst: yeah, sounds pretty risky
01:20 karolherbst: mhh, maybe I should try writing other values into it. maybe nvidia doesn't set it up by default either, and you need magic software (like a debugger) before it starts putting something else in there
01:20 imirkin: where are you writing and how?
01:20 karolherbst: the firmware method
01:20 imirkin: which reg
01:20 karolherbst: PGRAPH.GPC_BROADCAST.TPC_ALL.MP.BPT_CONTROL which is 0x419e10
01:21 karolherbst: the three params are 0x0, 0x1 and 0x7
01:21 karolherbst: and the value set will be 00001c01
01:21 imirkin: TRAP_GLOBAL_ERROR_EN
01:21 karolherbst: ohhhhhhhhh
01:21 karolherbst: wait
01:21 imirkin: are you setting that with the BPT_* ones?
01:21 karolherbst: 0x0: no idea what that value does, but
01:21 karolherbst: value to write: 0x1
01:21 karolherbst: mask: 0x7
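The value/mask pair behaves like an ordinary read-modify-write: only the bits selected by the mask change. The sketch below assumes that interpretation and assumes the bits outside the mask already held 0x1c00, which is inferred from the resulting 00001c01 rather than confirmed. The meaning of the first parameter (0x0) is still unknown and is not modelled.

```python
MASK32 = 0xFFFFFFFF


def masked_write(old: int, value: int, mask: int) -> int:
    """Read-modify-write: replace only the bits selected by mask."""
    return ((old & ~mask) | (value & mask)) & MASK32


# Assumption: bits outside mask 0x7 already held 0x1c00 before the call.
new = masked_write(old=0x00001C00, value=0x1, mask=0x7)
print(f"{new:08x}")  # 00001c01
```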
01:22 karolherbst: imirkin: "are you setting that with the BPT_* ones?" what do you mean by that?
01:22 imirkin: grep for that in hwdoc
01:22 imirkin: and then look at our sw method 0x644
01:22 karolherbst: TRAP_GLOBAL_ERROR_EN is set by the driver
01:22 imirkin: which lets you set that register
01:24 imirkin: or gr (3d and compute) method 0x1528
01:24 karolherbst: mhh
01:24 karolherbst: interesting
01:24 imirkin: check gf100_gr_mthd_sw and gf100_gr_mthd_set_shader_exceptions
01:25 imirkin: i *assume* that the logic works to actually call those
01:25 karolherbst: at least we don't use the SET_SHADER_EXCEPTIONS one
01:26 imirkin: set the BPT_INT one
01:26 karolherbst: yeah, will try that
01:27 karolherbst: I didn't think I saw nvidia doing anything like that, but maybe I have to
01:31 karolherbst: mhh, that did _something_
01:31 karolherbst: I think
01:33 karolherbst: mhh, those sched instructions are really annoying
01:48 karolherbst: mhhh
01:48 imirkin: you have to call 0x1528 in the gr class init in mesa
01:48 karolherbst: imirkin: I mean, I get a "nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000010 [BPT_INT] warp 0000 []" inside dmesg, but the shader just keeps going without going into the trap handler
01:48 imirkin: so 50% there ;)
01:48 imirkin: at least you know it's happening
01:49 imirkin: maybe that's just for the kernel driver to get notifications
01:49 karolherbst: imirkin: currently I call it inside nve4_screen_compute_setup
01:49 imirkin: and not part of the real flow. dunno.
01:49 karolherbst: it is also where I setup the TRAP_HANDLER address and so on
01:49 karolherbst: (and where the old code is)
01:49 imirkin: right
01:50 imirkin: let me see if i can find some notes...
01:54 imirkin: hrmph, nope
02:12 imirkin: orbea: yeah, egl_init_context needs enhancement.
02:12 imirkin: it may actually return multiple configs, in this case rgb10 and rgb8, and it needs to pick the one it likes (i.e. which matches its pixel format)
02:13 orbea: so it could be fixed in mesa?
02:14 imirkin: no
02:14 orbea: hmm, no results with my grep
02:14 orbea: okay, I see it now
02:15 imirkin: https://github.com/libretro/RetroArch/blob/master/gfx/common/egl_common.c#L312
02:15 imirkin: have a look at the change i linked to earlier, and observe the similarity
02:15 imirkin: https://cgit.freedesktop.org/mesa/kmscube/commit/?id=56c3917ffd1f05942246e2532ca4a5707554a2fc
02:15 imirkin: that was fixing the same issue in kmscube
02:15 orbea: *looks*
02:16 imirkin: basically you can't just pick any ol' config for any format. you have to pick the right one.
02:16 imirkin: here they want GBM_FORMAT_XRGB8888
02:16 imirkin: so you have to pick the right one
02:16 imirkin: and for reasons which aren't apparent to me, you can't actually pass that in as one of the attributes
02:16 imirkin: to eglChooseConfig
02:17 imirkin: which seems like the dumbest thing ever
02:17 imirkin: but i don't make the rules
02:17 imirkin: [and there was some semi-ok hypothetical reason why it couldn't be done. still annoying though.]
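imirkin's point, sketched in Python: take every config that comes back and keep the one whose native visual ID equals the GBM format you render to. In C the equivalent loop would call `eglGetConfigAttrib(dpy, cfg, EGL_NATIVE_VISUAL_ID, &id)` on each config returned by `eglChooseConfig`; whether that is exactly what the linked kmscube commit does is not verified here. Configs are modelled as plain dicts, a hypothetical stand-in for `EGLConfig`.

```python
def fourcc(a: str, b: str, c: str, d: str) -> int:
    """Pack four ASCII characters the way the DRM/GBM fourcc_code() macro does."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


GBM_FORMAT_XRGB8888 = fourcc("X", "R", "2", "4")
GBM_FORMAT_XRGB2101010 = fourcc("X", "R", "3", "0")


def pick_config(configs, wanted_format):
    """Return the first config whose native visual ID equals the GBM format."""
    for cfg in configs:
        if cfg["native_visual_id"] == wanted_format:
            return cfg
    return None  # no match: the caller should fail instead of guessing


# Both configs might come back from eglChooseConfig; only one matches.
configs = [
    {"name": "rgb10", "native_visual_id": GBM_FORMAT_XRGB2101010},
    {"name": "rgb8", "native_visual_id": GBM_FORMAT_XRGB8888},
]
print(pick_config(configs, GBM_FORMAT_XRGB8888)["name"])  # rgb8
```

Taking `configs[0]` blindly would pick rgb10 here, which is the ordering-dependent behaviour that might explain why it happens to work on intel.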
02:20 orbea: thanks, that helps a lot! I'll paste it on the issue so that others can see it.
02:22 imirkin: yw
02:23 imirkin: not sure why it's not an issue on intel
02:23 imirkin: perhaps you just get lucky with the config ordering
02:25 orbea: could be, some parts of RetroArch do need fixing....
02:37 tertl3: what's the relationship between mesa and nouveau and Vulkan?
02:47 imirkin: R-rated
02:56 HdkR: hah
15:11 random-nick: hello, why is automatic reclocking not supported on cards where manual reclocking is supported?
15:31 imirkin: it's work that needs to be done, and it isn't so simple
15:31 imirkin: reliability is also a concern
15:32 imirkin: if 1/1000 reclocks fails, that'll be a whole lot more visible with automatic reclocks
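The arithmetic behind "a whole lot more visible": if each reclock independently fails with probability p, the chance of at least one failure after n reclocks is 1 - (1 - p)^n. The 1/1000 rate is imirkin's illustrative figure, and independence is an assumption.

```python
def p_any_failure(p_fail: float, n: int) -> float:
    """Chance that at least one of n independent reclocks fails."""
    return 1.0 - (1.0 - p_fail) ** n


for n in (1, 10, 100, 1000):
    print(n, round(p_any_failure(1 / 1000, n), 3))
```

A user who reclocks manually once per session rarely hits the failure, while an automatic governor that switches levels a thousand times reaches roughly a 63% chance of at least one failure.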
15:32 imirkin: just get an amd gpu and avoid the pain.
17:23 endrift: imirkin: I heard some PPC bugs recently got fixed, is it worth trying my 6800 GT again?
17:23 endrift: or should I just stick to the Radeon
17:24 imirkin: i have heard no such thing
17:24 endrift: hmm
17:24 imirkin: what's the source of this rumor?
17:24 endrift: I heard this actually maybe a month or two ago so I don't remember
17:24 imirkin: iirc you had the NV40 AGP?
17:24 endrift: yeah
17:25 endrift: well, I have two different AGP cards
17:25 imirkin: and it was having some very low-level issues
17:25 endrift: one of them is an NV40
17:25 imirkin: like dma not working or some such
17:25 endrift: sounds right
17:25 endrift: it's been a while
17:25 endrift: the radeon works but it's slow as hell
17:25 endrift: can't really run 1080p XFCE on it even :/
17:26 imirkin: you can't take a gpu from 2007 and expect a modern composited environment to work on it
17:26 imirkin: they were optimized for different things
17:26 endrift: even xfce?
17:26 imirkin: never used xfce
17:26 imirkin: i just know the high-level of what won't work
17:26 endrift: it's rather stripped down compared to e.g. kde
17:26 imirkin: and that's redrawing the screen on every frame
17:26 endrift: running plasma I'd expect to be a shitshow
17:26 imirkin: if you're redrawing the whole screen every frame, it won't work
17:26 endrift: fair
17:26 imirkin: if you're not, then it's fine
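A rough sense of scale for "redrawing the whole screen every frame", assuming 32-bit pixels at 60 fps and counting only a single write of each frame (a compositor usually reads and writes several times, so real traffic is higher):

```python
def redraw_bytes_per_second(width: int, height: int,
                            bytes_per_pixel: int = 4, fps: int = 60) -> int:
    """Bytes written per second if every frame repaints the whole screen once."""
    return width * height * bytes_per_pixel * fps


rate = redraw_bytes_per_second(1920, 1080)
print(f"{rate / 1e6:.0f} MB/s")  # 498 MB/s
```

Damage-based redraws only touch the changed rectangles, which is why a non-compositing setup stays cheap on old hardware.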
17:27 endrift: I guess it does work better in macOS
17:27 endrift: or Mac OS X I should say
17:27 endrift: it's pre-Lion
17:28 endrift: maybe I should try i3wm instead of xfce :P
17:28 imirkin: i use windowmaker
17:28 imirkin: and Xorg
17:28 imirkin: works great.
17:28 endrift: not heard of that one
17:28 imirkin: i suppose the test setup is a bit anachronistic, but 1920x1200 on an i7-920 had absolutely no problems on a Riva TNT2
17:29 endrift: anyway it sounds like it's probably not worth trying the Nvidia GPUs again then
17:29 imirkin: i've been using windowmaker since about 2000 or so, no complaints.
17:30 endrift: and tbh the last time I booted this was to fix a mesa bug, and the time before that to try and see if I had some BE bugs in my software
17:30 imirkin: it's one of the NeXT-y ones
17:30 endrift: so having a real GPU isn't a big issue
17:30 endrift: ooh NeXT
17:30 imirkin: part of the whole GNUstep thing
17:30 endrift: aha
17:30 endrift: isn't GNUstep dead?
17:30 imirkin: quite.
17:31 imirkin: http://www.windowmaker.org/
17:31 imirkin: a website freshly designed in 1995 :)
17:31 endrift: perfect for running on a mac from the early 2000s
17:33 karolherbst: mhhh
17:35 karolherbst: mwk, imirkin: we can't really express QMDs correctly with our current code inside envytools. QMDs are basically a big block describing how to launch a compute program, but there are some issues. First, the version is encoded _inside_ the QMD, so we have to display certain offsets differently depending on the version inside that block. Second, we might need something like reg8 if we don't have it already, as fields aren't aligned to 32 bits anymore (so bits 0:7 of a 32-bit value might be the QMD version and 8:31 can be something different across all versions)
17:36 karolherbst: or maybe we can already express something like that (I doubt it, but I wanted to hear your opinions before I even start adding support for it)
17:36 imirkin: karolherbst: usually the surrounding parser will read out the appropriate value
17:36 karolherbst: well sure
17:36 imirkin: and then based on that value select the appropriate domain/whatever
17:37 imirkin: 95% sure reg8 is a thing already btw
17:37 karolherbst: but in this case the parser would need to know the layout already
17:37 imirkin: but keeping everything 32-wide will save a lot of hassle
17:37 imirkin: yeah
17:37 imirkin: is that a big deal?
17:37 imirkin: are there lots of options for where to read the version from?
17:38 imirkin: isn't it always byte X at offset Y?
17:38 karolherbst: well with "reg8": GP104_COMPUTE.INLINE_QMD.VERSION = { MINOR = 0x1 | MAJOR = 0x2 | 0x10000 }
17:39 karolherbst: imirkin: I think so? would be strange otherwise
17:39 karolherbst: kind of hoped we could just handle all that within the XML
17:39 karolherbst: the QMD versions don't differ that much and a lot of fields are the same
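imirkin's suggestion (read the version from a fixed place, then pick the layout, keeping everything as 32-bit words) can be sketched as below. All offsets, bit positions, and field names are hypothetical, chosen only so that the word 0x00010201 decodes like the reg8 output above (MINOR = 0x1, MAJOR = 0x2, leftover 0x10000); real QMD layouts differ.

```python
import struct

VERSION_OFFSET = 0  # hypothetical byte offset of the word holding the version


def read_u32(blob: bytes, offset: int) -> int:
    """Read one little-endian 32-bit word."""
    return struct.unpack_from("<I", blob, offset)[0]


def qmd_version(qmd: bytes) -> tuple[int, int]:
    word = read_u32(qmd, VERSION_OFFSET)
    return (word >> 8) & 0xFF, word & 0xFF  # (major, minor), hypothetical bits


# Field tables: name -> (byte offset, low bit, high bit). Shared fields are
# listed once and merged into each per-version layout.
COMMON_FIELDS = {"FIELD_A": (8, 0, 31)}
LAYOUTS = {
    (2, 1): {**COMMON_FIELDS, "FIELD_B": (12, 0, 15)},
}


def decode_qmd(qmd: bytes) -> dict[str, int]:
    version = qmd_version(qmd)
    if version not in LAYOUTS:
        raise ValueError(f"unknown QMD version {version}")
    fields = {}
    for name, (offset, lo, hi) in LAYOUTS[version].items():
        width_mask = (1 << (hi - lo + 1)) - 1
        fields[name] = (read_u32(qmd, offset) >> lo) & width_mask
    return fields
```

Sharing `COMMON_FIELDS` reflects the observation that most fields are the same across versions; only the differences need a per-version entry.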
17:39 imirkin: we've generally offloaded that to demmt and demmio
17:39 imirkin: there's a similar issue with TICs iirc
17:39 imirkin: you can see how that was handled
17:40 imirkin: (i don't remember tbh)
17:40 karolherbst: ahh okay, so the TICs embed their versions as well?
17:40 imirkin: don't remember :)
17:40 karolherbst: ahh
17:40 imirkin: there's a G80 "original" TIC and a G84+ v2 TIC
17:40 imirkin: and then there's a GM107 TIC which is all different
17:40 imirkin: the G84+ thing is indicated by some bit in an early word
17:40 karolherbst: well in the case of QMDs it would make sense to have it all inside the XML, as you really don't need any knowledge outside the QMD block
17:40 imirkin: and only affects the last word
17:40 karolherbst: I see
17:41 imirkin: and no one ever uses the G80 version of the TIC except on the G80
17:41 imirkin: so it's ... less of an issue.
17:41 imirkin: the GM107 thing was more of a thing. no recollection as to how that was resolved.
17:41 imirkin: (it was optional on GM107 and required on GM200)
17:42 imirkin: and that latter one has lots of options
17:42 imirkin: if bit x == y, then word x means this, otherwise that
21:23 pendingchaos: karolherbst, imirkin: how does functionality in envyas to calculate scheduling information sound?
21:23 pendingchaos: mainly for codegen/lib/gm107.asm and DDX shaders
21:23 karolherbst: pendingchaos: I am all for it, but I would add a few more features, so it shouldn't be just an envyas feature
21:24 karolherbst: pendingchaos: mainly what I want to have is a way to feed in a shader with set sched opcodes and envydis/envyas try to figure out if it would come up with the same result or something else
21:27 pendingchaos: that could probably be done by feeding the original shader through an independent tool and then diffing the disassembly of the output with the disassembly of the original shader
21:27 karolherbst: pmoreau: that SVM stuff inside the nvidia OpenCL impl seems quite buggy...
21:27 pendingchaos: so: diff(envydis(envysched(shader)), envydis(shader))
21:28 karolherbst: pendingchaos: basically, yes
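The comparison step of diff(envydis(envysched(shader)), envydis(shader)) only needs the two disassembly listings as text. `envysched` is the tool being proposed here, so it does not exist yet; the listings in the usage example are made-up placeholders.

```python
import difflib


def sched_diff(original: str, rescheduled: str) -> list[str]:
    """Unified diff of two disassembly listings; an empty list means they match."""
    return list(difflib.unified_diff(
        original.splitlines(),
        rescheduled.splitlines(),
        fromfile="envydis(shader)",
        tofile="envydis(envysched(shader))",
        lineterm="",
    ))
```

For example, if the original listing starts with `sched (st 0x1) (st 0x1) (st 0x1)` and the rescheduled one with `sched (st 0x2) (st 0x1) (st 0x1)`, the diff contains exactly that pair of `-`/`+` lines, pointing straight at the group where the tool disagrees with nvidia's choice.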
21:29 pendingchaos: so: a .schedbegin/.schedend directive in envyas and an "envysched" tool sounds good?
21:30 karolherbst: pendingchaos: why the .schedbegin stuff?
21:31 pendingchaos: instead of having an option for the entire file?
21:32 karolherbst: I know that maxas is kind of doing that, but I don't really see the point. In the worst case we could also have something like "sched (auto) (auto) (st 0x1)"
21:32 karolherbst: or.. let it handle all st 0x0 entries
21:33 pendingchaos: to separate functions in gm107.asm (so they still begin with "wt 0x2f") and envyas shouldn't create scheduling information for the "gm107_builtin_offsets" section
21:33 pendingchaos: maybe the (auto) thing could be done
21:33 karolherbst: yeah, I am not quite sure
21:33 karolherbst: that builtin section would be a good reason to tell it when not to add those at all
21:34 pendingchaos: but I think it would be easier if manual and automatic scheduling didn't interact
21:35 karolherbst: yeah
21:43 imirkin: pendingchaos: dunno if it belongs in envyas, but perhaps some tool in the suite makes sense
21:44 karolherbst: I think the bigger problem is that something like that needs quite a few compiler features
21:44 imirkin: not really
21:44 imirkin: should be pretty straightforward to port the code over
21:44 imirkin: you do have to classify the ops
21:44 pendingchaos: I think putting it in envyas is more practical for gm107.asm than using an independent tool for it
21:44 karolherbst: yeah
21:44 karolherbst: not saying that it will become a compiler
21:44 karolherbst: but you need to feed in latency information as well
21:45 karolherbst: and so on
21:45 karolherbst: all things a stupid assembler usually doesn't deal with :)
21:45 imirkin: pendingchaos: an independent tool that uses the envydis logic would make more sense. something like sched-recalculate. dunno.
21:45 karolherbst: mhh, I think it touches the same code in the end
21:46 karolherbst: because if you do a recalculate, you have to disassemble or assemble
21:46 karolherbst: as you need to parse the input anyway
21:46 pendingchaos: mainly so it preserves sections and only touches certain data
21:46 imirkin: sounds complex and unnecessary
21:46 pendingchaos: you only have to identify the instruction and extract registers
21:46 imirkin: but perhaps i'm not seeing it. wtvr.
21:47 karolherbst: pendingchaos: well, not only that
21:47 imirkin: get mwk to agree - that's good enough for me.
21:47 pendingchaos: imirkin: what sounds complex and unnecessary?
21:47 imirkin: section handling
21:47 imirkin: just have it take a stream of instructions
21:47 imirkin: and spit out a new stream that has sched info
21:48 karolherbst: imirkin: question is, how do we handle files which have multiple sections?
21:48 karolherbst: although I guess in the end we can always cat files together
21:48 imirkin: sched cares about word boundaries
21:49 imirkin: not about sections
21:49 imirkin: but again - get mwk to go along with your plan, and i'm all for it
21:49 imirkin: you've probably noticed i don't have much time to dedicate to nouveau
21:50 karolherbst: pendingchaos: ohh, I have an idea
21:50 pendingchaos: well, if you have it as an independent tool, you would have to decompose gm107.asm and recompose it somehow
21:50 karolherbst: pendingchaos: that preprocessing tool could just parse the instructions in a file and insert a sched for each "block"
21:50 karolherbst: pendingchaos: whenever it encounters a ".section" or something, it fills up the group with nops
21:51 karolherbst: or it throws an error
21:51 karolherbst: then you don't need section handling anymore
21:51 karolherbst: I think
21:52 karolherbst: mhh, it should ignore labels though
21:52 pendingchaos: I'm not sure I follow?
21:54 karolherbst: pendingchaos: https://gist.github.com/karolherbst/dbc3755f4b88a59027454904a9ca8f5c
21:56 karolherbst: pendingchaos: for example for the trap handler I also have to put in an ".align 0x40" in front of the trap handler, so we want the tool to handle that as well, as in, do nothing :)
21:59 karolherbst: imirkin: I am currently thinking about what might be a 100% guaranteed way to trigger the trap handler inside nvidia. I was thinking accessing g[0x0] might do it? Maybe you know something else?
22:00 pendingchaos: "fill up the group with nops"?
22:01 karolherbst: pendingchaos: sched group
22:01 karolherbst: you always need 3/7 instructions inside a sched group ;)
22:02 imirkin: karolherbst: sorry, not sure
22:02 pendingchaos: IIRC, the nvc0 gallium driver does that with the generated code?
22:02 karolherbst: and because you skip writing the sched words, you usually don't keep track of the instruction count either
22:02 karolherbst: pendingchaos: right, but for envyas we have to insert those ourselves
22:03 karolherbst: but if you have 150 instructions + comments, happy instruction counting ;)
22:04 karolherbst: maybe it isn't really necessary, but usually we kind of fill up functions with nops at the end so that the next label can start at 0x20/0x40 aligned addresses
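A minimal sketch of karolherbst's preprocessing idea: walk the instruction stream, emit a sched line before every group, close the open group with nops whenever a directive such as `.section` or `.align` appears, and let labels pass through without taking a slot. It assumes Maxwell-style groups of one sched word plus 3 instructions (7 on Kepler), which at 8 bytes per word gives the 0x20/0x40 group sizes mentioned above. The `sched (auto) ...` syntax is the hypothetical one proposed earlier in this discussion, and `nop` is a placeholder mnemonic.

```python
def insert_sched(lines, group_size=3, nop="nop"):
    """Group instructions into sched groups, padding with nops at directives."""
    sched_line = "sched " + " ".join(["(auto)"] * group_size)  # placeholder
    out, group, count = [], [], 0

    def flush():
        nonlocal group, count
        if count:
            group += [nop] * (group_size - count)  # fill the open group
            out.append(sched_line)
            out.extend(group)
        group, count = [], 0

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue                      # drop blank lines and comments
        if line.endswith(":"):            # label: takes no instruction slot
            (group if count else out).append(line)
        elif line.startswith("."):        # directive (.section, .align, ...)
            flush()
            out.append(line)
        else:                             # ordinary instruction
            group.append(line)
            count += 1
            if count == group_size:
                flush()
    flush()
    return out
```

Closing a group only guarantees 0x20 alignment on Maxwell; an `.align 0x40` may still need the assembler to add a further group, which this sketch leaves to envyas.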
22:04 karolherbst: which leads to another problem: when is a label a point at which to do that?
22:05 karolherbst: imirkin: I _think_ the nvidia driver doesn't even get the traps inside the kernel module, or ignores the interrupt or whatever, but I also kind of want to be 100% sure that the code I am launching does indeed trigger faults
22:06 imirkin: karolherbst: afaik it ignores traps
22:06 karolherbst: okay
22:06 karolherbst: so maybe that's the reason
22:06 karolherbst: trap handling inside the shader
22:07 karolherbst: and because we catch them in kernel space, the trap handler gets ignored
22:21 imirkin: weird
22:24 karolherbst: anyway, nvidia doesn't seem to do it, we do, might be related ;)