05:17koz_: Has anyone here tried Civilization 6 with Nouveau? I find it keeps crashing just before you go into a game, and I wanna make sure it's not a Nouveau thing.
11:40karolherbst: I am annoyed by the falcons, writing a crappy falcon debugger now...
13:28jamm: hakzsam: in https://cgit.freedesktop.org/~hakzsam/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c?h=gm107_scheduler&id=a153e2e1317957f82da240a21557004c9ce0d2dc#n578, why use wt 0x1 for $r0? Isn't barrier 1 used for $r8 in line 542 already?
13:48dboyan: jamm: Isn't wt 0x1 means "wait for barrier 0"?
13:48dboyan: * Doesn't
13:51jamm: dboyan: yeah, that's what i meant
13:54jamm: so, wr 0x0 is first used at line 542 for writing on $r8, wt 0x1 (wait for barrier 0) is used to wait on $r0? Even though waiting on a barrier that isn't set/already unset doesn't have any overhead according to the maxas article, i'm curious
14:03dboyan: jamm: Isn't the 'wt 0x1' on line 579 waiting for the 'rd 0x0' of the previous st g[...] $r0?
14:14jamm: dboyan: true, that makes it even more confusing I think. I think it's probably coz of my lack of knowledge of pipeline depth...
14:17jamm: both write and read bars share the same resources, so using rd 0x0 alongside wr 0x0, doesn't make sense to me atm
14:17jamm: by alongside, i mean within the same program
14:36karolherbst: .... the heck
14:36karolherbst: a script I wrote behaves differently when I source it
14:38karolherbst: ohhh, extglob is off when not sourcing
14:41imirkin: jamm: not 100% sure, but i think read dep bars are bars that have to be waited on before processing inputs (to protect against a WaR hazard), and write dep bars have to be waited on before writing to outputs (to protect against a WaW hazard)
14:45jonan: hey guys, using DRI_PRIME=1 seems to offer no performance increase, and it introduces artifacts over everything it renders, no matter what application I start it with. Am I missing some kind of dependency?
14:46karolherbst: jonan: artifacts as in weird tearing?
14:46jonan: not necessarily, but like white squares, and other odd multi-colored lines
14:46karolherbst: well that shouldn't happen
14:47jonan: even weirder is that it actually performs worse than my integrated GPU
14:47jonan: anyone have any ideas for how to check what's going on?
14:47karolherbst: if you stay on the lowest perf level, that's normal
14:49karolherbst: jonan: what is the chipset of your GPU?
14:50jonan: it's the nvidia 960m
14:51karolherbst: I asked for the chipset, but this one is 100% a gm107, so okay
14:51jonan: oh my bad, i don't know how to check for the chipset :S
14:52karolherbst: it's in dmesg usually
14:52karolherbst: for mine: "[17863.807958] nouveau 0000:01:00.0: NVIDIA GK106 (0e6200a1)"
14:52jamm: imirkin: right, but do they share the same barrier resource? as per my understanding, we have a total of 6 barrier resources which are shared amongst read and write bars
14:53imirkin: jamm: yeah, it's the same barrier
14:53imirkin: jamm: it's just a question of when you wait on it
14:54imirkin: or when you set it
14:54imirkin: or whatever
14:54jonan: sorry, how do I specify under dmesg to display a particular term? my noob understanding was that something like dmesg | grep nvidia would list it?
14:54karolherbst: or grep nouveau
14:55jonan: oh thanks
14:55jonan: yeah it's as you said karolherbst
14:55karolherbst: anyway you should have a file in /sys/kernel/debug/dri/1 called pstate, but on a prime system it is all buggy currently, cause it may hang your system when the GPU is turned off
14:55jonan: [ 2.428384] nouveau 0000:01:00.0: NVIDIA GM107 (1171b0a2)
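For scripting the chipset lookup discussed above, here is a minimal sketch that pulls the chipset name out of a nouveau dmesg line. The function name `find_chipset` and the regular expression are illustrative assumptions based only on the two log lines quoted in this conversation; other kernel versions may format the line differently.

```python
import re

# Pattern assumed from the two dmesg lines quoted in this log, e.g.
# "nouveau 0000:01:00.0: NVIDIA GM107 (1171b0a2)".
CHIPSET_RE = re.compile(r"nouveau \S+: NVIDIA (\w+) \(([0-9a-f]+)\)")


def find_chipset(dmesg_text):
    """Return (chipset, chip id) from dmesg text, or None if no line matches."""
    match = CHIPSET_RE.search(dmesg_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


line = "[    2.428384] nouveau 0000:01:00.0: NVIDIA GM107 (1171b0a2)"
print(find_chipset(line))  # ('GM107', '1171b0a2')
```

The text to search could come from the output of `dmesg`, which is what `dmesg | grep nouveau` filters on the command line.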
14:56karolherbst: if you run a 4.10 kernel or newer you can change the perf state while an application is running on the GPU
14:56pmoreau: jamm: You can reuse the barrier 1 at that point, since $r8 has been overwritten at line 559
14:56karolherbst: and you should get a decent speedup
14:56jonan: ooh, there's a bit of an issue
14:56jonan: i'm running 4.9 atm
14:57jonan: i guess that's where I can start to debug my issue lol
14:58karolherbst: but regarding your artifacts, it may be due to outdated mesa or a real issue still being there
14:58pmoreau: Unless I misunderstood, and $r8 is not the dest reg of the bfe insn.
14:58jamm: pmoreau: i see
14:58jonan: hm my mesa is up to date unfortunately
14:58jonan: let me try and get 4.10 to work properly on my laptop this time, and then i'll go from there
14:59jonan: thanks for the heads up about the pstate
14:59karolherbst: mupuf, mwk: I slowly get annoyed by nvapeek printing ..., any objections if I just remove this?
14:59mupuf: karolherbst: you would rather see 0?
14:59karolherbst: it makes scripting _a_lot_ easier
15:00mupuf: it made sense before, because gpus were returning 0 when making a request that was never answered
15:00karolherbst: well sure
15:00mupuf: now, we get 0xbadf, so it is pointless
15:00karolherbst: but sometimes 0 also means 0
15:00mupuf: yes, nowadays, it always means 0
15:01mupuf: well, the safest option is to add a "raw" option
15:01mupuf: this way it works for scripts and it doesn't change the existing output, in case scripts already do work around the ...
15:03karolherbst: well those shouldn't break in this case anyway
15:21mwk: karolherbst: sure, do it
15:21mwk: we shouldn't really print ... for single-register peek
15:22imirkin: yeah, i've always found that annoying
15:22imirkin: but when i looked at fixing it, it didn't seem easy
15:22imirkin: coz it's the same code as for the scan logic
15:22karolherbst: /dbg_falcon.sh status
15:22karolherbst: Breakpoint at: 0x64 ( f5 0e 41 03 bra 0x341)
15:22mwk: well, we should keep it for multi-reg peeks
15:22mwk: or you'll get a lot of 0s
15:27mwk: mupuf: FWIW "..." was never meant to signify a request that wasn't answered, I added it only to compress big chunks of 0s that usually result from peeking big ranges
15:28mwk: though nowadays with the badf's, it'd be better to change it to mean "same as last printed value, repeated lots of times"
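The "same as last printed value, repeated" behaviour mwk describes can be sketched as below. This is not nvapeek's code: `format_peek`, the address/value line layout, and the 4-byte stride are all assumptions made for illustration.

```python
def format_peek(base, values, stride=4):
    """Format register values, collapsing repeats of the last printed value.

    Hypothetical helper: a run of values equal to the last printed one is
    replaced by a single "..." line, whatever that value is (0, 0xbadf..., etc.).
    """
    lines = []
    last_printed = None
    elided = False
    for i, value in enumerate(values):
        if value == last_printed:
            if not elided:
                lines.append("...")
                elided = True
            continue
        lines.append(f"{base + i * stride:08x}: {value:08x}")
        last_printed = value
        elided = False
    return lines


for line in format_peek(0x10a000, [0, 0, 0, 0xbadf1100, 0xbadf1100, 5]):
    print(line)
```

With this shape a single-register peek can never print "...", since there is no previously printed value to repeat, which lines up with the point about single-register peeks above.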
15:58karolherbst: mhh, now how do I actually activate that breakpoint
15:59karolherbst: I guess I need to enable that in DEBUG_CMD
16:01karolherbst: nice, it works :)
16:52karolherbst: mwk: any idea how I get the current PC of a falcon?
16:52karolherbst: debug_cmd and read from reg? or is there a different way
17:08karolherbst: reading out registers also implemented :)
17:26RSpliet: karolherbst: is that stuff still exposed on GK110+?
17:27karolherbst: no idea, but I think so
17:28RSpliet: I recall it isn't... or not all of it
17:28RSpliet: (still need to fix&port this "backtrace on timeout" patch of mine)
17:29karolherbst: is there even a stack for the pc?
17:29karolherbst: I thought that is just put on the normal stack
17:29karolherbst: RSpliet: I am talking about falcons, you as well?
17:30RSpliet: yes I'm talking falcons
17:30RSpliet: and yes they have a stack
17:31RSpliet: we don't use nicely aligned stack frames though, but for the modest size of firmware it can be interpreted manually
17:32karolherbst: sounds annoying though
17:36karolherbst: RSpliet: do you know by any chance how I get the current $pc from the host?
17:47mupuf: mwk: yes, that's what I meant. Illegal addresses used to return 0, which would be compressed
17:48mwk: karolherbst: yeah, read $pc special register
18:10RSpliet: karolherbst: https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/5dd5f19fb9aebad545449aff847bc55a52e9379f
18:11RSpliet: the code is broken in dumping the stack
18:11RSpliet: specifically, the loop condition is bad. sp and pc are read out correctly
18:25karolherbst: RSpliet: ff0 is 0 for me on the PMU, do I need to "enable" those regs first?
18:26karolherbst: ex is also 0
18:28karolherbst: also, "+fec" and "+ff0" already look like stuff nobody should depend on in the first place
18:31karolherbst: and I don't see the pc in the entire falcon space. I am sure I know what the current PC is, and it's nowhere
18:31karolherbst: mhhh, maybe somewhere shifted
18:32RSpliet: karolherbst: as I said, I have not identified these regs in GK110+
18:32karolherbst: RSpliet: I am on gk106
18:33RSpliet: that's weird, I've tested this on GK107 and I think some Fermis
18:33karolherbst: maybe only on the gr
18:33RSpliet: hmm that might be true
18:34karolherbst: we need something inside the first 0x400 space of the falcons
18:34RSpliet: you might be hoping for something that doesn't exist :-P
18:34karolherbst: I am sure it does
18:34karolherbst: I can already read out registers
18:34karolherbst: why not $pc as well?
18:35karolherbst: the idx field is 5 bits big
18:35karolherbst: with 4 you can cover the normal registers
18:36karolherbst: found it
18:36karolherbst: is $pc 5?
18:37karolherbst: so register idx 21 gets me $pc
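The arithmetic behind "idx 21" can be sketched as follows, assuming (as the messages above suggest) that the low 4 bits of the 5-bit index cover $r0..$r15 and that the special registers follow directly after them, with $pc as special register 5. The names `GPR_COUNT`, `PC_SPECIAL`, and `debug_reg_index` are made up for this example.

```python
GPR_COUNT = 16   # $r0..$r15 fit in the low 4 bits of the 5-bit index
PC_SPECIAL = 5   # assumption from the log: $pc is special register 5


def debug_reg_index(number, special=False):
    """Return the 5-bit register index for a general or special register.

    Assumes special registers are numbered right after the 16 general ones,
    i.e. the fifth bit selects the special register bank.
    """
    if not 0 <= number < GPR_COUNT:
        raise ValueError("register number must fit in 4 bits")
    bank = GPR_COUNT if special else 0
    return bank | number


print(debug_reg_index(PC_SPECIAL, special=True))  # 21
```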
18:37karolherbst: I set the breakpoint at 0x888 but it stopped at 0x88d, weird
18:41karolherbst: RSpliet: do you need anything else for gk110 except pc and sp?
18:47karolherbst: I am getting there: https://gist.github.com/karolherbst/e1c5cd411b44a26267dcdd2f0df60bd5 :)
18:48karolherbst: uhhhh, I know something
18:52karolherbst: much better: https://gist.github.com/karolherbst/e1c5cd411b44a26267dcdd2f0df60bd5
18:58karolherbst: .. RSpliet: everything needed is already in rnndb
19:05karolherbst: note to myself: don't annoy falcons too much
19:43karolherbst: why can't I continue the falcon anymore
20:09karolherbst: mwk: I thought by setting "CONTINUE_FROM_PC" in DEBUG_CMD I should get the falcon to continue, but DEBUG_CMD sets the ERROR flag, any ideas?
20:18karolherbst: mhh, if I break manually, I can at least single step
20:21karolherbst: this is fun :)
20:30RSpliet: karolherbst: nice one
20:31karolherbst: RSpliet: https://github.com/karolherbst/nouveau_tools/blob/master/dbg_falcon.sh
20:31karolherbst: long term goal is a nice cli interface rewritten in C and everything
20:32RSpliet: as for breakpoint - it's a 6-stage pipeline, and fetch is only the first stage
20:32RSpliet: so I'd guess it breaks on decode, at which point the PC has already been incremented
20:32karolherbst: depends how nice the hw is to the debugger
20:32karolherbst: most likely
20:32karolherbst: but what about the reg content?
20:33karolherbst: ohhh, I see
20:33RSpliet: reg read will be the stage after decode
20:33karolherbst: I am sure it is already fully executed though
20:33RSpliet: oh ok
20:33karolherbst: because it does do the jumps right and so on
20:34karolherbst: and it was my mistake
20:34karolherbst: I fixed the parsing and now everything is fine
20:35karolherbst: except continuing from breakpoint doesn't work
20:35karolherbst: if I do a manual break, it works
20:36RSpliet: could a breakpoint trigger an interrupt *in* the falcon?
20:36karolherbst: should that matter?
20:36RSpliet: (not on the host)
20:36RSpliet: well, if the interrupt handler chickens out because it doesn't know what that interrupt means
20:36karolherbst: the pc doesn't change
20:37RSpliet: ok, that's not it then
20:37karolherbst: and the reg returns an error
20:37imirkin: usually you break out of interrupt handlers with an NMI
20:37karolherbst: imirkin: I am debugging the falcon :p
20:37imirkin: if a non-NMI is triggered while interrupts are masked, it will wait until the interrupt flag is cleared before dispatching it
20:37imirkin: karolherbst: yeah, but this is like general info about all (semi-modern) cpu's
20:38imirkin: iirc falcons have interrupt priorities
20:38karolherbst: the breakpoint was pointing inside code executed through the interrupt handler
20:38Bl4ckb0ne: what's the state of the driver with arch and the GTX970?
20:38karolherbst: aka a timer
20:39Bl4ckb0ne: i tried to use the driver this afternoon, but apparently my chip is not supported
20:39karolherbst: Bl4ckb0ne: it should work with a new kernel (4.12? or maybe 4.11, dunno)
20:39RSpliet: Bl4ckb0ne: ask arch people, we don't know what they do to their kernels
20:40karolherbst: I guess it is 4.12
20:40Bl4ckb0ne: oh maybe thats why
20:40Bl4ckb0ne: im using 4.10
20:40RSpliet: karolherbst: oh that's the 3+0,5GiB card? Yeah, 4.12 it is then
20:40karolherbst: RSpliet: exactly
20:41karolherbst: imirkin: okay, so if I put a breakpoint inside an interrupt handler, I need to do other things first before I can continue? or at least so you guess?
20:42imirkin: yeah, it's a question as to how the breakpoint stuff works, since it kinda operates "outside" the regular cpu flow
20:42Bl4ckb0ne: Ill try to update my kernel and give it a new try
20:42imirkin: is the breakpoint handled by falcon code
20:42imirkin: or by the host?
20:42karolherbst: imirkin: and a manual break is different?
20:43imirkin: karolherbst: not 100% sure. i bet mwk knows.
20:43karolherbst: well it has to be, otherwise I would see no reason it shouldn't work the same
20:45karolherbst: how did I get the pc to be 0xf536
20:49karolherbst: this is quite funny as well: https://gist.github.com/karolherbst/2f22e9cbe13a6404693ae6a753cedf42
20:49karolherbst: I bet at 0xf7 is the interrupt handler
20:51karolherbst: /* 0x00f5: intr */ there we go
20:51karolherbst: and the first thing was inside /* 0x0f4f: idle_proc_next */
20:52karolherbst: funny that I always hit the bra though
20:52karolherbst: because there is a sleep before
20:57karolherbst: the exit flag is set
21:03karolherbst: imirkin: I think I set the breakpoint wrongly
21:03karolherbst: because on the breakpoint the falcon goes into STOPPED state, when I break manually, it stays SLEEPING
21:10karolherbst: continue and stop from address works as well :)
21:12karolherbst: uhh setting registers,
21:19pmoreau: imirkin: I haven't thought too much about it yet, but this is what I am thinking of doing: https://hastebin.com/uqewimuqaz.cpp
21:27karolherbst: pmoreau: are you sure you can move maxGPR into the Function?
21:28karolherbst: I thought the GPR count is fixed for the entire binary
21:39pmoreau: karolherbst: maxGPR isn't moved to be per function, but to be per entry point.
21:40karolherbst: pmoreau: mhhh, ohhhhh okay, now it makes sense
21:40pmoreau: As all the other attributes I moved
21:42pmoreau: To avoid things like : uploading this binary (https://hastebin.com/fobibosedu.m), and when launching either of the 2 kernels, it requests (16 + 4) uints of shared mem, 2x the regs, etc
21:43pmoreau: Even though half of those resources aren't needed nor used.
21:43karolherbst: I see
21:49pmoreau: imirkin: https://hastebin.com/dilolulita.cpp this would be slightly better: I was missing some data per entrypoint in nv50_ir::Program
22:32karolherbst: skeggsb: regarding that odd PMU issue I found.... I think I forgot a clear....
22:34karolherbst: works as expected now
22:34karolherbst: having a falcon debugger is nice :)