05:17 koz_: Has anyone here tried Civilization 6 with Nouveau? I find it keeps crashing just before you go into a game, and I wanna make sure it's not a Nouveau thing.
11:40 karolherbst: I am annoyed by the falcons, writing a crappy falcon debugger now...
13:28 jamm: hakzsam: in https://cgit.freedesktop.org/~hakzsam/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c?h=gm107_scheduler&id=a153e2e1317957f82da240a21557004c9ce0d2dc#n578, why use wt 0x1 for $r0? Isn't barrier 1 used for $r8 in Line 542 already?
13:48 dboyan: jamm: Doesn't wt 0x1 mean "wait for barrier 0"?
13:51 jamm: dboyan: yeah, that's what i meant
13:54 jamm: so, wr 0x0 is first used at line 542 for writing on $r8, wt 0x1 (wait for barrier 0) is used to wait on $r0? Even though waiting on a barrier that isn't set/already unset doesn't have any overhead according to the maxas article, i'm curious
14:03 dboyan: jamm: Isn't the 'wt 0x1' on line 579 waiting for the 'rd 0x0' of the previous st g[...] $r0?
14:14 jamm: dboyan: true, that makes it even more confusing I think. I think it's probably coz of my lack of knowledge of pipeline depth...
14:17 jamm: both write and read bars share the same resources, so using rd 0x0 alongside wr 0x0, doesn't make sense to me atm
14:17 jamm: by alongside, i mean within the same program
14:36 karolherbst: .... the heck
14:36 karolherbst: a script I wrote behaves differently when I source it
14:38 karolherbst: ohhh, extglob is off when not sourcing
14:41 imirkin: jamm: not 100% sure, but i think read dep bars are bars that have to be waited on before processing inputs (to protect against a WaR hazard), and write dep bars have to be waited on before writing to outputs (to protect against a WaW hazard)
14:45 jonan: hey guys, using DRI_PRIME=1 seems to offer no performance increase, and it also introduces artifacts over everything it renders, no matter what application I start it with. Am I missing some kind of dependency?
14:46 karolherbst: jonan: artifacts as in weird tearing?
14:46 jonan: not necessarily, but like white squares, and other odd multi-colored lines
14:46 karolherbst: well that shouldn't happen
14:47 jonan: even weirder is that it actually performs worse than my integrated GPU
14:47 jonan: anyone have any ideas for how to check what's going on?
14:47 karolherbst: if you stay on the lowest perf level, that's normal
14:49 karolherbst: jonan: what is the chipset of your GPU?
14:50 jonan: it's the nvidia 960m
14:51 karolherbst: I asked for the chipset, but this one is 100% a gm107, so okay
14:51 jonan: oh my bad, i don't know how to check for the chipset :S
14:52 karolherbst: it's in dmesg usually
14:52 karolherbst: for mine: "[17863.807958] nouveau 0000:01:00.0: NVIDIA GK106 (0e6200a1)"
14:52 jamm: imirkin: right, but do they share the same barrier resource? as per my understanding, we have a total of 6 barrier resources which are shared amongst read and write bars
14:53 imirkin: jamm: yeah, it's the same barrier
14:53 imirkin: jamm: it's just a question of when you wait on it
14:54 imirkin: or when you set it
14:54 imirkin: or whatever
14:54 jonan: sorry, how do I specify under dmesg to display a particular term? my noob understanding was that something like dmesg | grep nvidia would list it?
14:54 karolherbst: or grep nouveau
14:55 jonan: oh thanks
14:55 jonan: yeah it's as you said karolherbst
14:55 karolherbst: anyway you should have a file in /sys/kernel/debug/dri/1 called pstate, but on a prime system it is all buggy currently, cause it may hang your system when the GPU is turned off
14:55 jonan: [ 2.428384] nouveau 0000:01:00.0: NVIDIA GM107 (1171b0a2)
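For scripting the chipset lookup discussed above, here is a minimal sketch that pulls the chipset name out of a nouveau line from dmesg. The regular expression is an assumption based only on the two sample lines quoted in this log, and `parse_chipset` is a hypothetical helper, not part of any nouveau tool.

```python
import re

# Pattern inferred from the two dmesg samples in this log; other kernels
# may format the line differently.
CHIPSET_RE = re.compile(r"nouveau (\S+): NVIDIA (\w+) \(([0-9a-f]+)\)")

def parse_chipset(line):
    """Return (pci_address, chipset, hex_id), or None if the line does not match."""
    m = CHIPSET_RE.search(line)
    return m.groups() if m else None

samples = [
    "[17863.807958] nouveau 0000:01:00.0: NVIDIA GK106 (0e6200a1)",
    "[    2.428384] nouveau 0000:01:00.0: NVIDIA GM107 (1171b0a2)",
]
for s in samples:
    print(parse_chipset(s))
```

Feeding it the output of `dmesg | grep nouveau` line by line should be enough to find the chipset on most setups.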
14:55 karolherbst: but
14:56 karolherbst: if you run a 4.10 kernel or newer you can change the perf state while an application is running on the GPU
14:56 pmoreau: jamm: You can reuse barrier 1 at that point, since $r8 has been overwritten on line 559
14:56 karolherbst: and you should get a decent speedup
14:56 jonan: ooh, there's a bit of an issue
14:56 jonan: i'm running 4.9 atm
14:57 jonan: i guess that's where I can start to debug my issue lol
14:58 karolherbst: but regarding your artifacts, it may be due to outdated mesa or a real issue still being there
14:58 pmoreau: Unless I misunderstood, and $r8 is not the dest reg of the bfe insn.
14:58 jamm: pmoreau: i see
14:58 jonan: hm my mesa is up to date unfortunately
14:58 jonan: let me try and get 4.10 to work properly on my laptop this time, and then i'll go from there
14:59 jonan: thanks for the heads up about the pstate
14:59 karolherbst: mupuf, mwk: I slowly get annoyed by nvapeek printing ..., any objections if I just remove this?
14:59 mupuf: karolherbst: you would rather see 0?
14:59 karolherbst: yes
14:59 karolherbst: it makes scripting _a_lot_ easier
15:00 mupuf: it made sense before, because gpus were returning 0 when making a request that was never answered
15:00 karolherbst: well sure
15:00 mupuf: now, we get 0xbadf, so it is pointless
15:00 karolherbst: but sometimes 0 also means 0
15:00 mupuf: yes, nowadays, it always means 0
15:00 karolherbst: okay
15:01 mupuf: well, the safest option is to add a "raw" option
15:01 mupuf: this way it works for scripts and it doesn't change the existing output, in case scripts already do work around the ...
15:03 karolherbst: well those shouldn't break in this case anyway
15:21 mwk: karolherbst: sure, do it
15:21 mwk: we shouldn't really print ... for single-register peek
15:22 imirkin: yeah, i've always found that annoying
15:22 imirkin: but when i looked at fixing it, it didn't seem easy
15:22 pmoreau: +1
15:22 imirkin: coz it's the same code as for the scan logic
15:22 karolherbst: yay:
15:22 karolherbst: /dbg_falcon.sh status
15:22 karolherbst: Breakpoint at: 0x64 ( f5 0e 41 03 bra 0x341)
15:22 mwk: well, we should keep it for multi-reg peeks
15:22 mwk: or you'll get a lot of 0s
15:22 karolherbst: mhhhhh
15:23 karolherbst: true
15:27 mwk: mupuf: FWIW "..." was never meant to signify a request that wasn't answered, I added it only to compress big chunks of 0s that usually result from peeking big ranges
15:28 mwk: though nowadays with the badf's, it'd be better to change it to mean "same as last printed value, repeated lots of times"
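The behaviour mwk proposes ("same as last printed value, repeated lots of times") can be sketched roughly as follows. This is not nvapeek's code: `compress_runs`, the `min_run` threshold, and the output format are all assumptions made for illustration.

```python
def compress_runs(values, min_run=3):
    """Format values as hex strings, collapsing long runs of one value.

    A run of at least min_run equal values is printed once, followed by
    "..." meaning "same as the last printed value, repeated".
    """
    out = []
    i = 0
    while i < len(values):
        # Find the end of the run of identical values starting at i.
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        run = j - i
        out.append(f"{values[i]:08x}")
        if run >= min_run:
            out.append("...")
        else:
            # Short runs are printed in full.
            out.extend(f"{values[i]:08x}" for _ in range(run - 1))
        i = j
    return out

print(compress_runs([0, 0, 0, 0, 1]))
print(compress_runs([0]))
```

With this shape a single-register peek never produces "...", while a multi-register peek over a large range of identical values stays short.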
15:58 karolherbst: mhh, now how do I actually activate that breakpoint
15:59 karolherbst: I guess I need to enable that in DEBUG_CMD
16:01 karolherbst: nice, it works :)
16:52 karolherbst: mwk: any idea how I get the current PC of a falcon?
16:52 karolherbst: debug_cmd and read from reg? or is there a different way
17:08 karolherbst: reading out registers also implemented :)
17:26 RSpliet: karolherbst: is that stuff still exposed on GK110+?
17:27 karolherbst: no idea, but I think so
17:28 RSpliet: I recall it isn't... or not all of it
17:28 RSpliet: (still need to fix&port this "backtrace on timeout" patch of mine)
17:29 karolherbst: is there even a stack for the pc?
17:29 karolherbst: I thought that is just put on the normal stack
17:29 karolherbst: uhmmm
17:29 karolherbst: RSpliet: I am talking about falcons, you as well?
17:30 RSpliet: yes I'm talking falcons
17:30 RSpliet: and yes they have a stack
17:31 RSpliet: we don't use nicely aligned stack frames though, but for the modest size of firmware it can be interpreted manually
17:32 karolherbst: sounds annoying though
17:36 karolherbst: RSpliet: do you know by any chance how I get the current $pc from the host?
17:47 mupuf: mwk: yes, that's what I meant. Illegal addresses used to return 0, which would be compressed
17:48 mwk: karolherbst: yeah, read $pc special register
18:10 RSpliet: karolherbst: https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/5dd5f19fb9aebad545449aff847bc55a52e9379f
18:11 RSpliet: the code is broken in dumping the stack
18:11 RSpliet: specifically, the loop condition is bad. sp and pc are read out correctly
18:25 karolherbst: RSpliet: ff0 is 0 for me on the PMU, do I need to "enable" those regs first?
18:26 karolherbst: ec is also 0
18:28 karolherbst: also, "+fec" and "+ff0" already look like stuff nobody should depend on in the first place
18:31 karolherbst: and I don't see the pc in the entire falcon space. I am sure I know what the current PC is, and it's nowhere
18:31 karolherbst: mhhh, maybe somewhere shifted
18:32 RSpliet: karolherbst: as I said, I have not identified these regs in GK110+
18:32 karolherbst: RSpliet: I am on gk106
18:33 RSpliet: that's weird, I've tested this on GK107 and I think some Fermis
18:33 karolherbst: maybe only on the gr
18:33 RSpliet: hmm that might be true
18:34 karolherbst: we need something inside the first 0x400 space of the falcons
18:34 RSpliet: you might be hoping for something that doesn't exist :-P
18:34 karolherbst: I am sure it does
18:34 karolherbst: I can already read out registers
18:34 karolherbst: why not $pc as well?
18:35 karolherbst: the idx field is 5 bits big
18:35 karolherbst: with 4 you can cover the normal registers
18:36 karolherbst: found it
18:36 karolherbst: :D
18:36 karolherbst: is $pc 5?
18:36 karolherbst: indeed
18:37 karolherbst: so register idx 21 gets me $pc
18:37 karolherbst: easy
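The index arithmetic above can be written out as a short sketch. It assumes that the top bit of the 5-bit idx field selects the special-register bank, which is what the numbers in the chat imply (16 + 5 = 21); `reg_idx` and the constant names are hypothetical.

```python
IDX_BITS = 5                            # "the idx field is 5 bits big"
SPECIAL_BANK = 1 << (IDX_BITS - 1)      # assumed: bit 4 selects special registers
PC_SREG = 5                             # "$pc is 5", per the chat

def reg_idx(n, special=False):
    """Build the 5-bit register index for a general or special register."""
    if not 0 <= n < SPECIAL_BANK:
        raise ValueError("register number must fit in 4 bits")
    return (SPECIAL_BANK if special else 0) | n

print(reg_idx(PC_SREG, special=True))   # 21
```

The lower four bits cover the 16 general registers, so index 21 is special register 5.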
18:37 karolherbst: I set the breakpoint at 0x888 but it stopped at 0x88b, weird
18:41 karolherbst: RSpliet: do you need anything else for gk110 except pc and sp?
18:47 karolherbst: I am getting there: https://gist.github.com/karolherbst/e1c5cd411b44a26267dcdd2f0df60bd5 :)
18:48 karolherbst: uhhhh, I know something
18:52 karolherbst: much better: https://gist.github.com/karolherbst/e1c5cd411b44a26267dcdd2f0df60bd5
18:58 karolherbst: RSpliet: everything needed is already in rnndb
19:05 karolherbst: note to myself: don't annoy falcons too much
19:43 karolherbst: hum
19:43 karolherbst: why can't I continue the falcon anymore
20:09 karolherbst: mwk: I thought by setting "CONTINUE_FROM_PC" in DEBUG_CMD I should get the falcon to continue, but DEBUG_CMD sets the ERROR flag, any ideas?
20:18 karolherbst: mhh, if I break manually, I can at least single step
20:21 karolherbst: this is fun :)
20:30 RSpliet: karolherbst: nice one
20:31 karolherbst: RSpliet: https://github.com/karolherbst/nouveau_tools/blob/master/dbg_falcon.sh
20:31 karolherbst: long term goal is a nice cli interface rewritten in C and everything
20:32 RSpliet: as for breakpoint - it's a 6-stage pipeline, and fetch is only the first stage
20:32 RSpliet: so I'd guess it breaks on decode, at which point the PC has already been incremented
20:32 karolherbst: depends how nice the hw is to the debugger
20:32 karolherbst: most likely
20:32 karolherbst: but what about the reg content?
20:33 karolherbst: ohhh, I see
20:33 karolherbst: mhh
20:33 RSpliet: reg read will be the stage after decode
20:33 karolherbst: I am sure it is already fully executed though
20:33 RSpliet: oh ok
20:33 karolherbst: because it does do the jumps right and so on
20:34 karolherbst: and it was my mistake
20:34 karolherbst: I fixed the parsing and now everything is fine
20:34 RSpliet: cool
20:35 karolherbst: except continuing from breakpoint doesn't work
20:35 karolherbst: if I do a manual break, it works
20:36 RSpliet: could a breakpoint trigger an interrupt *in* the falcon?
20:36 karolherbst: should that matter?
20:36 RSpliet: (not on the host)
20:36 RSpliet: well, if the interrupt handler chickens out because it doesn't know what that interrupt means
20:36 karolherbst: the pc doesn't change
20:37 RSpliet: ok, that's not it then
20:37 karolherbst: and the reg returns an error
20:37 imirkin: usually you break out of interrupt handlers with an NMI
20:37 karolherbst: imirkin: I am debugging the falcon :p
20:37 imirkin: if a non-NMI is triggered while interrupts are masked, it will wait until the interrupt flag is cleared before dispatching it
20:37 imirkin: karolherbst: yeah, but this is like general info about all (semi-modern) cpu's
20:38 imirkin: iirc falcons have interrupt priorities
20:38 karolherbst: mhhhhhh
20:38 karolherbst: interesting
20:38 karolherbst: the breakpoint was pointing inside code executed through the interrupt handler
20:38 Bl4ckb0ne: hi
20:38 Bl4ckb0ne: what's the state of the driver with arch and the GTX970?
20:38 karolherbst: aka a timer
20:39 Bl4ckb0ne: i tried to use the driver this afternoon, but apparently my chip is not supported
20:39 karolherbst: Bl4ckb0ne: it should work with a new kernel (4.12? or maybe 4.11, dunno)
20:39 RSpliet: Bl4ckb0ne: ask arch people, we don't know what they do to their kernels
20:40 karolherbst: I guess it is 4.12
20:40 Bl4ckb0ne: oh maybe thats why
20:40 Bl4ckb0ne: im using 4.10
20:40 RSpliet: karolherbst: oh that's the 3.5+0.5GiB card? Yeah, 4.12 it is then
20:40 karolherbst: RSpliet: exactly
20:41 karolherbst: imirkin: okay, so if I put a breakpoint inside an interrupt handler, I need to do other things first before I can continue? or at least so you guess?
20:42 imirkin: yeah, it's a question as to how the breakpoint stuff works, since it kinda operates "outside" the regular cpu flow
20:42 Bl4ckb0ne: Ill try to update my kernel and give it a new try
20:42 imirkin: is the breakpoint handled by falcon code
20:42 imirkin: or by the host?
20:42 karolherbst: imirkin: and a manual break is different?
20:43 imirkin: karolherbst: not 100% sure. i bet mwk knows.
20:43 karolherbst: well it has to be, otherwise I would see no reason it shouldn't work the same
20:45 karolherbst: how did I get the pc to be 0xf536
20:49 karolherbst: this is quite funny as well: https://gist.github.com/karolherbst/2f22e9cbe13a6404693ae6a753cedf42
20:49 karolherbst: I bet at 0xf7 is the interrupt handler
20:51 karolherbst: /* 0x00f5: intr */ there we go
20:51 karolherbst: and the first thing was inside /* 0x0f4f: idle_proc_next */
20:52 karolherbst: funny that I always hit the bra though
20:52 karolherbst: ohhh
20:52 karolherbst: because there is a sleep before
20:57 karolherbst: mhh
20:57 karolherbst: the exit flag is set
20:58 karolherbst: hum
21:03 karolherbst: ohhh
21:03 karolherbst: imirkin: I think I set the breakpoint wrongly
21:03 karolherbst: because on the breakpoint the falcon goes into STOPPED state, when I break manually, it stays SLEEPING
21:10 karolherbst: nice
21:10 karolherbst: continue and stop from address works as well :)
21:12 karolherbst: uhh setting registers,
21:19 pmoreau: imirkin: I haven't thought too much about it yet, but this is what I am thinking of doing: https://hastebin.com/uqewimuqaz.cpp
21:27 karolherbst: pmoreau: are you sure you can move maxGPR into the Function?
21:28 karolherbst: I thought the GPR count is fixed for the entire binary
21:39 pmoreau: karolherbst: maxGPR isn't moved to be per function, but to be per entry point.
21:40 karolherbst: pmoreau: mhhh, ohhhhh okay, now it makes sense
21:40 pmoreau: As all the other attributes I moved
21:42 pmoreau: To avoid things like : uploading this binary (https://hastebin.com/fobibosedu.m), and when launching either of the 2 kernels, it requests (16 + 4) uints of shared mem, 2x the regs, etc
21:43 pmoreau: Even though half of those resources aren't needed nor used.
21:43 karolherbst: mhhh
21:43 karolherbst: I see
21:49 pmoreau: imirkin: https://hastebin.com/dilolulita.cpp this would be slightly better: I was missing some data per entrypoint in nv50_ir::Program
22:32 karolherbst: skeggsb: regarding that odd PMU issue I found.... I think I forgot a clear....
22:34 karolherbst: works as expected now
22:34 karolherbst: having a falcon debugger is nice :)