10:05 karolherbst: imirkin: ping on https://patchwork.freedesktop.org/series/40199/ and https://patchwork.freedesktop.org/series/40754/
10:05 karolherbst: limm and RA stuff for post RA mad load propagation
13:33 RSpliet: karaul32: we've done a fair bit of work on REing the firmwares running on the falcon μcs. Additionally, we have code for uploading signed firmwares which expose how the signature checking is triggered. IIRC it's an asymmetrical crypto signature for which the pubkey and signature check routine are hard-wired in HW, and the privkey is in the hands of NVIDIA and NVIDIA only
13:34 RSpliet: We understand the basics of how microcontrollers work and have their ISA sort-of figured out (... and newer ones will start implementing RISC-V instead of fμc \o/), any help on the more advanced REing topics I'm sure is welcome :-)
13:34 imirkin_: RSpliet: don't get joss'd
13:34 RSpliet: imirkin_: tnx
13:35 karolherbst: RSpliet: well uploading signed firmware is pretty boring though
13:35 imirkin_: i've pushed my script btw, for extracting stuff from new versions
13:35 imirkin_: not sure if anyone saw
13:35 RSpliet: karolherbst: upload, press "verify" button, see HW run... or not
13:36 karolherbst: RSpliet: no, generally
13:36 karolherbst: running signed binaries is always boring
13:36 karolherbst: running _unsigned_ binaries is where it starts getting interesting
13:38 karolherbst: RSpliet: and are you sure it is asymmetric?
13:38 karolherbst: the signature is just 128 bits, which would be like brutally weak for asymmetric crypto
13:39 karolherbst: md5 is 128 bit digest size
13:39 karolherbst: and doing 4k RSA + md5 is kind of ... pointless
13:40 RSpliet: Mmm, no I'm not sure. I'm somewhat surprised if it isn't, because it's a cheap protection against CMOS reverse engineering attempts
13:41 karolherbst: well we have a 128 bit signature :)
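The size argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it says nothing about which scheme the hardware actually uses, and the helper name `could_be_raw_rsa_signature` is made up for illustration.

```python
import hashlib

# Sizes in bits, for comparison with the 128-bit signature mentioned above.
SIGNATURE_BITS = 128                              # figure quoted in the discussion
MD5_DIGEST_BITS = hashlib.md5().digest_size * 8   # digest_size is in bytes
RSA_4096_MODULUS_BITS = 4096


def could_be_raw_rsa_signature(sig_bits: int, modulus_bits: int) -> bool:
    """A raw RSA signature is as wide as the modulus it was made with."""
    return sig_bits == modulus_bits


if __name__ == "__main__":
    print(MD5_DIGEST_BITS)  # 128
    print(could_be_raw_rsa_signature(SIGNATURE_BITS, RSA_4096_MODULUS_BITS))  # False
```

A 128-bit blob has the width of an MD5 digest, not of a 4k RSA signature, which is the point being made.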
14:42 PaulePanter: Hi. A user reported a stalled X session.
14:42 PaulePanter: https://paste.debian.net/1020579/
14:43 PaulePanter: (EE) NOUVEAU(0): [DRI2] DRI2SwapComplete: bad drawable
14:43 PaulePanter: nouveau: kernel rejected pushbuf: Device or resource busy
14:43 PaulePanter: Should I report a bug?
14:43 imirkin_: probably something in dmesg
14:43 imirkin_: not sure what i'd do with it...
14:43 PaulePanter: The errors in the X.Org X server log go on.
14:44 imirkin_: "random non-reproducible issue happened"
14:44 PaulePanter: imirkin_: Let me check, but I didn’t notice anything before.
14:45 PaulePanter: imirkin_: Sorry, I was *very* blind, because the user rebooted the system, and I only looked in `dmesg`. :(
14:46 PaulePanter: imirkin_: https://paste.debian.net/1020582/
14:48 PaulePanter: From this boot, I see the messages below already.
14:48 PaulePanter: [24345.413226] nouveau 0000:01:00.0: gr: TRAP DISPATCH_QUERY
14:48 PaulePanter: [24345.413228] nouveau 0000:01:00.0: gr: no stuck command?
14:48 PaulePanter: [24345.413241] nouveau 0000:01:00.0: fb: trapped write at 0020215000 on channel 7 [1f7c3000 timetunnel[10223]] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 02 [QUERY] reason 00000002 [PAGE_NOT_PRESENT]
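For a bug report it can help to pull the fields out of the `fb: trapped ...` line so that repeated traps can be grouped by channel, process or reason. The sketch below assumes the line layout is exactly the one pasted above; the regex and the name `parse_fb_trap` are illustrative, not part of nouveau or any existing tool.

```python
import re

# Layout assumed from the single dmesg line quoted above.
FB_TRAP_RE = re.compile(
    r"fb: trapped (?P<access>read|write) at (?P<addr>[0-9a-f]+) "
    r"on channel (?P<chan>-?\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>.+?)\[(?P<pid>\d+)\]\] "
    r"engine [0-9a-f]+ \[(?P<engine>[^\]]*)\] "
    r"client [0-9a-f]+ \[(?P<client>[^\]]*)\] "
    r"subclient [0-9a-f]+ \[(?P<subclient>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\]"
)


def parse_fb_trap(line: str):
    """Return a dict of trap fields, or None if the line is not a trap line."""
    m = FB_TRAP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Addresses and ids are printed as bare hex; channel and pid as decimal.
    fields["addr"] = int(fields["addr"], 16)
    fields["inst"] = int(fields["inst"], 16)
    fields["reason_id"] = int(fields["reason_id"], 16)
    fields["chan"] = int(fields["chan"])
    fields["pid"] = int(fields["pid"])
    return fields
```

Feeding it every line of a saved `dmesg` and counting the `(proc, reason)` pairs would show whether the traps always come from the same client.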
14:49 PaulePanter: Anyway, any hints on how to proceed to help solve this, would be great.
14:57 imirkin_: nothing useful from me, sorry
14:58 PaulePanter: So report the issue to the bug tracker, or will it just stay there?
14:58 imirkin_: you can if you like
14:58 PaulePanter: … with nobody having time to look at?
14:59 imirkin_: not a question of time
14:59 orbea: PaulePanter: at the very least the issue won't get lost then
14:59 imirkin_: there's just nothing to look at in there
14:59 imirkin_: "gpu hung"
14:59 PaulePanter: Understood.
15:00 karolherbst: imirkin_: I think we have a bug there if userspace gets rejected anyway
15:00 karolherbst: or something terrible happens and something just still continues
15:01 karolherbst: I think we should think about how we make such bugs easier to trace back where the root cause is
15:09 karolherbst: yeah
15:09 karolherbst: just I have no idea how to make that better traceable
15:09 karolherbst: do you have some ideas?
16:05 PaulePanter: imirkin_, karolherbst: I see a lot of `nouveau 0000:01:00.0: gr: TRAP DISPATCH_QUERY` in the logs.
16:05 PaulePanter: Any options to increase the debug level?
16:06 PaulePanter: Searching the Web for that doesn’t give any hint, which is strange.
16:06 imirkin_: i haven't seen it before either
16:07 gnarface: grsecurity patch, PaulePanter?
16:11 PaulePanter: gnarface: No. Plain Linux.
16:11 PaulePanter: *upstream Linux
16:34 cliff-hm: https://bugs.freedesktop.org/show_bug.cgi?id=106077 - is this your bug report PaulePanter ?
16:40 PaulePanter: cliff-hm: Yes, it is.
16:45 cliff-hm: K. I have no real knowledge. I was just trying to google the bit of error message to understand where it is coming from. Not in mesa code, but within the kernel - drivers/gpu/drm/nouveau/nvkm/engine/gr/nv50.c code, which I know nothing about, but maybe hardware
17:39 Lyude: mupuf: poke; you around? wondering if you might have some insight on this power consumption sensor issue i'm having with a maxwell1 card here
17:39 mupuf: Lyude: go for it!
17:39 Lyude: mupuf: so: nouveau doesn't detect any sort of power consumption sensors, but I'm getting valid power consumption readings through nvidia-smi
17:39 Lyude: (jfyi: this is a NV117, vbios is uploaded to the vbios repo)
17:39 mupuf: ha, cool!
17:40 mupuf: cool, pm me the name of the vbios, I'll check it out
17:40 Lyude: mupuf: should just be nv117/Lyude/vbios.rom
17:42 Lyude: trying to scan all of the i2c ports on this but I don't see any sort of transactions happening when I launch nvidia-smi, so I'm wondering if maybe it's communicating with whatever sensors this thing has without using i2c?
17:42 Lyude: imirkin noticed a suspicious extdev as well: EXTDEV 0: type 0xa0 [INTERNAL_A0] at 0xa8 defbus 0
17:44 Lyude: (jfyi; trying to get this working so I can see the power consumption difference with blcg/slcg on maxwell1)
17:45 mupuf: hmm, the communication may come from the PMU
17:45 Lyude: that's what i figured, seeing as there's ptimer reg read/writes over mmio when I run nvidia-smi I believe
17:46 mupuf: starting from maxwell 2, the PMU is responsible for reading the power usage
17:46 mupuf: where did imirkin_ see the suspicious extdev? It is not in your vbios
17:47 Lyude: hopefully I didn't pull that from the wrong vbios
17:47 Lyude: sec
17:47 mupuf: you may not be able to see the i2c device from the host because the line is connected to the PMU
17:47 mupuf: there is a register to say who is responsible for the SDA and SCL lines
17:47 Lyude: makes sense, which means we just need to figure out how to communicate with the PMU correct?
17:48 Lyude: erm, in the context of reading sensors
17:49 mupuf: Lyude: well, no, if possible, let's just read the stuff ourselves from the CPU ;)
17:49 imirkin_: mupuf: my GF108
17:49 mupuf: but the vbios may enable the pmu-controlled mode for the i2c buses
17:49 imirkin_: quadro 600
17:50 imirkin_: DP + DVI outputs, fwiw
17:50 mupuf: imirkin_: ok. And I guess nvidia's doc did not help?
17:50 imirkin_: didn't see anything about extdev's
17:53 Lyude: imirkin_: if we're just reading it with the CPU then I guess the appropriate thing would be to try to see if we can take control of those i2c lines away from the pmu vs. what's set by the vbios, correct?
17:54 mupuf: imirkin_: you may be right, indeed, Can't find it in the DCB doc
17:58 Lyude: oh. or I could just find the 0x200f8 register that is currently looking very sus
18:05 mupuf: Lyude: are you suggesting the HW is computing its own power consumption?
18:05 mupuf: there are blocks for that... but nvidia never used them
18:05 mupuf: so... it is possible that they do it now, but... it would be odd
18:07 mupuf: Lyude: https://phd.mupuf.org/files/xdc2013-nvidia_pm.pdf <-- FSRM
18:07 Lyude: mupuf: I mean; I see a value that's constantly fluctuating in that register and we seem to read it a bunch in the mmio trace of nvidia-smi I did, and sure enough if I mess with the power on either nouveau or nvidia's (through incurring load) the values seem to fluctuate a lot more, rising and falling with the load
18:08 mupuf: oh, well, nvidia maybe finally made it work!
18:08 mupuf: if so, you should see a lot of additional writes everywhere to set up the weights
18:08 Lyude: Every time the register is read? or during device init
18:10 Lyude: also: keep in mind this card has no molex connectors. It's powered solely by the PCIe port
18:24 mupuf: Lyude: the fact it is entirely powered through the PCIe port makes sense
18:24 mupuf: Lyude: no, it would be once during device init
18:24 mupuf: or after suspend
18:25 mupuf: so, we may misunderstand the power line table, if this is really what is going on
18:25 mupuf: I will have a look at your mmiotrace when I come back home
18:25 mupuf: this is super interesting
18:26 Lyude: sure thing :), I'll take a look as well
19:36 Lyude: interesting; the maxwell1 card in this t460p seems to have the same register active; and I'm guessing for the same reason of there not being an external power supply
19:37 Lyude: likewise; I think this pascal card doesn't have the register but does have an external power supply. On that note: does anyone remember the value I should be expecting to see if I try to access registers that you need to be in high-security mode for?
20:34 Lyude: mupuf: poke me when you get back, curious if you might know anything about [0] 34.503283 MMIO32 R 0x020130 0x81071400 PTHERM.SENSOR_CALIB_0[0x3] => { 0x81071400 }
20:34 Lyude: that looks like a read though, so probably not programming weights
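To check how often a given register is read or written in a decoded trace, the lines can be filtered by direction and address. A minimal sketch, assuming every decoded line starts like the one quoted above (`[card] timestamp MMIO32 R|W address value ...`); the names `parse_mmio_line` and `accesses_to` are made up for this example.

```python
import re

# Prefix layout assumed from the decoded trace line quoted above.
MMIO_RE = re.compile(
    r"\[(?P<card>\d+)\]\s+(?P<ts>\d+\.\d+)\s+MMIO(?P<width>\d+)\s+"
    r"(?P<dir>[RW])\s+(?P<addr>0x[0-9a-fA-F]+)\s+(?P<value>0x[0-9a-fA-F]+)"
)


def parse_mmio_line(line: str):
    """Return (timestamp, direction, address, value) or None for other lines."""
    m = MMIO_RE.search(line)
    if m is None:
        return None
    return (float(m["ts"]), m["dir"], int(m["addr"], 16), int(m["value"], 16))


def accesses_to(lines, addr: int, direction: str = "R"):
    """List (timestamp, value) for every access to addr in the given direction."""
    hits = []
    for line in lines:
        parsed = parse_mmio_line(line)
        if parsed is not None and parsed[1] == direction and parsed[2] == addr:
            hits.append((parsed[0], parsed[3]))
    return hits
```

Calling `accesses_to(trace_lines, 0x200f8)` on the nvidia-smi trace would give the timestamps and values of each read of the register under suspicion, and `direction="W"` would list writes, which is what programming weights should look like.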
21:01 mupuf: Lyude: doesn't it come straight from the fuses?
21:02 mupuf: pretty sure it did
21:02 Lyude: mupuf: pardon?
21:02 mupuf: PTHERM.SENSOR_CALIB_0 --> the value is directly sourced from the fuses IIRC
21:02 mupuf: this is factory-calibrated
21:03 Lyude: ah, was curious if it had any relation to the 200f8 values
21:03 Lyude: they're definitely not an exact representation of the wattage, so I don't think that register has calibrated values
21:04 Lyude: https://paste.fedoraproject.org/paste/Nv9ehASUTtuD~nAT1ghYgQ keep in mind: it's very likely some of the wattages reported there by nvidia-smi are not in sync with the actual register values being displayed since they update at different intervals
21:06 mupuf: Lyude: point clouds would show you a better trend ;)
21:08 Lyude: aaaa, i've never heard of those before lol. looking at ddg, but mind giving some examples?
21:09 mupuf: they are usually called scatter plots
21:09 Lyude: oh! i know what that is
21:09 mupuf: https://www.mathsisfun.com/data/scatter-xy-plots.html
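Alongside a scatter plot of register value against reported wattage, a correlation coefficient gives a single number for how well the two track each other. A minimal sketch in plain Python; the function name `pearson` and the sample values are placeholders, not measurements from the paste above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic placeholder values, NOT real register or nvidia-smi readings.
    register_values = [1, 2, 3, 4]
    reported_watts = [10, 20, 30, 40]
    print(pearson(register_values, reported_watts))
```

Since the two sources update at different intervals, samples would need to be paired by timestamp before a coefficient like this means much.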
21:10 Lyude: hm, before I continue wasn't there some trick you used to control the load% on the nvidia driver?
21:10 mupuf: Lyude: no, I always faked a load :s
21:11 mupuf: but that was to RE the reclocking policy
21:11 Lyude: ahh, so yeah that wouldn't do much for power consumption
21:11 mupuf: indeed
21:11 mupuf: but honestly, I think you are quite likely not going the right way
21:11 Lyude: oh?
21:11 mupuf: they introduced this in 2006
21:11 mupuf: and no usage until maxwell?
21:12 mupuf: pretty sure they have this as a backup, in case some GPUs are shipped and some new loads exceed the expected power budget
21:12 mupuf: this cannot be accurate because it does not take into account the voltage
21:13 mupuf: the weights would have to be calibrated for the highest voltage
21:15 Lyude: So do you think it's likely that nvidia-smi is getting this information some other way? or are you just getting at that we need to figure out how to calibrate the weights for this in order to get an accurate reading
21:16 mupuf: Lyude: pretty sure they have a real power meter
21:16 mupuf: I could check on my maxwell 1
21:16 mupuf: sweet, it is already plugged
21:20 mupuf: fun. I do not remember nvidia ever exposing power consumption on this gpu before
21:20 mupuf: oh well, it is there now
21:20 mupuf: so let's have a look!
21:21 Lyude: yeah I don't think they used to
21:22 mupuf: Power Draw : 1.67 W --> not bad for an idling discrete GPU
21:22 Lyude: right? it's even lower here
21:22 Lyude: 0.87W
21:22 Lyude: man page says that value is ±5W though
21:22 mupuf: 1W now that I started X
21:28 karolherbst: mupuf: on GPUs without power meters the reported power consumption isn't really reliable afaik
21:28 karolherbst: I might be wrong though
21:28 Lyude: i mean, if you're not wrong though then that would explain why we're seeing reads on that register when playing with nvidia-smi
21:29 mupuf: Lyude: why would it wait on nvidia-smi?
21:29 mupuf: there is a power budget, this should be polled every second at least
21:29 Lyude: ah, I did not realize that last part :s
21:30 mupuf: well, time to edit some of the weights and see if we get a different value
21:30 Lyude: where are the weights btw?
21:32 mupuf: a bit everywhere
21:32 mupuf: but the last levels are in ptherm
21:33 mupuf: I never documented that, because I wanted to actually find which one was which engine
21:33 mupuf: but... time got the best of me
21:33 mupuf: the better is the enemy of good!
21:34 imirkin_: and time always wins
21:35 mupuf: that too :p
21:35 mupuf: Lyude: hmm, the update rate got drastically lowered. It used to be 1kHz
21:35 mupuf: now, it is 33 Hz
21:37 karolherbst: mupuf: maybe they increase it under load?
21:37 mupuf: karolherbst: I doubt it. It still is fast-enough
21:37 karolherbst: yeah, right
21:38 karolherbst: well
21:38 karolherbst: worth checking though
21:38 mupuf: oopsie, I fuzzed at the wrong location O:-)
21:38 karolherbst: or that
21:39 karolherbst: mupuf: I still want to figure out the mystery why my GPU was throttled with nouveau for going over the battery power budget
21:39 mupuf: karolherbst: yeah, that may indicate they fixed this issue :o
21:39 mupuf: but this must involve the pmu if they did this
21:39 karolherbst: what issue?
21:40 karolherbst: you mean throttling on battery?
21:40 mupuf: no, computing the power consumption
21:40 karolherbst: ahh
21:40 mupuf: no way they do everything in HW
21:40 karolherbst: or maybe not if they document +-5
21:40 mupuf: karolherbst: where did you see that?
21:40 karolherbst: "<Lyude> man page says that value is ±5W though"
21:40 Lyude: yep
21:41 Lyude: man nvidia-smi
21:41 mupuf: karolherbst: well, for the highest GPUs, this would be expected
21:41 karolherbst: 1W is too low
21:41 karolherbst: way too low
21:41 mupuf: you have a point
21:41 Lyude: yeah, I was thinking that...
21:41 karolherbst: I would expect around 10W for the maxwell1
21:41 karolherbst: as the lowest
21:41 Lyude: but then why even offer the feature on nvidia-smi if it's that inaccurate
21:42 karolherbst: or maybe even 6 or 7W
21:42 karolherbst: it is around 60W TDP
21:42 karolherbst: soo
21:42 Lyude: ~6W sounds like it would be in that margin of error
21:42 karolherbst: yeah
21:42 karolherbst: Lyude: why not?
21:42 karolherbst: values under load are kind of accurate enough
21:42 karolherbst: nobody cares about idle power consumption really
21:42 karolherbst: and if you care, you measure differently
21:42 Lyude: true
21:43 mupuf: 200f0's bit 20 is the enable for the power measurement
21:43 mupuf: karolherbst: try disabling that on your laptop ;)
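Toggling that enable is a read-modify-write on one bit. The sketch below only does the bit arithmetic on a 32-bit value; actually reading and writing the register is left out, and the constant and function names are made up for this example. The bit position comes from the message above.

```python
POWER_MEASURE_ENABLE_BIT = 20   # per the discussion: bit 20 of 0x200f0
REG_MASK_32 = 0xFFFFFFFF


def is_power_measure_enabled(value: int) -> bool:
    """True if the enable bit is set in a raw 32-bit register value."""
    return bool(value & (1 << POWER_MEASURE_ENABLE_BIT))


def with_power_measure(value: int, enabled: bool) -> int:
    """Return value with only the enable bit changed, other bits untouched."""
    mask = 1 << POWER_MEASURE_ENABLE_BIT
    if enabled:
        new_value = value | mask
    else:
        new_value = value & ~mask
    return new_value & REG_MASK_32
```

The value to write back would be `with_power_measure(current, False)`, where `current` is whatever was just read from the register.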
21:43 karolherbst: mhh
21:44 karolherbst: I am wondering still how the throttling happens
21:44 mupuf: karolherbst: exactly, idle power usage should be nuts
21:44 mupuf: well, you'll see if this is that
21:44 mupuf: because if it is, then it is already all documented
21:44 karolherbst: I meant why does it get throttled at all
21:44 karolherbst: or how
21:44 karolherbst: the clocks don't get lowered
21:44 karolherbst: or maybe they get somewhat
21:44 karolherbst: or somehow
21:45 mupuf: oops, I crashed the computer :D
21:45 karolherbst: duh
21:45 Lyude: btw mupuf, mind if I get some of the register offsets for those weights so I can play around with them as well?
21:46 mupuf: Lyude: you start by having fun with 200f0-4
21:46 * mupuf is trying to find the addresses again
21:49 mupuf: hmm, blocking the update of the power consumption monitoring still yields updates on nvidia-smi
21:50 mupuf: sooooooo.... there is another way
21:51 mupuf: oops, I crashed it again!
21:52 Lyude: same
21:54 mupuf: well, let's see if disabling the PMU stops updating the power meter
22:03 mupuf: Lyude: I agree though, this is suspicious that nvidia would be reading this value at all
22:04 Lyude: mm
22:05 Lyude: it's definitely not reading it directly; or it's doing something else with it. playing with the weights changes the register value, but not the smi output
22:05 mupuf: yeah, so I doubt this is the primary source
22:06 mupuf: I am trying to find the register that allows the host to control the i2c lines
22:12 Lyude: so, do we have any reason why we wouldn't be seeing any more suspicious looking reads in demmio from this?
22:30 mupuf: Lyude: g80_pnvio_i2c_bitbang --> this is where the selection is made to know who controls the i2c bus
22:32 mupuf: Lyude: every 2 seconds, the blob reads the i2c bus 0
22:45 mupuf: I would likely investigate that more
22:45 mupuf: now, time to sleep