04:09 driverb: FAQ
04:10 driverb:faq
04:11 driverb: quit
04:11 HdkR: Fracken FAQ
10:05 karolherbst: imirkin: ping on https://patchwork.freedesktop.org/series/40199/ and https://patchwork.freedesktop.org/series/40754/
10:05 karolherbst: limm and RA stuff for post RA mad load propagation
13:26 karaul32: so karolherbst: do you need any assistance how microcontrollers and processors work? how to reverse engineer the firmware of power management?
13:27 karaul32: including the underlying alus of crypto, i.e signature stuff, lately we talked about this, some slavic named hacker did similar stuff to ME that of intel processors
13:29 karaul32: powerlines are kinda of allmighty, if you have access to playing with the lines, you can go passed any hw bug too probably, if usual methods to boost performance do not work
13:33 RSpliet: karaul32: we've done a fair bit of work on REing the firmwares running on the falcon μcs. Additionally, we have code for uploading signed firmwares which exposes how the signature checking is triggered. IIRC it's an asymmetric crypto signature for which the pubkey and signature check routine are hard-wired in HW, and the privkey is in the hands of NVIDIA and NVIDIA only
13:34 RSpliet: We understand the basics of how microcontrollers work and have their ISA sort-of figured out (... and newer ones will start implementing RISC-V instead of fμc \o/), any help on the more advanced REing topics I'm sure is welcome :-)
13:34 imirkin_: RSpliet: don't get joss'd
13:34 RSpliet: imirkin_: tnx
13:35 karolherbst: RSpliet: well uploading signed firmware is pretty boring though
13:35 imirkin_: i've pushed my script btw, for extracting stuff from new versions
13:35 imirkin_: not sure if anyone saw
13:35 RSpliet: karolherbst: upload, press "verify" button, see HW run... or not
13:36 karolherbst: RSpliet: no, generally
13:36 karolherbst: running signed binaries is always boring
13:36 karolherbst: running _unsigned_ binaries is where it starts getting interesting
13:38 karolherbst: RSpliet: and are you sure it is async?
13:38 karolherbst: the signature is just 128 bits, which would be like brutally weak for async crypto
13:39 karolherbst: md5 is 128 bit digest size
13:39 karolherbst: and doing 4k RSA + md5 is kind of ... pointless
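The size argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it compares the 128-bit signature length with the MD5 digest size and with the length of a raw 4096-bit RSA signature (which is as long as the modulus). It says nothing about which scheme the hardware actually uses, and the constant names are made up for illustration.

```python
import hashlib

SIGNATURE_BITS = 128      # signature size mentioned in the discussion
RSA_MODULUS_BITS = 4096   # the "4k RSA" example from the discussion

# MD5 produces a 16-byte digest, i.e. 128 bits.
md5_bits = hashlib.md5().digest_size * 8

# A raw RSA signature is as long as the modulus.
rsa_signature_bits = RSA_MODULUS_BITS

print("MD5 digest matches signature size:", md5_bits == SIGNATURE_BITS)
print("RSA-4096 signature is", rsa_signature_bits // SIGNATURE_BITS,
      "times larger than the signature")
```

A 4096-bit RSA signature would not fit into 128 bits, which is the point being made: a 128-bit value looks more like a digest or a symmetric-cipher block than an asymmetric signature.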
13:40 RSpliet: Mmm, no I'm not sure. I'm somewhat surprised if it isn't, because it's a cheap protection against CMOS reverse engineering attempts
13:41 karolherbst: well we have a 128 bit signature :)
13:41 karaul32: what about pointers, do you need any assistance how they function? there is a buffer descriptor in particular, for maximum 32times the whole superreg, and the real cache entry that uses addr and load destination registers..what address belongs to what reg is determined how you call stuff, is it access cohrent , access consistent or random etc.
13:43 karolherbst: RSpliet: well actually we have 2x 128 bit. one is for production and one for developer boards or something
13:48 karaul32: the theory of this originates from times of millenium, but i did enhance their versions a bit, and in practice if stuff is called correctly, then only with scheduling thousound or so fold improvement becomes possible cause of the num_records pointer instances of the buffer descriptor which get flipped from addr flops in hw to execute 32 pixel vectors in fragment shader at time
13:49 karaul32: so if the hw has no bugs there, then powerline correction of the pipeline i mean changing it arbitrarily like needed, is not needed too
14:04 karaul32: the whole point is currently NVIDIA hides the latency with balancing the occupation by freeing regs of long latency ops, it uses those regs to refetch and decode new instructions finding ILP, but it is also possible to freeze the alus and halt the wavefronts alltogether and use the reordered instances to work on pixel arrays
14:05 karaul32: which would end up performing like a bullet, so pascal and vega can be further improved
14:42 PaulePanter: Hi. A user reported a stalled X session.
14:42 PaulePanter: https://paste.debian.net/1020579/
14:43 PaulePanter: (EE) NOUVEAU(0): [DRI2] DRI2SwapComplete: bad drawable
14:43 PaulePanter: nouveau: kernel rejected pushbuf: Device or resource busy
14:43 PaulePanter: Should I report a bug?
14:43 imirkin_: probably something in dmesg
14:43 imirkin_: not sure what i'd do with it...
14:43 PaulePanter: The errors in the X.Org X server log go on.
14:44 imirkin_: "random non-reproducible issue happened"
14:44 PaulePanter: imirkin_: Let me check, but I didn’t notice anything before.
14:45 PaulePanter: imirkin_: Sorry, I was *very* blind, because the user rebooted the system, and I only looked in `dmesg`. :(
14:46 PaulePanter: imirkin_: https://paste.debian.net/1020582/
14:48 PaulePanter: From this boot, I see the messages below already.
14:48 PaulePanter: [24345.413226] nouveau 0000:01:00.0: gr: TRAP DISPATCH_QUERY
14:48 PaulePanter: [24345.413228] nouveau 0000:01:00.0: gr: no stuck command?
14:48 PaulePanter: [24345.413241] nouveau 0000:01:00.0: fb: trapped write at 0020215000 on channel 7 [1f7c3000 timetunnel[10223]] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 02 [QUERY] reason 00000002 [PAGE_NOT_PRESENT]
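When many such traps pile up in a log, it can help to split them into fields before filing a report. The following is a minimal sketch, assuming the line layout is exactly the one quoted above; the regular expression and the `parse_fb_trap` name are illustrative, not part of nouveau, and other kernel versions may format the message differently.

```python
import re

# Pattern derived only from the single log line quoted in the chat.
TRAP_RE = re.compile(
    r"fb: trapped (?P<access>read|write) at (?P<addr>[0-9a-f]+) "
    r"on channel (?P<chid>-?\d+) \[(?P<inst>[0-9a-f]+) (?P<proc>.+?)\] "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>\w*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>\w*)\] "
    r"subclient (?P<subclient_id>[0-9a-f]+) \[(?P<subclient>\w*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>\w*)\]"
)


def parse_fb_trap(line):
    """Return a dict of trap fields, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["addr"] = int(fields["addr"], 16)   # faulting address as an integer
    fields["chid"] = int(fields["chid"])       # channel id is printed in decimal
    return fields


sample = (
    "[24345.413241] nouveau 0000:01:00.0: fb: trapped write at 0020215000 "
    "on channel 7 [1f7c3000 timetunnel[10223]] engine 00 [PGRAPH] "
    "client 03 [DISPATCH] subclient 02 [QUERY] reason 00000002 [PAGE_NOT_PRESENT]"
)
trap = parse_fb_trap(sample)
print(trap["proc"], trap["engine"], trap["client"], trap["subclient"], trap["reason"])
```

Grouping the parsed records by process and reason would show whether every trap comes from the same client, which is the kind of detail a bug report can use.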
14:49 PaulePanter: Anyway, any hints on how to proceed to help solve this, would be great.
14:57 imirkin_: nothing useful from me, sorry
14:58 PaulePanter: So report the issue to the bug tracker, or will it just stay there?
14:58 imirkin_: you can if you like
14:58 PaulePanter: … with nobody having time to look at?
14:59 imirkin_: not a question of time
14:59 orbea: PaulePanter: at the very least the issue wont get lost then
14:59 imirkin_: there's just nothing to look at in there
14:59 imirkin_: "gpu hung"
14:59 PaulePanter: Understood.
15:00 karolherbst: imirkin_: I think we have a bug there if userspace gets rejected anyway
15:00 karolherbst: or something terrible happens and something just still continues
15:01 karolherbst: I think we should think about how we make such bugs easier to trace back where the root cause is
15:04 karaul32: anyways i won't tell how to call things, but i assume you'd be capable to find it out, of course i know 100percent that it would work, frankly some frekazoids delusional ones make my life very difficult, and i am not interested in doing any work for them to screw me and humiliate me in the instution and court
15:08 imirkin_: karolherbst: sounds good. as-is, it's untraceable.
15:09 karolherbst: yeah
15:09 karolherbst: just I have no idea how to make that better traceable
15:09 karolherbst: do you have some ideas?
15:19 karaul32: we just end up seeing how those delusional ones fail in everything like they have done troughout their lives, just ignoring that and playing heros while diagnosing me, not a simple kind of acheivement to show , not a primitive one either, always a whitewash coming by when going against me, still saying i am crazy
15:20 imirkin_: karolherbst: sorry, nope
15:22 karaul32: well country is sold, all businesses sold, every it solution fucked up and cracked and scammy, themselves not comprehending any security risks involved, i'd say later when you screwd everything up i was doing my sentence in the insitution while you bullied with me and screwd all up that was ever possible
15:23 * gnarface suddenly feels this isn't just about video drivers
15:25 imirkin_: it's about trolling. ignore.
15:32 karaul32: i posted everything but fine-grained details, my lawyer says ones need a proof of concepts, i said i tend to know how my invented code works, and i won't hand over the solutions to trolls so theyd screw me every day
15:41 karaul32: very simple stuff, recently as you have lifted your levels of understanding i as said, assume as i gave lots of hints in pm and on the channels, prolly you RSpliet and karolherbst and maybe even imirkin would dig the method up, however it is known that our local terrorists do not understand shit
16:05 PaulePanter: imirkin_, karolherbst: I see a lot of `nouveau 0000:01:00.0: gr: TRAP DISPATCH_QUERY` in the logs.
16:05 PaulePanter: Any options to increase the debug level?
16:06 PaulePanter: Searching the Web for that doesn’t give any hint, which is strange.
16:06 imirkin_: i haven't seen it before either
16:07 gnarface: grsecurity patch, PaulePanter?
16:11 PaulePanter: gnarface: No. Plain Linux.
16:11 PaulePanter: *upstream Linux
16:34 cliff-hm: https://bugs.freedesktop.org/show_bug.cgi?id=106077 - is this your bug report PaulePanter ?
16:40 PaulePanter: cliff-hm: Yes, it is.
16:45 cliff-hm: K. I have no real knowledge. I was just trying to google the bit of error message to understand where it is coming from. Not in mesa code, but within the kernel - drivers/gpu/drm/nouveau/nvkm/engine/gr/nv50.c code, which I know nothing about, but maybe hardware
17:39 Lyude: mupuf: poke; you around? wondering if you might have some insight on this power consumption sensor issue i'm having with a maxwell1 card here
17:39 mupuf: Lyude: go for it!
17:39 Lyude: mupuf: so: nouveau doesn't detect any sort of power consumption sensors, but I'm getting valid power consumption readings through nvidia-smi
17:39 Lyude: (jfyi: this is a NV117, vbios is uploaded to the vbios repo)
17:39 mupuf: ha, cool!
17:40 mupuf: cool, pm me the name of the vbios, I'll check it out
17:40 Lyude: mupuf: should just be nv117/Lyude/vbios.rom
17:42 Lyude: trying to scan all of the i2c ports on this but I don't see any sort of transactions happening when I launch nvidia-smi, so I'm wondering if maybe it's communicating with whatever sensors this thing has without using i2c?
17:42 Lyude: imirkin noticed a suspicious extdev as well: EXTDEV 0: type 0xa0 [INTERNAL_A0] at 0xa8 defbus 0
17:44 Lyude: (jfyi; trying to get this working so I can see the power consumption difference with blcg/slcg on maxwell1)
17:45 mupuf: hmm, the communication may come from the PMU
17:45 Lyude: that's what i figured, seeing as there's ptimer reg read/writes over mmio when I run nvidia-smi I believe
17:46 mupuf: starting from maxwell 2, the PMU is responsible for reading the power usage
17:46 mupuf: where did imirkin_ see the suspicious extdev? It is not in your vbios
17:47 Lyude: hopefully I didn't pull that from the wrong vbios
17:47 Lyude: sec
17:47 mupuf: you may not be able to see the i2c device from the host because the line is connected to the PMU
17:47 mupuf: there is a register to say who is responsible for the SDA and SCL lines
17:47 Lyude: makes sense, which means we just need to figure out how to communicate with the PMU correct?
17:48 Lyude: erm, in the context of reading sensors
17:49 mupuf: Lyude: well, no, if possible, let's just read the stuff ourselves from the CPU ;)
17:49 imirkin_: mupuf: my GF108
17:49 mupuf: but the vbios may enable the pmu-controlled mode for the i2c buses
17:49 imirkin_: quadro 600
17:50 imirkin_: DP + DVI outputs, fwiw
17:50 mupuf: imirkin_: ok. And I guess nvidia's doc did not help?
17:50 imirkin_: didn't see anything about extdev's
17:53 Lyude: imirkin_: if we're just reading it with the CPU then I guess the appropriate thing would be to try to see if we can take control of those i2c lines away from the pmu vs. what's set by the vbios, correct?
17:54 mupuf: imirkin_: you may be right, indeed, Can't find it in the DCB doc
17:58 Lyude: oh. or I could just find the 0x200f8 register that is currently looking very sus
18:05 mupuf: Lyude: are you suggesting the HW is computing its own power consumption?
18:05 mupuf: there are blocks for that... but nvidia never used them
18:05 mupuf: so... it is possible that they do it now, but... it would be odd
18:07 mupuf: Lyude: https://phd.mupuf.org/files/xdc2013-nvidia_pm.pdf <-- FSRM
18:07 Lyude: mupuf: I mean; I see a value that's constantly fluctuating in that register and we seem to read it a bunch in the mmio trace of nvidia-smi I did, and sure enough if I mess with the power draw on either nouveau or nvidia's driver (by incurring load) the values fluctuate a lot more and rise and fall along with it
18:08 mupuf: oh, well, nvidia maybe finally made it work!
18:08 mupuf: if so, you should see a lot of additional writes everywhere to set up the weights
18:08 Lyude: Every time the register is read? or during device init
18:10 Lyude: also: keep in mind this card has no molex connectors. It's powered solely by the PCIe port
18:24 mupuf: Lyude: the fact it is entirely powered through the PCIe port makes sense
18:24 mupuf: Lyude: no, it would be once during device init
18:24 mupuf: or after suspend
18:25 mupuf: so, we may misunderstand the power line table, if this is really what is going on
18:25 mupuf: I will have a look at your mmiotrace when I come back home
18:25 mupuf: this is super interesting
18:26 Lyude: sure thing :), I'll take a look as well
19:36 Lyude: interesting; the maxwell1 card in this t460p seems to have the same register active; and I'm guessing for the same reason of there not being an external power supply
19:37 Lyude: likewise; I think this pascal card doesn't have the register but does have an external power supply. On that note: does anyone remember the value I should be expecting to see if I try to access registers that you need to be in high-security mode for?
19:39 karaul32: i went as far as reading every bit of hw design, frankly there is GCN's miaow which implements the shared and global memory pointers, when you go thinking global memory pointers is only something that foes through onchip vram then you are mistaken twice, first it is cached, and second its addr and dest can be exchanged with temps, i.e register arrays
19:45 mupuf: Lyude: yeah, this reg is always there... but the question is whether it is calibrated or not
19:45 mupuf: ok, finally going back home
19:45 karaul32: so in an event say, cache is some lru, you go changing the cache line to be modulo of earlier access where the other side is the pointer you access, as i was saying registers in hw only retrigger the alu, when their value has been chancged, and depending on the access it is either atomic or absolutely randomly ordered
19:50 karaul32: what happen is when you take a modulo of the pointers address and take it back to original value, all the addresses except the traced captured natural stream reg which has write conflict, you effectively point the whole array against a single element one sorta, the one you captured, the other thing is that the fragments of free registers will formulate new array indices
19:52 karaul32: the thing is all of the sudden the register which worked on single pixel vector, can work on a range of pixel vectors as how many have you freed from long latency ops
20:34 Lyude: mupuf: poke me when you get back, curious if you might know anything about [0] 34.503283 MMIO32 R 0x020130 0x81071400 PTHERM.SENSOR_CALIB_0[0x3] => { 0x81071400 }
20:34 Lyude: that looks like a read though, so probably not programming weights
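To sift a decoded trace for accesses to one register (for example 0x200f8 or the PTHERM.SENSOR_CALIB_0 read above), the lines can be parsed mechanically. A minimal sketch, assuming the decoded line layout is exactly the one quoted above; `parse_demmio_line` and `accesses_to` are hypothetical helper names, and the meaning of the leading bracketed number is not stated in the log, so it is kept as an opaque `index`.

```python
import re

# Layout taken from the single decoded trace line quoted in the chat.
LINE_RE = re.compile(
    r"\[(?P<index>\d+)\]\s+(?P<time>\d+\.\d+)\s+MMIO(?P<width>\d+)\s+"
    r"(?P<dir>[RW])\s+(?P<addr>0x[0-9a-fA-F]+)\s+(?P<value>0x[0-9a-fA-F]+)"
    r"(?:\s+(?P<name>\S+))?"
)


def parse_demmio_line(line):
    """Parse one decoded MMIO trace line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = float(rec["time"])
    rec["width"] = int(rec["width"])
    rec["addr"] = int(rec["addr"], 16)
    rec["value"] = int(rec["value"], 16)
    return rec


def accesses_to(lines, addr, direction=None):
    """Yield parsed records touching `addr`, optionally only 'R' or 'W'."""
    for line in lines:
        rec = parse_demmio_line(line)
        if rec and rec["addr"] == addr and direction in (None, rec["dir"]):
            yield rec


sample = ("[0] 34.503283 MMIO32 R 0x020130 0x81071400 "
          "PTHERM.SENSOR_CALIB_0[0x3] => { 0x81071400 }")
rec = parse_demmio_line(sample)
print(rec["dir"], hex(rec["addr"]), hex(rec["value"]), rec["name"])
```

Filtering with `direction="W"` would separate writes (such as weight programming) from plain reads like the one quoted.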
21:01 mupuf: Lyude: doesn't it come straight from the fuses?
21:02 mupuf: pretty sure it did
21:02 Lyude: mupuf: pardon?
21:02 mupuf: PTHERM.SENSOR_CALIB_0 --> the value is directly sourced from the fuses IIRC
21:02 mupuf: this is factory-calibrated
21:03 Lyude: ah, was curious if it had any relation to the 200f8 values
21:03 Lyude: they're definitely not an exact representation of the wattage, so I don't think that register has calibrated values
21:04 Lyude: https://paste.fedoraproject.org/paste/Nv9ehASUTtuD~nAT1ghYgQ keep in mind: it's very likely some of the wattages reported there by nvidia-smi are not in sync with the actual register values being displayed since they update at different intervals
21:06 mupuf: Lyude: point clouds would show you a better trend ;)
21:08 Lyude: aaaa, i've never heard of those before lol. looking at ddg, but mind giving some examples?
21:09 mupuf: they are usually called scatter plots
21:09 Lyude: oh! i know what that is
21:09 mupuf: https://www.mathsisfun.com/data/scatter-xy-plots.html
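A numeric companion to a scatter plot is the Pearson correlation between the register samples and the watts reported by nvidia-smi. The sketch below is only an illustration: the sample values are invented to exercise the function and are not taken from the paste, and since the two sources update at different intervals, real samples would need to be aligned in time first.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Placeholder values, NOT measurements: invented only to show the call.
reg_values = [10, 20, 30, 40, 50]
watts = [1.0, 2.1, 2.9, 4.2, 5.0]

r = pearson(reg_values, watts)
print("correlation:", round(r, 3))
```

A coefficient near 1 would suggest the register tracks the reported power linearly; a weak coefficient would support the view that nvidia-smi gets its numbers elsewhere.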
21:10 Lyude: hm, before I continue wasn't there some trick you used to control the load% on the nvidia driver?
21:10 mupuf: Lyude: no, I always faked a load :s
21:11 mupuf: but that was to RE the reclocking policy
21:11 Lyude: ahh, so yeah that wouldn't do much for power consumption
21:11 mupuf: indeed
21:11 mupuf: but honestly, I think you are quite likely not going the right way
21:11 Lyude: oh?
21:11 mupuf: they introduced this in 2006
21:11 mupuf: and no usage until maxwell?
21:12 mupuf: pretty sure they have this as a backup, in case some GPUs are shipped and some new loads exceed the expected power budget
21:12 mupuf: this cannot be accurate because it does not take into account the voltage
21:13 mupuf: the weights would have to be calibrated for the highest voltage
21:15 Lyude: So do you think it's likely that nvidia-smi is getting this information some other way? or are you just getting at that we need to figure out how to calibrate the weights for this in order to get an accurate reading
21:16 mupuf: Lyude: pretty sure they have a real power meter
21:16 mupuf: I could check on my maxwell 1
21:16 mupuf: sweet, it is already plugged
21:20 mupuf: fun. I do not remember nvidia ever exposing power consumption on this gpu before
21:20 mupuf: oh well, it is there now
21:20 mupuf: so let's have a look!
21:21 Lyude: yeah I don't think they used to
21:22 mupuf: Power Draw : 1.67 W --> not bad for an idling discrete GPU
21:22 Lyude: right? it's even lower here
21:22 Lyude: 0.87W
21:22 Lyude: man page says that value is ±5W though
21:22 mupuf: 1W now that I started X
21:28 karolherbst: mupuf: on GPUs without power meters the reported power consumption isn't really reliable afaik
21:28 karolherbst: I might be wrong though
21:28 Lyude: i mean, if you're not wrong though then that would explain why we're seeing reads on that register when playing with nvidia-smi
21:29 mupuf: Lyude: why would it wait on nvidia-smi?
21:29 mupuf: there is a power budget, this should be polled every second at least
21:29 Lyude: ah, I did not realize that last part :s
21:30 mupuf: well, time to edit some of the weights and see if we get a different value
21:30 Lyude: where are the weights btw?
21:32 mupuf: a bit everywhere
21:32 mupuf: but the last levels are in ptherm
21:33 mupuf: I never documented that, because I wanted to actually find which one was which engine
21:33 mupuf: but... time got the best of me
21:33 mupuf: the better is the enemy of good!
21:34 imirkin_: and time always wins
21:35 mupuf: that too :p
21:35 mupuf: Lyude: hmm, the update rate got drastically lowered. It used to be 1kHz
21:35 mupuf: now, it is 33 Hz
21:37 karolherbst: mupuf: maybe they increase it under load?
21:37 mupuf: karolherbst: I doubt it. It still is fast-enough
21:37 karolherbst: yeah, right
21:38 karolherbst: well
21:38 karolherbst: worth checking though
21:38 mupuf: oopsie, I fuzzed at the wrong location O:-)
21:38 karolherbst: or that
21:39 karolherbst: mupuf: I still want to figure out the mystery why my GPU was throttled with nouveau for going over the battery power budget
21:39 mupuf: karolherbst: yeah, that may indicate they fixed this issue :o
21:39 mupuf: but this must involve the pmu if they did this
21:39 karolherbst: what issue?
21:40 karolherbst: you mean throttling on battery?
21:40 mupuf: no, computing the power consumption
21:40 karolherbst: ahh
21:40 mupuf: no way they do everything in HW
21:40 karolherbst: or maybe not if they document +-5
21:40 mupuf: karolherbst: where did you see that?
21:40 karolherbst: "<Lyude> man page says that value is ±5W though"
21:40 Lyude: yep
21:41 Lyude: man nvidia-smi
21:41 mupuf: karolherbst: well, for the highest GPUs, this would be expected
21:41 karolherbst: 1W is too low
21:41 karolherbst: way too low
21:41 mupuf: you have a point
21:41 Lyude: yeah, I was thinking that...
21:41 karolherbst: I would expect around 10W for the maxwell1
21:41 karolherbst: as the lowest
21:41 Lyude: but then why even offer the feature on nvidia-smi if it's that inaccurate
21:42 karolherbst: or maybe even 6 or 7W
21:42 karolherbst: it is around 60W TDP
21:42 karolherbst: soo
21:42 Lyude: ~6W sounds like it would be in that margin of error
21:42 karolherbst: yeah
21:42 karolherbst: Lyude: why not?
21:42 karolherbst: values under load are kind of accurate enough
21:42 karolherbst: nobody cares about idle power consumption really
21:42 karolherbst: and if you care, you measure differently
21:42 Lyude: true
21:43 mupuf: 200f0's bit 20 is the enable for the power measurement
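Toggling that bit is a plain read-modify-write on the 32-bit register value. A minimal sketch of the bit arithmetic only, assuming nothing beyond what mupuf states (offset 0x200f0, bit 20 as the enable); the constant and function names are hypothetical, and the code does not touch hardware, so the actual register read and write would go through whatever MMIO tool is in use.

```python
POWER_CTRL_REG = 0x200f0      # register offset quoted in the chat
POWER_ENABLE_BIT = 1 << 20    # "bit 20 is the enable", per the chat
REG_MASK = 0xFFFFFFFF         # registers are 32 bits wide


def power_measurement_enabled(value):
    """Return True if bit 20 is set in the given register value."""
    return bool(value & POWER_ENABLE_BIT)


def with_power_measurement(value, enabled):
    """Return the register value with bit 20 set or cleared, other bits kept."""
    if enabled:
        return (value | POWER_ENABLE_BIT) & REG_MASK
    return value & ~POWER_ENABLE_BIT & REG_MASK


# Example with an arbitrary starting value (not read from real hardware).
old = 0x00100003
new = with_power_measurement(old, False)
print(hex(POWER_CTRL_REG), hex(old), "->", hex(new))
```

Keeping the other bits untouched matters here, since the rest of the register layout is not described in the discussion.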
21:43 mupuf: karolherbst: try disabling that on your laptop ;)
21:43 karolherbst: mhh
21:44 karolherbst: I am wondering still how the throttling happens
21:44 mupuf: karolherbst: exactly, idle power usage should be nuts
21:44 mupuf: well, you'll see if this is that
21:44 mupuf: because if it is, then it is already all documented
21:44 karolherbst: I meant why does it get throttled at all
21:44 karolherbst: or how
21:44 karolherbst: the clocks don't get lowered
21:44 karolherbst: or maybe they get somewhat
21:44 karolherbst: or somehow
21:45 mupuf: oops, I crashed the computer :D
21:45 karolherbst: duh
21:45 Lyude: btw mupuf, mind if I get some of the register offsets for those weights so I can play around with them as well?
21:46 mupuf: Lyude: you start by having fun with 200f0-4
21:46 * mupuf is trying to find the addresses again
21:49 mupuf: hmm, blocking the update of the power consumption monitoring still yields updates on nvidia-smi
21:50 mupuf: sooooooo.... there is another way
21:51 mupuf: oops, I crashed it again!
21:52 Lyude: same
21:54 mupuf: well, let's see if disabling the PMU stops updating the power meter
22:03 mupuf: Lyude: I agree though, this is suspicious that nvidia would be reading this value at all
22:04 Lyude: mm
22:05 Lyude: it's definitely not reading it directly; or it's doing something else with it. playing with the weights changes the register value, but not the smi output
22:05 mupuf: yeah, so I doubt this is the primary source
22:06 mupuf: I am trying to find the register that allows the host to control the i2c lines
22:12 Lyude: so, do we have any reason why we wouldn't be seeing any more suspicious looking reads in demmio from this?
22:30 mupuf: Lyude: g80_pnvio_i2c_bitbang --> this is where the selection is made to know who controls the i2c bus
22:32 mupuf: Lyude: every 2 seconds, the blob reads the i2c bus 0
22:45 mupuf: I would likely investigate that more
22:45 mupuf: now, time to sleep