09:49 RSpliet: karolherbst: on what level?
09:53 RSpliet: mangix: for reference, the "inline" keyword hints to GCC that it should copy-paste the function into the caller rather than doing the "create stack frame, store registers, perform branch, store more registers, do work, clean up stack frame, branch back" ABI dance. For functions that are only called in one or two places, inlining is expected to reduce the binary size.
09:55 RSpliet: GCC can decide to in-line functions even without the keyword, but I'd expect that to be quite conservative because, as it processes your project source-file by source-file, it has very little knowledge of how often a function is called within the entire project.
09:59 pq: RSpliet, I suppose that's true for non-static functions.
10:00 RSpliet: pq: valid point! Also, I wonder how much can be done with link-time opt
10:23 karolherbst: imirkin: mhh okay, but I think I found the issue already. Because the ld from the unspill selects the register, not the split values, live values aren't really considered anymore :/
10:29 karolherbst: or something like that, need to investigate a bit deeper
16:26 sooda: mupuf: i watched the xdc lightning video on nouveau status - sad :( there's definitely some disconnect inside nvidia too, i guess the people telling y'all that "no we're not telling you that" are never considering what happens in nvgpu or generally in tegra at all :/
16:30 imirkin_: rather typical of a large organization with lots of people with different agendas
16:32 sooda: i don't even know where those questions go or who in the dgpu world answers them :P
16:40 mupuf: sooda: yes, that was why we wrote it down. we never talked about nvgpu before, and it was time we brought it up
16:41 sooda: that might be a good point to note for some people inside nvgpu (and present at xdc)
16:44 sooda: ugh i mean inside nvidia
17:02 RSpliet: sooda: briefly there was me hoping that nvgpu was nothing but a casing around a set of elves passing messages back and forth between hw and userspace
17:06 robclark: imirkin (or anyone), quick reminder what kernel cmdline to get some useful debug out of nouveau? (since iirc nouveau has its own thing beyond drm.debug)
17:07 RSpliet: robclark: nouveau.debug="<engine>=<level>". There's a wiki page that should help you out
17:07 robclark: hmm
17:07 RSpliet: https://nouveau.freedesktop.org/wiki/KernelModuleParameters/
17:08 RSpliet: see the "debug" param
17:09 robclark: so if what you see on /dev/fb0 (at console) != /dev/fb0 ... I guess that is PDISP?
17:09 mupuf: sooda: Apparently, it even made the Finnish news: https://www.mikrobitti.fi/2017/09/kova-syytos-nvidia-sabotoi-avoimen-geforce-laiteajurin-kehitysta/
17:09 mlankhorst: looks like some interesting topics
17:10 mupuf: sooda: very strong wording though :s
17:11 RSpliet: robclark: could you rephrase that please? I'm afraid I don't quite get what you're after
17:16 robclark: RSpliet, from ioctls, and drm.debug prints, it finds display, reads edid, sets mode, does some PUSHBUF, etc... but what we see on screen is still the last thing from before efifb was kicked out
17:16 robclark: so there is some disconnect between what nouveau thinks it is scanning out and what hw is doing
17:16 robclark: we are trying to debug this
17:17 robclark: (fwiw, aarch64 hw.. but option rom / efi gop is working fine)
17:25 imirkin_: robclark: drm.debug=0x1e nouveau.debug=trace
17:25 imirkin_: that should maximize things
17:25 imirkin_: unfortunately that wiki is not accurate wrt reality
17:25 imirkin_: since all the engines got renamed
17:26 imirkin_: robclark: separately, if you're loading nouveau after blob was loaded, nouveau can't cause anything to display
17:26 imirkin_: they're doing something we're not undoing properly
17:26 imirkin_: and so the old fb just sticks
17:26 pmoreau: mupuf: \o/ I do understand “sabotoi” and “dramaattisen”! I can speak Finnish! \o/
17:27 imirkin_: robclark: perhaps you can have a dmesg you can share? i might see something obvious.
17:27 mupuf: pmoreau: hehe
17:30 robclark: imirkin, so blob linux driver not loaded.. but I guess option rom is also a sort of blob (but I guess that is same blob that you have w/ nouveau on x86)
17:31 robclark: imirkin, I'll try to get full dmesg asap.. I don't have the setup myself, helping leiflindholm (and a bit chaotic w/ running around and setting things up atm)
17:32 imirkin_: robclark: i haven't heard of issues transitioning from efifb to nouveaufb
17:32 imirkin_: only if there's a uvesafb involved
17:33 robclark: I think it was efifb..
17:34 robclark: the strange thing, the list of modes from edid looked fairly legit, so it looks like everything was working except scanout was just from wrong place
17:37 imirkin_: well, dmesg will answer a lot of questions all at once
17:38 imirkin_: but i can keep asking piecemeal...
17:38 imirkin_: and it tends to reveal little details like "oh, forgot to mention this is running on a VAX with a PCIe bus I soldered into it..."
17:39 imirkin_: "didn't think that'd be relevant"
17:48 robclark: imirkin, heheh, at least no VAX.. ;-)
17:48 robclark: one sec, should have dmesg
17:48 robclark: imirkin, https://gist.github.com/ardbiesheuvel/6f69c25ed7cf56098b53bdfd0083dd7a
17:49 imirkin_: POWER?
17:49 imirkin_: oh no... aargh64
17:50 imirkin_: kernel cmdline seems cut off...
17:50 robclark: aargh64.. it's enterpriszy
17:51 imirkin_: and i assume that the ahci oops is expected?
17:52 robclark: that is a different issue.. so "expected"
17:52 robclark: (and I think someone else is debugging that.. but I guess ahci is unrelated to nouveau)
17:52 imirkin_: and i assume that you see nothing on screen after this:
17:52 imirkin_: [ 23.457022] fb: switching to nouveaufb from EFI VGA
17:53 imirkin_: so wait, how do you have EFI on here? option rom efi thing is written in x86 no?
17:54 robclark: imirkin, right, we see old efifb contents on screen after that point
17:54 robclark: imirkin, good question, I'd have to ask.. it might be "special"..
17:55 imirkin_: (i mean, EFI is obviously not dependent on video, but how does the efifb thing work?)
17:55 mjg59: Some vendors added an x86 emulator, I believe
17:55 mjg59: The original plan was that option ROMs would be in EFI Bytecode, but basically nobody ever implemented that
17:55 imirkin_: of course they did. *that*'s what any good bios lacks... an x86 emulator.
17:55 mjg59: I don't know whether nvidia ship an ARM build of their driver
17:55 mjg59: (EFI driver, not OS driver)
17:57 leiflindholm: not heard of any vendor shipping an emulator, but ardb and agraf put together a hack that happens to work well with the card (and option rom) in question
17:58 imirkin_: leiflindholm: would it be possible to try booting with any of that and just not have fb until nouveau loads?
17:59 robclark: s/with/without/ ?
17:59 imirkin_: yes :)
17:59 leiflindholm: that works less well
17:59 imirkin_: iirc someone was playing with nouveau on an arm board
17:59 imirkin_: (with a regular pcie device)
17:59 imirkin_: unfortunately i don't remember who
18:00 leiflindholm: someone mentioned to me many recent nvidia cards don't run well without the option ROM having executed (at least with nouveau)
18:01 leiflindholm: and without it, there seemed to be timers that had not been enabled, causing obvious failures
18:04 imirkin_: the thing i've heard is that running the option rom a second time often results in fail
18:04 imirkin_: nouveau runs through the vbios init if it doesn't detect that the board was initialized
18:04 imirkin_: you can also force it with nouveau.config=NvForcePost=1
18:05 leiflindholm: right ... but the emulation won't actually work once we're in kernel context
18:05 imirkin_: emulation?
18:05 imirkin_: nouveau has a vbios executor thing
18:05 imirkin_: it reads the tables, does what they say
18:05 leiflindholm: how we execute the x86 option rom on aarch64
18:05 leiflindholm: oh, no execution?
18:06 imirkin_: interpreter would be a better name, i suppose
18:06 imirkin_: there's like 100 diff opcodes
18:06 imirkin_: which do various things
18:06 imirkin_: i'm more wondering if something silly like mmio is somehow broken or different
18:06 leiflindholm: ah, ok, that sounds useful to test
18:06 imirkin_: perhaps some cpu cache thing needs flushing
18:07 imirkin_: perhaps tagr would have some ideas
18:08 leiflindholm: x86 is certainly more forgiving with regards to cache maintenance.
18:08 leiflindholm: and memory access ordering in general
18:08 imirkin_: it may be hard to tell, but do you know if a modeset is occurring?
18:08 imirkin_: there'd be a flash of some sort
18:09 leiflindholm: there is something like that
18:09 leiflindholm: robclark: ?
18:10 robclark: ok, NvForcePost=1 seems to help
18:10 robclark: fwiw, I don't think we were seeing a flash, but it looked like it thought it was setting a mode
18:11 robclark: so this time we see stuff on monitor longer, and then at some point it lost signal..
18:11 imirkin_: interesting.
18:11 sooda: mupuf: wow, great news @ bitti ":D"
18:11 imirkin_: but it still never updates?
18:11 imirkin_: or it does?
18:11 robclark: fyi, https://hastebin.com/raw/nezoqiqaba
18:12 robclark: imirkin, so we had kernel msgs scrolling.. let me see if I can trace down where signal was lost
18:12 imirkin_: looks like it does 2 modesets...
18:12 imirkin_: might be an atomic side-effect
18:13 sooda: "to the nvgpu device driver, which is intended for Android." well that's not entirely true, nvgpu is a part of l4t and all those automotive things too
18:13 imirkin_: robclark: anyways, you have someone in-house who knows all this stuff much better than me
18:14 robclark: I'll keep poking away.. thx for help so far, it would have taken a while to find NvForcePost=1 thing
18:14 imirkin_: when in doubt, i blame these things on stuff i don't understand
18:14 imirkin_: iommu, pci mmio, etc
18:14 leiflindholm: imirkin_: much obliged
18:15 imirkin_: there's a good list of various stuff on that KernelModuleParameters page
18:15 imirkin_: the engine/subdev names are outdated
18:15 imirkin_: but other than that it's pretty accurate
18:22 robclark: hmm, so the point where we lose the display is still about the point of "fb: switching to nouveaufb from EFI VGA".. oddly it is much later than without NvForcePost=1
18:23 robclark: (but I guess I need to wait for someone w/ login creds on this machine to poke around more)
18:26 imirkin_: well - running the vbios takes time
18:27 mupuf: sooda: yes, I guess they concentrated on what is understandable for most people
18:30 sooda: likely. also, there's a _lot_ of context to all this so those things are trivial to misunderstand :P
18:31 mupuf: sooda: quite probably
18:31 sooda: i still don't get why those guys aren't cooperating...
18:31 mupuf: but that does not alleviate the problem at hand... we are blocked...
18:33 imirkin_: mupuf: one of us needs to just fucking extract the blobs and move on with life
18:33 imirkin_: unfortunately it's not so easy
18:34 imirkin_: i.e. previous methods don't work so a new one needs to be developed
18:34 imirkin_: and no one's done this
18:34 imirkin_: if someone can get me the literal data files, i can find them in the distributed blob
18:34 imirkin_: [maybe]
18:35 imirkin_: that's the only way this thing is going to move forward
18:39 mupuf: imirkin_: yes, that would be the second-best solution
18:40 mupuf: the first being having them redistribute the firmware or us being able to do it for them
18:40 sooda: (nvgpu supports some dgpus too - i haven't looked if the fw it uses is anywhere)
18:41 robclark: imirkin, hmm, yeah, ok, so maybe NvForcePost=1 just delays the fail.. although it does cause the monitor to lose signal (instead of just continuing to see efifb scanout buffer)
18:55 imirkin_: mupuf: yeah, but that's a losing proposition. they're just not going to do it.
18:55 imirkin_: or they'll only do it when we start extracting their buggy firmware and doing exploits in it
18:56 pmoreau: Can we use the accel fw to figure out how to get those from the binary, and hopefully the method can be reused for PMU fw?
18:56 imirkin_: pmoreau: no - i think it's just in the blob embedded now
18:56 imirkin_: so ... no headers no nothing
18:56 pmoreau: :-/
18:56 imirkin_: and yes, we could use the accel fw if we had it
18:56 imirkin_: but we don't
18:56 imirkin_: we have what they gave us, but i don't think it's the same thing
18:57 pmoreau: Ah, I was thinking it would be the same
18:57 pmoreau: But maybe you are right, and they use a different one
18:57 imirkin_: it could be the same, but i'm not wasting my time on a "maybe"
18:57 imirkin_: we need to have a reliable way of extracting this stuff
18:58 mupuf: Right
18:59 imirkin_: unfortunately i have like a negative amount of time on my hands now
19:04 mupuf: Yeah... If I were to look for the fw in the blob, grepping for a sizable part of the fw is enough?
19:06 imirkin_: unless it's compressed without the headers
19:06 imirkin_: and there are a few compression formats to choose from
19:07 mupuf: Right, that would make it tough to find...
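A minimal sketch of the search discussed above, assuming the firmware is stored uncompressed and byte-for-byte inside the blob. The function name `find_firmware` and the chunk size are hypothetical, chosen only for illustration. If the firmware is compressed (with or without headers), this plain search finds nothing, which is exactly the caveat raised above.

```python
def find_firmware(blob: bytes, firmware: bytes, chunk_len: int = 64) -> int:
    """Return the estimated blob offset where `firmware` starts, or -1.

    Assumption: the firmware appears verbatim (uncompressed) in the blob.
    """
    # Take a sizable slice from the middle of the firmware, so that a
    # differing header or trailer does not prevent a match.
    start = max(0, (len(firmware) - chunk_len) // 2)
    needle = firmware[start:start + chunk_len]
    hit = blob.find(needle)
    if hit < start:
        # Not found (hit == -1), or found too early to be a full copy.
        return -1
    # Translate the offset of the slice back to the start of the firmware.
    return hit - start
```

A match only says that one slice is present; comparing the full firmware length at the returned offset would be the natural follow-up check.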
19:57 pmoreau: Interesting: I have a compute kernel that runs fine on my GK107, but fails on my GM206. Disabling all optimisations makes it succeed on the GM206. The kernel does nothing fancy besides using shared memory.
19:57 pmoreau: Need to figure out which optimisation pass breaks it.
20:03 pmoreau: Ah, it’s the load propagation
20:06 imirkin_: wtf?
20:06 imirkin_: are we mis-emitting some opcode?
20:06 imirkin_: i wouldn't be surprised about maxwell having some issues on the edges
20:07 pmoreau: I’m looking at it currently, trying to get the diff before and after. Will gather some debug output as well :-)
20:07 tobijk: imirkin_: i highly suspect that actually
20:08 imirkin_: pmoreau: don't bother with that
20:08 imirkin_: just grab the "bad" generated code and pipe it through nvdisasm and compare
20:08 imirkin_: (to the nv50ir)
20:09 pmoreau: Ah, true, I could do that :-)
20:21 pmoreau: imirkin_: Hum… how do you give it the data? I tried writing the binary in a file as binary: caused nvdisasm to segfault, tried in a file as text: segfaulted as well… :-/
20:30 imirkin_: yeah
20:30 imirkin_: um
20:30 imirkin_: here's what i do
20:30 imirkin_: perl -ane 'foreach (@F) { print pack "I", hex($_) }' >> tt; nvdisasm -b SM50 tt
20:30 imirkin_: and then paste the hex dwords and hit ^D
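For readers who would rather not use Perl, here is a rough Python equivalent of the one-liner above: it turns whitespace-separated hex dwords into raw 32-bit words and appends them to a file. The helper names `dwords_to_bytes` and `append_dwords` are hypothetical, and the sketch assumes Perl's `pack "I"` yields a 32-bit native-endian word, which holds on common platforms.

```python
import struct


def dwords_to_bytes(text: str) -> bytes:
    """Pack whitespace-separated hex dwords into raw 32-bit words."""
    # "=I" is an unsigned 32-bit integer in native byte order.
    # int(tok, 16) accepts tokens with or without a "0x" prefix.
    return b"".join(struct.pack("=I", int(tok, 16)) for tok in text.split())


def append_dwords(path: str, text: str) -> None:
    """Append the packed dwords to `path`, like `>> tt` in the shell."""
    with open(path, "ab") as out:
        out.write(dwords_to_bytes(text))
```

After something like `append_dwords("tt", pasted_hex)`, the file can be handed to `nvdisasm -b SM50 tt` as in the original command.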
20:31 imirkin_: [i've typed that soo many times, still don't have an alias]
20:31 pmoreau: :-D
20:32 pmoreau: Definitely going to put that as an alias!
20:34 pmoreau: Perfect, that works! :-) Thanks!
20:39 imirkin_: np
20:43 imirkin_: did you find the issue?
20:43 imirkin_: if you're having a hard time, pastebin nvir and the disasm
20:44 imirkin_: i've had a lot of experience tracking that junx down
20:44 pmoreau: Ahem, are `MOV32I, R2, 0x3; IADD R2, R2, -R8;` and `IADD32I R2, -R8, 0x10000003;` really equivalent? The first one is definitely `0x3 - R8`, but the second one seems to be `-R8 - 0x3` isn’t it?
20:44 imirkin_: [sadly]
20:44 imirkin_: no, that's probably it
20:45 imirkin_: what does it say in the nv ir?
20:45 imirkin_: it probably tries to make the imm negative and fails by 3 bits
20:45 imirkin_: ;)
20:45 pmoreau: sub u32 $r2 $r2 $r8
20:45 imirkin_: er what?
20:45 pmoreau: Except the imm should not be negative
20:45 imirkin_: can i see the nvir and the nvdisasm?
20:45 imirkin_: sounds like totally different ops
20:45 pmoreau: The expected behaviour should be `R2 = 0x3 - R8`
20:46 pmoreau: Sure
20:46 pmoreau: Oh wait, sorry, I gave the NVIR for the succeeding case
20:46 imirkin_: where did `MOV32I, R2, 0x3; IADD R2, R2, -R8` come from?
20:46 pmoreau: sub u32 $r2 neg $r2 neg 0x00000003 is the failing one
20:47 imirkin_: so like i said ;)
20:47 pmoreau: Will do ;-)
20:47 imirkin_: oh fun. double-negative.
20:47 imirkin_: probably not handled.
20:47 pmoreau: `MOV32I, R2, 0x3; IADD R2, R2, -R8` comes from NV50_PROG_OPTIMIZE=1
20:47 pmoreau: err = 0
20:48 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp#n1731
20:48 imirkin_: ok, so that's just totally broken
20:48 pmoreau: Oh, everything is good then, nothing really important :-D
20:48 imirkin_: let me know if you need help fixing it, but the OP_SUB case will require a lot of motion.
20:49 pmoreau: I will give it a try
20:56 pmoreau: Ah, ok, I see the issue with the double negation in the IR! There were too many negations, I wasn’t misunderstanding the IR. :-D
20:56 imirkin_: note that immediates sometimes have their own neg bits
20:57 pmoreau: Eh, why not :-D
21:12 imirkin_: in addition to the high bit being in a weird place
21:12 imirkin_: nvdisasm renders it as 0x12345.NEG
21:13 pmoreau: Right
21:13 imirkin_: but really look at what the other emitters do
21:13 imirkin_: and do the same thing.
21:14 pmoreau: emitUADD from NVC0 should be the equivalent of emitIADD on GM107 I guess?
21:15 pmoreau: Given the switch statement, looks like it
21:15 imirkin_: yes.
21:22 skeggsb: karolherbst: https://paste.fedoraproject.org/paste/gzlaoyx-g4yaTFlm~~Z9SQ
21:23 karolherbst: skeggsb: nice, thanks
21:24 karolherbst: ohh, I see, okay
21:26 karolherbst: by the way, GSoC 2018 was announced, we could try to get people on board early, so that nothing unexpected happens later on
21:35 karolherbst: skeggsb: I see you pushed that mmu flush backport as well, maybe I am able to verify that it helps in certain situations
22:33 imirkin_: wow, grate is still going apparently
22:34 imirkin_: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver
22:34 imirkin_: and a userspace vdpau library
22:37 pmoreau: imirkin_: Could the emitter ever receive something like `iadd32 %r0 neg %r0 neg %r1`?
22:37 imirkin_: no.
22:38 pmoreau: Ok, good.
22:38 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp#n452
22:41 pmoreau: I see
22:49 pmoreau: Ok, I think I got it.
22:50 pmoreau: I just apply the neg modified to the immediate, if there is one. And that’s the only modification I need to do.
22:50 pmoreau: *modifier
22:51 imirkin_: do what nvc0/gk110 emitters do.
22:51 imirkin_: don't try to get creative.
22:54 pmoreau: I am just applying the neg to the immediate, because that was not the case before.
22:56 imirkin_: look at what the nvc0/gk110 emitters do, and do the same thing :)
22:56 imirkin_: [i haven't looked at them]
23:24 pmoreau: imirkin_: https://hastebin.com/owiveviyiv.php And no, it’s not completely the same thing as nvc0/gk110, because they have a different way to generate the insn. :-)
23:25 imirkin_: sounds plausible.
23:25 imirkin_: i think one normally does
23:26 imirkin_: emitField(0x..., 1, the-value)
23:26 imirkin_: rather than touching code[] directly
23:26 imirkin_: oh, but the ^ is a bit tricky. right.
23:26 pmoreau: I’ll run the patch on piglit/shaderdb/whatever before sending it to the ML, but that will wait for tomorrow.
23:26 imirkin_: cool, thanks
23:26 pmoreau: True, but I could change it for the first one at least, would be clearer
23:26 imirkin_: that definitely *looks* right.
23:26 imirkin_: or at least ... not wrong.
23:27 pmoreau: ;-)
23:27 imirkin_: although ... just flipping the top bit of a 32-bit imm won't negate it
23:27 imirkin_: this is integers, not floats
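A small sketch of the point above: flipping the top bit negates an IEEE-754 float, but a 32-bit two's-complement integer is negated by taking its value modulo 2**32. The helper names are hypothetical and this only illustrates the arithmetic, not what the emitter does.

```python
import struct

MASK32 = 0xFFFFFFFF


def flip_top_bit(value: int) -> int:
    """Toggle bit 31 of a 32-bit word."""
    return (value ^ 0x80000000) & MASK32


def negate_u32(value: int) -> int:
    """Two's-complement negation modulo 2**32."""
    return (-value) & MASK32


def float_to_bits(x: float) -> int:
    """Reinterpret a 32-bit float as its raw bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For the immediate 3, `flip_top_bit(3)` gives `0x80000003` while `negate_u32(3)` gives `0xFFFFFFFD`, so the two only coincide for floats.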
23:28 imirkin_: i wonder if emitIMMD needs modifying
23:28 imirkin_: to take an extra neg.
23:28 pmoreau: Ah, that is true
23:28 imirkin_: i also wonder if there's a separate neg bit you can just flip.
23:28 imirkin_: it's usually very high up
23:29 pmoreau: I haven’t been able to find one, but I haven’t tested every position yet.
23:29 imirkin_: maybe 0x37? or 0x39?
23:30 imirkin_: could be as high as 0x3f :)
23:30 imirkin_: (seriously. it's in crazy spots sometimes.)
23:30 pmoreau: Oh, ok, I was thinking that past 0x38, it was just the opcode, and no potential flags up there
23:33 pmoreau: Do you know what iadd32i.p0 could be?
23:33 imirkin_: huh?
23:33 imirkin_: could be for a carry bit
23:34 imirkin_: oh
23:34 imirkin_: PO
23:34 imirkin_: not P0
23:34 pmoreau: Right
23:34 imirkin_: "plus one"
23:34 pmoreau: My bad
23:34 imirkin_: that's when both "neg" bits are set
23:34 imirkin_: (which is why you can't have both neg bits set)
23:35 pmoreau: If using the immediate form, this is with 0x37 set to 1, and 0x38 as well (neg bit for src0)
23:35 imirkin_: yeah, but both neg bits can't be set.
23:36 imirkin_: ok, so 0x38 is the neg bit.
23:36 imirkin_: what op are you trying to emit?
23:36 pmoreau: For source 0 yes, not source 1
23:36 imirkin_: i.e. what's the nv50 ir
23:37 pmoreau: sub u32 $r0 neg $r0 neg 0x00000003
23:40 imirkin_: and so the neg should cancel out with the sub.
23:40 imirkin_: and so you should only end up with one neg set.
23:40 imirkin_: (right?)
23:41 pmoreau: Right
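A sketch of the cancellation agreed on above, assuming a `sub a, b` is lowered to `add a, -b`, so that the sub toggles the neg modifier on src1. The function names are hypothetical and do not come from the Mesa code generator. It checks that `sub u32 neg $r0 neg 0x3` and the lowered add both compute `0x3 - $r0`.

```python
MASK32 = 0xFFFFFFFF


def lower_sub_to_add(neg0: bool, neg1: bool):
    """Rewrite `sub a, b` as `add a, -b`: only the neg on src1 is toggled."""
    return neg0, not neg1


def eval_add(src0: int, src1: int, neg0: bool, neg1: bool) -> int:
    """Evaluate a 32-bit add with optional neg modifiers on each source."""
    a = -src0 if neg0 else src0
    b = -src1 if neg1 else src1
    return (a + b) & MASK32


def eval_sub(src0: int, src1: int, neg0: bool, neg1: bool) -> int:
    """Evaluate a 32-bit sub with optional neg modifiers on each source."""
    a = -src0 if neg0 else src0
    b = -src1 if neg1 else src1
    return (a - b) & MASK32
```

With both modifiers set on the sub, `lower_sub_to_add(True, True)` returns `(True, False)`: one neg survives, on src0, and the immediate stays positive.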
23:41 imirkin_: so 0x37 is the neg bit for src1
23:41 imirkin_: and 0x38 is the neg bit for src0
23:41 imirkin_: the only issue is that emitIMMD is too smart for its own good?
23:41 imirkin_: dunno
23:41 pmoreau: 0x37==1 && 0x38==0 is not a valid combination according to nvdisasm.
23:44 imirkin_: booo!
23:44 imirkin_: BOOO!
23:44 imirkin_: and 0x38 == 1 means src0.neg?
23:44 pmoreau: Yes
23:44 imirkin_: BOOOOOO!
23:44 imirkin_: i disagree.
23:44 imirkin_: nvdisasm is wrong :p
23:44 pmoreau: :-D
23:44 imirkin_: the hardware is wrong!
23:45 pmoreau: I get “nvdisasm error: Opclass 'IADD32I', undefined value 0x1 for table 'PSign32' at address 0x000000b0”
23:45 imirkin_: yeah wtvr
23:45 pmoreau: Maybe the hardware would not complain :-D
23:45 imirkin_: i mean, i'm sure it's right.
23:45 imirkin_: ehhh
23:46 imirkin_: i guess it's happened
23:46 imirkin_: but more likely that there's no neg there
23:46 imirkin_: makes no sense to have an immediate with a neg.
23:46 imirkin_: except laziness
23:46 imirkin_: which admittedly is a compelling use-case ;)
23:47 pmoreau: Depends where the laziness lies, if it’s on the compiler team side, or on the architects side
23:53 pmoreau: The hardware does not seem to be mad at me (if having 0x38==0 && 0x37==1), just a regular OOR_ADDR error.