15:26 oday: I'm still trying to make optimus passthrough work
15:26 oday: I've patched nv-acpi.c with a hardcoded vbios and gotten nvidia to load on a Linux vm
15:26 oday: Next step is, obviously, windows
15:27 oday: But first, I need to get acpi rom loading working on unpatched nvidia drivers, with a custom acpi table
15:52 oday: Cont.: jscinoz came up with https://github.com/jscinoz/optimus-vfio-docs/blob/master/asl/ssdt1.asl and it works on his system, but
15:52 oday: only for Linux guests, and it doesn't work at all on my system
15:53 oday: dmesg shows an infinite loop in FWRD when _ROM calls ROMG calls FWLD calls FWRD
15:54 oday: Well, it says ACPI Error: AE_AML_INFINITE_LOOP
15:55 oday: And the names of these functions, starting from _ROM to FWRD
15:55 oday: And the ssdt only has 2 loops in its entirety, so
15:56 oday: But there's really no conceivable reason for FWRD to infinite loop here
15:56 oday: I mean, the loop is literally just for (Local2 = 0; Local2 < Local0; Local2++) { Index(Local1, Local2) = DATA }
15:58 imirkin: Local0 = -1?
15:58 imirkin: er no, that wouldn't matter
15:59 oday: dmesg also shows: Arg0 is an integer = 19600 (hex), Local0 is also an integer = 19600 (hex), Local1 is a buffer of 103936 bytes which contains the correct ROM data (well, dmesg showed the first 8 bytes, which were all correct) and Local2 is also an integer = 10000 (hex)
16:00 oday: Link to the SSDT again: https://github.com/jscinoz/optimus-vfio-docs/blob/master/asl/ssdt1.asl
16:00 imirkin: (yeah, i'm not sufficiently interested to dig into it... but good luck)
16:01 oday: ty, no
16:01 oday: *np
16:02 oday: No is also accurate though, as I've had everything but good luck working on this issue
16:41 oday: Hmm, it went AE_AML_INFINITE_LOOP at exactly 0x10000 loops
16:41 oday: Since you're incrementing Local2 and Local2 was at 10000
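The numbers quoted from dmesg only line up if its integers are read as hexadecimal. A quick Python check (the variable names are ours, not from the SSDT):

```python
# Decoding the values quoted from dmesg, assuming (as the 0x10000 remark implies)
# that the ACPI debug output printed the integers in hexadecimal.
arg0 = 0x19600             # Arg0 / Local0: number of bytes the loop should copy
buffer_len = 103936        # Local1: buffer length, as quoted (decimal)
local2_at_abort = 0x10000  # Local2 when AE_AML_INFINITE_LOOP fired

assert arg0 == buffer_len  # the loop was asked to copy the whole image
print(f"bytes to copy: {arg0} ({arg0 / 1024} KiB)")
print(f"Local2 at abort: {local2_at_abort} ({local2_at_abort / arg0:.0%} of the copy)")
```

So the loop was not infinite; it was stopped after 65536 passes, roughly two thirds of the way through a ~101 KiB copy.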
16:41 imirkin: sounds like there's something about AML that you don't know
16:41 oday: Maybe there's a recursion limit, after which point it gives you that error?
16:42 oday: @imirkin: ?
16:42 imirkin: that's what always happens whenever i look at AML
16:42 oday: ^^^^^^^^^^^^^^^^^^
16:42 imirkin: it turns out to be some thing that goes counter to all programming concepts i'm familiar with
16:42 oday: Oh I found this
16:42 oday: https://www.acpica.org/node/131
16:43 karolherbst: yeah, aml is weird
16:43 oday: "Increased the maximum loop count value that will result in the AE_AML_INFINITE_LOOP exception. This is a mechanism that is intended to prevent infinite loops within the AML interpreter and thus the host OS kernel. The value is increased from 0xFFFF to 0xFFFFF loops (65,535 to 1,048,575)."
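To see why the copy died where it did, here is a small Python model of a byte-copy loop under an iteration cap. The exception class, function and dummy ROM are ours; in particular, when the cap is checked is an assumption chosen to match the observed Local2 = 0x10000, not a transcription of ACPICA's code.

```python
class AmlInfiniteLoop(Exception):
    """Stand-in for ACPICA's AE_AML_INFINITE_LOOP status (illustrative only)."""


def copy_with_loop_cap(src, length, max_iterations):
    """Byte-by-byte copy shaped like the FWRD While loop, with a per-loop cap.

    Assumption: the iteration count is checked after each pass through the
    body, and the loop aborts once the count exceeds max_iterations.
    """
    dst = bytearray(length)
    local2 = 0
    iterations = 0
    while local2 < length:
        dst[local2] = src[local2]  # Index(Local1, Local2) = DATA
        local2 += 1                # Local2++
        iterations += 1
        if iterations > max_iterations:
            raise AmlInfiniteLoop(f"aborted with Local2 = {local2:#x}")
    return dst


rom = bytes(i & 0xFF for i in range(0x19600))  # dummy 103936-byte "ROM"
results = {}
for limit in (0xFFFF, 0xFFFFF):  # limit before / after ACPICA 20160930
    try:
        copy_with_loop_cap(rom, len(rom), limit)
        results[limit] = "completed"
    except AmlInfiniteLoop as err:
        results[limit] = str(err)
    print(f"limit {limit:#x}: {results[limit]}")
```

Under the old 0xFFFF cap the 0x19600-byte copy stops with Local2 = 0x10000, matching the log; under the 0xFFFFF cap it finishes.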
16:44 karolherbst: some company paid big money for this
16:44 oday: This is version 20160930
16:44 karolherbst: they could just fix their broken fw instead
16:44 karolherbst: I don't even want to hear the story about this "fix"
16:46 oday: The code I'm having a problem with isn't from a company, though
16:46 karolherbst: right, don't do big loops in firmware
16:46 oday: Though, again, this fix did get added for some reason
16:48 oday: Ok, that shouldn't trigger a >FFFF exception
16:48 oday: Yay it worked
16:49 karolherbst: well, if you need to run a loop above 65k times to find an element
16:49 karolherbst: you are doing it wrong
16:49 karolherbst: *over
16:49 oday: I was trying to reimplement _ROM for a VM
16:50 oday: And it was supposed to read from qemu fw_cfg
16:50 karolherbst: right, just suggesting that there might be a much smarter way to do it
16:50 oday: Yeah, probably
16:51 oday: I'm just trying anything I can think of until something sticks and loads on windows
16:52 oday: Once I figure that out, should be trivial to fully rewrite it
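One possibly smarter shape, along the lines karolherbst hints at: as we read the ACPI specification, `_ROM` takes an offset (Arg0) and a length (Arg1) and returns at most 4 KiB per call, so no single call needs to copy the whole image. The Python model below is a hypothetical sketch of that contract (the function names are ours); in ASL, the slice could be done with the `Mid()` operator instead of an explicit byte loop. The caller is assumed to know the image size.

```python
ROM_CHUNK_MAX = 4096  # per-call cap on _ROM data (4 KiB), per our reading of the spec


def rom_method(image, offset, length):
    """Hypothetical _ROM-like accessor: up to 4 KiB of `image` from `offset`.

    Slicing stands in for ASL's Mid(); even a byte loop here would run at most
    4096 times per call, far below a 0xFFFF iteration cap.
    """
    length = min(length, ROM_CHUNK_MAX)
    return image[offset:offset + length]


def read_whole_rom(image):
    """Caller side: request consecutive 4 KiB windows until the image is read."""
    out = bytearray()
    offset = 0
    while offset < len(image):
        chunk = rom_method(image, offset, ROM_CHUNK_MAX)
        if not chunk:
            break
        out += chunk
        offset += len(chunk)
    return bytes(out)


image = bytes(i & 0xFF for i in range(103936))  # same size as the ROM in the log
assert read_whole_rom(image) == image
```

For a 103936-byte image this takes 26 calls, the last returning the remaining 1536 bytes.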
16:52 karolherbst: :)
16:52 oday: Thanks for the suggestion
17:03 pmoreau: Why are things so complicated --"
17:05 pmoreau: Trying to run an OpenCL program, only NVIDIA GPU available: Before uninstalling Intel’s beignet: https://hastebin.com/ucoloxusan.vbs , and after: https://hastebin.com/fasizakojo.js no other changes
17:10 imirkin: icd
17:10 imirkin: i wonder if it just dies on probe
17:25 karolherbst: pmoreau: beignet has a buggy ICD implementation
17:25 karolherbst: both sides
17:26 karolherbst: and I think there is even an evil buffer overflow on the client side
17:26 pmoreau: :-/
17:26 karolherbst: run it with valgrind :)
17:26 karolherbst: anyway, the system OpenCL implementation should be ocl-icd
17:26 pmoreau:spent way too much time trying to get NVIDIA prime to work today
17:26 karolherbst: or something else with a working ICD implementation
17:27 pmoreau: I have ocl-icd installed, but maybe there is an issue with my setup.
17:27 karolherbst: doubtful
17:27 karolherbst: an ICD server parses all /etc/OpenCL/vendors/*.icd files
17:27 karolherbst: and just dlopens those libraries
17:28 karolherbst: and calls the icd client functions
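A rough Python sketch of the loader behaviour karolherbst describes: read every `*.icd` file in `/etc/OpenCL/vendors` (each names one vendor library), `dlopen` the library, and look for the ICD entry point. The symbol check assumes Khronos-style loaders, which resolve `clGetExtensionFunctionAddress` and use it to fetch `clIcdGetPlatformIDsKHR`; this is a diagnostic, not ocl-icd's actual code.

```python
import ctypes
import glob
import os

# Default vendor directory used by Linux OpenCL ICD loaders such as ocl-icd.
VENDORS_DIR = "/etc/OpenCL/vendors"


def list_icds(vendors_dir=VENDORS_DIR):
    """Return (icd_file, library) pairs; each .icd file names one library."""
    entries = []
    # Sorted for reproducible output; real loaders may use directory order.
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path, encoding="utf-8") as f:
            library = f.readline().strip()
        if library:
            entries.append((os.path.basename(path), library))
    return entries


def probe(entries):
    """dlopen each vendor library and report whether it exposes the ICD hook."""
    for icd_file, library in entries:
        try:
            handle = ctypes.CDLL(library)
        except OSError as err:
            print(f"{icd_file}: cannot load {library}: {err}")
            continue
        # ctypes raises AttributeError for a missing symbol, so hasattr works.
        ok = hasattr(handle, "clGetExtensionFunctionAddress")
        print(f"{icd_file}: {library} exports clGetExtensionFunctionAddress: {ok}")


if __name__ == "__main__":
    probe(list_icds())
```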
17:28 karolherbst: pmoreau: install pocl and see if it still works
17:29 karolherbst: I think the issue I had with beignet as the system OpenCL impl was that it didn't even bother to search for other OCL implementations
17:31 pmoreau: It no longer does
17:32 karolherbst: as in it selects the wrong device or it crashes?
17:33 pmoreau: It prints https://hastebin.com/osuvelasim.vbs and does nothing
17:36 karolherbst: mhh, weird
17:36 karolherbst: maybe your setup is indeed broken :p
18:16 pmoreau: It looks like it comes from a bug in LLVM: https://bugs.llvm.org/show_bug.cgi?id=30587#c9
18:17 pmoreau: If each OpenCL implementation links against a different version of LLVM, it’s fine. If two or more link against the same version, it fails.
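Given pmoreau's finding, a quick way to spot the problematic combination is to check which vendor libraries dynamically link the same `libLLVM` shared object. This Python sketch parses `ldd` output; it only sees dynamic linking (a statically linked LLVM will not show up), and `ldd` needs a path, so bare sonames from `.icd` files would have to be resolved first. The helper names are ours.

```python
import re
import subprocess
from collections import defaultdict

# Matches ldd lines such as "libLLVM-5.0.so => /usr/lib/libLLVM-5.0.so (0x...)".
LLVM_RE = re.compile(r"^\s*(libLLVM\S*)\s+=>", re.MULTILINE)


def ldd(library_path):
    """Run ldd on a library path and return its stdout."""
    return subprocess.run(["ldd", library_path], capture_output=True, text=True).stdout


def llvm_sonames(ldd_output):
    """Set of libLLVM sonames listed in one ldd output."""
    return set(LLVM_RE.findall(ldd_output))


def shared_llvm(ldd_outputs):
    """Map LLVM soname -> libraries linking it, keeping only sonames shared by 2+."""
    groups = defaultdict(list)
    for lib, text in ldd_outputs.items():
        for soname in sorted(llvm_sonames(text)):
            groups[soname].append(lib)
    return {soname: libs for soname, libs in groups.items() if len(libs) > 1}


# Usage, with paths to the vendor libraries named in the .icd files:
#   print(shared_llvm({p: ldd(p) for p in paths}))
```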
18:46 karolherbst: ....
19:13 imirkin: hakzsam: what was the thing to enable bindless in feral games?
19:14 imirkin: i might just try to start DOW3
19:15 imirkin: GT 730 should be enough for that, i hope ;)
19:18 imirkin: aha. https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2017-06-24 has the answer.
20:09 imirkin: interesting. i appear to be ending up with multiple copies of the same handle in my resident list.
20:09 imirkin: that could cause all kinds of fail...
20:16 oday: Now that the nvidia driver loads and my acpi code is a fair bit cleaner, I'm trying to get some sort of graphical output working
20:16 oday: On a Linux VM, that is
20:17 oday: Can I use QXL for that?
20:17 oday: With the modesetting driver
20:18 imirkin: you're asking in the wrong chan
20:18 oday: Which one should I be asking?
20:19 imirkin: one that has something to do with operating the vm software you're using
20:19 oday: I see
20:19 oday: I'll try #vfio
20:20 oday: BTW I'm not trying to figure out how to get any graphical output working on the vm
20:20 oday: I'm trying to figure out how to get a graphical output from the passed-through nvidia gpu, which in itself has no outputs
20:21 imirkin: oh, that's easy then
20:21 imirkin: you can't :)
20:21 oday: ..?
20:21 oday: I mean it's an Optimus GPU
20:21 imirkin: ... ok
20:21 oday: So it's not directly connected to any outputs
20:22 imirkin: "i have this thing which has no graphical outputs. how do i get graphical output from it"
20:22 oday: But I can get graphical output from it on baremetal
20:22 oday: Oh that's what you mean
20:22 imirkin: either it has outputs connected to it, or it doesn't
20:22 imirkin: it's a pretty binary thing
20:22 imirkin: optimus vs not optimus is irrelevant
20:22 oday: I'm looking for some help on getting something like PRIME to work with qxl or some vgpu setup
20:23 imirkin: i see. so then you want some *other* video device which has a graphical output
20:23 imirkin: and do render offload onto the nvidia gpu
20:23 oday: Exactly
20:23 oday: Sorry for the miscommunication
20:23 imirkin: so then ... do that.
20:24 oday: Well, when I start up the vm with qxl and the nvidia gpu (and all xorg drivers installed), I get providers: 1 on xrandr
20:24 oday: And it refers to qxl
20:24 oday: So I assume I'd have to manually configure xorg
20:24 imirkin: are you using nouveau or blob?
20:24 oday: blob
20:25 imirkin: blob doesn't play well with offloading
20:25 imirkin: there are a number of guides, i think, that explain how it can be set up
20:25 imirkin: but it's nothing straightforward.
20:25 oday: I've gone through the one on the gentoo wiki
20:25 oday: Nothing but "no screens found"
20:26 oday: Xorg.0.log showed some pretty bizarre errors
20:26 oday: And sometimes none at all
20:26 imirkin: either way, i'm sure nothing relevant to nouveau.
20:27 oday: Yeah, so is there a channel in which it would be more appropriate to ask this question?
20:27 imirkin: dunno, some end user support thing that deals with blob stuff?
20:27 oday: LUL
20:28 imirkin: "thank you for choosing nvidia hardware. we appreciate that you have a choice in gpu vendors, and you appear to have chosen poorly."
20:28 oday: My gpu isn't a quadro
20:28 imirkin: and you didn't buy a million of them.
20:28 oday: Yep
20:29 oday: Don't think I could even ask on the official nvidia forums without getting:
20:29 oday: "This isn't possible on non-Quadro hardware" Thread Locked
20:30 oday: Which is, of course, bullshit, because literally the only difference is that there are some checks on the windows driver that make it fail to load if it detects it's running under a hypervisor
20:30 oday: These checks are very easily bypassed, by the way
20:33 imirkin: https://www.youtube.com/watch?v=A-7-5f3DwmY
20:44 Guest98: What would stop /sys/kernel/debug/vgaswitcheroo/switch from turning off the nouveau card on AC, when it does turn off on battery? Is there a way to force the card off while on AC when echoing to vgaswitcheroo doesn't work?
20:45 imirkin: fairly old hw right?
20:45 Guest98: older than maxwell, yes
20:45 imirkin: i was thinking older than like 2008
20:46 Guest98: Oh no, it is a mobile kepler card.
20:46 imirkin: ok, i wasn't aware that you could even do anything with vgaswitcheroo/switch on anything semi-modern
20:46 imirkin: i thought it was all runpm-controlled
20:47 imirkin: and vgaswitcheroo just reports the status
20:47 imirkin: some GPUs don't auto-suspend because they have a phantom VGA output situation
20:47 imirkin: grep . /sys/class/drm/card*-*/status
20:47 imirkin: (pastebin output of that)
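A Python equivalent of that `grep`, which also picks out VGA connectors, since a phantom VGA output (reported by the GPU but not actually wired out) can keep nouveau polling for a cable and prevent auto-suspend. The sysfs layout is the standard DRM one; singling out VGA is only a heuristic based on this conversation.

```python
import glob
import os


def connector_status(drm_dir="/sys/class/drm"):
    """Python take on `grep . /sys/class/drm/card*-*/status`."""
    status = {}
    for path in sorted(glob.glob(os.path.join(drm_dir, "card*-*", "status"))):
        name = os.path.basename(os.path.dirname(path))  # e.g. "card1-VGA-1"
        with open(path) as f:
            status[name] = f.read().strip()  # "connected", "disconnected", ...
    return status


def vga_connectors(status):
    """Connector names worth checking against the laptop's physical ports."""
    return [name for name in status if "-VGA-" in name]


if __name__ == "__main__":
    status = connector_status()
    for name, state in status.items():
        print(f"{name}: {state}")
    print("VGA connectors to double-check:", vga_connectors(status))
```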
20:49 Guest98: It was working properly in kernel version 4.9, but since my upgrade to 4.14 power management only works on battery: on AC the card is always on, whereas on 4.9 it turned off when not in use. What is runpm? I don't have the phantom VGA problem, and the pastebin is here https://pastebin.com/h6LiFCBR
20:49 imirkin: card1-VGA-1
20:49 imirkin: is that a physical VGA port?
20:49 imirkin: i.e. do you have an actual VGA connector somewhere on your laptop?
20:50 imirkin: the idea is that the nvidia gpu auto-suspends when it's not in use
20:50 Guest98: No, I don't have a VGA port. All I have is eDP-1, which is connected to my monitor and the intel card. But HDMI1-3 are listed in the pastebin for what is physically there.
20:51 imirkin: however sometimes they leave a VGA port "running" that's not pinned out
20:51 imirkin: with a floating hpd voltage or something
20:51 imirkin: which keeps getting nouveau to check if there's a cable plugged in
20:51 imirkin: if you boot with video=VGA-1:d that should make the VGA connector disappear
20:52 imirkin: i suspect, but am unsure, that this will cause nouveau to be able to runpm-suspend the gpu
20:52 imirkin: (i.e. it should say DynOff in the vgaswitcheroo status file)
20:53 Guest98: Does this have anything to do with my issue not occurring in kernel 4.9? I had the same output via xrandr on 4.9 as from that command you gave me, but power management worked properly then. Currently, with AC power on, it is always in the DynPwr state, which keeps the card on. I can try disabling that VGA port and see what happens though.
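To check for DynOff versus DynPwr without eyeballing the file, here is a small Python parser for the vgaswitcheroo `switch` file. The field layout (index, client type, `+` if active, power state, PCI address, e.g. `1:DIS: :DynOff:0000:01:00.0`) is our understanding of the kernel's debugfs output; the helper names are ours.

```python
import os
from typing import NamedTuple

SWITCH_PATH = "/sys/kernel/debug/vgaswitcheroo/switch"  # needs root and debugfs


class SwitcherooClient(NamedTuple):
    index: int
    kind: str    # "IGD" or "DIS", possibly with an "-Audio" suffix
    active: bool
    power: str   # "Pwr", "Off", "DynPwr" or "DynOff"
    pci: str     # PCI address, which itself contains colons


def parse_switch(text):
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=4 keeps the PCI address ("0000:01:00.0") in one piece
        idx, kind, active, power, pci = line.split(":", 4)
        clients.append(SwitcherooClient(int(idx), kind, active == "+", power, pci))
    return clients


def discrete_suspended(clients):
    """True only if every discrete GPU client is runtime-suspended (DynOff)."""
    dis = [c for c in clients if c.kind == "DIS"]
    return bool(dis) and all(c.power == "DynOff" for c in dis)


if __name__ == "__main__" and os.access(SWITCH_PATH, os.R_OK):
    with open(SWITCH_PATH) as f:
        clients = parse_switch(f.read())
    for c in clients:
        print(c)
    print("discrete GPU runtime-suspended:", discrete_suspended(clients))
```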
20:54 imirkin: hmmmmm
20:54 imirkin: well what can also happen
20:54 imirkin: so the thing is, it's not like there's any way to power off the gpu
20:54 imirkin: it's a completely platform-controlled thing
20:54 imirkin: in your case, ACPI
20:55 imirkin: so we call some acpi method, which is supposed to power off the gpu
20:55 imirkin: however, if that acpi method decides to check if power is plugged in and not power the gpu off in that case
20:55 imirkin: then there's not a whole lot we can do about it
20:56 imirkin: oh also
20:56 Guest98: How can I manually call this method and bypass ACPI? I upgraded my BIOS between 4.9 and 4.14, so that's probably what did this.
20:56 imirkin: some kernel, maybe it *was* 4.10, switched something related to all this
20:56 imirkin: hold on...
20:56 imirkin: it switched from using the optimus stuff to something else
20:56 imirkin: Lekensteyn: do you remember how to test that stuff out?
20:56 imirkin: Guest98: i have no idea how you're calling it manually
21:05 Guest98: I am going to go disable that vga port and see if that fixes it.
21:09 imirkin: another thing to try is pcie_port_pm=off
21:10 imirkin: however all that stuff was already settled by kernel 4.9
21:10 imirkin: the other thing that changed in 4.10 was that atomic modesetting was implemented. but i can't imagine how it would affect your situation.
21:13 Guest98: I will try that pcie_port thing and then be back.
21:15 mooch: does anybody know where mwk ran off to?
21:15 imirkin: "gone drinkin'"? dunno.
21:16 mooch: i haven't been able to contact him in days, about getting his pgraph tests fixed
21:16 mwk: sorta drinking right now
21:16 mwk: happy nouveau year by the way
21:17 annadane: i see what you did there
21:18 Guest98: That didn't work either.
21:35 Manoa: yeah, happy new year, and thanks ilia for the help on r600, I have two of these cards, I didn't know you were involved in radeon development as well
21:36 imirkin: i didn't know i was either
21:37 Manoa: xD
21:37 imirkin: (i am?)
21:37 imirkin: (oh, the IEEE vs not-IEEE stuff?)
21:37 Manoa: dave airlie put your name on the commit of add ARB_shader_storage_buffer_object support (v3)
21:38 imirkin: oh. that's probably more like "no-thanks-to' than anything else
21:38 imirkin: i kept pointing out problems in his approaches :)
21:40 Manoa: I do have a question though
21:40 Manoa: you think fermi has a chance to be functional in the end ?
21:41 imirkin: yeah. ben just posted patches to implement reclocking on it.
21:41 Manoa: I have such a big fermi card, tried quite a few things to get it going, didn't really have a chance
21:41 Manoa: that's nice
21:42 imirkin: which one? GTX 580?
21:42 Manoa: yeah
21:42 imirkin: cool
21:42 imirkin: well you can give it a shot -- https://github.com/skeggsb/nouveau/commits/devel-clk
21:42 Manoa: I suppose we are talking about latest kernel and latest mesa
21:42 imirkin: you'll need a fairly recent kernel, check out that tree and do "cd drm; make -j8"
21:42 imirkin: mesa isn't relevant to it
21:43 imirkin: (fairly recent = 4.15-rc i suspect)
21:43 Manoa: ha ! only a week ago ! just in time for Christmas !
21:43 imirkin: iirc he said it worked on all his boards
21:44 imirkin: but his selection is hardly complete
21:44 imirkin: (there's probably over 1k different boards out there)
21:44 Manoa: isn't fermi the only architecture that has a hardware draw call queue ?
21:44 Manoa: in terms of nvidia, I mean
21:44 mooch2: mwk, ah hey
21:45 mooch2: i need you to fix your nv3 tests
21:45 mooch2: they don't work on my nv3t lol
22:01 imirkin: mooch2: sounds like you'd be the one well-placed to fix them
22:04 mooch2: i don't know how
22:04 mooch2: i have no idea dafuq's going on with them
22:04 mwk: hm
22:04 mwk: I'll look at it
22:04 mooch2: like, the results are pseudo-random too
22:04 mwk: I might've broken something when doing newer cards
22:04 mwk: uh
22:04 mwk: so are results non-deterministic?
22:04 mooch2: yes
22:04 mwk: that might be hardware problems
22:04 mwk: which tests are failing?
22:04 mooch2: weird because this card's svga parts work fine
22:04 mooch2: scan tests and vtx tests
22:05 mwk: huh.
22:05 mooch2: also state tests
22:05 mwk: that's very strange
22:05 mwk: I'll try on my NV3T
22:05 mwk: when I sober up
22:05 mwk: hm
22:05 mwk: actually fuck that, my NV3 isn't drunk
22:06 mooch2: i'm currently ssh'd into my nv3t testing rig, so lemme know if you've got a clue, and i can modify and run the tests as necessary
22:06 mooch2: would you also like my log files?
22:07 mwk: they surely won't hurt
22:07 mooch2: btw, mwk, most of the failures seem to be centralized around VTX_X and VTX_Y
22:09 mooch2: mwk: i'm also having strange state test failures in the nv3 pfifo tests i wrote
22:09 mooch2: only in the cache1_addr and cache1_data arrays tho
22:09 mwk:testing on NV3
22:12 mwk: hmm
22:12 mwk: well fuck
22:12 mwk: something's failing
22:12 mooch2: da heck is failing for you?
22:13 mwk: something about interrupt checking, I think
22:13 mwk: state tests seem to be passing
22:13 mwk: method tests, not so much
22:15 mwk: oh
22:15 mwk: invalid method tests
22:15 mwk: meh
22:16 mooch2: state...
22:16 mooch2: Difference in reg VTX_X[0]: expected dc561334 real dc561330
22:16 mooch2: Difference in reg VTX_X[1]: expected fffffffc real fffffff8
22:16 mooch2: Difference in reg VTX_X[8]: expected ff1550aa real ff1550a8
22:16 mooch2: Difference in reg VTX_X[11]: expected ffffffff real fffffff9
22:16 mooch2: Difference in reg VTX_X[13]: expected fffffffe real fffffffa
22:16 mooch2: Difference in reg VTX_X[27]: expected 000019bd real 000019b9
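One quick way to look at these VTX_X diffs is to XOR expected against real for each register. This Python sketch only reprocesses the six lines pasted above; it shows which bits differ, not why.

```python
# (register, expected, real) triples copied from the test log above
diffs = [
    ("VTX_X[0]", 0xDC561334, 0xDC561330),
    ("VTX_X[1]", 0xFFFFFFFC, 0xFFFFFFF8),
    ("VTX_X[8]", 0xFF1550AA, 0xFF1550A8),
    ("VTX_X[11]", 0xFFFFFFFF, 0xFFFFFFF9),
    ("VTX_X[13]", 0xFFFFFFFE, 0xFFFFFFFA),
    ("VTX_X[27]", 0x000019BD, 0x000019B9),
]

union = 0
for reg, expected, real in diffs:
    changed = expected ^ real              # bits that differ
    only_cleared = (real & ~expected) == 0  # True if every change was 1 -> 0
    union |= changed
    print(f"{reg:10} changed {changed:#010x}  only 1->0: {only_cleared}")
print(f"all differing bits: {union:#x}")
```

For these six registers the differences all fall within bits 1 and 2 (combined mask 0x6), and in every case a bit expected to be 1 read back as 0. Whether that pattern points at hardware failure, as mwk suspects, or at something else is not something this check can settle.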
22:17 mooch2: oop
22:17 mwk: hm
22:17 mwk: I don't see that
22:17 mooch2: that's in my logs
22:17 mooch2: something like that always appears
22:18 mwk: but that does sound worryingly like hw failure
22:18 mwk: esp if it's non-deterministic
22:18 mooch2: then why does the svga part work fine?
22:18 mooch2: i don't get it
22:19 mwk: it's less sensitive to bitflips, perhaps?
22:19 mwk: no, seriously
22:19 mwk: when you run the tests
22:19 mwk: are the diffs the same every time?
22:19 mwk: or not?
22:20 mooch2: mwk, the state failures seem to be the same every time
22:20 mwk: hrm
22:20 mwk: then please paste the full log somewhere
22:21 mooch2: before, there were no scan failures in vtx_x and vtx_y, but now every single vtx_x reg fails
22:21 mwk: also, please show nvalist output
22:21 mooch2: mwk, this is nvalist 0: (pci) 0000:01:02.0 NV3T 20030121
22:21 mwk: huh
22:22 mooch2: also, i can't get this log file off my computer
22:22 mooch2: i can't paste between the ssh session and the host
22:22 mwk: that's a different nv3t than I have
22:22 mooch2: or at least, i can't paste files
22:22 mwk: mine is 00030122
22:22 mooch2: mwk, different in which way?
22:22 mooch2: da heck is that last number?
22:23 mwk: last is unimportant probably
22:23 mwk: but the first is the foundry
22:24 Manoa: NV3? the 5800 Ultra mooch2 ?
22:24 mooch2: Manoa, no, riva 128
22:24 mooch2: nv03
22:24 Manoa: oh ok
22:25 mooch2: mwk, maybe different foundries had different hw bugs?
22:25 mwk: *shrug* might be
22:26 mooch2: and, foundry 2 isn't documented in my docs
22:26 mooch2: foundry 0 is sgs, foundry 1 is helios
22:26 mooch2: dafuq is foundry 2
22:27 mwk: nfi
22:27 mwk: tsmc?
22:28 imirkin: listed as TSMC in rnndb, but ... who knows.
22:28 imirkin: was TSMC a thing back then?
22:28 mooch2: ooh it was tsmc
22:28 mooch2: founded in 1987
22:28 mooch2: so yes
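mwk's reading of the nvalist ID is that the leading hex digit is the foundry. A tiny Python decoder using only that digit and the codes mentioned in this conversation (0 = SGS and 1 = Helios from mooch2's docs, 2 = TSMC as listed in rnndb); nothing else in the ID is decoded, and the table is only as reliable as those sources.

```python
# Foundry codes as discussed above; not an authoritative table.
FOUNDRIES = {0: "SGS", 1: "Helios", 2: "TSMC"}


def foundry_of(chip_id):
    """Decode only the leading hex digit of an nvalist ID such as "20030121"."""
    code = int(chip_id[0], 16)
    return FOUNDRIES.get(code, f"unknown ({code})")


if __name__ == "__main__":
    print("mooch2's NV3T:", foundry_of("20030121"))
    print("mwk's NV3T:", foundry_of("00030122"))
```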
22:29 mooch2: so i have an earlier FIB revision
22:32 mooch2: mwk, any progress yet?
22:33 mooch2: or do i have to send you my nv3t for analysis? :P
22:37 mwk: mooch2: I'm kinda busy drinking with my CTF team :p
22:37 mwk: but I'll look at it in a few days
22:37 mooch: aw okay
22:59 mwk: so there it goes
23:00 mwk: HAPPY NOUVEAU YEAR!