02:19 gnurou: RSpliet: I am currently struggling to get the Pascal ones loading in Nouveau (because much has changed against Maxwell). Once this is done, expect a release, but no acceleration support
02:52 mooch2: mwk: can you please make some docs on nv1's display engine? particularly the partial vga compatibility would be nice, so that i can start implementing nv1 in 86box
03:50 mooch2: i've got a nice 1498 mhz clock on my geforce 750ti
04:31 Satchelboi_: That's a really good clock for that card
04:36 mooch2: well, i had to turn it down for furmark
04:36 mooch2: but for dolphin, it's great
05:48 mwk: mooch2: the vga compatibility part is horrible
05:48 mwk: the only part that's even remotely sane is the dac
05:50 mwk: huh, the palette part is actually documented
05:50 mwk: I forgot I did that
05:55 mwk: mooch2: ok, I'll try to throw in some documentation about DAC pixel formats and PFB operation soonish
05:57 mwk: VGA compat is a much more complex matter though
05:58 mwk: there are 3 parts to it
05:59 mwk: one is the palette register emulation... this stuff goes straight to DAC and is documented
06:00 mwk: second is the VGA memory window handling, which probably works like a normal VGA, except for the weirdo access windows at b1800+ in physical memory
06:00 mwk: ideally, I'd slap a hwtest on it
06:01 mwk: and third is the text mode emulation, which sort-of implements text mode, but is entirely unlike any actual VGA card
06:02 mwk: and batshit crazy
06:03 mwk: and only the DAC part is actually part of the display subsystem, the other two parts just write stuff to memory that PFB will later read
06:22 mooch: ah
06:23 mooch: mwk: didn't you say that some of the crtc regs work?
07:03 mwk: mooch: a few of them, yes
07:03 mwk: the ones that are used by the text mode emulation
07:04 mwk: + some ATC, SEQ, GC ones as well
07:04 mwk: all GC ones even
09:23 RSpliet: gnurou: ok thanks... no accel support sounds a bit odd, is that because of work on the mesa side or because of limitations in the distributed fw?
09:24 gnurou: RSpliet: fw limitations. And I am sad to say I do not expect the situation to improve any time soon
09:27 karolherbst: pascal fw?
09:27 gnurou: yup
09:27 karolherbst: I see
09:28 karolherbst: "any time soon" means half a year or more?
09:28 gnurou: err hold on, I made a typo
09:28 gnurou: s/acceleration/reclocking
09:28 * gnurou is fighting against flu
09:28 karolherbst: well okay
09:28 karolherbst: this is no big deal for pascal
09:28 karolherbst: because
09:28 karolherbst: we support like nothing in pascal right now
09:29 karolherbst: most of the P vbios tables are different
09:29 gnurou: so of course fw will enable basic acceleration, but no reclocking
09:29 karolherbst: maxwell PMU images are a bit more important right now
09:29 karolherbst: important as in, we need them asap
09:29 RSpliet: gnurou: ah okay that's no shocker
09:29 gnurou: by basic acceleration, I mean GR will be enabled
09:29 RSpliet: not good, shame on your managers, but having GR enabled is first priority now
09:30 gnurou: and that's what I meant to say, is that there are no Maxwell PMU releases on the horizon
09:30 karolherbst: gnurou: so most likely no images in 2017?
09:30 gnurou: ... if at all!
09:31 karolherbst: okay
09:31 karolherbst: that's enough information for me
09:32 karolherbst: to be honest, if we don't get images from nvidia for the desktop GPUs, I don't see a point in supporting nvidias images for the tegras then
09:32 gnurou: which is exactly what I am arguing internally
09:32 karolherbst: yeah
09:32 gnurou: I am not taking sides. I will submit code and images, please do according to your conscience
09:32 karolherbst: you would have to add and support the changes to nouveau code
09:33 karolherbst: I am not saying we wouldn't include it, but we won't code it
09:33 karolherbst: most likely
09:33 gnurou: not that NVIDIA is expecting the community to manage Tegra support. At least we have always submitted code for it
09:33 karolherbst: right
09:34 gnurou: but not supporting dGPU at the same level is a missed opportunity IMHO
09:34 karolherbst: but getting the PMU done in a good way requires some changes to the nouveau code as well
09:34 gnurou: yeah, and not light ones
09:34 karolherbst: originally we wanted to have the same interface for nvidias and nouveaus pmu image
09:34 karolherbst: and completely rewrite our pmu code
09:34 karolherbst: so that the interfaces match
09:35 gnurou: note that these changes will also be required for Pascal FW as well, since the falcon reset code has moved into SEC2
09:35 gnurou: so it won't be totally Tegra-centric
09:35 gnurou: far from it actually
09:36 karolherbst: it isn't about secboot really, I am more talking about the host <-> PMU communication
09:36 gnurou: yeah - and that part will be needed in Pascal to enable GR
09:36 gnurou: secboot will prepare the SEC2 falcon, and SEC2 will boot FECS and GPCCS upon receiving a given message
09:36 karolherbst: the easy way would be: nvidia releases documentation about the pmu interface, and we adjust our pmu images to that, so we have the same
09:37 karolherbst: I see
09:37 gnurou: that's the funny thing, PMU images and messages format change all the time
09:37 karolherbst: splendid
09:37 gnurou: internally, firmware is bound to RM and they evolve in lockstep, so this is not a problem
09:37 karolherbst: for nouveau as well
09:37 karolherbst: kind of
09:37 gnurou: but yeah - I am already supporting 5 different RM versions in my code
09:38 gnurou: the changes are small if the code is architectured to handle this, but still
09:38 karolherbst: falcon images are shipped inside the kernel
09:38 gnurou: right
09:38 karolherbst: yeah,
09:38 karolherbst: it's messy
09:38 gnurou: and my calls to standardize this a bit are... well you can imagine
09:38 karolherbst: :D
09:39 karolherbst: is the interface different for chipsets within the same release?
09:39 gnurou: no, thank the Gods of Poor Software Engineering
09:39 karolherbst: okay, so within one nvidia release, every chipset has the same pmu interface
09:39 karolherbst: okay
09:39 karolherbst: this is sane enough then
09:39 gnurou: yes
09:40 gnurou: well
09:40 karolherbst: depends on how much the interface changes between releases
09:40 karolherbst: but
09:40 gnurou: considering that the GM20B FW comes from r352
09:40 gnurou: other Maxwells from r361
09:40 gnurou: and Pascal will likely come from r367
09:40 karolherbst: yeah, but how much do the actual message interfaces differ
09:40 karolherbst: if there is a new "method" okay who cares
09:40 gnurou: not that much, and I managed to confine the differences into small source files
09:41 gnurou: but on some versions you have a different number of queues, etc
09:41 gnurou: and the way to do ACR changes as well
09:41 karolherbst: okay, so it would be a piece of cake fo make changelogs
09:41 karolherbst: *to
09:41 gnurou: yeah, explaining differences is not too difficult
09:41 gnurou: still a PITA
09:41 karolherbst: well true
09:41 karolherbst: but in the end, it would take like 1 hour per major release?
09:42 karolherbst: just to write it down I mean
09:42 gnurou: less than that?
09:42 karolherbst: I was pessimistic
09:42 karolherbst: okay, so this is no big deal. Only to come up with the first draft of the interface I figure
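A rough sketch of the approach discussed above: per-release differences in the host <-> PMU interface are confined to one small table, and a changelog between two releases falls out of comparing entries. Only the release names (r352, r361, r367) come from the conversation. The `PmuInterface` fields, the queue counts, and the ACR labels are hypothetical placeholders, not values from any real firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmuInterface:
    """Per-release details of the host <-> PMU interface (hypothetical fields)."""
    release: str
    num_queues: int   # placeholder value, not taken from real firmware
    acr_method: str   # placeholder label for "the way to do ACR"


# One small entry per RM release; everything else stays release-agnostic.
INTERFACES = {
    "r352": PmuInterface("r352", num_queues=2, acr_method="acr_a"),
    "r361": PmuInterface("r361", num_queues=3, acr_method="acr_b"),
    "r367": PmuInterface("r367", num_queues=3, acr_method="acr_c"),
}


def interface_for(release):
    """Look up the interface description for a firmware release."""
    try:
        return INTERFACES[release]
    except KeyError:
        raise ValueError(f"unsupported firmware release: {release}") from None


def changelog(old, new):
    """List the fields that differ between two releases."""
    a, b = interface_for(old), interface_for(new)
    return [
        f"{name}: {getattr(a, name)} -> {getattr(b, name)}"
        for name in ("num_queues", "acr_method")
        if getattr(a, name) != getattr(b, name)
    ]
```

With the placeholder table above, `changelog("r352", "r361")` reports both fields as changed, while `changelog("r361", "r367")` reports only the ACR label.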
09:44 gnurou: yeah, I will send everything as soon as I get the Pascal FW to run...
09:44 gnurou: which is also a PITA
09:44 karolherbst: sure
09:44 karolherbst: are there plans to do updates of the already released firmwares so that they match the same version?
09:45 karolherbst: but I guess this opens another can of worms
09:45 karolherbst: external firmwares are always crappy
09:53 RSpliet: at least post-pascal we can start disassembling firmware with llvm or gcc
09:53 karolherbst: I doubt that volta will get the new ISA already
09:54 RSpliet: Why? NVIDIA has already presented their own RISC-V processor design
09:54 karolherbst: and even that, compiling our own stuff is somehow more important
09:54 RSpliet: karolherbst: http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html
09:54 karolherbst: RSpliet: and? Does it mean it will be shipped with the next gen already?
09:54 RSpliet: karolherbst: it means they are further with developing the core than you might think
09:55 RSpliet: there's no guarantee of course, but it's not an open source project where you need to present early in order to attract more developers
09:55 karolherbst: I know
09:55 karolherbst: I still doubt volta will have those
09:55 RSpliet: it's a closed source project that you don't present until the last moment to keep your competition in the dark ;-)
09:55 RSpliet: time will tell
09:56 RSpliet: if it does, we only need to figure out the brownfield extensions they did for vdec and the likes
09:56 karolherbst: what good is this if we can't use our own stuff anyway
09:56 RSpliet: opcode encoding schemes and a large body of scalar ops are documented and implemented in toolchains
09:57 RSpliet: it eases reverse engineering quite a bit
09:57 RSpliet: handy if you're genuinely curious about how the hardware works
09:57 karolherbst: well true, but you know..
09:58 karolherbst: Okay sure, maybe in 10 years you can actually brute force those keys and deploy your own images or so
09:59 RSpliet: I like your practical mindset, but nouveau for me has also just been highly educational :-)
09:59 karolherbst: :D true
10:03 karolherbst: gnurou: one thing I was thinking a bit about: would it be possible to get a small signed LS image which really only contains the fan control, which also just returns back to the unsigned caller, and call this image from unsigned code?
10:04 gnurou: karolherbst: technically, yes... practically, probably won't happen :(
10:05 karolherbst: well, you could also give us an image where we put the reg we want to read/write into a register, call the signed function and do our stuff ;)
10:05 gnurou: and regarding updates to match newer versions: that would not be very useful, since nouveau.ko would still have to update older firmwares anyway to avoid breaking user-space
10:05 karolherbst: mhhh
10:05 karolherbst: we really would like to rather code our own stuff
10:05 gnurou: I would like that too
10:06 karolherbst: but I guess there is an internal reason for those images and this wouldn't really comply with it
14:39 RSpliet: karolherbst: I think there's only one viable route to getting that sorted
14:40 RSpliet: and that is by scandalously outpace the blob on Kepler and 1st gen Maxwell
14:40 RSpliet: *outpacing
14:42 RSpliet: should be quite achievable for DirectX 9 games...
15:34 NanoSector: how's GTX 950M supported on Linux nowadays, does Bumblebee support it?
15:36 imirkin: my notes suggest that's a GM107...
15:37 NanoSector: which is bad?
15:37 imirkin: should be generally fine with nouveau. occasional rendering artifacts.
15:37 NanoSector: but no reclocking right?
15:37 imirkin: with 4.10, should reclock
15:37 NanoSector: :o
15:37 NanoSector: time to try, then
15:38 imirkin: iirc there are weirdo artifacts in unigine valley. never diagnosed. they appear to be random, unfortunately.
15:38 imirkin: which probably means we're not initializing something, or not flushing something, or who knows
15:39 NanoSector: yeah. my GPU being unsupported was my main concern for not moving to Linux
15:39 imirkin: with updated mesa, you should get basically the same level of feature support as on kepler
15:39 imirkin: just a little buggier.
15:40 NanoSector: nice
15:41 NanoSector: time to ditch windows this weekend then
15:42 NanoSector: does nouveau give better optimus performance than bumblebee nowadays btw? :x the latter was really abysmal with my Kepler, often being slower than intel graphics
15:42 imirkin: sounds like that was most likely due to you not reclocking the kepler gpu
15:42 NanoSector: provided you reclock
15:42 imirkin: or due to you using glxgears as a measure of performance
15:43 imirkin: whereas in such a case it's a measure of pcie bus bandwidth
15:43 NanoSector: no no I mean if nouveau is faster than nvidia + bumblebee nowadays
15:43 imirkin: highly unlikely.
15:44 NanoSector: i see
15:44 imirkin: nvidia has a 20-50% lead over nouveau. however worse the bumblebee approach is, i doubt it's that much worse.
15:45 NanoSector: what's the bottleneck for nouveau actually, mesa or the kernel driver?
15:45 imirkin: by bottleneck you mean "where the improvements have to happen"?
15:45 imirkin: if so, it's in mesa
15:46 imirkin: from a strict data flow analysis, with nouveau the bottleneck is the gpu
15:46 imirkin: nvidia is able to make the gpu do things faster :)
15:46 NanoSector: how?
15:46 NanoSector: or do you mean incomplete reclocking support?
15:46 imirkin: by reading the docs the hw engineers provided?
15:46 imirkin: and then acting on that documentation
15:47 NanoSector: ah
15:47 imirkin: i'm not talking about reclocking...
15:47 NanoSector: I always thought the proprietary drivers were just a bunch of per-game hacks to improve performance
15:47 imirkin: and there's a bunch of stuff left that nouveau could improve. notably, instruction scheduling is a sorely missing feature in our compiler.
15:47 imirkin: well, i'm sure they have those too
15:47 imirkin: but that's hardly the whole thing.
15:48 NanoSector: interesting
15:48 imirkin: i suspect their data handling strategy is superior
15:48 imirkin: and they make use of various little features that we're oblivious to
15:48 NanoSector: hmm
15:49 NanoSector: maybe there's things nouveau does in software that the hardware can do?
15:49 NanoSector: as example
15:49 imirkin: no, just not driving the hw as effectively as possible
15:49 NanoSector: ah
15:49 imirkin: one thing that comes to mind is use of ZCULL (which is akin to HiZ in other GPUs)
15:49 imirkin: [nouveau doesn't use it]
15:49 imirkin: and 75 other things we don't know about.
15:50 NanoSector: :(
15:50 imirkin: nouveau is basically 2 full-time's-worth-of-engineers (maybe) without hw docs competing against a team of 100s of full-timers with hw docs. not exactly a fair competition.
15:51 NanoSector: yeah
15:51 NanoSector: therefore it's great how far you have gotten
15:51 imirkin: so we do what we can.
15:51 hakzsam: imirkin_: 2?
15:51 imirkin: hakzsam: i figure all us part-time volunteer contributors add up to a full-timer's worth of effort...
15:51 imirkin: plus ben, obviously
15:52 hakzsam: ok
15:53 imirkin: also nouveau supports a much wider array of hw than nvidia.
15:53 imirkin: riva tnt -> current are supported in one form or another
15:53 imirkin: while nvidia is fermi+ for their current drivers
15:54 imirkin: gtg
15:54 NanoSector: cya
18:08 pmoreau: (Short notice: if anyone had troubles accessing the images at nouveau.pmoreau.org using Firefox >= 51.0 due to revoked certificates, this has now been solved; I have regenerated the certificates using Let's Encrypt rather than StartCom SSL.)
19:57 mooch: are there any other nvidia emulators besides 86box, mame (lol okay), xqemu, and rpcs3?
19:57 mooch: like, i need an emulator that emulates the vesa portion of the nvidia card
19:57 mwk: ... vesa portion?
19:58 mwk: you mean the extra vga regs?
19:58 mooch: yeah
19:59 mooch: literally the only ones i've found have been 86box, and spc/at
19:59 mooch: and i'm not sure spc/at is open source
20:00 mooch: nope, doesn't seem to be
20:00 mooch: the author is from belarus tho
20:04 mooch: mwk: do you know of any other emulators that emulate the extra vga regs of an nvidia card?
21:29 gregory38: hello a quick question
21:29 gregory38: does nouveau support openCL (1.2)
21:38 imirkin_: gregory38: nope
21:38 gregory38: ok thank you
21:38 imirkin_: nouveau supports compute shaders on fermi+ though.
21:39 imirkin_: [GL compute shaders]
21:39 gregory38: Ok. But I have a full program written in openCL
21:40 gregory38: I will test it on Nvidia closed driver when I reboot
21:40 gregory38: thanks for the info
21:40 imirkin_: yeah sorry
21:41 gregory38: don't be sorry
21:42 gregory38: having a gl driver is already a huge achievement
21:42 imirkin_: unfortunately there's no path from llvm's opencl c compiler to nouveau's codegen
21:42 imirkin_: ultimately it should be a spirv-style api, but that's not piped through yet
21:44 imirkin_: someone was working on TGSI output from llvm, but that's gone incomplete from 2 separate attempts
21:44 gregory38: oh too bad
21:46 gregory38: anyway, not important, I'm pretty sure openCL is rather slow
21:46 gregory38: (at least the current implementation)
21:46 gregory38: (of my app)
21:47 imirkin_: not 100% sure that clover supports CL 1.2 either, but that should be fixable... hopefully
21:47 imirkin_: at least i think it has images now
21:47 gregory38: what is clover ?
21:48 imirkin_: state tracker exposing OpenCL (and converting it to gallium api's)
21:48 gregory38: oh ok
21:50 gregory38: by the way, you said that persistent buffers are always in GART
21:51 imirkin_: with nouveau
21:51 gregory38: so potentially you could put them in the vram
21:51 imirkin_: for non-coherent ones
21:51 gregory38: and the user (the app) can access it
21:51 gregory38: or does it require a temporary kernel buffer
21:53 gregory38: I guess if it is only possible for non-coherent, it means there are 2 duplicated buffers. one in host and one in vram
21:53 imirkin_: tbh i haven't really thought about it
21:55 gregory38: I was curious whether (for small buffers) it wouldn't be faster to memory-map the PCIe BAR in user space
21:55 gregory38: so an application can directly write into it
21:55 gregory38: instead of writing in host memory and then reading it from the GPU
21:55 imirkin_: could be
21:56 imirkin_: i haven't really investigated it
21:56 gregory38: ok.
21:57 imirkin_: it's easy to flip if you want
21:57 imirkin_: the logic's in nouveau_buffer.c
21:58 gregory38: yes, the file was already open
21:58 imirkin_: ;)
21:58 gregory38: need to look at your competitor ;)
21:58 gregory38: dunno what amd does
21:59 imirkin_: well, they have various restrictions
21:59 imirkin_: like they can only map 256MB of vram at a time
22:01 gregory38: but do we have more? lspci -vv
22:01 gregory38: gives a smaller bar (or is it unrelated)
22:01 gregory38: Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
22:01 gregory38: Region 1: Memory at f0000000 (64-bit, prefetchable) [size=128M]
22:01 gregory38: Region 3: Memory at f8000000 (64-bit, prefetchable) [size=32M]
22:01 gregory38: Region 5: I/O ports at e000 [size=128]
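A small sketch for reading the region sizes out of the `lspci -vv` lines pasted above, so the largest memory BAR can be compared against a figure such as the 256MB mentioned for AMD. The helper name `parse_regions` and the dictionary layout are made up for illustration, and the sketch says nothing about what each region is used for or whether BAR size actually limits how much VRAM can be mapped.

```python
import re

# The "Region" lines pasted above, minus timestamps and nicks.
LSPCI_REGIONS = """\
Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at f0000000 (64-bit, prefetchable) [size=128M]
Region 3: Memory at f8000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at e000 [size=128]
"""

# Size suffixes as printed by lspci; no suffix means plain bytes/ports.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
REGION_RE = re.compile(
    r"Region (\d+): (Memory|I/O ports) at ([0-9a-f]+).*\[size=(\d+)([KMG]?)\]"
)


def parse_regions(text):
    """Turn lspci 'Region N: ...' lines into a list of dicts (hypothetical helper)."""
    regions = []
    for line in text.splitlines():
        m = REGION_RE.search(line)
        if m is None:
            continue
        index, kind, base, size, unit = m.groups()
        regions.append({
            "index": int(index),
            "kind": "memory" if kind == "Memory" else "io",
            "base": int(base, 16),
            "size": int(size) * UNITS[unit],
            # "non-prefetchable" contains "prefetchable", so rule it out first.
            "prefetchable": "non-prefetchable" not in line and "prefetchable" in line,
        })
    return regions


regions = parse_regions(LSPCI_REGIONS)
largest = max(
    (r for r in regions if r["kind"] == "memory"),
    key=lambda r: r["size"],
)
print(f"largest memory BAR: region {largest['index']}, {largest['size'] >> 20} MiB")
```

For the output pasted above this picks region 1 at 128 MiB.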
22:01 imirkin_: we can map as much as we want with some kind of craziness that i don't fully understand
22:04 gregory38: or maybe the above is the gart memory
22:10 gregory38: hum it seems AMD uses GTT when GL_CLIENT_STORAGE_BIT is set (for write)
22:10 gregory38: otherwise VRAM
22:11 gregory38: with extra flags for write-combining and cpu access
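A minimal sketch of the placement rule as gregory38 describes it above, not the actual AMD driver logic. The two `GL_*` values are the standard OpenGL buffer-storage constants. The function name, the domain strings, and the flag names are hypothetical, and attaching the write-combining and CPU-access flags to the VRAM case is one assumed reading of the last two messages.

```python
# Flag values from the OpenGL buffer-storage API.
GL_MAP_WRITE_BIT = 0x0002
GL_CLIENT_STORAGE_BIT = 0x0200


def choose_placement(storage_flags):
    """Pick a memory domain for a buffer (hypothetical helper).

    Returns (domain, extra_flags), following the description in the chat:
    a writable buffer with GL_CLIENT_STORAGE_BIT goes to GTT, anything else
    goes to VRAM.
    """
    writable = bool(storage_flags & GL_MAP_WRITE_BIT)
    client_storage = bool(storage_flags & GL_CLIENT_STORAGE_BIT)

    if writable and client_storage:
        # System memory reachable by the GPU through the GTT/GART.
        return "GTT", frozenset()

    # Assumption: the extra flags mentioned in the chat apply to this case.
    return "VRAM", frozenset({"write_combined", "cpu_access"})
```

For example, `choose_placement(GL_MAP_WRITE_BIT | GL_CLIENT_STORAGE_BIT)` yields the GTT domain, while `choose_placement(GL_MAP_WRITE_BIT)` yields VRAM with both extra flags.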
22:54 imirkin_: mlankhorst: could you explain how the vram mapping stuff works for g80+? i.e. how is it that we're not constrained to bar size?
22:55 imirkin_: iirc you made it that way
23:54 snkcld: if i have optimus and am using KMS, should i see just the one "modesetting" provider in "xrandr --listproviders" ?