00:08orbea: I (and others) are getting segfaults (sometimes it just locks up, requiring dolphin to be killed via ssh) after around 20-30s of running dolphin-emu with asynchronous ubershaders. The backtrace seems to indicate it's a nouveau issue, is this right? https://pastebin.com/YUcaH3ie
00:09orbea: i tried mesa commits back to 2017 without finding a working commit, but I do think this worked at some point....
01:40imirkin: orbea: hmmm a null bo? yeah, that feels like async is doing more than compiling shaders
01:40imirkin: probably also doing some "test draws", which mess everything up
01:40imirkin: why on earth is it calling glFinish?
01:42HdkR: imirkin: "If we don't do this, some driver can lock up (e.g. AMD)"
01:42HdkR: Is the comment in the source
01:44HdkR: It also binds the vertex attribute that was set at that point in to time work around the vertex attribute optimization in the Nvidia blob
01:44imirkin: HdkR: and if you do, you lock up other drivers, like nouveau ;)
01:45HdkR: at that point in time to work around*
01:57orbea: so is this something that should be fixed in mesa or dolphin?
01:58imirkin: i mean, nouveau should deal with multiple GL contexts
01:58nyef: Longstanding known issue in nouveau, you might as well work around it in dolphin for the time being?
01:58imirkin: but it doesn't
01:58orbea: ah, i see
02:00orbea: can just not use asynchronous ubershaders I guess...
02:11HdkR: orbea: Dolphin can add a bug entry to their DriverDetails system to avoid the VAO binding and glFinish when Nouveau is being used
02:13orbea: cool to know, guess that should be passed along to Stenzek?
02:13HdkR: Would only take a few minutes to write the code
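A minimal sketch of the kind of per-driver workaround HdkR describes, written in plain C rather than taken from Dolphin. Every identifier here (`detect_driver`, `driver_has_quirk`, `QUIRK_SKIP_ASYNC_VAO_BIND_AND_FINISH`) is hypothetical, and it assumes the GL vendor string contains "nouveau" when nouveau is in use; Dolphin's real DriverDetails code may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers: none of these names come from Dolphin. */
enum gl_driver { DRIVER_UNKNOWN, DRIVER_NVIDIA_BLOB, DRIVER_NOUVEAU };

enum quirk {
    /* Skip the VAO bind and glFinish done by the async shader compiler. */
    QUIRK_SKIP_ASYNC_VAO_BIND_AND_FINISH,
};

struct quirk_entry {
    enum gl_driver driver;
    enum quirk quirk;
};

/* One row per (driver, workaround) pair. */
static const struct quirk_entry quirk_table[] = {
    { DRIVER_NOUVEAU, QUIRK_SKIP_ASYNC_VAO_BIND_AND_FINISH },
};

/* Crude detection from a vendor string (assumption: substring match). */
static enum gl_driver detect_driver(const char *vendor)
{
    if (vendor == NULL)
        return DRIVER_UNKNOWN;
    if (strstr(vendor, "nouveau") != NULL)
        return DRIVER_NOUVEAU;
    if (strstr(vendor, "NVIDIA") != NULL)
        return DRIVER_NVIDIA_BLOB;
    return DRIVER_UNKNOWN;
}

/* Returns true when the given driver needs the given workaround. */
static bool driver_has_quirk(enum gl_driver driver, enum quirk quirk)
{
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        if (quirk_table[i].driver == driver && quirk_table[i].quirk == quirk)
            return true;
    }
    return false;
}
```

The async compile path would then wrap the VAO bind and glFinish in a check such as `if (!driver_has_quirk(drv, QUIRK_SKIP_ASYNC_VAO_BIND_AND_FINISH))`, keeping the existing behaviour on every other driver.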
07:48MaximLevitsky: I have the 1080Ti here, and I wonder how well nouveau should support it. I bought it for gaming mostly and I use it for pci pass through to a VM, but still use it in the host sometimes
07:49MaximLevitsky: The blob gets on my nerves....
07:50gnarface: MaximLevitsky: last i heard it still can't reclock the ram so it's basically crippled for gaming under nouveau. no sign of nvidia forking over the encryption key to unlock it, either. for gaming the binary blob is unfortunately the only serious choice on that card.
07:50gnarface: i hear the 780 Ti's work great though ...
07:50MichaelLong: MaximLevitsky, same here, I only use it for pass-through right now
07:50MaximLevitsky: I know... but if I extract the reclocking firmware from the blob, it would work?
07:54MichaelLong: MaximLevitsky, no it is not that easy AFAIK, the firmware would only allow you to do more sophisticated functions but the clocking itself needs to be done in the driver, it is partly possible with way older cards right now.
07:57MaximLevitsky: I understand. Fucking nvidia. My next card will be vega for sure
07:58MichaelLong: mine will still be a nvidia despite all that
08:00MaximLevitsky: When I bought my card, vega wasn't out yet, and I wanted something that works well for GPGPU for doing serious work in rendering for my girlfriend. We don't use this much, so no more nvidia.
08:00MichaelLong: I'm following the topic lightly, and using recent amd cards in a vfio-setup is hit and miss, plus the cards are not competitive
08:00karolherbst: MaximLevitsky: no reclocking on those GPUs sadly
08:01karolherbst: everything is kind of locked down there, so it requires signed firmware
08:01MaximLevitsky: MichaelLong: that's what I am afraid of too, but I'll give it a try. It improves over time
08:01annadane: i am avoiding nvidia in future as well
08:01MichaelLong: MaximLevitsky, nah it doesn't really
08:01annadane: wish i had asked about this before buying my current computer
08:01MichaelLong: cards that are known to have reset problem will have them. it is not the devs priority to fix that.
08:02MaximLevitsky: MichaelLong: Exactly what I was thinking about. I think they did improve something about that reset bug
08:03MaximLevitsky: Anyway, what a sad world we live in, with all these locks. I almost wish that encryption wasn't possible despite all of its benefits.
08:03MichaelLong: btw. the 1080 runs nicely with nouveau in a pass-through-setup :)
08:04MaximLevitsky: nouveau in the guest?
08:04MaximLevitsky: Good to know. So far didn't need a Linux guest here.
08:04MaximLevitsky: *I didn't
08:05MaximLevitsky: Might need soon. I joined redhat last week (the KVM team) :-)
08:05MichaelLong: yeah it is more or less a test system with the same specs as my "gaming-vm".
08:05MichaelLong: MaximLevitsky, ah nice :)
08:05annadane: incidentally, do people prefer AMD or Intel for ethical reasons? apparently they're about the same
08:06annadane: and functionality with linux etc
08:06MaximLevitsky: AMD is a *bit* better
08:06MaximLevitsky: Intel has SGX and they don't. SGX = DRM on steroids
08:07annadane: i guess i'll buy a stock ryzen, or whatever
08:07MaximLevitsky: DRM for running software black boxes, and it actually has a chance to be unbreakable.
08:07annadane: they do or don't?
08:07MaximLevitsky: *Assuming that they fix all their spectre/meltdown/etc bugs.
08:11MaximLevitsky: annadane: AMD doesn't have SGX, but they do have a 'secure coprocessor' and they already started using it
08:11annadane: AFAIK the PSP is not as bad as the IME
08:12MaximLevitsky: I bet that it will be eventually....
08:12MaximLevitsky: The point is that they discovered that software DRM is actually theoretically possible. It's done by making the CPU not trust YOU with its own private keys.
08:13MaximLevitsky: So unless you open the silicon, it's game over.
08:13annadane: it's stupid. i hope RISC-V takes off
08:14MaximLevitsky: I too, I am praying for this to happen.
08:15karolherbst: RISC-V doesn't protect against such things
08:15karolherbst: it isn't open hardware, just the ISA is kind of open
08:15karolherbst: more or less
08:15MaximLevitsky: Yes, but it adds some competition to the scene
08:16karolherbst: nvidia plans to use risc-v for the co-processors on the GPUs
08:16MaximLevitsky: Indeed :-(
08:17karolherbst: well, makes it easier for us
08:17HdkR: ARM's one day smear campaign on RISC-V was great
08:17karolherbst: so we can write firmware for non secure ones easily
08:17karolherbst: well, they weren't wrong though
08:17HdkR: True, almost all the points were valid
08:18karolherbst: which point wasn't?
08:18MaximLevitsky: To be honest the only real solution to that is when someone, someone like Linus Torvalds comes up with a way to create an inexpensive silicon process so that we as a community could design our own chips.
08:18HdkR: The one I don't remember and now the site is gone so woop
08:18karolherbst: HdkR: :D
08:18MaximLevitsky: Otherwise the silicon world is closed and full of such shit
08:19MaximLevitsky: In fact even when running Linux these days, the system is barely open source, due to so much firmware running everywhere.
08:19karolherbst: but having an open ISA helps a lot
08:19karolherbst: because you don't have to care about the tooling
08:20MaximLevitsky: I understand that firmware is needed, and it can be closed source, but the assumption always was that firmware is relatively simple and dumb
08:20HdkR: karolherbst: How much time is wasted documenting the ISA each time it changes on Nvidia do you think?
08:20karolherbst: less time than making it work with the older one
08:20MaximLevitsky: These days each firmware is full blown OS......
08:20karolherbst: MaximLevitsky: *toy OS
08:21karolherbst: at least the linux one is based on a toy OS
08:21karolherbst: intel one
08:21MaximLevitsky: I understand what you mean :D
08:21karolherbst: no, literally
08:21karolherbst: but yeah
08:21MaximLevitsky: Minix is for sure a toy OS :D
08:22karolherbst: although v3 was made for embedded systems...
08:23HdkR: karolherbst: er, I mean. Volta - New ISA, have to redocument all the crap. Maxwell - New Isa, had to redocument....Kepler? Or was it Fermi..?
08:23karolherbst: HdkR: kepler also got a new one
08:23HdkR: lol, derp
08:23MaximLevitsky: my name at redhat is mlevtsk, so I soon will use it here too
08:24HdkR: It's such a waste of time running the cuda compiler through its paces to get the isa documentation
08:24karolherbst: HdkR: oh well
08:24karolherbst: HdkR: intel is changing their internal ISA as well
08:24HdkR: That'll be fun to read the documentation from 01.org :P
08:24karolherbst: they just have the disadvantage that they have to add a compatibility layer for x86
08:25karolherbst: it's secret
08:25HdkR: Oh, the uarch on the CPU you mean?
08:25MaximLevitsky: The funny thing is that when I worked at Intel, my name was mlevtsky (They forgot the 'i'), and here due to 8 char limit, I lost 'y' at the end of my name :-)
08:25HdkR: That's not public facing so it is something I don't really care about :P
08:25karolherbst: MaximLevitsky: when did you join?
08:25HdkR: If the Gen ISA changes that'll be fun to read about
08:26MaximLevitsky: Last Tuesday
08:26karolherbst: MaximLevitsky: ahh
08:26MaximLevitsky: Actually Monday
08:26MaximLevitsky: kvm team
08:27HdkR: That sounds like it'll be a team with unique challenges
08:27MaximLevitsky: I will be working on improving support for virtualization of fast storage
08:27MaximLevitsky: HdkR: that for sure.
08:28karolherbst: yeah, that kind of sounds useful
08:28karolherbst: never really used kvm with storage backed by a SSD
08:28karolherbst: or do you mean fast storage as in _super_ fast storage?
08:29MaximLevitsky: karolherbst: more like super fast storage. the SSD stuff of course
08:29karolherbst: like 10x nvme SSD Raid0 setups
08:29HdkR: PCIe storage is great
08:29karolherbst: yeah, multi queue + super fast
08:29MaximLevitsky: I guess so, we focus on NVMe of course
08:29karolherbst: ohh, I see
08:29MaximLevitsky: karolherbst: exactly
08:30karolherbst: some days ago somebody was complaining about getting "1 Mbit/s" as overall system read, I was like: yeah well, 100 IOPS, what do you expect on a HDD :D
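The arithmetic behind that remark can be sketched as throughput = IOPS × request size. The helper name `iops_to_mbit_per_s` and the 4 KiB request size are assumptions for illustration; the chat does not say what request size was involved.

```c
/* Converts an I/O rate into throughput in Mbit/s (10^6 bits per second).
 * request_bytes is the size of each I/O request; 4096 below is an assumed
 * value, not something stated in the conversation. */
static double iops_to_mbit_per_s(double iops, double request_bytes)
{
    return iops * request_bytes * 8.0 / 1e6;
}
```

With 100 IOPS and 4 KiB requests this comes out at a little over 3 Mbit/s, the same order of magnitude as the quoted "1 Mbit/s"; smaller requests push it lower still.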
08:31karolherbst: anyway, that nvme stuff is already quite close to hitting the pcie v3 x4 bandwidth, which is kind of insane
08:32HdkR: We just need PCIe v4/v5 to fix that issue ;)
08:32HdkR: Sadly desktop x86 chips don't ship that yet
08:32karolherbst: HdkR: no...
08:32karolherbst: v4/v5 won't help
08:32karolherbst: v4 is just double the speed
08:33karolherbst: you get nearly 4GB/s with v3, 8GB/s with v4
08:33karolherbst: but the SSDs are already hitting the 4GB/s mark
08:33HdkR: Isn't the upperend of NVMe SSDs hitting the upper ends on reads?
08:33karolherbst: so, we get v4, and one year later we have 8GB/s SSDs
08:34karolherbst: I doubt that the M.2 interface will be able to use more than x4, maybe this could be somehow bumped with a new interface
08:34karolherbst: x16 v4 gives nearly 32GB/s which is a bit harder to reach
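The figures quoted here follow from the per-lane signalling rate and the line encoding. A rough sketch, assuming 8 GT/s per lane for PCIe 3.0, 16 GT/s for PCIe 4.0, and 128b/130b encoding for both; the function name `pcie_gbytes_per_s` is made up for this example, and packet and protocol overhead is ignored, so real devices see somewhat less.

```c
/* Approximate usable bandwidth of a PCIe link in GB/s (10^9 bytes/s).
 * Only generations 3 and 4 are handled; anything else returns -1. */
static double pcie_gbytes_per_s(int gen, int lanes)
{
    double gtransfers_per_lane;

    switch (gen) {
    case 3:
        gtransfers_per_lane = 8.0;   /* 8 GT/s per lane */
        break;
    case 4:
        gtransfers_per_lane = 16.0;  /* 16 GT/s per lane */
        break;
    default:
        return -1.0;
    }

    /* 128b/130b encoding: 128 payload bits per 130 bits on the wire,
     * then divide by 8 to convert bits to bytes. */
    return gtransfers_per_lane * (128.0 / 130.0) / 8.0 * lanes;
}
```

This gives roughly 3.9 GB/s for v3 x4, 7.9 GB/s for v4 x4, and 31.5 GB/s for v4 x16, in line with "nearly 4", "8", and "nearly 32" above.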
08:34HdkR: If you key it differently you might be able to fit x8 on to it
08:35karolherbst: I think in the end we might get a completely new interface
08:35karolherbst: which works more like RAM
08:35karolherbst: and the CPU just states a top speed
08:35karolherbst: but I guess this would require too many sw changes
08:35HdkR: You mean like the Intel Optane Dimms but is actually storage rather than RAM? :)
08:36MaximLevitsky: Will reboot my system and will debug some stuff. Bye for now!
08:36karolherbst: HdkR: optane is storage
08:36karolherbst: they are basically simply enterprise level SSDs
08:37HdkR: Optane in DDR4 form factor :D
08:37karolherbst: yeah well
08:37karolherbst: that might do
08:37HdkR: Stick a boot drive in your DIMM slot
08:37karolherbst: how fast are those?
08:38karolherbst: ohh, not disclosed
08:38HdkR: No information on them yet
08:39MaximLevitsky: I still don't know if the optane is a new kind of memory or it is DRAM + Flash backing + battery in the same package
08:39MaximLevitsky: Any idea?
08:40karolherbst: they just have some special memory chips
08:40HdkR: XPoint is a type of persistent storage, kind of like NAND
08:40karolherbst: and they aren't that great either
08:41karolherbst: you already get consumer grade hardware with 500k IOPS
08:41karolherbst: for much cheaper
08:41HdkR: Doesn't have quite the performance numbers that Samsung gets, supposed to be quite a bit cheaper though
08:42karolherbst: no, the optane ones are frigging expensive
08:42karolherbst: 1.5 K$ for 375GB
08:42karolherbst: I don't really know where they are better
08:46MaximLevitsky: Thats funny...
08:46MaximLevitsky: I like that 'floptane':-)
08:46MaximLevitsky: But I hope that it does takes of
08:46MaximLevitsky: *take off
08:50karolherbst: why though?
08:50karolherbst: overpriced SSDs?
08:53HdkR: Oh wait, it was meant to cost more than traditional NAND but less than RAM. Right
08:56karolherbst: HdkR: while being faster than NAND SSDs
08:56karolherbst: I doubt they will be able to hold up to that
08:56karolherbst: maybe the DDR4 optanes will be much faster
08:57HdkR: Seems like they have super low latency, but throughput isn't winning anything with the current released products
08:57karolherbst: well, IOPS aren't that high either
08:59HdkR: Yea. wonder what the bottleneck is there.
08:59karolherbst: HdkR: the dimm optanes have to be faster than 10GB/s to stay relevant
08:59karolherbst: or well, to become relevant
09:00HdkR: They have to compete with the highend PCIe SSDs
09:00HdkR: not the M.2 drives stuck on a 4x lane configuration
09:00HdkR: 8x and 16x things, their own products even
09:00karolherbst: mhh, true
09:01karolherbst: so the fastest are around 6.5 GB/s
09:01karolherbst: kingston DCP1000
09:01karolherbst: which are pcie x8 ones
09:01karolherbst: not that expensive actually
09:01karolherbst: 900€ for 800GB
09:02karolherbst: 900k IOPS read
09:03HdkR: I think there are some random non-consumer 16x SSDs around as well
09:03HdkR: No idea if they are actually products since they are outside of the consumer space
09:04karolherbst: HdkR: https://www8.hp.com/us/en/workstations/z-turbo-drive-g3.html
09:04karolherbst: so 9 GB/s
09:05HdkR: Hopefully the DIMM variants compete with that
09:05karolherbst: anyway, that doesn't change the fact that intel has to reach at least 10 GB/s ;)
09:07karolherbst: uhm wait
09:08karolherbst: DDR4 dimms aren't that much faster actually
09:08karolherbst: cpus support like 37.5 GB/s
09:08HdkR: In dual channel right?
09:09karolherbst: well and I am sure you can go up to 70 GB/s in overclocking situations with XMP
09:09HdkR: Who needs PCIe v5 when your bottleneck is RAM? :P
09:09karolherbst: fastest are 4600 MHz
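These DDR4 numbers can be sanity-checked with peak bandwidth = transfer rate × 8 bytes per transfer × channels. The sketch assumes a 64-bit (8-byte) bus per channel and reads the "4600 MHz" figure as 4600 MT/s; the function name `ddr_peak_gbytes_per_s` is hypothetical.

```c
/* Theoretical peak DDR bandwidth in GB/s (10^9 bytes/s).
 * Assumes each channel is 64 bits wide, i.e. 8 bytes per transfer. */
static double ddr_peak_gbytes_per_s(double mega_transfers_per_s, int channels)
{
    const double bytes_per_transfer = 8.0;

    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9;
}
```

Dual-channel DDR4-2400 works out to 38.4 GB/s, in the same range as the ~37.5 GB/s mentioned, and dual-channel at 4600 MT/s gives 73.6 GB/s, consistent with the "up to 70 GB/s" estimate.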
09:10HdkR: Time to give CPUs 8 site HBM2
09:10karolherbst: there will be DDR5 like next year
09:11HdkR: That will be nice
11:00karolherbst: skeggsb, imirkin: any ideas? https://gist.github.com/karolherbst/b10b86747af9b21a73f27450359a8d7d
11:01karolherbst: this is on a laptop with a gm204
11:02karolherbst: display connected through an active HDMI -> DP adapter
11:08karolherbst: I also hit that "disp: 0x00006341: INIT_GENERIC_CONDITON: unknown 0x07" thing
11:08karolherbst: it sometimes kind of seems to work
11:08karolherbst: especially when the laptop display is turned on
11:09karolherbst: also I know that the cable is a little broken, but it usually works quite well
11:27karolherbst: and sometimes this happens: https://fpdl.vimeocdn.com/vimeo-prod-skyfire-std-us/01/996/11/279980046/1049345788.mp4?token=1531582025-0xbf96569990220febf23578c4c1e20089c74b0ba7
16:29karolherbst: imirkin: mind reviewing those two patches? I would like to get all the cts fixes in and see how the situation is on kepler later. https://patchwork.freedesktop.org/series/45307/ and https://patchwork.freedesktop.org/series/45313/
16:39annadane: is there *any* way whatsoever to convince nvidia to open up their process, or do they still claim "because security reasons"
16:40imirkin: annadane: of course there is
16:41imirkin: just condition the purchase of a few $B of hardware on it. i'm sure it'll get their attention.
16:41imirkin: karolherbst: i'll try to get to it in the next hour or two
16:42karolherbst: annadane: patience
16:42karolherbst: imirkin: nice, thanks!
16:42karolherbst: annadane: there is "some" progress, we can just hope that progress will be faster in the future
16:43karolherbst: annadane: of course being a customer wanting to spend 1B $ on GPUs and requiring open drivers might help
16:44pendingchaos: imirkin: any idea what MEM_BARRIER does?
16:46imirkin: flushes constbuffer caches
16:46imirkin: 0x1011 is the "correct" value to put in there
16:46imirkin: [but no, not _really_ sure...]
16:54scientes: what is the cheapest card that does HEVC?
16:55imirkin: scientes: your intel GPU should be able to do it, if it's new enough
16:55scientes: i already have an old nvidia card that doesn't
16:55scientes: this is a desktop
16:56imirkin: nouveau doesn't support HEVC decoding in any case
16:56scientes: i guess i can mix-and-match nouveau and ati?
16:56scientes: oh ok
16:56imirkin: i think some GM20x's support it, but not all? not sure, tbh.
16:56imirkin: video decoding support stops at kepler for nouveau though
16:57imirkin: i think AMD gpu's should support it well (the newer ones, obviously)
16:57imirkin: and in general have a vastly better driver stack
17:02scientes: i can run a nouveau card and amd-gpu at the same time?
17:04imirkin: scientes: just not out of the same PCIe slot ;)
17:04scientes: yeah i have two x16s
17:04imirkin: they're just PCIe devices
17:04imirkin: like any other
17:04imirkin: one will be the primary for VGA and such
17:04scientes: i just heard that fglrx and nvidia-binary conflicted before
17:05imirkin: oh, that may be.
17:05scientes: I plan on vga forward with kvm
17:05imirkin: but nouveau + amdgpu should be no trouble
17:05imirkin: well, kvm is later
17:05imirkin: at boot time, for VGA access, the PCI bus can only forward those to a single device
17:06imirkin: this tends to become your "primary" GPU, although the only thing that's special is that it receives VGA io port reads/writes
17:07imirkin: (and probably some vga memory mapping stuff too)
21:37pendingchaos: imirkin: am I correct in thinking https://github.com/mesa3d/mesa/blob/master/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c#L583 is an optimization to avoid setting up the binding?
21:54imirkin: glUniform* sets up uniforms in the "default" uniform buffer
21:54imirkin: on the mesa side, it's just kept in user memory
21:54imirkin: and is passed to gallium as a "user" constbuf
21:55imirkin: that means that on every draw we have to upload the data
21:55imirkin: but also we keep track of the previous max size, since that's something that has to be associated with the binding
21:56imirkin: and update it if it changes
21:58pendingchaos: can you elaborate on why it needs to keep track of the previous max size?
21:58imirkin: the length of the constbuf is part of the binding
21:59imirkin: to avoid updating it every time, we keep track of it
22:00pendingchaos: so doing "uniform_buffer_bound[s] < size" instead of just "true" is an optimization to avoid needlessly setting up the binding?
22:10nyef: It's "if (!sufficient_space(...)) allocate_more_space(...);", isn't it?
22:11pendingchaos: I think so
22:13nyef: It's probably also one of those things that I'd paper over with a separate (possibly static inline) function.
22:14nyef: Might even make it an "ensure" function, wrapping the entire thing, and putting the sufficient-space test as a guard-clause.
22:16imirkin: pendingchaos: well, needlessly *changing* the binding
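The pattern discussed here (upload the user constbuf on every draw, but only change the binding when the tracked size is too small) can be sketched with nyef's "ensure" function and guard clause. This is an illustration only, not the nvc0 code: `struct fake_ctx`, `ensure_uniform_binding`, `upload_user_constbuf`, the stage count, and the 0x100 alignment are all assumptions, and counters stand in for the real command-stream writes.

```c
#include <stdint.h>

#define NUM_STAGES 6 /* assumed number of shader stages */

/* Stand-in for driver state; the counters replace real hardware commands. */
struct fake_ctx {
    uint32_t uniform_buffer_bound[NUM_STAGES]; /* size currently bound per stage */
    unsigned rebinds;                          /* times the binding was changed */
    unsigned uploads;                          /* times data was uploaded */
};

/* Rounds v up to a multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Makes sure the binding for stage s covers at least size bytes. */
static void ensure_uniform_binding(struct fake_ctx *ctx, int s, uint32_t size)
{
    /* Guard clause: the existing binding is already large enough. */
    if (ctx->uniform_buffer_bound[s] >= size)
        return;

    ctx->uniform_buffer_bound[s] = align_up(size, 0x100);
    ctx->rebinds++; /* real code would emit the new binding here */
}

/* Called on every draw: the data is always uploaded, the binding rarely changes. */
static void upload_user_constbuf(struct fake_ctx *ctx, int s, uint32_t size)
{
    ensure_uniform_binding(ctx, s, size);
    ctx->uploads++; /* real code would push the uniform data here */
}
```

Because the tracked size only grows, a sequence of draws with sizes 64, 128, and 300 bytes uploads three times but changes the binding only twice, which is the saving imirkin describes.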