01:04HdkR: karolherbst: I'm going to guess you missed a comment I made a few days ago? :P
04:23rhyskidd: oh goodie, new netlist regions in ucode :)
04:23rhyskidd: incoming to an envytools near you
04:48rhyskidd: imirkin: https://github.com/envytools/firmware/pull/1
04:49imirkin: rhyskidd: how'd you like Go?
04:49rhyskidd: great if you could merge that. i'm now seeing no further unknown netlist regions
04:49rhyskidd: that script is like my third Go file
04:49rhyskidd: i like it so far, but am probably writing in too "Python" a way
04:49rhyskidd: i'm sure there's lots of cool language features i'm not making use of
04:50rhyskidd: i used it for a bloom filter before, kinda no benefits over python for bitfield hacks that i could see
04:51imirkin: except not-slow
04:51imirkin: i really like the threading model
04:51rhyskidd: well, i wasn't doing much concurrent threading
04:51rhyskidd: yeh, that's what i hear is the big plus of Go
04:52rhyskidd: and it's taking off within big G for many of their "routine-three-year-api-service-rewrites" for that reason
04:57rhyskidd: how's that GT1030 going for you?
04:59imirkin: i actually have a GTX960 plugged in - have been meaning to play with DP-MST
04:59imirkin: but thus far ... haven't had a chance. this weekend will be the one!
05:07rhyskidd: this weekend will be the one! <-- motto of nouveau ...
05:07imirkin: 3 days
05:07imirkin: the odds are 50% higher
08:02karolherbst: HdkR: depends
08:03karolherbst: imirkin: ohh, for the nouveau ddx?
09:04pmoreau: Hello, I am having some issues getting an OpenGL context on my laptop (both on the MCP79 and G96); when using the modeset DDX, eglGetDisplay() fails (https://hastebin.com/gefebakefa.bash) but with the Nouveau DDX I don’t see any obvious failures (https://hastebin.com/eroninizuh.pl).
09:06pmoreau: Using `LIBGL_DEBUG=verbose` sadly does not give any more information; the error I am getting is the following: https://hastebin.com/famupawiku.js
09:07pmoreau: (And this is with Mesa=18.2.2, Linux=4.18.12, X=1.20.1, libdrm=2.4.95)
09:12karolherbst: okay... nice. I fixed running nouveau with glamor, but texturing is a bit broken
09:13karolherbst: pmoreau: you won't have much fun with the modesetting DDX anyway
09:14pmoreau: Well, if it lets me run glxgears and other small OpenGL applications, that’s already quite good! :-D
09:14karolherbst: it will crash
09:14karolherbst: glamor creates multiple GL contexts ;)
09:14karolherbst: but yeah, no clue why eglGetDisplay fails
09:14pmoreau: Is that a new feature?
09:14karolherbst: was there since day one afaik
09:15karolherbst: glamor + nouveau was always a bad combination afaik
09:15pmoreau: Ah, okay, never ran into the issue then.
09:15pmoreau: Do you also agree that there doesn’t seem to be anything wrong in the Xorg log when using the Nouveau DDX?
09:16karolherbst: ohh uhm... are you sure mesa is getting used?
09:16pmoreau: I am not
09:16karolherbst: and did you check dmesg?
09:17karolherbst: nvidias userspace tries to load nvidia modules regardless of any blacklisting anyway
09:17karolherbst: so something might show up
09:17pmoreau: It looked sane: https://hastebin.com/qefosutozu.coffeescript
09:18karolherbst: LD_DEBUG=libs in worst case and see what libGL gets loaded
09:18karolherbst: or libEGL
09:19pmoreau: It’s using the NVIDIA libGL
09:19karolherbst: what a surprise :p
09:19pmoreau: Stupid thing!
09:19karolherbst: I am sure with libglvnd everything will be better :p
09:20pmoreau: I’m sure NVIDIA will update their 340xx driver to add libglvnd support
09:20pmoreau: Thanks, I couldn’t remember how to show which libs were being used
09:20karolherbst: I have _no_idea_ why they didn't add version dispatching as well btw
09:21karolherbst: because, if you update nvidia packages, your only choice is to reboot :/
09:22pmoreau: Yeah... I just removed the NVIDIA driver completely, I don’t think I’ll need it on that old laptop.
09:23pmoreau: BTW, I haven’t made any progress regarding the reviewing, and most likely won’t before the 15th of October, due to paper deadline on that day; barely a week left, and so many things left to look at + write the paper.
09:24karolherbst: I see
09:24karolherbst: I guess I will rerun piglit and just push updated patches then
12:09karolherbst: Lyude: I fixed texturing: https://github.com/karolherbst/mesa/commit/8604c43c58d3935a41bcab1ad9b84c568736fef4
12:09karolherbst: now even chromium runs fine (with glamor!)
12:11karolherbst: or well. more or less
12:28karolherbst: still some minor corruptions in a few places.. but I think piglit might figure that out
15:37imirkin: karolherbst: not sure why you're proceeding on this path... it's a non-starter to have separate hw contexts per thread.
15:38karolherbst: I seriously don't see the problem with that, because it actually works and it makes applications which won't work, working.
15:38karolherbst: and has 0 impact on the common one context per application case
15:39karolherbst: sure, we could optimize and have a solution with higher performance or what not
15:39imirkin: it's the wrong way to do it
15:39imirkin: and it will slow down any application that tries to upload textures via one context, and render via another
15:40imirkin: because the right way is to do everything over one context
15:42wrl: does nouveau still not support multiple contexts in the same process?
15:42karolherbst: are there actually applications doing that? also from an API perspective you have to explicitly share resources anyway
15:42karolherbst: wrl: it does, it just doesn't work
15:42wrl: karolherbst: hah!
15:42wrl: that's the best kind of support
15:42karolherbst: well, it crashes at some point
15:42wrl: i have an audio plugin that uses GL for the UI and it breaks once more than one is open at the same time
15:43imirkin: karolherbst: yes.
15:43imirkin: wrl: nouveau supports multiple contexts just fine. it doesn't support concurrent operations on them.
15:43karolherbst: I mean, without numbers any argumentation about context switching speed is useless anyway
15:43karolherbst: as we don't know the numbers
15:44wrl: imirkin: alright cool. yeah the contexts are completely independent
15:44karolherbst: and with my patch I can actually benchmark what impact that solution would have
15:44karolherbst: wrl: doesn't matter
15:44karolherbst: wrl: concurrent operations on multiple contexts -> crash
15:44karolherbst: not on the same resource
15:45karolherbst: imirkin: also, I am not a big fan of adding synchronization points, which your plan implies
15:45imirkin: karolherbst: and your plan doesn't? hw isn't magic...
15:45karolherbst: sure it does, but not inside software
15:45imirkin: karolherbst: fewer context switches = better
15:45wrl: karolherbst: oh yeah that'd do it
15:46imirkin: a lot faster to reload a handful of parameters than to switch an entire context
15:46karolherbst: right, but I still don't agree on that one pushbuffer idea
15:46imirkin: the internal synchronization will be for very short time periods, never something like rendering
15:46karolherbst: anyway, I want to have numbers before jumping to conclusions
15:47karolherbst: maybe my idea is indeed significantly slower
15:47karolherbst: maybe it doesn't matter
15:47karolherbst: I don't know and you don't know it either
15:47imirkin: well, i know it based on what i was told by others. context switch = slow.
15:47imirkin: (others = calim or mwk, not sure)
15:47imirkin: (maybe both)
15:48karolherbst: yeah. maybe it was slow on older hardware and is reasonably fast on newer
15:48karolherbst: or maybe the other way around
15:48RSpliet: context switch is 25μs
15:48karolherbst: RSpliet: frontend or shader?
15:48RSpliet: on Fermi, Kepler and presumably Maxwell
15:48RSpliet: They run in lock-step
15:49RSpliet: context switch request reaches FE, that passes the request on to all GPCs
15:49karolherbst: RSpliet: I would assume the FE waits until the shaders are done, because that would be reasonable, more or less
15:50RSpliet: yes, 25μs excludes waiting for GPCs to idle, which costs time as well
15:50karolherbst: okay sure, but that doesn't cause perf penalties
15:50RSpliet: (as one GPC could idle sooner than the others, wasting resources)
15:50karolherbst: as long as the GPU is actively doing something, it basically doesn't matter
15:51RSpliet: But the GPCs don't all idle at the exact same time.
15:51karolherbst: okay, valid point
15:51karolherbst: but 25μs isn't that much, I've heard higher numbers
15:52RSpliet: These are my measurements ;-) On newer GPUs that have more fine-grained context switch support, waiting time is reduced but context switch time according to NVIDIA can be up to 100μs
15:52karolherbst: RSpliet: ohh, you are talking about compute, right?
15:52karolherbst: yeah, 100μs is also what I've heard
15:52RSpliet: Doesn't matter, a context is a context. I measured GL
15:52karolherbst: do you know how aggressively nvidia tries to share one context?
15:53RSpliet: As aggressive as a grime artist.
15:54RSpliet: Tbh, idk. I based my info on nouveau and the Fermi whitepaper.
15:54karolherbst: I see
15:55RSpliet: I'm not sure whether a HW context can be shared between GL and D3D, or CUDA, or OpenCL...
15:55karolherbst: I think this should work more or less? dunno though
15:56RSpliet: contexts also contain configuration bits for things like rounding modes, denorm... w/e distinguishes between the two.
15:58karolherbst: right, but we could deal with that
15:58karolherbst: you just need to adjust the context accordingly
15:59karolherbst: but, my biggest concern is that we would have to over synchronize inside mesa and basically have to resubmit most state whenever we get a request on a different context
15:59karolherbst: even worse if we actually share one pushbuffer, because we would have to wait until the other thread is done
16:00karolherbst: and we would have to wait until the hw is done with the other context anyway
16:00karolherbst: more or less
16:00karolherbst: we still have the fifo, which I assume won't reorder stuff
16:04imirkin: even if you resubmit a bunch of state, that's still faster
16:04imirkin: you could have separate pushbufs, but you still have to be a little careful
16:04imirkin: for "shared" state bits
16:04imirkin: i don't think a ton of time goes into generating pushbuf contents though
16:04RSpliet: Essentially you'd be doing a software context switch if you force multiple applications into the same HW context. I don't have an answer for you on this matter, just numbers
16:05RSpliet: HW contexts provide isolation though, separate page tables and the like
16:05RSpliet: From a security perspective, this seems desirable
16:05karolherbst: but we talk about multiple contexts within the same application
16:05karolherbst: like GL + VDPAU
16:05imirkin: or GL + GL
16:05karolherbst: or that
16:06RSpliet: In that case single context could be desirable, sharing your VDPAU buffers with GL is easier if you have a unified view of your memory
16:06imirkin: you can have multiple hw contexts that share a single page table
16:07imirkin: in nouveau (and nvidia), i think a "client" owns the page tables
16:07karolherbst: yeah, just wanted to say that using the same physical memory should be fairly easy
16:07imirkin: i'm weak on whether client == hw context or not
16:07karolherbst: I use one client for multiple channels
16:08imirkin: right. we also have channels.
16:08imirkin: channel == hw context
16:08imirkin: client = vm
16:08imirkin: a single client can have any number of channels
16:08imirkin: (up to 128 or something)
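The client/channel relationship imirkin describes above can be sketched as a small data model. This is only an illustration of the stated relationship (one client owns the VM, a client can have many channels, each channel is one hw context); the class names, methods and the dict standing in for page tables are hypothetical, and the 128 limit is the chat's own rough figure, not a verified hardware value.

```python
from dataclasses import dataclass, field

# Rough figure quoted in the chat ("up to 128 or something"); an assumption, not a verified limit.
MAX_CHANNELS = 128


@dataclass
class Channel:
    """Stands in for one hw context."""
    chid: int


@dataclass
class Client:
    """Stands in for a client: owns one VM (page table) shared by all of its channels."""
    page_table: dict = field(default_factory=dict)
    channels: list = field(default_factory=list)

    def new_channel(self) -> Channel:
        if len(self.channels) >= MAX_CHANNELS:
            raise RuntimeError("channel limit reached")
        ch = Channel(chid=len(self.channels))
        self.channels.append(ch)
        return ch

    def map_buffer(self, virt_addr: int, phys_addr: int) -> None:
        # Mappings live in the client's VM, not in any single channel.
        self.page_table[virt_addr] = phys_addr

    def resolve(self, channel: Channel, virt_addr: int) -> int:
        # Any channel of this client sees the same mapping.
        if not any(ch is channel for ch in self.channels):
            raise ValueError("channel does not belong to this client")
        return self.page_table[virt_addr]


client = Client()
gl_channel = client.new_channel()
vdpau_channel = client.new_channel()
client.map_buffer(0x1000, 0xABC000)
```

In this model both channels resolve 0x1000 to the same physical address, which is the "shared vm, but multiple channels" situation karolherbst mentions.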
16:08RSpliet: imirkin: gotcha. In what way does that get messy if you allocate a buffer using GL calls, then free them with VDPAU APIs, to name something ridiculous :-)
16:08karolherbst: okay, so with my solution, we already have a shared vm, but multiple channels
16:08imirkin: RSpliet: can't happen
16:08imirkin: these sorts of things don't work that way...
16:09RSpliet: Well, they shouldn't...
16:09imirkin: the underlying hw resource is hidden
16:09imirkin: it's all done in terms of front-end concepts
16:09imirkin: which refcount the underlying hw resource
16:09imirkin: there is no *free* at the API level
16:10RSpliet: Ah yes that'd make buffer sharing easier
16:11karolherbst: but mhh, 100μs only allows up to 166 context switches per frame, which isn't that much in the end
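The 166 figure above works out if a frame is taken to be 1/60 s (about 16.67 ms); the 60 fps target is an assumption here, since the chat does not state it. A minimal sketch of the arithmetic, with a hypothetical helper name, ignoring the extra time spent waiting for GPCs to idle:

```python
def max_context_switches_per_frame(switch_cost_us: float, fps: float = 60.0) -> int:
    """Upper bound on switches per frame if the GPU did nothing but switch contexts."""
    frame_time_us = 1_000_000 / fps  # frame budget in microseconds
    return int(frame_time_us // switch_cost_us)


at_100us = max_context_switches_per_frame(100)  # NVIDIA's "up to 100μs" figure
at_25us = max_context_switches_per_frame(25)    # RSpliet's measured 25μs
```

With these inputs the bound is 166 switches per frame at 100μs and 666 at 25μs.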
16:12karolherbst: but I kind of doubt that applications would even consider making use of multiple contexts that aggressively, or maybe they do?
16:13karolherbst: also, a texture upload shouldn't lead to a context switch if it is really just about uploading, or do I miss something here?
16:13imirkin: they might use one context to upload textures
16:13imirkin: but like
16:13imirkin: those commands gotta go somewhere
16:13karolherbst: mhhh, true
16:14karolherbst: would be interesting to know if the firmware actually does full context switches or is able to execute things on multiple contexts in parallel
16:14karolherbst: as long as stuff is separated enough
16:14imirkin: karolherbst: btw, i assume you haven't done any work towards fixing our broken textures since ... like a year ago?
16:14karolherbst: ohh right, we also have that issue :(
16:14imirkin: if not, i might tackle it this weekend
16:15karolherbst: ahh okay, nice
16:17pendingchaos: broken textures?
16:18karolherbst: yeah... nothing really hits it, but there are a few applications
16:25karolherbst: imirkin: anyway, I think I am more or less done on nvc0 with that one channel per context approach. I think I will just start another patch doing the shared channel thing and see how far I get there. But I think for that we need to rework how we are doing stuff inside mesa. Like don't have command submissions spread out everywhere, but rather just store stuff in context and submit it if we actually need it on hardware
16:25karolherbst: but that would help with both approaches anyway
16:27karolherbst: or maybe all push buffer accesses are indeed only triggered through validate... need to check on that
16:53imirkin: pendingchaos: state tracking got broken by some st/mesa changes
16:53imirkin: (or cso changes? i forget)
16:54imirkin: karolherbst: yes, fixing it for real is more effort. but it's well worth it.
16:55imirkin: having an "in-between" fix is highly undesirable, since it makes it less pressing to do the real fix
16:56pendingchaos: https://bugs.freedesktop.org/show_bug.cgi?id=106577 ?
17:09karolherbst: imirkin: yeah, I know
17:26imirkin: pendingchaos: ya
17:26imirkin: i'm 73% sure it's triggerable in GL
17:26imirkin: while i was at it, i also wanted to fix another issue with texture buffers on nv50/fermi
17:26imirkin: whereby if sampler 0 isn't configured, texture buffers don't work
17:27imirkin: it was confirmed that this is a side-effect of how the linked_tsc=0 sampling logic works
18:12karolherbst: imirkin: I thought we actually had gl applications breaking due to it? Or how did we even notice that bug?
18:13imirkin: karolherbst: yeah, it can happen in GL too
18:13imirkin: just much rarer/harder
18:13karolherbst: ohh, was the initial report actually about vdpau?
18:13karolherbst: or nine?
18:13HdkR: karolherbst: The comment that SM70 Cuda ELF files can run on SM75 without modification :P
18:14karolherbst: HdkR: sounds reasonable
18:14karolherbst: afaik SM75 just adds stuff
18:14HdkR: woo stuff
18:16karolherbst: but uhm
18:16karolherbst: SM70 cuda elf files will run much slower
18:16karolherbst: SM75 added a few optimizations which the hw depends upon
18:16imirkin: karolherbst: nine
18:16imirkin: it alerted me to the situation
18:16karolherbst: turing is a bit weird, because the cores aren't faster
18:17HdkR: I wouldn't say it necessarily runs slower, just isn't as optimal as it could be :P
18:17karolherbst: HdkR: it runs slower
18:17karolherbst: HdkR: I am 75% sure that for same clock speed it will be slower
18:18karolherbst: HdkR: but you should know that better than I do anyway :p
18:19karolherbst: but SM70 vs SM75 might not be that much of a difference
18:19karolherbst: was the FPU/ALU split already done on SM70?
18:19HdkR: That is already stated in the whitepaper
18:19karolherbst: then maybe it indeed runs as fast as
18:19HdkR: er, whitepapers*
18:20HdkR: Only advertised strongly on Turing
18:20karolherbst: well, who uses volta anyway
18:21HdkR: Someone who has stupid amounts of money, wants FP64 perf, and needs CUDA so they can't run their code on AMD?
18:21karolherbst: no, I meant compared to any quadro pascal card
18:21karolherbst: although those 3k$ aren't that much if you think about average development costs anyway
18:22HdkR: Oh, someone that really abuses the Tensor cores
18:22karolherbst: I mean, it makes a difference if you need those cards on production hardware or for development
18:22karolherbst: but I guess most companies don't care as much
18:22karolherbst: or maybe some do
18:23HdkR: Still 61% higher FP64 perf on GV100 versus P100. So if you really need FP64 it is a decent choice I guess?
18:27karolherbst: what I meant is, you can develop on a crappy GPU and then just buy those expensive things for production
18:28glennk: speaking of context switches, pascal and later have some non-preemptive version of it that is cheaper?
19:27karolherbst: imirkin: btw, what issues are you seeing inside libdrm_nouveau so that you suggest removing it?
19:28karolherbst: personally I think we should have some kind of abstraction layer somewhere between the API implementations and kernel space, but it doesn't have to be the current API of libdrm_nouveau. I just think it is a good idea to have something we could replace without much trouble
19:28imirkin: submitting a command buffer forces a callback into the kick handler
19:29imirkin: and submits can get triggered by all sorts of stuff
19:29imirkin: like nouveau_bo_wait()
19:29imirkin: or even more innocuous things
19:29imirkin: like nouveau_bo_map()
19:29imirkin: which makes any kind of locking scheme impossible to implement
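The problem imirkin describes above (an innocuous-looking call implicitly submits, and the submit calls back into the driver's kick handler) can be illustrated with a toy model. This is not the libdrm_nouveau API: `Pushbuf`, `bo_map`, `kick_handler` and `state_lock` are hypothetical stand-ins, and a plain non-reentrant `threading.Lock` is assumed as the locking scheme.

```python
import threading


class Pushbuf:
    """Toy pushbuffer: kicking it always calls back into the owner's handler."""

    def __init__(self, kick_notify):
        self.kick_notify = kick_notify
        self.pending = []

    def kick(self):
        self.pending.clear()    # pretend the commands were submitted
        self.kick_notify(self)  # forced callback into the kick handler


def bo_map(push):
    """Stand-in for an innocuous call that implicitly flushes pending work."""
    push.kick()


state_lock = threading.Lock()  # non-reentrant lock guarding driver state
observed = []                  # records whether the handler could take the lock


def kick_handler(push):
    # A blocking acquire here would deadlock whenever the caller already holds
    # state_lock, so this sketch only probes and records the result.
    got = state_lock.acquire(blocking=False)
    observed.append(got)
    if got:
        state_lock.release()


push = Pushbuf(kick_handler)

with state_lock:  # caller holds the lock and maps a buffer
    bo_map(push)  # handler runs re-entrantly and cannot take the lock

bo_map(push)      # same call without the lock held: handler can take it
```

The handler cannot know which of the two situations it is in, which is one way to read the point that the callback makes a locking scheme hard to get right.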
19:29imirkin: agreed on the abstraction layer bit
19:30imirkin: but it should be harder to use :)
19:30imirkin: the current libdrm_nouveau api is "easy" to use
19:30imirkin: but as a result it introduces some of these problems
19:30imirkin: the api should be more manual
19:30HdkR: karolherbst: Oh yes, definitely. If you don't need the perf then the feature set is there. Wait until you actually need the $3000 GPUs to spend the monies on them :P
19:30karolherbst: I would prefer an API where we don't have to lock at all, but maybe there is no way around that
19:32karolherbst: imirkin: you mean a call into nouveau_pushbuf_kick?
19:32karolherbst: which calls into push->kick_notify
19:34imirkin: karolherbst: yeah
19:34HdkR: Like OpenCL you mean? :)
19:35karolherbst: HdkR: no, I meant the libdrm API
19:35HdkR: oh wait, misinterpreted
19:35karolherbst: I would prefer to find a way where threads don't have to lock stuff all the time
19:36karolherbst: like if every context would have its own pushbuffer, that callback situation might not be an issue anymore
19:36karolherbst: but then you have that "screen->cur_ctx->state.flushed = true;" thing
19:37karolherbst: but... you could use "push->context->state.flushed = true" instead
19:37karolherbst: currently pushbuffers are bound by the screen so they don't have any context references
19:38karolherbst: imirkin: mhhh maybe we should have a multiqueue and queue all pushbuffers from all contexts
19:38karolherbst: and push/kick them round robin like :p
19:38karolherbst: to prevent sw/hw context switches
19:39karolherbst: I mean, command submissions
19:41imirkin: not sure what that would help
19:41karolherbst: but I guess that is already done more or less, depending on what API calls you have
19:42karolherbst: never actually looked into how the pushbuffers are filled and when they are actually submitted to the hardware
19:42karolherbst: so maybe it doesn't even make sense what I said
19:45karolherbst: imirkin: well, if you have two threads active on two contexts and they are causing the pushbuffer to be submitted, you might have to change the contexts on every submit, right?
19:47karolherbst: so you could reorder things to reduce the amount of context switches, no?
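A rough sketch of the reordering idea above: count how often consecutive submissions change context, then regroup the queue so each context's submissions run back to back. It assumes submissions from different contexts are independent of each other, which real workloads with shared resources would not guarantee; the function names and the (context, command) tuples are hypothetical.

```python
def count_switches(submissions):
    """Number of times consecutive submissions target a different context."""
    contexts = [ctx for ctx, _ in submissions]
    return sum(1 for a, b in zip(contexts, contexts[1:]) if a != b)


def batch_by_context(submissions):
    """Stable regroup: keep each context's own order, run contexts back to back."""
    order = []
    buckets = {}
    for ctx, cmd in submissions:
        if ctx not in buckets:
            buckets[ctx] = []
            order.append(ctx)  # contexts keep their first-seen order
        buckets[ctx].append((ctx, cmd))
    return [item for ctx in order for item in buckets[ctx]]


interleaved = [("A", "draw0"), ("B", "upload0"), ("A", "draw1"), ("B", "upload1")]
batched = batch_by_context(interleaved)
switches_before = count_switches(interleaved)
switches_after = count_switches(batched)
```

For this four-entry queue the interleaved order changes context three times and the batched order once. Holding submissions back to batch them adds latency, which is the trade-off raised in the next message.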
19:48karolherbst: I mean, you should still submit stuff and not wait. It only really makes sense if something is still in the fifo, etc...
19:54karolherbst: imirkin: btw, the new chromium might cause some serious issues with nouveau. chromium 69.
19:54HdkR: Is it doing threaded GL things?
19:54karolherbst: you got those multi context crashes, sure, but even with my patch I get misrendering
19:54imirkin: it disabled accel on nouveau
19:55karolherbst: yeah... it is pretty bad
19:55imirkin: i bitched them out on the bug tracker, they said to reach out more directly
19:55imirkin: i'm fixing a handful of webgl test failures to see how bad the situation is overall
19:55karolherbst: it isn't about webgl though
19:55karolherbst: the entire GUI is broken
19:56karolherbst: dunno if you installed 69 already
19:56imirkin: no, they have that lame user disaster
19:56karolherbst: :D ohh true
19:56karolherbst: but the interface is also different
19:56karolherbst: looks like pure OpenGL
19:56karolherbst: but I think they already did it before?
19:57karolherbst: anyway, I encountered quite a lot of misrendering
19:57karolherbst: might be some deeper multi context issues we have or I messed up...
20:02imirkin: karolherbst: do you have an updated mesa? if you're on a maxwell+, you'll need pendingchaos's fix probably
20:09karolherbst: imirkin: I pulled today