01:08imirkin: karolherbst: yes, for implementing KHR_blend_equation_advanced
02:34AndrewR: imirkin, thanks for reviewing patch fixing postprocessing shaders, now they appear to work \o/
02:34imirkin: cool
02:35imirkin: i'm surprised people use those
06:33pabs3: pmoreau, imirkin: I managed to replace the old GPU with a friend's spare GeForce GT 740. seems to work with nouveau, anything I should know about the support for it?
07:12pmoreau: pabs3: You might run into this bug: https://bugs.freedesktop.org/show_bug.cgi?id=93629 But other than that, Kepler is nicely supported, and you should get OpenGL 4.3 + reclocking.
07:24pabs3: pmoreau: is using the firmware the default these days?
07:25pabs3: ah, reading through the bug it looks like yes
07:28pmoreau: By default it should be using Nouveau’s own firmware. Using NVIDIA’s one improves the situation for some people, but not everyone.
07:28pabs3: hmm, ok
07:29pabs3: guess I'll leave it running for a while to see if the issue comes up
07:29pabs3: is there anything definitive (versions/etc) to trigger it or determine if I might be affected?
07:31pmoreau: Not that I know of.
07:32pabs3: ok, thanks a lot for all the info and the friendly atmosphere on this channel :)
07:33pmoreau: You’re welcome :-) Hopefully you’ll have better luck with this card regarding Nouveau support.
07:34pabs3: it is a lot faster than the other one, so that would be excellent :)
07:36pmoreau: Well, going from a G98 to a GK106, right? And you should be able to get Vulkan and the latest OpenGL someday with Nouveau.
07:39pabs3: it was GT21x but yeah G98 seems to be the same family
07:39pabs3: great :)
07:40pmoreau: Ah, right
14:33pendingchaos: is the zero query used for PIPE_QUERY_SO_OVERFLOW_PREDICATE useful? it's 64 bit so it uses nouveau's fences for synchronization, not the zero query
15:54imirkin_: pendingchaos: i don't understand your question
15:55imirkin_: note that i'm somewhat rusty on all the query stuff, so you may just need to provide more back-story to jostle my memory
15:55imirkin_: pabs3: the firmware thing has only been found to be helpful on *some* GTX 660's. i don't think this affects you.
15:55imirkin_: pabs3: those GPUs tend to run into problems fairly quickly
15:56imirkin_: pabs3: note that you should have working reclocking -- if you run into issues, definitely let us know.
15:56pendingchaos: on line 248 of nvc0_query_hw.c, QUERY_GET is used for a zero query to write QUERY_SEQUENCE
15:56pendingchaos: this would be used in nvc0_hw_query_fifo_wait(), but PIPE_QUERY_SO_OVERFLOW_PREDICATE is 64 bit
15:57pendingchaos: so it uses nouveau's fences, not QUERY_SEQUENCE
15:57pendingchaos: I'm wondering if I'm correct here and if there's some other use of it I've missed
15:58imirkin_: it's likely that the logic has changed throughout the ages.
15:58imirkin_: SO_OVERFLOW_PREDICATE was never used before
16:00pendingchaos: also: what's the exact meaning of QUERY_GET's UNIT and MODE? they seem to specify what to write at QUERY_ADDRESS, but I don't know why it's split into two fields
16:02imirkin_: i don't have a clean explanation
16:02imirkin_: try not to think about it :)
16:02imirkin_: iirc one is short vs long
16:02imirkin_: short is a 32-bit value
16:02imirkin_: long is a 128-bit value
16:02imirkin_: the 128-bit value includes the first 64-bit which is a timestamp, followed by 64-bit which is the value
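A minimal sketch of the two record sizes described above, in Python for illustration. The word order (timestamp first, then value) follows imirkin_'s recollection, which he flags as rusty, and little-endian packing is an assumption; the function names are hypothetical and not taken from nvc0_query_hw.c.

```python
import struct

# Assumed layout, per the description above: two little-endian 64-bit
# words, timestamp first, then the counter value. Verify against the
# driver before relying on the order.
LONG_QUERY = struct.Struct("<QQ")   # 128-bit "long" record
SHORT_QUERY = struct.Struct("<I")   # 32-bit "short" record


def unpack_long_query(buf: bytes):
    """Split a 16-byte long query record into (timestamp, value)."""
    timestamp, value = LONG_QUERY.unpack(buf)
    return timestamp, value


def unpack_short_query(buf: bytes) -> int:
    """Read the single 32-bit word of a short query record."""
    (value,) = SHORT_QUERY.unpack(buf)
    return value


if __name__ == "__main__":
    record = LONG_QUERY.pack(123456789, 42)
    print(unpack_long_query(record))
```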
16:03imirkin_: i don't understand this stuff nearly as well as i ought to though, sorry
16:03imirkin_: it was all implemented before my time, and has, for the most part, just kinda worked.
16:03imirkin_: iirc i added the hq->is64bit style of wait
16:04imirkin_: my guess is that the original SO_OVERFLOW_PREDICATE impl wanted to rely on the "regular" wait style
16:05imirkin_: feel free to rip the whole thing apart and redo it
16:06imirkin_: (note that i also added the get_query_result_resource logic)
16:06imirkin_: perhaps we can get rid of the nouveau fence-based wait entirely
16:07imirkin_: just have to emit the sequence when using pipeline stats and maybe a few others.
20:37pendingchaos: imirkin_: I think I've got PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE working
20:37pendingchaos: what gpus should it be tested on?
20:37imirkin_: GF100+
20:37imirkin_: but ... it should all be the same
20:37imirkin_: if it works on one, should work on all
20:38HdkR: +1
20:38imirkin_: and any differences can be dealt with later
20:38imirkin_: on rare occasions it can happen that some difference comes up, but as you can see in that query code, other than the perf counters, it's all shared
20:45pendingchaos: should I include an addition of PUSH_KICK in nvc0_hw_query_wait() in the patch? or should I post it separately
20:45imirkin_: pendingchaos: i'd do it separately. your call.
20:53pendingchaos: I think I've confirmed that ACQUIRE_EQUAL should be ACQUIRE_GEQUAL
20:55pendingchaos: adding PUSH_KICK(); nouveau_fence_ref(nvc0->screen->base.fence.current, &fence); PUSH_KICK(); in nvc0_hw_query_fifo_wait() causes the semaphore to never trigger when using ACQUIRE_EQUAL, unlike ACQUIRE_GEQUAL
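A toy model of the two comparison modes, to show why an equality test can hang where a greater-or-equal test does not. It assumes the stored sequence value may already have moved past the target by the time the wait is evaluated, and it ignores counter wraparound; `acquire_ready` is a hypothetical helper, not the hardware's semaphore interface.

```python
def acquire_ready(mode: str, current: int, target: int) -> bool:
    """Return True if a wait on `target` would be satisfied by `current`."""
    if mode == "EQUAL":
        return current == target
    if mode == "GEQUAL":
        return current >= target
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    target = 5
    # Once the stored value has passed the target, EQUAL can never match
    # again, while GEQUAL stays satisfied.
    for current in (4, 5, 6):
        print(current,
              acquire_ready("EQUAL", current, target),
              acquire_ready("GEQUAL", current, target))
```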
20:56imirkin_: ok cool. that should definitely be a separate patch
20:56imirkin_: basically when someone's thing breaks and they do a bisect, i don't want them to land on a large commit
21:05pendingchaos: separate from the PUSH_KICK(), OVERFLOW_ANY_PREDICATE or both?
21:06imirkin_: pendingchaos: everything separate
21:07pendingchaos: *nods*
21:12plutoo: does anyone know if const buffer is in dram?
21:12plutoo: or if it's in some fast vram-thing inside the gpu?
21:12imirkin_: dram != vram?
21:13plutoo: s/vram/sram then:p
21:13imirkin_: constbufs are ... special
21:13HdkR: imirkin_: What is vram on a TX1? :P
21:13imirkin_: in order to be used properly, they have to be uploaded in the cmdstream
21:13imirkin_: the graphics unit will then stage those writes to the backing memory
21:13imirkin_: but in a way that concurrent draws can have the same constbuf bound
21:14imirkin_: but get different values
21:14imirkin_: accessing these from shaders is meant to be pretty fast
21:14imirkin_: but i don't have concrete metrics.
21:14HdkR:confirms for very fast
21:15imirkin_: it's a lot of memory, but it's shared across all threads
21:15imirkin_: 64K * 16
21:15imirkin_: i.e. 1MB
21:15plutoo: but it's not dram?
21:15imirkin_: constbuf data is ultimately backed by vram
21:16imirkin_: however the way it's uploaded generally leads to it sitting in fast-ram
21:16plutoo: right
21:16imirkin_: i don't know how big the fast-ram is
21:16plutoo: you bind a vram/dram ptr using CB_BIND
21:16plutoo: but you can't write directly to that ptr, because there's a super highspeed cache that's inside the gpu
21:16plutoo: ?
21:17imirkin_: something like that.
21:17imirkin_: you can bind a data buffer that was previously written to
21:17imirkin_: and its data will get retrieved into the cache
21:17imirkin_: i think as the shader uses it
21:17imirkin_: not sure
21:17imirkin_: in order for that to happen, iirc you have to do some kind of flush
21:18plutoo: 0x43D has a bit to invalidate const buffers
21:18plutoo: https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L512
21:23imirkin_: <reg32 offset="0x021c" name="MEM_BARRIER">
21:23imirkin_: also there's that
21:33pendingchaos: bugfix patches sent
21:35plutoo: so as far as i can tell there are 5 const buffers
21:35plutoo: each one has up to 16 "indices" ?
21:36plutoo: in the nintendo driver, all const buffers index 0 point to the same blob
21:36plutoo: but all const buffer index 1 points to a per-const-buffer-region
21:41imirkin_: plutoo: sounds about right.
21:41imirkin_: there are 5 binding points
21:41imirkin_: one for each shader stage
21:42imirkin_: (compute sets up constbufs via a descriptor, and can only have up to 8 diff constbufs)
21:42imirkin_: pendingchaos: those look good, i'll apply them tonight (not in front of a computer with keys right now)
21:45plutoo: and the const buffer has a fixed layout? or is that programmable
21:45plutoo: i'm seeing:
21:45plutoo: 0x20 + 8*i: Textures
21:45plutoo: 0x120 + 8*i: Images
21:45imirkin_: plutoo: no, it can be whatever
21:46imirkin_: but a particular driver will stick to something concrete for its "driver parameters" constbuf
21:46imirkin_: since the shader will have to read it from a fixed location too
21:46plutoo: what about texture description structs?
21:46imirkin_: the only exception is that the first 8 words appear to have to be 0, 1, 2, 3, 4, 5, 6, 7 -- or else random things fail. no idea why.
21:47imirkin_: i haven't played with it extensively
21:47imirkin_: the TIC/TSC's live in buffers based on the TIC_ADDRESS and TSC_ADDRESS
21:47imirkin_: the "texture" is a 32-bit thing that holds both the tic id and tsc id
21:47imirkin_: which both index into those large tables
21:49imirkin_: oh, unless LINKED_TSC is set
21:49imirkin_: in which case it only reads the tic id, and assumes tic id == tsc id
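A sketch of how such a 32-bit texture handle could be packed and unpacked. The chat does not give the bit split; the 20-bit TIC id in the low bits and 12-bit TSC id above it is an assumption based on how nouveau appears to build its handles, so check it against the driver. The helper names are hypothetical.

```python
TIC_BITS = 20                      # assumed width of the TIC id field
TSC_BITS = 32 - TIC_BITS           # remaining bits, assumed to be the TSC id
TIC_MASK = (1 << TIC_BITS) - 1
TSC_MASK = (1 << TSC_BITS) - 1


def pack_handle(tic_id: int, tsc_id: int) -> int:
    """Combine a TIC table index and a TSC table index into one word."""
    if not 0 <= tic_id <= TIC_MASK:
        raise ValueError("tic_id out of range")
    if not 0 <= tsc_id <= TSC_MASK:
        raise ValueError("tsc_id out of range")
    return (tsc_id << TIC_BITS) | tic_id


def unpack_handle(handle: int, linked_tsc: bool = False):
    """Return (tic_id, tsc_id); with LINKED_TSC the TSC id mirrors the TIC id."""
    tic_id = handle & TIC_MASK
    tsc_id = tic_id if linked_tsc else (handle >> TIC_BITS) & TSC_MASK
    return tic_id, tsc_id
```

With `linked_tsc=True` the upper field is ignored, matching the description that only the tic id is read in that mode.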
21:50pendingchaos: and feature patch sent
21:52plutoo: is TIC_ADDRESS an offset into the const buffer?
21:52plutoo: why is it 64 bit
21:52imirkin_: TIC_ADDRESS is a vram address. it's 40-bit
21:52imirkin_: or rather, it's a VA
21:52imirkin_: vram or sysram, depending on what the PTE says :)
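A 40-bit virtual address does not fit in one 32-bit method argument, which is presumably why the address looks 64-bit in a dump: it is carried as a high word and a low word. The sketch below shows that split; the two-word encoding is an assumption here and the helper names are hypothetical.

```python
VA_BITS = 40


def split_va(va: int):
    """Split a 40-bit GPU virtual address into (high, low) 32-bit words."""
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("address does not fit in 40 bits")
    return va >> 32, va & 0xFFFFFFFF


def join_va(high: int, low: int) -> int:
    """Rebuild the address from its two 32-bit words."""
    return (high << 32) | low
```

Only the low 8 bits of the high word are meaningful for a 40-bit address.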
21:53plutoo: so you're saying TIC/TSC are not in the constbuffer itself?
21:53imirkin_: i am indeed saying that.
21:53plutoo: nvm i should read better
21:55imirkin_: pendingchaos: are you kidding me?! there was a single query for the "any" and i just missed it??
21:55pendingchaos: it seems so? perhaps it should be tested with a pre-Pascal card
21:55imirkin_: well, i have a fermi plugged in at home
21:55imirkin_: so it'll be getting some testing ;)
21:56imirkin_: given that it's f0, i wonder if it's a new thing
21:56imirkin_: but we'll see
21:57imirkin_: even if it's hw-specific, that's fine to enable partly too
21:57plutoo: looks like TIC_ADDRESS is officially known as TexHeaderPoolOffset
21:57imirkin_: sure, why not
22:19plutoo: sometimes addresses are set to 4
22:19plutoo: any clue what that means
22:19imirkin_: what addresses?
22:19plutoo: sorry, i mean methods that take gpu VA, sometimes get value 4 written to them
22:20imirkin_: like what methods
22:20plutoo: instead of an actual gpu VA
22:20imirkin_: (i'm guessing you're mistaken)
22:26plutoo: yup
22:26plutoo: it's 0x400000000, and it's being written to CODE_ADDR only
22:26plutoo: i guess it might actually be a valid gpuva
22:27imirkin_: that's a legal va
22:27imirkin_: i'm guessing they put all the "huge" pages up there
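A quick check of the value in question. If a dump shows the address as separate 32-bit words (an assumption about how the trace was read), the high word of 0x400000000 is exactly 4, which would explain the earlier observation. The address is also aligned to every page size mentioned in this discussion.

```python
def describe_va(va: int):
    """Return (high word, low word, offset in GiB) for a GPU VA."""
    return va >> 32, va & 0xFFFFFFFF, va // 2**30


def aligned_to(va: int, page_size: int) -> bool:
    """True if `va` is a multiple of `page_size`."""
    return va % page_size == 0


if __name__ == "__main__":
    va = 0x4_0000_0000
    print(describe_va(va))                     # (4, 0, 16)
    for size in (4 << 10, 64 << 10, 128 << 10):
        print(size, aligned_to(va, size))
```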
22:28plutoo: i think i got an address like this when i messed with gmmu allocator flags
22:28imirkin_: yeah, so these gpu's support 4K pages as well as "large" pages. these can be configured to be either 64K or 128K
22:29imirkin_: with a global config switch :)
22:29imirkin_: (thanks guys)
22:30imirkin_: some later GPUs also support some even larger pages, but i forget if that's starting pascal or maxwell
22:30imirkin_: might be pascal, with the 1MB pages
22:30imirkin_: to match x86 cpu's
22:32imirkin_: plutoo: btw, tell the switchbrew guys that they got TesselationMode slightly wrong
22:32plutoo: i'm the switchbrew guy
22:32plutoo: :)
22:32imirkin_: you've been informed :p
22:32imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#n308
22:33plutoo: big_page_size appears to be 128KB on the switch, fwiw
22:34imirkin_: sounds right.
22:40plutoo: TesselationMode describes how to construct primitives from the vertex buffer right?
22:41plutoo: or that might be VERTEX_ATTRIB_FORMAT
22:42plutoo: i think it's both.
22:45imirkin_: mmm ... from the uniform domain that is being tessellated
22:46imirkin_: specifies how to assemble the evaluated points into primitives
22:50plutoo: i wonder if instead of bit8, bit9 it's a single field bit8-9
22:50plutoo: that specifies the size of a primitive, in no. of points
22:50imirkin_: it most likely is :)
22:50imirkin_: well no
22:50imirkin_: connected vs points
22:50imirkin_: and cw vs ccw
22:50plutoo: so if you put triangle with 2 points per triangle, that effectively makes it connected
22:50imirkin_: lol
22:50plutoo: and if you put a line with 1 point per line, that is connected
22:51imirkin_: that's not what those mean
22:51plutoo: i need to work out with pen and paper if it works out
22:51imirkin_: but thank you for playing ;)
22:51plutoo: lol
22:51imirkin_: isolines / tris / quads has to do with the domain being tessellated
22:52imirkin_: the output of isolines are lines, the output of tris/quads are tris
22:52imirkin_: if connected = false, then the output is actually the individual points
22:52imirkin_: cw/ccw specifies the winding of the triangles
22:52imirkin_: but it has no meaning for lines
22:54plutoo: cw for tris permutes (a,b,c) -> (a,c,b)
22:54plutoo: and for lines (a,b) -> (b,a)
22:55imirkin_: no
22:55imirkin_: it has to do with the tessellation results
22:55imirkin_: basically it evaluates a bunch of points
22:55imirkin_: and then it's a question of how to build triangles out of them
22:55imirkin_: cw/ccw affects the order in which those points are picked to create triangles
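The rules stated above can be summarized as a small lookup. This only models the semantics described in the chat (domain, connected flag, winding); it says nothing about the actual bit positions in the register, and the function names are hypothetical.

```python
DOMAINS = ("isolines", "triangles", "quads")


def tess_output_primitive(domain: str, connected: bool) -> str:
    """Primitive type the tessellator emits for a given domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if not connected:
        # connected = false: the evaluated points are emitted individually
        return "points"
    # isolines produce lines; both tris and quads produce triangles
    return "lines" if domain == "isolines" else "triangles"


def winding_matters(domain: str, connected: bool) -> bool:
    """cw/ccw only has meaning when triangles are being built."""
    return tess_output_primitive(domain, connected) == "triangles"
```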
22:56plutoo: is this config only for tessellation shaders?
22:56imirkin_: yea
22:56imirkin_: of course
22:56plutoo: oh
22:57imirkin_: what would it even mean without tess?
23:00plutoo: i thought it was also used to describe how to interpret points when assembling primitives
23:00plutoo: in general
23:00imirkin_: that's the primitive that's given when starting a draw
23:01imirkin_: and also the geometry shader output format :)
23:01imirkin_: which can effectively change it towards the end of the pipeline
23:01plutoo: right
23:06plutoo: updated the wiki
23:06plutoo: now that's a mess haha
23:07imirkin_: yeah sorry
23:07imirkin_: ;)
23:08imirkin_: next time i design that hw, i won't make it so confusing :p
23:10imirkin_: plutoo: anyways, most of those methods are documented and/or used in nouveau
23:10imirkin_: you should refer to rnndb and the nouveau codebase
23:10imirkin_: i'd also recommend normalizing your numbering with nouveau's
23:10plutoo: yeah i've peeked in nouveau quite a few times
23:10imirkin_: since it just causes confusion for no reason
23:11imirkin_: at least add a column that's like X*4
23:11plutoo: yeah i'll do that this weekend or something
23:11imirkin_: i dunno which view of it is "right", or if there's a naturally correct thing there
23:12plutoo: official code uses *4 when calculating dynamic offsets, then divides by 4 as a final step
23:12imirkin_: well, the encoding into the pushbuf requires various things
23:12plutoo: does it?
23:13plutoo: not on nvc0 i think
23:13imirkin_: i just mean it's bit shifts
23:13imirkin_: not divisions
23:14imirkin_: http://envytools.readthedocs.io/en/latest/hw/fifo/dma-pusher.html#gf100-commands
23:14imirkin_: so like the M's are the method id's
23:15imirkin_: they live in various places, depending on the exact command type
23:16imirkin_: in the pre-GF100 format, there tended to be 2 bits of padding at the end
23:16imirkin_: in the GF100 format, most commands don't have those
23:18plutoo: what are S and X?
23:18imirkin_: X = ignore
23:18imirkin_: S = subchan
23:19imirkin_: C = count of methods to call
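A sketch of how an "increasing methods" command header could be assembled in both formats, which also shows where the X*4 versus X numbering difference comes from. The bit positions are written from memory of the linked envytools page and nouveau's header macros, so treat them as assumptions to verify; the function names are hypothetical. Methods are given in nouveau's byte-offset numbering.

```python
def gf100_incr_header(subchan: int, method: int, count: int) -> int:
    """GF100-style header: type 001, 13-bit count, 3-bit subchannel,
    13-bit method id (the byte offset shifted right by 2)."""
    if method % 4 or not 0 <= method < 0x8000:
        raise ValueError("method must be a 4-byte-aligned offset below 0x8000")
    if not 0 <= subchan < 8 or not 0 <= count < 0x2000:
        raise ValueError("subchan or count out of range")
    return (0x1 << 29) | (count << 16) | (subchan << 13) | (method >> 2)


def pre_gf100_incr_header(subchan: int, method: int, count: int) -> int:
    """Older header: 11-bit count at bit 18, method kept as a byte offset,
    which leaves two zero padding bits at the bottom."""
    if method % 4 or not 0 <= method < 0x2000:
        raise ValueError("method must be a 4-byte-aligned offset below 0x2000")
    if not 0 <= subchan < 8 or not 0 <= count < 0x800:
        raise ValueError("subchan or count out of range")
    return (count << 18) | (subchan << 13) | method


if __name__ == "__main__":
    # MEM_BARRIER (0x021c) on subchannel 0 with one argument
    print(hex(gf100_incr_header(0, 0x021C, 1)))       # 0x20010087
    print(hex(pre_gf100_incr_header(0, 0x021C, 1)))   # 0x4021c
```

In the GF100 form the method field holds 0x87, i.e. 0x021c divided by 4, and that is done with a shift rather than a division.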
23:20plutoo: man i feel like i'm staring at 100's of hours of work
23:20imirkin_: probably.
23:21imirkin_: it probably took me hundreds of hours to understand all this
23:21imirkin_: so to figure it out in the first place was probably quite a bit more :)
23:23imirkin_: skeggsb: any plans to refresh your fermi reclock series?
23:24imirkin_: skeggsb: i have a Quadro 600 plugged in =]
23:24skeggsb: imirkin_: you mean like, rebase it on latest code? i don't think that'll take much, i'll take a crack at it shortly if you need it
23:24imirkin_: ideally with fixes...
23:24imirkin_: but even a plain rebase would be nice if it's not a lot of trouble
23:24skeggsb: the quadro 600 should work already, it's one of the boards i wrote it against...
23:25imirkin_: should i expect a DP -> HDMI adapter to work in there?
23:25imirkin_: (i haven't tried it, just want to know)
23:25imirkin_: (passive)
23:26skeggsb: i don't see why not
23:28pabs3: imirkin_, pmoreau: after a short 0ad session and a day of idling in GNOME shell, no problems. so I think I'll switch over to that machine as my main system next reboot. thanks a lot for your nouveau work and help here
23:28imirkin_: pabs3: glad it worked out ok
23:29imirkin_: it's not too bad if you don't do anything too crazy
23:29imirkin_: unfortunately "crazy" has become a lot harder to determine
23:29imirkin_: as even the most basic desktop components think it's a good idea to do things with GL
23:29imirkin_: pabs3: did the reclocking work ok?
23:29imirkin_: i.e. were you able to reclock without crashing
23:30pabs3: not sure how to check
23:30imirkin_: then you probably didn't reclock in the first place :)
23:30imirkin_: it's manual... cat /sys/kernel/debug/dri/0/pstate
23:30imirkin_: that will show you a bunch of values, as well as the current clock state
23:30imirkin_: you can echo those values to change the clock
23:31imirkin_: should get you moar fps in 0ad
23:31pabs3: it was pretty fast as-is, my monitor size isn't huge
23:32imirkin_: probably boots to a middle-ish clock
23:32pabs3: is the AC line the current clock state?
23:32imirkin_: yes
23:32pabs3: AC: core 324 MHz memory 648 MHz
23:32pabs3: yeah, that is the lowest
23:32pabs3: max is 0f: core 1071 MHz memory 5000 MHz
23:32imirkin_: that should be faster.
23:32imirkin_: because more mhz is more better :)
23:33pabs3: so this? echo 0f > /sys/kernel/debug/dri/0/pstate
23:33imirkin_: yup
23:33imirkin_: save first
23:33imirkin_: in case, you know, the unthinkable happens
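A small sketch for reading pstate text and picking the state with the highest core clock. It only handles lines shaped like the two quoted above (`id: domain N MHz ...`); the real file may contain other fields or clock ranges, so the pattern is an assumption. Nothing here writes to the debugfs file; changing the state still means echoing the chosen id as root.

```python
import re

# Matches pairs like "core 1071 MHz" or "memory 5000 MHz".
CLOCK_RE = re.compile(r"(\w+)\s+(\d+)\s+MHz")


def parse_pstate(text: str) -> dict:
    """Map each state id (e.g. '0f', 'AC') to {domain: MHz}."""
    states = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        clocks = {domain: int(mhz) for domain, mhz in CLOCK_RE.findall(rest)}
        if clocks:
            states[name.strip()] = clocks
    return states


def fastest_state(states: dict) -> str:
    """Return the id with the highest core clock, ignoring the AC/DC lines."""
    candidates = {k: v for k, v in states.items() if k not in ("AC", "DC")}
    return max(candidates, key=lambda k: candidates[k].get("core", 0))


# The two lines quoted in this log, used as sample input.
SAMPLE = (
    "0f: core 1071 MHz memory 5000 MHz\n"
    "AC: core 324 MHz memory 648 MHz\n"
)

if __name__ == "__main__":
    states = parse_pstate(SAMPLE)
    print("current:", states["AC"])
    print("fastest:", fastest_state(states))
```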
23:33pabs3: it's running a test install, I'm on another machine :)
23:34imirkin_: might want to wait until an opportune time when you don't mind the system crashing and burning
23:34pabs3: yeah, I've a few things to do today before I get distracted by this :)
23:34imirkin_: (reclocking involves disconnecting vram while settings are changed... if it doesn't get reconnected properly, the video card becomes unhappy)
23:36pabs3: does removing the power solve that?
23:36imirkin_: yes
23:36pabs3: that is fine then
23:36imirkin_: any reboot fixes it
23:36imirkin_: just ... well you have to reboot :)
23:36pabs3: ah good. not an issue here, I don't care about the machine until I move the hard drive from my primary install across
23:37imirkin_: then flip it over, and, uh, test it, with 0ad
23:37pabs3: will do. I better go, thanks for the help :)
23:37imirkin_: good luck
23:41pabs3: oh, do GPUs support dynamic reclocking based on usage like CPUs support frequency adjustment at runtime?
23:41imirkin_: not sure exactly how CPUs work. GPUs allow adjusting a number of things, but they don't do this automatically - it's done by the operating system directly.
23:42imirkin_: there are a number of things that can be adjusted - e.g. memory reclocks are fairly heavy
23:42imirkin_: but there are also shader clocks and so on which can be changed much more straightforwardly
23:43imirkin_: (thing is, memory is used for scanout, so you have to do it within a vblank period, OR ELSE)
23:43skeggsb: imirkin_: updated, but not tested
23:43imirkin_: skeggsb: ok thanks =]
23:44imirkin_: do you have plans around finishing it up and/or upstreaming the initial version so people can play with it?
23:49skeggsb: imirkin_: yeah, it's a matter of finding a decent chunk of time to do it is all
23:49skeggsb: ie. in the very least make sure it doesn't regress kepler, more than making fermi work any better
23:50imirkin_: yeah
23:50imirkin_: that's the bit i'd be most concerned by
23:50imirkin_: and tesla too
23:50imirkin_: it modifies some shared ddr3 stuff
23:50skeggsb: not too concerned about tesla, it's a lot simpler
23:50skeggsb: but yes
23:51skeggsb: dealing with gf119's weirdness would be nice too, but not as important
23:51imirkin_: gotta start somewhere =]
23:52skeggsb: gf119 has some minor steps towards being more like kepler, but mostly the same as fermi still.. also, i think the kepler code is actually "gf117"
23:52imirkin_: if only you could find one somewhere
23:52skeggsb: those are *really* rare, only in a couple of laptops i think
23:52imirkin_: yeah