01:49dboyan_: imirkin: I just saw in the spec of ARB_shader_clock: "The clockARB() and clock2x32ARB() functions serve as a code motion barriers." Do we need to care about other things besides setting the fixed flag of the instructions that read clock registers?
01:55dboyan_: well, what exactly?
01:56imirkin: i->fixed = true;
01:57imirkin: oh, sorry. misread your question. no, we don't need to care about other things.
01:57imirkin: "code motion barrier" is not a precisely well-defined concept. i don't think we'd have to flush e.g. global memory/etc caches for CSE/etc
01:58imirkin: i.e. i don't think it should be treated as an all-memory clobber
01:58imirkin: the whole point of those things is to measure shader performance, not to alter its performance.
01:59dboyan_: yeah, good point. I don't think I found any special handling in i965 either.
10:34RSpliet: dboyan_: http://www-sop.inria.fr/members/Sid.Touati/publis/RS-IJPP05.pdf
10:34RSpliet: http://www.cl.cam.ac.uk/teaching/1617/OptComp/notes.pdf page 36
10:35RSpliet: the first one in particular might be very interesting, could give some hints on how insn scheduling and register pressure interact
11:49dboyan: imirkin: Now I suspect the blob is wrong in ARB_shader_clock. I checked the values returned by clockARB() with the blob, and they only differ in the upper 32 bits; the lower bits are all zero.
11:50dboyan: imirkin: I guess I also need FOR_EACH_DST_ENABLED_CHANNEL when handling TGSI_OPCODE_CLOCK
14:47imirkin: dboyan: either that, or check for dst0[x] != null
14:47imirkin: or whatever check FOR_EACH_DST_ENABLED_CHANNEL does
15:06dboyan: imirkin: The blob was generating really weird code for clockARB(), but it was wrong, I guess
15:07imirkin: mmmm... not necessarily. but maybe.
15:07imirkin: i'd like to see the full glsl shader + blob output for both clockARB and clock2x32ARB
15:08dboyan: wait a moment, I'll boot and configure my machine
15:09imirkin: nearly every time i've thought "oh yeah, blob is wrong", it's actually been right
15:20dboyan: imirkin: https://pastebin.com/ePbEx2Vt
15:22dboyan: It basically writes clock2x32ARB and clockARB results to an ssbo. when I read them out, they are like 0x323c100000000 and 0x3260200000000
15:24imirkin: so when reading out the clock
15:24imirkin: it's trying to prevent rollover somehow?
15:24imirkin: so it reads sr81 twice
15:24imirkin: and if it's different, it starts over
15:25imirkin: but if they're the same, then it reads sr81/sr80 again
15:25imirkin: which is confusing, since the rollover may have already occurred
15:25imirkin: not sure what they're trying to prevent with that loop
15:27imirkin: ok, so i'm pretty sure that this clockhi thing
15:27imirkin: isn't what we think it is
15:29imirkin: i think it's writing 0x51 (clockhi) into the low bits and 0x50 (clocklo) into the high bits. and yet it tries to ensure that 2 subsequent reads of clockhi are identical
15:30dboyan: but why is it reversed in the first place?
15:30dboyan: now I can understand the concern about rollover
15:43imirkin: it's reversed because clockhi doesn't contain what we think it contains
15:43imirkin: perhaps it contains a 1 on rollover, which is cleared by reading it -- who knows =/
15:45dboyan: really, that way the value read out actually only contains 32 valid bits, that's quite insane
15:46dboyan: I'll try to make the shader run longer (make clocklo overflow) and see what I'll get
15:47imirkin: the valid bits should be the high ones though
15:47imirkin: so at least that part is right
15:48imirkin: since that allows a shader to have a chance to detect overflow
15:56dboyan: imirkin: I ran something really time-consuming in that shader, and the clockARB value I got afterwards became 0x1de0596800000001
15:56dboyan: so I guess clockhi is actually clockhi
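[Editor's note] The values quoted above are easier to compare once split into 32-bit halves. The following is only a small sketch; split_words is a hypothetical helper, not something from the blob or from nouveau. It shows that the first two readings have an all-zero lower word, while the long-running shader produced a lower word of 1, which fits the reading that $clockhi lands in the low bits.

```python
def split_words(value):
    """Split a 64-bit value into (upper 32 bits, lower 32 bits)."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


# Values quoted in the conversation above.
samples = (0x323c100000000, 0x3260200000000, 0x1de0596800000001)

for sample in samples:
    upper, lower = split_words(sample)
    print(f"{sample:#018x}: upper={upper:#010x} lower={lower:#010x}")
```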
16:00dboyan: If rollover is really a concern, I think the following algorithm will work:
16:01dboyan: 1. Sample $clockhi, $clocklo, and $clockhi again
16:01dboyan: 2. If the two $clockhis agree, then we get the clock value
16:03dboyan: 3. Otherwise, pick one according to the $clocklo value: if it is large, use the earlier $clockhi; if it is small, use the later one
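[Editor's note] The three steps above can be sketched in Python against a simulated counter. This is an illustration under stated assumptions, not what the blob or nouveau actually does: FakeClock and read_clock64 are hypothetical names, the counter is assumed to advance a little on every read, and the low word is assumed to wrap at most once between the two high-word reads.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 0x80000000  # threshold for "big" vs "small" low word (an assumption)


class FakeClock:
    """Simulated 64-bit counter that advances by `step` on every read."""

    def __init__(self, start, step=1):
        self.ticks = start
        self.step = step

    def _advance(self):
        now = self.ticks
        self.ticks += self.step
        return now

    def read_hi(self):
        return (self._advance() >> 32) & MASK32

    def read_lo(self):
        return self._advance() & MASK32


def read_clock64(read_hi, read_lo):
    """Read hi, lo, hi and reassemble a consistent 64-bit value."""
    hi_before = read_hi()
    lo = read_lo()
    hi_after = read_hi()
    if hi_before == hi_after:
        # No rollover between the two high-word reads.
        return (hi_before << 32) | lo
    # The low word wrapped somewhere between the two high-word reads.
    # A large lo was sampled before the wrap, a small lo after it.
    hi = hi_before if lo >= HALF_RANGE else hi_after
    return (hi << 32) | lo


clock = FakeClock(0xFFFFFFFE)
print(hex(read_clock64(clock.read_hi, clock.read_lo)))
```

This covers a rollover in the middle of a single reading. It does not help when two independent readings straddle a rollover, which is a separate concern.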
16:14dboyan: well, my attempt to get $clockhi above 1 failed because the blob seemed to think the gpu got stuck and stopped the shader
16:38imirkin_: dboyan_: i dunno... so there are 2 issues
16:39imirkin_: issue 1: rollover can happen between 2 independent readings
16:39imirkin_: issue 2: rollover can happen IN THE MIDDLE of one reading
16:53karolherbst: imirkin: any plan on when you want to merge my patches?
16:54imirkin_: no concrete plan. keep reminding me. i've been beyond busy.
16:54imirkin_: there's nothing wrong with them. i just need to spend some time and ensure all's well.
16:54karolherbst: okay, nice
16:54imirkin_: [at least i'm not currently aware of anything wrong with them. i may become aware of such a thing after spending some time... heh.]
16:55karolherbst: as usual
17:01imirkin_: dboyan_: ok, so looks like the gsoc proposals are due *soon*, i'll be sure to make some time and review yours
17:42karolherbst: imirkin_: in the "optimise slct(t, f, set) to mov(set) or not(set)" opts I could do it for slct(t, f, pred) as well, right? (except that the mov needs to be a cvt)
17:44imirkin_: or more ideally, make the set that generates the pred return a u32 instead
17:44imirkin_: but yes, that can be done as a separate step
17:44karolherbst: mhh, it doesn't need to be a set though
17:44karolherbst: there is more which could produce a predicate
17:44karolherbst: or write into one
17:45karolherbst: cvt or mov?
17:45imirkin_: heh, i guess
17:45imirkin_: anyways, yes, the cvt is fine
17:45karolherbst: ohh and all the set variants...
17:45imirkin_: and then a separate opt can be made
17:45imirkin_: yea, but all those can return either one
17:45karolherbst: is there a function to catch all set variants at once?
17:45imirkin_: no, just add all 4 to the switch
18:44NanoSector: what is the most reliable way to detect if a system is Optimus or not?
18:44NanoSector: I thought of checking whether both Intel and NVIDIA are present, but that'd mean desktops with the integrated GPU enabled would also pass
19:13airlied: NanoSector: why do you care about optimus specifically?
19:28NanoSector: airlied: asking for the Antergos installer, it needs a way to detect if a laptop has Optimus and then allow the installation of bumblebee
19:29airlied: you'd have to probe the ACPI tables
19:31karolherbst: NanoSector: bumblebee isn't limited to optimus
19:31karolherbst: only bbswitch is mainly
19:33karolherbst: NanoSector: wouldn't it make sense to install bumblebee whenever there are two GPUs installed?
19:39NanoSector: karolherbst: you mean also on desktop systems?
19:43karolherbst: why not?
19:44karolherbst: imagine somebody with intel main and nvidia dedicated, but without a display
19:44karolherbst: on nvidia
19:44karolherbst: NanoSector: or better: if the script detects multiple GPUs, prompt for something or so
19:45karolherbst: let the user decide in this case
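[Editor's note] One way an installer could detect multiple GPUs and then prompt is to scan the PCI devices in sysfs for display-class entries. This is a minimal sketch, not the Antergos installer's code: list_gpus and should_offer_bumblebee are hypothetical names, and the rule "more than one GPU, at least one of them NVIDIA" is an assumption for illustration. It does not identify Optimus as such; as noted above, that would need the ACPI tables.

```python
from pathlib import Path

# Well-known PCI vendor IDs.
GPU_VENDORS = {0x8086: "Intel", 0x10de: "NVIDIA", 0x1002: "AMD"}


def list_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return (pci_address, vendor_name) for every display-class PCI device."""
    gpus = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return gpus
    for dev in sorted(root.iterdir()):
        try:
            pci_class = int((dev / "class").read_text().strip(), 16)
            vendor = int((dev / "vendor").read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        # Base class 0x03 is "display controller" (VGA, 3D controller, ...).
        if pci_class >> 16 == 0x03:
            gpus.append((dev.name, GPU_VENDORS.get(vendor, hex(vendor))))
    return gpus


def should_offer_bumblebee(gpus):
    """Hypothetical rule: several GPUs present and one of them is NVIDIA."""
    vendors = {vendor for _, vendor in gpus}
    return len(gpus) > 1 and "NVIDIA" in vendors


if __name__ == "__main__":
    found = list_gpus()
    print(found)
    if should_offer_bumblebee(found):
        print("Multiple GPUs found; ask the user whether to install bumblebee.")
```

The final decision is left to the user, since a desktop with an enabled integrated GPU passes the same check.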
19:45NanoSector: we'll default to Nouveau but we show an option to install nvidia proprietary drivers
19:46NanoSector: I suppose we can modify it to also install bumblebee
20:29Lyude: imirkin_: parts of that test for seeing if lines output by the gs ignore fill_rect were broken anyway as i just found out, but the test for points from the gs does the same out-of-bounds accesses and outputs exactly what I'd expect
20:29Lyude: well, not what I'd expect now that I know that's supposed to work, but it outputs three points in the shape of a rectangle
20:30Lyude: you sure gl_in is always passed as just the length of the number of expected vertices for the given primitive?
20:32Lyude: or is it just coincidence that works at all
21:18imirkin_: Lyude: gl_in is a fixed number of vertices, depending on the quantity of vertices of the incoming primitive, 1..6
21:18imirkin_: (for triangles_adj)
21:18Lyude: yeah, me and kayden discussed it
21:23imirkin_: ok cool
21:29xerpi: drmModePlaneRes::count_planes is 0 on my machine, I guess no hw planes for me :(
21:29xerpi: (I have a G86M [GeForce 8400M GS])
21:30xerpi: I guess the hw is too old to even support a cursor plane?
21:31skeggsb: xerpi: 4.10 should expose "planes"
21:31xerpi: I have 4.10.6-1-ARCH
21:32skeggsb: well, internally we'll create 2xcursor + 2xprimary for that gpu
21:32xerpi: is this info exposed somewhere in the debugfs?
21:32skeggsb: the overlays could be supported too if someone cares, but nv hw overlay is somewhat useless if you ask me
21:32skeggsb: ie. no scaling
21:32xerpi: skeggsb, I see, so no "overlay planes" other than the primary and cursor ones
21:32skeggsb: sorry.. no *useful* scaling
21:33xerpi: skeggsb, I see. even without scaling I think they can be useful imho
21:33skeggsb: patches welcome, it shouldn't be to hard to add :)
21:33skeggsb: i've played with them manually before, but didn't bother to implement them with atomic
21:34skeggsb: too* hard
21:35xerpi: skeggsb, oh that would be cool, I'd be interested if it isn't "very hard"
21:35skeggsb: the hw interfaces are documented in the display class headers on nvidia's ftp site
21:35xerpi: gonna check nouveau's code :)
21:36imirkin_: xerpi: ftp://download.nvidia.com/open-gpu-doc/Display-Class-Methods/1/
21:36imirkin_: may be useful.
21:36aaronp: skeggsb, the other thing that confuses newcomers is that the overlay has formats with alpha, but it uses alpha color keying, not alpha blending.
21:36xerpi: imirkin_, thanks!
21:36skeggsb: aaronp: right!
21:37xerpi: I hope it's a matter of writing to a few hw regs and that not a lot of "hard stuff" (sync) is needed
21:37imirkin_: coz why would overlays have alpha blending... color keys are all the rage. it's 1995 right? :)
21:38skeggsb: aaronp: i'm guessing there was a very specific use-case in mind for >=g80 overlay? it seems like a step backwards from nv40 :P
21:38aaronp: I don't think the nv40 overlay had alpha blending either.
21:38aaronp: Or are you talking about the scaling?
21:38aaronp: For "workstation" overlays, it doesn't need scaling or alpha blending.
21:38skeggsb: mostly the scaling.. i could only ever figure out how to make evo scale in one direction, which was a bit odd
21:38xerpi: is there a doc explaining the cards codenames? nv40 and all this stuff
21:39aaronp: skeggsb, yeah, it only scales horizontally, IIRC. There was some use case for video for that, I think.
21:39aaronp: We don't use it in VDPAU.
21:39imirkin_: nv30 had the last useful overlays
21:40skeggsb: imirkin_: oh? how does that differ from nv40?
21:40imirkin_: nouveau exposes those, but they don't get used for anything
21:40imirkin_: skeggsb: well, iirc nv40 itself had the same ones, but nv41+ lost them
21:40skeggsb: oh right, i forgot about that
21:41aaronp: Didn't nv4x have a lot more memory bandwidth so it could afford to do scaling in the GPU?
21:41aaronp: That's right around when I joined nvidia so my memory is fuzzy.
21:41imirkin_: i do have a nv4a plugged in, pretty easy to experiment...
21:41imirkin_: aaronp: yeah, nv4x was competently able to use texturing for the YUV -> RGB stuff i think
21:43skeggsb: xerpi: you mean, how they map to marketing names?
21:43xerpi: skeggsb, yeah, something along that
21:43imirkin_: xerpi: https://nouveau.freedesktop.org/wiki/CodeNames/
21:43skeggsb: the answer is: not very well... :P
21:43xerpi: imirkin_, oh awesome!
21:43aaronp: also https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
21:44imirkin_: the mapping gets progressively worse with time
21:44imirkin_: e.g. the 8400 GS was a G86, G98, and GT218
21:44xerpi: interesting, I have the "NVIDIA Corporation G86M [GeForce 8400M GS] (rev a1)" which I guess is the rev1?
21:44aaronp: No, the "rev1" refers to the silicon revision.
21:44imirkin_: or GT 630 which is GF108, GK107, and GK208
21:44imirkin_: the fact that it says G86M is a good indicator though :)
21:44aaronp: I.e. for that chip we shipped the first version we fabbed, with no additional reworks.
21:45aaronp: er, rev a1 rather.
21:45imirkin_: or that you went through 161 revisions before you got a chip you could ship ;)
21:45skeggsb: haha :P
21:45aaronp: Heh. I don't know why they start at a1 rather than 01.
21:45aaronp: One of those "because it's always been that way" things, I guess. :)
21:46skeggsb: it leaves room for "b1" as "major revision", i guess
21:46aaronp: Yeah, that's the idea.
21:47xerpi: I see, interesting stuff
21:49xerpi: oh, and atomic seems disabled by default: /sys/module/nouveau/parameters/atomic -> 0
21:49skeggsb: it'll still be used internally, the legacy interfaces are wrapped on top of it
21:50skeggsb: i was being paranoid, and didn't expose it to userspace by default yet
21:52imirkin_: skeggsb: so what do i need to do to make the VPE1 stuff work with nvif now?
21:53imirkin_: do i have to define new nvif APIs for sending the data down to the driver?
21:53imirkin_: and then that's it?
21:53skeggsb: imirkin_: i'm working on proper channel interfaces in nvif right now actually
21:53imirkin_: ah, well this isn't urgent ;) i was just contemplating
21:54imirkin_: it's waited 10 years, can wait a little longer. heh.
21:54skeggsb: you can write the code to setup 0x1774 inside nvkm in the meantime ;)
21:54imirkin_: i'd rather see the fix for the nv40 + mpeg thing
21:55skeggsb: err, yes :P
22:08xerpi: skeggsb, I see that nv50_wndw_ctor is only called for "curs" and "base" (primary plane)
22:08xerpi: at least on nv50
22:33xerpi: that's odd, passing nouveau.nouveau_atomic=1 to the command line didn't enable the module param
22:33imirkin_: probably nouveau.atomic=1
22:34xerpi: oh right
22:35xerpi: time to reboot again
22:43Lyude: imirkin_: Changes made, https://github.com/Lyude/mesa/tree/wip/NV_fill_rectangle-v3 and https://github.com/Lyude/piglit/tree/wip/nv_fill_rectangle-v3
22:43xerpi: "There is one overlay immediate channel per head" I guess head means crtc?
22:47imirkin_: Lyude: ok great. i'll look tonight and push.
22:55imirkin_: Lyude: in tests/spec/nv_fill_rectangle/execution/lines-ignore-fill-rect.shader_test, that test would probably work better as GL_LINE_LOOP
22:55imirkin_: Lyude: since with that, you end up with a diagonal line, which would cause the whole rect to be filled if things didn't work correctly
22:56imirkin_: Lyude: otherwise you end up drawing a single line, and even if it's doing the "wrong" thing, you'd end up with the same result
22:56xerpi: skeggsb, setting DRM_CLIENT_CAP_UNIVERSAL_PLANES reports 4 planes, which is what you said (2xcursor + 2xprimary)
22:58imirkin_: Lyude: also ... is gl_TessCoord.y a thing for isolines tessellation? i'm not 10000% sure how isolines tessellation works in the first place tbh. i guess it still tries to tessellate the whole domain, just the choice of points to evaluate is different? hm.
23:01Lyude: imirkin_: jfyi that shader is pretty much copied from the only other isoline shader thingy. But yeah, gl_TessCoord.y is (because this totally makes perfect sense....) for which line the vertex is for
23:02imirkin_: sorta, yea
23:02Lyude: it seems bizarre to me that isolines are actually a primitive in the first place
23:02imirkin_: well, think about what tessellation is trying to achieve...
23:03imirkin_: it allows you to evaluate a function (i.e. the TES) along logical isolines in the tessellation domain
23:05Lyude: that makes more sense
23:08xerpi: skeggsb, so I guess adding overlay planes is "just a matter" of adding a new struct nv50_wndw_func?
23:11Lyude: imirkin_: https://github.com/Lyude/piglit/tree/wip/nv_fill_rectangle-v4
23:11imirkin_: thanks. i'll poke around it tonight, probably make some changes myself and push
23:12imirkin_: (since i don't have the requisite GPU myself... i'll just be futzing with the negative tests)
23:12Lyude: you want me just to add your public ssh key to the machine I've got with the GM200?
23:12imirkin_: that'll tempt me to do even more things. i need to do fewer things.
23:13imirkin_: thanks for the offer though