01:54 rhyskidd: skeggsb: do you have a mmiotrace from turing that you're able to share?
04:08 imirkin: skeggsb: https://bugs.freedesktop.org/show_bug.cgi?id=108980#c11 -- volt speedo not available on gf117?
04:16 imirkin: skeggsb: hmm - i see SPEEDO accessed in nvd9 and nvcf but not nvd7 in our mmiotrace list
04:24 skeggsb: not a clue, gf117 is probably the worst supported gpu we have tbh
04:25 skeggsb: it's a weird mixup that i *know* we get wrong in a couple of places
04:25 skeggsb: (most notably: the gk104 memory clock code should actually be gf117...)
04:33 imirkin: skeggsb: good one. it does read 3a8, which contains a speedo-looking value
04:34 imirkin: no mention of 0x122634 though
04:35 blipk: 0xB000B5
04:36 blipk: who can I give my GTX1060 to so they can make nouveau work with it?
04:36 skeggsb: yeah, it's a weird mix between fermi and kepler, and we get it wrong
04:37 blipk: I'll trade for a RX 580
04:37 imirkin: blipk: should already work
04:39 blipk: it does not
04:39 blipk: not with my 4K monitor over display port anyway
04:39 blipk: xrandr doesn't pick up any devices besides 'default' and it won't set to native res
04:39 imirkin: sounds like nouveau doesn't load
04:40 blipk: hmm
04:40 blipk: wheres the log for that
04:40 imirkin: dmesg
04:40 blipk: I tried on 3 different debian variants
04:40 imirkin: ah. therein lies your problem
04:40 imirkin: debian ships a kernel that's too ancient, in all likelihood
04:41 blipk: ah
04:41 blipk: yeah I'm on 4.9
04:41 imirkin: try 4.19
04:42 imirkin: which was released after those GPUs came out, rather than before
04:42 blipk: I've got to learn how to upgrade kernel
04:43 blipk: I guess I'll just try a non-debian OS next
04:43 imirkin: with 4.20, you should even be able to get 4k@60 over hdmi
04:43 blipk: I like my display port cable though
04:43 blipk: and not sure if my HDMI cables are 2.0
04:43 imirkin: not sure if that's a thing
04:44 imirkin: either way, DP should work fine too
04:44 blipk: or whatever the latest is
04:44 blipk: I was reading the old HDMI versions won't support 4k@60
04:44 blipk: sure
04:44 blipk: I'll try a new kernel I guess
04:44 imirkin: yeah, but that's a transmitter/sink thing. i don't think cables are different.
04:45 blipk: hmm
04:45 blipk: most cables are marketed as 1.0/1.2 compatible
04:45 airlied: I think some low qual hdmi 1.0 cables can't do hdmi 2.0
04:45 blipk: ya
04:45 airlied: at least one of mine causes blinky screens
04:45 imirkin: conceivable.
04:45 imirkin: probably wouldn't have passed hdmi certification before either
04:45 blipk: they won't do 60Hz at certain res
04:45 airlied: they just don't need to be directional and gold plated
04:46 imirkin: however it's not like any are checked in the first place
04:46 karolherbst: imirkin: well, the volt code was written with kepler in mind, might be a different reg on fermi :)
04:46 imirkin: karolherbst: the gf100.c code looks at a different reg
04:46 imirkin: karolherbst: but gf117 (in traces) looks at the gk104 fuse register
04:47 karolherbst: ohhhh, I remember
04:47 karolherbst: I've added that, right
04:47 karolherbst: imirkin: interesting
04:47 imirkin: i.e. it wants fuse 0x3a8
04:47 imirkin: instead of 0x1cc
04:47 imirkin: but all the fuse readout logic around it is still the regular fermi fuse readout logic
04:48 karolherbst: afaik you can change the volt class to gk104 and it should just work
04:48 karolherbst: I've added the volt classes for the speedo stuff
04:48 imirkin: no
04:48 imirkin: but ... close
04:48 karolherbst: ohh, pmu voltage
04:48 karolherbst: right :/
04:48 imirkin: and i'm talking specifically about speedo value readout
04:49 karolherbst: I kind of remember us having this discussion way in the past
04:49 imirkin: about gf117?
04:49 imirkin: doubtful.
04:49 imirkin: at least i don't remember
04:49 karolherbst: for what it's worth, we have to rework some bits of that anyway
04:49 karolherbst: I've found a vbios with a reference to a i2c connected PWM :/
04:49 imirkin: well, i just want to avoid the MMIO read error
04:50 karolherbst: right
04:50 imirkin: skeggsb: do you want me to add a gf117.c for this silly thing, or can i just stick a bit of logic into gf100.c?
04:51 imirkin: [imho things have been over-split...]
11:47 RSpliet: Oh interesting. There's an NVIDIA patent describing how hardware masks the cost of register bank conflicts using an "operand collector". That explains why avoiding bank conflicts in RA didn't have a major performance implication
12:09 mupuf: RSpliet: interesting!
12:12 RSpliet: It is! The timing of the patent is such that it might be tech implemented for Tesla already...
12:14 mupuf: Tesla, as in geforce 8?
12:15 RSpliet: Yep
12:15 RSpliet: 2006
12:24 RSpliet: It's blindingly obvious once you see it ;-) The order of fetching operands for a single instruction doesn't matter, so if you look at multiple instructions at a time, odds are you'll be able to issue a read to every bank in every cycle. If each instruction comes from a different warp, you don't have to worry about hazards. The only downside I guess is that forwarding becomes an absolute nightmare... let's see if the patent says an
12:56 AndrewR: so, plain wine 4.0-git + d3d (nine) patch also failed to run mafia2 (but run those ATI demos, with nine enabled ..still, nothing like 2-3x fps increase) ..
12:57 AndrewR: patch from https://bugzilla.freedesktop.org/show_bug.cgi?id=108263 also had no positive impact, at least on this particular game
12:57 diogenes_: AndrewR, what about steam proton?
12:58 AndrewR: diogenes_, well, if I find slackbuild for it :}
12:58 diogenes_: AndrewR, haha slack omg
13:48 AndrewR: "nv50_constbufs_validate:62 - user constbufs only supported in slot 0" - while running 3dmark05 with current mesa and nine-patched wine
13:52 imirkin: AndrewR: grrr ... if nine is trying to feed user constbufs to other slots, we're in trouble
13:52 karolherbst: uhhh :/
13:52 imirkin: although i don't think that's possible
13:52 karolherbst: "user constbufs" is the uniform one?
13:52 imirkin: karolherbst: no, client-side
13:52 imirkin: vs in a pipe_resource
13:52 karolherbst: ahh
13:54 karolherbst: imirkin: btw, I think I will look into reducing the amount of convert instructions, should have a significant impact.
13:54 karolherbst: pendingchaos: were you looking into optimizing neg/abs -> add? otherwise I would do that?
13:55 pendingchaos: you can do that
13:56 karolherbst: merging f2i + i2f will be interesting as well
13:56 imirkin: karolherbst: iirc i did that
13:56 imirkin: or at least thought about doing that ;)
13:56 karolherbst: :) which one?
13:57 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n2036
13:57 imirkin: have a look at that and a few following functions
13:57 imirkin: i handle a few cases.
13:58 imirkin: specifically f2i(trunc) is handled
13:58 karolherbst: imirkin: ohh, I meant two things: fneg/fabs -> fadd neg/abs a 0 and ineg -> iadd neg a 0
13:58 karolherbst: and f2i + i2f -> f2f with integer rounding mode
13:59 imirkin: that's dangerous
13:59 karolherbst: the second one?
13:59 imirkin: e.g. float is out of range of the int
13:59 karolherbst: mhh, yeah
13:59 imirkin: yea
13:59 karolherbst: but that's undefined afaik
14:00 imirkin: yeah dunno - like i said - dangerous :)
14:00 imirkin: you could try it
14:00 imirkin: but ... dangerous
14:00 karolherbst: casting high value floats to int leads to all kind of funky results anyhow
14:00 imirkin: check to see what other driver folks think of such an optimization
14:00 karolherbst: yeah
14:00 karolherbst: I could check what nvidia does
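[editor's note] A minimal sketch of the out-of-range hazard discussed above. It assumes, purely for illustration, that f2i clamps to the signed 32-bit range and that f2f with an integer rounding mode only drops the fraction. The log states neither, and every function name here is hypothetical. Python floats are doubles, so this models the idea, not the exact f32 bit patterns.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def f2i_trunc(x: float) -> int:
    """Hypothetical f2i(trunc): ASSUMED to clamp to the s32 range."""
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))


def i2f(i: int) -> float:
    """Integer back to float."""
    return float(i)


def f2f_trunc(x: float) -> float:
    """Hypothetical f2f with a truncating integer rounding mode."""
    return float(math.trunc(x))


def merged_matches(x: float) -> bool:
    """True when f2i + i2f and the merged f2f agree for this input."""
    return i2f(f2i_trunc(x)) == f2f_trunc(x)


for value in (2.75, -2.75, 3e9):
    print(value, merged_matches(value))
```

Under these assumptions the two forms agree for values that fit in an int and diverge once the float exceeds the int range, which is the case flagged as dangerous.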
14:00 imirkin: as for the fneg/fabs thing ...
14:01 imirkin: you could do it as a post-RA thing even
14:01 karolherbst: ohh, right
14:01 karolherbst: should be harmless
14:01 imirkin: the only reason not to do it at emission time is that it will complicate the scheduling calculations
14:01 karolherbst: but I would have to insert r63/r255 then, right?
14:01 imirkin: ya
14:02 imirkin: rZero iirc
14:02 karolherbst: k
14:02 karolherbst: I was thinking about late algebraic opts
14:02 karolherbst: but we could do the super late stuff also post ra
14:07 karolherbst: imirkin: mhh, seems like nvidia doesn't merge those :/
14:09 karolherbst: but I only checked with cl kernels, maybe for glsl that would actually be fine
14:09 karolherbst: dunno
16:53 karolherbst: imirkin: I am wondering if it's worth adding a convert instruction count to the shader stats
16:53 karolherbst: doesn't matter how you look at those, you want to have as few as possible of them
17:26 karolherbst: mhh, rZero is only there in NVC0LegalizePostRA
17:28 imirkin_: which is where it would go
17:29 imirkin_: so it works out nicely.
17:29 karolherbst: huh? you want to have that while legalizing?
17:30 karolherbst: thought we could just add a pass to Program::optimizePostRA
17:32 imirkin_: legalize = make the IR be more emittable
17:32 imirkin_: this seems like a reasonable place to do the cvt -> not cvt stuff
17:33 karolherbst: well, it doesn't make the IR more emittable
17:33 imirkin_: just makes the emissions better. seems reasonable :)
17:33 imirkin_: rZero is a pretty platform-specific concept
17:34 karolherbst: right. I can put the logic into BuildUtil or wherever anyway. Just have to be careful about chipset specific rules
17:34 imirkin_: you can also just stick an immediate zero and let the pass that does it fix it up
17:34 karolherbst: like on Tesla we don't have abs on the adds
17:34 imirkin_: i forget where those run -- presumably in legalize too
17:34 karolherbst: postRa opts are after legalize
17:34 imirkin_: ah, so too late.
17:34 karolherbst: yeah :/
17:34 imirkin_: yeah, i really think you want legalize
17:34 imirkin_: since it's platform-specific anyways
17:34 karolherbst: true
17:35 karolherbst: mhhh
17:35 karolherbst: on tesla we could do max(abs a, abs a) ...
17:36 imirkin_: or nothing at all - i dunno that it's such a hit there
17:36 karolherbst: mhh, let's see what nvidia decides to do :)
17:36 karolherbst: although I guess my installed cuda tools are way too new
17:38 karolherbst: mhh sm_30 is max
17:38 karolherbst: uhm
17:38 karolherbst: min
17:38 karolherbst: imirkin_: by any change, do you know the highest cuda version with tesla support?
17:38 karolherbst: *chance
17:39 imirkin_: pretty sure its still supported in nvdisasm
17:39 imirkin_: that's the only bit i've ever used :)
17:39 karolherbst: mhhh
17:39 karolherbst: well I want to compile a cl kernel to sass :/
17:40 karolherbst: seems like 7.x dropped tesla support
17:40 karolherbst: much easier to go the cl -> ptx -> sass route than doing mmts :)
18:04 YuGiOhJCJ: hello, I am using xf86-video-nouveau-1.0.15 with a GeForce GTX 1060, when I do 'startx' I see that this card is not in the supported list https://pastebin.com/FDbNKQSs (Xorg log file) that's weird because I see that it is listed on this page: https://nouveau.freedesktop.org/wiki/CodeNames/#NV130 is it because 2D features are a WIP as in this page? https://nouveau.freedesktop.org/wiki/FeatureMatrix/ or just because I need to use th
18:04 imirkin_: YuGiOhJCJ: that's just a fixed string in the driver that hasn't been updated
18:04 imirkin_: should probably remove it...
18:07 YuGiOhJCJ: oh you mean that eventually my card is supported by this driver version?
18:07 imirkin_: it already is
18:07 YuGiOhJCJ: nice
18:07 imirkin_: just that string hasn't been updated in ages
18:07 imirkin_: coz ... no one looks at it :)
18:08 YuGiOhJCJ: but I think I have an issue loading the driver, because I am in 1024x768
18:08 imirkin_: that may well be
18:08 imirkin_: but it's not because there's a lack of support in xf86-video-nouveau :)
18:09 imirkin_: pastebin dmesg + xorg logs
18:10 YuGiOhJCJ: https://pastebin.com/nK1zgPY7 (xorg)
18:10 imirkin_: [ 17048.383] (EE) open /dev/dri/card0: No such file or directory
18:10 imirkin_: and you're using fbdev
18:10 imirkin_: so ... you're not getting nouveau kernel driver loaded
18:14 YuGiOhJCJ: https://pastebin.com/0fqJmNbM (dmesg)
18:15 YuGiOhJCJ: yes, I am on fbdev I don't know why
18:15 imirkin_: [ 8.792605] nouveau 0000:01:00.0: unknown chipset (136000a1)
18:15 imirkin_: you need a kernel that's not 2 years old
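[editor's note] A small sketch for reading the "unknown chipset" line quoted above. It assumes the chipset id sits in bits 20..28 of the reported word, which is how I understand the kernel driver decodes it. Treat that layout and both helper names as assumptions, not something the log confirms.

```python
import re


def chipset_from_boot0(boot0: int) -> int:
    """Extract the chipset id, ASSUMING it occupies bits 20..28."""
    return (boot0 & 0x1FF00000) >> 20


def parse_unknown_chipset(line: str):
    """Return the chipset id from an 'unknown chipset (...)' dmesg line, or None."""
    match = re.search(r"unknown chipset \(([0-9a-fA-F]{8})\)", line)
    if match is None:
        return None
    return chipset_from_boot0(int(match.group(1), 16))


line = "[ 8.792605] nouveau 0000:01:00.0: unknown chipset (136000a1)"
print(hex(parse_unknown_chipset(line)))  # 0x136
```

An id of 0x136 would place the card in the NV130 family that the wiki link earlier in the log points at, so the card is known to newer kernels and only the running kernel is too old.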
18:15 YuGiOhJCJ: oh a kernel upgrade is needed
18:15 karolherbst: imirkin_: okay, so on tesla nvidia uses F2F.F32.F32 for OP_NEG still
18:16 imirkin_: karolherbst: my guess is that a part of it is increased ability to use short opcodes? dunno.
18:16 YuGiOhJCJ: ok I will upgrade my kernel and see if it works better
18:16 YuGiOhJCJ: thanks
18:16 karolherbst: imirkin_: mhh same goes for sm_30 and sm_35 though
18:16 imirkin_: i'd recommend something new. like 4.19
18:16 imirkin_: karolherbst: so perhaps your test isn't sufficient?
18:16 imirkin_: gotta have more code
18:17 karolherbst: for maxwell it uses fadd
18:17 imirkin_: ah
18:17 imirkin_: without additional info, i'd kinda recommend doing whatever they do
18:17 imirkin_: since they tend to know what they're doing
18:18 karolherbst: imirkin_: mhhh, weird
18:18 karolherbst: with cuda-8 they use f2f
18:18 karolherbst: with cuda-10 they use fadd on fermi/kepler
18:18 karolherbst: (cuda meaning ptxas here)
18:18 karolherbst: abs.f32 is the ptx instruction
18:20 YuGiOhJCJ: how well will the GeForce GTX 1060 be supported? will I get 1920x1080, 60 FPS and 3D acceleration?
18:20 karolherbst: mhh
18:20 karolherbst: why did I test abs
18:21 karolherbst: same with neg though
18:22 karolherbst: YuGiOhJCJ: most likely not 60 fps for any demanding game
18:24 imirkin_: YuGiOhJCJ: no.
18:25 imirkin_: pick any 2 of those.
18:29 HdkR: 320x240 60fps
18:30 YuGiOhJCJ: so yes with 'glxgears' I will get 60 FPS in the window but my screen will be 1920x1080
18:31 imirkin_: ;)
18:50 karolherbst: imirkin_: uff :/ I also have to deal with predicated abs/negs :/ that's kind of annoying at that point
18:51 imirkin_: how is that more or less annoying?
18:51 karolherbst: predSrc
18:52 karolherbst: well, basically I want to move src0 to src1 and set src0 to 0
18:52 karolherbst: but I ended up overwriting the pred source this way
18:52 imirkin_: adjustSources
18:52 imirkin_: er
18:52 imirkin_: moveSources
18:52 karolherbst: and wouldn't setSrc(0, zero) fail if predSrc is 0?
18:52 imirkin_: first moveSources
18:53 imirkin_: predSrc is always after the regular ones
18:53 karolherbst: ohh, okay
18:53 imirkin_: otherwise insanity ensues
18:53 karolherbst: yeah, that makes it easier then
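[editor's note] A toy model of the point made above: because the predicate source always sits after the regular sources, opening a gap at index 0 first and then writing the zero operand leaves the predicate intact. The class and method names are hypothetical Python stand-ins, not the codegen's real C++ API.

```python
class Insn:
    """Toy instruction: regular sources first, predicate source (if any) last."""

    def __init__(self, srcs, pred=None):
        self.srcs = list(srcs)
        self.pred_src = -1  # -1 means "not predicated"
        if pred is not None:
            self.srcs.append(pred)
            self.pred_src = len(self.srcs) - 1

    def move_sources(self, s, delta):
        """Open `delta` empty slots at index `s`, keeping pred_src in sync."""
        if delta <= 0:
            raise ValueError("this sketch only opens gaps (delta > 0)")
        self.srcs[s:s] = [None] * delta
        if self.pred_src >= s:
            self.pred_src += delta

    def set_src(self, s, value):
        """Overwrite source slot `s`."""
        self.srcs[s] = value


def to_add_with_zero(insn, zero="rZero"):
    """Turn 'op a' into 'add zero, a': shift sources first, then fill slot 0."""
    insn.move_sources(0, 1)
    insn.set_src(0, zero)
    return insn


insn = to_add_with_zero(Insn(["a"], pred="p0"))
print(insn.srcs, insn.pred_src)  # ['rZero', 'a', 'p0'] 2
```

Writing slot 1 directly without the shift would clobber the predicate in this model, which matches the overwrite described a few lines earlier.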
19:06 karolherbst: nice: total cvt instructions in shared programs : 139343 -> 108443 (-22.18%) :)
19:07 karolherbst: 50% in pixmark_piano... let's see how much of a difference that makes
19:08 HdkR: Nice
19:17 karolherbst: mhh 5012 -> 5024 points
19:17 karolherbst: not as much as I expected, but for the fact we actually simply replace some instructions, it's still quite good
19:19 karolherbst: did only take care of the most trivial things though
19:20 karolherbst: neg(abs a) can be expressed as add(neg abs a) I think?
19:20 karolherbst: maybe that's helpful as well
19:56 karolherbst: "total cvt instructions in shared programs : 139343 -> 95856 (-31.21%)" now :)
20:01 HdkR: Nothing like removing nearly 1/3rd the number of instructions
20:03 karolherbst: uhm
20:03 karolherbst: instruction count stays the same
20:03 karolherbst: "total instructions in shared programs : 7614782 -> 7614782 (0.00%)" :)
20:04 karolherbst: just less i2i and f2f
20:04 karolherbst: but more adds ;)
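[editor's note] The percentages quoted above can be reproduced from the raw counts with a one-line relative-change formula. The helper name is made up for this sketch.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


print(f"{pct_change(139343, 108443):.2f}%")    # -22.18%
print(f"{pct_change(139343, 95856):.2f}%")     # -31.21%
print(f"{pct_change(7614782, 7614782):.2f}%")  # 0.00%
```

The cvt count drops while the total stays flat because each removed convert is replaced one-for-one by an add.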
20:04 HdkR: ah
20:04 karolherbst: well + 0.2% perf in pixmark_piano
20:04 karolherbst: unreclocked pascal GPU
20:10 HdkR: big gains
20:19 imirkin_: HdkR: for something simple like emitting an op differently? seems not bad.
20:19 imirkin_: although pixmark_piano is super-shader-heavy...
20:23 HdkR: Yea, it is quite nice for just emitting different ops in certain cases :D
21:15 karolherbst: imirkin_: mhh, so I have those conversions:
21:16 karolherbst: - abs(a) -> add(0, abs a)
21:16 karolherbst: - neg(a) -> add(0, neg a)
21:16 karolherbst: - neg(abs a) -> add(0, neg abs a)
21:16 karolherbst: - sat(a) -> sat add(0, a)
21:16 karolherbst: anything else comes to your mind?
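[editor's note] A quick numeric check of the four rewrites listed above, using plain Python floats as a stand-in for the GPU's arithmetic. The table and helper names are invented for this sketch, and `sat` is assumed to clamp to [0, 1]. In ordinary IEEE 754 arithmetic the neg forms differ from the originals only in the sign of a zero result. Whether that matters on the hardware, or whether the zero operand can carry a sign there, is not covered in the log.

```python
import math


def sat(x: float) -> float:
    """Clamp to [0, 1] (ASSUMED meaning of the sat modifier)."""
    return min(1.0, max(0.0, x))


# name -> (original op, rewritten form with an explicit zero operand)
REWRITES = {
    "abs(a)": (lambda a: abs(a), lambda a: 0.0 + abs(a)),
    "neg(a)": (lambda a: -a, lambda a: 0.0 + (-a)),
    "neg(abs a)": (lambda a: -abs(a), lambda a: 0.0 + (-abs(a))),
    "sat(a)": (sat, lambda a: sat(0.0 + a)),
}


def same_bits(x: float, y: float) -> bool:
    """Equal values with equal sign, so +0.0 and -0.0 count as different."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


def check(values):
    """For each rewrite, report whether it matches the original on all values."""
    return {
        name: all(same_bits(op(v), rewritten(v)) for v in values)
        for name, (op, rewritten) in REWRITES.items()
    }


print(check([-2.5, -0.25, 0.5, 3.0]))
print(check([0.0]))
```

On the nonzero inputs all four rewrites match. For an input of +0.0 the two neg forms return +0.0 where the original returns -0.0.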
21:25 imirkin_: karolherbst: be careful to check that sType == dType
21:26 karolherbst: imirkin_: mhh, do we actually allow type changes for those ops?
21:26 imirkin_: for CVT? sure.
21:26 imirkin_: or are you just checking straight-up ABS/NEG/etc?
21:26 imirkin_: i think we might generate CVT's in places
21:26 karolherbst: yeah, I check if op is one of ABS/NEG/SAT
21:27 imirkin_: maybe not
21:27 imirkin_: either way, those ops aren't allowed to have any funny business like that
21:27 karolherbst: I can put the check there just in case, but I highly doubt it will change anything
21:28 karolherbst: I am wondering if I should add the code for sat(abs/neg a) but none of the shaders we have actually has this
21:29 karolherbst: I guess it wouldn't hurt
21:29 karolherbst: mhhh
21:29 karolherbst: and I need to disable it for when we have immediates because you could disable opts
21:29 karolherbst: and limm forms usually don't support anything
21:30 karolherbst: or I see no point checking stuff there
21:30 karolherbst: mhh iadd supports sat with limm but not fadd
21:30 karolherbst: oh well, doesn't matter
22:28 karolherbst: imirkin_: on kepler using add should let us do more dual issuing than using abs/neg/sat
22:29 karolherbst: like we can't dual issue two OP_CVT together
22:39 imirkin_: karolherbst: ah cool
22:41 karolherbst: mhh, only +0.2% perf on my gm204 fully reclocked :/
22:41 karolherbst: oh well, I guess having even a small improvement here is good enough
22:46 imirkin_: why did you expect clock speeds to materially affect the % change?
22:49 karolherbst: yeah, dunno. Doesn't make much sense that anything would change here
22:49 karolherbst: checking on kepler would be interesting
22:49 pmoreau: karolherbst: Didn’t SPIRV-LLVM-Translator used to install the llvm-spirv tool (to convert LLVM bytecode to SPIR-V, or vice versa)?
22:49 karolherbst: pmoreau: LLVM_BUILD_TOOLS or something
22:49 karolherbst: yeah. -DLLVM_BUILD_TOOLS=ON
22:50 pmoreau: I do have LLVM_BUILD_TOOLS=ON for my LLVM install.
22:51 karolherbst: and you expect that the llvm build system would add this to your spirv-llvm-translator build automatically?
22:52 pmoreau: Sure! :-)
22:52 karolherbst: ;)
22:52 karolherbst: think again
22:52 karolherbst: I even have to add -DBUILD_SHARED_LIBS=ON manually
22:53 karolherbst: although I think that's why my system llvm builds libllvm.so but the components are all static :/
22:53 karolherbst: *s/why/because/
22:54 pmoreau: There should be some instructions about `LLVM_BUILD_TOOLS=ON`, because you will never see it if you look at the various CMakeFiles in that repo. It’s all hidden away behind LLVM's own macros that are being used.
22:54 karolherbst: true
22:57 airlied: karolherbst: shared libs should never be set with llvm itself, just its dylib thing
22:57 airlied: but yeah it's annoying when building clang out of tree as well, it doesn't get the right dylib flags
22:59 karolherbst: airlied: why though?
22:59 karolherbst: too big of a perf overhead having everything shared?
23:00 pmoreau: karolherbst: Also, I’m getting a nice segfault when running `clinfo` with your branch “nouveau_nir_spirv_opencl_v5” on MCP79/G96. I’ll debug that tomorrow.
23:00 karolherbst: don
23:00 karolherbst: don't use that branch
23:00 karolherbst: pmoreau: nouveau_nir_spirv_opencl_hmm_v2 is the newest one more or less
23:00 karolherbst: cleaned up quite a lot and different approach
23:01 pmoreau: Ah, might help if I don’t run out-of-date stuff. I’ll try that instead.
23:01 airlied: karolherbst: building llvm as a single .so is fine, building it as lots of little ones causes all kinds of breakages
23:01 airlied: and BUILD_SHARED_LIBS does the latter
23:01 karolherbst: okay
23:01 airlied: lots of constructors and things seem to get confused
23:03 pmoreau: karolherbst: That `LLVM_BUILD_TOOLS=ON` worked great, thanks! ;-)
23:12 karolherbst: imirkin_: the biggest surprise is the +0.2% perf gain on my gk106. I kind of thought that impact would be smaller given that nvidia didn't turn it on from the start
23:19 pmoreau: karolherbst: I’m getting some compilation errors on your hmm_v2 branch: https://hastebin.com/bovuwowimu.sql
23:20 karolherbst: ohh wait, let me rebase everything
23:21 karolherbst: mhhh
23:23 karolherbst: might take a while
23:24 pmoreau: No worries, I’ll be hitting the bed anyway. I’ll give it another try tomorrow evening.
23:36 karolherbst: pmoreau: pushed
23:36 imirkin_: skeggsb: are the VGA registers present on boards with display fused off / zero monitors in DCB?
23:36 imirkin_: skeggsb: i'm thinking whether GF117 is super-special in not having them, or if any board without display won't have them
23:45 imirkin_: s/monitors/connectors/ obviously