01:54rhyskidd: skeggsb: do you have a mmiotrace from turing that you're able to share?
04:08imirkin: skeggsb: https://bugs.freedesktop.org/show_bug.cgi?id=108980#c11 -- volt speedo not available on gf117?
04:16imirkin: skeggsb: hmm - i see SPEEDO accessed in nvd9 and nvcf but not nvd7 in our mmiotrace list
04:24skeggsb: not a clue, gf117 is probably the worst supported gpu we have tbh
04:25skeggsb: it's a weird mixup that i *know* we get wrong in a couple of places
04:25skeggsb: (most notably: the gk104 memory clock code should actually be gf117...)
04:33imirkin: skeggsb: good one. it does read 3a8, which contains a speedo-looking value
04:34imirkin: no mention of 0x122634 though
04:36blipk: who can I give my GTX1060 to so they can make nouveau work with it?
04:36skeggsb: yeah, it's a weird mix between fermi and kepler, and we get it wrong
04:37blipk: I'll trade for a RX 580
04:37imirkin: blipk: should already work
04:39blipk: it does not
04:39blipk: not with my 4K monitor over display port anyway
04:39blipk: xrandr doesn't pick up any devices besides 'default' and it won't set to native res
04:39imirkin: sounds like nouveau doesn't load
04:40blipk: where's the log for that?
04:40blipk: I tried on 3 different debian variants
04:40imirkin: ah. therein lies your problem
04:40imirkin: debian ships a kernel that's too ancient, in all likelihood
04:41blipk: yeah I'm on 4.9
04:41imirkin: try 4.19
04:42imirkin: which was released after those GPUs came out, rather than before
04:42blipk: I've got to learn how to upgrade kernel
04:43blipk: I guess I'll just try a non-debian OS next
04:43imirkin: with 4.20, you should even be able to get 4k@60 over hdmi
04:43blipk: I like my display port cable though
04:43blipk: and not sure if my HDMI cables are 2.0
04:43imirkin: not sure if that's a thing
04:44imirkin: either way, DP should work fine too
04:44blipk: or whatever the latest is
04:44blipk: I was reading the old HDMI versions won't support 4k@60
04:44blipk: I'll try a new kernel I guess
04:44imirkin: yeah, but that's a transmitter/sink thing. i don't think cables are different.
04:45blipk: most cables are marketed as 1.0/1.2 compatible
04:45airlied: I think some low qual hdmi 1.0 cables can't do hdmi 2.0
04:45airlied: at least one of mine causes blinky screens
04:45imirkin: probably wouldn't have passed hdmi certification before either
04:45blipk: they won't do 60Hz at certain res
04:45airlied: they just don't need to be directional and gold plated
04:46imirkin: however it's not like any are checked in the first place
04:46karolherbst: imirkin: well, the volt code was written with kepler in mind, might be a different reg on fermi :)
04:46imirkin: karolherbst: the gf100.c code looks at a different reg
04:46imirkin: karolherbst: but gf117 (in traces) looks at the gk104 fuse register
04:47karolherbst: ohhhh, I remember
04:47karolherbst: I've added that, right
04:47karolherbst: imirkin: interesting
04:47imirkin: i.e. it wants fuse 0x3a8
04:47imirkin: instead of 0x1cc
04:47imirkin: but all the fuse readout logic around it is still the regular fermi fuse readout logic
04:48karolherbst: afaik you can change the volt class to gk104 and it should just work
04:48karolherbst: I've added the volt classes for the speedo stuff
04:48imirkin: but ... close
04:48karolherbst: ohh, pmu voltage
04:48karolherbst: right :/
04:48imirkin: and i'm talking specifically about speedo value readout
04:49karolherbst: I kind of remember us having this discussion way in the past
04:49imirkin: about gf117?
04:49imirkin: at least i don't remember
04:49karolherbst: for what it's worth, we have to rework some bits of that anyway
04:49karolherbst: I've found a vbios with a reference to an i2c-connected PWM :/
04:49imirkin: well, i just want to avoid the MMIO read error
04:50imirkin: skeggsb: do you want me to add a gf117.c for this silly thing, or can i just stick a bit of logic into gf100.c?
04:51imirkin: [imho things have been over-split...]
11:47RSpliet: Oh interesting. There's an NVIDIA patent describing how hardware masks the cost of register bank conflicts using an "operand collector". That explains why avoiding bank conflicts in RA didn't have a major performance implication
12:09mupuf: RSpliet: interesting!
12:12RSpliet: It is! The timing of the patent is such that it might be tech implemented for Tesla already...
12:14mupuf: Tesla, as in geforce 8?
12:24RSpliet: It's blindingly obvious once you see it ;-) The order of fetching operands for a single instruction doesn't matter, so if you look at multiple instructions at a time, odds are you'll be able to issue a read to every bank in every cycle. If each instruction comes from a different warp, you don't have to worry about hazards. The only downside I guess is that forwarding becomes an absolute nightmare... let's see if the patent says an
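The idea RSpliet describes can be illustrated with a toy cycle count. This is only a sketch under loud assumptions: the bank count, the modulo register-to-bank mapping, and both function names are hypothetical, and nothing here is taken from the patent itself. It compares fetching operands one instruction at a time against pooling the pending reads of several instructions (each from a different warp) and issuing one read per bank per cycle.

```python
from collections import Counter

NUM_BANKS = 4  # hypothetical bank count, chosen only for this example


def bank_of(reg: int) -> int:
    """Map a register index to a bank (modulo interleaving is an assumption)."""
    return reg % NUM_BANKS


def cycles_in_order(instructions):
    """Fetch operands one instruction at a time; a bank serves one read per cycle."""
    total = 0
    for operands in instructions:
        per_bank = Counter(bank_of(r) for r in operands)
        # The busiest bank decides how long this instruction's fetch takes.
        total += max(per_bank.values(), default=1)
    return total


def cycles_with_collector(instructions):
    """Pool all pending reads and issue at most one read per bank per cycle."""
    per_bank = Counter(bank_of(r) for operands in instructions for r in operands)
    return max(per_bank.values(), default=0)


# Four instructions from four different warps; each conflicts only with itself.
warps = [(0, 4, 8), (1, 5, 9), (2, 6, 10), (3, 7, 11)]
in_order = cycles_in_order(warps)
collected = cycles_with_collector(warps)
print(in_order, collected)  # prints: 12 3
```

In this contrived case every instruction has a three-way conflict on its own bank, yet the pooled reads keep all four banks busy each cycle, which is the effect that would hide bank conflicts from the register allocator.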
12:56AndrewR: so, plain wine 4.0-git + d3d (nine) patch also failed to run mafia2 (but ran those ATI demos, with nine enabled ..still, nothing like 2-3x fps increase) ..
12:57AndrewR: patch from https://bugzilla.freedesktop.org/show_bug.cgi?id=108263 also had no positive impact, at least on this particular game
12:57diogenes_: AndrewR, what about steam proton?
12:58AndrewR: diogenes_, well, if I find slackbuild for it :}
12:58diogenes_: AndrewR, haha slack omg
13:48AndrewR: "nv50_constbufs_validate:62 - user constbufs only supported in slot 0" - while running 3dmark05 with current mesa and nine-patched wine
13:52imirkin: AndrewR: grrr ... if nine is trying to feed user constbufs to other slots, we're in trouble
13:52karolherbst: uhhh :/
13:52imirkin: although i don't think that's possible
13:52karolherbst: "user constbufs" is the uniform one?
13:52imirkin: karolherbst: no, client-side
13:52imirkin: vs in a pipe_resource
13:54karolherbst: imirkin: btw, I think I will look into reducing the amount of convert instructions, should have a significant impact.
13:54karolherbst: pendingchaos: were you looking into optimizing neg/abs -> add? otherwise I would do that?
13:55pendingchaos: you can do that
13:56karolherbst: merging f2i + i2f will be interesting as well
13:56imirkin: karolherbst: iirc i did that
13:56imirkin: or at least thought about doing that ;)
13:56karolherbst: :) which one?
13:57imirkin: have a look at that and a few following functions
13:57imirkin: i handle a few cases.
13:58imirkin: specifically f2i(trunc) is handled
13:58karolherbst: imirkin: ohh, I meant two things: fneg/fabs -> fadd neg/abs a 0 and ineg -> iadd neg a 0
13:58karolherbst: and f2i + i2f -> f2f with integer rounding mode
13:59imirkin: that's dangerous
13:59karolherbst: the second one?
13:59imirkin: e.g. float is out of range of the int
13:59karolherbst: mhh, yeah
13:59karolherbst: but that's undefined afaik
14:00imirkin: yeah dunno - like i said - dangerous :)
14:00imirkin: you could try it
14:00imirkin: but ... dangerous
14:00karolherbst: casting high value floats to int leads to all kinds of funky results anyhow
14:00imirkin: check to see what other driver folks think of such an optimization
14:00karolherbst: I could check what nvidia does
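The out-of-range concern can be made concrete with a small sketch. It assumes the float-to-int conversion truncates toward zero and clamps to the int32 range; that clamping behaviour, and the helper names `f2i`, `i2f`, `unmerged` and `merged`, are assumptions for illustration only. Python floats are doubles, so float32 rounding effects are ignored here.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def f2i(x: float) -> int:
    """Float to int32: truncate toward zero, clamp out-of-range inputs (assumed)."""
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))


def i2f(i: int) -> float:
    """Int to float conversion."""
    return float(i)


def unmerged(x: float) -> float:
    """The original two-instruction sequence: f2i followed by i2f."""
    return i2f(f2i(x))


def merged(x: float) -> float:
    """The proposed single f2f that rounds toward zero to an integral value."""
    return float(math.trunc(x))


for x in (1.7, -2.5, 3e9):
    print(x, unmerged(x), merged(x))
```

For values that fit in an int32 both forms agree; for 3e9 the two-step version is limited by the integer range while the merged one is not, which is the case imirkin calls dangerous.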
14:00imirkin: as for the fneg/fabs thing ...
14:01imirkin: you could do it as a post-RA thing even
14:01karolherbst: ohh, right
14:01karolherbst: should be harmless
14:01imirkin: the only reason not to do it at emission time is that it will complicate the scheduling calculations
14:01karolherbst: but I would have to insert r63/r255 then, right?
14:02imirkin: rZero iirc
14:02karolherbst: I was thinking about late algebraic opts
14:02karolherbst: but we could do the super late stuff also post ra
14:07karolherbst: imirkin: mhh, seems like nvidia doesn't merge those :/
14:09karolherbst: but I only checked with cl kernels, maybe for glsl that would actually be fine
16:53karolherbst: imirkin: I am wondering if it's worth adding a convert instruction count to the shader stats
16:53karolherbst: doesn't matter how you look at those, you want to have as few as possible of them
17:26karolherbst: mhh, rZero is only there in NVC0LegalizePostRA
17:28imirkin_: which is where it would go
17:29imirkin_: so it works out nicely.
17:29karolherbst: huh? you want to have that while legalizing?
17:30karolherbst: thought we could just add a pass to Program::optimizePostRA
17:32imirkin_: legalize = make the IR be more emittable
17:32imirkin_: this seems like a reasonable place to do the cvt -> not cvt stuff
17:33karolherbst: well, it doesn't make the IR more emittable
17:33imirkin_: just makes the emissions better. seems reasonable :)
17:33imirkin_: rZero is a pretty platform-specific concept
17:34karolherbst: right. I can put the logic into BuildUtil or wherever anyway. Just have to be careful about chipset specific rules
17:34imirkin_: you can also just stick an immediate zero and let the pass that does it fix it up
17:34karolherbst: like on Tesla we don't have abs on the adds
17:34imirkin_: i forget where those run -- presumably in legalize too
17:34karolherbst: postRa opts are after legalize
17:34imirkin_: ah, so too late.
17:34karolherbst: yeah :/
17:34imirkin_: yeah, i really think you want legalize
17:34imirkin_: since it's platform-specific anyways
17:35karolherbst: on tesla we could do max(abs a, abs a) ...
17:36imirkin_: or nothing at all - i dunno that it's such a hit there
17:36karolherbst: mhh, let's see what nvidia decides to do :)
17:36karolherbst: although I guess my installed cuda tools are way too new
17:38karolherbst: mhh sm_30 is max
17:38karolherbst: imirkin_: by any chance, do you know the highest cuda version with tesla support?
17:39imirkin_: pretty sure it's still supported in nvdisasm
17:39imirkin_: that's the only bit i've ever used :)
17:39karolherbst: well I want to compile a cl kernel to sass :/
17:40karolherbst: seems like 7.x dropped tesla support
17:40karolherbst: much easier to go the cl -> ptx -> sass route than doing mmts :)
18:04YuGiOhJCJ: hello, I am using xf86-video-nouveau-1.0.15 with a GeForce GTX 1060, when I do 'startx' I see that this card is not in the supported list https://pastebin.com/FDbNKQSs (Xorg log file) that's weird because I see that it is listed on this page: https://nouveau.freedesktop.org/wiki/CodeNames/#NV130 is it because 2D features are a WIP as in this page? https://nouveau.freedesktop.org/wiki/FeatureMatrix/ or just because I need to use th
18:04imirkin_: YuGiOhJCJ: that's just a fixed string in the driver that hasn't been updated
18:04imirkin_: should probably remove it...
18:07YuGiOhJCJ: oh you mean that eventually my card is supported by this driver version?
18:07imirkin_: it already is
18:07imirkin_: just that string hasn't been updated in ages
18:07imirkin_: coz ... no one looks at it :)
18:08YuGiOhJCJ: but I think I have an issue loading the driver, because I am in 1024x768
18:08imirkin_: that may well be
18:08imirkin_: but it's not because there's a lack of support in xf86-video-nouveau :)
18:09imirkin_: pastebin dmesg + xorg logs
18:10YuGiOhJCJ: https://pastebin.com/nK1zgPY7 (xorg)
18:10imirkin_: [ 17048.383] (EE) open /dev/dri/card0: No such file or directory
18:10imirkin_: and you're using fbdev
18:10imirkin_: so ... you're not getting nouveau kernel driver loaded
18:14YuGiOhJCJ: https://pastebin.com/0fqJmNbM (dmesg)
18:15YuGiOhJCJ: yes, I am on fbdev I don't know why
18:15imirkin_: [ 8.792605] nouveau 0000:01:00.0: unknown chipset (136000a1)
18:15imirkin_: you need a kernel that's not 2 years old
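The number in that "unknown chipset" message can be decoded by hand. The sketch below assumes the value in parentheses is the raw identification register and that the chipset ID sits in bits 20 to 28; both the bit layout and the helper name `chipset_from_boot0` are assumptions for illustration, not a copy of the kernel code.

```python
def chipset_from_boot0(boot0: int) -> int:
    """Extract the chipset ID, assuming it occupies bits 20..28 of the register."""
    return (boot0 & 0x1FF00000) >> 20


boot0 = 0x136000A1  # value printed in the dmesg line above
chipset = chipset_from_boot0(boot0)
print(hex(chipset))  # prints: 0x136
```

Under that assumption the card reports itself as chipset 0x136, which would fall in the NV130 family that YuGiOhJCJ linked; the old kernel simply has no entry for it.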
18:15YuGiOhJCJ: oh a kernel upgrade is needed
18:15karolherbst: imirkin_: okay, so on tesla nvidia uses F2F.F32.F32 for OP_NEG still
18:16imirkin_: karolherbst: my guess is that a part of it is increased ability to use short opcodes? dunno.
18:16YuGiOhJCJ: ok I will upgrade my kernel and see if it works better
18:16karolherbst: imirkin_: mhh same goes for sm_30 and sm_35 though
18:16imirkin_: i'd recommend something new. like 4.19
18:16imirkin_: karolherbst: so perhaps your test isn't sufficient?
18:16imirkin_: gotta have more code
18:17karolherbst: for maxwell it uses fadd
18:17imirkin_: without additional info, i'd kinda recommend doing whatever they do
18:17imirkin_: since they tend to know what they're doing
18:18karolherbst: imirkin_: mhhh, weird
18:18karolherbst: with cuda-8 they use f2f
18:18karolherbst: with cuda-10 they use fadd on fermi/kepler
18:18karolherbst: (cuda meaning ptxas here)
18:18karolherbst: abs.f32 is the ptx instruction
18:20YuGiOhJCJ: how well will the GeForce GTX 1060 be supported? will I get 1920x1080, 60 FPS and 3D acceleration?
18:20karolherbst: why did I test abs
18:21karolherbst: same with neg though
18:22karolherbst: YuGiOhJCJ: most likely not 60 fps for any demanding game
18:24imirkin_: YuGiOhJCJ: no.
18:25imirkin_: pick any 2 of those.
18:29HdkR: 320x240 60fps
18:30YuGiOhJCJ: so yes with 'glxgears' I will get 60 FPS in the window but my screen will be 1920x1080
18:50karolherbst: imirkin_: uff :/ I also have to deal with predicated abs/negs :/ that's kind of annoying at that point
18:51imirkin_: how is that more or less annoying?
18:52karolherbst: well, basically I want to move src0 to src1 and set src0 to 0
18:52karolherbst: but I ended up overwriting the pred source this way
18:52karolherbst: and wouldn't setSrc(0, zero) fail if predSrc is 0?
18:52imirkin_: first moveSources
18:53imirkin_: predSrc is always after the regular ones
18:53karolherbst: ohh, okay
18:53imirkin_: otherwise insanity ensues
18:53karolherbst: yeah, that makes it easier then
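The ordering imirkin_ describes (shift the sources first, then fill slot 0) can be sketched with a toy instruction class. This is not nouveau's `Instruction` class: `ToyInstr`, `move_sources`, `set_src` and `pred_src` are hypothetical stand-ins that only model the invariant that the predicate is stored after the regular sources.

```python
class ToyInstr:
    """Toy instruction: regular sources first, optional predicate stored last."""

    def __init__(self, srcs, pred=None):
        self.srcs = list(srcs)
        self.pred_src = -1  # index of the predicate source, -1 if none
        if pred is not None:
            self.srcs.append(pred)
            self.pred_src = len(self.srcs) - 1

    def move_sources(self, start, delta):
        """Shift every source at index >= start up by delta slots (delta > 0)."""
        self.srcs[start:start] = [None] * delta
        if self.pred_src >= start:
            # The predicate travels with the sources, so it is never overwritten.
            self.pred_src += delta

    def set_src(self, idx, value):
        """Fill a source slot."""
        self.srcs[idx] = value


# Predicated abs/neg on "a": move src0 to src1, then put zero into src0.
insn = ToyInstr(["a"], pred="p0")
insn.move_sources(0, 1)
insn.set_src(0, "zero")
print(insn.srcs, insn.pred_src)  # prints: ['zero', 'a', 'p0'] 2
```

Because the predicate index moves along with the shifted sources, writing slot 0 afterwards cannot clobber it, which is the problem karolherbst ran into when doing the steps in the other order.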
19:06karolherbst: nice: total cvt instructions in shared programs : 139343 -> 108443 (-22.18%) :)
19:07karolherbst: 50% in pixmark_piano... let's see how much of a difference that makes
19:17karolherbst: mhh 5012 -> 5024 points
19:17karolherbst: not as much as I expected, but for the fact we actually simply replace some instructions, it's still quite good
19:19karolherbst: did only take care of the most trivial things though
19:20karolherbst: neg(abs a) can be expressed as add(neg abs a) I think?
19:20karolherbst: maybe that's helpful as well
19:56karolherbst: "total cvt instructions in shared programs : 139343 -> 95856 (-31.21%)" now :)
20:01HdkR: Nothing like removing nearly 1/3rd the number of instructions
20:03karolherbst: instruction count stays the same
20:03karolherbst: "total instructions in shared programs : 7614782 -> 7614782 (0.00%)" :)
20:04karolherbst: just less i2i and f2f
20:04karolherbst: but more adds ;)
20:04karolherbst: well + 0.2% perf in pixmark_piano
20:04karolherbst: unreclocked pascal GPU
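The percentages quoted in the stats lines can be reproduced with a one-line relative-change formula. The helper name `pct_change` is hypothetical; the numbers are the ones pasted in the log.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


print(f"cvt, first pass:  {pct_change(139343, 108443):.2f}%")    # -22.18%
print(f"cvt, second pass: {pct_change(139343, 95856):.2f}%")     # -31.21%
print(f"total instrs:     {pct_change(7614782, 7614782):.2f}%")  # 0.00%
print(f"pixmark_piano:    {pct_change(5012, 5024):+.1f}%")       # +0.2%
```

The cvt count drops by almost a third while the total instruction count is unchanged, since each removed convert is replaced by an add.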
20:10HdkR: big gains
20:19imirkin_: HdkR: for something simple like emitting an op differently? seems not bad.
20:19imirkin_: although pixmark_piano is super-shader-heavy...
20:23HdkR: Yea, it is quite nice for just emitting different ops in certain cases :D
21:15karolherbst: imirkin_: mhh, so I have those conversions:
21:16karolherbst: - abs(a) -> add(0, abs a)
21:16karolherbst: - neg(a) -> add(0, neg a)
21:16karolherbst: - neg(abs a) -> add(0, neg abs a)
21:16karolherbst: - sat(a) -> sat add(0, a)
21:16karolherbst: anything else comes to your mind?
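The four rewrites listed above can be sketched on a toy IR. Everything here is hypothetical: `Src`, `Instr`, `ZERO` and `lower_to_add` are illustrative names, not nouveau's codegen classes, and the sketch is purely structural (it does not look at signed zero, NaN handling, or operand types).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Src:
    value: str          # register name, or "0" for the zero operand
    neg: bool = False   # negate modifier, applied after abs
    abs_: bool = False  # absolute-value modifier


@dataclass(frozen=True)
class Instr:
    op: str             # "abs", "neg", "sat", "add", ...
    srcs: tuple
    sat: bool = False   # saturate flag on the result


ZERO = Src("0")


def lower_to_add(insn: Instr) -> Instr:
    """Rewrite single-source abs/neg/sat into add(0, modified source)."""
    if len(insn.srcs) != 1:
        return insn
    src = insn.srcs[0]
    if insn.op == "abs":
        # abs discards any negation already on the source.
        new_src = Src(src.value, neg=False, abs_=True)
    elif insn.op == "neg":
        # neg toggles the negate modifier and keeps abs.
        new_src = Src(src.value, neg=not src.neg, abs_=src.abs_)
    elif insn.op == "sat":
        new_src = src
    else:
        return insn
    return Instr("add", (ZERO, new_src), sat=insn.sat or insn.op == "sat")


# neg(abs a) -> add(0, neg abs a)
print(lower_to_add(Instr("neg", (Src("a", abs_=True),))))
```

Folding the operation into source modifiers on an add is what lets the convert-style opcode disappear while the instruction count stays the same.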
21:25imirkin_: karolherbst: be careful to check that sType == dType
21:26karolherbst: imirkin_: mhh, do we actually allow type changes for those ops?
21:26imirkin_: for CVT? sure.
21:26imirkin_: or are you just checking straight-up ABS/NEG/etc?
21:26imirkin_: i think we might generate CVT's in places
21:26karolherbst: yeah, I check if op is one of ABS/NEG/SAT
21:27imirkin_: maybe not
21:27imirkin_: either way, those ops aren't allowed to have any funny business like that
21:27karolherbst: I can put the check there just in case, but I highly doubt it will change anything
21:28karolherbst: I am wondering if I should add the code for sat(abs/neg a) but none of the shaders we have actually has this
21:29karolherbst: I guess it wouldn't hurt
21:29karolherbst: and I need to disable it for when we have immediates because you could disable opts
21:29karolherbst: and limm forms usually don't support anything
21:30karolherbst: or I see no point checking stuff there
21:30karolherbst: mhh iadd supports sat with limm but not fadd
21:30karolherbst: oh well, doesn't matter
22:28karolherbst: imirkin_: on kepler using add should let us do more dual issuing than using abs/neg/sat
22:29karolherbst: like we can't dual issue two OP_CVT together
22:39imirkin_: karolherbst: ah cool
22:41karolherbst: mhh, only +0.2% perf on my gm204 fully reclocked :/
22:41karolherbst: oh well, I guess having even a small improvement here is good enough
22:46imirkin_: why did you expect clock speeds to materially affect the % change?
22:49karolherbst: yeah, dunno. Doesn't make much sense that anything would change here
22:49karolherbst: checking on kepler would be interesting
22:49pmoreau: karolherbst: Didn’t SPIRV-LLVM-Translator used to install the llvm-spirv tool (to convert LLVM bytecode to SPIR-V, or vice versa)?
22:49karolherbst: pmoreau: LLVM_BUILD_TOOLS or something
22:49karolherbst: yeah. -DLLVM_BUILD_TOOLS=ON
22:50pmoreau: I do have LLVM_BUILD_TOOLS=ON for my LLVM install.
22:51karolherbst: and you expect that the llvm build system would add this to your spirv-llvm-translator build automatically?
22:52pmoreau: Sure! :-)
22:52karolherbst: think again
22:52karolherbst: I even have to add -DBUILD_SHARED_LIBS=ON manually
22:53karolherbst: although I think that's why my system llvm builds libllvm.so but the components are all static :/
22:54pmoreau: There should be some instruction about the `LLVM_BUILD_TOOLS=ON`, because you will never see it if you look at the various CMakeFiles in that repo. It’s all hidden away behind LLVM's own macros that are being used.
22:57airlied: karolherbst: shared libs should never be set with llvm itself, just its dylib thing
22:57airlied: but yeah it's annoying when building clang out of tree as well, it doesn't get the right dylib flags
22:59karolherbst: airlied: why though?
22:59karolherbst: too big of a perf overhead having everything shared?
23:00pmoreau: karolherbst: Also, I’m getting a nice segfault when running `clinfo` with your branch “nouveau_nir_spirv_opencl_v5” on MCP79/G96. I’ll debug that tomorrow.
23:00karolherbst: don't use that branch
23:00karolherbst: pmoreau: nouveau_nir_spirv_opencl_hmm_v2 is the newest one more or less
23:00karolherbst: cleaned up quite a lot and different approach
23:01pmoreau: Ah, might help if I don’t run out-of-date stuff. I’ll try that instead.
23:01airlied: karolherbst: building llvm as a single .so is fine, building it as lots of little ones causes all kinds of breakages
23:01airlied: and BUILD_SHARED_LIBS does the latter
23:01airlied: lots of constructors and things seem to get confused
23:03pmoreau: karolherbst: That `LLVM_BUILD_TOOLS=ON` worked great, thanks! ;-)
23:12karolherbst: imirkin_: the biggest surprise is the +0.2% perf gain on my gk106. I kind of thought that impact would be smaller given that nvidia didn't turn it on from the start
23:19pmoreau: karolherbst: I’m getting some compilation errors on your hmm_v2 branch: https://hastebin.com/bovuwowimu.sql
23:20karolherbst: ohh wait, let me rebase everything
23:23karolherbst: might take a while
23:24pmoreau: No worries, I’ll be hitting the bed anyway. I’ll give it another try tomorrow evening.
23:36karolherbst: pmoreau: pushed
23:36imirkin_: skeggsb: are the VGA registers present on boards with display fused off / zero monitors in DCB?
23:36imirkin_: skeggsb: i'm thinking whether GF117 is super-special in not having them, or if any board without display won't have them
23:45imirkin_: s/monitors/connectors/ obviously