00:17 karolherbst: imirkin: hum, I have a weirdo bug where nouveau doesn't detect my GPU and doesn't print anything in the log
00:19 karolherbst: nvidia works...
00:31 karolherbst: yeah well nice, it works with insmod
01:26 xzhao: Is this donation page still valid? https://nouveau.freedesktop.org/wiki/HardwareDonations/
02:38 imirkin: Subv: the two go hand-in-hand
02:38 imirkin: xzhao: mostly we're missing manpower, not hardware
02:40 xzhao: imirkin: guessed that. I have been blocked by a freezing bug on nouveau+Wayland+plasma5 for ~6months
02:41 imirkin: xzhao: yeah ... my recommendation is (nouveau, plasma5) -- pick one.
02:42 xzhao: imirkin: or stay on X :)
02:42 imirkin: oh, only wayland? well, either way - lots of people have reported a variety of issues with plasma5
02:42 imirkin: i have never looked into any of them tbh
02:42 imirkin: setting it up on my machine is prohibitively complicated
02:43 imirkin: and, frankly, i'm just not that interested
02:44 xzhao: imirkin: so I guess it's basically because no nouveau developer uses plasma5?
02:44 imirkin: karol does, but it seems to work ok for him
02:45 imirkin: he tends to be on dual-gpu setups though, some of the issues tend to be different
02:45 imirkin: also dunno if he's using wayland
02:45 gnarface: does nouveau have some equivalent to this nvidia setting "ForceCompositionPipeline=On" ?
02:46 gnarface: i wonder if it might have something to do with why it works for some people and not others
02:46 imirkin: is that like TearFree in other drivers? if so, no.
02:46 xzhao: imirkin: thanks for the information. I am interested and may look into the problem. Would you mind if I ask you tech questions on nouveau?
02:46 imirkin: xzhao: sure, happy to answer
02:47 gnarface: imirkin: i don't think it's like tearfree, it's definitely distinct from vsync. i think it may be more about enforcing some rendering and/or draw ordering rules maybe?
02:47 imirkin: but while i'm happy to answer directed questions about how X or Y works, i'm less happy to answer general questions like "how do i debug things" or "tell me everything you know"
02:47 imirkin: gnarface: ok, well if i don't know what it does, hard for me to tell if nouveau does it or not.
02:48 imirkin: based on the name, sounds identical to what TearFree does though
02:48 xzhao: imirkin: Right. No worries I got that
02:48 imirkin: which is basically throw in a compositor
02:48 imirkin: [and nouveau does not do that... i don't care much for it, and it would appear that i'm the only person who gives a shit about xf86-video-nouveau]
02:50 gnarface: imirkin: their documentation just says: The NVIDIA X driver can use a composition pipeline to apply X screen transformations and rotations. "ForceCompositionPipeline" can be used to force the use of this pipeline, even when no transformations or rotations are applied to the screen.
02:51 imirkin: gnarface: yeah, sounds like TearFree.
02:52 gnarface: on some models with some versions of the driver this setting would also have the side-effect of magically fixing some problems with rendering gtk2 (gtk3?) windows
04:36 pmoreau: buhman: The prebuilt images can be found here: https://nouveau.pmoreau.org, but note that 1) no images have been generated recently due to some setup being broken, and 2) even some of the last generated images might not work.
05:09 perfinion: \part
05:09 perfinion: fail
05:43 buhman: pmoreau: https://github.com/hakzsam/archlive-nouveau/pull/11
05:44 pmoreau: I just saw that. :-)
05:47 buhman: pmoreau: 4.13 seems ancient; is there anything special about that build process that would prevent it from being hosted on circleci or similar?
05:47 pmoreau: buhman: Ah ah ah, I was going to comment on the use of “packages” instead of “images”, and you changed it before I posted the comment. :-D
05:48 buhman: :)
05:48 pmoreau: Hum, it needs a distribution that is at least Arch Linux based.
05:48 pmoreau: (For generating the image)
05:48 buhman: sure, that's easy
05:49 pmoreau: Other than that, it should be fine, I think.
05:51 pmoreau: I stopped building it because the images were slightly useless without Mesa being packaged in (due to a failure when building the Mesa package); I tried to fix it, but stopped when I had issues getting llvm-svn to finish building.
05:51 pmoreau: buhman: Could you please add a note that the building is currently broken? And I’ll merge it.
05:58 buhman: the build.sh depends on clean.sh which is not in the repository
06:03 pmoreau: And it’s quite possible the different scripts in the repo aren’t up-to-date with the version I use for building.
06:08 pmoreau: buhman: I just checked, and all the clean.sh are in the repo. Which ones would be missing?
06:10 buhman: my mistake; I replaced one of the `su` commands with `su - $user` while trying to make it work, which modified the working directory
06:10 pmoreau: Ah, I see. :-)
06:36 buhman: https://circleci.com/gh/buhman/archlive-nouveau/11 that wasn't so hard
06:37 buhman: it's probably going to break at some point I'm sure, but, there's archlinux-on-circleci anyway
06:40 buhman: does this actually need xorg-server 1.19?
06:40 buhman: X-ABI-VIDEODRV_VERSION=23
08:14 pabs3: imirkin: that 'GL apps freeze but X11 works fine' thing recurred for me with Linux 4.16.12-1 (from Debian)
08:15 pabs3: nothing in dmesg this time
08:16 pabs3: my GPU: 01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 740] (rev a1)
09:35 mwk: sigh
09:35 * mwk wishes whoever designed the NV10-NV40 vertex pipeline could decide on one endianness
09:43 mupuf: mwk: really :o?
09:43 mwk: mupuf: it's a fucking mess.
09:43 HdkR: mwk: You mean you don't like working on a multi-endian supporting device? :P
09:43 mwk: I think whoever designed it was going for big-endian
09:44 mwk: and bits numbered from MSB up
09:44 mwk: the thing is, the rest of the GPU is designed as little-endian
09:44 mwk: so all sorts of evil things happen at the borders
09:44 mupuf: mwk: Is it possible that some of the components of the pipeline obey the endianness selection you have at the PCI/AGP level?
09:44 mwk: and the big endian design is not exactly consistent either
09:44 mupuf: and enabling big-endian mode would make it more consistent?
09:45 mwk: some things start from MSB, some start from LSB
09:45 mwk: I particularly love XFMODE_T register set, which controls texture coordinate computations
09:46 mwk: here, they count bits from LSB
09:46 mwk: so bits 0-15 are texture 0, 16-31 are texture 1, 32-47 are texture 2, and so on
09:47 mwk: it sort of is a single 128-bit register
09:47 mwk: but the FE shadow of this register, which is used for context switching, is split into 4 32-bit little endian registers
09:47 mwk: but the split itself is big-endian
09:47 mwk: so 400fbc is part 0, ie. bits 96-127, ie, textures 6-7
09:48 mwk: 400fc0 is part 1, bits 64-95, textures 4-5
09:48 mwk: and so on
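[Annotation] A minimal Python sketch of the XFMODE_T shadow layout as mwk describes it: 16 bits per texture counted from the LSB of a 128-bit value, split into four 32-bit registers with the most significant part first. Only 0x400fbc and 0x400fc0 appear in the log; the addresses of parts 2 and 3 (0x400fc4, 0x400fc8) are an assumption extrapolated from "and so on", as is the placement of the lower-numbered texture in the low half of each register. The function names are hypothetical.

```python
XFMODE_T_SHADOW_BASE = 0x400fbc  # part 0, as stated in the log
BITS_PER_TEXTURE = 16
PART_BITS = 32
NUM_PARTS = 4


def xfmode_t_shadow_location(texture):
    """Return (register address, low bit inside that register) for a texture."""
    if not 0 <= texture < 8:
        raise ValueError("texture index must be 0..7")
    low_bit = texture * BITS_PER_TEXTURE    # position in the 128-bit value
    word = low_bit // PART_BITS             # 0 = least significant 32 bits
    part = (NUM_PARTS - 1) - word           # big-endian split: part 0 is the top word
    addr = XFMODE_T_SHADOW_BASE + 4 * part  # assumed 4-byte stride between parts
    return addr, low_bit % PART_BITS


def split_xfmode_t(value):
    """Split a 128-bit XFMODE_T value into {address: 32-bit word}."""
    mask = (1 << PART_BITS) - 1
    return {
        XFMODE_T_SHADOW_BASE + 4 * part:
            (value >> (PART_BITS * (NUM_PARTS - 1 - part))) & mask
        for part in range(NUM_PARTS)
    }
```

Under these assumptions, textures 6-7 land in 0x400fbc and textures 4-5 in 0x400fc0, matching the two data points given above.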
09:48 mwk: HdkR: I wouldn't mind a multi-endian supporting device if it could keep to one endianness at a time
09:48 HdkR: :)
09:48 mwk: right now I'm going mad trying to figure out which bit goes where
09:49 mwk: mupuf: some of the components obey the endianness selection, that is true
09:49 mwk: but XF is most definitely not one of them
09:50 mwk: the endianness selection only really affects GPU at the boundaries... fetching textures, writing surfaces
09:50 mwk: while XF is an internal unit
09:51 mwk: mupuf: FWIW I randomize the state of the PGRAPH endian switch for each test case, so far I only saw it affect a couple of things
09:52 mupuf: mwk: oh well. I guess they were not big believers in modular designs and simple interfaces back in the days
09:53 mwk: "modular designs and simple interfaces" isn't really how you make a fast GPU...
09:54 mwk: and XF is quite "modular"... as in, it works quite differently to every other unit of the GPU
09:55 mwk: eg. it doesn't believe in state bundles, which all other units use for config setting
09:55 mwk: and instead it rolls its own XFMODE thing, which behaves like state bundles, except it's not state bundles
09:56 mwk: matter of fact, it's so modular that it appears to be one of the few pieces of desktop GPUs that made it unchanged to old Tegras
09:57 mupuf: what does XF stand for by the way? Transfer ...
09:57 mwk: transform
09:57 mupuf: ha, right
09:57 mupuf: that makes more sense
09:58 mwk: it's of course accompanied by the LT unit
09:58 mwk: which does lighting
09:59 mwk: it seems nvidia cannot quite decide if XF refers to the whole thing (transform and lighting) or just transform
10:00 mwk: but then, my terminology information comes from filed patents and rules.xml, neither of which is particularly readable
10:00 mwk: or sane
10:19 mupuf: mwk: hehe
17:38 Subv: is there some documentation about what some of the flags in the envydis output mean? for example, what does the 'x' flag mean in this case (lop32i)?
17:38 Subv: { 0x0400000000000000ull, 0xfc00000000000000ull, OP8B, T(pred), N( "lop32i"), T(0400_0), ON(57, x), ON(52, cc), REG_00, ON(55, inv), REG_08, ON(56, inv), U32_20 },
17:39 imirkin_: same as it means with nvdisasm
17:39 imirkin_: .X = consume cond code
17:39 imirkin_: .CC = set cond code
17:40 imirkin_: how each instruction consumes the cond code, or what criteria it uses to set it are up to the instruction
17:41 Subv: does condition code refer to the predicate or something else?
17:42 imirkin_: something else.
17:42 imirkin_: it's a 1-bit flag, much like a predicate, but a condition code :)
17:42 imirkin_: there's only one of them
17:42 imirkin_: so there's not a separate (true) register file for it, like there is with predicates
17:43 Subv: huh, interesting
17:44 Subv: thanks, is there a place where i can read more about this condition code btw?
17:44 imirkin_: nope
17:45 imirkin_: Lyude: friendly ping on DP-MST + xf86-video-nouveau
17:46 Subv: heh
17:46 imirkin_: Subv: feel free to write docs about it
17:47 Lyude: imirkin_: swamped at work again :(, will take a while
17:49 imirkin_: can you define 'a while' in units of time? aka when should i bug you again?
17:49 Lyude: probably in a week
17:49 imirkin_: cool
17:49 imirkin_: sounds like your "a week" is my "two weeks" -- i.e. a period of time so unimaginably long that everything one's working on now will have ended by then
17:50 Lyude: hehe
17:50 Lyude: yeah, tbh i kinda gave myself too much work so now i am currently in the process of draining my workqueue
17:50 Lyude: lessons learned at least
18:11 pendingchaos: imirkin_: is "condition code" the preferred word for it? or are both "carry flag" and "condition code" used?
18:11 imirkin_: well, it has nothing explicitly to do with carry
18:12 imirkin_: it just so happens that IADD.CC will stick the carry bit into it
18:12 imirkin_: but what does e.g. LOP.XOR.CC do
18:13 pendingchaos: I think I'll update my iadd3/xmad doc PR to use "condition code" instead of "carry flag"
18:14 imirkin_: it's definitely carry-like
18:14 imirkin_: on nv50 it used to be a fuller eflags-type register
18:14 imirkin_: (and there were 4 of them)
18:14 imirkin_: iirc 4-bit each
18:15 imirkin_: but there were also no predicate regs
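[Annotation] A minimal Python sketch of the one case the log spells out: IADD.CC putting the carry bit into the single condition code, and a following .X instruction consuming it. That the consuming add uses the flag as a carry-in is an assumption made here to show a 64-bit add built from two 32-bit adds; the function names are hypothetical, and what other instructions (e.g. LOP.XOR.CC) put into the flag is not modelled.

```python
MASK32 = 0xffffffff


def iadd_cc(a, b):
    """32-bit add that sets the condition code (modelled as the carry-out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def iadd_x(a, b, cc):
    """32-bit add that consumes the condition code (assumed to be a carry-in)."""
    total = (a & MASK32) + (b & MASK32) + (cc & 1)
    return total & MASK32


def add64(a, b):
    """64-bit add from two 32-bit halves, chained through the 1-bit flag."""
    lo, cc = iadd_cc(a & MASK32, b & MASK32)
    hi = iadd_x(a >> 32, b >> 32, cc)
    return (hi << 32) | lo
```

Because there is only one such flag, the .CC producer and the .X consumer have to stay paired with nothing in between that overwrites it.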
20:37 karolherbst: imirkin_: the result of the phi block depends on the block taken before or the entire path? just curious on what codegen depends here
20:38 imirkin_: not sure what you're talking about
20:39 karolherbst: well I just saw that in nir the phi sources are annotated: vec1 64 ssa_59 = phi block_2: ssa_49, block_9: ssa_58
20:39 imirkin_: right
20:39 imirkin_: a phi source is always from some specific block
20:39 karolherbst: wondering if we could do something explicit like that in codegen
20:39 imirkin_: not quite that explicit
20:39 imirkin_: but phi sources are ordered by block incoming edge order
20:39 imirkin_: this is actually one of the things i hate about our phi nodes
20:39 karolherbst: sure
20:39 karolherbst: but I think we talked once about that
20:40 karolherbst: and that this is kind of not really nice to have it like that
20:40 imirkin_: coz then messing with the edges becomes an enormous fail
20:40 karolherbst: or generally the cfg edge ordering stuff
20:40 karolherbst: right
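[Annotation] A minimal Python sketch contrasting the two phi representations discussed above: sources matched to predecessors by incoming edge position (as imirkin_ describes codegen) versus sources keyed by predecessor block (as in the NIR line karolherbst quotes). The data structures and function names are hypothetical and are not codegen's or NIR's real API.

```python
def phi_value_positional(phi_sources, incoming_edges, came_from):
    """Positional style: source i belongs to incoming edge i."""
    return phi_sources[incoming_edges.index(came_from)]


def phi_value_explicit(phi_sources, came_from):
    """Explicit style: each source is keyed by its predecessor block."""
    return phi_sources[came_from]


positional = ["ssa_49", "ssa_58"]
explicit = {"block_2": "ssa_49", "block_9": "ssa_58"}

edges = ["block_2", "block_9"]
before = phi_value_positional(positional, edges, "block_9")   # "ssa_58"

# Reordering the edge list without permuting the phi sources silently
# changes which value the positional phi selects.
edges.reverse()
after = phi_value_positional(positional, edges, "block_9")    # "ssa_49"

stable = phi_value_explicit(explicit, "block_9")              # "ssa_58"
```

This is the failure mode behind "messing with the edges becomes an enormous fail": any pass that reorders incoming edges has to permute every phi in the block as well.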
20:41 karolherbst: actually I have the plan to kind of rework the entire peephole stuff anyway, because it is kind of messy and doesn't give us enough information to make good decisions
20:41 karolherbst: but that's a different issue
20:42 karolherbst: imirkin_: what do you think about splitting the optimizations things into an analysis and a execute phase?
20:42 karolherbst: I think llvm actually does something similar
20:43 karolherbst: but I had more like things in mind where we don't know if an opt is beneficial, because we don't know if a source can be dced away for example
20:43 karolherbst: so we would have to know if all uses of that source can do a similar opt or something like that
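[Annotation] A minimal Python sketch of the analysis/execute split karolherbst proposes, under the assumption that an optimization is only worth applying when every user of a value can absorb it, so the defining instruction becomes dead. The instruction format (dicts with "op", "dst", "srcs"), the `can_fold` predicate, and both function names are hypothetical; nothing here reflects the existing codegen passes.

```python
from collections import defaultdict


def analyze(instructions, can_fold):
    """Analysis phase: find values whose every user could absorb them."""
    defs = {insn["dst"] for insn in instructions}
    uses = defaultdict(list)
    for insn in instructions:
        for src in insn["srcs"]:
            if src in defs:
                uses[src].append(insn)
    # Keep a value only if all of its users can fold it; otherwise the
    # defining instruction stays alive and folding gains nothing.
    return {
        value: users
        for value, users in uses.items()
        if all(can_fold(user, value) for user in users)
    }


def execute(instructions, plan):
    """Execute phase: drop planned definitions and mark the folded sources."""
    out = []
    for insn in instructions:
        if insn["dst"] in plan:
            continue  # every user absorbs this value, so the definition is dead
        folded = [src for src in insn["srcs"] if src in plan]
        out.append({**insn, "folded": folded})
    return out
```

For example, with a predicate that only lets "add" absorb a source, a negation used by two adds is planned for folding, while a negation that is also used by a store is left alone.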
20:46 imirkin_: maybe, i guess
20:46 imirkin_: it's all tricky
20:46 karolherbst: I know
20:46 imirkin_: and unclear how beneficial it would be
20:46 imirkin_: which is something worth thinking about
20:46 karolherbst: well we need to clean it up in some way anyway
20:46 imirkin_: we're not trying to be the best compiler in the world
20:47 karolherbst: in regards to running some opts within a loop or something
20:47 karolherbst: true
20:48 karolherbst: I wouldn't want to replace the current thing anyway, just something we could run after and replace bits while working on the "new stuff" or so
20:48 karolherbst: so it won't end up being a lot of work with no use
20:50 karolherbst: and things like the DCE pass can stay the same anyway
20:51 RSpliet: karolherbst: I'm all in favour of more clever decisions! But... it's all a trade-off. Shader compilers are sort-of-JIT (... sort of :-P), so run-time matters too
20:52 karolherbst: RSpliet: well we are already not good enough at compile time decisions
20:52 karolherbst: like we don't do some opts where the source has multiple uses
20:52 RSpliet: karolherbst: that could be silly :-D
20:52 karolherbst: what I had in mind was something like we analyze against certain opts which result in sources getting removed
20:53 karolherbst: and then in the execution step we execute those depending on certain conditions
20:53 RSpliet: Equally, I'm curious about how much we can learn from running perf against shader-db using the nouveau compiler like robclark did for freedreno!
20:53 karolherbst: like: do if source can be removed after execution
20:53 RSpliet: http://bloggingthemonkey.blogspot.com/2017/08/about-shader-compilers-irs-and-where.html
20:53 karolherbst: codegen isn't significant
20:53 imirkin_: it's about 1/3rd
20:54 imirkin_: iirc it's 1/3 glsl ir, 1/3 glsl -> tgsi, 1/3 codegen
20:54 karolherbst: I think it is even less
20:54 imirkin_: obviously depends on the specific shader
20:54 karolherbst: sure
20:54 RSpliet: Ah so such a set-up works. That's good, because if a clever code-gen bumps that 33% to 80% we might be in trouble :-D
20:55 karolherbst: but I kind of like to ignore compilation perf due to having a shader cache
20:55 karolherbst: RSpliet: right
20:55 imirkin_: RSpliet: that'd mean that glsl ir and glsl -> tgsi got so much faster! :)
20:55 imirkin_: they went from 33% to just 10%!
20:55 imirkin_: 3x speed up
20:55 karolherbst: anyway
20:55 * imirkin_ likes marketing math
20:56 RSpliet: imirkin_: in other news, hell just froze over!
20:56 karolherbst: I kind of like the idea of replacing the architecture we currently have and just run something super smart after it
20:56 karolherbst: or well
20:56 karolherbst: even just keep the trivial stuff
20:56 karolherbst: no need to really check if constant folding makes sense
20:56 karolherbst: or so
20:57 imirkin_: step 1: remove compiler. step 2: do something super-smart
20:57 imirkin_: i like that.
20:57 karolherbst: I know that we would be able to generate better shaders if we could run opts in loop for example
20:57 karolherbst: or that we are terrible in doing CFG based opts
20:57 karolherbst: where I see a lot of benefit doing smart CFG optimizations
20:58 karolherbst: imirkin_: https://imgs.xkcd.com/comics/repairs_2x.png :)
20:58 RSpliet: oh yeah, the peephole framework is wholly inadequate for code motion. They're different things that warrant different "frameworks"
20:58 imirkin_: there's a russian movie with a choice quote... "what one man put together, another can always take apart"
21:00 karolherbst: anyway, just wild thoughts generally
21:00 karolherbst: merging basicBlocks alone might even allow us to do better opts
21:00 karolherbst: I don't know how many opts actually require sources to be in certain BBs
21:01 karolherbst: but we can end up with empty, unconditional branches and could just merge a few BBs
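[Annotation] A minimal Python sketch of the basic-block merging idea: when a block ends in an unconditional branch to a block that has no other predecessor, the two can be spliced together. The CFG representation (a dict of block name to instruction list and successor list), the `entry` parameter, and the function name are hypothetical; phi handling is skipped because a single-predecessor block needs none.

```python
def merge_straight_line_blocks(cfg, entry):
    """cfg maps name -> {"insns": [...], "succs": [...]}; returns a merged copy."""
    cfg = {name: {"insns": list(b["insns"]), "succs": list(b["succs"])}
           for name, b in cfg.items()}
    changed = True
    while changed:
        changed = False
        # Recompute predecessors after every merge.
        preds = {name: [] for name in cfg}
        for name, block in cfg.items():
            for succ in block["succs"]:
                preds[succ].append(name)
        for name, block in cfg.items():
            if len(block["succs"]) != 1:
                continue  # conditional branch or exit block
            succ = block["succs"][0]
            if succ == name or succ == entry or len(preds[succ]) != 1:
                continue
            # Unconditional edge into a block with no other predecessor.
            block["insns"] += cfg[succ]["insns"]
            block["succs"] = cfg[succ]["succs"]
            del cfg[succ]
            changed = True
            break  # the dict changed size, so restart the scan
    return cfg
```

On a CFG where bb0 jumps to an empty bb1, which jumps to bb2, which then branches two ways, this collapses bb0, bb1 and bb2 into a single block and leaves the diamond below it untouched.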