00:00karolherbst: but it's a bit more tricky as you can run into alignment issues
00:00fincs: Can't see past OP_MERGE
00:00karolherbst: but that's also due to the order of the opts
00:00karolherbst: e.g. with nir you don't get the OP_MERGE, because we can load the 64-bit value directly
00:00karolherbst: so.. with nir we can skip a lot of those OP_MERGE and OP_SPLIT ops
00:01fincs: Is there any way to do a piecewise 64-bit load directly without OP_MERGE then?
00:01karolherbst: not how the TGSI stuff works right now
00:01fincs: This is already past the tgsi translation code
00:01karolherbst: but we also need to be careful with alignment
00:01karolherbst: it's not that simple
00:01fincs: I'm talking, NVC0LoweringPass::loadResInfo64
00:02karolherbst: fincs: okay.. but then LoadPropagation should be able to opt this
00:03drathir: imirkin: thanks added to check list as well...
00:03karolherbst: I don't know if we allow that for 64 bit ops
00:04karolherbst: fincs: you need to debug TargetNVC0::insnCanLoad to see why that doesn't happen
00:05karolherbst: fincs: but if you're talking about loadResInfo64, you're talking about an int add, aren't you?
00:05fincs: It's doing u64 + u32 add
00:05karolherbst: can the hw do a 64 bit int add with a cb load at all?
00:05fincs: It's just IADD / IADD.X
00:06fincs: Basically I observe this
00:06fincs: MOV R0, c[0x0][0x140] ; MOV R1, c[0x0][0x144] ; IADD R2.CC, R0, R2 ; IADD.X R3, R1, RZ ;
00:06fincs: Those movs are superfluous, as IADD/IADD.X could just use c instead :\
00:06karolherbst: I guess so
00:07karolherbst: fincs: can you check if those are there right before loadpropagation?
00:07karolherbst: I am sure they aren't
00:07fincs: Hmm, how do I do that
00:07karolherbst: we do the 64 bit lowering quite late
00:07karolherbst: call prog->print() between the passes
00:08karolherbst: just print()
00:09fincs: It's not printing anything
00:10fincs: Maybe I broke something
00:10fincs: Ah, of course
00:10fincs: NDEBUG lol
00:14fincs: Okay so this compute shader: https://github.com/switchbrew/switch-examples/blob/master/graphics/deko3d/deko_examples/source/sinewave.glsl
00:15fincs: This is what print() prints prior to LoadPropagation: https://gist.github.com/fincs/fde1f26b22588d11f122277ebd3cdce0
00:15karolherbst: so those are still 64 bit adds
00:19fincs: I think I'm barking up the wrong tree
00:19fincs: LDC.64 should be optimized to MOV if the offset is constant
00:20fincs: I undid the change in loadResInfo64 and apparently it *is* able to see past the 64-bit load and use c in IADD/IADD.X (!)
00:21fincs: It seems to like this better
00:21fincs: 46: ld u64 %r107d c17[0x140] (0)
00:21fincs: 47: add u64 %r108d %r107d %r90 (0)
00:21fincs: --> IADD R0.CC, R6, c[0x0][0x140] ; IADD.X R1, RZ, c[0x0][0x144] ;
00:21imirkin: fincs: there's a thing which says whether a particular op can load a value or not
00:22imirkin: i think we don't allow 64-bit consts anywhere
00:22imirkin: an integer 64-bit add gets broken up into 2x 32-bit things
00:22imirkin: in split64BitOp or something like that in build_util.cpp
00:22imirkin: which obviously can load the const
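The split imirkin describes turns one u64 + u32 add into a low 32-bit add that produces a carry (IADD ... .CC) and a high 32-bit add that consumes it (IADD.X), which is exactly the pair in fincs' `IADD R2.CC, R0, R2 ; IADD.X R3, R1, RZ` dump. A minimal Python sketch of that arithmetic, assuming nothing about nouveau's actual split64BitOp beyond the instruction pair shown in the log:

```python
MASK32 = 0xFFFFFFFF

def iadd_cc(a, b):
    """Low-half add: 32-bit result plus carry-out (models IADD ... .CC)."""
    s = (a & MASK32) + (b & MASK32)
    return s & MASK32, s >> 32

def iadd_x(a, b, carry):
    """High-half add that consumes the carry (models IADD.X)."""
    return ((a & MASK32) + (b & MASK32) + carry) & MASK32

def add_u64_u32(base, offset):
    """u64 + u32 done as two 32-bit adds, like the IADD/IADD.X pair."""
    lo, carry = iadd_cc(base & MASK32, offset)
    hi = iadd_x(base >> 32, 0, carry)  # 0 plays the role of RZ
    return (hi << 32) | lo
```

Because each half is a plain 32-bit operand, each half can in principle read its word straight from the constant buffer (c[0x0][0x140] and c[0x0][0x144]), which is why the split form can absorb the constant load.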
00:22imirkin: so ... perhaps just a bit of adjustment necessary to make it happen
00:22karolherbst: ohh, right
00:23karolherbst: we handle the load propagation there directly as well
00:23fincs: I see why I was observing LDC.64
00:23fincs: In a different shader I'm loading something from offset 0 within the ssbo
00:23karolherbst: fincs: you loaded the 64-bit value instead of two 32-bit ones? :p
00:23fincs: So it's doing LDC.64 R0, c[0x0][0x98] ; LDG.E R0, [R0] ;
00:23fincs: There's no add
00:23fincs: Which means it can't optimize into c
00:26fincs: The previous situation where it was doing the optimized load was a non-constant offset
00:26fincs: And it had to do IADD/IADD.X
00:26fincs: With a constant offset codegen prefers LD/ST with [reg+offset] instead
00:26fincs: Where reg is the 64-bit reg pair, that is
00:26imirkin: because of the thing i mentioned
00:27fincs: I guess we need to add a pass to convert any remaining >32-bit non-indirect loads from cbuf into movs
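A rough sketch of what such a cleanup pass could do: any non-indirect constant-buffer load wider than 32 bits becomes one 32-bit MOV per destination word, so `LDC.64 R0, c[0x0][0x98]` would become `MOV R0, c[0x0][0x98] ; MOV R1, c[0x0][0x9c]`. The `CbLoad`/`Mov` classes and `split_cb_loads` are hypothetical stand-ins written in Python for illustration; they are not nouveau's C++ codegen IR.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CbLoad:
    dst: List[str]                  # destination registers, one per 32-bit word
    cbuf: int                       # constant buffer index
    offset: int                     # byte offset within the constant buffer
    indirect: Optional[str] = None  # address register for indirect loads, if any

@dataclass
class Mov:
    dst: str
    cbuf: int
    offset: int

def split_cb_loads(insns):
    """Hypothetical pass: replace wide, non-indirect cbuf loads with 32-bit MOVs."""
    out = []
    for insn in insns:
        if isinstance(insn, CbLoad) and insn.indirect is None and len(insn.dst) > 1:
            # word i of the destination reads c[cbuf][offset + 4*i]
            out.extend(Mov(reg, insn.cbuf, insn.offset + 4 * i)
                       for i, reg in enumerate(insn.dst))
        else:
            out.append(insn)  # indirect or 32-bit loads are left alone
    return out
```

Indirect loads are deliberately skipped, since those still need LDC.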
00:29imirkin: or do the thing i said
00:29fincs: Not sure what's the best solution
00:30karolherbst: what imirkin said :p
00:31karolherbst: it's already there
00:31fincs: So changing loadResInfo64 and fixing the opt somewhere else is the better solution?
00:31karolherbst: why changing loadResInfo64?
00:31fincs: To turn the 64-bit load into 32-bit loads
00:32karolherbst: but then you might end up hurting other use cases
00:32imirkin: just tell nouveau it's ok to have 64-bit constbufs in the u64 add/mul ops
00:32imirkin: and everything will work itself out.
00:32karolherbst: yeah, and handle that in the lowering if needed
00:33karolherbst: it is already handled
00:33karolherbst: so yeah...
00:33fincs: nouveau is already optimizing 64-bit load from constbuf + add into IADD/IADD.X with c
00:33imirkin: so what's the issue?
00:33fincs: The problem is the case in which it doesn't need to do an add
00:33imirkin: what's wrong with LDC.64 R0, c[0x0][0x98] ; LDG.E R0, [R0] ?
00:33fincs: Because it loads the 64-bit value from the cbuf and uses it directly as the address of a LD/ST
00:34fincs: LDC.64 is slower than MOV + MOV because former is variable latency
00:34karolherbst: can you load with an address in a cb?
00:34imirkin: if you don't want that stuff to get joined
00:34imirkin: you can tell it not to
00:34fincs: There is already code which splits up non-indirect loads from cbuf into MOVs, but IIRC it runs before ssbo pointer loads are lowered
00:35HdkR: fincs: amortized cost the LDC.64 will beat the dual moves in most cases :P
00:35karolherbst: why did nvidia add a pointless ldc then..
00:35fincs: LDC.64+ is totally fine with indirect
00:35karolherbst: or not
00:36karolherbst: only for uniform values :)
00:36fincs: LDC itself (which is needed for indirect) is also variable latency
00:36fincs: HdkR: I've observed nvidia's compiler preferring mov/mov
00:36karolherbst: fincs: which doesn't prove it's slower
00:36karolherbst: but yeah
00:36HdkR: It does yes but for a different reason
00:36karolherbst: I think nvidia prefers movs
00:36fincs: Actually nvidia's compiler almost never wants to use the non-32bit loads/stores
00:37fincs: Even for stuff like AST/ALD which is var-latency anyway
00:37HdkR: "Different Reason"
00:37fincs: I guess register allocation related?
00:37karolherbst: fincs: the big problem we have in codegen is the order of opts
00:37fincs: The wide loads/stores force you to have consecutive registers
00:38karolherbst: so because we only go through that once... sometimes ordering messes things up
00:38HdkR: Hardware behaviour related, not RA related
00:38karolherbst: either... clean it up with a new pass or don't care :)
00:38fincs: So what nouveau does right now is fine?
00:38fincs: If so, then I elect for "don't care"
00:38karolherbst: well.. you can always add more opt to clean that up
00:38karolherbst: thing is.. it doesn't matter all that much
00:39karolherbst: as long as it's not too often
00:39karolherbst: and with ssbos you are hurt by the global mem access already
00:39karolherbst: so that cb stuff really doesn't matter
00:39karolherbst: you can always write a microbenchmark and dump the perf counters though
00:39karolherbst: and then you have the numbers :)
00:40karolherbst: maybe it's a bigger impact than we think it is
00:40fincs: Another thing
00:40fincs: I simplified the dual issue logic to just "lol first insn needs to be ALU, second one needs to be non-ALU"
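The simplified rule ("first insn ALU, second non-ALU") can be sketched as a greedy pairing over an instruction stream. The `ALU_OPS` set below is a hypothetical classification picked for illustration, not a verified Maxwell table, and the sketch ignores data dependencies between the two instructions, which a real scheduler would also have to check. karolherbst's doubt about whether every non-ALU op can really dual issue applies here too.

```python
# Hypothetical ALU classification, for illustration only.
ALU_OPS = {"IADD", "FADD", "FMUL", "FFMA", "MOV", "LOP", "SHL"}

def can_dual_issue(first, second):
    """Simplified rule: first must be ALU, second must be non-ALU."""
    return first in ALU_OPS and second not in ALU_OPS

def pair_up(ops):
    """Greedily group an op sequence into dual-issue pairs or singles.
    Ignores data dependencies between the two ops."""
    groups, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and can_dual_issue(ops[i], ops[i + 1]):
            groups.append((ops[i], ops[i + 1]))
            i += 2
        else:
            groups.append((ops[i],))
            i += 1
    return groups
```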
00:40karolherbst: probably that's good enough I guess
00:41karolherbst: I am in the process of reverse engineering the perf counters though
00:41karolherbst: then I can dig into that for real
00:42fincs: So basically it's just this atm: https://0bin.net/paste/fF2cQuUxLrNK8t4K#qqXOKmC+cpciqgXFji64xhlGfMcumtt0eWKISPLmYcx
00:42karolherbst: yeah.. looks way simpler
00:42karolherbst: I am just not sure if you can really dual issue with every non alu
00:42karolherbst: but maybe we can...
00:47karolherbst: I want to be able to upload any shader... :D
00:48karolherbst: would make reverse engineering this stuff way easier
00:48fincs: I kind of can with my setup
00:48fincs: (However you still need to wrap the raw shader code with the proper metadata/header/etc I define in my DKSH format)
00:49karolherbst: well, if you can verify every combination real quick then :p
00:49karolherbst: and verify it with the dual issue perf counter
00:49karolherbst: that would be great
00:49fincs: Someone is using raw sass to do hwtests: https://github.com/ReinUsesLisp/nxgputests
00:50karolherbst: that doesn't scale as well though
00:50fincs: (however they haven't implemented any scheduling stuff yet in their shader assembler)
00:50karolherbst: really want something testing every combination of stuff
00:50fincs: Need dynamic codegen for that :)
00:51karolherbst: or a crappy cl_mesa_shader_re extension :p :D dunno
00:51HdkR: Maxwell JIT time
00:51fincs: Would be cool to do Maxwell codegen tinkering tbh
00:51fincs: However I lack the time/skills needed to actually work on something like that
11:31karolherbst: interesting.. most of those static constants actually make sense
11:31karolherbst: like 0.299000, 0.587000, 0.114000 which is apparently some RGB stuff
11:31karolherbst: used for luma calculation
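Those three constants are the ITU-R BT.601 luma weights, so a shader containing them is most likely computing Y = 0.299 R + 0.587 G + 0.114 B (typically written as a dot product with vec3(0.299, 0.587, 0.114) in GLSL). A small Python equivalent:

```python
def luma_bt601(r, g, b):
    """BT.601 luma from linear-range RGB components in [0, 1]."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```

The weights sum to 1, so pure white maps to a luma of 1.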
12:31cyberpear: karolherbst: thrilled to see your gp108 workaround fix merged after 2+ years! https://github.com/torvalds/linux/commit/028a12f5aa829b4ba6ac011530b815eda4960e89
12:31karolherbst: yeah.. we still would like to know what's going on :D
13:17AndrewR: hi all. I compiled Blender 2.79b, and while it mostly works (after patching for python 3.7) - preferences (but also file open window) doesn't look right: https://ibin.co/5JmG6ab5I50O.png Anyone saw something like this?
13:34linkmauve: AndrewR, I think Blender is at 2.82 nowadays, maybe you should upgrade first?
13:36AndrewR: linkmauve, another recompile :}
13:39linkmauve: Why did you go for such an old version first?
13:41linkmauve: I have no idea if this is related to your issue in any way, but then at least you will know that it wasn’t a known fixed bug.
13:45Doeme: I usually also use a 2.79 build, since 2.82 is the most robust way of crashing nouveau since webgl got more widespread
13:47Doeme: which is incidentally why I joined here. But alas, I did not get around to debug it yet. Still on the lower side of my to-do list :/
13:59AndrewR: linkmauve, because my friend used it ....
14:00imirkin: Doeme: generically, webgl should work
14:00imirkin: Doeme: issues arise when multiple threads do GL concurrently
15:44karolherbst: :) https://github.com/karolherbst/mesa/commit/562c640d72746e5077e48ecc7e17b38df11a9bd0
15:50karolherbst: ohhhh crap
15:50karolherbst: heh.. wait
15:50karolherbst: doesn't matter
15:51Doeme: imirkin: webgl got really stable lately (with lately I mean: the last 3-4 years or so :). the concurrency issues sounds like something blender would run into, though.
15:52karolherbst: imirkin: any good idea how to prevent stuff like this? https://gist.githubusercontent.com/karolherbst/375c90f8c4721a2c8f8bd4cf1efedbab/raw/ea22b5aa11eb2d9167083ad0235212bdfe548692/gistfile1.txt
15:52karolherbst: check the use count or something better?
15:53karolherbst: I guess I could check the use count of all potential propagation and choose the one with the lower count
15:53imirkin: karolherbst: yeah, that's a bitch...
15:53imirkin: i ran into issues like that
15:53imirkin: with address stuff iirc
15:53imirkin: where the "wrong" thing gets inlined
15:53karolherbst: I think it's easy though
15:54karolherbst: we can iterate over all sources and pick the one with the lowest use count
15:54imirkin: yeah, i think i do it if one has usecount 1
15:54karolherbst: maybe run DCE once before loadpropagation to have better numbers...
15:54imirkin: or something
15:54karolherbst: yeah.. but then we are screwed in cases where we have 2 vs 3 as a use count
15:54karolherbst: but yeah..
15:55karolherbst: imirkin: maybe we should do the loop twice in load propagation
15:55karolherbst: first round, only fold in with use count 1
15:55karolherbst: in the second round everything
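The two-round idea can be modelled roughly like this: round one folds only loads whose result has a single remaining use, round two folds whatever is left, and within a round the candidate with the lowest use count wins. `Load`, `Insn`, `propagate` and `two_round_propagation` are hypothetical names; the "at most one folded load per instruction" limit is an assumption standing in for the real per-opcode checks (TargetNVC0::insnCanLoad). Note that the log later reports an experiment along these lines hurting more than it helped.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Load:
    name: str
    uses: int  # instructions that still read this load's result register

@dataclass
class Insn:
    name: str
    srcs: List[Union[Load, str]]   # Load objects or plain register names
    folded: Optional[Load] = None  # assumption: at most one folded load per insn

def propagate(insns, max_uses=None):
    """One pass of load propagation, preferring the least-used load."""
    for insn in insns:
        if insn.folded is not None:
            continue
        cands = [s for s in insn.srcs if isinstance(s, Load)]
        if max_uses is not None:
            cands = [c for c in cands if c.uses <= max_uses]
        if not cands:
            continue
        best = min(cands, key=lambda ld: ld.uses)
        insn.folded = best
        best.uses -= 1  # this insn now reads the constant directly

def two_round_propagation(insns):
    propagate(insns, max_uses=1)  # round 1: only single-use loads
    propagate(insns)              # round 2: everything else
```

A load whose `uses` drops to 0 has become dead and can be removed by DCE.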
15:56karolherbst: heh.. this mad doesn't get optimized to the limm form btw
15:58imirkin: limm form requires regs to match up
15:58karolherbst: but RA tries its best though
15:58imirkin: it does
15:58imirkin: maybe not its best
15:58imirkin: but it tries =]
15:59karolherbst: I kind of like the idea of looping twice.. maybe that helps overall as well
17:01karolherbst: ehh.. that hurts more than it helps
17:11karolherbst: imirkin: if I do two rounds, but the first refuses to propagate values with just one use https://gist.githubusercontent.com/karolherbst/a97a4d5405f35ec50303e8263283f7f2/raw/3696610d14de6886f910ac54e10cdbdc9f79a6e5/gistfile1.txt
17:11imirkin: this stuff is tricky.
17:11karolherbst: I blame scheduling :p
17:12karolherbst: ohh.. right .
17:12imirkin: i did that for adds
17:12imirkin: your problem is slightly different i think
17:12imirkin: but could potentially be solved similarly
17:13karolherbst: it looks like a scheduling issue simply
17:13karolherbst: so if you propagate the value used more often and it gets eliminated (the load) you reduce live values
17:13karolherbst: so... mhh
17:13karolherbst: this is more of a "propagate the most used values, but only if we get rid of the initial load" kind of thing
17:13karolherbst: and both workarounds are just workarounds
17:14imirkin: i propagate the less-used values :)
17:14karolherbst: hurts gpr count
17:14imirkin: chances are the more-used value won't be able to get inlined anyways
17:14imirkin: but yeah - it's not exhaustive.
17:14karolherbst: but maybe we could check that.. it will only cost a bit of CPU
17:16karolherbst: the annoying part is that sources can be swapped
17:18karolherbst: maybe we should approach this pass differently: 1. find each load 2. calculate which load can be eliminated also by swapping sources 3. sort by use count 4. start with the highest one and "repair" the list as we go
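Steps 2 to 4 of that proposal can be sketched as a greedy planner: visit loads from most used to least used, and commit to eliminating a load only if every instruction reading it still has a free operand slot, then mark those slots as taken (the "repair" step). `plan_eliminations` is a hypothetical helper, and it assumes that source swapping always lets any user fold the load, which the real pass would have to verify per instruction.

```python
def plan_eliminations(users_of):
    """users_of: dict of load name -> list of instruction names reading it.
    Returns the loads that can be fully folded away, most-used first.
    Assumes each instruction can fold at most one load, and that
    source swapping always makes the load foldable (simplification)."""
    slot_taken = set()
    eliminated = []
    for load in sorted(users_of, key=lambda ld: len(users_of[ld]), reverse=True):
        users = users_of[load]
        if any(u in slot_taken for u in users):
            continue  # some user already folded another load; keep this one
        slot_taken.update(users)  # "repair": these slots are now used up
        eliminated.append(load)
    return eliminated
```

This matches the "propagate the most used values, but only if we get rid of the initial load" framing: a load is only folded when all of its uses can be, so the register it occupied really goes away.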
19:00polm: Hello, I recently have had trouble starting X and it seems to be something wrong with nouveau. I'm getting an error like the one at the top of the TroubleShooting page (drm failed to open device) but I don't seem to have any of the problems mentioned there.
19:01polm: Here's my xorg.log: https://gist.github.com/polm/c0006523307853caad169fd29e751e40
19:03polm: It seems like KMS is setting an invalid mode on my monitor. Using "nomodeset" lets the framebuffer work, but I can't get any "video=" option to work.
19:10karolherbst: polm: why are you using video= in the first place?
19:12polm: I was not using video= to start with, but it was mentioned in TroubleShooting, so I tried it.
19:12polm: I am not using it at the moment.
19:12karolherbst: polm: mind sharing your dmesg?
19:13polm: sure, here's the parts that mention nouveau (let me know if there are other parts I should look for): https://gist.github.com/polm/d14b89bd15ed1e8f3b6bc32afdc36c5b
19:20karolherbst: polm: what's your /etc/X11/xorg.conf?
19:21polm: This is what it is now: https://gist.github.com/polm/bd5bf9480cdc28ad73cfd6fc7398b949
19:21RSpliet: polm: mind just dumping the entire dmesg. There could well be other non-related hints in there
19:22polm: OK, will dump all of dmesg. I have a ton of audit messages from cron though
19:22karolherbst: polm: well. remove that xorg.conf file
19:22polm: I also tried using no xorg.conf and the minimal four-line one mentioned in troubleshooting.
19:22karolherbst: that's just nvidia generated garbage, no?
19:22polm: ah, the header is really old but I rewrote most of it over time
19:23polm: One part is for marble mouse support
19:23karolherbst: I see nothing of value in it
19:23karolherbst: isn't that picked up automatically?
19:23polm: I believe the refresh rates on my TV had issues with xrandr too
19:23karolherbst: you should get rid of the driver sections at least
19:23polm: Some of the minor buttons don't work without config
19:24karolherbst: usually a libinput bug then though
19:24karolherbst: anyway, the GPU device shouldn't be needed
19:24karolherbst: and the mouse one should be fixable through libinput
19:25karolherbst: the screen is also filled with random stuff...
19:25karolherbst: but if you say that a removed config won't help
19:25RSpliet: I don't see how that will solve [drm] Failed to open DRM device for pci:0000:02:00.0: -19 though
19:25imirkin: polm: so you have an internal panel which is 1280x720, but nouveau thinks it's 1920x1080?
19:26imirkin: nomodeset = "disable graphics completely"
19:26imirkin: so no accel no nothing
19:26polm: Here's xorg.log with no xorg.conf: https://gist.github.com/polm/fb32c36faa665a1e12c834735a40f05d
19:26RSpliet: open /dev/dri/card0: Permission denied
19:26RSpliet: I mean, that's a pretty descriptive error
19:26polm: yeah, if I use nomodeset the framebuffer/console is fine. If I don't use nomodeset, when the login is displayed my tv gets an unsupported signal
19:27imirkin: ok, so your TV is reporting a bad EDID then
19:27imirkin: saying it can do things that it can't =/
19:27imirkin: using video= will override the logic that picks the highest mode for the console
19:27imirkin: but Xorg will then happily use that highest mode again
19:28polm: I see
19:28karolherbst: imirkin: can this cause that ENODEV though?
19:28imirkin: it seems like you're also having permissions issues, and you appear to be using systemd/sddm/whatever, so unfortunately my expertise is limited there
19:28karolherbst: I guess it can..
19:28imirkin: well, the thing could be not there
19:28imirkin: or you don't have permissions
19:28imirkin: and you're not using that shared manager thing
19:28karolherbst: polm: is X started as a user or root?
19:28polm: as a user
19:28karolherbst: mind trying as root?
19:28RSpliet: Is /dev even mounted? Lots of interesting questions :-D
19:28polm: oh dear :P, let me try as root
19:29karolherbst: I guess a permission mix-up can mess things up quite badly
19:30polm: xorg.log after running as root: https://gist.github.com/polm/e0d8f97c34c96cff2917b451c21041fe
19:31RSpliet: But was it working? It looks a lot better
19:31imirkin: VIC 5: 1920x1080i 60.000 Hz 16:9 33.750 kHz 74.250 MHz (native)
19:31imirkin: VIC 16: 1920x1080 60.000 Hz 16:9 67.500 kHz 148.500 MHz
19:31imirkin: perhaps there's an issue with the interlaced mode.
19:31polm: I still can't see anything and the xorg process isn't running
19:32polm: OK, that would make sense. the default mode with xrandr always had issues
19:32imirkin: could be the driver doesn't do something properly
19:32imirkin: 1920x1080i should definitely be supported by a TV
19:32karolherbst: imirkin: on a new enough one..
19:32RSpliet: I recall someone mentioning fixes for that
19:32polm: Is it possible there's something weird about the refresh rate?
19:32karolherbst: forget that
19:33imirkin: anything's possible
19:33karolherbst: interlaced is broken with DP, but that doesn't matter
19:33karolherbst: it will just mess up the picture
19:33polm: the tv is just showing "unsupported signal" this whole time
19:33karolherbst: polm: it seemed like X started and stopped directly
19:33karolherbst: .. soo
19:33karolherbst: dunno why it stopped
19:34imirkin: the EDID decodes like this: http://paste.debian.net/plain/1141617
19:34imirkin: polm: you can specify a different default mode
19:35imirkin: or as root, do DISPLAY=:1 xrandr -s 1280x720
19:35imirkin: DISPLAY=:0 xrandr -s 1280x720
19:35karolherbst: imirkin: X doesn't run ;)
19:35imirkin: X runs fine
19:35karolherbst: last line
19:35imirkin: no client
19:35imirkin: or i dunno
19:35imirkin: maybe he quit
19:36polm: I didn't quit
19:36karolherbst: polm: how did you start X?
19:36karolherbst: startx or Xorg?
19:36polm: I am typing startx
19:36karolherbst: try "Xorg" instead
19:36karolherbst: startx is weird...
19:36imirkin: ah, probably an issue in your .xsession or whatever
19:36karolherbst: it doesn't like missing applications and stuff
19:37polm: oh, now X is running, still can't see anything. I'll try xrandr
19:38karolherbst: or DISPLAY=:0 glxgears or so :p
19:38polm: hm, "unsupported signal" went away and I have a black screen
19:38karolherbst: some displays are...
19:38imirkin: move mouse around
19:38karolherbst: go off if the picture is all black
19:38karolherbst: imirkin: I know displays where even that isn't enough
19:38RSpliet: X.org just gives a black background without clients no?
19:39imirkin: used to do that mask thing
19:39imirkin: but they got rid of it =/
19:39RSpliet: "that mask thing"? N95?
19:39polm: glxgears showed up
19:39imirkin: i think you can still enable it
19:39polm: Thank you!
19:39imirkin: RSpliet: yes, N95.
19:39imirkin: RSpliet: but there was a shortage, so they turned it off.
19:39karolherbst: well, glad to figure out it's not our bug :D
19:39RSpliet: polm: so there you go, 1 permission issue and 1 EDID issue, but your GPU is running fine!
19:40polm: that's a relief
19:40polm: At least now I know what's going on
19:40RSpliet: Didn't expect otherwise from a GT218
19:40karolherbst: polm: for a rootless X you need some kind of login manager handling all the weird shit
19:40karolherbst: it's weird
19:41karolherbst: well.. login manager with logind support at least
19:41karolherbst: plain X won't do
19:41karolherbst: or maybe it does?
19:41polm: Aaah ok
19:41RSpliet: It worked on Fedora before they switched to Wayland
19:41karolherbst: might just need a special group
19:41RSpliet: So there must be a way
19:41karolherbst: making it pointless to run as a user
19:41polm: I've always used a plain login with no manager, but maybe Arch added something
19:42RSpliet: polm: can you just get the permissions from /dev/dri/card0?
19:42karolherbst: usually video group
19:42RSpliet: owner, group, rwx, that kind of stuff
19:42polm: crw-rw---- 1 root video 226, 0 Apr 21 03:50 /dev/dri/card0
19:42karolherbst: adding your user to the video group makes it all pointless in the end
19:42karolherbst: at least it's better than full root though
19:42polm: ah, sure enough I'm not in it
19:42polm: the video group
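With the node at `crw-rw---- root video`, only root and members of the video group get read/write access. The classic Unix check behind that can be sketched in Python; `can_rw` and `check_device` are hypothetical helpers, and the sketch ignores ACLs and capabilities. As imirkin points out, read/write access alone is still not enough to become DRM master, which needs CAP_SYS_ADMIN.

```python
import os
import stat

def can_rw(st_mode, file_uid, file_gid, uid, gids):
    """Approximate classic Unix read/write permission check (no ACLs, no caps)."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == file_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif file_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (st_mode & need) == need

def check_device(path="/dev/dri/card0"):
    """Apply can_rw to a real device node for the current user."""
    st = os.stat(path)
    groups = set(os.getgroups()) | {os.getgid()}
    return can_rw(st.st_mode, st.st_uid, st.st_gid, os.getuid(), groups)
```

For a quick check without code, `os.access(path, os.R_OK | os.W_OK)` gives the kernel's own answer for the current process.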
19:43karolherbst: with logind it usually goes like this: the login manager starts as a user with access to the card* files, and when a user logs in, it starts a second X server running as that user, with the fds handed over
19:43karolherbst: so you are all safe
19:43polm: aaah, ok
19:43karolherbst: but I think only gdm supports it
19:43karolherbst: so everything else is less secure :p
19:44polm: I think I can sort it out from here, but thank you all so much!
19:44RSpliet: polm: cool; good luck!
19:44karolherbst: the point is to remove access to card* from your user, so malicious software just can't do crappy shit with it
19:44karolherbst: yeah.. good luck :)
19:44RSpliet: Well, that's one point. The other point is not being able to gain privileges when you find a bug in X.org. Of which I'm sure there are plenty
19:44imirkin: polm: also you have to be root to do modesetting.
19:44imirkin: doesn't matter if you have rw perms on the device node
19:45karolherbst: imirkin: modesetting as in changing resolution?
19:45imirkin: (or rather, CAP_SYS_ADMIN)
19:45karolherbst: why would you?
19:45imirkin: why would you what?
19:45imirkin: to become drm master you need it.
19:46karolherbst: need privileges for changing the resolution...
19:46karolherbst: ah yeah..
19:46karolherbst: might work through logind for me
19:46imirkin: the process that takes master must be CAP_SYS_ADMIN
19:46karolherbst: at least my wayland compositor runs under my user and my user doesn't have that privilege :)
19:46imirkin: if it then passes file handles around, that's its own business
19:46RSpliet: funny how logind now sounds like the daemon you want to find security bugs in rather than X.org
19:46imirkin: yes. funny.
19:47imirkin: who could have predicted such a thing
19:47karolherbst: RSpliet: yep
19:47karolherbst: at least that's a step forward, as X has the bigger attack surface :p
19:47karolherbst: and I am sure logind's code is less horrible than X's
19:47airlied: yeah logind is missing completely overflowable protocols and rendering stacks
19:48karolherbst: yeah.. X is something alright