00:08 karolherbst: let me recheck
00:09 HdkR: 2/win 22
00:09 HdkR: ...
00:11 karolherbst: imirkin: nouveau doesn't load on that one :p
00:11 imirkin: so why skip?
00:11 karolherbst: I didn't skip that
00:11 karolherbst: git bisect bad 893e591b59036f9bc629f55bce715d67bdd266a2
00:12 imirkin: oh ffs
00:12 imirkin: sorry
00:12 imirkin: that is _very_ surprising
00:12 imirkin: i looked at the diff of EVERYTHING that merge commit brought in
00:13 imirkin: it does make changes to the fdt thing
00:13 karolherbst: the skips mainly won't compile.. which is just super annoying
00:15 imirkin: like what about them doesn't compile?
00:15 imirkin: it's just DT updates
00:15 imirkin: what's the issue?
00:15 karolherbst: they use a field which got added later
00:15 imirkin: but 893e591b59036f9bc629f55bce715d67bdd266a2 was _bad_
00:16 imirkin: or should it have been a skip?
00:16 karolherbst: yep
00:16 karolherbst: it's bad
00:16 karolherbst: not skip
00:16 imirkin: but e9a3bfe38e393e1d8bd74986cdc9b99b8f9d1efc is skip
00:16 imirkin: which is the top commit being merged into mainline
00:16 karolherbst: I think I made some mistakes
00:16 karolherbst: that's why I do another bisect with known good/bads
00:17 karolherbst: it only looks worse though
00:18 imirkin: git bisect good f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8
00:18 imirkin: # skip: [f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8] dt-bindings: usb: Convert DWC2 bindings to json-schema
00:18 imirkin: git bisect skip f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8
00:18 imirkin: # skip: [f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8] dt-bindings: usb: Convert DWC2 bindings to json-schema
00:18 imirkin: what's this about?
00:18 karolherbst: just wait a sec :p
00:18 imirkin: i think that might have tripped things up a bit
00:19 karolherbst: probably
00:19 imirkin: if it was actually a skip
00:19 karolherbst: but I have a new good/bad pair with 33 commits in between
00:19 imirkin: what's the range?
00:21 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/1b040dce50c3e56391d3511ea4dbf315/raw/c4f30e312c21ac2d83701cbd404441b752cadd7b/gistfile1.txt
00:21 karolherbst: :/
00:22 imirkin: gimme a few
00:22 imirkin: finishing up some work stuff
00:24 karolherbst: mhh.. let me try to fake fix that stuff... but uff
00:24 karolherbst: I am sure it would break stuff
00:24 karolherbst: as it's in a driver my hw uses
00:33 imirkin: i seee....
00:33 imirkin: so ........
00:33 imirkin: the issue, i think, is the v5.5-rc2 is a bad base to begin with
00:33 imirkin: so it's not so much the dt-* commits
00:33 imirkin: as the fact that they're based on v5.5-rc2 which doesn't work for you at all
00:33 imirkin: so i think this whole bisect is basically wrong
00:34 karolherbst: but the merge is still broken :p
00:34 imirkin: yeah
00:34 imirkin: so here's what i'd do.
00:35 imirkin: take the range 7dce4d6f151de852925feb1dd6e42d91dab14951..893e591b59036f9bc629f55bce715d67bdd266a2
00:35 imirkin: and rebase them on v5.5
00:35 karolherbst: ohhh.. that might actually work.. yes
00:35 imirkin: i expect there would, at most, be very minor conflicts
00:35 imirkin: then manually merge that with 1c715a659a16e193a23051ddff4becdad8e18ba1
00:35 imirkin: if all goes well, the merge of those 2 should still be bad.
00:35 karolherbst: I just updated my tree on the last good one
00:35 karolherbst: so I will just bisect that on top of that
00:35 karolherbst: should cause less issues
00:36 karolherbst: just want to test that one poweroff fix I've gotten :)
00:36 imirkin: and then you can do a proper bisect.
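The trouble with the skipped commits above can be shown with a toy model. This is a minimal sketch, assuming a strictly linear history (real `git bisect` walks a commit graph with merges); `bisect_first_bad` and the single-letter commit names are hypothetical, made up for illustration. It shows that when an untestable commit sits right before the first bad one, the search can only narrow things down to a set of suspects.

```python
def bisect_first_bad(commits, test):
    """Toy bisect over a linear history, oldest to newest.

    commits[0] is assumed known-good and commits[-1] known-bad.
    test(commit) returns "good", "bad" or "skip".
    Returns the list of commits that could be the first bad one.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    skipped = set()
    while hi - lo > 1:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break  # only untestable commits remain in the range
        mid = (lo + hi) // 2
        # Pick the testable commit closest to the midpoint.
        i = min(candidates, key=lambda k: abs(k - mid))
        verdict = test(commits[i])
        if verdict == "skip":
            skipped.add(i)
        elif verdict == "good":
            lo = i
        else:
            hi = i
    return commits[lo + 1:hi + 1]


history = list("abcdefgh")  # hypothetical commits; "e" introduces the bug


def builds_everywhere(commit):
    return "good" if commit < "e" else "bad"


def d_does_not_build(commit):
    return "skip" if commit == "d" else builds_everywhere(commit)


print(bisect_first_bad(history, builds_everywhere))  # one suspect
print(bisect_first_bad(history, d_does_not_build))   # two suspects
```

With every commit testable the search ends on a single commit; with "d" skipped it cannot tell "d" and "e" apart. Rebasing the range onto a base that builds and boots removes that ambiguity.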
00:41 karolherbst: ahhhh
00:41 karolherbst: cherry-picking merges is brutally annoying
00:41 imirkin: sigh
00:41 imirkin: that dt/linus thing?
00:42 karolherbst: yeah
00:42 imirkin: i'd just merge it in
00:42 imirkin: i don't think you can cherry-pick it
00:42 karolherbst: mhh, but then I can't bisect :p I mean, I need to adjust the cherry-pick range and replace the last commit by the actual last one
00:43 karolherbst: e9a3bfe38e39.. ahh
00:44 karolherbst: yay that works
00:44 karolherbst: error: commit db0d39aa7f92cc566b70913f40dbaacc8152a308 is a merge but no -m option was given. ehhh
00:44 imirkin: you can't cherry-pick it
00:44 imirkin: go around it
00:45 imirkin: i.e. cherry-pick up to it
00:45 imirkin: then do the merge
00:45 imirkin: then cherry-pick the rest
00:45 karolherbst: ohhh
00:45 imirkin: (and then merge to that netdev commit)
00:45 karolherbst: there is another merge :=
00:45 karolherbst: git cherry-pick --skip :p
00:46 karolherbst: as long as the build is broken I don't care really
00:46 imirkin: :)
00:48 karolherbst: nice
00:48 karolherbst: nouveau doesn't load :)
00:48 karolherbst: and it builds
00:49 imirkin: everything you've ever wanted :)
00:50 karolherbst: I just hope there aren't random boot issues
00:52 karolherbst: ahhh, this looks much better :)
00:52 karolherbst: only 7 commits left
00:54 karolherbst: uffff
00:54 karolherbst: I get a bad feeling about this
00:55 karolherbst: oh no :(
00:56 imirkin: ...
00:56 karolherbst: "of: Rework and simplify phandle cache to use a fixed size"
00:56 imirkin: he he he
00:57 imirkin: a bit too much simplification, i guess
00:57 karolherbst: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90dc0d1ce890419f977e460b8258d25187dde64f
00:57 karolherbst: I guess so
00:57 imirkin: on the bright side, robher is pretty responsive
00:57 karolherbst: the heck.. and now what :D
00:58 imirkin: ping him on irc, he's in a lot of chans
00:58 imirkin: probably in #tegra
00:58 imirkin: in the meanwhile, try reverting it on top of v5.6 to super-confirm
00:59 karolherbst: yeah well
00:59 karolherbst: conflicts
00:59 karolherbst: mhh a trivial one though
01:06 karolherbst: imirkin: yeah.. reverting this fixed it on top of 5.6.3
01:06 imirkin: ok
01:06 imirkin: yeah, so reach out to robher on irc or via email
01:06 imirkin: my impression is that he's pretty good about this stuff
01:12 karolherbst: soo.. now applying local patches back and maybe X even starts :)
01:12 karolherbst: nice
01:14 karolherbst: this smells like "the old code was very good at handling corner cases" kind of issue :/
01:14 karolherbst: nice
01:14 karolherbst: works
01:15 karolherbst: okay..
01:15 karolherbst: and that all just for upgrading to 5.6
01:16 karolherbst: imirkin: I got some nice tips in #tegra though
01:17 karolherbst: /sys/kernel/debug/regulator/regulator_summary
01:17 karolherbst: cat /sys/kernel/debug/pwm; cat /sys/kernel/debug/gpio as well
01:17 karolherbst: especially that regulator_summary thing
01:17 karolherbst: https://gist.githubusercontent.com/karolherbst/4186d66577301ace92d558a9eba694e2/raw/881d2d149265c969f64e7db879a59aa94f3676b7/gistfile1.txt
01:18 imirkin: it's almost like they know about these things :)
01:18 karolherbst: :)
01:23 karolherbst: well.. at least my config wasn't wrong
01:54 karolherbst: ohhh my god
01:54 karolherbst: imirkin: cache collision :(
01:54 karolherbst: or well
01:54 karolherbst: two nodes having the same handle
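The difference between a plain cache collision and two nodes sharing one handle can be sketched with a toy model. This is not the kernel's code: the node names, the mask-based `slot` function, the 7-bit size and the populate-in-order policy are all assumptions chosen for illustration. It only shows why a fixed-size cache in front of a linear scan can change which node a duplicated handle resolves to.

```python
CACHE_BITS = 7                # assumed size for this toy model
CACHE_SIZE = 1 << CACHE_BITS


def slot(phandle):
    """Toy index function: a simple mask standing in for a real hash."""
    return phandle & (CACHE_SIZE - 1)


def find_by_scan(nodes, phandle):
    """Uncached lookup: the first node in tree order wins."""
    for name, ph in nodes:
        if ph == phandle:
            return name
    return None


class FixedCache:
    def __init__(self):
        self.slots = [None] * CACHE_SIZE

    def insert(self, name, phandle):
        # A later insert silently replaces whatever was in the slot.
        self.slots[slot(phandle)] = (name, phandle)

    def lookup(self, nodes, phandle):
        entry = self.slots[slot(phandle)]
        if entry is not None and entry[1] == phandle:
            return entry[0]               # hit: trust the cached node
        name = find_by_scan(nodes, phandle)
        if name is not None:
            self.insert(name, phandle)    # miss: fall back and refill
        return name


# Hypothetical tree: two nodes share handle 0x10; 0x90 maps to the same slot.
nodes = [("regulator-a", 0x10), ("regulator-b", 0x10), ("gpio", 0x90)]

cache = FixedCache()
cache.insert("regulator-a", 0x10)
cache.insert("regulator-b", 0x10)
print(find_by_scan(nodes, 0x10))   # regulator-a
print(cache.lookup(nodes, 0x10))   # regulator-b

cache.insert("gpio", 0x90)         # same slot, different handle
print(cache.lookup(nodes, 0x10))   # regulator-a
```

In this model a slot collision only costs a miss, because the stored handle is compared before the entry is trusted. A duplicated handle is different: the cache and the scan can legitimately disagree about which node it names.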
22:35 imirkin: karolherbst: if there's a test, i can look at it
22:36 imirkin: iirc i remember this ... it was looking at TexFormat instead of internalFormat
23:08 karolherbst: imirkin: I can look into it tomorrow and report back.. but yeah, I mean I have the patches to fix it as well... if there is a more straightforward solution that would be helpful :)
23:09 imirkin: a test with the problem would be good
23:09 imirkin: and a reminder where your patches are
23:10 karolherbst: it's quite out of date but "mesa: rename gl_format_info to mesa_format_info" up to "mesa/teximage: for es we have to check the internal format not what t… ": https://github.com/karolherbst/mesa/commits/cts_v3
23:11 imirkin: do they mention which test needs "fixing"?
23:11 karolherbst: the last patch really shows the issue though
23:11 karolherbst: copy image stuff
23:11 karolherbst: afaik
23:11 imirkin: like a specific one :)
23:11 karolherbst: yeah.. let me check locally
23:13 imirkin: so yeah, that change seems totally right
23:18 fincs: Btw, did anyone notice this? https://github.com/NVIDIA/open-gpu-doc/commit/e5e7baac2a3d5310d461c9db12be6e7401a4c2bc
23:18 fincs: Looks fun; kind of a shame it's only Turing
23:21 imirkin: lots of stuff being pushed out there of late
23:22 fincs: Last time (copy methods) it was something that already existed elsewhere though
23:22 karolherbst: fincs: let's say I knew it was coming :p
23:22 imirkin: this is the new MME ISA i guess?
23:22 fincs: Unless I'm mistaken, this is new?
23:22 karolherbst: imirkin: yes
23:22 imirkin: fincs: karolherbst has the inside track
23:22 fincs: Yeah I know lol
23:23 fincs: Still want to see love for Maxwell though
23:23 karolherbst: meh
23:23 karolherbst: :p
23:23 imirkin: i've given it fairly little so far, given the reclock situation
23:24 fincs: I meant open-gpu-docs
23:24 imirkin: ah
23:24 karolherbst: imirkin: huh.. it might be it's actually fixed now :O let me retest on my branch to figure out what was broken
23:24 imirkin: karolherbst: perhaps the format rework magically fixed it :)
23:24 karolherbst: yeah..
23:24 karolherbst: maybe
23:25 imirkin: by making that function do the right thing
23:25 fincs: I guess even if they do release something like 3d class methods for Turing, some stuff will have been inherited from earlier architectures and there's still useful stuff
23:25 imirkin: it reduced the number of various formats throughout
23:25 fincs: Also I have kind of a burning question - could there be undocumented mme opcodes
23:25 imirkin: fincs: i haven't looked _at all_ at volta/turing
23:25 fincs: Like, not all opcode numbers are used
23:25 imirkin: fincs: if it's not in envytools, it doesn't exist
23:26 fincs: But did anyone try setting that opcode number and see how hardware reacts? :p
23:26 imirkin: some very OCD folks tended to do good RE back in the day
23:26 imirkin: mostly mwk :)
23:26 karolherbst: fincs: there is only one way to find out :p
23:26 fincs: ( ͡° ͜ʖ├┬┴┬┴
23:26 fincs: (I made my own mme assembler with custom syntax btw)
23:26 imirkin: time well spent, i'm sure...
23:27 fincs: Kinda love mme
23:27 karolherbst: fincs: is it llvm based? :D
23:27 fincs: I said assembler, not compiler :p
23:27 imirkin: fincs: do you like my ARB_indirect_parameters macro?
23:27 fincs: Took me only like an afternoon to write the assembler
23:27 fincs: imirkin: You haven't seen my mme
23:28 fincs: Also nvidia implements indirect by doing some hardcore gpfifo trickery
23:28 mwk: fincs: I'm reasonably certain there are no other mme opcodes on the gpu I reversed it on
23:28 fincs: Not with mme
23:28 fincs: mwk: What gpu was that?
23:28 mwk: that said, it's been a few generations ago
23:28 mwk: Fermi
23:28 imirkin: fincs: indirect compute
23:28 fincs: I have Maxwell 2nd gen (Tegra X1 on Nintendo Switch)
23:28 imirkin: fincs: not indirect draw ... that has to be mme, i think
23:28 fincs: Nope
23:28 karolherbst: imirkin: fun.. so uhm.. on my branch less gles3 tests are failing
23:28 fincs: Indirect draw/compute are done with fun gpfifo entry trickery
23:29 karolherbst: but.. ehm
23:29 karolherbst: I didn't find the one it actually fixes
23:29 fincs: Did you know you can end a pushbuffer prior to the parameters, and submit another entry and that has the parameters?
23:29 * karolherbst blames himself
23:29 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme#n338
23:29 imirkin: i'm pretty proud of this one.
23:29 fincs: No mme necessary for indirect at all lol
23:30 imirkin: fincs: wtvr, that's for a single draw
23:30 imirkin: did you look at the indirect_parameters thing?
23:30 fincs: Indirect compute: https://github.com/devkitPro/deko3d/blob/master/source/maxwell/gpu_compute.cpp#L167-L187
23:30 imirkin: we don't use a macro for indirect compute at all
23:30 imirkin: (except on fermi)
23:30 karolherbst: ../../../modules/gles3/functional/es3fNegativeTextureApiTests.cpp sounds like the one I am searching for
23:30 fincs: You're doing multidraw right?
23:31 imirkin: fincs: multidraw, but not as simple.
23:31 fincs: Multidraw indirect then
23:31 imirkin: with a configurable number of draws inside a different buffer
23:31 fincs: NVN has that
23:31 imirkin: https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_indirect_parameters.txt
23:31 fincs: "MultiDrawArraysIndirectCountARB" Yeah so what I said
23:31 fincs: All draws are implemented with macros
23:32 fincs: And the multidraw ones are really long, heh
23:32 imirkin: so anyways, that macro does it all in one go. pretty happy with it.
23:32 fincs: Nvidia one has to spill registers into shadow ram iirc
23:32 imirkin: i spill into SCRATCH
23:32 fincs: Yup
23:32 fincs: That's it
23:32 imirkin: coz i'm weak.
23:33 imirkin: couldn't figure out the RA
23:33 fincs: Don't feel bad about it
23:33 imirkin: i think it _might_ be possible to reorganize it
23:33 fincs: Nvidia did the same thing
23:33 imirkin: but it's a lot of things to keep track of
23:33 imirkin: and not a lot of registers
23:33 fincs: There's only 7 gprs, heh
23:33 imirkin: you wish
23:34 fincs: Plus the special zero reg
23:34 imirkin: yeah, actually i guess it is 7
23:34 imirkin: i thought you lost another one
23:34 fincs: Nope lol
23:34 imirkin: but that's just the initial parameter is in $r1
23:34 fincs: Anyway
23:34 fincs: I like abusing delay slots and other tricks in mme :)
23:34 imirkin: did you read my code?
23:35 mwk: abusing delay slots?
23:35 imirkin: branz annul = "i have failed to use the delay slot for something useful"
23:35 imirkin: all the other ones are good
23:35 imirkin: also note that the whole delay slot thing isn't new... MIPS had that
23:35 fincs: I don't think I've used the annul variants at all
23:36 karolherbst: imirkin: heh.. https://trello.com/c/iHvcHfoP/3-copyimage-with-rgba4
23:36 karolherbst: ohh wait
23:36 karolherbst: that's the GL one
23:36 karolherbst: forget it
23:36 imirkin: done!
23:37 fincs: I wonder if mme is expensive
23:37 karolherbst: ufff
23:37 karolherbst: I had this card but that's based on my fix :D https://trello.com/c/ZnM10QHM/30-deqp-master-gles-30-master
23:37 imirkin: it's actually cheaper than dumping stuff into pushbuf
23:37 fincs: I kind of use mme a lot lol
23:37 imirkin: someone was asking me why a giant direct draw was more expensive than the same indirect draw
23:37 imirkin: or rather, not giant...
23:37 imirkin: more like very-instanced
23:37 imirkin: like 1MM instances or something
23:38 imirkin: cheaper to let the GPU spin 1MM times than emit all that into the pushbuf
23:38 fincs: Okay so I guess I shouldn't feel bad
23:38 imirkin: probably not worth the overhead for like 2 instances though
23:38 imirkin: dunno
23:39 fincs: imirkin: I looked at your code, you're not using the exit flag on branch instructions, disappointing :)
23:39 imirkin: perhaps i didn't know about it?
23:39 fincs: Oh wait you are, but on a different macro
23:39 fincs: Good stuff
23:39 imirkin: you have to run out the parameters though
23:39 imirkin: you can't just leave stuff unread
23:39 fincs: I also use branch + exit as a way to do conditional exit
23:39 fincs: Because the branch cancels the exit if it's taken
23:40 imirkin: i do it in mme9097_query_buffer_write
23:40 fincs: And yes you do need to run out the params
23:40 fincs: Yup
23:40 fincs: Good stuff
23:40 fincs: I kind of do that all the time
23:40 fincs: Is annul slower than non-annul?
23:41 imirkin: it's just a way to not have to manually insert a nop after the thing
23:41 imirkin: i don't think it's any faster or slower
23:41 fincs: Why insert a nop when you can just load whatever the next block of code needs :)
23:41 imirkin: you just don't execute the thing in the delay slot
23:41 imirkin: the delay time slot still happens
23:41 fincs: Isn't that a pipeline stall?
23:41 imirkin: well -- sometimes the next thing is another branch
23:41 fincs: Where you are wasting time
23:41 imirkin: or there are multiple entries into the other thing
23:41 karolherbst: you know what.. I just run the full thing
23:41 imirkin: so it's not easy to preload it
23:42 imirkin: all the macros starting with NVC0_3D_MACRO_DRAW_ELEMENTS_INDIRECT is me, i think
23:42 imirkin: everything before is not-me, presumably calim
23:43 imirkin: open to improving the mme code ;)
23:43 imirkin: oh - conservative raster was pendingchaos ... and i think he inspired my impl of the compute counter thing.
23:43 fincs: I think it's already pretty tight given what you've said
23:44 imirkin: with the shift+add style multiplication... heh
23:44 imirkin: what a waste. whatever.
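For readers who have not met the term, shift+add multiplication builds a product from shifts and additions only, one multiplier bit at a time. The sketch below is the generic textbook technique in Python, not the macro's actual code; the function name and the 32-bit wraparound are assumptions made for illustration.

```python
def shift_add_mul(a, b, bits=32):
    """Multiply a by b using only shifts and adds, wrapping at `bits` bits."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:                      # lowest multiplier bit is set:
            result = (result + a) & mask   # add the shifted multiplicand
        a = (a << 1) & mask            # next bit weighs twice as much
        b >>= 1                        # move to the next multiplier bit
    return result


print(shift_add_mul(6, 7))  # 42
```

The loop runs once per significant bit of the multiplier, so a 32-bit multiply can take up to 32 rounds of test, add and shift, which gives a feel for why it gets called a waste.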
23:44 fincs: Conservative raster is weird
23:45 fincs: It needs firmware calls
23:45 fincs: And it looks like you're not waiting for the fw call to finish
23:45 imirkin: and all that was just for the stupid compute invocations counter in ARB_pipeline_statistics
23:45 imirkin: mmm... could be bugs, i dunno
23:45 karolherbst: uff yeah.. that one
23:46 imirkin: i don't know jack about it, and i think pendingchaos was new to the whole thing at the time
23:46 fincs: After you do the firmware call, you're supposed to poll the first scratch register and wait for it to be 1
23:46 imirkin: huh, i had no idea. with a macro? or?
23:47 fincs: I.e. 0xD00 (aka 0x3400)
23:47 fincs: Yes, in mme
23:47 imirkin: is that what the stupid DELAY things are supposed to be used for?
23:47 fincs: Read it in a loop and wait for it to be 1
23:47 imirkin: so that it doesn't end up sitting in a tight loop
23:47 fincs: Which delay thing?
23:47 imirkin: sec
23:47 fincs: Code I've observed is just a loop without a deadline, lol
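The polling pattern described above (read a scratch register in a loop until it becomes 1) can be sketched in host-side Python. This is only an illustration of the control flow, not MME code: `poll_until`, `FakeScratch` and the poll budget are hypothetical, and the deadline is an addition that the observed code reportedly lacks.

```python
def poll_until(read_reg, expected=1, max_polls=1000):
    """Call read_reg() until it returns `expected`; give up after max_polls.

    Returns the number of reads it took.
    """
    for polls in range(1, max_polls + 1):
        if read_reg() == expected:
            return polls
    raise TimeoutError(f"value never became {expected} after {max_polls} polls")


class FakeScratch:
    """Hypothetical stand-in for a register that a firmware call sets to 1."""

    def __init__(self, ready_after):
        self.reads = 0
        self.ready_after = ready_after

    def read(self):
        self.reads += 1
        return 1 if self.reads >= self.ready_after else 0


print(poll_until(FakeScratch(ready_after=3).read))  # 3
```

Without the `max_polls` bound, a register that is never written keeps the loop spinning forever, which is the failure mode to expect from anything that does not model the firmware call.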
23:47 fincs: Fun fact: Switch emulators not properly emulating MME/firmware calls made that loop an infloop
23:47 fincs: But I forced them to fix it
23:48 karolherbst: I am sure they had fun writing that mme emulator
23:48 fincs: They seem to dislike mme, and that makes me sad
23:48 imirkin: of course now i can't find it...
23:48 karolherbst: fincs: well, it sucks for emulators :D
23:49 fincs: I have too much fun writing mme
23:49 mwk: imirkin: the so-called DELAY methods on 3d object?
23:49 karolherbst: as long as they don't do pattern detection, the switch emulator will always run significantly faster on nvidia hw than AMD/Intel I guess
23:49 fincs: Yup
23:49 fincs: Also - do we know the size of the mme code area?
23:49 karolherbst: VRAM probably?
23:49 imirkin: MC: 0x01a24021 maddr 0x689 [GP104_3D.DELAY]
23:50 karolherbst: or is there on chip mme stuff?
23:50 imirkin: mwk: yes.
23:50 fincs: Ah yeah 0x689 is something else
23:50 mwk: imirkin: they... are not delay at all
23:50 imirkin: PM: 0x00007353 GP104_3D.DELAY = 0x7353
23:50 imirkin: ah ok
23:50 fincs: Hmm
23:50 fincs: I misread 0x689 as 0x68B
23:51 fincs: I don't think I've observed 0x689
23:51 imirkin: it's always writing 0x7353
23:51 fincs: Maybe it's a Pascal thing
23:51 imirkin: no
23:51 mwk: imirkin: 0x1a24 on the 3D object is a really cute method
23:51 mwk: its official name is TestForQuadro
23:51 karolherbst: the heck? :D
23:51 fincs: Wat
23:52 mwk: it... is a sleep() if you don't have a quadro, nop if you do have a quadro
23:52 fincs: What's this even
23:52 imirkin: mwk: ah right
23:52 fincs: Heh I guess that's why I haven't seen it
23:52 mwk: it's specifically designed so that the driver can fuck you over at the right points
23:52 fincs: Not useful
23:52 karolherbst: mwk: :/
23:52 karolherbst: the hell :/
23:52 imirkin: no quadro variants of GM20B i guess?
23:52 karolherbst: well, glad that we don't have this issue
23:52 mwk: when it deems you to be doing Expensive Professional Things
23:52 RSpliet: mwk: I thought they were strategically positioned to only really appear in CAD workloads?
23:53 mwk: *shrug* they tie it to weird things
23:53 * karolherbst still wondering how fp64 is done
23:53 mwk: that one's probably done in hw
23:53 karolherbst: one would think.. yes
23:54 karolherbst: I could imagine that low end chipsets don't have it
23:54 karolherbst: but maybe high end desktop chips can be convinced to enable it
23:54 mwk: IMO it's just fuses
23:54 karolherbst: ahh
23:54 mwk: once the GPU is marked as non-tesla in the factory, you're not getting full fp64 performance, and that's the end of discussion
23:55 mwk: sure, the hw is "capable" of it, but it won't let you
23:55 mwk: (and fuses are also how TestForQuadro works)
23:58 * karolherbst starts to hate the CTS