00:08karolherbst: let me recheck
00:09HdkR: 2/win 22
00:11karolherbst: imirkin: nouveau doesn't load on that one :p
00:11imirkin: so why skip?
00:11karolherbst: I didn't skip that
00:11karolherbst: git bisect bad 893e591b59036f9bc629f55bce715d67bdd266a2
00:12imirkin: oh ffs
00:12imirkin: that is _very_ surprising
00:12imirkin: i looked at the diff of EVERYTHING that merge commit brought in
00:13imirkin: it does make changes to the fdt thing
00:13karolherbst: the skips mainly won't compile.. which is just super annoying
00:15imirkin: like what about them doesn't compile?
00:15imirkin: it's just DT updates
00:15imirkin: what's the issue?
00:15karolherbst: they use a field which got added later
00:15imirkin: but 893e591b59036f9bc629f55bce715d67bdd266a2 was _bad_
00:16imirkin: or should it have been a skip?
00:16karolherbst: it's bad
00:16karolherbst: not skip
00:16imirkin: but e9a3bfe38e393e1d8bd74986cdc9b99b8f9d1efc is skip
00:16imirkin: which is the top commit being merged into mainline
00:16karolherbst: I think I made some mistakes
00:16karolherbst: that's why I do another bisect with known good/bads
00:17karolherbst: it only looks worse though
00:18imirkin: git bisect good f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8
00:18imirkin: # skip: [f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8] dt-bindings: usb: Convert DWC2 bindings to json-schema
00:18imirkin: git bisect skip f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8
00:18imirkin: # skip: [f3ca745d8a0e6ace1f91bd122f5bff0323ff6bd8] dt-bindings: usb: Convert DWC2 bindings to json-schema
00:18imirkin: what's this about?
00:18karolherbst: just wait a sec :p
00:18imirkin: i think that might have tripped things up a bit
00:19imirkin: if it was actually a skip
00:19karolherbst: but I have a new good/bad pair with 33 commits in between
00:19imirkin: what's the range?
00:21karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/1b040dce50c3e56391d3511ea4dbf315/raw/c4f30e312c21ac2d83701cbd404441b752cadd7b/gistfile1.txt
00:22imirkin: gimme a few
00:22imirkin: finishing up some work stuff
00:24karolherbst: mhh.. let me try to fake fix that stuff... but uff
00:24karolherbst: I am sure it would break stuff
00:24karolherbst: as it's in a driver my hw uses
00:33imirkin: i seee....
00:33imirkin: so ........
00:33imirkin: the issue, i think, is that v5.5-rc2 is a bad base to begin with
00:33imirkin: so it's not so much the dt-* commits
00:33imirkin: as the fact that they're based on v5.5-rc2 which doesn't work for you at all
00:33imirkin: so i think this whole bisect is basically wrong
00:34karolherbst: but the merge is still broken :p
00:34imirkin: so here's what i'd do.
00:35imirkin: take the range 7dce4d6f151de852925feb1dd6e42d91dab14951..893e591b59036f9bc629f55bce715d67bdd266a2
00:35imirkin: and rebase them on v5.5
00:35karolherbst: ohhh.. that might actually work.. yes
00:35imirkin: i expect there would, at most, be very minor conflicts
00:35imirkin: then manually merge that with 1c715a659a16e193a23051ddff4becdad8e18ba1
00:35imirkin: if all goes well, the merge of those 2 should still be bad.
00:35karolherbst: I just updated my tree on the last good one
00:35karolherbst: so I will just bisect that on top of that
00:35karolherbst: should cause less issues
00:36karolherbst: just want to test that one poweroff fix I've gotten :)
00:36imirkin: and then you can do a proper bisect.
00:41karolherbst: cherry-picking merges is brutally annoying
00:41imirkin: that dt/linus thing?
00:42imirkin: i'd just merge it in
00:42imirkin: i don't think you can cherry-pick it
00:42karolherbst: mhh, but then I can't bisect :p I mean, I need to adjust the cherry-pick range and replace the last commit by the actual last one
00:43karolherbst: e9a3bfe38e39.. ahh
00:44karolherbst: yay that works
00:44karolherbst: error: commit db0d39aa7f92cc566b70913f40dbaacc8152a308 is a merge but no -m option was given. ehhh
00:44imirkin: you can't cherry-pick it
00:44imirkin: go around it
00:45imirkin: i.e. cherry-pick up to it
00:45imirkin: then do the merge
00:45imirkin: then cherry-pick the rest
00:45imirkin: (and then merge to that netdev commit)
00:45karolherbst: there is another merge :=
00:45karolherbst: git cherry-pick --skip :p
00:46karolherbst: as long as the build is broken I don't care really
00:48karolherbst: nouveau doesn't load :)
00:48karolherbst: and it builds
00:49imirkin: everything you've ever wanted :)
00:50karolherbst: I just hope there aren't random boot issues
00:52karolherbst: ahhh, this looks much better :)
00:52karolherbst: only 7 commits left
00:54karolherbst: I get a bad feeling about this
00:55karolherbst: oh no :(
00:56karolherbst: "of: Rework and simplify phandle cache to use a fixed size"
00:56imirkin: he he he
00:57imirkin: a bit too much simplification, i guess
00:57karolherbst: I guess so
00:57imirkin: on the bright side, robher is pretty responsive
00:57karolherbst: the heck.. and now what :D
00:58imirkin: ping him on irc, he's in a lot of chans
00:58imirkin: probably in #tegra
00:58imirkin: in the meanwhile, try reverting it on top of v5.6 to super-confirm
00:59karolherbst: yeah well
00:59karolherbst: mhh a trivial one though
01:06karolherbst: imirkin: yeah.. reverting this fixed it on top of 5.6.3
01:06imirkin: yeah, so reach out to robher on irc or via email
01:06imirkin: my impression is that he's pretty good about this stuff
01:12karolherbst: soo.. now applying local patches back and maybe X even starts :)
01:14karolherbst: this smells like "the old code was very good at handling corner cases" kind of issue :/
01:15karolherbst: and that all just for upgrading to 5.6
01:16karolherbst: imirkin: I got some nice tips in #tegra though
01:17karolherbst: cat /sys/kernel/debug/pwm; cat /sys/kernel/debug/gpi as well
01:17karolherbst: missing o
01:17karolherbst: especially that regulator_summary thing
01:18imirkin: it's almost like they know about these things :)
01:23karolherbst: well.. at least my config wasn't wrong
01:54karolherbst: ohhh my god
01:54karolherbst: imirkin: cache collision :(
01:54karolherbst: or well
01:54karolherbst: two nodes having the same handle
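[Editor's sketch] The failure mode described above (a fixed-size cache where two entries land in the same slot, or two nodes carry the same handle) can be illustrated with a small direct-mapped cache. This is a hypothetical Python model, not the kernel's phandle cache: the slot count, the low-bits mapping in `slot_for`, and the node names are all assumptions made for illustration.

```python
# Hypothetical illustration only; this is not the kernel's phandle cache code.
CACHE_BITS = 3                      # assumed size: 2**3 = 8 slots
CACHE_SIZE = 1 << CACHE_BITS


def slot_for(handle):
    """Map a handle to a slot by keeping its low bits (assumed scheme)."""
    return handle & (CACHE_SIZE - 1)


class FixedSizeCache:
    def __init__(self):
        self.slots = [None] * CACHE_SIZE

    def insert(self, handle, node):
        # A direct-mapped cache simply overwrites whatever was in the slot.
        self.slots[slot_for(handle)] = (handle, node)

    def lookup(self, handle):
        entry = self.slots[slot_for(handle)]
        # The stored handle must be compared, otherwise a colliding
        # entry would be returned for the wrong handle.
        if entry is not None and entry[0] == handle:
            return entry[1]
        return None                 # miss: caller must fall back to a full search


cache = FixedSizeCache()
cache.insert(0x01, "node-a")
cache.insert(0x09, "node-b")        # 0x09 & 7 == 1, same slot as 0x01
cache.insert(0x09, "node-c")        # second node with the same handle: last writer wins

print(cache.lookup(0x01))           # miss, node-a was evicted by the collision
print(cache.lookup(0x09))           # node-c, not node-b
```

With a cache like this, a lookup is only as reliable as the fallback path on a miss and the uniqueness of the handles, which is roughly the kind of corner case the log is worried about.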
22:35imirkin: karolherbst: if there's a test, i can look at it
22:36imirkin: iirc i remember this ... it was looking at TexFormat instead of internalFormat
23:08karolherbst: imirkin: I can look into it tomorrow and report back.. but yeah, I mean I have the patches to fix it as well... if there is a more straightforward solution that would be helpful :)
23:09imirkin: a test with the problem would be good
23:09imirkin: and a reminder where your patches are
23:10karolherbst: it's quite out of date but "mesa: rename gl_format_info to mesa_format_info" up to "mesa/teximage: for es we have to check the internal format not what t… ": https://github.com/karolherbst/mesa/commits/cts_v3
23:11imirkin: do they mention which test needs "fixing"?
23:11karolherbst: the last patch really shows the issue though
23:11karolherbst: copy image stuff
23:11imirkin: like a specific one :)
23:11karolherbst: yeah.. let me check locally
23:13imirkin: so yeah, that change seems totally right
23:18fincs: Btw, did anyone notice this? https://github.com/NVIDIA/open-gpu-doc/commit/e5e7baac2a3d5310d461c9db12be6e7401a4c2bc
23:18fincs: Looks fun; kind of a shame it's only Turing
23:21imirkin: lots of stuff being pushed out there of late
23:22fincs: Last time (copy methods) it was something that already existed elsewhere though
23:22karolherbst: fincs: let's say I knew it was coming :p
23:22imirkin: this is the new MME ISA i guess?
23:22fincs: Unless I'm mistaken, this is new?
23:22karolherbst: imirkin: yes
23:22imirkin: fincs: karolherbst has the inside track
23:22fincs: Yeah I know lol
23:23fincs: Still want to see love for Maxwell though
23:23imirkin: i've given it fairly little so far, given the reclock situation
23:24fincs: I meant open-gpu-docs
23:24karolherbst: imirkin: huh.. it might be it's actually fixed now :O let me retest on my branch to figure out what was broken
23:24imirkin: karolherbst: perhaps the format rework magically fixed it :)
23:25imirkin: by making that function do the right thing
23:25fincs: I guess even if they do release something like 3d class methods for Turing, some stuff will have been inherited from earlier architectures and there's still useful stuff
23:25imirkin: it reduced the number of various formats throughout
23:25fincs: Also I have kind of a burning question - could there be undocumented mme opcodes
23:25imirkin: fincs: i haven't looked _at all_ at volta/turing
23:25fincs: Like, not all opcode numbers are used
23:25imirkin: fincs: if it's not in envytools, it doesn't exist
23:26fincs: But did anyone try setting that opcode number and see how hardware reacts? :p
23:26imirkin: some very OCD folks tended to do good RE back in the day
23:26imirkin: mostly mwk :)
23:26karolherbst: fincs: there is only one way to find out :p
23:26fincs: ( ͡° ͜ʖ├┬┴┬┴
23:26fincs: (I made my own mme assembler with custom syntax btw)
23:26imirkin: time well spent, i'm sure...
23:27fincs: Kinda love mme
23:27karolherbst: fincs: is it llvm based? :D
23:27fincs: I said assembler, not compiler :p
23:27imirkin: fincs: do you like my ARB_indirect_parameters macro?
23:27fincs: Took me only like an afternoon to write the assembler
23:27fincs: imirkin: You haven't seen my mme
23:28fincs: Also nvidia implements indirect by doing some hardcore gpfifo trickery
23:28mwk: fincs: I'm reasonably certain there are no other mme opcodes on the gpu I reversed it on
23:28fincs: Not with mme
23:28fincs: mwk: What gpu was that?
23:28mwk: that said, it's been a few generations ago
23:28imirkin: fincs: indirect compute
23:28fincs: I have Maxwell 2nd gen (Tegra X1 on Nintendo Switch)
23:28imirkin: fincs: not indirect draw ... that has to be mme, i think
23:28karolherbst: imirkin: fun.. so uhm.. on my branch less gles3 tests are failing
23:28fincs: Indirect draw/compute are done with fun gpfifo entry trickery
23:29karolherbst: but.. ehm
23:29karolherbst: I didn't find the one it actually fixes
23:29fincs: Did you know you can end a pushbuffer prior to the parameters, and submit another entry and that has the parameters?
23:29imirkin: i'm pretty proud of this one.
23:29fincs: No mme necessary for indirect at all lol
23:30imirkin: fincs: wtvr, that's for a single draw
23:30imirkin: did you look at the indirect_parameters thing?
23:30fincs: Indirect compute: https://github.com/devkitPro/deko3d/blob/master/source/maxwell/gpu_compute.cpp#L167-L187
23:30imirkin: we don't use a macro for indirect compute at all
23:30imirkin: (except on fermi)
23:30karolherbst: ../../../modules/gles3/functional/es3fNegativeTextureApiTests.cpp sounds like the one I am searching for
23:30fincs: You're doing multidraw right?
23:31imirkin: fincs: multidraw, but not as simple.
23:31fincs: Multidraw indirect then
23:31imirkin: with a configurable number of draws inside a different buffer
23:31fincs: NVN has that
23:31fincs: "MultiDrawArraysIndirectCountARB" Yeah so what I said
23:31fincs: All draws are implemented with macros
23:32fincs: And the multidraw ones are really long, heh
23:32imirkin: so anyways, that macro does it all in one go. pretty happy with it.
23:32fincs: Nvidia one has to spill registers into shadow ram iirc
23:32imirkin: i spill into SCRATCH
23:32fincs: That's it
23:32imirkin: coz i'm weak.
23:33imirkin: couldn't figure out the RA
23:33fincs: Don't feel bad about it
23:33imirkin: i think it _might_ be possible to reorganize it
23:33fincs: Nvidia did the same thing
23:33imirkin: but it's a lot of things to keep track of
23:33imirkin: and not a lot of registers
23:33fincs: There's only 7 gprs, heh
23:33imirkin: you wish
23:34fincs: Plus the special zero reg
23:34imirkin: yeah, actually i guess it is 7
23:34imirkin: i thought you lost another one
23:34fincs: Nope lol
23:34imirkin: but that's just the initial parameter is in $r1
23:34fincs: I like abusing delay slots and other tricks in mme :)
23:34imirkin: did you read my code?
23:35mwk: abusing delay slots?
23:35imirkin: branz annul = "i have failed to use the delay slot for something useful"
23:35imirkin: all the other ones are good
23:35imirkin: also note that the whole delay slot thing isn't new... MIPS had that
23:35fincs: I don't think I've used the annul variants at all
23:36karolherbst: imirkin: heh.. https://trello.com/c/iHvcHfoP/3-copyimage-with-rgba4
23:36karolherbst: ohh wait
23:36karolherbst: that's the GL one
23:36karolherbst: forget it
23:37fincs: I wonder if mme is expensive
23:37karolherbst: I had this card but that's based on my fix :D https://trello.com/c/ZnM10QHM/30-deqp-master-gles-30-master
23:37imirkin: it's actually cheaper than dumping stuff into pushbuf
23:37fincs: I kind of use mme a lot lol
23:37imirkin: someone was asking me why a giant direct draw was more expensive than the same indirect draw
23:37imirkin: or rather, not giant...
23:37imirkin: more like very-instanced
23:37imirkin: like 1MM instances or something
23:38imirkin: cheaper to let the GPU spin 1MM times than emit all that into the pushbuf
23:38fincs: Okay so I guess I shouldn't feel bad
23:38imirkin: probably not worth the overhead for like 2 instances though
23:39fincs: imirkin: I looked at your code, you're not using the exit flag on branch instructions, disappointing :)
23:39imirkin: perhaps i didn't know about it?
23:39fincs: Oh wait you are, but on a different macro
23:39fincs: Good stuff
23:39imirkin: you have to run out the parameters though
23:39imirkin: you can't just leave stuff unread
23:39fincs: I also use branch + exit as a way to do conditional exit
23:39fincs: Because the branch cancels the exit if it's taken
23:40imirkin: i do it in mme9097_query_buffer_write
23:40fincs: And yes you do need to run out the params
23:40fincs: Good stuff
23:40fincs: I kind of do that all the time
23:40fincs: Is annul slower than non-annul?
23:41imirkin: it's just a way to not have to manually insert a nop after the thing
23:41imirkin: i don't think it's any faster or slower
23:41fincs: Why insert a nop when you can just load whatever the next block of code needs :)
23:41imirkin: you just don't execute the thing in the delay slot
23:41imirkin: the delay time slot still happens
23:41fincs: Isn't that a pipeline stall?
23:41imirkin: well -- sometimes the next thing is another branch
23:41fincs: Where you are wasting time
23:41imirkin: or there are multiple entries into the other thing
23:41karolherbst: you know what.. I just run the full thing
23:41imirkin: so it's not easy to preload it
23:42imirkin: all the macros starting with NVC0_3D_MACRO_DRAW_ELEMENTS_INDIRECT is me, i think
23:42imirkin: everything before is not-me, presumably calim
23:43imirkin: open to improving the mme code ;)
23:43imirkin: oh - conservative raster was pendingchaos ... and i think he inspired my impl of the compute counter thing.
23:43fincs: I think it's already pretty tight given what you've said
23:44imirkin: with the shift+add style multiplication... heh
23:44imirkin: what a waste. whatever.
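[Editor's sketch] "Shift+add style multiplication" refers to the usual technique for multiplying on a machine with no multiply instruction: walk the bits of one operand and accumulate shifted copies of the other. The Python below is a generic illustration of that technique under the assumption of non-negative integer operands; it is not the macro code being discussed, and `mul_shift_add` is a hypothetical name.

```python
def mul_shift_add(a, b):
    """Multiply two non-negative integers using only shifts, adds, and bit tests."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: this power of two contributes
            result += a
        a <<= 1            # a now represents a * 2**k for the next bit
        b >>= 1            # move on to the next bit of b
    return result


print(mul_shift_add(13, 11))
```

The loop runs once per bit of `b`, which is why this costs many instructions compared with a hardware multiply.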
23:44fincs: Conservative raster is weird
23:45fincs: It needs firmware calls
23:45fincs: And it looks like you're not waiting for the fw call to finish
23:45imirkin: and all that was just for the stupid compute invocations counter in ARB_pipeline_statistics
23:45imirkin: mmm... could be bugs, i dunno
23:45karolherbst: uff yeah.. that one
23:46imirkin: i don't know jack about it, and i think pendingchaos was new to the whole thing at the time
23:46fincs: After you do the firmware call, you're supposed to poll the first scratch register and wait for it to be 1
23:46imirkin: huh, i had no idea. with a macro? or?
23:47fincs: I.e. 0xD00 (aka 0x3400)
23:47fincs: Yes, in mme
23:47imirkin: is that what the stupid DELAY things are supposed to be used for?
23:47fincs: Read it in a loop and wait for it to be 1
23:47imirkin: so that it doesn't end up sitting in a tight loop
23:47fincs: Which delay thing?
23:47fincs: Code I've observed is just a loop without a deadline, lol
23:47fincs: Fun fact: Switch emulators not properly emulating MME/firmware calls made that loop an infloop
23:47fincs: But I forced them to fix it
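[Editor's sketch] The pattern described above is "read a status location in a loop until it becomes 1". The log notes that the observed code has no deadline, which is how an incomplete emulation turned it into an infinite loop. Below is a host-side Python model of the same pattern with a poll limit added. Everything here is hypothetical: `read_reg` stands in for reading the scratch register, `make_fake_reg` is a test double, and the poll limit is an editorial addition, not something the real macro code is known to have.

```python
def wait_for_flag(read_reg, max_polls=1000):
    """Poll read_reg() until it returns 1; give up after max_polls reads."""
    for polls in range(1, max_polls + 1):
        if read_reg() == 1:
            return polls            # number of reads it took
    raise TimeoutError(f"flag not set after {max_polls} polls")


def make_fake_reg(ready_after):
    """Test double: returns 0 until it has been read ready_after times."""
    reads = 0

    def read_reg():
        nonlocal reads
        reads += 1
        return 1 if reads >= ready_after else 0

    return read_reg


print(wait_for_flag(make_fake_reg(3)))
```

A register that never becomes 1 raises `TimeoutError` here instead of spinning forever, which is the difference between this sketch and the unbounded loop mentioned in the log.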
23:48karolherbst: I am sure they had fun writing that mme emulator
23:48fincs: They seem to dislike mme, and that makes me sad
23:48imirkin: of course now i can't find it...
23:48karolherbst: fincs: well, it sucks for emulators :D
23:49fincs: I have too much fun writing mme
23:49mwk: imirkin: the so-called DELAY methods on 3d object?
23:49karolherbst: as long as they don't do pattern detection, the switch emulator will always run significantly faster on nvidia hw than AMD/Intel I guess
23:49fincs: Also - do we know the size of the mme code area?
23:49karolherbst: VRAM probably?
23:49imirkin: MC: 0x01a24021 maddr 0x689 [GP104_3D.DELAY]
23:50karolherbst: or is there on chip mme stuff?
23:50imirkin: mwk: yes.
23:50fincs: Ah yeah 0x689 is something else
23:50mwk: imirkin: they... are not delay at all
23:50imirkin: PM: 0x00007353 GP104_3D.DELAY = 0x7353
23:50imirkin: ah ok
23:50fincs: I misread 0x689 as 0x68B
23:51fincs: I don't think I've observed 0x689
23:51imirkin: it's always writing 0x7353
23:51fincs: Maybe it's a Pascal thing
23:51mwk: imirkin: 0x1a24 on the 3D object is a really cute method
23:51mwk: its official name is TestForQuadro
23:51karolherbst: the heck? :D
23:52mwk: it... is a sleep() if you don't have a quadro, nop if you do have a quadro
23:52fincs: What's this even
23:52imirkin: mwk: ah right
23:52fincs: Heh I guess that's why I haven't seen it
23:52mwk: it's specifically designed so that the driver can fuck you over at the right points
23:52fincs: Not useful
23:52karolherbst: mwk: :/
23:52karolherbst: the hell :/
23:52imirkin: no quadro variants of GM20B i guess?
23:52karolherbst: well, glad that we don't have this issue
23:52mwk: when it deems you to be doing Expensive Professional Things
23:52RSpliet: mwk: I thought they were strategically positioned to only really appear in CAD workloads?
23:53mwk: *shrug* they tie it to weird things
23:53karolherbst:still wondering how fp64 is done
23:53mwk: that one's probably done in hw
23:53karolherbst: one would think.. yes
23:54karolherbst: I could imagine that low end chipsets don't have it
23:54karolherbst: but maybe high end desktop chips can be convinced to enable it
23:54mwk: IMO it's just fuses
23:54mwk: once the GPU is marked as non-tesla in the factory, you're not getting full fp64 performance, and that's the end of discussion
23:55mwk: sure, the hw is "capable" of it, but it won't let you
23:55mwk: (and fuses are also how TestForQuadro works)
23:58karolherbst:starts to have the cTS