00:00lomb: imirkin: the only thing I could ever do is sending info on bugs I may find (with logs)
00:00lomb: I just noticed
00:00imirkin: lomb: that's good. just don't file "i have a hang but no additional info" bugs
00:00lomb: that imirkin is Ilia Mirkin xD
00:00imirkin: or even worse -- say "me too" to one of those "random hang" bugs
00:00imirkin: heh - yeah. coincidence, probably.
00:01lomb: imirkin: well, on my sister's computer there is a GT 635. when I reclock it the screen gets covered in all kinds of colours
00:01lomb: when I have access to it I'll try to collect logs
00:01lomb: and send them
00:01imirkin: lomb: ooh, grab the vbios
00:01imirkin: (in addition to just a basic dmesg after boot)
00:01imirkin: if there are any errors that pop up after reclock, that would be interesting too
00:02imirkin: this isn't a mac, is it?
00:02lomb: I think I can do that
00:02lomb: my family has no apple hardware
00:02RSpliet: oh, and there's an option to turn on PMU debugging, that would provide some useful info in the logs... let me think
00:02lomb: I only own a powermac from 1994 that I once bought for 50€, but I never used it
00:02imirkin: reclocking is karolherbst's specialty -- he got kepler reclocking over the usability line (after skeggsb did the lion's share of the underlying work)
00:03RSpliet: or... gah, which one prints the memory script
00:03lomb: hmm, I'll note that
00:03lomb: but you'll have to wait
00:03lomb: since I'll only be able to get to that computer when I have holidays
00:03imirkin: i'm not the one with the issue :)
00:04lomb: I'll try to collect good nouveau cards though for my article
00:04imirkin: i was unaware that we still had issues like that with kepler reclocking
00:04lomb: if you need some tester you can contact me
00:04imirkin: so i think those issues are relatively rare
00:04RSpliet: lomb: if you go ahead and get a log of reclocking failing, please boot with the nouveau.debug=pmu=debug kernel parameter
00:05lomb: I'll definitely do that
00:05imirkin: lomb: but i dunno what the target for the article is, but nouveau is definitely hard to use for the average user. some random application can totally just hang your system.
00:05lomb: I'll write it down now
00:06lomb: but as I said, you'll have to wait :L
00:06RSpliet: That's alright, I won't be the one working on it :-P
00:06lomb: haha :P
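Note on the kernel parameter mentioned above: one way to confirm that the machine actually booted with `nouveau.debug=pmu=debug` is to inspect the kernel command line (on a running system it can be read from /proc/cmdline). The following is a minimal sketch, not part of nouveau. The helper name `nouveau_debug_levels` is hypothetical, and it assumes the parameter value is a comma-separated list of `subdev=level` pairs.

```python
def nouveau_debug_levels(cmdline: str) -> dict:
    """Return {subdev: level} parsed from a nouveau.debug=... token.

    Assumption: the value is a comma-separated list of subdev=level
    pairs, as in the pmu=debug example from the log.
    """
    prefix = "nouveau.debug="
    levels = {}
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        for entry in token[len(prefix):].split(","):
            name, sep, level = entry.partition("=")
            if sep:  # skip entries that carry no explicit level
                levels[name.lower()] = level
    return levels


# Example command line (made up for illustration); on a real system
# the string would come from open("/proc/cmdline").read().
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 nouveau.debug=pmu=debug"
print(nouveau_debug_levels(example))
```

If the returned dictionary has no `pmu` entry, the dmesg collected after the failed reclock will probably lack the extra PMU output RSpliet asked for.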
00:07lomb: btw, may I ask what the status of vulkan support is?
00:07imirkin: it's as close today as it was originally
00:07lomb: meh :L
00:07imirkin: i.e. no real work has been done
00:08lomb: that sucks
00:08lomb: but oh well
00:08imirkin: unfortunately we need to make some pretty core changes to enable userspace to do memory layout
00:08RSpliet: well not publicly at least. Who knows what may be happening behind closed doors.
00:08imirkin: currently only the kernel can do memory layout
00:08imirkin: once that happens, the actual vulkan stuff isn't that hard
00:08lomb: it would be nice if someone randomly patched some nice features
00:09RSpliet: imirkin: I thought threading had something to do with it too...?
00:09imirkin: but i've been waiting for that kernel change for years now
00:09imirkin: RSpliet: not at all
00:09lomb: imirkin: so you are stuck?
00:09lomb: I mean, you have to wait for the linux kernel team to implement a change?
00:09imirkin: i mean ... heh
00:09imirkin: you make it sound like there are barriers
00:10imirkin: i could go in and make the change too
00:10imirkin: but it's not like a 5-line thing
00:10lomb: well, yeah
00:10lomb: but then they might not include it
00:10lomb: and that's time wasted
00:10imirkin: no, they'd include it
00:10lomb: oh :P
00:10imirkin: assuming i did a good job
00:10imirkin: (and i'd expect nothing less)
00:11imirkin: i don't think i've had trouble getting skeggsb to take my patches -- sometimes delayed, but he will either take it or reimplement himself in a better way
00:11imirkin: but all this memory management stuff is very hard to grok
00:12skeggsb: it's "coming", but imirkin is right, the kernel changes aren't trivial, especially as there's quite a lot of things to tackle at the same time that all inter-relate
00:12skeggsb: that said
00:12lomb: hmm, hopefully someone's already working on it
00:12skeggsb: they *could* be hacked into the current apis if someone wanted to and had the motivation, for me, given that we're stuck at shitful clocks for the moment, that focus should be on more critical things
00:13skeggsb: i *am* working towards that though, and a bunch of stuff being pushed out soon to support ampere contains some major parts of the work needed for the new mm/command submission stuff
00:13imirkin: skeggsb: side question, but how? i.e. how could we make userspace set modifiers in PTEs at fixed addresses with the current API?
00:15skeggsb: badly/hackishly? it'd still be a heap of work to do tbh, if it were trivial even there i'd have hacked it in long ago
00:15imirkin: hehe ok
00:15lomb: I don't understand how this could possibly work, but I know that it's hard to implement reclocking
00:15lomb: it's a guessing game, right?
00:15lomb: anyways, I need to go to bed
00:15imirkin: lomb: i wrote a thing about it some time back
00:15lomb: good night everyone
00:16skeggsb: it's impossible now, you need nvidia's signing key to write firmware that's able to touch all the hw
00:16imirkin: lomb: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-nvidia-linux-nouveau/998310-nouveau-persevered-in-2017-for-open-source-nvidia-but-2018-could-be-much-better?p=998427#post998427
00:16RSpliet: lomb: the reclocking code itself isn't hard, just "tedious". The firmware situation makes it impossible
00:16imirkin: not all relevant to your question, but could be interesting anyways
00:16lomb: imirkin: ah, thank you. I bookmarked it and since it's not "a book" I'll read it :P
00:17lomb: skeggsb: I know, but to be honest I don't care about new hardware
00:17lomb: as long as older hardware is available in huge numbers on aliexpress and ebay
00:18RSpliet: Heh, TIL Kepler supports Vulkan with the blob
00:18imirkin: skeggsb: btw - m2mf doesn't look like it supports memtype's in sysmem
00:19imirkin: i dunno if that sounds familiar to you
00:19skeggsb: yeah, that's unexpected to me (after nv50, anyway)
00:19imirkin: [923209.629969] nouveau 0000:04:00.0: gr: TRAP_TEXTURE - TP1: 00000009 [ LINEAR_MISMATCH]
00:20skeggsb: oh i believe you, i'm just shocked we've never noticed before
00:20imirkin: oh wait
00:20imirkin: i just looked more closely
00:20imirkin: [923209.629974] nouveau 0000:04:00.0: gr: 00200000  ch 2 [000faa0000 m2mf_test] subc 4 class 502d mthd 08dc data 00000000
00:20imirkin: somehow it's the 2d engine which gets angry?
00:20imirkin: back to experimentation
00:21skeggsb: have you accidentally bound twod to that subchannel?
00:21imirkin: no - i have 2d running for other stuff
00:21imirkin: since i'm testing m2mf, i need SOME way to "reliably" get data in and out
00:21imirkin: so i'm using 2d for that
00:22imirkin: so it's actually the 2d engine which hates GART + memtype?
01:26imirkin: skeggsb: could i have set the 2d engine up incorrectly somehow so that it doesn't like GART sysmem? or do you think the GRAPH engine doesn't like these more generally?
01:33imirkin: at least for 3d rendering, we only support PITCH RT's in sysmem
01:33imirkin: dunno if that's hw or just how the driver is written
01:33imirkin: anyways, leads me to believe the situation doesn't occur too frequently
01:37imirkin: skeggsb: interesting, in nv50_bo_move_m2mf you don't seem to care what the underlying width/height are ... i'm surprised that works
01:38imirkin: maybe it's fine since you're copying the whole object
01:39skeggsb: yeah honestly, i'm often confused by that function and why it works.. i don't recall the original reasons for doing that, but i was avoiding errors i can't recall with other methods
01:39skeggsb: that was written maaaaaaany years ago
01:40skeggsb: nfi about twod and blocklinear sysmem either, another thing i'm not sure we've hit before (probably as a result of essentially forcing vram for fbs in the 3d driver for legacy reasons)
01:41skeggsb: we really need to begin writing test-cases for these hw blocks specifically
01:42imirkin: oh btw - that workaround you added for TU102 rendering to zeta as color?
01:42imirkin: turns out that we *do* need something there for earlier gens
01:42imirkin: it doesn't complain loudly, just doesn't work ... sometimes
01:43skeggsb: some kind of hw race?
01:43imirkin: i think i've seen random zeta fails in the past
01:43imirkin: i think just multiple dispatch
01:43skeggsb: or we're missing more specific flushes.. which i think we actually are, and our fences don't work right sometimes...
01:43imirkin: normally it's safe to render to the same surface "multiple times"
01:43imirkin: it won't step over itself
01:44imirkin: but here since it gets bound in different places, that defeats that logic. or something like that.
01:44skeggsb: yeah, makes sense
01:44imirkin: obv don't know the specifics, so just guesses :)
01:45imirkin: the test that triggered the failure blitted zeta (MS resolve), then used the new zeta surface to perform depth tests
01:45imirkin: and those depth tests didn't fully work out correctly - some small sub-triangles did, others didn't
01:49imirkin: skeggsb: unrelated ... do you remember those GT215 GDDR5 boards? do you remember them still having issues even after your fix?
01:49skeggsb: imirkin: are you *sure* you're pointing twod at the right surfaces btw? i'm fairly certain LINEAR_MISMATCH means you used 'kind != 0' pages as pitch, or 'kind == 0' pages as blocklinear
01:50imirkin: skeggsb: pretty sure. works fine when i set GART + memtype = 0
01:50imirkin: or if i put it in VRAM + memtype = anything
02:00imirkin: skeggsb: and it's a DEST2D_SOMETHING error when i do this on the dst surface
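Note on LINEAR_MISMATCH: skeggsb's reading of the trap (he says he is only "fairly certain") can be written down as a small predicate. This is only a restatement of that rule in code, not something taken from the driver or the hardware. The function name and the "pitch"/"blocklinear" strings are hypothetical.

```python
def is_linear_mismatch(kind: int, used_as: str) -> bool:
    """Rule as described in the log: kind != 0 pages used as pitch,
    or kind == 0 pages used as blocklinear, count as a mismatch."""
    if used_as not in ("pitch", "blocklinear"):
        raise ValueError("used_as must be 'pitch' or 'blocklinear'")
    if used_as == "pitch":
        return kind != 0
    return kind == 0


# Walk through all four combinations for one made-up non-zero kind.
for kind in (0, 0xFE):
    for used_as in ("pitch", "blocklinear"):
        print(hex(kind), used_as, is_linear_mismatch(kind, used_as))
```

By this rule alone the surface setup imirkin describes should be consistent, which is why the GART-only failure is surprising to both of them.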
02:11KungFuJesus: imirkin: I feel like we're a bit closer to solving this issue
02:12KungFuJesus: I feel like something is switching on endianness for the non-packed representation when it shouldn't be, since the spec literally says RGBA is the byte order of the texture in memory
02:13imirkin: KungFuJesus: well, GL_RGBA + GL_UNSIGNED_BYTE = array format
02:13imirkin: i.e. first byte is red, second byte is green, etc
02:13imirkin: aka "always LE"
02:13imirkin: GL_RGBA + GL_INT_8_8_8_8 == texture is a sequence of int32's
02:14imirkin: and the low bits are red, etc
02:14imirkin: which in LE == GL_RGBA + GL_UNSIGNED_BYTE
02:14imirkin: but in BE is not
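Note on array vs packed formats: the distinction imirkin describes can be checked with a few lines of Python. This is a minimal sketch with made-up channel values. It models the packed case as a 32-bit integer with red in the low bits, as in the log (in OpenGL terms that layout corresponds, as far as I know, to the `_REV` packed type).

```python
import struct

# Made-up channel values so each byte is easy to spot.
r, g, b, a = 0x11, 0x22, 0x33, 0x44

# Array format: four separate bytes, R first in memory on any host.
array_bytes = bytes([r, g, b, a])

# Packed format: one 32-bit integer with red in the low bits.
packed = r | (g << 8) | (b << 16) | (a << 24)

# How that integer is laid out in memory depends on host endianness.
packed_le = struct.pack("<I", packed)  # little-endian host
packed_be = struct.pack(">I", packed)  # big-endian host

print(array_bytes.hex())  # 11223344
print(packed_le.hex())    # 11223344
print(packed_be.hex())    # 44332211
```

On a little-endian host the two descriptions give the same bytes, so mixing them up goes unnoticed. On a big-endian host they differ by a full byte swap, which is where a layer describing the data the wrong way becomes visible.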
02:26KungFuJesus: Right, according to the spec, GL_UNSIGNED_BYTE + GL_RGBA says "literally R, G, B, A byte order"
02:26KungFuJesus: and since I know the data is in fact R,G,B,A in byte order in memory, something is erroneously flipping endianness somewhere on the way to the card
02:27KungFuJesus: and telling it the wrong thing inverts the bad inversion. Mostly, anyway: the flickering textures you see in the video indicate that at some point the endianness might be correct in the render pipeline
02:28KungFuJesus: but it sure seems it's more wrong than right, anyway, as it looks "mostly good" when you do the packed byte representation
02:44imirkin: KungFuJesus: it's weird though -- you're saying you have to claim it's a packed format AND byteswap?
02:44imirkin: but without changing the underlying data?
02:44imirkin: so ... packed should effectively be a byteswap compared to array
02:44imirkin: so there's something extra-weird going on
02:45imirkin: always heartening to hear, i'm sure
02:46KungFuJesus: what might be even weirder is it seems that every GL application doesn't seem byte swapped on their textures. Particularly I think mythtv uses GL to render menus and it's using Qt's bindings for most of the UI
02:47imirkin: well, i made sure glxgears worked ;)
02:47KungFuJesus: and all of those textures seem perfectly fine, though it's hard to say if they are compressed representations or not without digging too much into it
02:47KungFuJesus: lol, glxgears just uses glColor, right?
02:47imirkin: yeah - it renders directly, no textures
02:51KungFuJesus: hah, I wonder if maybe the flicker goes away if I say to pack them swapped, too. Or maybe that inverts the inversion and breaks it again
02:52imirkin: yes, i struggled with all these things when i was trying to fix things
02:52imirkin: the whole index buffer thing killed me
02:52imirkin: (and vertex data)
02:52imirkin: since you don't know what it is while it's being uploaded
02:52imirkin: so wtf is one to do
02:52KungFuJesus: how much of this code was in nvidia specific intermediate representation vs core mesa?
02:53imirkin: coz it has to be in "LE" format by the time it hits the GPU
02:53imirkin: well - there are layers
02:53imirkin: and the layers have API's between them
02:53imirkin: so they best not lie to each other
02:53imirkin: i don't think i had too many core fixes
02:53imirkin: maybe a small handful
03:17KungFuJesus: bisecting the "ragel" code generator right now - unexpected weird, probably endian related, issue with some characters being dropped
03:17KungFuJesus: not as fun as fixing nouveau probably will be, but hopefully won't be too hard
03:17imirkin: well, with nouveau you're dealing with hardware
03:18imirkin: which you know nothing about
04:14KungFuJesus: man this is not a clean bisect. Half of the commits have broken automake configs, some of which require specific versions of a library called colm
04:14KungFuJesus: I've had to stash at least 15 different fixups for every one of these hashes
04:46imirkin: KungFuJesus: i can write some very simple sample programs for you to run
04:46imirkin: and we can try to debug with that
04:46imirkin: lmk if that's something you'd be interested in
04:53KungFuJesus: for sure
04:53KungFuJesus: put em on the gitlab issue or email me them
04:54imirkin: obv don't have anything right now
04:54imirkin: but next week at some point
04:54imirkin: basically i want to try to figure out what the simplest thing is
04:54KungFuJesus: that tiny immediate mode example exhibits the issue that's there
04:55imirkin: i'll actually have a look at it myself
04:55imirkin: and see what it does in practical terms
04:55imirkin: and then maybe try to strip it down
18:01imirkin: kherbst: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4068 - looks like someone broke it with the "let's use nir everywhere" changes - i haven't had a chance to look
18:01imirkin: would you be able to?
18:30kherbst: imirkin: I guess the nir just needs to be freed after converting to TGSI
18:31imirkin: maybe? i have no idea :)
18:32kherbst: well, it's C so you still have to free memory :p
18:32kherbst: but mhh
18:32kherbst: it seems like a leak inside the pass
18:33kherbst: don't do that (tm)
18:33imirkin: it's fine if you do that
18:33imirkin: just don't forget to free it :)
18:33kherbst: yeah.. I think I found the error
18:33kherbst: in ntt_emit_impl the c->liveness needs to be freed
18:33kherbst: no clue where, but that's the thing
20:13imirkin: kherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8422
20:16imirkin: oh. you already saw it.
20:23kherbst: well.. I didn't, but anholt said on the issue that something like that will appear :p
21:38kherbst: ehhhh.. this multithreading issue is slowly annoying me.. at least I got userspace worked out, but now I hit this stupid kernelspace race condition :D
21:38kherbst: you know, this "super" race
21:40kherbst: but at least deqp usually doesn't crash: https://gist.github.com/karolherbst/7047ed44d7457b17dcc16d19d8f2c1d2
21:41kherbst: guess I will put the patches in an upstreamable shape next
22:12imirkin: i'm going to try to think about that RA patch tonight
22:13imirkin: cwabbott: hey, do you remember why https://github.com/karolherbst/mesa/commit/b2a86db447c266df51b4d07e135eb4096a054483 needs to *not* move the MERGE/SPLIT to its single use site?
22:25karolherbst: imirkin: I don't know if this answers your question, but I think one problem with the current behaviour was mixed merges with mixed sources or something..
22:25imirkin: yes. i got that.
22:26karolherbst: but probably the answer was "RA bugs"
22:26imirkin: or perhaps it wasn't required at all in the end
22:27karolherbst: could be
22:57karolherbst: mhh... I think I have to wrap around nouveau_pushbuf :/
22:58karolherbst: or... I hack custom stuff into nouveau_pushbuf.user_priv
22:58Lyude: you can do that
22:58Lyude: it's going away at some point soon and I'm already doing exactly that with user_priv for igt :P
22:59Lyude: afaict user_priv isn't used by anything and is actually for the user, so it should be safe
22:59karolherbst: the main reason was to have attached data in the kick_notify callback
23:00karolherbst: we put the nvc0_screen into user_priv actually...
23:00karolherbst: crap :D
23:00karolherbst: why didn't I think about it before
23:01imirkin: that might be the reason we do that ;)
23:01karolherbst: yeah.. well.. but I want to put in data used by all driver "variants"
23:01karolherbst: mhh.. but yeah
23:02karolherbst: that will allow me to reduce the amount of changes already
23:03karolherbst: oh well.. will fix my patches up tomorrow then
23:03karolherbst: maybe I get my +3500,-3300 loc change down to +300,-100 with that
23:04karolherbst: I still want to have assert/debug code making sure we do the submissions correctly though
23:07karolherbst: imirkin: mhh.. nv30 has still annoying user_priv stuff going on :/
23:09karolherbst: ohhh, I see now :/
23:10karolherbst: imirkin: do you actually have nv30 hw you could test patches on?
23:10karolherbst: or.. wait, I think I have pre nv50 GPUs with PCIe...
23:24imirkin: karolherbst: i mean, not right this second
23:24imirkin: but i do have a number of nv4x's that are PCIe, and one nv34 which is PCI
23:24karolherbst: well, right, I also don't have code I could throw at it yet anyway
23:24imirkin: (and one which has a PCI -> PCIe bridge iirc)
23:25karolherbst: okay... I think if I am able to focus on working on that, I could have something ready by the end of the week. Just those damn annoying kernel races make it super annoying to verify there aren't any issues left :/
23:25imirkin: i have a NV43 (PCIe), NV44 (PCIe), and NV44A (PCI, not AGP as one might think)
23:25imirkin: and i have a PCI NV34 and some other stuff probably
23:26karolherbst: yeah.. need to check what I have here, but too lazy to unpack stuff atm :D
23:28imirkin: well, these things go for like $10 on ebay
23:29karolherbst: mhh, true
23:30karolherbst: anyway.. first I will fix that stuff up for nvc0 and then continue with nv50... it's super annoying this stuff kind of needs to be fixed for all at once :/
23:30imirkin: this is the nv43 i have: https://www.ebay.com/itm/Nvidia-Quadro-FX-3450-256MB-GDDR3-PCI-Express-x16-Desktop-Graphics-Works
23:30imirkin: this time with the part of the URL which matters
23:31imirkin: they used to put it first
23:31karolherbst: I've got this one GPU with the "huge" cooling block
23:31karolherbst: well.. not huge
23:31karolherbst: still single slot
23:32imirkin: requires extra power
23:32imirkin: yeah - i really prefer single-slot boards
23:32imirkin: it has a fan which feels like it can make your computer go supersonic though
23:32karolherbst: I had this model with the cooling block on both sides
23:32karolherbst: but yeah.. it's buried somewhere
23:33karolherbst: don't know if those even work
23:33imirkin: but like you can also get some random geforce 6600 or whatever
23:34imirkin: which are cheap and weak
23:34imirkin: and have like 128mb vram
23:34karolherbst: I prefer to get those GPUs for free though :D
23:35imirkin: yes, that's preferable.
23:35karolherbst: I mean.. I actually got... 5 GPUs for free in total?
23:36imirkin: ah, nice
23:36karolherbst: the best stuff was two identical NV84 GPUs
23:36karolherbst: hacking up SLI with those could be fun
23:37imirkin: yeah, the thing with SLI is the execution model is never clear
23:37imirkin: it's fine for immediate renders
23:37karolherbst: I also have no idea what you can actually do over SLI
23:37imirkin: i just know the command submission aspect of it
23:38imirkin: namely you can submit commands s.t. they're only executed on one or another board
23:38imirkin: with "identical" cmdstreams
23:38imirkin: but i think there's more to it - shared memory, etc? dunno
23:38karolherbst: yeah.. also wondering what the connection is actually there for
23:38imirkin: i.e. accessing the other GPU's VM
23:38karolherbst: is it some high speed memory bus
23:38karolherbst: or just "interrupts"?
23:38karolherbst: or something else?
23:40airlied: at least on amd it used to be like a dvi link
23:41karolherbst: so you essentially scanned out into the other GPU?
23:41karolherbst: that's.... weird
23:41karolherbst: but I guess that's good enough if you optimistically split the work in half
23:42imirkin: on voodoo2, it was designed to merge alternating scanlines
23:42karolherbst: heh.. sounds like you got like 20% more perf out of it
23:42imirkin: i.e. one GPU would generate half the image, one the other half, and there was a passthrough VGA cable between them
23:42imirkin: no, 100% more perf
23:42imirkin: coz it was all immediate mode rendering
23:42imirkin: no weird loops/etc
23:42karolherbst: yeah.. okay
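Note on alternating scanlines: the scheme imirkin describes for voodoo2 can be modelled in a few lines. This is a toy sketch, not how the hardware worked internally. The function names, the `shade` callback and the tiny frame size are all hypothetical. Each "GPU" renders only the rows of one parity, and the merge step simply interleaves them.

```python
def render_scanlines(height, width, parity, shade):
    """Toy 'GPU': render only the rows whose index has the given parity."""
    return {y: [shade(x, y) for x in range(width)]
            for y in range(parity, height, 2)}


def merge(even_rows, odd_rows, height):
    """Interleave the two half-frames back into one full frame."""
    rows = {**even_rows, **odd_rows}
    return [rows[y] for y in range(height)]


def shade(x, y):
    # Stand-in for per-pixel work; depends only on the pixel position.
    return (x + y) % 256


height, width = 6, 4
gpu0 = render_scanlines(height, width, 0, shade)  # even scanlines
gpu1 = render_scanlines(height, width, 1, shade)  # odd scanlines
frame = merge(gpu0, gpu1, height)

# Reference: the same frame rendered by a single 'GPU'.
reference = [[shade(x, y) for x in range(width)] for y in range(height)]
print(frame == reference)
print(len(gpu0), len(gpu1))
```

Because every row depends only on its own pixel positions, neither half needs results from the other, so the work splits evenly with no synchronisation beyond the final merge. That is the property the later discussion of sync cost contrasts with modern multi-GPU setups.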
23:44karolherbst: I bet these days you can probably just do compute on the one, and the graphics bit on the other one and get a fair 50% share of the work on each :p
23:44imirkin: sync'ing is always the hard part
23:44imirkin: (or rather, the expensive part)
23:45imirkin: that's why the alternating scanline thing was so great
23:45imirkin: but yeah, nowadays dual GPU doesn't actually speed things up so much
23:45karolherbst: I think using one GPU as a compute offloading device makes the most sense... like for physx you could dedicate one GPU to it
23:46karolherbst: and maybe these days you could even split raytracing stuff out?
23:46airlied: it was a good plan in the old single pass rendering days