00:27 imirkin: mattst88: any luck? or moved onto other things?
03:22 mattst88: imirkin: decided to take a nap instead :)
03:30 mattst88: building the CD now
04:08 mattst88: imirkin: heh, with nouveau on the 6800, the modeset partially works
04:08 mattst88: about half of the screen is just blank
04:08 mattst88: the left half
04:09 mattst88: going to try 4.4 on both systems now
04:59 mattst88: imirkin: 4.4.150 fails on the iMac in the same way as 4.14
12:11 rhyskidd: so Turing looks like it will require rework within nouveau's codegen
12:21 pendingchaos: apparently control flow is quite different? do you know of any other large differences?
14:57 imirkin: mattst88: weird. modesetting should be ok on the G5 / 6800, at least according to endrift.
14:58 imirkin: mattst88: do you know when the iMac G4 last worked?
14:58 mattst88: imirkin: nope, I only got the iMac G4 recently, so this is the first time I've tried it
14:58 imirkin: oh ok
14:58 imirkin: well ... here's a silly suggestion
14:58 imirkin: forget about nouveau/nvidia
14:58 imirkin: and just use offb
14:59 imirkin: make sure the thing *basically* works
14:59 imirkin: and then you can use modprobe to see what's going on
15:00 mattst88: okay, good idea
15:01 mattst88: it currently boots to OSX, so the hardware works
15:09 imirkin: yeah, but much like nouveau on BE/mac gets broken, plain mac also gets broken with some regularity
15:19 imirkin: e.g. recently there was some thing where OF <-> DT integration broke (in 4.19 or so)
15:22 pendingchaos: karolherbst, imirkin: could https://patchwork.freedesktop.org/patch/240042/ be reviewed sometime?
15:23 pendingchaos: and also https://patchwork.freedesktop.org/patch/229931/ and https://patchwork.freedesktop.org/patch/229930/ (I forgot to git add some changes moving the RA code into its own function btw)
15:23 imirkin: pendingchaos: did you consider my suggestion to get it all via TXQ/SUQ?
15:23 pendingchaos: I think I did and liked the current approach better?
15:24 imirkin: ok
15:24 karolherbst: pendingchaos: I tested it and it apparently broke on kepler
15:24 karolherbst: the bindless thing
15:24 karolherbst: but I think I told you that already, no?
15:24 pendingchaos: I think that was some other patch
15:24 karolherbst: or was that another patch
15:24 imirkin: pendingchaos: ok, so the idea is to just have (ms x, ms y) pairs uploaded?
15:24 karolherbst: mhh, okay
15:25 pendingchaos: yeah
15:25 pendingchaos: I suppose it would be possible to do SUQ and then lookup in a smaller table from the constant buffer
15:26 imirkin: i hate the added dependency on constbuf... that's gotta go in general btw
15:26 imirkin: even for kepler
15:26 imirkin: we have to move to the nvidia model
15:26 imirkin: apparently they knew what they were doing.
15:26 imirkin: who knew.
15:26 imirkin: [they did]
15:26 imirkin: where this additional crap is stored in some side-buffer
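(Purely illustrative: one entry in the kind of gmem side-buffer being described might carry the per-image oddments a shader occasionally needs, along the lines of the struct below. The layout and field names are invented, not NVIDIA's or nouveau's.)

    #include <stdint.h>

    /* Hypothetical layout of one entry in a gmem side-buffer, indexed by the
     * bindless handle.  Field names are invented for illustration only. */
    struct bindless_aux_entry {
        uint32_t ms_x_shift;   /* log2 of the sample footprint in x */
        uint32_t ms_y_shift;   /* log2 of the sample footprint in y */
        uint32_t base_layer;   /* first layer when a single slice of an
                                * array/3d image is bound */
        uint32_t pad;
    };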
15:26 imirkin: pendingchaos: another thing that you *actually* need is the base layer
15:27 imirkin: there's no way around it, either for textures or for images, for binding a single layer of a 3d image.
15:27 karolherbst: imirkin: ohh, if you find some time, there are still those scalar tex instruction patches ;)
15:27 imirkin: for 2d array we cheat by adjusting the base address
15:27 imirkin: karolherbst: ok. will try today. i even have a maxwell plugged in so i can actually play around with it.
15:27 karolherbst: https://patchwork.freedesktop.org/series/47777/
15:27 pendingchaos: nvidia model?
15:27 imirkin: pendingchaos: but for 3d we can't adjust the base address
15:28 karolherbst: imirkin: well... actually, on maxwell we can
15:28 imirkin: pendingchaos: nvidia model is you just have a buffer allocated in gmem which is the bindless handle
15:28 imirkin: karolherbst: no, we can't.
15:28 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/8137bafc7a702f72196613750fe1d429a7a1dcd5
15:28 imirkin: karolherbst: not with 3d tiling
15:28 karolherbst: true
15:28 imirkin: which i don't want to sacrifice for the benefit of a feature no one will ever use in the history of time
15:28 karolherbst: but not even nvidia seems to use any tile mode on the z axis
15:29 imirkin: really?
15:29 karolherbst: yeah
15:29 karolherbst: they were setting it to 0
15:29 imirkin: so like you make a 32x32x32 3d texture
15:29 imirkin: and they don't tile on the Z axis?
15:29 imirkin: or even bigger, e.g. 256x256x256
15:29 karolherbst: that's what I saw, but I also wasn't trying very hard to get them to tile on the axis
15:29 imirkin: they might detile in some cases
15:29 karolherbst: might be that there are some cases they do it
15:29 karolherbst: anyhow
15:30 karolherbst: it doesn't work on kepler
15:30 karolherbst: for whatever reason
15:31 karolherbst: worthwhile to investigate it more
15:31 karolherbst: on kepler as well
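(A sketch of the 2D-array "cheat" mentioned above and of why it has no 3D-tiling equivalent; the names are stand-ins for illustration, not actual nouveau fields.)

    #include <stdint.h>

    static uint64_t array_layer_address(uint64_t base, uint32_t first_layer,
                                        uint32_t layer_stride)
    {
        /* 2D array: every layer is a self-contained tiled 2D surface, so
         * binding layer N is just an address offset ("adjusting the base
         * address").  A 3D texture with 3D tiling has no equivalent: a tile
         * spans a block of x, y AND z, so one slice's texels are interleaved
         * with its neighbours' and no single offset isolates it, hence the
         * need for an explicit base layer, or for giving up z tiling. */
        return base + (uint64_t)first_layer * layer_stride;
    }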
15:31 pendingchaos: how expensive would a SUQ be?
15:32 imirkin: pendingchaos: less expensive than the atomic cmpxchg on gmem that follows.
15:32 imirkin: ;)
15:34 pendingchaos: I might do a SUQ+smaller table instead, I don't think this is a very common case anyway
15:35 imirkin: MS images? only in tests.
15:36 imirkin: and feature lists.
15:36 imirkin: you can look at how it's done -- TXQS is the exact same operation
15:36 imirkin: (i.e. retrieve number of samples)
15:37 imirkin: (yeah, i'm not very creative with opcode naming)
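(For reference, the kind of small table a SUQ/TXQS result could index; the shift values assume the usual nvc0 sample layouts, i.e. 2x is a 2x1 pixel footprint, 4x is 2x2, 8x is 4x2, and should be double-checked against the driver.)

    #include <stdint.h>

    /* Maps a sample count (as returned by SUQ/TXQS) to log2 pixel-footprint
     * shifts.  The values are assumptions based on nvc0's MS modes. */
    static const struct { uint8_t ms_x, ms_y; } ms_shifts[] = {
        [1] = { 0, 0 },   /* 1 sample        */
        [2] = { 1, 0 },   /* 2 samples, 2x1  */
        [4] = { 1, 1 },   /* 4 samples, 2x2  */
        [8] = { 2, 1 },   /* 8 samples, 4x2  */
    };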
15:43 karolherbst: why did they add it in the first place?
15:44 imirkin: add what?
15:44 imirkin: MS images? probably same reason they added subroutines.
15:45 karolherbst: yeah, MS images
15:46 karolherbst: subroutines we just inline, don't we?
15:47 imirkin: i mean ARB_shader_subroutine
15:47 karolherbst: yeah
15:47 imirkin: the idea was to have indirect calls
15:47 imirkin: all the hw supports it
15:47 imirkin: no one uses it.
15:48 karolherbst: mhh
15:49 imirkin: not all features pan out
15:49 imirkin: anyways, i dunno. maybe someone does use MS images
15:49 imirkin: just doesn't seem like there's much point in it.
16:36 Subv: when trying to map a bo that is referenced by some pushbuf, nouveau waits for the whole pushbuf to finish instead of only the operations related to that bo, right? ie, if a pushbuf contains { <write to bo0>, <long operation with bo1>} then will trying to map bo0 cause all operations on bo1 to finish as well?
16:38 imirkin: implicit fencing vs explicit.
16:38 imirkin: (yes)
16:39 pendingchaos: I don't think there would be any information on whether the long operation writes to bo0
16:39 imirkin: no, but you could stick a fence in
16:39 imirkin: right now the bo tracking in nouveau is all based on pushbuf completion
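(A minimal sketch of the implicit model described here, with invented structures: every bo referenced by a pushbuf ends up sharing that pushbuf's fence, so mapping any one of them waits for the whole submission.)

    #include <stdint.h>

    /* Illustrative model only, not the real nouveau structures. */
    struct fence   { uint32_t seq; };
    struct pushbuf { struct fence *fence; };  /* signals when ALL commands done */
    struct bo      { struct fence *busy;  };  /* fence of last pushbuf using us */

    void fence_wait(struct fence *f);         /* blocks until the fence signals */

    static void pushbuf_ref_bo(struct pushbuf *pb, struct bo *bo)
    {
        bo->busy = pb->fence;      /* no per-bo granularity: every bo shares it */
    }

    static void bo_map_wait(struct bo *bo)
    {
        if (bo->busy)
            fence_wait(bo->busy);  /* waits for the entire pushbuf, even if
                                    * only one command in it touched this bo */
    }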
16:41 Subv: is there a design reason for this or is it just that nobody has tried to insert separate fences for individual bos?
16:42 imirkin: implicit fencing was the cool thing to do until recently
16:42 imirkin: i think explicit fencing became more common in DX11-era gpu's
16:45 Subv: i guess this is only relevant when doing operations that directly map buffers without needing a transfer while another operation is in progress (because as far as i understand, commands in a single channel are serialized anyway)
16:46 Subv: ie, drawing and mapping a directly-mappable buffer (different from the draw buffer, but modified by the GPU before the draw was issued)
16:46 imirkin: sorta
16:46 imirkin: there can be multiple engines
16:46 imirkin: which operate asynchronously
16:47 Subv: my understanding is that even if you use multiple engines, the channel is still serialized
16:47 Subv: you'd have to create multiple channels for it to be asynchronous
16:47 Subv: (channels, not subchannels)
16:48 imirkin: engines are asynchronous from each other
16:49 imirkin: on kepler+, the subchannel stuff isn't quite what it was on earlier gpu's
16:49 imirkin: but i'll be honest - i don't understand it all completely
16:50 imirkin: channels are a software construct.
16:50 imirkin: but they often map to a hw context, which is a hw construct. switching between contexts is very very expensive
16:54 Subv: recently we found a register in the 3D engine of the gm20b that increments the syncpt value of the syncpt you give it, this is used by official switch games to signal that a pushbuf has finished processing
16:54 Subv: note that this is inside the 3D engine, not in the pfifo method range
16:54 Subv: the only way i see this working is if all engines in the same channel are serialized
16:55 Subv: (otherwise you could end up with an unfinished 2D-engine transfer while the 3D-engine already finished and incremented the syncpt)
16:56 imirkin: Subv: yeah, read the docs in nouveau about it :)
16:56 imirkin: there are two sets of methods
16:56 imirkin: one is the pfifo method
16:56 imirkin: one in the 3d engine
16:56 imirkin: i'm a little surprised you're not looking at the nouveau docs
16:57 Subv: i'm not sure if you mean the envytools docs or something else
16:57 imirkin: i do mean the envytools docs
16:57 imirkin: all the methods are pretty much mapped out
16:58 imirkin: there's additional info to be gained from reading the nouveau code
16:58 Subv: i'm referring to https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L83 this register, which is undocumented for maxwell
16:58 imirkin: oh ok
16:59 imirkin: that GLOBAL_BASE thing is clearly just for fermi
16:59 imirkin: i was talking about .... (gimme a min to find it)
16:59 Subv: also, https://envytools.readthedocs.io/en/latest/hw/fifo/puller.html?highlight=puller#id2 registers 0x70 and 0x74 here are used to wait for a syncpt, also undocumented
17:01 imirkin: oh, i think it was part of the query logic
17:01 imirkin: anyways, interesting that there's a more dedicated thing for it on maxwell
17:02 Subv: well, the query and semaphore methods can't increment syncpts
17:02 imirkin: looks like it's there on kepler too
17:02 imirkin: at least some keplers... hm. KeplerC, but not A/B
17:03 imirkin: right. they can wait for a value and then write a new value.
17:03 imirkin: this sounds more like an atomic increment
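(In rough C, the distinction being drawn, purely as a conceptual model; none of these are real method names.)

    #include <stdint.h>

    /* Semaphore methods: wait for a memory word to reach a value, then store
     * a new value at a supplied address. */
    static void semaphore_acquire(volatile uint32_t *sem, uint32_t value)
    {
        while (*sem < value)
            ;                       /* stall until the producer releases */
    }

    static void semaphore_release(volatile uint32_t *sem, uint32_t value)
    {
        *sem = value;
    }

    /* The gm20b register discussed above behaves more like an atomic
     * increment of a hardware-internal counter, with no address supplied: */
    void syncpt_increment(uint32_t id);         /* counter[id]++ inside the gpu */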
17:06 Subv: imirkin: is there a register that can set syncpt values? i'm talking about the internal GPU syncpts, not stuff like QUERY where you give it an address to write to
17:10 imirkin: i'm a little curious what a "sync point" is precisely
17:10 imirkin: normally it'd be backed by some kind of memory
17:10 imirkin: but i don't see a "SetSyncPointAddress" type of method
17:10 imirkin: could be in the class instance object, or could be in the channel. dunno.
17:11 imirkin: the objects themselves are backed by memory, which stores various state
17:11 imirkin: this is just a small example of that: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/gr/nv04.c#L1042
17:12 imirkin: for much older chips obviously
17:13 imirkin: this is a more modern bit of logic: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/gr/gf100.c#L319
18:52 Subv: imirkin: the syncpoints seem to be an internal hardware construct
18:52 Subv: you don't have to set them up yourself, they just exist, there seems to be a limited amount of them
18:52 imirkin: what can you do with them other than increment?
18:52 imirkin: can you retrieve?
18:52 imirkin: write out to query buffer or something?
18:52 Subv: but anyway, we deviated too much from the original question: if engines can work in parallel, how does using the DMA engine to copy some buffer and then immediately using that buffer to draw in the same pushbuf work in nouveau without any synchronization?
18:53 Subv: imirkin: no idea yet :D
18:53 imirkin: "dma engine"?
18:53 imirkin: you mean COPY2? that's actually part of PFIFO :)
18:53 Subv: the 0x70 and 0x74 methods in PFIFO seem to be used to wait for a syncpoint to reach a certain threshold, but i'm not 100% sure of the details yet
18:54 imirkin: it processes those sequentially
18:54 imirkin: if you're using a legit copy engine, you *do* have to synchronize
18:54 Subv: imirkin: i mean engine class 0xB0B5
18:54 imirkin: i.e. copy0/copy1
18:54 Subv: or even 0x902D, or any other thing, really
18:55 imirkin: yeah, b0b5 is part of the fifo engine iirc
18:55 imirkin: as opposed to a legit "other" copy engine
18:55 imirkin: 902d and the 3d classes are all serviced by the GR engine
18:57 Subv: is 0xA140 also part of pfifo?
18:58 Subv: mm, so pfifo processes commands to 0xB0B5 sequentially, and then continues processing 0xB197 (3d) commands asynchronously? that sounds really weird
19:07 imirkin: fifo just sends commands to other engines
19:07 imirkin: in sequence
19:07 Subv: then what did you mean by "<imirkin> yeah, b0b5 is part of the fifo engine iirc" ?
19:08 imirkin: so in that case, fifo processes those directly
19:08 Subv: so that's kind of a "virtual" engine that doesn't really exist?
19:09 imirkin: sort of
19:11 Subv: interesting
19:12 Subv: but if i use 0x902D (2D) to write to a buffer A, then consume that buffer with 0xB197 (3D) i have to insert a QUERY (set) after the 2D write, and a QUERY (wait) before the 3D read?
19:13 imirkin: no
19:13 imirkin: 2d/3d are all the same engine
19:13 imirkin: (gr)
19:15 Subv: is there a list somewhere of what class ids are served by which engines?
19:17 Subv: the envytools documentation ( https://envytools.readthedocs.io/en/latest/hw/fifo/puller.html?highlight=puller#fifo-mthd-object ) makes it sound like the things you bind through BIND_OBJECT are all different engines
19:19 imirkin: on fermi and earlier you'd bind classes to subchannels
19:19 imirkin: on kepler+ iirc it's a lot more fixed, where subchannel id == engine id
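(For context, roughly how a command header selects the subchannel on fermi and newer, mirroring the packet-header macros in mesa's nvc0 winsys; the exact bit positions here are from memory and worth checking against the envytools fifo docs.)

    /* Incrementing-methods packet header, roughly:
     *   [31:29] mode, [28:16] method count, [15:13] subchannel, [12:0] method >> 2 */
    static inline unsigned pkhdr_sq(unsigned subc, unsigned mthd, unsigned count)
    {
        return 0x20000000u | (count << 16) | (subc << 13) | (mthd >> 2);
    }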
19:22 Subv: so if i get this straight, the fifo submits stuff to the engines bound to each subchannel asynchronously and keeps executing without waiting for them to end, except for those engines that are processed by fifo itself (like b0b5)
19:23 Subv: envytools says "The engines run asynchronously. The puller will send them commands whenever they have space in their input queues and won’t wait for completion of a command before sending more. However, when engines are switched [ie. puller has to submit a command to a different engine than last used by the channel], the puller will wait until the last used engine is done with this channel’s commands"
19:27 Subv: does this apply to pfifo commands too? would { SendOnChannel(3d_commands, channel=3D); SendOnChannel(fifo_commands, channel=3D); } wait until the pgraph engine finishes or just execute the pfifo command asynchronously to the 3D commands?
19:28 imirkin: mmmmaybe
19:28 imirkin: dunno
19:29 imirkin: my mental model of how this all works doesn't go that deep
19:30 Subv: the nouveau kernel driver uses SEMAPHORE pfifo commands to wait for pushbufs to finish executing, if those are executed asynchronously to 3d operations then stuff would break
19:30 Subv: (i think)
19:35 imirkin: well, they stall pfifo processing
19:36 Subv: why would a SEMAPHORE_RELEASE stall pfifo processing?
19:37 Subv: unless all pfifo methods are treated as if they belonged to a completely different engine, thus triggering the OnEngineSwitch->WaitForIdle behavior
19:38 Subv: my main concern is that you can't have real multi-engine parallelism unless you use multiple channels, this seems to be confirmed by the pseudocode in envytools docs: https://envytools.readthedocs.io/en/latest/hw/fifo/puller.html?highlight=puller#fifo-mthd-object
19:39 Subv: ie, submitting to PCOPY and then PGRAPH will cause it to wait until PCOPY finishes before sending the stuff to PGRAPH
19:39 imirkin: with nouveau and implicit sync? yes.
19:41 Subv: i mean in general, if only a single channel is used
19:41 Subv: envytools says, on command submit: "if (engines[subc] != last_engine) { while (ENGINE_CUR_CHANNEL(last_engine) == chan && !ENGINE_IDLE(last_engine)); }"
19:42 Subv: so even without implicit sync you would need multiple channels, one bound to each engine, to achieve this kind of parallelism
19:48 imirkin: ah right. the idle engine thing. i don't know if that's actually the case.
20:21 imirkin: skeggsb: look familiar? http://paste.debian.net/hidden/2192d63d/
20:22 imirkin: just hung in the middle of me doing nothing too fancy, but probably chrome & co using 3d things