08:39 Armada: imirkin, so macros are stored in the context, but where is the context stored? ;)
10:22 RSpliet: Armada: the current context should live in hw registers. Inactive contexts live in DRAM on the graphics card
10:26 karolherbst: RSpliet: we don't save the macros inside registers
10:26 karolherbst: and the context consists of more than just what's inside the registers
10:27 karolherbst: RSpliet: or do you mean "hw registers" as in mmio registers as well?
10:27 RSpliet: karolherbst: Okay... it's either in those regs, or there's a pointer in those regs that will lead up to the required data.
10:28 karolherbst: yeah.. my guess is we upload those graph macros into some vram buffer and just point to them from inside the context
10:28 karolherbst: Armada: did you check the code to where those macros are uploaded?
10:30 RSpliet: Well, mmio simply provides an interface to HW registers using the address space of the host system (the host system being either the x86/arm/... CPU or a falcon core)
10:30 RSpliet: karolherbst: having a pointer makes more sense indeed :-D no need for expensive SRAM arrays for latency tolerant execution.
10:31 karolherbst: yeah
10:31 RSpliet: Shouldn't we simply be able to infer all this from nouveau?
10:31 karolherbst: yes
10:40 karolherbst: Armada: I think the macros are stored inside nvc0/mme/, you could simply follow the code and see what happens on the mesa side
10:42 Armada: karolherbst, I know how it's uploaded, I was just asking because an emulator assumed that macros were stored in VRAM
10:43 karolherbst: Armada: yeah, I guess that's where they ended up for nouveau as well
10:43 karolherbst: would be weird if not
10:44 karolherbst: Armada: or did something else happen?
10:44 Armada: karolherbst, where would it store it in VRAM then? you can't specify a GPU address for macros
10:45 karolherbst: Armada: you just allocate some VRAM and upload it there
10:45 Armada: how would I tell the GPU?
10:45 karolherbst: or does it have to be a static address?
10:45 karolherbst: mhh, I would have to read the code for that actually
10:46 karolherbst: Armada: we do "MK_MACRO(NVC0_3D_MACRO_VERTEX_ARRAY_PER_INSTANCE, mme9097_per_instance_bf);"
10:46 karolherbst: MK_MACRO calls nvc0_graph_set_macro
10:46 Armada: macros are uploaded in the same way as constants
10:47 Armada: constants reside in a cache right? not in VRAM unless it's a UBO?
10:47 karolherbst: uhm
10:47 karolherbst: you never know by just looking
10:47 karolherbst: but mhh
10:48 Armada: wouldn't make sense to upload each instruction individually in the command buffer if they just end up in VRAM
10:48 karolherbst: maybe there is indeed some dedicated memory for the macros and it just gets replaced by the context switching code
10:48 Armada: if that were the case it would make much more sense to just give a GPU address in that case
10:54 Armada: (I used too many cases in that sentence, I'm worried one of them might fall-through)
11:28 imirkin: Armada: there is no ubo vs non-ubo btw
11:32 Armada: ah then I misunderstood it
11:33 Armada: what's the point of CB_DATA then when you already give an address to the UBO which you already filled with data?
11:33 imirkin: imagine you have something like
11:33 imirkin: draw; glUniform; draw; glUniform; draw
11:33 imirkin: you could wait for the previous draw to complete, update the uniform, and draw again
11:34 imirkin: however nvidia introduces an on-chip staging area for uniform updates
11:34 Armada: ah okay, CB_DATA stores it in the command buffer instead of the UBO
11:34 Armada: does CB_DATA write it out to the UBO once executed by the FIFO?
11:34 imirkin: which properly updates the cache, and writes the value out eventually
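
The ordering argument in imirkin's "draw; glUniform; draw" example can be sketched with a toy model. This is only an illustration of why in-stream updates avoid a CPU-side wait. It is not how the hardware's staging area works internally, and the `ToyGpu` class, the command tuples, and the `cb_data` command name (borrowed from CB_DATA above) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGpu:
    """Toy model: a small constant buffer plus an in-order command stream."""
    constbuf: list = field(default_factory=lambda: [0.0] * 4)
    results: list = field(default_factory=list)

    def execute(self, commands):
        # Commands run strictly in submission order, so an inline update
        # is only visible to draws queued after it.
        for op, *args in commands:
            if op == "cb_data":
                offset, value = args
                self.constbuf[offset] = value
            elif op == "draw":
                # Record the constants this draw observed.
                self.results.append(tuple(self.constbuf))
            else:
                raise ValueError(f"unknown command {op!r}")


def build_stream(values):
    """Emit draw; update; draw; ... with no wait between submissions."""
    stream = [("draw",)]
    for v in values:
        stream.append(("cb_data", 0, v))
        stream.append(("draw",))
    return stream


gpu = ToyGpu()
gpu.execute(build_stream([1.0, 2.0]))
print([r[0] for r in gpu.results])
```

Each draw observes the value written by the update queued just before it, even though the whole stream was built up front.
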
11:34 Armada: right
11:35 Armada: so how about macros? do they only reside in such an on-chip cache?
11:36 Armada: or are they written out to VRAM?
11:37 imirkin: afaik there's no explicit backing buffer there
11:38 Armada: it uploads it to the cloud?
11:38 imirkin: dedicated on-chip area presumably
11:38 Armada: stores it in the blockchain, got it
12:12 karolherbst: Armada: wondering why the emulator assumes VRAM then..
12:13 Armada: karolherbst, it's already been corrected
12:13 karolherbst: ahh
12:13 karolherbst: I see
12:14 Armada: the reason was probably misleading documentation on our wiki which kept referring to macro "pointers" which seemed to imply those were full gpu addresses
12:19 imirkin: Armada: it's a pretty limited amount of memory, so i'm fairly sure it's in "private" on-chip memory, not general VRAM. i don't think we ever provide a "put macros here" buffer object
12:21 Armada: we don't, but for all I know it might be provided by the kernel driver or something
12:35 imirkin: hmmmm ... could be. iirc we do provide some buffers for context setup? not sure.
12:36 imirkin: i know quite little about all that
13:08 karolherbst: imirkin: yeah, that would be my guess as well kind of
13:08 karolherbst: imirkin: or maybe there is simply a 4k buffer for each context for various stuff.
13:08 karolherbst: maybe mwk knows something about that?
13:12 mwk: huh?
13:13 mwk: you mean macros, as in the macro engine with the funny ISA that sits in front of pgraph?
13:13 karolherbst: mwk: yes
13:13 mwk: then there's a dedicated piece of memory on-chip
13:13 mwk: rather small, 0x800 entries or something like that
13:14 mwk: it's context switched along with all the other graphics state
13:14 mwk: via the ramchain mechanism aka context strands
13:14 mwk: same for the method shadow data cache
13:15 RSpliet: mwk: is that at the top level or per GPC?
13:15 mwk: top level of course
13:15 mwk: gpcs aren't that independent
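
What mwk describes (a small dedicated on-chip memory, filled word by word through the command stream, and saved and restored with the rest of the graphics state) can be sketched as a toy model. Everything here is an assumption for illustration: the class and method names are hypothetical, the size is mwk's "0x800 entries or something like that", the uploaded words are arbitrary values rather than real macro instructions, and the real ramchain mechanism is not modelled.

```python
MACRO_RAM_ENTRIES = 0x800  # per mwk's recollection above; not verified


class ToyMacroRam:
    """Toy on-chip macro memory with an auto-incrementing upload position."""

    def __init__(self):
        self.words = [0] * MACRO_RAM_ENTRIES
        self.upload_pos = 0

    def set_upload_pos(self, pos):
        if not 0 <= pos < MACRO_RAM_ENTRIES:
            raise IndexError("upload position outside macro memory")
        self.upload_pos = pos

    def upload_word(self, word):
        # One word per command, mirroring "each instruction individually".
        if self.upload_pos >= MACRO_RAM_ENTRIES:
            raise IndexError("macro memory full")
        self.words[self.upload_pos] = word & 0xFFFFFFFF
        self.upload_pos += 1

    def save(self):
        return list(self.words), self.upload_pos

    def restore(self, saved):
        words, pos = saved
        self.words = list(words)
        self.upload_pos = pos


class ToyContextSwitcher:
    """Swaps the macro memory contents whenever the active context changes."""

    def __init__(self, ram):
        self.ram = ram
        self.saved = {}
        self.current = None

    def switch_to(self, ctx_id):
        if self.current is not None:
            self.saved[self.current] = self.ram.save()
        blank = ([0] * MACRO_RAM_ENTRIES, 0)
        self.ram.restore(self.saved.get(ctx_id, blank))
        self.current = ctx_id


ram = ToyMacroRam()
switcher = ToyContextSwitcher(ram)
switcher.switch_to("A")
for word in (0x11, 0x22):
    ram.upload_word(word)
switcher.switch_to("B")
ram.upload_word(0x99)
switcher.switch_to("A")
print([hex(w) for w in ram.words[:2]])
```

In this sketch no GPU address is ever handed over: a context's macros exist only in the dedicated memory and in whatever the context switch saved.
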
13:21 karolherbst: mwk: okay, thanks
18:38 LeFarfadetSpatia: Hello everybody out there!
18:40 LeFarfadetSpatia: Well, I am Yoann LE BARS, I have recently joined the Nouveau mailing list.
18:42 LeFarfadetSpatia: So, I am wondering if I can be of any help to the Nouveau project.
18:42 imirkin_: hey
18:43 imirkin_: so ideally you can identify an area of nouveau that you feel is simultaneously lacking and that you can do something to improve the situation
18:44 LeFarfadetSpatia: It happens I have a GeForce 750 Ti GPU.
18:44 LeFarfadetSpatia: I need video decoding and computing on GPU (such as OpenCL).
18:44 imirkin_: just curious -- what do you need in OpenCL that you can't get with OpenGL compute shaders
18:44 LeFarfadetSpatia: It turns out that Nouveau is not really good at it on my GPU.
18:45 LeFarfadetSpatia: imirkin: I do scientific computing.
18:46 imirkin_: right...
18:46 LeFarfadetSpatia: In the scientific computing community, we do not really use shaders.
18:46 imirkin_: if you use opencl, you use shaders...
18:46 LeFarfadetSpatia: Well, yes…
18:46 LeFarfadetSpatia: Sorry, it has been a long time since I last used IRC.
18:47 imirkin_: is this a situation where opengl compute shaders would be totally fine but you have all this legacy stuff written in OpenCL C?
18:47 LeFarfadetSpatia: OpenGL shaders would probably be fine.
18:47 LeFarfadetSpatia: Anyway, I mostly use legacy stuff.
18:48 imirkin_: ok. well you may be interested in some of the work karolherbst has been doing, trying to get OpenCL going
18:48 LeFarfadetSpatia: Now, to be honest, I mostly use this code on a supercomputer I have an account on.
18:48 imirkin_: not sure how far along that is, but i think it's moderately advanced.
18:48 LeFarfadetSpatia: Noted: contact karolherbst
18:48 karolherbst: mhhh well
18:49 karolherbst: status is, things usually work out quite nicely, except when there is more complex control flow
18:49 imirkin_: as for video decoding, someone needs to RE the video decoding engine -- apparently it's quite similar to earlier generations, so shouldn't be impossibly difficult.
18:49 LeFarfadetSpatia: Alright.
18:49 karolherbst: LeFarfadetSpatia: and currently it is kind of annoying to set up, because it depends on llvm master and some other dependencies
18:49 karolherbst: best would be to wait until things settle
18:49 imirkin_: if you're interested in the video decoding stuff, i could probably help you get started
18:50 imirkin_: but it's not an easy task
18:50 LeFarfadetSpatia: Now, the thing is: I have never done any reverse engineering.
18:50 imirkin_: no one has, until they have...
18:50 LeFarfadetSpatia: Indeed …
18:50 imirkin_: it's generally not that hard ... just look at what's being done and try to understand why
18:51 LeFarfadetSpatia: receiving a phone call
18:58 LeFarfadetSpatia: is back.
19:00 LeFarfadetSpatia: imirkin: I guess the video decoding engine is the one in nVidia's proprietary driver, am I right?
19:00 LeFarfadetSpatia: Hum.
19:00 LeFarfadetSpatia: Let me rephrase it:
19:01 LeFarfadetSpatia: I guess the video decoding engine to be reverse engineered is the one in nVidia's proprietary driver, am I right?
19:07 imirkin_: the engine itself is in the hardware
19:07 imirkin_: the driver ... drives it
19:08 imirkin_: you'd have to trace the communications between the driver and the engine
19:08 imirkin_: and figure out how the engine needs to be driven in order to decode data
19:08 LeFarfadetSpatia: Right.
19:08 LeFarfadetSpatia: Now: how do you do this?
19:09 imirkin_: :)
19:09 imirkin_: "carefully"
19:09 LeFarfadetSpatia: :)
19:10 imirkin_: i can explain more in about an hour
19:10 LeFarfadetSpatia: OK
19:32 HdkR: imirkin_: Obviously it is time to implement cuda in Nouveau :P
19:32 Lyude: imirkin_: thanks for the pointer on that bug! that actually turned out to be the fix I needed :)
19:32 imirkin_: ?
19:33 imirkin_: the cursor thing?
19:33 Lyude: imirkin_: the flickering cursor issue with kepler
19:33 Lyude: yeah
19:33 imirkin_: cool
20:30 LeFarfadetSpatia: just discovered Bryan Lunduke’s statement on Mozilla being untrustworthy.
20:39 LeFarfadetSpatia: Well, he may have a point, but this leads me away from Nouveau …
20:40 LeFarfadetSpatia: imirkin: Do you have some more time now?
20:40 imirkin_: LeFarfadetSpatia: sorry, i got busy =/
20:41 LeFarfadetSpatia: imirkin_: No problem.
20:41 LeFarfadetSpatia: Maybe I should come back some other day?
20:41 imirkin_: i'm always busy
20:41 imirkin_: so now's as good a time as any
20:41 LeFarfadetSpatia: !
20:41 imirkin_: ok
20:42 imirkin_: so the basic idea
20:42 imirkin_: is that you should find a constant test file to ... test with
20:42 imirkin_: decoding via vdpau
20:42 imirkin_: then you should do a mmiotrace of the blob which includes playing back the file
20:42 imirkin_: this will hopefully give you enough info on how to get the core engine going
20:43 imirkin_: there will be firmware that needs to be uploaded
20:43 imirkin_: but we can get to that once you have the trace
20:43 imirkin_: similarly you also need to capture the userspace submissions to the hardware for the actual decoding
20:43 imirkin_: this is done with valgrind-mmt
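
Once a trace has been captured, a first pass over it might simply tally which registers get written. The sketch below assumes that read and write records in the mmiotrace text log are whitespace-separated lines of the form `kind width timestamp map_id address value ...`. That layout is stated here as an assumption and should be checked against the kernel's mmiotrace documentation. The sample lines are made up for illustration and do not come from a real trace, and both helper functions are hypothetical.

```python
from collections import Counter


def parse_mmio_line(line):
    """Return (kind, width, address, value) for R/W records, else None."""
    fields = line.split()
    if len(fields) < 6 or fields[0] not in ("R", "W"):
        return None
    # Assumed layout: kind width timestamp map_id address value ...
    kind, width, _timestamp, _map_id, addr, value = fields[:6]
    return kind, int(width), int(addr, 16), int(value, 16)


def count_writes(lines):
    """Count how many times each address is written."""
    counts = Counter()
    for line in lines:
        record = parse_mmio_line(line)
        if record and record[0] == "W":
            counts[record[2]] += 1
    return counts


# Made-up sample records, not taken from a real trace.
SAMPLE = """\
MARK 0.000000 made-up sample
W 4 0.000010 1 0xf0001000 0x1 0x0 0
R 4 0.000020 1 0xf0001004 0xdeadbeef 0x0 0
W 4 0.000030 1 0xf0001000 0x2 0x0 0
""".splitlines()

writes = count_writes(SAMPLE)
print({hex(addr): n for addr, n in writes.items()})
```

Running the same tally on traces from two hardware generations would be one rough way to see how similar their register programming is.
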
20:43 LeFarfadetSpatia: So, first find a video file. Not too long a one, I guess.
20:44 imirkin_: once you have that, we can have a look to see if it's 99.9% identical to the previous gens or not
20:44 LeFarfadetSpatia: This step seems quite easy.
20:44 imirkin_: iirc i saw some info somewhere which suggested that it should be nearly identical
20:44 imirkin_: i spent over a month getting the VP2 engine decoding up and running
20:44 LeFarfadetSpatia: I guess that means to install the proprietary driver, am I right?
20:44 imirkin_: i was very new to nouveau at the time
20:45 imirkin_: and there already was an existing decoder for the later gens (which turned out to be quite different)
20:45 imirkin_: so ... this isn't like a half-hour project
20:45 LeFarfadetSpatia: It seems so …
20:46 imirkin_: my guess is that knowing all that i know now, it would be a week of full-time effort for me to get this stuff going. for you (no offence), it will be much longer, due to all the additional learning you'll have to do.
20:46 imirkin_: this isn't meant to discourage you, but it's meant to set your expectations.
20:46 LeFarfadetSpatia: No offense.
20:47 imirkin_: i know english isn't your native language -- if you ever run into trouble, about half the people here speak french, myself included.
20:48 LeFarfadetSpatia: Well, I have live in the USA and spent some time in the UK, so English is not a problem.
20:48 LeFarfadetSpatia: That said, French is indeed my native language.
20:49 imirkin_: however in public discussions on the chan, we do try to stick to english
20:49 LeFarfadetSpatia: This was my guess.
20:49 LeFarfadetSpatia: s/live/lived
20:50 imirkin_: probably won't be too helpful, but i wrote a toy h264 player for my vp2 re'ing efforts, so that i could carefully control what was being submitted to the vdpau api
20:50 imirkin_: https://github.com/imirkin/re-vp2
20:50 imirkin_: [wow. 5 years ago? time flies...]
20:52 LeFarfadetSpatia: Now, clearly I do not have any idea what I am getting myself into.
20:53 LeFarfadetSpatia: And I cannot spend a whole week doing nothing but exploring the video decoding engine.
20:54 imirkin_: well, the idea is that you spend an hour here and there
20:54 LeFarfadetSpatia: So, it will definitely take me some time.
20:56 LeFarfadetSpatia: Anyway, I have to sort this out: this implies having the proprietary nVidia driver running, right?
20:56 imirkin_: yes, the idea is to see what it's doing
20:56 imirkin_: and then do that :)
20:57 LeFarfadetSpatia: Well, here is the problem: I am using the RT kernel.
20:57 LeFarfadetSpatia: NVidia’s drivers do not support it.
20:57 LeFarfadetSpatia: And as I use my computer in some ways like a server, I do not restart it.
20:58 LeFarfadetSpatia: Also, switching from one driver to another one depending on the kernel you are running can be tricky …
20:58 imirkin_: yeah ... one thing with RE is ... lots of rebooting
20:59 LeFarfadetSpatia: It would be way easier if I had one test machine and one production machine.
21:00 LeFarfadetSpatia: So, first of all, I have to figure out the most convenient way to switch from one driver (and one kernel) to the other.
21:01 LeFarfadetSpatia: Last time I try and use nVidia proprietary driver, everything crashed with the next kernel update.
21:01 LeFarfadetSpatia: s/try/tried
21:04 LeFarfadetSpatia: I should probably go and spend some time on the Debian forums.
21:05 LeFarfadetSpatia: Anyway, thank you imirkin_, this gave me some pointers to go further.
22:44 uriahheep: hey all... sorry about this potential nuisance of a question, but how well can I rely on feature matrix accuracy for an nv40 card nowadays? it's a GeForce 7900gs
22:45 imirkin_: probably best to just ask here about things you care about
22:46 uriahheep: I don't really plan on gaming, just want a smooth enough UI
22:46 imirkin_: are you planning on using a "modern desktop environment"?
22:46 uriahheep: I see memory timing development has stalled, which is pretty important iirc
22:46 imirkin_: nv4x reclocking should work ok
22:46 imirkin_: with the exception of a handful of boards, apparently
22:47 uriahheep: well I'd set the system up for a low-knowledge user so I'd like some sort of easy GUI yes
22:47 imirkin_: easy gui like ... twm? or easy gui like plasmashell?
22:47 uriahheep: ah ok nice to know
22:47 imirkin_: in the latter case, don't even bother trying
22:47 uriahheep: possibly lxqt
22:48 imirkin_: my recommendation would be to disable any and all possible 3d accel
22:48 uriahheep: ok
22:48 imirkin_: just use X's acceleration, which is nice and stable
22:48 uriahheep: good to know
22:48 uriahheep: :)
22:48 imirkin_: and things will be fine
22:49 imirkin_: (i.e. the nouveau ddx)
22:49 uriahheep: wayland is also probably a no go right?
22:49 uriahheep: since it's egl
22:49 imirkin_: well, strictly speaking it could work
22:49 imirkin_: practically speaking, i haven't tried
22:49 imirkin_: and i doubt it'll work well
22:49 uriahheep: ok
22:50 uriahheep: I tried long ago, was decent enough but cpu intensive
22:50 imirkin_: if it's cpu intensive, it wasn't using the GPU :)
22:50 uriahheep: indeed
22:51 uriahheep: well thanks for the tips.
22:51 imirkin_: the next generation and later should work more reliably
22:51 uriahheep: ok
22:51 imirkin_: but if you want an actual nice experience in linux, stick to intel or amd
22:51 uriahheep: yeah
22:53 uriahheep: it's a pretty old system, maybe I can find an old amd/ati card and swap it