08:39Armada: imirkin, so macros are stored in the context, but where is the context stored? ;)
10:22RSpliet: Armada: the current context should live in hw registers. Inactive contexts live in DRAM on the graphics card
10:26karolherbst: RSpliet: we don't save the macros inside registers
10:26karolherbst: and the context consists of more than just what's inside the registers
10:27karolherbst: RSpliet: or do you mean "hw registers" as in mmio registers as well?
10:27RSpliet: karolherbst: Okay... it's either in those regs, or there's a pointer in those regs that will lead up to the required data.
10:28karolherbst: yeah.. my guess is we upload those graph macros into some vram buffer and just point to them from inside the context
10:28karolherbst: Armada: did you check the code to where those macros are uploaded?
10:30RSpliet: Well, mmio simply provides an interface to HW registers using the address space of the host system (the host system being either the x86/arm/... CPU or a falcon core)
10:30RSpliet: karolherbst: having a pointer makes more sense indeed :-D no need for expensive SRAM arrays for latency tolerant execution.
10:31RSpliet: Shouldn't we simply be able to infer all this from nouveau?
10:40karolherbst: Armada: I think the macros are stored inside nvc0/mme/, you could simply follow the code and see what happens on the mesa side
10:42Armada: karolherbst, I know how it's uploaded, I was just asking because an emulator assumed that macros were stored in VRAM
10:43karolherbst: Armada: yeah, I guess that's where they ended up for nouveau as well
10:43karolherbst: would be weird if not
10:44karolherbst: Armada: or did something else happen?
10:44Armada: karolherbst, where would it store it in VRAM then? you can't specify a GPU address for macros
10:45karolherbst: Armada: you just allocate some VRAM and upload it there
10:45Armada: how would I tell the GPU?
10:45karolherbst: or does it have to be a static address?
10:45karolherbst: mhh, I would have to read the code for that actually
10:46karolherbst: Armada: we do "MK_MACRO(NVC0_3D_MACRO_VERTEX_ARRAY_PER_INSTANCE, mme9097_per_instance_bf);"
10:46karolherbst: MK_MACRO calls nvc0_graph_set_macro
10:46Armada: macros are uploaded in the same way as constants
10:47Armada: constants reside in a cache right? not in VRAM unless it's a UBO?
10:47karolherbst: you never know by just looking
10:47karolherbst: but mhh
10:48Armada: wouldn't make sense to upload each instruction individually in the command buffer if they just end up in VRAM
10:48karolherbst: maybe there is indeed some dedicated memory for the macros and it just gets replaced by the context switching code
10:48Armada: if that were the case it would make much more sense to just give a GPU address in that case
10:54Armada: (I used too many cases in that sentence, I'm worried one of them might fall-through)
11:28imirkin: Armada: there is no ubo vs non-ubo btw
11:32Armada: ah then I misunderstood it
11:33Armada: what's the point of CB_DATA then when you already give an address to the UBO which you already filled with data?
11:33imirkin: imagine you have something like
11:33imirkin: draw; glUniform; draw; glUniform; draw
11:33imirkin: you could wait for the previous draw to complete, update the uniform, and draw again
11:34imirkin: however nvidia introduces an on-chip staging area for uniform updates
11:34Armada: ah okay, CB_DATA stores it in the command buffer instead of the UBO
11:34Armada: does CB_DATA write it out to the UBO once executed by the FIFO?
11:34imirkin: which properly updates the cache, and writes the value out eventually
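The ordering imirkin describes can be sketched as a toy Python model. It only assumes that commands run strictly in submission order, so an inline constant update queued between two draws is seen by the later draw and not by the earlier one. `ToyGpu`, `cb_data` and `draw` are hypothetical names for illustration, not hardware methods or nouveau API, and the sketch does not model the on-chip staging area, the cache, or the eventual write-back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGpu:
    """Toy model: commands execute strictly in submission order."""
    constbuf: dict = field(default_factory=dict)  # stand-in for constant buffer contents
    drawn: list = field(default_factory=list)     # value each draw observed

    def execute(self, commands):
        for op, *args in commands:
            if op == "cb_data":
                # Inline update carried in the command stream itself.
                offset, value = args
                self.constbuf[offset] = value
            elif op == "draw":
                # A draw reads whatever value is current when it executes.
                (offset,) = args
                self.drawn.append(self.constbuf[offset])
            else:
                raise ValueError(f"unknown op {op!r}")


# draw; glUniform; draw; glUniform; draw, queued in one go with no CPU-side waits
commands = [
    ("draw", 0),
    ("cb_data", 0, 2.0),
    ("draw", 0),
    ("cb_data", 0, 3.0),
    ("draw", 0),
]

gpu = ToyGpu(constbuf={0: 1.0})  # uniform starts out as 1.0
gpu.execute(commands)
print(gpu.drawn)  # [1.0, 2.0, 3.0]
```

The point of the model is that the CPU never has to wait for a draw to finish before queueing the next update, because the update travels in the same ordered stream as the draws.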
11:35Armada: so how about macros? do they only reside in such an on-chip cache?
11:36Armada: or are they written out to VRAM?
11:37imirkin: afaik there's no explicit backing buffer there
11:38Armada: it uploads it to the cloud?
11:38imirkin: dedicated on-chip area presumably
11:38Armada: stores it in the blockchain, got it
12:12karolherbst: Armada: wondering why the emulator assumes VRAM then..
12:13Armada: karolherbst, it's already been corrected
12:13karolherbst: I see
12:14Armada: the reason was probably misleading documentation on our wiki which kept referring to macro "pointers" which seemed to imply those were full gpu addresses
12:19imirkin: Armada: it's a pretty limited amount of memory, so i'm fairly sure it's in "private" on-chip memory, not general VRAM. i don't think we ever provide a "put macros here" buffer object
12:21Armada: we don't, but for all I know it might be provided by the kernel driver or something
12:35imirkin: hmmmm ... could be. iirc we do provide some buffers for context setup? not sure.
12:36imirkin: i know quite little about all that
13:08karolherbst: imirkin: yeah, that would be my guess as well kind of
13:08karolherbst: imirkin: or maybe there is simply a 4k buffer for each context for various stuff.
13:08karolherbst: maybe mwk knows something about that?
13:13mwk: you mean macros, as in the macro engine with the funny ISA that sits in front of pgraph?
13:13karolherbst: mwk: yes
13:13mwk: then there's a dedicated piece of memory on-chip
13:13mwk: rather small, 0x800 entries or something like that
13:14mwk: it's context switched along with all the other graphics state
13:14mwk: via the ramchain mechanism aka context strands
13:14mwk: same for the method shadow data cache
13:15RSpliet: mwk: is that at the top level or per GPC?
13:15mwk: top level of course
13:15mwk: gpcs aren't that independent
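What mwk describes can be sketched as a toy Python model: a small dedicated store with no backing buffer object, filled one word at a time, and saved and restored as part of a context switch. Everything here is an assumption for illustration. The size is the figure mwk quotes from memory, the auto-incrementing upload position is a guess at how word-by-word upload might behave, and `ToyMacroStore` and `switch_context` are hypothetical names. It does not model the ramchain mechanism itself.

```python
MACRO_ENTRIES = 0x800  # size quoted from memory in the discussion ("or something like that")


class ToyMacroStore:
    """Toy model of a small on-chip macro memory."""

    def __init__(self):
        self.words = [0] * MACRO_ENTRIES
        self.upload_pos = 0

    def set_upload_pos(self, pos):
        if not 0 <= pos < MACRO_ENTRIES:
            raise IndexError(f"upload position {pos:#x} out of range")
        self.upload_pos = pos

    def upload_word(self, word):
        # Assumed behaviour: each uploaded word advances the position by one.
        self.words[self.upload_pos] = word & 0xFFFFFFFF
        self.upload_pos = (self.upload_pos + 1) % MACRO_ENTRIES

    def save(self):
        # Snapshot of the state that would travel with the context.
        return list(self.words), self.upload_pos

    def restore(self, image):
        words, pos = image
        self.words = list(words)
        self.upload_pos = pos


def switch_context(store, saved, old_ctx, new_ctx):
    """Save the outgoing context's macro state and load the incoming one."""
    saved[old_ctx] = store.save()
    store.restore(saved.get(new_ctx, ([0] * MACRO_ENTRIES, 0)))


store = ToyMacroStore()
saved = {}

# Context "A" uploads a two-word macro at position 0.
store.set_upload_pos(0)
for word in (0x11111111, 0x22222222):
    store.upload_word(word)

# Context "B" takes over and uploads its own macro at the same position.
switch_context(store, saved, "A", "B")
store.set_upload_pos(0)
store.upload_word(0xDEADBEEF)

# Switching back restores A's macro words.
switch_context(store, saved, "B", "A")
print([hex(w) for w in store.words[:2]])  # ['0x11111111', '0x22222222']
```

In this model the driver never hands over a "put macros here" address, which matches the observation that macros are addressed by a small position index and not by a full GPU address.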
13:21karolherbst: mwk: okay, thanks
18:38LeFarfadetSpatia: Hello everybody out there!
18:40LeFarfadetSpatia: Well, I am Yoann LE BARS, I have recently joined the Nouveau mailing list.
18:42LeFarfadetSpatia: So, I am wondering if I can give any help to the Nouveau project.
18:43imirkin_: so ideally you can identify an area of nouveau that you feel is simultaneously lacking and that you can do something to improve the situation
18:44LeFarfadetSpatia: It happens I have a GeForce 750 Ti GPU.
18:44LeFarfadetSpatia: I need video decoding and computing on GPU (such as OpenCL).
18:44imirkin_: just curious -- what do you need in OpenCL that you can't get with OpenGL compute shaders
18:44LeFarfadetSpatia: It turns out that Nouveau is not really good at it on my GPU.
18:45LeFarfadetSpatia: imirkin: I do scientific computing.
18:46LeFarfadetSpatia: In the scientific computing community, we do not really use shaders.
18:46imirkin_: if you use opencl, you use shaders...
18:46LeFarfadetSpatia: Well, yes…
18:46LeFarfadetSpatia: Sorry, it has been a long time since I last used IRC.
18:47imirkin_: is this a situation where opengl compute shaders would be totally fine but you have all this legacy stuff written in OpenCL C?
18:47LeFarfadetSpatia: OpenGL shaders would probably be fine.
18:47LeFarfadetSpatia: Anyway, I mostly use legacy stuff.
18:48imirkin_: ok. well you may be interested in some of the work karolherbst has been doing, trying to get OpenCL going
18:48LeFarfadetSpatia: Now, to be honest, I mostly use this code on a supercomputer I have an account on.
18:48imirkin_: not sure how far along that is, but i think it's moderately advanced.
18:48LeFarfadetSpatia: Noted: contact karolherbst
18:48karolherbst: mhhh well
18:49karolherbst: status is, things are usually working out quite nicely, except when there is more complex control flow
18:49imirkin_: as for video decoding, someone needs to RE the video decoding engine -- apparently it's quite similar to earlier generations, so shouldn't be impossibly difficult.
18:49karolherbst: LeFarfadetSpatia: and currently it is kind of annoying to set up, because it depends on llvm master and some other dependencies
18:49karolherbst: best would be to wait until things settle
18:49imirkin_: if you're interested in the video decoding stuff, i could probably help you get started
18:50imirkin_: but it's not an easy task
18:50LeFarfadetSpatia: Now, the thing is: I have never done any reverse engineering.
18:50imirkin_: no one has, until they have...
18:50LeFarfadetSpatia: Indeed …
18:50imirkin_: it's generally not that hard ... just look at what's being done and try to understand why
18:51LeFarfadetSpatia:receiving a phone call
19:00LeFarfadetSpatia: imirkin: I guess the video decoding engine is the one in the nVidia proprietary driver, am I right?
19:00LeFarfadetSpatia: Let me rephrase it:
19:01LeFarfadetSpatia: I guess the video decoding engine to be reverse engineered is the one in the nVidia proprietary driver, am I right?
19:07imirkin_: the engine itself is in the hardware
19:07imirkin_: the driver ... drives it
19:08imirkin_: you'd have to trace the communications between the driver and the engine
19:08imirkin_: and figure out how the engine needs to be driven in order to decode data
19:08LeFarfadetSpatia: Now: how do you do this?
19:10imirkin_: i can explain more in about an hour
19:32HdkR: imirkin_: Obviously it is time to implement cuda in Nouveau :P
19:32Lyude: imirkin_: thanks for the pointer on that bug! that actually turned out to be the fix I needed :)
19:33imirkin_: the cursor thing?
19:33Lyude: imirkin_: the flickering cursor issue with kepler
20:30LeFarfadetSpatia:just discovered Bryan Lunduke’s statement on Mozilla being untrustworthy.
20:39LeFarfadetSpatia: Well, he may have a point, but this leads me away from Nouveau …
20:40LeFarfadetSpatia: imirkin: Do you have some more time now?
20:40imirkin_: LeFarfadetSpatia: sorry, i got busy =/
20:41LeFarfadetSpatia: imirkin_: No problem.
20:41LeFarfadetSpatia: Maybe I should come back some other day?
20:41imirkin_: i'm always busy
20:41imirkin_: so now's as good a time as any
20:42imirkin_: so the basic idea
20:42imirkin_: is that you should find a constant test file to ... test with
20:42imirkin_: decoding via vdpau
20:42imirkin_: then you should do a mmiotrace of the blob which includes playing back the file
20:42imirkin_: this will hopefully give you enough info on how to get the core engine going
20:43imirkin_: there will be firmware that needs to be uploaded
20:43imirkin_: but we can get to that once you have the trace
20:43imirkin_: similarly you also need to capture the userspace submissions to the hardware for the actual decoding
20:43imirkin_: this is done with valgrind-mmt
20:43LeFarfadetSpatia: Then, find a video file. Not too long a one, I guess.
20:44imirkin_: once you have that, we can have a look to see if it's 99.9% identical to the previous gens or not
20:44LeFarfadetSpatia: This step seems quite easy.
20:44imirkin_: iirc i saw some info somewhere which suggested that it should be nearly identical
20:44imirkin_: i spent over a month getting the VP2 engine decoding up and running
20:44LeFarfadetSpatia: I guess that means installing the proprietary driver, am I right?
20:44imirkin_: i was very new to nouveau at the time
20:45imirkin_: and there already was an existing decoder for the later gens (which turned out to be quite different)
20:45imirkin_: so ... this isn't like a half-hour project
20:45LeFarfadetSpatia: It seems so …
20:46imirkin_: my guess is that knowing all that i know now, it would be a week of full-time effort for me to get this stuff going. for you (no offence), it will be much longer, due to all the additional learning you'll have to do.
20:46imirkin_: this isn't meant to discourage you, but it's meant to set your expectations.
20:46LeFarfadetSpatia: No offense.
20:47imirkin_: i know english isn't your native language -- if you ever run into trouble, about half the people here speak french, myself included.
20:48LeFarfadetSpatia: Well, I have lived in the USA and spent some time in the UK, so English is not a problem.
20:48LeFarfadetSpatia: That said, French is indeed my native language.
20:49imirkin_: however in public discussions on the chan, we do try to stick to english
20:49LeFarfadetSpatia: This was my guess.
20:50imirkin_: probably won't be too helpful, but i wrote a toy h264 player for my vp2 re'ing efforts, so that i could carefully control what was being submitted to the vdpau api
20:50imirkin_: [wow. 5 years ago? time flies...]
20:52LeFarfadetSpatia: Now, clearly I do not have any idea what I am getting myself into.
20:53LeFarfadetSpatia: And I cannot spend a whole week doing nothing but exploring the video decoding engine.
20:54imirkin_: well, the idea is that you spend an hour here and there
20:54LeFarfadetSpatia: So, it will definitely take me some time.
20:56LeFarfadetSpatia: Anyway, I have to sort this out: this implies having a proprietary nVidia driver running, right?
20:56imirkin_: yes, the idea is to see what it's doing
20:56imirkin_: and then do that :)
20:57LeFarfadetSpatia: Well, here is the problem: I am using the RT kernel.
20:57LeFarfadetSpatia: NVidia’s drivers do not support it.
20:57LeFarfadetSpatia: And as I use my computer in some ways like a server, I do not restart it.
20:58LeFarfadetSpatia: Also, switching from one driver to another one depending on the kernel you are running can be tricky …
20:58imirkin_: yeah ... one thing with RE is ... lots of rebooting
20:59LeFarfadetSpatia: It would be way easier if I had one test machine and one production machine.
21:00LeFarfadetSpatia: So, first of all, I have to figure out the most convenient way to switch from one driver (and one kernel) to the other.
21:01LeFarfadetSpatia: Last time I tried to use the nVidia proprietary driver, everything crashed with the next kernel update.
21:04LeFarfadetSpatia: I should probably go and spend some time on the Debian forum.
21:05LeFarfadetSpatia: Anyway, thank you imirkin_, this gave me some pointers to go further.
22:44uriahheep: hey all... sorry about this potential nuisance of a question, but how well can I rely on feature matrix accuracy for an nv40 card nowadays? it's a GeForce 7900gs
22:45imirkin_: probably best to just ask here about things you care about
22:46uriahheep: I don't really plan on gaming, just want a smooth enough UI
22:46imirkin_: are you planning on using a "modern desktop environment"?
22:46uriahheep: I see memory timing development has stalled, which is pretty important iirc
22:46imirkin_: nv4x reclocking should work ok
22:46imirkin_: with the exception of a handful of boards, apparently
22:47uriahheep: well I'd set the system up for a low-knowledge user so I'd like some sort of easy GUI yes
22:47imirkin_: easy gui like ... twm? or easy gui like plasmashell?
22:47uriahheep: ah ok nice to know
22:47imirkin_: in the latter case, don't even bother trying
22:47uriahheep: possibly lxqt
22:48imirkin_: my recommendation would be to disable any and all possible 3d accel
22:48imirkin_: just use X's acceleration, which is nice and stable
22:48uriahheep: good to know
22:48imirkin_: and things will be fine
22:49imirkin_: (i.e. the nouveau ddx)
22:49uriahheep: wayland is also probably a no go right?
22:49uriahheep: since it's egl
22:49imirkin_: well, strictly speaking it could work
22:49imirkin_: practically speaking, i haven't tried
22:49imirkin_: and i doubt it'll work well
22:50uriahheep: I tried long ago, was decent enough but cpu intensive
22:50imirkin_: if it's cpu intensive, it wasn't using the GPU :)
22:51uriahheep: well thanks for the tips.
22:51imirkin_: the next generation and later should work more reliably
22:51imirkin_: but if you want an actual nice experience in linux, stick to intel or amd
22:53uriahheep: it's a pretty old system, maybe I can find an old amd/ati card and swap it