13:12 karolherbst: mhh, we could optimize X[0x0+a] -> X[a]
13:12 karolherbst: no idea if this really matters to the hardware
14:11 Subv: hello, i'm new to nouveau and mesa in general. Does nouveau support 3D acceleration for the GM20B gpu?
14:15 pmoreau: Subv: I think it should, though I am not 100% sure if you can get that using the latest Mesa + kernel or if you need some extra patches.
14:20 imirkin: Subv: tagr would be the best person to ask ... he's sent some patches to improve the support there, not sure if they've landed
14:20 pmoreau: Subv: You will need https://lists.freedesktop.org/archives/mesa-dev/2018-March/187400.html which has been merged recently, but isn’t part of any Mesa releases yet.
14:22 imirkin: karolherbst: there is no separate X[reg]... there's always an offset possible in the instruction encoding.
14:22 imirkin: (almost always)
14:22 imirkin: and in nv50_ir, there's always an offset possible with an indirect. hence it's printed that way.
14:27 Subv: ah, i see, so it's handled by the nvc0 driver too. I assume register definitions are somehow generated from the envytools's rnndb xmls, but that only goes up to gf100, how does nouveau handle new registers in Maxwell not present in older archs or registers with different behavior and things like that?
14:28 imirkin: don't let the names fool you
14:29 imirkin: (a) the layout of things hasn't changed much since fermi (and even that had a ton of overlap with tesla)
14:29 imirkin: (b) there are variants defined, and some regs are only available on some chips
14:35 Subv: huh, interesting. I've been working on a Nintendo Switch emulator, the Switch uses a Tegra X1 (GM20B gpu), i've seen some games attempt to draw/set up shaders/etc by writing to some registers that aren't documented on envytools, could the info we find be in any way useful to nouveau?
14:36 imirkin: we don't have 100% of regs in there
14:36 imirkin: a bunch don't matter, and a bunch do things we haven't RE'd
14:37 imirkin: tbh, i haven't invested much time looking at maxwell+, so there's probably things we're missing
14:37 imirkin: anyways, there's almost 100% chance that whatever you find applies to desktop chips too
14:38 imirkin: i suppose they might have thrown in a few gm20b-specific things in, but i can't imagine why or what those would be. nvidia tends to be pretty consistent with their hw.
14:39 imirkin: with the exception of argument order to the tex instruction, any changes from chip to chip tend to be for good reasons.
14:42 Subv: fantastic
14:46 Subv: then i'm gonna keep bothering you with questions for the time being :)
14:47 Subv: if i understand correctly, nvc0 uploads the shaders in nvc0_program.c:nvc0_program_upload and uses the START_ID register of each shader as an offset into the code to tell the GPU where to start executing
14:49 Subv: Switch games use register 0xE24 to set this offset for each shader type, and leave START_ID at 0
14:50 imirkin: so all your numbering is off by a factor of 4 from how we number things
14:51 imirkin: anyways ... this sort of core stuff i'd need to brush up on, but in theory you have a bo which is where all the code lives (CODE_ADDRESS_HI/LO)
14:51 imirkin: and START_ID is an offset from that
14:52 karolherbst: imirkin: okay, so it makes no difference
14:52 Subv: yeah, funnily enough, the game i've been testing sets CODE_ADDRESS_HI/LO to 0 and just uses the aforementioned 0xE24 register to set absolute shader addresses
14:53 Subv: presumably the addresses set with 0xE24 are actually offsets from CODE_ADDRESS, but we can't confirm that until we find something that doesn't set it to 0
14:53 imirkin: Subv: yeah, i mean it's just added together
14:53 imirkin: doesn't really matter either way
14:53 Subv: and yeah, we use register_id_from_envytools / 4 :)
14:54 Subv: 0xE24 would be 0x3890 (Undocumented) in the notation you use
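[editor's note] The "factor of 4" mentioned above is just the difference between numbering methods by 32-bit word index (what the emulator sees) and by byte offset (what envytools prints). A minimal sketch of the conversion; the function names are made up for illustration:

```python
def method_index_to_offset(index: int) -> int:
    """Convert a word-indexed method id (emulator style) to a byte offset (envytools style)."""
    return index * 4


def offset_to_method_index(offset: int) -> int:
    """Convert a byte offset (envytools style) back to a word-indexed method id."""
    if offset % 4 != 0:
        raise ValueError("method offsets are expected to be multiples of 4")
    return offset // 4


# The value from the conversation: 0xE24 in emulator numbering.
print(hex(method_index_to_offset(0xE24)))   # 0x3890
print(hex(offset_to_method_index(0x3890)))  # 0xe24
```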
14:54 imirkin: also, it's a minor point, but they're not registers
14:54 imirkin: 3890 is plenty documented
14:54 imirkin: actually ... heh
14:54 karolherbst: imirkin: any reason why we use $r and $rs and not $t and $s for the texture/sampler reference?
14:55 imirkin: i think 3890 is a macro
14:55 karolherbst: in the beginning it confused me a bit, because the $r kind of looks like a register as well
14:55 Subv: from the usage in the Switch, it seems to be a function call with 5 parameters
14:55 Subv: (is that what you call a macro?)
14:56 imirkin: macros are defined in their own simple language
14:56 imirkin: and can execute a few things
14:56 imirkin: they're written at gr startup
14:56 imirkin: here are some examples of macros: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme
14:56 imirkin: they're executed by the gr engine, and yes - can take arguments
14:57 imirkin: they can also twiddle underlying registers
14:57 imirkin: er, methods
14:57 imirkin: (note that these things are all methods, not registers... although some methods just set some internal register)
14:57 imirkin: $ ~/src/envytools/rnn/lookup -a 126 -d SUBCHAN -- -v obj-class GM204_3D 3890
14:57 imirkin: GRAPH.MACRO[0x12] => 0
14:58 imirkin: you can get the code for the macro which is written in ... (i forget how all this setup is done, you do it once and then forget)
14:58 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#n583
14:59 imirkin: and then invoking the MACRO[] method will call the macro and feed it the params which are in the pushbuf
14:59 imirkin: this is also how a lot of the indirect calls are implemented
14:59 imirkin: nouveau doesn't copy nvidia's macros, so they won't necessarily map 1:1
14:59 imirkin: the macro isa is decodable with envydis (-m macro)
15:02 imirkin: (but the underlying hw is the same, and there are a limited number of approaches to getting it to do what you want)
15:03 imirkin: you can also look at demmt, which will decode an mmt trace
15:03 imirkin: including "executing" the macros and telling you what they would do
15:03 imirkin: (since it has all the arguments)
15:03 imirkin: (and all other state accessible to macros)
15:05 Subv: ooh, this makes a lot of sense, i suppose that's what the writes to methods 0x47 [SetGraphMacroEntry] and 0x45 [SetGraphMacroCode] mean (names come from symbols on the games)
15:11 karolherbst: Subv: i am not entirely sure what your goals are here, but just a word of advice: if you really want to develop an emulator it might be a good idea to try to intercept the actual API calls the applications are doing on the switch and try to reimplement the graphics API and not the low level device interface. I doubt that on the switch the games are even allowed to go so low level. And I am quite sure if you try to write an
15:11 karolherbst: emulator based on the nvidia GPU stuff the emulator would only run on nouveau, because I am quite sure it would be quite hard to emulate all that stuff on other hardware.
15:12 karolherbst: I am not saying it is a bad idea to figure out what the unknown bits are for and this could indeed help us
15:12 karolherbst: but I don't think it is helpful in writing an emulator, because the perf penalty on nvidia GPU interface -> abstract graphics API can be quite high
15:14 Subv: games can indeed send arbitrary pfifo commands to the gpu, hooking the sdk API calls is imo impractical, and wouldn't work for homebrew
15:17 karolherbst: Subv: I mean sure, but the devs don't know what they do, right? There is no doc about the nvidia stuff so I am quite sure that all of the games are actually targeting the switch graphics API
15:17 karolherbst: or not?
15:17 karolherbst: and yeah, I am kind of aware of the homebrew situation
15:17 karolherbst: for homebrew it might make sense to just port mesa over and just use that
15:17 karolherbst: or something like that
15:18 karolherbst: but emulator and homebrew are two different problems where I am sure having the same solution would lead to many other problems for both cases
15:20 karolherbst: I might be wrong though and there is no way around emulating all the graphics interfaces of the GPU, which would be kind of super painful I think
15:28 karolherbst: Subv: but why is hooking the sdk API calls impractical?
15:32 imirkin: Subv: do you have a full symbol dump for all the method names? that'd be superuseful
15:33 imirkin: you can usually tell a lot from the method name
15:36 imirkin: Subv: are you writing an emulator, or trying to get nouveau working on switch? i thought the latter.
15:36 imirkin: making a full emulator for nvidia graphics would be a substantial undertaking
15:47 jamm: I think he's talking about an emulator for nintendo switch
15:47 jamm: Subv: By any chance, is this yuzu?
15:57 Subv: yeah, it's yuzu
15:58 Subv: imirkin: some games have very useful symbol information
15:58 Subv: all the nvidia stuff is on a separate binary which usually doesn't have symbols, but you can tell some things from the exports
15:58 Subv: AFAIK there's a game (Super Mario Odyssey) that does have full symbols even for nvidia stuff
15:58 Subv: i don't own it though
16:00 karolherbst: Subv: what kind of nvidia stuff, there are many layers afaik
16:00 Subv: the nvn driver, command buffer creation and commands
16:00 karolherbst: okay
16:00 karolherbst: yeah, this sounds useful
16:03 Subv: for example, the "nvnCommandBufferBindTexture" function writes commands 0x80000E1A, 0x8000068B and 0x200208E3 into the buffer
16:03 karolherbst: anything related to zcull might be interesting
16:03 karolherbst: any references to zcull somewhere?
16:04 Subv: only "CommandBufferRestoreZCullData" and "CommandBufferSaveZCullData", maybe other games have more functions related to it
16:06 karolherbst: no, that sounds about everything
16:06 karolherbst: zcull is just a buffer you set
16:06 karolherbst: basically
16:06 Subv: let me find these functions in the binary to see what they do
16:10 Subv: CommandBufferRestoreZCullData sets ZCULL_ADDRESS_HIGH/LOW and ZCULL_LIMIT_HIGH/LOW with 0x200401FA and then uses command 0x80000540
16:12 karolherbst: yeah okay, I think we already know that much
16:13 karolherbst: not quite sure if we know what the 0x200401FA thing means 100%, but yeah
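[editor's note] Words such as 0x200401FA and 0x80000540 look like pushbuffer command headers. A hedged sketch of how they could be split into fields, assuming the commonly described Fermi-and-later header layout (mode in bits 31:29, count or immediate data in bits 28:16, subchannel in bits 15:13, word-indexed method in bits 12:0). That layout is an assumption here and is not stated anywhere in this log; the names below are illustrative only.

```python
from typing import NamedTuple


class CommandHeader(NamedTuple):
    mode: int           # assumed: 1 = incrementing methods, 4 = immediate data
    count_or_data: int  # word count for mode 1, inline data for mode 4
    subchannel: int
    method: int         # word-indexed method id (multiply by 4 for the envytools offset)


def decode_header(word: int) -> CommandHeader:
    """Split a 32-bit command header into fields (assumed layout, see note above)."""
    return CommandHeader(
        mode=(word >> 29) & 0x7,
        count_or_data=(word >> 16) & 0x1FFF,
        subchannel=(word >> 13) & 0x7,
        method=word & 0x1FFF,
    )


for word in (0x200401FA, 0x80000540, 0x80000E1A, 0x8000068B, 0x200208E3):
    h = decode_header(word)
    print(f"{word:#010x}: mode={h.mode} count/data={h.count_or_data} "
          f"subch={h.subchannel} method={h.method:#x} (offset {h.method * 4:#x})")
```

Under this reading, 0x200401FA would be "write 4 consecutive words starting at method 0x1FA", which is at least consistent with the four ZCULL address/limit values mentioned above.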
16:13 Subv: is the macro ISA documented anywhere?
16:15 karolherbst: envytools I think
16:15 karolherbst: check out demmt
16:16 karolherbst: mhhh, maybe it uses rnndb as well
16:16 karolherbst: yeah, it should be inside rnndb/graph
16:16 karolherbst: envytools/rnndb/graph
16:17 karolherbst: gf100_3d.xml contains some documentation for example
16:17 Subv: but there's nothing about the macro ISA there
16:18 Subv: is the syntax not documented anywhere? this instruction for example 00000006: 21005212 mov $r2 (extrinsrt $r2 $r1 0x0 0x4 0x4)
16:19 karolherbst: never looked into that stuff. I only looked into the shader ISA and the falcon ISAs
16:19 karolherbst: Subv: envytools/envydis/macro.c maybe?
16:26 Subv: there's a little more info there but it's still somewhat confusing
16:51 stoatwblr: imirkin, can I pick your brain a little?
17:02 Subv: demmt is rather helpful, but proper documentation would be nice
17:25 Subv: mm, envydis is having an early exit on this macro code, it exits as soon as it finds the 0x1A31 instruction
17:26 imirkin: stoatwblr: ask and ye might receive
17:26 imirkin: Subv: basically i'm looking for a mapping from method id -> name
17:27 imirkin: Subv: macro isa is in envydis/macro.c
17:27 imirkin: envydis will decode a binary (or assemble one)
17:28 imirkin: Subv: once you get the hang of how those envydis tables work, the syntax is self-documenting
17:28 imirkin: but ... that can take a little while
17:28 imirkin: for now, feel free to ask :)
17:34 imirkin: extrinsrt is like extract + insert bits. the args are documented at the top of that macro.c file ... it's always a bit of a mindfuck to remember exactly what it does
17:38 Subv: yeah, took me a while :P
17:38 Subv: but still, it seems instruction 0x1A31 causes envydis to exit no matter where it is
17:40 Subv: it matches the { 0x00000001, 0xffffc007, T(dst), REG2 }, // SC rule, i have no idea what's going on here
17:48 stoatwblr: imirkin: is it possible to use the intel gpu to get the higher res but the nouveau ones to do the actual displaying?
17:49 stoatwblr: s/higher res/larger fb(bigger viewport/
18:01 imirkin: Subv: oh yeah
18:01 imirkin: there's a bug...
18:01 imirkin: i tend to comment that rule out
18:01 imirkin: i generally hit it when assembling though
18:02 imirkin: stoatwblr: no. the scanout engine can only handle a pitch that's so wide. doesn't matter who's rendering to it.
18:04 imirkin: Subv: hmm ... 1a31 decodes as "00001a31 parm $r2 send $r3"
18:04 stoatwblr: ah well, too bad.
18:05 imirkin: Subv: are you sure you're operating envydis properly?
18:05 Subv: thanks!
18:05 Subv: i'm just doing "envydis -m macro -n -i macro0"
18:06 imirkin: and what does macro0 contain?
18:06 imirkin: the raw binary? or something else?
18:06 Subv: the binary code for the macro, dumped from the PFIFO command that the emulator receives
18:06 imirkin: ok. could you have byteswapped it by accident?
18:07 Subv: i don't think so, the rest of the binary looks okay
18:07 imirkin: since the PFIFO receives it a word at a time
18:07 imirkin: so you have to write out each 32-bit word as LE into the binary
18:07 Subv: the problem instruction seems to be right after a maddr, so it kinda makes sense that it's a send
18:07 imirkin: anyways, if you can make that available, i can have a look
18:07 imirkin: well, i don't have any patches applied locally
18:08 imirkin: i'm just wondering if your dump is a little off perhaps
18:08 imirkin: if you can give me the cmdstream i could try reproducing it too
18:08 Subv: imirkin: this is the hex text of the dumped macro: https://gist.github.com/Subv/a13766284f5cf74da79c1034b8674228
18:09 imirkin: ok. i think envydis can take this in directly... -b iirc?
18:09 Subv: -i
18:09 imirkin: no
18:09 imirkin: -i is for a binary file
18:09 Subv: oh, hex text, right
18:09 imirkin: oh. no args. it'll treat it as bytes by default
18:10 imirkin: -w for 32-bit, -W for 64-bit, i guess 16-bit never came up :)
18:10 Subv: what i do is paste the hex text into a binary file in a hex editor and parse that :)
18:11 imirkin: ah no. only 32-bit at a time, or raw binary
18:11 imirkin: perl to the rescue
18:11 imirkin: perl -ane 'foreach (@F) { print pack "c", hex($_) }'
18:11 imirkin: :)
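[editor's note] A rough Python equivalent of the perl one-liner above, plus a helper for the "write out each 32-bit word as LE" advice. This is a sketch: the function names are hypothetical, and it assumes the hex dump is whitespace-separated.

```python
import struct


def hex_text_to_bytes(text: str) -> bytes:
    """Treat each whitespace-separated hex token as one byte (like the perl one-liner)."""
    return bytes(int(token, 16) for token in text.split())


def words_to_le_bytes(words) -> bytes:
    """Serialize 32-bit words as little-endian, the layout a raw binary dump should have."""
    return b"".join(struct.pack("<I", word & 0xFFFFFFFF) for word in words)


# The instruction word from the conversation, stored least significant byte first.
print(words_to_le_bytes([0x00001A31]).hex())       # 311a0000
print(hex_text_to_bytes("31 1a 00 00").hex())      # 311a0000
```

If a dump was produced by writing words big-endian, every instruction comes out byte-swapped, which is the kind of mistake being asked about above.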
18:12 imirkin: weird. that decodes cleanly...
18:12 imirkin: https://hastebin.com/kukulonosi.bash
18:12 Subv: eh
18:12 imirkin: oh yeah - one little thing about the macro isa - it's like MIPS - explicit delay slots when branching
18:12 Subv: it just ends for me after it prints 0000000e: 61801022 maddr (extrinsrt $r2 0x0 0x0 0x6 0xc)
18:13 imirkin: 6e is way further
18:13 imirkin: er wait, i misread
18:13 Subv: and it works if i replace the 0x1a31 with a nop (0x11)
18:13 imirkin: do you have the latest envytools?
18:13 imirkin: wtf is -n btw?
18:13 Subv: yay delay slots (not). I'll probably end up writing an interpreter for this thing
18:13 imirkin: oh, color
18:13 imirkin: that's already written
18:14 imirkin: check out demmt
18:14 Subv: i'm on Windows, -n is to avoid cluttering my terminal with colors
18:14 imirkin: https://github.com/envytools/envytools/blob/master/demmt/macro.c
18:14 Subv: as for the envytools version, i didn't compile it myself (as that seems to be nigh impossible on Windows), someone sent me a binary
18:14 imirkin: ah. maybe an old one or something
18:15 imirkin: dunno. worksforme :)
18:15 Subv: i'll poke the person who sent it to me to recompile it :P
18:15 imirkin: it hasn't been touched in ages ... but perhaps something in the surroundings
18:15 imirkin: or some silly bug on windows
18:16 Subv: as for the interpreter, i saw demmt, to integrate it with what i'm working on would take a bit of refactoring, but at that point i could just use it as a reference and write another one. The ISA only has like 20 instructions anyways
18:17 imirkin: yeah. it's all a bit tricky since most ops do two things. order matters :)
18:17 imirkin: and yeah, i mostly meant to use it as a reference.
18:19 Subv: thanks imirkin!
18:42 Subv: is there a write up somewhere about how the const buffer works? i assume that it's the working area for each shader, does each shader type get its own constbuffer?
18:53 karolherbst: Subv: const buffer are kind of like read only cached regions in VRAM and you can have a few of them per shader
18:56 Subv: how do you access them from the shader? there's a CB_BIND method but what does that actually do?
18:58 imirkin: Subv: instructions can take constbuf args
18:58 imirkin: there are up to 16 constbufs that can be bound, and each one is up to 64k
18:58 imirkin: each shader stage has its own set of 16
18:58 imirkin: updating them is tricky business
18:58 imirkin: they're backed by memory
18:59 imirkin: but they're fancily handled by the hw
18:59 imirkin: so you HAVE to go through the CB_DATA methods to update them -- those writes get staged internally
18:59 imirkin: so that you can have multiple draws in progress with the same constbufs bound but having different data
18:59 imirkin: this is semi-common where you update some const; draw; update const; draw; update const; draw
19:00 imirkin: otherwise you'd have to make copies of the entire buffer, or introduce stalls
19:00 Subv: ah
19:01 imirkin: (compute can only have 8 constbufs actually)
19:01 Subv: so if you were to write to a CB using CB_DATA, synchronize, and then read it from the CPU, you should see your written value, yes?
19:01 imirkin: pppprobably
19:01 imirkin: not sure.
19:02 imirkin: what do you mean by synchronize?
19:02 imirkin: do you mean graph method 0x100 (or 0x110, i forget)
19:02 Subv: not really, i mean wait for the draw operation to finish
19:02 imirkin: or do you mean waiting for a fence?
19:02 Subv: a fence sounds better, yes
19:02 imirkin: it's all a bit tricky
19:02 imirkin: depends who signals the fence
19:03 Subv: the GPU can signal it with QUERY_SEQUENCE right?
19:03 Subv: s/QUERY_SEQUENCE/QUERY_GET/
19:03 imirkin: if it's pfifo signalling the fence
19:03 imirkin: then that happens when the commands are submitted
19:03 imirkin: but then they can take any amount of time to process
19:03 karolherbst: Subv: you mean you want to have a const buf with some data and then update the data while the shader is running from the host and have some sync stuff inside that shader to synchronize?
19:04 imirkin: you have to use ... let's see...
19:04 karolherbst: Subv: you can't write to a CB from the shader, nor do I think you can update the memory of a CB from the host
19:04 imirkin: right. have a look at nvc0_screen_fence_emit
19:04 karolherbst: well at least while a shader is running
19:04 imirkin: (so yes, GET_FENCE)
19:04 karolherbst: imirkin: or would that be possible?
19:05 imirkin: (there are like 20 diff ways to fence things)
19:05 imirkin: anyways, i *suspect* that waiting on that fence would also flush out any CB_DATA things
19:05 imirkin: but i'm not entirely 100% clear on what all the internal queues are
19:05 karolherbst: but, isn't it pointless to read from a CB anyway, because you would only just read out what you wrote into it in the first place?
19:05 imirkin: there's also a more general synchronize call
19:06 imirkin: we call it NV50_GRAPH_SERIALIZE
19:06 imirkin: this is basically a monkey wrench you can throw in, which will wait for all previous operations to complete before starting any new ones
19:06 imirkin: i.e. it'll quiesce the graph unit, etc
19:07 imirkin: this can matter when you want to sync something but don't want or need the CPU involvement
19:07 imirkin: e.g. inter-engine, etc
19:09 imirkin: [it's almost like this hardware is complex...]
19:10 Subv: heh
19:12 Subv: to see if i get how CBs work: you upload a shader, set the CB_SIZE/ADDRESS, call CB_BIND with some index, and the shader automagically knows which buffer to use when executing?
19:13 Subv: ie, you bind the current CB_ADDRESS to the index you passed to CB_BIND?
19:13 karolherbst: Subv: you can access them directly
19:13 karolherbst: in mesa/nouveau we use c0[] ... c15[]
19:13 karolherbst: as a source to the instructions
19:14 Subv: so say, CB_ADDRESS = 0x1234, CB_BIND{index=5} would mean that c5[0xA] is accessing 0x1234 + 0xA right?
19:14 karolherbst: I am not quite sure how that index stuff works out exactly
19:15 karolherbst: but yeah, I guess this sounds kind of right
19:15 karolherbst: there might be alignment requirements as well though
19:15 Subv: i'm mostly speculating, since i haven't actually looked at shaders yet, that's going to be fun
19:15 karolherbst: and maybe some shifting is done on the address as well
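[editor's note] A toy model of the binding idea as guessed at above: a bind associates the current address/size with a slot, and cN[offset] then resolves to base + offset. It deliberately ignores the alignment and shifting caveats just raised, and it is not a description of what the hardware does. The class and method names are hypothetical; the limits of 16 slots and 64k come from the conversation.

```python
class ConstBufBindings:
    """Toy per-stage table mapping constbuf slots c0..c15 to (address, size)."""

    MAX_SLOTS = 16       # "up to 16 constbufs that can be bound"
    MAX_SIZE = 0x10000   # "each one is up to 64k"

    def __init__(self):
        self.slots = {}

    def bind(self, index: int, address: int, size: int) -> None:
        """Record that slot `index` now refers to `size` bytes starting at `address`."""
        if not 0 <= index < self.MAX_SLOTS:
            raise ValueError("slot index out of range")
        if not 0 < size <= self.MAX_SIZE:
            raise ValueError("size out of range")
        self.slots[index] = (address, size)

    def resolve(self, index: int, offset: int) -> int:
        """Return the address that c<index>[offset] would touch in this model."""
        address, size = self.slots[index]
        if not 0 <= offset < size:
            raise IndexError("offset outside the bound buffer")
        return address + offset


bindings = ConstBufBindings()
bindings.bind(5, 0x1234, 0x100)          # the example values from the conversation
print(hex(bindings.resolve(5, 0xA)))     # 0x123e
```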
19:15 karolherbst: Subv: are you planning to reimplement all that stuff somehow?
19:16 karolherbst: just wondering, because I think it might be easier to just mess around with mesa and nouveau to get a better understanding
19:16 karolherbst: because those things are pretty much the same
19:17 Subv: yes the idea is to reimplement all this
19:17 karolherbst: why not port mesa?
19:17 karolherbst: the code is already there
19:17 karolherbst: just the interfaces are different
19:17 Subv: that might work for a homebrew library (and iirc that's what the guys developing the switch homebrew sdk are doing)
19:18 Subv: but i'm working towards getting a functional emulator
19:18 karolherbst: I don't see how that helps with an emulator
19:18 karolherbst: what do you want to target?
19:18 karolherbst: native GPU interfaces? OpenGL? Vulkan?
19:18 Subv: gotta know how the shaders get their data if i want to be able to execute them when the games issue draw calls
19:19 karolherbst: okay sure, so you want to do Nvidia GPU HW API -> OpenGL?
19:19 Subv: either OpenGL or Vulkan, right now i'm just writing an API-agnostic frontend
19:19 Subv: something like that, yes
19:20 karolherbst: mhh, I see
19:22 Subv: so far all i have are vertices, attributes and a pointer to the shader code, the emulator still doesn't do anything when the games issue DrawArrays commands
19:22 karolherbst: well
19:22 karolherbst: you kind of need to translate the shader binary to glsl or not?
19:22 karolherbst: or what does the emulator run on?
19:23 Subv: yes we'll write a recompiler eventually
19:23 Subv: perhaps an interpreter, for starters, not sure yet
19:23 Subv: the idea is for it to be cross-platform
19:24 Subv: (Windows, linux, OS X, Android maybe)
19:24 karolherbst: right. I was just wondering in general if it is just easier to implement the higher level API, because this should give you a much simpler translation to OpenGL without having to deal with all those low level hardware things
19:25 karolherbst: but yeah
19:25 karolherbst: I could imagine that emulating a maxwell GPU can be quite some work
19:25 karolherbst: maybe you only need a really small amount of features
19:26 karolherbst: Subv: are you using fake shaders for now? Maybe just color all the vertices blue and just concentrate on this?
19:27 karolherbst: usually all those buffer things are only needed for more advanced features generally
19:27 karolherbst: getting something super simple working might be helpful. And by that I mean having a simple shader doing some maths from uniform inputs and writing into gl_FragColor (or whatever the equivalents on switch are for those things)
19:28 karolherbst: mhh although uniforms are located in the c0[] buffer :(
19:31 Subv: there isn't any kind of renderer yet :) but yeah iirc Ryujinx (another Switch emulator) was doing that and managed to draw the first few logos of a game with it
19:32 karolherbst: Subv: here is an idea: maybe you could take the nouveau mesa stuff and put your emulator code between and run it on something else?
19:33 karolherbst: Subv: ohh important question: are you working on an open source emulator?
19:33 Subv: what do you mean?
19:33 Subv: yes it's open source: https://github.com/yuzu-emu/yuzu
19:34 karolherbst: Subv: well, just instead of sending those commands to the nouveau kernel driver, send it to your translation layer
19:34 karolherbst: this should allow you to test much more simple stuff
19:35 Subv: interesting idea, i'll try it out when i get some time to install linux
19:36 Subv: for the time being it's mostly about satisfying my curiosity and getting to know more about how modern nvidia GPUs work
19:36 karolherbst: yeah, I got that
19:36 karolherbst: but for that it might be easier to just use nouveau and play around there
19:36 karolherbst: you have the full source
19:36 karolherbst: and you can change things and see what happens
19:36 karolherbst: or even dump the shaders and everything
19:37 karolherbst: Subv: generally being able to run the userspace stuff against a hardware emulator might be a reasonable thing to have in the long term, although I see that there won't be enough devs working on such stuff so that we will get anything useful
19:37 Subv: yep, i've been using nouveau as a reference to see what each method does, it's been really helpful
19:38 karolherbst: but maybe that might help out with the switch emulating stuff
19:38 Subv: the codebase is rather big though, and sometimes i get lost in it
19:38 karolherbst: and we could have an emulator doing enough to just verify with some CI testing, that the emulator code works as expected
19:38 karolherbst: like we have OpenGL test suites
19:39 karolherbst: and if they hit regressions while running against the emulator, your emulator might have gotten a new bug or something
19:39 karolherbst: just a random thought I have
19:39 Subv: i really want to write a "passthrough" renderer that just forwards all method calls to a real Maxwell GPU (perhaps on a Jetson TX1) to see how far that gets
19:40 karolherbst: mhhh
19:42 Subv: that's still a long ways off, firstly i gotta keep reading and asking questions to improve my understanding of nvidia gpus :P
19:42 karolherbst: sure
20:08 imirkin: Subv: it's tricky, since you have to make sure the VA for the passthrough channel is set up the same way
20:09 imirkin: commands refer to memory addresses ... those memory address can really come from anywhere, including vertex buffers, in a bindless texture type of setup
20:09 imirkin: so if you don't have an identical mapping, you're in for a world of pain
20:18 karolherbst: nice. 68179 vs 68166 passed tests in piglit tests/all :) there are even some tests not passing with TGSI. I should look into those and figure out why they don't
20:21 karolherbst: mhh, only the interpolateAt fails are interesting here though
20:27 karolherbst: imirkin: is there some kind of code style we should stick to in mesa/nouveau? Like max line size? I found some lines with above 80 chars, so that's why I am wondering
20:33 imirkin: 80 chars mostly, but probably not 100%
20:33 imirkin: sometimes it makes sense to go over a little
20:34 imirkin: should be rare though
21:48 plutoo: hi
21:48 plutoo: are registers for class 0xb0b5 documented anywhere?
21:54 karolherbst: plutoo: that's MAXWELL_DMA_COPY_A right?
21:55 plutoo: yep
21:55 plutoo: i managed to "figure out" where src/dst addresses go
21:55 karolherbst: check nve0_bo_move_copy in nouveau kernel module
21:55 plutoo: src_hi goes to 0x100 and src_lo goes to 0x101
21:55 karolherbst: this looks like a sw method
21:56 karolherbst: but imirkin should know it more precisely
21:57 plutoo: seem to match up
21:58 plutoo: i was wondering more precisely about the flags written to 0x300: 0x386
21:59 plutoo: and if there's any way to get this dma engine to perform blocklinear tiling for me
22:00 karolherbst: advanced stuff I don't know anything about :) skeggsb or imirkin should know the details here
22:04 plutoo: any clue what m2mf means in "nvc0_bo_move_m2mf"
22:05 karolherbst: at some point I knew
22:05 karolherbst: I think it is an engine or something though
22:06 karolherbst: or rather part of the graph engine
22:06 karolherbst: plutoo: "memory to memory format. An object used to copy blocks of memory."
22:06 karolherbst: plutoo: https://nouveau.freedesktop.org/wiki/NouveauTerms/
22:08 plutoo: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nouveau_bo.c#L958
22:08 plutoo: thanks
22:08 plutoo: this looks promising
22:09 plutoo: (but this is nv50 not nvc0)
22:10 plutoo: i wonder if newer dma engines automatically detect tiling via "kind"
22:10 karolherbst: then check nvc0_bo_move_m2mf :p
22:10 plutoo: that would be perfect
22:10 plutoo: that one doesn't seem to have any logic for tiling
22:11 plutoo: if that is handled transparently, then i will be impressed
22:12 karolherbst: I doubt it
22:13 karolherbst: I think we actually have code in mesa for that kind of stuff
22:15 Subv: what does the "ipa" instruction mean in a disassembled fragment shader?
22:17 karolherbst: Subv: I think this is load of interpolated input
22:17 karolherbst: mind sharing the shader?
22:17 karolherbst: or which tool gave you ipa, envydis?
22:17 Subv: envydis
22:17 Subv: let me upload it to a gist
22:19 Subv: karolherbst: https://gist.github.com/Subv/8ce04106c3a94513b5b3dad79cf9c5a9 disassembled vertex & fragment shaders
22:19 karolherbst: yeah, ipa is mesas interp instruction
22:19 Subv: i also don't know what the "a" array is
22:19 karolherbst: or linterp?
22:19 karolherbst: Subv: input
22:19 karolherbst: fp shader input
22:20 Subv: is it input to both vertex and fragment shaders?
22:20 Subv: the vertex shader seems to write to it, too
22:20 karolherbst: or vertex shader output :p
22:21 karolherbst: Subv: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#n35
22:21 karolherbst: but this is not everything
22:21 karolherbst: Subv: anyway, it is used to pass values between shader stages
22:22 Subv: ah
22:22 karolherbst: Subv: fragment shader write the output into registers starting at $r0
22:22 karolherbst: ipa $r0 a[0x80] $r4 0x0 0x1
22:22 karolherbst: $r4 is the input being interpolated
22:22 Subv: do you have some pseudocode of what the ipa instruction does? (or what the operands mean)
22:22 karolherbst: 0x7c is gl_Position I think
22:23 karolherbst: let me check something
22:23 plutoo: if the dma engine on nv50 supports tiling, so should nvc0, right
22:24 karolherbst: Subv: well I would have to guess here now
22:24 karolherbst: Subv: uhm...
22:24 karolherbst: Subv: it's complicated
22:24 karolherbst: Subv: read up on interpolation
22:25 karolherbst: Subv: don't worry about it
22:25 karolherbst: it is up to the driver to do the correct thing
22:25 Subv: can i assume that it's a lerp?
22:26 karolherbst: I am sure it isn't
22:26 karolherbst: that might be linterp
22:26 Subv: mm
22:27 Subv: what do the last two numbers in the instruction mean?
22:27 karolherbst: or maybe ipa handles both... no idea
22:27 karolherbst: uhm
22:27 imirkin: plutoo: there's micro and macro-tiling
22:27 karolherbst: Subv: should be the interpolation mode
22:27 karolherbst: or wait, that is pass
22:27 imirkin: plutoo: here's how we drive the kepler+ copy class: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c#n109
22:28 karolherbst: the envydis syntax confuses me, because it is different from what we use in mesa
22:28 imirkin: IPA = interpolate. A is the nvidia term for "shader input or output"
22:28 imirkin: (e.g. AST / ALD in other stages)
22:28 karolherbst: imirkin: well, gs uses o for output, right?
22:28 imirkin: no
22:28 karolherbst: or is it all the same
22:29 imirkin: tess control shaders can use ALD.O to load outputs
22:29 imirkin: from other invocations
22:29 karolherbst: ahh, right
22:29 imirkin: but that's a bit of a peculiarity of tess control shaders
22:29 karolherbst: I think with mesa we actually use o for outputs anyway...
22:29 Subv: karolherbst: do you mean ipa pass? that's another weird instruction
22:29 karolherbst: and a for inputs
22:29 imirkin: Subv: IPA interpolates an input based on some params given to it, as well as based on what's in the shader program header
22:30 imirkin: it can either multiply by 1/w (that's the "mul" option), or "pass", i.e. pass through the value directly, as would be done for a flat-interpolated input
22:30 Subv: ah
22:31 imirkin: there are also ways of using IPA so that you interpolate at an offset from center (used for interpolateAtOffset)
22:31 imirkin: as well as for interpolating at the current sample / centroid (the options are equivalent -- sample only works in SSAA mode anyways)
22:32 plutoo: imirkin: does copy class take into account what "kind" an addr is mapped as?
22:32 plutoo: and perform automatic tiling on that
22:32 imirkin: plutoo: not directly, but the PTE's have that
22:32 imirkin: that's the microtiling bit
22:33 plutoo: yeah
22:33 plutoo: can i get the copy class to do microtiling?
22:33 imirkin: you can't get it to not do that.
22:33 Subv: i see, so "ipa pass $r4 a[0x7c] 0x0 0x0 0x1; mufu rcp $r4 $r4" takes a flat-interpolated gl_Position.w and calculates 1/w, which is then passed to the other ipa calls
22:33 imirkin: Subv: precisely.
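The sequence Subv describes can be modelled as a small data flow. This is only an illustrative sketch: `ipa_pass`, `mufu_rcp`, and `ipa_mul` are hypothetical stand-ins for the instructions, the numeric inputs are made up, and real IPA behaviour also depends on the shader program header, which is not modelled here.

```python
def ipa_pass(attr):
    """Toy model of 'ipa pass': hand the attribute value through unchanged."""
    return attr


def mufu_rcp(x):
    """Toy model of 'mufu rcp': reciprocal computed by the multi-function unit."""
    return 1.0 / x


def ipa_mul(attr, factor):
    """Toy model of 'ipa mul': scale the interpolated attribute by a factor."""
    return attr * factor


# "ipa pass $r4 a[0x7c] ...; mufu rcp $r4 $r4"
r4 = mufu_rcp(ipa_pass(2.0))   # 2.0 is a made-up value read from a[0x7c]

# A later "ipa mul" on some other input reuses $r4 as its factor.
color = ipa_mul(3.0, r4)       # 3.0 is a made-up attribute value
```

The point is only the dependency chain: the reciprocal is computed once and then fed to the remaining interpolation instructions.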
22:33 plutoo: so it automatically looks up what kind it is from the PTE?
22:33 imirkin: plutoo: the VM abstracts the microtiling stuff away
22:33 plutoo: what VM
22:34 imirkin: i mean the MMU/etc
22:34 plutoo: sweet
22:34 imirkin: some engines have "deep" level of access to the underlying memory, but then it's on them to worry about all the little details
22:34 imirkin: you can run into situations where you do something incompatible and they scream
22:34 imirkin: so ... "don't do that"
22:35 karolherbst: what is that mufu there for anyway?
22:35 karolherbst: multi function unit?
22:35 imirkin: mufu is same as sfu i think
22:35 imirkin: yea
22:35 karolherbst: ahh
22:35 plutoo: so i can do a copy from "Pitch" to "Generic_16BX2"
22:35 plutoo: and it will automatically do tiling for me
22:35 imirkin: plutoo: of course
22:35 plutoo: that's.. awesome
22:35 imirkin: well wait
22:35 imirkin: plutoo: 16BX2 is not microtiling
22:35 imirkin: 16BX2 is macrotiling
22:36 imirkin: so you have to pass in the proper tile mode (which is not the memory "kind")
22:36 imirkin: plutoo: have a look at https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c#n109
22:36 imirkin: i believe "memtype" == "kind". and "tile_mode" is different from all that
22:45 plutoo: cpp = bytes per pixel?
22:52 imirkin: yes
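For the "Pitch" (linear) side of such a copy, the byte offset of a pixel follows directly from the row pitch and cpp. A minimal sketch, assuming a hypothetical helper name `pitch_linear_offset`; it covers only the linear layout and does not attempt to model the tiled (e.g. 16BX2) side, which depends on the tile mode.

```python
def pitch_linear_offset(x, y, pitch, cpp):
    """Byte offset of pixel (x, y) in a pitch-linear surface.

    pitch: bytes per row (may include padding)
    cpp:   bytes per pixel
    """
    return y * pitch + x * cpp


# Pixel (3, 2) in a surface with 256-byte rows and 4 bytes per pixel.
offset = pitch_linear_offset(3, 2, pitch=256, cpp=4)
```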
22:55 imirkin: Subv: i have to say, that fragment shader is surprising. they're fetching a full vec4, but only using its .w... very odd.
22:56 karolherbst: imirkin: output?
22:56 imirkin: oh duh
22:56 imirkin: it's probably like out = in; if (in.w < uniform) discard;
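The guessed shader shape can be written out as a short sketch. Everything here is an assumption for illustration: the function name `fragment`, the tuple representation of the vec4, and the use of `None` to stand for a discarded fragment.

```python
def fragment(color_in, threshold):
    """Toy model of: out = in; if (in.w < uniform) discard;

    color_in:  (x, y, z, w) tuple standing in for the fetched vec4
    threshold: the uniform the .w component is compared against
    Returns the output colour, or None if the fragment is discarded.
    """
    if color_in[3] < threshold:
        return None  # discard
    return color_in


kept = fragment((1.0, 0.5, 0.25, 0.9), threshold=0.5)
dropped = fragment((1.0, 0.5, 0.25, 0.1), threshold=0.5)
```

This also explains why the full vec4 is fetched: all four components go to the output, while only .w feeds the comparison.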
22:56 karolherbst: yeah
22:58 karolherbst: imirkin: is nvidia usually using const buffer with random ids or is there some internal reason for that?
22:58 karolherbst: or just a lot of const bufs and some unused in that shader
22:59 imirkin: huh?
23:00 imirkin: you mean why c8 and not c0
23:00 karolherbst: more like they use c3, c4 and c7
23:00 imirkin: could be a ubo
23:00 karolherbst: ohh right
23:00 imirkin: but more generally, they manage these differently than we do
23:01 imirkin: could be caching implications
23:01 imirkin: i really have no idea though
23:25 Subv: 00000050: 20570400 d8320080 texs nodep $r2 $r0 $r4 $r5 0x8 t2d rgba
23:25 Subv: i assume this is ($r2, $r3, $r0, $r1) = texture2D(handle(8), coords: ($r4, $r5)) right?
23:31 imirkin: interesting
23:31 imirkin: so actually i'm not intimately familiar with the "texs" variant
23:31 imirkin: it's new on maxwell
23:31 imirkin: let's see what nvdisasm says...
23:32 imirkin: TEXS.NODEP R2, R0, R4, R5, 0x8, 2D, RGBA
23:32 imirkin: ok, so it's decoded correctly, that's nice
23:33 imirkin: Subv: you're most likely correct. except not quite handle(8)
23:33 imirkin: more like cN[0x8 * 4]
23:33 imirkin: where N = the value passed into ... some method.
23:33 imirkin: TEX_CB_INDEX
23:33 imirkin: and it reads the handle out of there
23:50 Subv: ah, thanks!
23:50 imirkin: the handle is a combination of a TSC and TIC id (or just one id if LINKED_TSC = 1), which in turn are indices into the global TIC/TSC tables, which define the texture view and sampler, respectively
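The lookup described above can be sketched as two steps: find the handle in the const buffer, then split it into its two ids. The bit layout is an assumption, not something stated in this conversation: the sketch puts the TIC id in the low 20 bits and the TSC id in the bits above. The names `handle_byte_offset` and `split_handle` are hypothetical.

```python
TIC_BITS = 20  # assumption: TIC id in the low 20 bits, TSC id above it


def handle_byte_offset(index):
    """Byte offset of the handle inside cN, i.e. cN[index * 4]."""
    return index * 4


def split_handle(handle, linked_tsc=False):
    """Split a texture handle into (tic_id, tsc_id).

    With linked_tsc set, a single id indexes both tables.
    """
    tic_id = handle & ((1 << TIC_BITS) - 1)
    tsc_id = tic_id if linked_tsc else handle >> TIC_BITS
    return tic_id, tsc_id


# The 0x8 operand of the texs instruction above selects cN[0x20].
offset = handle_byte_offset(0x8)

# Made-up handle with TSC id 3 and TIC id 5.
tic_id, tsc_id = split_handle((3 << TIC_BITS) | 5)
```

Which const buffer N is used comes from the TEX_CB_INDEX method and is not modelled here.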
23:51 imirkin: (i hope you have quite some familiarity with graphics pipelines and capabilities... this is not for the faint-of-heart)
23:52 Subv: i have some superficial knowledge