02:57 mooch2: mwk: any idea what pfifo's cache1 ctx registers do on nv33? what about when one is marked dirty?
02:57 mooch2: *nv3
10:15 mwk: mooch: the CTX registers as in the array at 0x3280?
10:17 mwk: these select the object bound to every subchannel
10:17 mooch: ah
10:18 mwk: IIRC it was strange on NV3
10:18 mooch: but then what does the ctx dirty bit do?
10:18 mwk: the 0x3280 reg array is only for the current channel
10:18 mooch: also, i'm trying to get this loop to stop being infinite: https://pastebin.com/U0BnPQD7
10:18 mooch: any ideas?
10:18 mooch: ah
10:18 mwk: when PFIFO does a channel switch, the CTX array gets dumped to / reloaded from RAMFC
10:19 mwk: along with the SUBC field from pull state
10:19 mooch: ah, fair enough
10:19 mwk: the DIRTY bit keeps track of whether the switchable state changed since context load
10:19 mooch: ah, fair enough
10:19 mwk: if it's not set on a channel switch, PFIFO skips writing back CTX to RAMFC
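A toy model of the behaviour mwk describes above may help when emulating it. This is only a sketch: the class name, the eight-subchannel array size, the dictionary standing in for RAMFC, and the choice to set DIRTY only on a CTX write are all assumptions made for illustration, not details confirmed in this log.

```python
class Cache1Context:
    """Toy model of the switchable CACHE1 state for the resident channel."""

    def __init__(self, num_subchannels=8):  # array size is an assumption
        self.chid = 0
        self.ctx = [0] * num_subchannels   # stands in for the array at 0x3280
        self.subc = 0                       # SUBC field from pull state
        self.dirty = False

    def bind(self, subc, value):
        # Changing the object bound to a subchannel marks the state dirty.
        self.ctx[subc] = value
        self.dirty = True

    def switch_channel(self, ramfc, new_chid):
        # Write back to RAMFC only if the state changed since it was loaded.
        if self.dirty:
            ramfc[self.chid] = (list(self.ctx), self.subc)
        # Reload the incoming channel's state (zeros if nothing was saved).
        saved_ctx, saved_subc = ramfc.get(new_chid, ([0] * len(self.ctx), 0))
        self.ctx = list(saved_ctx)
        self.subc = saved_subc
        self.chid = new_chid
        self.dirty = False
```

Here `ramfc` is just a Python dict keyed by channel id; a real emulator would read and write instance memory instead.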
10:20 mooch: ah, yeah
10:20 mooch: anyway...
10:20 mooch: i've determined that all the official windows nvidia drivers hang in fifoService on nv3
10:20 mooch: and i'm trying to make that infinite loop actually work
10:21 mwk: mooch: do you have that as a trace?
10:21 mooch: for some reason, the entry in the CACHE1_ADDR and CACHE1_DATA never gets updated
10:21 mwk: that IDA dump is kind of unreadable
10:21 mooch: i have a decompiled version
10:21 mooch: gimme a sec
10:21 mwk: that thing looks decompiled
10:21 mwk: badly
10:22 mooch2: it is
10:22 mooch2: by ida pro
10:22 mooch2: 7.0
10:23 mooch2: https://pastebin.com/jWJ6XECK
10:23 mooch2: mwk here's a register read/write trace
10:23 mooch2: i was able to make sense of enough of the reads and writes to mmio regs in the decompiled code that i could confirm it was this function
10:23 mwk: yeah, that's kind of obvious to anyone who's seen IDA output :p
10:24 mooch2: lol fair enough
10:24 mwk: is there a chance of getting the read values from registers in this trace?
10:24 mooch2: but yeah, apparently, the index into the CACHE1_ADDR and CACHE1_DATA arrays is supposed to increase, but it doesn't for some reason
10:24 mooch2: okay, gimme a sec
10:26 mooch2: keep in mind that due to the way my code works, you'll have to piece together the bytes into words
10:26 mooch2: sorry, i tried redoing it the correct way, but vesa wouldn't work
10:26 mooch2: "
10:28 mooch2: what the fuck?
10:28 mooch2: all i'm reading out are zeroes
10:28 mwk: that's kind of bad
10:28 mooch2: yeah, and i know my code's right
10:28 mooch2: i just checked it
10:29 mooch2: I'M A FUCKING DUMBASS
10:29 mooch2: i accidentally made pfifo end at 0x3000 instead of 0x4000
10:29 mooch2: wow
10:29 mooch2: what a fucking joke
10:30 mooch2: lemme check it now
10:31 mooch2: okay, it sorta works now
10:31 mooch2: but the loop still doesn't terminate
10:32 RSpliet: mooch2: last dumbass I spoke to didn't know what a pfifo was. Or a graphics card. Or anything beyond "press start to open solitaire or shut down the computer" really :-P
10:32 mooch2: lol fair enough
10:32 mooch2: at least i can learn from my mistakes i guess
10:32 RSpliet: :-D
10:32 mooch2: anyway, the index into the arrays updates now
10:33 mooch2: but i still can't get the loop to terminate
10:33 mooch2: hold on, hastebin's being a bit slow lol
10:33 mooch2: large log files'll do that
10:34 mooch2: nvm, uploading to mega
10:35 mooch2: it's like 10 mb tho, and considering most of you guise are in europe, it shouldn't be too hard for you to download :3
10:35 mooch2: hell, i can download that in a little over a second, and i'm in a ghetto neighborhood in the us
10:35 mooch2: https://mega.nz/#!149D2aIZ!F_kSPn7kPysVQGpEk38-jB8H2wqIPV42LPzXZ5fKLT0
10:35 mooch2: mwk, here's the log
10:37 mooch2: i have a suspicion that this has to do with my implementation of gray code but
10:37 mooch2: i took it from envytools lol
10:38 mooch2: either that or how i apply the gray code to the get and put state
10:38 mooch2: i dunno
10:47 mwk: mooch2: that's kind of a large file and I'll look at it when I'm back home
10:47 mwk: on mobile internet right now
10:48 mooch2: mwk, just start where it reads from 3300 for the first time
10:48 mooch2: that's the start of the relevant part
10:49 mwk: mooch2: yeah, but it's not the kind of file that I want to download on gprs
10:49 mooch2: ah okay sorry
10:49 mooch2: what's gprs?
10:49 mwk: craptastic cellular internet
10:51 mooch2: ah fair enough
10:51 RSpliet: dial-up speed, without the crackly sounds
10:51 mooch2: lol
10:51 mooch2: my cellular internet is about 50 mbps down, 20 mbps up, and i have 10 gb of data
10:52 mooch2: then again, i'm in a highly populated city so
10:52 mooch2: or metroplex, rather
10:52 mooch2: my home internet is 60 mbps down, 5 mbps up, and no data cap
10:52 mooch2: hell, i once downloaded like, 1.5 TERABYTES on this connection, and it still didn't cap out
10:52 RSpliet: GPRS is 64kbit/s down, 42 kbit/s up
10:52 mwk: mooch2: my cellular internet is also quite powerful usually
10:53 mwk: except right now I'm on a train
10:53 mooch2: the only problem was that my torrent client crapped out lol
10:53 mooch2: ah, fair enough
10:53 mwk: and it's gprs on good patches of the track, no service on bad patches
10:53 mooch2: even in a car, i can get 4g so
10:53 mooch2: oh wow
10:53 mooch2: in the us, most people can get 4g on the train too
10:54 mooch2: so i don't know why your train is so shitty about this
10:54 mwk: depends on the train, here
10:54 mooch2: ah, fair enough
10:54 RSpliet: We have wifi in most trains... with occasional homeopathic internet
10:54 mooch2: ah nice!!
10:54 mooch2: good thing most home internet connections in the us don't have data caps at all tho
10:55 mwk: on this connection, 4g works fine on the first 50km of the track
10:55 mooch2: too bad some parts of the us are still stuck on dial-up tho...
10:55 mwk: and then it's like in the middle of nowhere
10:55 mooch2: geez, mwk, did you go to a conference or something?
10:56 mooch2: because it sounds to me like you might've left the country lol
10:56 mooch2: considering europe has much smaller countries :/
10:57 mwk: nah, just intercity in poland
10:57 mwk: katowice -> warsaw
10:58 mooch2: ah, fair enough
10:58 mooch2: yeah, poland is a bit of a big country for europe
11:15 mooch2: mwk, i have the feeling that this loop is trying to get to 3400
11:16 mooch2: it gets to 33fc and then goes back down again
11:16 mooch2: why?
11:22 mwk: it cannot reach 3400, that's out of bounds
11:23 mwk: the cache array isn't that big
11:23 mooch: no it's not
11:23 mooch: is it?
11:23 mooch: ah crap you're right
11:23 mooch: then what is it supposed to do?
11:23 mooch: what does the get variable in each pfifo cache do?
11:23 mwk: and it goes back because 33fc isn't the last entry
11:24 mooch: ?????????
11:24 mwk: because gray code
11:24 mooch: ah
11:24 mooch: yeah, but it just keeps going forward then back, then forward, then back
11:24 mooch: it never ends
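The "goes forward, then back" pattern is what a Gray-coded pointer looks like when its raw value is read as an address. The sketch below uses the standard reflected binary Gray code; the 5-bit width is an assumption chosen for illustration, and the function names are made up here rather than taken from envytools.

```python
def bin_to_gray(n):
    # Standard reflected binary Gray code.
    return n ^ (n >> 1)


def gray_to_bin(g):
    # Inverse conversion: XOR together all right-shifts of g.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def next_gray(g, bits):
    # Advance a Gray-coded pointer by one slot, wrapping at 2**bits.
    return bin_to_gray((gray_to_bin(g) + 1) % (1 << bits))


BITS = 5  # assumed pointer width, for illustration only
seq = [bin_to_gray(i) for i in range(1 << BITS)]
```

In this sequence the all-ones value sits at step 21 rather than at the end, the value that follows it is numerically smaller, and the final value before wrapping has only the top bit set. So a pointer that reaches the highest-looking address and then drops is advancing normally; whether the loop terminates depends on the get/put comparison, not on reaching the top address.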
11:25 mwk: hm, is your put pointer right?
11:25 mwk: and the empty calc in status reg?
11:26 mooch: the put pointer is never used
11:26 mooch: what do you mean by empty calc in the status reg?
11:28 mooch2: can you explain all this to me please? it's not documented ANYWHERE?
11:35 mooch2: mwk
11:53 mwk: mooch2: is there a read from 0x3214?
11:53 mooch2: yeah
11:54 mooch2: what calculations are supposed to be done on reads from 0x3214?
11:54 mooch2: mwk
12:07 mooch2: wait, mwk, are you talking about this? cache1_get
12:10 mooch2: ah crap hold on
12:10 mooch2: mwk, do you mean this? https://github.com/envytools/envytools/blob/master/hwtest/pfifo.cc#L460
12:14 mooch2: THANK YOU
13:25 mooch2: mwk, how big is ramin btw, and how is it addressed?
13:25 mooch2: apparently, my ramht address calculations are wack as fuck
13:25 mooch2: on nv3 i mean
13:34 stsquad-on-arm64: what is BAR1/2 as in:
13:34 stsquad-on-arm64: nouveau 0001:00:00.0: fifo: write fault at 0000150000 engine 05 [BAR2] client 08 [HOST_CPU_NB] reason 02 [PTE] on channel -1 [003fe36000 unknown]
13:35 stsquad-on-arm64: is that part of the VMM on the card?
13:35 imirkin: the BAR id :)
13:35 imirkin: lspci to see the list of BARs
13:35 imirkin: i believe it's 0-indexed
13:36 stsquad-on-arm64: imirkin: so this is a write fault while writing to the card's own BAR or the BAR for PCIe?
13:37 imirkin: i'm not 100% sure what that means... HOST_CPU_NB = northbridge. basically the GPU tried to write to sysmem but got some kind of fault.
13:37 imirkin: i don't precisely know what that combination means
13:37 imirkin: skeggsb_ might
13:43 imirkin: stsquad-on-arm64: er wait. i misread that.
13:43 imirkin: the client is the CPU, so the CPU is doing some accesses to BAR2 (which is basically accessing the gpu vm). and it tries to access an unmapped page.
13:43 imirkin: s/access/write/
14:10 stsquad-on-arm64: imirkin: I'd enabled NOUVEAU_DEBUG_MMU in the hope of tracking the mappings - but I see nothing else in the dmesg - but maybe I need to explicitly enable in my config?
14:11 stsquad-on-arm64: has nouveau as a module now, so theoretically can load and unload it
14:16 stsquad-on-arm64: http://ix.io/1d02
14:16 stsquad-on-arm64: ^ ahh there we go, dump with options nouveau config=NvClkMode=auto debug="PCE0=debug,PCE1=debug,BARCTL=debug" noaccel=1 nofbaccel=1
14:21 mooch2: can someone please tell me conclusively how ramht lookups work on nv3? i can't fucking figure it out
14:21 mooch2: i've used code from both xqemu and mame
14:21 mooch2: neither one produced the correct results
14:39 imirkin: mupuf: can you set up pendingchaos with our shader db?
14:39 imirkin: stsquad-on-arm64: PCE = copy engine, btw. i don't think your gpu has either CE0 or CE1...
14:40 pendingchaos: there's more than https://cgit.freedesktop.org/mesa/shader-db/?
14:40 imirkin: stsquad-on-arm64: also on newer kernels (since 4.3) the engine names have all changed
14:40 imirkin: pendingchaos: yes
14:40 mooch2: imirkin, do you think you could answer my question?
14:41 imirkin: mooch2: nope
14:41 mooch2: damn
14:41 mooch2: mwk, how about you?
15:12 mwk: mooch2: I haven't tested that part, but
15:12 mooch: ah damn
15:12 mwk: the general idea is that there is a hash function
15:12 mooch: yeah, but i need to know the internals of that hash function
15:12 mwk: which is run on channel id and object handle
15:12 mwk: you should be able to find that hash function in the driver
15:12 mooch: any idea what it's called?
15:13 mwk: if not... I should be able to look it up quickly
15:13 mooch: because i already looked lol
15:13 mooch: ah okay thanks
15:13 mwk: just search for hash?
15:13 mooch: i did
15:13 mooch: all i got was some weird shit about hash trees
15:13 mooch: anyway, i've tried both xqemu's and mame's hash functions and they didn't work
15:14 mooch: btw, the first 16kb of ramin is zeroed out, so
15:14 mooch: for some reason
15:14 mooch: i don't know why the driver does that
15:14 mwk: hash = ((((name) ^ ((name) >> 8) ^ ((name) >> 16) ^ ((name) >> 24)) & 0xFF) ^ ((chid) & 0x7F));
15:15 mwk: mooch: that would be the hash table init
15:16 mooch: ah thanks
15:16 mwk: anyway, that's the hash calculations
15:16 mooch: where did you find that?
15:16 mwk: it um
15:16 mwk: I found that code on pastebin
15:16 mooch: ah
15:16 mooch: what did you search?
15:16 mooch: i have no idea how to search pastebin
15:16 mwk: I don't remember
15:16 mooch2: oh
15:16 mooch2: weird
15:17 mooch2: well thanks
15:17 mwk: anyhow
15:17 mwk: this is the hash
15:17 mwk: it is 8-bit, obviously
15:17 mwk: so there are 256 buckets
15:17 mooch2: ah, yeah
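For reference, the one-line hash mwk pasted can be written as a small function. This is a direct transcription of that expression; the function name and the explicit 32-bit masking of the handle are additions made here.

```python
def ramht_hash(name, chid):
    """8-bit RAMHT bucket index from an object handle and a channel id."""
    name &= 0xFFFFFFFF  # treat the handle as a 32-bit value
    # XOR the four bytes of the handle together...
    folded = (name ^ (name >> 8) ^ (name >> 16) ^ (name >> 24)) & 0xFF
    # ...then mix in the low 7 bits of the channel id.
    return folded ^ (chid & 0x7F)
```

One consequence worth keeping in mind: for any handle below 0x100 on channel 0 the upper bytes are zero, so the hash is simply the handle itself. For example, `ramht_hash(3, 0)` is 3.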
15:17 mwk: then, PFIFO has a configurable parameter
15:17 mooch2: oh?
15:17 mwk: called "hash depth" in this... thing
15:18 mwk: it selects how many entries there are for each hash bucket
15:18 mooch2: oh wow
15:18 mooch2: which reg is that on nv3?
15:18 mwk: it can be... 1, 2, 4, or 8 I think
15:19 mwk: mooch2: reg 0x2210 aka RAMHT in rnndb, look it up
15:19 mooch2: uh, according to rnndb that's non-existent on nv3
15:19 mooch2: https://github.com/envytools/envytools/blob/master/rnndb/fifo/nv1_pfifo.xml#L40
15:19 mwk: it's called SIZE, since depth is directly related to RAMHT size
15:19 mwk: uh, it's present on NV3
15:20 mwk: NV3- is "from NV3 upwards"
15:20 mooch2: ah okay
15:20 mooch2: weird
15:21 mooch2: sorry, i thought that referred to how many hashes you could store in the hash table
15:21 mooch2: ya know, like a sane person would
15:21 mwk: hm
15:21 mwk: my bad, the depth is {2, 4, 8, 16}
15:21 mwk: which corresponds to sizes 4kb, 8kb, 16kb, 32kb
15:21 mooch2: ah
15:22 mwk: though the driver I'm looking at appears to always select 4kb
15:22 mooch2: ah weird
15:22 mooch2: the driver i'm using selected 16kb
15:22 mooch2: i'm using the win 3.11 riva 128 driver from nvidia's website
15:22 mwk: is the size field in 0x2210 set to 2?
15:22 mwk: if so, good, everything is in order
15:23 mwk: so... you would have a hash depth of 8
15:23 mwk: which means that each of the 256 hash buckets has 8 entries
15:23 mooch2: yeah something like that
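The depth and size figures above are tied together by simple arithmetic: with 256 buckets, the quoted sizes work out to 8 bytes per entry (4 KiB / (256 * 2) = 8). A minimal sketch of that relationship follows. Only the pairing of field value 2 with depth 8 and 16 KiB comes from the conversation; mapping field values 0, 1 and 3 to the other sizes in ascending order is an assumption, and the function name is made up.

```python
BUCKETS = 256      # 8-bit hash
ENTRY_BYTES = 8    # implied by 4 KiB / (256 buckets * depth 2)


def ramht_geometry(size_field):
    """Return (hash depth, RAMHT size in bytes) for a SIZE field value 0..3."""
    depth = 2 << size_field                     # 2, 4, 8, 16
    size_bytes = BUCKETS * depth * ENTRY_BYTES  # 4, 8, 16, 32 KiB
    return depth, size_bytes
```

With a size field of 2 this gives a depth of 8 and a table of 0x4000 bytes, matching the 16 KiB the Windows 3.11 driver selects.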
15:23 mwk: the hw reads all of them in sequence
15:23 mwk: if it finds an entry with matching handle & chid, it takes the object data from that entry
15:24 mwk: otherwise, it's a hash error, an interrupt is raised, and the driver is supposed to provide an object manually
15:24 mwk: which can happen when you run out of entries in a hash bucket
15:24 mwk: each entry is 8 bytes long, so
15:24 mooch2: nope, that didn't work either
15:24 mooch2: you see, the first method the driver calls is method 0 with a parameter of 0x3
15:24 mwk: entry A of hash bucket B would be at RAMHT_base + B * 0x40 + A * 8
15:25 mooch2: with this hash function, i got an address of 0x18
15:25 mooch2: which, of course, is all zeroes
15:25 mwk: mooch2: RAMHT_base + B * 0x40 + A * 8
15:25 mwk: B is 3
15:25 mwk: so you should look at 0xc0
15:25 mwk: and then at 0xc8, 0xd0, 0xd8, 0xe0, 0xe8, 0xf0, 0xf8
15:25 mooch2: because the driver zeroed out all of ramht
15:26 mooch2: mwk, it wouldn't matter, the driver zeroed out the first 16kb of ramin
15:26 mwk: if none of these match, it's a hash error, which means it's an interrupt
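The lookup walk described above can be sketched as follows. The 0x40 bucket stride in mwk's formula is depth 8 times 8 bytes per entry, so the sketch derives the stride from the depth. The layout of an entry is not given in this log, so entries are modelled as hypothetical `(handle, chid, obj)` tuples in a dict keyed by byte offset; that representation and both function names are assumptions for illustration.

```python
ENTRY_BYTES = 8


def bucket_entry_offsets(bucket, depth=8, ramht_base=0):
    """Byte offsets of every entry in one hash bucket, in probe order."""
    stride = depth * ENTRY_BYTES  # 0x40 when depth is 8
    start = ramht_base + bucket * stride
    return [start + slot * ENTRY_BYTES for slot in range(depth)]


def ramht_lookup(entries, bucket, handle, chid, depth=8, ramht_base=0):
    """Return the object data of the first matching entry, or None.

    `entries` maps byte offset -> (handle, chid, obj); this tuple layout is a
    stand-in, not the real 8-byte hardware format.
    """
    for offset in bucket_entry_offsets(bucket, depth, ramht_base):
        entry = entries.get(offset)
        if entry is not None and entry[0] == handle and entry[1] == chid:
            return entry[2]
    return None  # no match in the bucket: the caller treats this as a hash error
```

For bucket 3 at depth 8 with a base of 0, `bucket_entry_offsets(3)` yields 0xc0 through 0xf8 in steps of 8, the same addresses listed above. A `None` result corresponds to the hash-error path, where an interrupt is raised instead.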
15:26 mwk: hmm
15:26 mooch2: ah
15:26 mwk: and there are no other writes to RAMIN?
15:26 mooch2: there are
15:26 mooch2: but they're only after 0x7000
15:27 mooch2: also, for some reason, it writes to ramin using the vram aperture
15:27 mooch2: not sure what's up with that
15:27 mwk: as in, via BAR1?
15:27 mwk: yeah, that's how it's done on NV3
15:27 mooch2: yeah
15:27 mooch2: oh weird
15:27 mooch2: i thought there was also a pramin block in mmio
15:27 mwk: RAMIN is at 0xc00000+ in this aperture
15:27 mwk: nope
15:27 mooch2: oh
15:28 mwk: there is a MMIO block on NV1, and on NV4 up
15:28 mwk: but NV3 is special
15:28 mooch2: ah fair enough
15:29 mooch2: mwk: so then what would be the right hash function then?
15:29 mwk: well, the one I mentioned above
15:30 mooch2: well, i put that in, and it still didn't generate high enough addresses
15:30 mwk: well, if your ramht size is 16kiB
15:30 mwk: aka 0x4000 bytes
15:31 mwk: then there's no way you can make a hash that sits at 0x7000+
15:31 mooch2: weird
15:31 mooch2: why is the only data there though?
15:31 mwk: so, either there's a driver write to RAMHT that you missed somehow
15:31 mooch2: i don't get it
15:31 mwk: or the object is simply not in RAMHT and you have to raise an interrupt
15:31 mooch2: weird
15:31 mooch2: RIVA 128 RAMIN write addr 00000000 val 00010000
15:31 mooch2: RIVA 128 RAMIN write addr 00000004 val ffffffff
15:32 mooch2: those are the only writes, and they're quickly succeeded by zeroes
15:32 mwk: the software object handling in PFIFO is a horrible mess, it's entirely possible that the driver simply doesn't write to RAMHT and expects an interrupt
15:32 mooch2: maybe
15:36 mooch2: mwk, so is it a hash error or a cache error?
15:37 mwk: both
15:37 mooch2: ah
15:37 mwk: hash error is a subtype of cache error
15:37 mooch2: ah
15:37 mooch2: how do you report that on nv3?
15:38 mwk: set the hash error bit in pull state
15:38 mwk: HASH == SOFTWARE or some other nonsensical name
15:38 mwk: and raise the cache error interrupt in 2100
15:38 mwk: also, set both bits in 2080
15:39 mwk: also, disable pull enable
15:39 mooch2: ah okay
15:39 mwk: the driver should read all the state and either swap the object in, or handle it manually and bump the get pointer
15:40 mwk: either way, once pull enable is set again, you set HASH == HARDWARE again, and resume processing
15:40 mooch2: ah okay
15:40 mooch2: uh, there's no hash reg
15:40 mwk: yeah, it's a bit in the pull state somewhere
15:40 mooch2: just pull_ctrl with a single bit hash_failed
15:40 mooch2: ah okay
15:40 mwk: yep, that one
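The error-reporting sequence mwk lists can be summarised as a small state model. This is only a sketch: the log gives register addresses (0x2100, 0x2080) but no bit positions, so each piece of state is a named boolean rather than a real register field, and the class and method names are hypothetical.

```python
class PullerModel:
    """Toy model of the CACHE1 puller state involved in a hash error."""

    def __init__(self):
        self.pull_enable = True
        self.hash_failed = False       # the hash_failed bit in pull_ctrl
        self.intr_cache_error = False  # cache error interrupt in 0x2100
        self.bits_2080_set = False     # "both bits in 2080", positions unknown

    def report_hash_error(self):
        # RAMHT lookup found no matching entry: flag it and stop pulling.
        self.hash_failed = True
        self.intr_cache_error = True
        self.bits_2080_set = True
        self.pull_enable = False

    def set_pull_enable(self):
        # Driver has swapped the object in or handled the method itself;
        # re-enabling the puller clears the hash failure and resumes.
        self.pull_enable = True
        self.hash_failed = False
```

How the driver acknowledges the interrupt and clears the bits in 0x2080 is not covered in the conversation, so the model leaves those flags set.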
15:48 mooch2: okay, i finally got all that implemented
15:48 mooch2: let's hope it works
15:49 mooch2: mwk, it didn't change anything
15:51 mooch2: mwk, for some reason, it seems to only be handing the gpu 8-bit handles
16:16 mooch2: mwk, weird, i just searched riva 128 and nv3 on pastebin and i got literally nothing lol
16:34 mooch2: mwk, the weird thing about these commands is that they seem to be semi-valid, like, if there was a valid objclass they'd make sense
16:34 mooch2: so i have no fucking clue what's going on
16:57 mwk: mooch2: FYI, "I found it on pastebin" is a euphemism for "I got it from questionably legitimate sources"
16:57 mwk: there is no actual pastebin involved
16:59 mooch: ah okay
16:59 mooch: sorry
16:59 mooch: still though, i wish i knew what was up with these commands
16:59 mooch: so i could figure out the objclass they're trying to use
17:00 mwk: and it refers to the practice of certain agencies
17:00 mwk: of "randomly" finding things on pastebin
17:00 mwk: (which they put up themselves 5 minutes before)
17:00 mooch: ah, fair enough
17:01 mooch: like, the method numbers are correct, but the parameters don't seem to be
17:01 mooch: it's bizarre
17:02 mooch: any ideas?
17:03 mooch: i've already implemented hash errors, but they aren't doing anything
17:03 mooch: and for some reason, all of these method 0 calls seem to set an objclass of 0
17:07 mooch: also, for some reason, my disassemblies of the win 3.11 driver files don't seem to match up with the code the emulator is running
17:07 mooch: in fact, that code more closely matches up with the nt4 driver for some reason
18:39 mooch2: mwk: https://hastebin.com/agepiwunaf.css
18:39 mooch2: here's a log of all the graphics commands sent
18:40 mooch2: again, the first 16kb of ramin is cleared by the driver
18:41 mooch2: this is the win9x driver this time, btw
19:03 mooch2: mwk, update, apparently, linux zeroes out all of ramht too, and puts data outside of it
20:25 mooch2: mwk, i've discovered one part of the problem. for sufficiently small handles, the hashes of the handles are exactly the same as the handles themselves!
20:35 mooch2: hey, can someone help me with ramht lookups?
21:06 HdkR: imirkin: pendingchaos: Fix coming for the bindless casting issue.
21:12 mjg59: n/win 24
21:13 mjg59: Oops
21:25 imirkin_: HdkR: do you agree that one should be able to do uvec2(bound image)?
21:26 imirkin_: and similarly for sampler?
22:23 HdkR: imirkin_: Correct
22:24 imirkin_: and then do what with them?
22:24 imirkin_: seems like passing them to another stage... questionable
22:24 imirkin_: or not?
22:25 HdkR: As uvec2 you should be able to do w/e you want with it even if it is an opaque type being represented as such
22:25 imirkin_: ok, but if you pass it to another stage
22:25 imirkin_: you now have a uvec2
22:25 imirkin_: which represents a handle
22:25 imirkin_: which is NOT a bindless handle
22:25 imirkin_: i.e. was not added to the list of bindless resources
22:25 imirkin_: and now you've sent it to a stage to which that resource was not bound
22:25 HdkR: ah, that's not bindless
22:26 imirkin_: you'd still expect that to work, for you to cast it back to an image and have it work
22:26 imirkin_: but that's what i mean by "bound image"
22:26 HdkR: right
22:26 HdkR: and that is intentionally left vague in the spec
22:26 imirkin_: super.
22:27 imirkin_: the one thing i like more than vagueness...
22:27 imirkin_: intentional vagueness!
22:27 HdkR: As soon as you cast I think it should probably be treated as if it is bindless
22:27 imirkin_: i mean at the api level
22:27 HdkR: aye
22:27 imirkin_: you didn't call glMakeImageResident()
22:27 imirkin_: or whatever it's called
22:28 HdkR: oh right, dumb residency thing
22:28 imirkin_: you just did glBindImageTexture()
22:28 imirkin_: and glUniform1i("fooimage", 0)
22:28 imirkin_: the thing never had an explicit bindless handle allocated to it (that is api-visible)
22:29 HdkR: If you're handling everything as bindless under the hood and residency handling is already being handled in the bound cases then it would still work theoretically
22:29 imirkin_: sure
22:29 imirkin_: my question is whether this is all allowed by the spec
22:31 HdkR: Definitely would be nice to have some spec clarifications on that. Probably worth opening a bug on the github tracker
22:32 imirkin_: where?
22:33 HdkR: https://github.com/KhronosGroup/OpenGL-Registry/issues <--Isn't this a forum to discuss spec issues?
22:33 pendingchaos: there's also https://github.com/KhronosGroup/OpenGL-API/issues
22:34 HdkR: ah, that one is probably the better location
22:35 HdkR: I know Piers, Daniel, and Jeff all pay attention to that as far as I'm aware
22:39 mooch: is there anything that can cause the nvidia windows drivers to put weird shit in RAMIN
22:39 mooch: and also periodically erase all of RAMHT?
22:40 HdkR: imirkin_: I live in an ideal world where bindless never encounters the bounded world. Fully free :P
22:40 mooch: like, every value from 0x7818 onward is basically 0x3 | (constantly incrementing number << 12)
22:40 mooch: it's so stupid
22:42 imirkin_: HdkR: yeah, a world where someone else writes the driver :)
22:43 HdkR: :P