00:00 imirkin: drf?
00:01 gnurou: imirkin: basically, these guys: https://github.com/kfractal/nouveau/commit/31964fd7d11ba3601a56b92777771b78b4f0f358
00:02 gnurou: would replace hardcoded registers and bit-fields with proper names
00:02 imirkin: what's "drf"?
00:02 gnurou: Device Register Field :)
00:02 imirkin: ahhhh
00:03 gnurou: here is an example usage: https://github.com/kfractal/nouveau/commit/7fad3bda21b99e264f413bedfb9b1c35f00531cc
00:03 imirkin: my stance, which probably holds limited weight, is that it's all-or-nothing... that said if it was in just files named gk20a, i wouldn't care too much.
00:03 gnurou: (final macros will be shorter than in this example)
00:03 imirkin: [or gm20b]
00:04 gnurou: well it would mostly affect new code I believe
00:05 mupuf: hmm, on the one hand it helps documentation. On the other, it is super verbose but helps catching stupid errors like the one I made for the voltage control
00:05 gnurou: replacing all older code would be pretty time-consuming
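To make the register-field idea discussed above concrete, here is a minimal Python sketch. It is not the C macros from the linked commits; the register offset, field name, and bit positions are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# A field is described by its register offset and its inclusive bit range.
Field = namedtuple("Field", ["reg", "hi", "lo"])

# Hypothetical register and field; not taken from any real header.
EXAMPLE_CTRL = 0x1000
EXAMPLE_CTRL_MODE = Field(reg=EXAMPLE_CTRL, hi=5, lo=4)


def field_mask(field):
    """Return the mask covering the field's bits, already shifted in place."""
    width = field.hi - field.lo + 1
    return ((1 << width) - 1) << field.lo


def field_get(value, field):
    """Extract the field from a full register value."""
    return (value & field_mask(field)) >> field.lo


def field_set(value, field, new):
    """Return the register value with the field replaced by `new`."""
    mask = field_mask(field)
    return (value & ~mask) | ((new << field.lo) & mask)
```

With named fields, a call such as `field_set(old, EXAMPLE_CTRL_MODE, 3)` replaces a hand-written mask-and-shift expression, which is where the "catching stupid errors" benefit would come from.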
00:05 imirkin: gnurou: not to mention would require generating the docs in the first place :p
00:06 gnurou: generating the docs?
00:06 imirkin: gnurou: well, the headers
00:06 gnurou: ah right
00:06 imirkin: with all the register names
00:06 imirkin: we use a lot of them
00:06 imirkin: the other problem is rnndb integration
00:06 imirkin: basically right now i can take a mmiotrace of blob driver
00:06 imirkin: and attempt to match it up to what nouveau is doing. or grep around relatively easily if i know what i'm doing
00:07 imirkin: having register numbers in both places helps that a lot
00:07 imirkin: with symbolic names it's constantly looking things up all over
00:07 imirkin: unless there's rnndb integration
00:07 gnurou: yeah, eventually I would like all the information that is used for these to be merged into rnndb, and have the headers generated from it
00:07 gnurou: but according to Ken there are some conflicts with existing information
00:08 gnurou: don't know what precisely
00:08 imirkin: either your docs or our docs are wrong :p
00:08 gnurou: and in that case, who should take precedence? :)
00:09 imirkin: the guys with the real docs, obviously
00:09 imirkin: but yeah, it'd be a significant project
00:09 gnurou: if rnndb integration is a requirement, we can try to work towards it - for now I'm wondering whether injecting the headers files as-is would be a satisfactory first step
00:09 sooda: the macro magic looked easy enough to tweak into generating some symbolic name database for rnndb. if the mmiotrace would look at that and map the trace to the names, then grepping would be as easy as with numbers
00:09 mupuf: gnurou: add yours as a comment when there is a conflict :p
00:09 imirkin: however if there were commitment to completing it, it could be done piecemeal
00:10 imirkin: sooda: mmiotrace does use rnndb for lookups
00:10 imirkin: sooda: or rather, demmio does
00:10 sooda: cool
00:10 sooda: (disclaimer: i've never used those yet)
00:10 gnurou: so rnndb is definitely the right place to put that in
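The lookup idea above (map trace addresses to names so grepping works either way) could look roughly like this Python sketch. The trace line format and the name database are simplified assumptions, not the real mmiotrace or rnndb formats.

```python
def annotate(trace_lines, name_db):
    """Append a symbolic name to each '<op> <addr> <value>' line.

    The numeric address is kept, so the output can be searched by
    number or by name. Unknown addresses are tagged 'UNK'.
    """
    out = []
    for line in trace_lines:
        op, addr, value = line.split()
        name = name_db.get(int(addr, 16), "UNK")
        out.append(f"{op} {addr} {value} {name}")
    return out
```

Keeping both the number and the name on each line is a design choice for this sketch; it addresses the concern that symbolic names alone force constant lookups.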
00:14 gnurou: so as I understand it, nobody objects that these macros are a good alternative to hardcoded mmio addresses and bitfields? :)
00:15 imirkin: no, my objection is purely to having it be partially converted
00:15 imirkin: although it raises an interesting point
00:15 imirkin: let's say you guys generate all the docs, and we convert over
00:15 imirkin: and then we want to add some feature
00:16 imirkin: which uses some undocumented register
00:16 imirkin: then what
00:16 imirkin: start throwing in UNK1234's all over? i guess that wouldn't be the worst thing in the world...
00:17 sooda: the headers are huge, so i suppose there aren't too many undocumented regs... but adding UNK* names into the headers would at least be consistent
00:17 gnurou: aren't the released engines fully documented? I would expect that, so there should not be missing registers
00:17 mlankhorst: interesting
00:17 imirkin: gnurou: if you say so. it appeared that it was the minimum amount of regs in each engine
00:18 imirkin: but i could have looked at an older version. or misunderstood the organization.
00:19 gnurou: imirkin: I am not sure myself, but the coverage seems exhaustive at least
00:20 mupuf: we could have a different folder with headers not coming from nvidia
00:20 mupuf: this way, no conflicts
00:21 mlankhorst: notvidia!
00:21 mupuf: and yes, we have regs that we documented, use and you did not document
00:21 imirkin: gnurou: so you're telling me that every PGRAPH and PFB register is documented?
00:21 mupuf: especially in ptherm
00:21 imirkin: gnurou: which means with a bit of guessing, we should be able to figure out all the isohub stuff too?
00:22 gnurou: imirkin: I am not saying that - let me confirm with Ken
00:22 mlankhorst: interesting
00:22 gnurou: mupuf: ok, so if ptherm is incomplete, there is no reason to expect other engines to be completely covered
00:22 mupuf: yep
00:22 gnurou: although I wonder *how* the released registers got selected
00:22 mupuf: but there are ways around it :)
00:23 mupuf: probably based on nvgpu's needs
00:23 skeggsb: gnurou: as i understand it, it's the list gk20a nvgpu used, extrapolated back to earlier chipsets
00:23 skeggsb: my sole objection to the idea is the incompleteness and having part of the driver completely different to the rest
00:26 mlankhorst: can't have the whole driver documented at once :)
00:26 gnurou: skeggsb: we can certainly release more stuff in the future - but we need to start somewhere
00:26 gnurou: mlankhorst: exactly
00:26 sooda: i suppose that nvidia has also internally some list of publicly-releasable regs, which probably doesn't cover everything and adding one line to the headers needs some legal bullshit round
00:27 gnurou: also it seems like some internal developments have started using these macros... we will be in upstreaming hell if we don't find a way to merge them somehow :/
00:27 sooda: even the trm doesn't cover many important parts of other things. it's a joke, not properly maintained
00:31 skeggsb: gnurou: yes, it'll be a process to get everything moved over
00:31 skeggsb: given past experiences, i wonder at what kind of lag we're talking about / if it'll actually eventuate
00:32 skeggsb: having the process started of the "low hanging" bits we touch (grep nvkm_* use, i guess) and having them go through review and get into the headers would be a decent start, that way the burden of actually modifying the "old" code isn't entirely on you guys, all of us can chip away at it
00:33 skeggsb: i wouldn't even care if it wasn't *completely* documented, and only the bitfields we currently touch were covered etc. of course, ideally it'd be everything, but, small steps ;)
00:34 skeggsb: brb in a bit, dinner time
00:35 gnurou: skeggsb: yep. What I am seeking for at the moment is an overall agreement that these macros are a good direction to move to, because we have started investing in them
00:35 skeggsb: yes, i don't object to the idea at all
00:35 gnurou: skeggsb: so if we cannot get them merged eventually, we will need to rethink a lot of things and we'd rather do it now :)
00:35 skeggsb: i don't expect any of us do
00:35 gnurou: skeggsb: excellent then, thanks
00:36 gnurou: skeggsb: I also suppose a submission to rnndb is preferable to the headers being directly submitted?
00:37 imirkin: you suppose correctly. and note that headergen can be rewritten at will.
00:37 imirkin: e.g. freedreno uses rnndb and a totally different header generator
00:37 skeggsb: gnurou: i, personally, don't mind about that either way... but that's mostly because i rarely use it
00:38 gnurou: rnndb makes more sense, definitely - but keep in mind that some existing names will very likely get changed
00:39 imirkin: gnurou: afaik nothing relies on the current names
00:39 gnurou: i.e. we will overwrite without consideration :) but hopefully this will be the "correct" data
00:39 skeggsb: your names are preferable anyway :P
00:39 gnurou: actually I think Ken attempted a merge to envytools in the past, but it got rejected for some reason...
00:39 imirkin: gnurou: but please talk to me before touching any of the 3d class docs -- those *are* used in the mesa driver, extensively.
00:40 imirkin: we can look into doing a giant rename, but it'd have to be done with love and care
00:40 imirkin: gnurou: iirc he was doing it in a very non-rnndb way
00:41 imirkin: gnurou: ideally you have one register once and mark it with variants it applies to
00:41 gnurou: aha, I see
00:42 imirkin: his generator was a bit more brute-force than that
00:42 gnurou: yeah, and it only covered gk20a
00:42 gnurou: but since his second-attempt headers were properly generated for all GPU generations, I believe we can achieve the desired result for rnndb
00:43 hakzsam_: imirkin, you are going to have major headaches if you try to re-generate all headers :) but sounds like a good idea
00:43 gnurou: well, that's gonna be some work of course
00:43 mlankhorst: quick add unk1234 everywhere
00:44 imirkin: hakzsam_: would have to create a remap table + clever awk script
00:44 hakzsam_: yeah
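A remap-table rename of the kind mentioned above might be sketched as follows. This uses Python rather than awk, and the identifier names in any table passed to it are hypothetical.

```python
import re


def rename_identifiers(source, remap):
    """Replace whole identifiers found in `remap`, leaving everything else alone."""
    if not remap:
        return source
    # Longest names first so a shorter name never shadows a longer one.
    names = sorted(remap, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: remap[m.group(1)], source)
```

Matching on word boundaries means a name that is only a prefix of a longer identifier is left untouched, which matters when register and field names share a stem.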
01:10 gnurou: uh, I kind of understand why Ken gave up on importing the data into rnndb
01:10 gnurou: it seems impossible to do properly
01:10 gnurou: e.g. the bitsets that are reused by several registers... there is no way we can preserve that
01:12 imirkin: those could be factored manually... it's pretty rare
01:12 imirkin: you could also build a cache into your generator
01:12 imirkin: and if you see 2 identical lists, factor them out
01:12 gnurou: ... or just ignore such things, what are they for besides making the xml smaller?
01:15 imirkin: understandability
01:17 skeggsb: i don't entirely find that easier tbh, it means skipping around when reading it
01:22 imirkin: but then you notice that regs 1, 2, 3, 4, 5 are all about the same thing
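The "build a cache into your generator" suggestion above can be sketched in Python like this. The input shape (register name mapped to a list of field tuples) and the generated ids are assumptions for illustration, not the rnndb schema.

```python
def factor_bitsets(registers):
    """Split `registers` (name -> list of (field, hi, lo)) into bitsets and references.

    Returns (bitsets, refs): `bitsets` maps a generated id to a field list,
    `refs` maps each register name to the id of the bitset it uses.
    Registers with identical field lists end up sharing one id.
    """
    cache = {}    # tuple(fields) -> bitset id
    bitsets = {}
    refs = {}
    for reg, fields in registers.items():
        key = tuple(fields)
        if key not in cache:
            cache[key] = f"bitset_{len(cache)}"
            bitsets[cache[key]] = list(fields)
        refs[reg] = cache[key]
    return bitsets, refs
```

Registers that share an id are the ones "about the same thing", which is the understandability argument made above.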
04:24 karolherbst: mupuf: okay, seems I was wrong about the pwm value. The voltage set seems to be right indeed
04:24 karolherbst: didn't have any crashes after your pwm fix
04:31 imirkin: sweet, my atomic counter code also works on fermi. i guessed it would, but only just now confirmed.
04:37 karolherbst: imirkin: what does the atomic counter stuff help with? less overhead doing atomic operations?
04:37 imirkin: ARB_shader_atomic_counters
04:58 karolherbst: I see
05:50 RSpliet: karolherbst: they *are* atomic ops :-P
05:51 RSpliet: very useful for OpenCL as well
05:51 imirkin: mildly useful in GL imo, but... it's part of the spec
05:52 karolherbst: RSpliet: well I was thinking about native vs emulated atomic operations
05:52 imirkin: way easier to implement than images ;)
05:52 RSpliet: karolherbst: I don't think you can really emulate an atomic op without cache coherence
05:52 karolherbst: RSpliet: well libraries still do
05:53 karolherbst: pre ARMv6 only has swap as atomic operations, everything else is emulated
05:53 RSpliet: I thought it always had load-linked, store-conditional?
05:53 karolherbst: nope, only swap
05:54 RSpliet: did you have dual-core ARMv6's?
05:54 karolherbst: I said pre ARMv6
05:54 RSpliet: pre-ARMv6's
05:54 karolherbst: no, but there is a reference to that in the glibc source
05:55 RSpliet: because atomics don't make a huge deal of sense in single-core machines with no or physically tagged caches
05:55 karolherbst: welll
05:55 karolherbst: you don't know compiler well enough, do you? :p
05:56 karolherbst: C++ is a bitch here
05:56 RSpliet: doesn't have much to do with the compiler
05:57 RSpliet: as long as you don't get a context-switch mid-addition you should be fine... it's a whole different kind of emulation compared to what you'd do on a (massive-) multi-core machine
05:57 karolherbst: yeah right
05:57 karolherbst: but in the emulation code you can get a nasty context switch
05:57 karolherbst: that's what I meant
05:58 RSpliet: yes, I got that
05:59 karolherbst: anyway, that's what I thought imirkin did: add some native atomic support hence removing overhead through emulation code
05:59 RSpliet: nah, you can't emulate atomics on an NVIDIA GPU
05:59 karolherbst: I see
06:40 karolherbst: what do you think about this? https://gist.github.com/karolherbst/bc807c678588798603ff
06:40 karolherbst: while the gpu is off
06:40 imirkin_: sgtm
06:42 karolherbst: the patch looks a bit hacky though: https://github.com/karolherbst/nouveau/commit/a00fe8e693f0e73f20e59397f67347b47cf9c45d :/ I really want to do it cleaner
06:43 imirkin_: could avoid calling it in the first place
06:47 karolherbst: mhhh
06:47 karolherbst: well
06:48 karolherbst: I thought it is a good idea to print the current pstate table nevertheless
06:48 karolherbst: mhh maybe
06:49 karolherbst: imirkin_: mhh the fixed shouldn't be inside debugfs, because those interfaces are designed to be used also from userspace applications afaik
06:49 imirkin_: you can still get the pstate table
06:49 imirkin_: just don't get the current pstate info
06:49 imirkin_: "the fixed"?
06:49 karolherbst: *fix
06:49 imirkin_: right now this is all very debugfs-appropriate
06:52 karolherbst: I know, but the thing is, that nvkm_clk_read is called when the gpu is off, even if we fix that in sysfs/debugfs, another caller might mess that up
06:52 karolherbst: that's why I want to rather fix that inside device/ctrl
06:54 imirkin_: so just do like
06:54 imirkin_: if (!pm_runtime_on()) return -EAGAIN
06:54 imirkin_: instead of filling in bogus data
06:55 karolherbst: mhhh right, this thing is called for every pstate
06:55 karolherbst: ...
06:55 karolherbst: yes, that was too easy for me :D
06:55 imirkin_: i mean in the bit where it actually reads stuff
06:55 imirkin_: but so what... every nvif call has to check for pm_suspended now?
06:55 imirkin_: seems a little crazy
06:56 karolherbst: mhhh no, only if they read something out of the gpu
06:56 imirkin_: which is like 99% of them
06:56 karolherbst: well
06:56 karolherbst: it depends
06:56 karolherbst: I think if we actually care enough we should fix that
06:56 karolherbst: currently echoing something into pstate is not a good idea while the gpu is off
06:56 karolherbst: locks the process until I don't know
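The "return -EAGAIN instead of filling in bogus data" suggestion above, sketched in Python rather than kernel C. `FakeDevice` and `read_clock` are hypothetical stand-ins, not nouveau functions.

```python
import errno


class FakeDevice:
    """Stand-in for a GPU that may be runtime-suspended (illustration only)."""

    def __init__(self, powered, clock_khz):
        self.powered = powered
        self.clock_khz = clock_khz


def read_clock(dev):
    """Return the clock in kHz, or -EAGAIN if the device is powered down."""
    if not dev.powered:
        # Refuse the read instead of reporting made-up values.
        return -errno.EAGAIN
    return dev.clock_khz
```

Putting the check in the function that actually touches the hardware, as proposed above, covers every caller rather than only the sysfs/debugfs path.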
08:53 Lekensteyn: With runtime PM enabled, is it possible to disable the GPU when no external monitor is connected (on Optimus laptops)?
08:54 imirkin_: it should auto-disable
08:54 imirkin_: it should say 'DynOff' in vgaswitcheroo
09:16 Lekensteyn: imirkin_: oh right, it does that automatically. Neat!
09:54 karolherbst: mupuf: I was thinking a bit about the pdaemon host communication regarding reclocking. I think if we are smart enough, it's entirely possible to do that without needing ACKs from the host at all
10:04 karolherbst: mupuf: I think all of the acks can be implicitly optimized away, and also the need of knowing the pstate/cstates and everything
10:16 karolherbst: mhh how can I call stuff from a falcon on the host?
10:16 vedranm: imirkin_: OK
10:17 vedranm: is Fermi the only series where compute is enabled at the moment?
10:17 imirkin_: karolherbst: you feed the data port iirc
10:17 imirkin_: vedranm: no, kepler and soon tesla will have it "enabled"
10:17 karolherbst: imirkin_: I mean I know how I can send data to the host after the host called something on the falcon
10:17 imirkin_: vedranm: but that enablement won't do an end user any good
10:17 karolherbst: but now I have like a timer thingy on the falcon and this thing decides now to send something to the host
10:18 imirkin_: karolherbst: intr
10:18 vedranm: imirkin_: will it run at least hello world stuff?
10:18 vedranm: like, sum 2 vectors
10:18 imirkin_: vedranm: yes and no... if you write that program appropriately, yes. but not in any standard way.
10:18 karolherbst: imirkin_: okay, I guessed that much... mhh do I have to set a bit inside 0x10a008 and handle that bit inside nvkm_pmu_intr?
10:19 imirkin_: karolherbst: something like that
10:19 vedranm: imirkin_: OK
10:19 karolherbst: imirkin_: and I bet there are reserved bits for falcon only usage
10:19 vedranm: regardless, it's an epic milestone
10:19 imirkin_: karolherbst: there's a way to generate intr's from the falcon, but tbh i don't know what it is. pretty sure we do it elsewhere though.
10:19 imirkin_: vedranm: one that was reached years ago
10:19 vedranm: imirkin_: so you mean that's just a bool setting somewhere that was switched and no particular improvements were done recently?
10:20 imirkin_: vedranm: that's right.
10:20 imirkin_: well, very minor improvements.
10:20 imirkin_: to make it conflict less with the 3d pipeline on fermi, making it enableable by default
10:25 vedranm: imirkin_: I see
10:25 vedranm: thanks for the info
10:26 vedranm: to be honest, I am very sad we have no open source compute on either radeon or nvidia
10:26 imirkin_: vedranm: radeon supports OpenCL 1.1
10:26 karolherbst: intel too
10:26 vedranm: imirkin_: yes, but it has no image support
10:27 imirkin_: vedranm: that's different than "no open source compute" :p
10:27 vedranm: karolherbst: does intel beignet offer image support?
10:27 vedranm: imirkin_: you are right
10:27 imirkin_: i think so, yea
10:27 karolherbst: no idea, never tested it
10:27 vedranm: looks worth giving a shot, but I have no Intel CPU anywhere >D
10:27 vedranm: :D
10:28 karolherbst: well :D
10:28 karolherbst: opencl on the intel hd isn't that fast
10:28 karolherbst: it's faster in total when you use the cpu and the gpu
10:28 karolherbst: but still
10:28 vedranm: even on Win? limited hardware?
10:28 karolherbst: tested on linux
10:29 karolherbst: only
10:29 vedranm: OK
10:29 vedranm: thx for the info
10:57 john_cephalopoda: Hey
12:08 ulteq: help
12:08 ulteq: sorry forgot the /
12:30 imirkin_: skeggsb: btw, did you see my question about GF117 pmu? it's not hooked up and causes an oops on v4.3... probably should hook it up right? (and cc that to stable)
12:51 imirkin_: joi: hakzsam_ has a trace where the compute channel for some reason isn't coming up in demmt -- it's a cuda app -- http://people.freedesktop.org/~hakzsam/traces/gt218/ (the cuda + gl trace)
12:52 imirkin_: joi: i haven't super-investigated it yet, but perhaps you have a few to take a glance and point me in the right direction?
12:52 imirkin_: joi: it's fd 26 which has the pushbuf, but i don't remember how exactly that info is communicated to the kernel
12:52 imirkin_: (i'm sure you do)
13:11 imirkin_: joi: oh i see... looks like they have 2 separate IB's open
13:11 imirkin_: and demmt likes for there to only be one
13:13 pmoreau: imirkin_: Hey! :-) Do you have any advice on how to debug / find which optimisation pass is messing some registers assignment?
13:13 imirkin_: pmoreau: yeah, try with NV50_PROG_OPTIMIZE=0
13:13 imirkin_: then =1
13:13 imirkin_: then =2
13:14 imirkin_: figure out which one it fails on
13:14 imirkin_: then manually bisect the pass list by commenting stuff out
13:14 pmoreau: I have something like `a = f1(b); c = f2(a)` and this gets converted to `a = f1(b); c = f2(e)`, with `e` that was never set nor used
13:14 pmoreau: Ok
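The manual bisection of the pass list described above can also be expressed as a binary search. This Python sketch assumes a hypothetical `works_with(prefix)` callback that reports whether the shader comes out correct when only that prefix of passes runs; it is not part of the nv50 codegen.

```python
def first_bad_pass(passes, works_with):
    """Return the first pass whose inclusion makes `works_with(prefix)` fail.

    Assumes the empty pass list works and that once a prefix fails,
    every longer prefix fails too. Returns None if all passes are fine.
    """
    if works_with(passes):
        return None
    lo, hi = 0, len(passes)   # prefix of length lo works, length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works_with(passes[:mid]):
            lo = mid
        else:
            hi = mid
    return passes[hi - 1]
```

In practice `works_with` corresponds to commenting out the tail of the pass list, rebuilding, and rerunning the test by hand.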
13:14 imirkin_: pmoreau: if you want me to look at it
13:14 imirkin_: you know what i like to see :)
13:14 pmoreau: backtraces! :D
13:14 imirkin_: no
13:15 imirkin_: full debug logs
13:15 pmoreau: Just kidding
13:15 pmoreau: I will put them some where :-)
13:18 pmoreau: https://phabricator.pmoreau.org/P60
13:19 pmoreau: I'll paste the NV50 IR generation code, I'm sure I'm doing a lot of things wrong
13:20 imirkin_: pmoreau: original code might be nice too
13:21 pmoreau: https://phabricator.pmoreau.org/P61
13:22 imirkin_: er what
13:22 imirkin_: lol
13:22 pmoreau: It's not the whole source code though
13:23 pmoreau: The remaining parts are scattered all around, but the generation does work for it.
13:24 imirkin_: your code looks wrong on many levels
13:24 pmoreau: :D
13:24 imirkin_: level#1: why do you do a cmp for the "else" case? shouldn't that just be an unconditional branch?
13:24 imirkin_: level #2: you're using tree edges everywhere!
13:25 pmoreau: I have no idea what those tree edges are, I just saw them being used in nv50_ir_from_tgsi and tried to use them as well. :/
13:25 imirkin_: i explain it here:
13:26 imirkin_: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_graph.cpp#n339
13:27 pmoreau: level#1: The else case is not really an else case: I first branch for id==0, then id==1, and then id!=2. That id!=2 case is what I name the else case. but I still need the comparison, as otherwise I do the id==2 case.
13:27 pmoreau: Thanks!
13:27 imirkin_: level#100
13:27 imirkin_: you can't have code the way you have it :)
13:28 pmoreau: :-(
13:28 imirkin_: so every BB has to be organized thusly:
13:28 imirkin_: code. branches.
13:28 imirkin_: code can't contain any branches, and branches can't contain any non-flow ops
13:29 imirkin_: that's the point of a basic block -- it has no branches :)
13:29 imirkin_: you also have to set join bb's & co
13:29 pmoreau: Hum… I followed SPIR-V, where every BB ends with a branch, so I tried to add each BB after a branch
13:29 imirkin_: otherwise you get eternal sadness
13:30 imirkin_: look at your bb
13:30 imirkin_: you have ne %r8 bra BB:4 in the middle
13:30 imirkin_: you can't have none of that
13:30 imirkin_: you could do all your comparisons first
13:31 imirkin_: and then have 2 conditional branches at the end -- that's kosher
13:31 pmoreau: Oh!! That's true
13:31 imirkin_: anyways, codegen makes a lot of assumptions about how code is structured
13:32 imirkin_: and will happily eat your code if you violate them
13:32 pmoreau: I completely forgot about those conditional branches in the middle of the BB
13:32 pmoreau: :D
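The basic block rule stated above ("code. branches.") can be checked mechanically. This Python sketch models a block as a list of op names; the set of flow ops is a hypothetical subset chosen for illustration, not the codegen's actual list.

```python
# Ops treated as control flow in this sketch only.
FLOW_OPS = {"bra", "ret", "exit"}


def is_well_formed(block):
    """True if every flow op in `block` comes after all non-flow ops."""
    seen_flow = False
    for op in block:
        if op in FLOW_OPS:
            seen_flow = True
        elif seen_flow:
            # A non-flow op after a branch means the branch sits mid-block.
            return False
    return True
```

Two conditional branches at the end of a block pass this check, while a `bra` followed by more ALU code does not, matching the cases described above.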
13:33 imirkin_: also if you make Converter an instance of BuildUtil, then happy-times
13:34 pmoreau: Converter inherits from BuildUtil, I followed the same path as from_tgsi, or your LLVM branch maybe
13:36 imirkin_: pmoreau: so why all the nv50_ir::BuildUtil::foo junk?
13:36 imirkin_: don't tell me i did that in my spir converter...
13:37 imirkin_: if i did, i might have had a reason, like namespace collision with llvm
13:37 pmoreau: I like to be explicit to be sure from those functions come
13:37 pmoreau: No, you didn't IIRC
13:37 pmoreau: s/from/from where
13:37 imirkin_: it was a bit annoying since there was nv50_ir::Function and llvm::Function (and so on)
13:37 imirkin_: leading to tons of type confusion
13:48 imirkin_: HCE_RE_ILLEGAL_OP
13:48 imirkin_: heh
13:48 imirkin_: is that like "in regards to the illegal op"? :)
14:03 glennk: polite hardware
14:09 karolherbst: RSpliet: can you help me with how to send an interrupt from a falcon to the kernel?
14:17 skeggsb: imirkin_: yes, we should probably wire that up
14:18 imirkin_: gnurou's patch accidentally fixed the crash btw
14:18 imirkin_: or rather... the fix was on purpose, but accidental wrt GF117
14:19 karolherbst: skeggsb: hi, by any chance, do you know how I can send data from a falcon timer interrupt to the kernel?
14:58 karolherbst: yay, it seems to be easy to do actually :/
14:59 RSpliet: oh yes, sorry, I wasn't around :-P
15:00 karolherbst: no worries
15:00 karolherbst: I just have to call(send)
15:00 karolherbst: it was just too easy for me :D
15:00 karolherbst: or I lacked the needed creativity to just try that out
15:00 karolherbst: mhh ohh wait, nvm
15:00 karolherbst: then I need to handle that in nvkm_pmu_recv I guess
15:04 RSpliet: yep, otherwise I take it it's just discarded
15:07 karolherbst: yeah
15:07 karolherbst: okay, I think I can work with that
15:24 karolherbst: nooo my cursor went missing :/
15:31 pmoreau: Seems to work better now. :-)
15:31 pmoreau: Still a couple of things to fix, but the control flow should be in a better shape
15:31 imirkin: now that you don't have control flows in the middle of bb's?
15:31 pmoreau: (Couldn't be much worse I guess)
15:31 pmoreau: Eh eh!
15:32 john_cephalopoda: Hey, I had a few problems with framerate over the last days but it turns out that I didn't compile mesa with 3d accel. Everything is so fast. I just wanted to say "thank you" for your great work.
15:32 imirkin: pmoreau: and it's inserting joinat/join's yes?
15:32 john_cephalopoda: Bye! :)
15:33 pmoreau: imirkin: https://phabricator.pmoreau.org/P60
15:33 pmoreau: With the insertConvergenceOps? Not yet
15:34 pmoreau: And https://phabricator.pmoreau.org/P61
15:34 imirkin: ok, that means you're still doing something slightly wrong
15:35 imirkin: but the join's are not required
15:35 imirkin: they're just there to have good performance
15:35 pmoreau: Well, I could learn about them while I'm at it
15:36 pmoreau: The join is the insertConvergenceOps, right? Or is there something else that I missed?
15:36 imirkin: i forget the details
15:36 imirkin: and i forget how they appear
15:36 imirkin: sorry :(
15:36 pmoreau: No problem
15:37 imirkin: wtf. and u16 $r0 $r0 0x0003
15:37 imirkin: is that a thing?
15:38 pmoreau: Hum, the aim was $rX = $r0h & 0x3
15:38 imirkin: is *that* a thing?
15:39 imirkin: not according to envydis
15:39 imirkin: er, envyas
15:40 pmoreau: local_id(1) == $r0h & c1[0x0] according to tracing the blob, and for some reason I extrapolated that c1[0x0] == 0x3
15:40 imirkin: what opcode do they actually use
15:43 pmoreau: One sec, I somewhat messed up some traces
15:44 pmoreau: `d0840201 00400780 and b16 $r0l $r0h c1[0x8]`
15:46 pmoreau: Oh right, for some reason, rather than comparing to 1 or 2 for the dimension, they compare to c1[0x0] and c1[0x4].
15:46 imirkin: yeah, *that* works
15:47 imirkin: but you can't have a 32-bit result with 16-bit sources for those alu ops
15:47 imirkin: only for mul
15:47 imirkin: (and mad)
15:49 pmoreau: Hum, need to change the loadImm to mkImm
15:49 pmoreau: As loadImm is generating an u32
15:49 imirkin: and u16 %r7 %r2 %r10 -- that's no good
15:49 imirkin: well, you can just make a 16-bit variant of loadImm
15:49 imirkin: or a loadImm16 to be explicit
15:49 pmoreau: True :D
15:50 imirkin: i'm always pretty afraid of doing something dumb with those type overrides
15:55 pmoreau: Once it works, I'll try to find out where that function (and similar ones) could be placed, as they are independent of the input IR
15:55 pmoreau: So Hans can easily reuse them as well
16:03 joi: imirkin: if you move pb_pointer_found to be per device, it should work(tm)
16:04 imirkin: joi: ah yeah. i'm going to have to investigate what the various data structures are :)
16:06 joi: basically move this variable to struct nvrm_device, add getter/setter and use it in buffer_decode
16:07 joi: look at nvrm_get_chipset for inspiration ;)
16:07 imirkin: ok thanks
16:08 imirkin: joi: i also need to figure out a good place to print texture/sampler descriptors for kepler+...
16:08 imirkin: i'm thinking at draw time or something
16:10 joi: what changed from previous gens?
16:11 imirkin: joi: bindless
16:11 imirkin: no more BIND_TSC/BIND_TIC
16:11 joi: oh
16:11 joi: so how it's selected?
16:12 imirkin: constbuf
16:12 imirkin: you decree that constbuf N shall contain references to tic/tsc's
16:12 imirkin: and then the texture instruction knows to read from that constbuf
16:12 imirkin: (and there's a pushbuf method to set N)
16:13 imirkin: so i have to keep track of which constbuf is N... and that can map to a diff buffer for each shader type, great
16:14 imirkin: alternatively i can track uploads to the TSC/TIC buffers
16:14 imirkin: that might be easier
16:15 joi: but it won't work if blob will write tsc/tic data in non-sequential manner, right?
16:16 joi: (or patch part of it)
16:16 imirkin: i'll shoot it.
16:16 imirkin: non-sequential is fine
16:16 imirkin: since i know where the buffer is
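Tracking uploads to a buffer whose location is known, as described above, could be sketched in Python as a shadow copy that absorbs writes in any order. The class, its methods, and the entry size passed by the caller are hypothetical and not part of demmt.

```python
class ShadowBuffer:
    """Mirror of a GPU buffer rebuilt from observed writes (sketch only)."""

    def __init__(self, base, size):
        self.base = base
        self.data = bytearray(size)

    def contains(self, addr, length):
        """True if [addr, addr + length) lies fully inside the buffer."""
        return self.base <= addr and addr + length <= self.base + len(self.data)

    def write(self, addr, payload):
        """Apply a write if it falls inside the buffer; return whether it did."""
        if not self.contains(addr, len(payload)):
            return False
        off = addr - self.base
        self.data[off:off + len(payload)] = payload
        return True

    def entry(self, index, entry_size):
        """Return the current bytes of descriptor `index`."""
        off = index * entry_size
        return bytes(self.data[off:off + entry_size])
```

Because each write is placed by address, partial patches and out-of-order uploads both land in the right spot, which is why non-sequential writes are not a problem here.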
16:24 imirkin: mlankhorst: were your max-texture-size tests running -fbo -auto, or were they displaying things to the screen?
16:25 karolherbst: yay, it works
16:26 karolherbst: https://gist.github.com/karolherbst/ccdd52b03848507b2fb8
16:27 karolherbst: 0xff000000 pcie 0x00ff0000 memory, 0x0000ff00 video and 0x000000ff core, and currently I only check the core
16:27 karolherbst: and this is run every 0.1 seconds on the pmu
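The four masks quoted above split one 32-bit word into byte-wide fields. A small Python sketch of the unpacking follows; the masks are from the message, while the function name is made up and what each byte represents is not stated in the chat.

```python
# Field masks as given in the message above.
FIELDS = {
    "pcie":   0xff000000,
    "memory": 0x00ff0000,
    "video":  0x0000ff00,
    "core":   0x000000ff,
}


def unpack_word(word):
    """Split a 32-bit word into its four 8-bit fields."""
    out = {}
    for name, mask in FIELDS.items():
        shift = (mask & -mask).bit_length() - 1   # position of the lowest set bit
        out[name] = (word & mask) >> shift
    return out
```

Checking only the core field, as mentioned above, then amounts to reading `unpack_word(word)["core"]`.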
16:29 karolherbst: so who wants to try out dyn reclocks on gt215+ cards tomorrow?
16:35 gnurou: imirkin: re: 7:18, which crash are you talking about?
16:35 imirkin: "7:18"?
16:36 gnurou: Time of your message :)
16:36 imirkin: in what tz?
16:37 gnurou: Damn >_<
16:37 imirkin: sorry, i totally forgot what i said, nor can i find it
16:37 gnurou: You mentioned a patch of mine that accidentally fixed a crash
16:37 imirkin: ohhh
16:37 imirkin: the pgob thing
16:37 imirkin: where it assumed that there was a pmu
16:38 imirkin: but there isn't one for gk20a
16:38 imirkin: which is on purpose
16:38 gnurou: Ah, that one
16:38 imirkin: but there's also not one for gf117, which is accidental
16:38 gnurou: I was hoping something more spectacular :)
16:38 imirkin: nope, just that one-liner
18:28 airlied: gnurou: please reply to Andrew asap :)
18:28 gnurou: airlied: yep - I'm not sure what a proper fix will be though :/
18:29 gnurou: I'm afraid this workaround is the best we will have for the time being
18:29 airlied: I'll apply my hack for now
18:29 gnurou: sounds good
18:30 gnurou: I will check again whether there is no other way to do what we need and will reply to Andrew
22:42 skeggsb: imirkin: i've got some more fixes to come, and some additions to the ones already there.. didn't quite finish the series today, but what's in the tree now should work well enough for what you want
22:43 skeggsb: imirkin: and the ce issue - airlied's patch on mesa3d-dev fixes the cause of it too
23:45 mlankhorst: imirkin: -fbo -auto in parallel
23:49 airlied: skeggsb: does something traverse the instobj list? surely something must and that would need locking