01:53 pmoreau: Stupid me… Of course I might have an incredibly high number of temporaries: I never initialised `prog->tlsSize`, since I thought it would be done automatically… --"
02:55 hakzsam: karolherbst, git grep GALLIUM_HUD docs
07:00 pmoreau: imirkin: Ping, did you see my comments from yesterday about the mysterious reg with one def but no insn?
09:24 imirkin: pmoreau: ok, so it has a def because it's an input, but it's not an actual instruction =/
09:24 imirkin: sad!
09:24 pmoreau: Yeah…
09:26 pmoreau: Do you think this can happen for regular unused inputs of a function, or will one of the passes before RA remove them?
09:31 pmoreau: imirkin: Should we add an additional pass to remove those unused inputs, or simply tell the RA pass that having defs does not always give you an insn?
09:36 karolherbst: mhh can anybody tell me how the process stuff works on those falcons?
09:37 karolherbst: is this like all virtual or is there a real context switch and stuff like that?
10:00 mwk: karolherbst: there are no special context switching features in falcon hw, but it's rather easy to make an interrupt handler that does a software switch
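A minimal sketch of the kind of software switch mwk describes, assuming a timer interrupt whose handler saves the interrupted task's registers and resumes the next runnable task in round-robin order. This is plain C modelling the idea, not falcon code; `struct cpu`, `struct task`, `schedule_tick()` and the register count are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a round-robin software context switch. */
#define NUM_TASKS 3
#define NUM_REGS  4

struct cpu {
    uint32_t regs[NUM_REGS];   /* live register file */
    uint32_t pc;               /* where execution resumes after the IRQ */
};

struct task {
    struct cpu saved;          /* register state saved at the last switch */
    int runnable;
};

static struct task tasks[NUM_TASKS];
static size_t current;

/* Called from the (hypothetical) timer interrupt handler: save the
 * interrupted task, pick the next runnable one, restore its state.
 * Returns the index of the task that runs once the IRQ returns. */
static size_t schedule_tick(struct cpu *cpu)
{
    size_t i, next = current;

    assert(cpu != NULL && current < NUM_TASKS);
    tasks[current].saved = *cpu;            /* save interrupted state */

    for (i = 1; i <= NUM_TASKS; i++) {      /* round-robin search */
        size_t cand = (current + i) % NUM_TASKS;
        if (tasks[cand].runnable) {
            next = cand;
            break;
        }
    }

    *cpu = tasks[next].saved;               /* restore chosen task */
    current = next;
    return next;
}
```

On real hardware the save and restore would live in the interrupt handler's prologue and epilogue; copying `struct cpu` stands in for that here.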
10:00 mwk: so... pretty much the usual
10:04 karolherbst: mwk: yeah but I was thinking how those "processes" work on the pmu
10:04 karolherbst: is a process just a virtual number and some magic inside call:?
10:09 mwk: what kind of process are you talking about?
10:09 mwk: the blob ones?
11:46 karolherbst: mwk: well I meant the stuff like memx, perf and so on
13:04 tacchinotacchi: i'm curious
13:05 tacchinotacchi: when you guys try to reverse engineer reclocking
13:05 tacchinotacchi: do you just watch the messages sent to the gpu by the driver, or do you also disassemble it?
13:06 karolherbst: what do you mean by "disassemble" it?
13:06 imirkin: tacchinotacchi: none of the above
13:06 tacchinotacchi: look at the opcodes of the nvidia blob, try to see what functions move the clock
13:06 tacchinotacchi: what do you do?
13:06 karolherbst: imirkin: well falcon code could be disassembled though :/
13:06 tacchinotacchi: yes, why don't you disassemble
13:07 karolherbst: actually we do disassemble falcon binaries, but that only helps for memory reclocking on gt215+ cards
13:07 tacchinotacchi: nvidia surely doesn't share specs, but i hope at least they didn't dive into the assembly and deliberately obfuscate them
13:07 karolherbst: and by "binaries" I mean stuff sent to the gpu through mmio
13:07 tacchinotacchi: no i mean the module
13:07 karolherbst: and no, we don't disassemble the nvidia binaries
13:07 karolherbst: because of legal reasons, I guess?
13:08 tacchinotacchi: mmhh
13:08 tacchinotacchi: how would they know :D
13:08 karolherbst: tacchinotacchi: there is this thing called mmiotrace
13:08 karolherbst: mhh
13:08 karolherbst: because their lawyer gets more money
13:08 tacchinotacchi: you shouldn't say that on irc though
13:08 karolherbst: and how can we tell them how we found it out?
13:09 karolherbst: anyway, disassembling stuff is tough and would actually need more time than looking at mmiotraces
13:09 tacchinotacchi: you already do mmiotrace
13:09 karolherbst: yes
13:09 tacchinotacchi: i knew about it, i didn't think it would be easier than disassembling
13:10 tacchinotacchi: i sent one once; i don't remember exactly, but it was several megabytes
13:10 tacchinotacchi: pretty big
13:10 karolherbst: yeah well, you have to mark or extract those important parts
13:10 imirkin: tacchinotacchi: a lot of it is looking at traces, a lot of it is changing the vbios to see what the blob does differently as a result
13:10 imirkin: tacchinotacchi: and yes, we also look at the "high level" opcodes being sent by the blob to the falcon units
13:11 karolherbst: "high level" :D
13:11 tacchinotacchi: high level opcodes?
13:11 imirkin: well, it's not like falcon isa or anything -- look for the SEQ isa
13:11 tacchinotacchi: well i'm lost with this low level stuff anyway
13:11 imirkin: http://envytools.readthedocs.org/en/latest/nvrm/pmu/index.html
13:12 imirkin: so we're able to see what reclocking script is uploaded for any particular situation, all in the mmiotrace
13:13 tacchinotacchi: why is it so hard to implement then?
13:13 tacchinotacchi: sorry for the dumb question
13:13 imirkin: well, you have to know what to put into the reclocking script
13:14 imirkin: where to find the various values, how to compute them, etc
13:15 imirkin: and given that we're not individually privy to having ALL the hardware ever produced, we tend to resort to vbios fuzzing to see what the blob will generate differently as a result
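The fuzzing workflow imirkin describes boils down to comparing two register-write traces: one captured with the stock vbios and one with a modified copy. A naive sketch, assuming each trace has already been reduced to an ordered list of (address, value) writes; `struct mmio_write`, `diff_traces()` and the addresses in the test are hypothetical, and real mmiotrace output needs parsing and usually smarter alignment than an index-by-index comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One register write extracted from a trace (hypothetical format). */
struct mmio_write {
    uint32_t addr;
    uint32_t value;
};

/* Compare two traces entry by entry and print every write that differs.
 * Entries present in only one trace also count as differences.
 * Returns the number of differing entries. */
static size_t diff_traces(const struct mmio_write *a, size_t na,
                          const struct mmio_write *b, size_t nb)
{
    size_t i, n = na > nb ? na : nb, diffs = 0;

    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);
    for (i = 0; i < n; i++) {
        if (i >= na || i >= nb) {
            printf("#%zu: only in %s trace\n", i, i >= na ? "second" : "first");
            diffs++;
        } else if (a[i].addr != b[i].addr || a[i].value != b[i].value) {
            printf("#%zu: 0x%06x <- 0x%08x  vs  0x%06x <- 0x%08x\n", i,
                   (unsigned)a[i].addr, (unsigned)a[i].value,
                   (unsigned)b[i].addr, (unsigned)b[i].value);
            diffs++;
        }
    }
    return diffs;
}
```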
13:15 karolherbst: tacchinotacchi: it isn't like 5 "commands" to execute but more like 100+
13:15 karolherbst: and each of these has to do the "right" thing
13:16 tacchinotacchi: wow i can't even imagine what such work looks like
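To give a feel for what "100+ commands" means, here is a hypothetical sketch of a reclocking script being built as a list of simple operations: write a register, wait for a bit pattern, delay. The opcodes, `struct script` and the helper names are invented for illustration. They are not the real memx or SEQ encoding, and in a real script every address and value has to be derived from the vbios and the current hardware state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; NOT the real memx/SEQ encoding. */
enum script_op { OP_WR32, OP_WAIT, OP_NSEC };

struct script_cmd {
    enum script_op op;
    uint32_t addr;   /* register address (WR32, WAIT) */
    uint32_t mask;   /* bits to compare (WAIT) */
    uint32_t data;   /* value to write/expect, or delay in ns (NSEC) */
};

#define SCRIPT_MAX 256

struct script {
    struct script_cmd cmd[SCRIPT_MAX];
    size_t len;
};

/* Append one command; returns -1 if the script is full. */
static int script_push(struct script *s, enum script_op op, uint32_t addr,
                       uint32_t mask, uint32_t data)
{
    assert(s != NULL);
    if (s->len >= SCRIPT_MAX)
        return -1;
    s->cmd[s->len++] = (struct script_cmd){ op, addr, mask, data };
    return 0;
}

/* Write 'data' to register 'addr'. */
static int script_wr32(struct script *s, uint32_t addr, uint32_t data)
{
    return script_push(s, OP_WR32, addr, 0, data);
}

/* Wait until (reg[addr] & mask) == data. */
static int script_wait(struct script *s, uint32_t addr, uint32_t mask,
                       uint32_t data)
{
    return script_push(s, OP_WAIT, addr, mask, data);
}

/* Busy-wait for 'nsec' nanoseconds. */
static int script_nsec(struct script *s, uint32_t nsec)
{
    return script_push(s, OP_NSEC, 0, 0, nsec);
}
```

The builder itself is trivial; the hard part the log describes is knowing which register, mask and value belong in each of those hundred-plus entries.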
13:16 tacchinotacchi: i should stop calling myself a programmer
13:18 karolherbst: tacchinotacchi: subdev/fb/ram* and the sddr*/gddr* files
13:18 karolherbst: tacchinotacchi: that's for kepler memory stuff: https://github.com/karolherbst/nouveau/blob/master_4.3/drm/nouveau/nvkm/subdev/fb/ramgk104.c :D
13:19 tacchinotacchi: it's not the first time someone has sent me a piece of reclocking code
13:19 karolherbst: just that you get a feeling how that works :D
13:19 tacchinotacchi: i don't get it
13:19 tacchinotacchi: asd
13:19 karolherbst: no problem, I don't get it either
13:19 tacchinotacchi: i actually don't know how a linux driver works
13:20 karolherbst: mhh
13:20 karolherbst: well
13:20 karolherbst: usually you use APIs
13:20 tacchinotacchi: if i have to read a regular program, i look for main or an entry point
13:20 karolherbst: like for any other application or library, just that you program inside kernel space
13:20 karolherbst: and usually do some I/O stuff
13:20 karolherbst: tacchinotacchi: well, SDL-based applications usually don't have their own main
13:20 karolherbst: and "main" is also just an ABI thing you use
13:21 tacchinotacchi: yes, but that's an ABI thing almost everybody uses
13:21 karolherbst: yeah well, you have to start the application somehow though, and glibc handles that for GNU-based systems
13:21 tacchinotacchi: so when someone uses an API i don't know that provides its own entry point, i'm also lost, like with Qt apps if i understand correctly
13:22 karolherbst: it's all about APIs in general
13:22 karolherbst: tacchinotacchi: well, in enterprise Java applications you also have no main ;)
13:22 karolherbst: so this isn't a kernel thing at all
13:23 karolherbst: main afaik is pretty much a C/C++ thing, maybe some older languages also have that
13:23 karolherbst: no idea though
13:24 RSpliet: public static void main(String[] args)
13:24 karolherbst: RSpliet: well and if you have like 20 of them?
13:24 RSpliet: ^ that's your Java equivalent
13:24 karolherbst: RSpliet: yeah, but in Java that stuff doesn't work like a main in C/C++
13:25 RSpliet: it probably works exactly the same, it's an agreed-on entry point for a linked binary
13:25 karolherbst: mhhh
13:25 karolherbst: not really
13:25 karolherbst: in java it is the entry point for this _class_
13:25 karolherbst: not binary
13:25 RSpliet: sure, and C doesn't have classes
13:25 karolherbst: ever wondered why you start a java application with a class argument when there is no default one defined in the jar?
13:25 tacchinotacchi: well, Java programs are just a bunch of classes
13:26 karolherbst: yeah and each class can have its own "main" function
13:26 tacchinotacchi: it just happens there is a class with a main method which is the first called by default
13:26 tacchinotacchi: or a default entry class in a jar
13:26 RSpliet: conceptually there's no difference
13:26 karolherbst: no, there are classes with main functions
13:26 karolherbst: not "a class"
13:26 tacchinotacchi: android apps have no main method
13:26 karolherbst: RSpliet: yeah, from one point of view you are right, but they differ quite a lot though
13:27 tacchinotacchi: ELF executables do have an entry point
13:27 tacchinotacchi: it's not the main function, but they have one
13:27 karolherbst: right
13:27 karolherbst: from a static libc file usually
13:29 karolherbst: RSpliet: but if I think about it, they are closer than I first thought, actually, because both are called from the application runtime
13:36 tacchinotacchi: i'll set your driver as my ultimate goal for some time
13:55 karolherbst: mupuf: so now I also disabled all tmr interrupts and guess what, some still get lost from the pmu :/
13:56 karolherbst: and nvkm_mc_intr doesn't get it
14:37 karolherbst: aha
14:37 karolherbst: k
14:37 karolherbst: the IRQ gets lost inside nouveau somewhere
14:39 karolherbst: sooo
14:39 karolherbst: nvkm_rd32(device, 0x10a008) & disp & ~(disp >> 16) is just 0 for that IRQ :/
15:03 karolherbst: skeggsb: found the pmu issue
15:04 karolherbst: skeggsb: nvkm_mc_intr_mask returns 0
15:06 karolherbst: skeggsb: and then nvkm_rd32(device, 0x10a008); also returns 0
15:06 karolherbst: as if the interrupt isn't configured, but we do expect one
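The filter karolherbst quoted earlier keeps only interrupt bits that are pending in 0x10a008, enabled in the low half of `disp`, and not flagged in the upper half of `disp`. A small sketch makes the consequence explicit: when the status read returns 0, the result is 0 no matter what `disp` holds, so the handler finds nothing to service. The bit layout of `disp` (low 16 bits route a bit to the host, high 16 bits exclude it) is an assumption inferred from the expression itself, and `host_pending_irqs()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* stat: pending interrupt bits, as read from 0x10a008 in the log.
 * disp: routing value; assumed here that a low-half bit means "deliver to
 *       host" and the matching high-half bit (bit + 16) excludes it again.
 * Returns the bits the host-side handler should service. */
static uint32_t host_pending_irqs(uint32_t stat, uint32_t disp)
{
    return stat & disp & ~(disp >> 16);
}
```

With a status of 0, as observed in the failing cases, every `disp` value yields 0, which is why the IRQ appeared to be "lost" inside nouveau.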
15:11 karolherbst: ohhhhhh wait
15:11 karolherbst: actually this only happens sometimes
15:11 karolherbst: sometimes the mask is 0, sometimes it has the right value
15:12 karolherbst: but then 0x10a008 can still be 0
15:12 karolherbst: maybe this is just a stupid timing issue
15:22 tacchinotacchi: wonder why nobody's working on fermi
15:23 tacchinotacchi: well i'll be off to sleep
15:23 tacchinotacchi: enjoy your superior intelligence
16:03 skeggsb: karolherbst: i've been looking at it a bit so far this morning, and can't find a good reason for it so far either...
16:03 karolherbst: :/
16:03 karolherbst: at least we know now that the hardware sends the IRQ
16:03 karolherbst: and that nouveau gets it
16:03 karolherbst: this is at least _something_
16:04 karolherbst: skeggsb: I tried reading the 0x10a008 reg inside a timeout loop
16:04 karolherbst: and later I get the value 2 out of it
16:04 karolherbst: well sometimes
16:04 karolherbst: or I saw it only once
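Reading the status register in a bounded retry loop, as karolherbst describes, can be sketched as follows. The register-read callback and the fake register are hypothetical stand-ins so the code runs without hardware. The fake register returns 0 for a few reads and then 2, mirroring what the log reports. A kernel driver would use its own register accessors and a real delay between reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register-read callback so the sketch runs without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t addr);

/* Poll 'addr' until it reads non-zero or 'max_tries' reads have been made.
 * Returns the last value read (0 means the poll timed out) and stores the
 * number of reads performed in *tries_out if it is non-NULL. */
static uint32_t poll_nonzero(reg_read_fn rd, void *ctx, uint32_t addr,
                             unsigned int max_tries, unsigned int *tries_out)
{
    uint32_t val = 0;
    unsigned int i;

    assert(rd != NULL && max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        val = rd(ctx, addr);
        if (val != 0)
            break;
        /* A real driver would delay here between reads. */
    }
    if (tries_out)
        *tries_out = i < max_tries ? i + 1 : max_tries;
    return val;
}

/* Fake register for testing: reads 0 until 'flip_after' reads, then 2. */
struct fake_reg {
    unsigned int reads;
    unsigned int flip_after;
};

static uint32_t fake_read(void *ctx, uint32_t addr)
{
    struct fake_reg *r = ctx;

    (void)addr;
    return ++r->reads > r->flip_after ? 2u : 0u;
}
```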
16:06 airlied: skeggsb: do I have a -next to find somewhere yet?
16:06 skeggsb: airlied: ah, right. i'll do that now before i continue with other things
16:07 karolherbst: what about the pcie stuff? :D
16:07 karolherbst: didn't get any reply from you :p
16:07 skeggsb: i merged it, i think..
16:08 karolherbst: k
16:09 skeggsb: apparently i didn't push it though
16:10 karolherbst: skeggsb: when you are done with the pushing and -next thing: nv_iowr(NV_PPWR_INTR_TRIGGER, ...) <= is this all that has to be done to fully configure those IRQs, or is there something else needed on the pmu?
16:12 karolherbst: skeggsb: k, so I got 2 again by the way
16:14 karolherbst: okay, so three times in a row I could recover the 0 to a 2
16:14 karolherbst: I bet something is messing with it somehow
16:18 karolherbst: and then it changes back into a 0 :/
16:19 skeggsb: oh, hangon, i have an idea
16:20 karolherbst: I love ideas :)
16:32 karolherbst: oh yeah nice, working 361 driver :)
16:33 skeggsb: https://github.com/skeggsb/nouveau/commit/c62c59910255b8ffaaa7c8945c6156be3af145c6
16:33 skeggsb: there's usually a simple explanation :)
16:34 karolherbst: :D
16:34 karolherbst: testing this out
16:35 karolherbst: I looked at this code though :O
16:36 karolherbst: skeggsb: at which rate would you consider this stable?
16:37 karolherbst: one error in 1M or in 1G requests?
16:37 skeggsb: zero errors?
16:37 imirkin_: karolherbst: just do infinity :)
16:37 karolherbst: :D
16:37 skeggsb: i just quickly tested with "while (true);do cat current_load; done"
16:37 karolherbst: yeah
16:38 karolherbst: I have a variable inside that
16:38 karolherbst: so I know how many runs I did
16:38 karolherbst: but it seems better now
16:38 karolherbst: 60k calls without issues
16:38 skeggsb: yes, it lasted far longer than any previous attempt while debugging it this morning
16:38 skeggsb: (ie. it didn't fail before i decided to post the patch)
16:38 karolherbst: :D
16:38 karolherbst: k
16:38 karolherbst: I tested this and had like 20 failures in 1M calls
16:39 karolherbst: so this case was already pretty rare :/
16:39 karolherbst: if that works, then we can debug the other error case
16:39 karolherbst: maybe it is the same
16:39 skeggsb: it depends on a falcon-routed interrupt occurring at the right (wrong?) time
16:39 karolherbst: yeah I guess
16:39 karolherbst: so 200k calls without issues
16:40 karolherbst: this is already much better
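A quick sanity check on what "N calls without issues" proves: using the earlier rate of about 20 failures in 1M calls (p ≈ 2e-5 per call) and assuming failures are independent, the chance of a clean run of n calls with the bug still present is (1 - p)^n. The sketch below computes that; `p_zero_failures()` is a hypothetical helper.

```c
#include <assert.h>

/* Probability of zero failures in n independent calls when each call fails
 * with probability p, i.e. (1 - p)^n, via exponentiation by squaring. */
static double p_zero_failures(double p, unsigned long n)
{
    double base = 1.0 - p, result = 1.0;

    assert(p >= 0.0 && p <= 1.0);
    while (n > 0) {
        if (n & 1)
            result *= base;
        base *= base;
        n >>= 1;
    }
    return result;
}
```

With these numbers, 60k clean calls would still happen about 30% of the time with the old bug present, while 200k clean calls would happen under 2% of the time, so the longer run is much stronger evidence that the fix works.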
16:40 karolherbst: do we still want to have such a workaround as mine though? For dynamic reclocking this might come in handy maybe
16:40 karolherbst: k, now the more aggressive stress test
16:41 karolherbst: and died
16:41 karolherbst: after 16 calls :D
16:41 karolherbst: skeggsb: do this: i=0; while true; do echo $((i=$i+1)); cat current_load >/dev/null; echo 07 > pstate; echo 0f > pstate ; done
16:41 skeggsb: the board i have plugged in at the moment is fermi, so that's going to fail for sure
16:42 karolherbst: ohhh meh
16:42 karolherbst: k
16:42 karolherbst: but the situation is different here
16:42 karolherbst: there is no reply queued
16:43 karolherbst: skeggsb: 4k loops and 14 replies lost
16:43 skeggsb: well, hunt for the reason why :P
16:44 karolherbst: yeah, will do
16:48 karolherbst: skeggsb: aha!
16:48 karolherbst: you won't believe that thing :D
16:48 karolherbst: skeggsb: pmu: data 0:1000000 1:0
16:48 karolherbst: this looks very wrong somehow :p
16:49 karolherbst: or does it?
16:49 karolherbst: ...
16:49 karolherbst: ohhh no
16:49 karolherbst: something else should be wrong
16:49 karolherbst: it is still a weird answer
16:50 karolherbst: ohh wait no
16:50 karolherbst: this is the current_load stuff
16:50 karolherbst: the 1 just means there is some measured pcie load
16:50 karolherbst: ...
16:52 karolherbst: k, so the IRQ is lost in this case for sure
16:52 karolherbst: even nvkm_pci_intr doesn't get it
16:54 karolherbst: skeggsb: soooo now I need to know how that process stuff works.
16:55 karolherbst: basically current_load calls something inside perf and the pstate changes inside memx
16:55 karolherbst: as long as I only do one of these things at a time, everything is fine now
16:55 karolherbst: but when I mix it, there is a rather high chance it messes up
17:27 karolherbst: skeggsb: I think this one is easy: the order of the requests changes for whatever reason :/
17:34 memleak: Hi all, I just wanted to say thank you for the nouveau driver, it's faster than the closed driver for windows (Quadro NVS 140M)
17:35 memleak: your work is highly appreciated, take care!
17:36 karolherbst: skeggsb: can you tell me why the order of the pmu stuff sometimes gets odd when doing "i=0; while true; do echo $((i=$i+1)); cat current_load >/dev/null; echo 07 > pstate; echo 0f > pstate ; done"?
17:36 karolherbst: skeggsb: https://gist.github.com/karolherbst/a66ea5843010cac12027
17:37 karolherbst: there are always 4 requests per pstate change
17:37 karolherbst: and sometimes the very last comes after the current_load requests
17:37 karolherbst: and I have no clue why that happens
17:39 karolherbst: ohhhh I guess the second request per pstate change is done asynchronously after the debugfs call returned?
17:39 karolherbst: but why...
17:57 karolherbst: skeggsb: uhh yeah k, so I think I fixed also this one
17:58 karolherbst: skeggsb: https://github.com/karolherbst/nouveau/commit/098b4e6c23d36f0e64bd0abb5dac5d556b913d16
17:59 karolherbst: imagine the kernel schedules stuff a bit messy
17:59 karolherbst: :D
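The log doesn't show the contents of the linked commit, only that it changes the locking around PMU requests and that the mixed `current_load`/`pstate` stress loop then ran 100k iterations without a lockup. The general idea can be sketched in userspace C with pthreads, assuming a single shared request/reply channel: the lock has to cover the whole send-then-wait transaction, otherwise a second caller can send its request before the first caller has collected its reply. `struct mailbox`, `fake_firmware()`, `pmu_send()` and `worker()` are hypothetical; this is not nouveau code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox standing in for the PMU message queue. */
struct mailbox {
    pthread_mutex_t lock;   /* serialises whole request/reply transactions */
    uint32_t last_request;
    uint32_t reply;
};

/* Stand-in for the firmware: the reply is the request id plus one. */
static void fake_firmware(struct mailbox *mb)
{
    mb->reply = mb->last_request + 1;
}

/* Send a request and collect its reply while holding the lock, so no other
 * caller can slip its own request in between and take or reorder replies. */
static uint32_t pmu_send(struct mailbox *mb, uint32_t request)
{
    uint32_t reply;

    pthread_mutex_lock(&mb->lock);
    mb->last_request = request;
    fake_firmware(mb);      /* real code would sleep until the reply IRQ */
    reply = mb->reply;
    pthread_mutex_unlock(&mb->lock);
    return reply;
}

/* Hypothetical caller (think perf or memx) issuing many requests and
 * counting replies that don't match what it sent. */
struct worker_arg {
    struct mailbox *mb;
    uint32_t base;
    int mismatches;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    uint32_t i;

    for (i = 0; i < 10000; i++) {
        uint32_t req = a->base + i;
        if (pmu_send(a->mb, req) != req + 1)
            a->mismatches++;
    }
    return NULL;
}
```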
18:03 karolherbst: skeggsb: yep, 100k loops and no lockup :)
18:03 imirkin: [1233713.775940] nouveau 0000:02:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 7 [007f996000 X[5013]] subc 1 class 90c0 mthd 1694 data 00000011
18:04 imirkin: skeggsb: there's something wrong there... the errors are being reported against the wrong channel
18:06 imirkin: skeggsb: note that it's the compute class, but X doesn't do any compute things.
18:06 skeggsb: imirkin: umm, yes, that's... interesting
18:07 imirkin: and in fact it's very directly due to something i messed up and was running deqp against
18:07 imirkin: (stupid CB_BIND shift changes to 8 for nvc0 compute vs nvc0 3d... gr.)
18:09 karolherbst: skeggsb: any idea why the old locking is wrong? Because I don't see it
18:11 skeggsb: it's actually stupidly wrong for a few reasons, your change makes sense
18:11 karolherbst: ohh k
18:11 karolherbst: anyway
18:11 karolherbst: it works now
18:11 karolherbst: 220k requests and going
18:12 karolherbst: so now I can go back to dynamic reclocking stuff, because the pmu stuff is stable now :D
18:14 imirkin: skeggsb: fwiw that's on kernel 4.3.0
18:22 skeggsb: imirkin: hrm, i don't suppose you got indirect rendering somehow?
18:22 imirkin: skeggsb: i built deqp with the "drm" platform. from what i can tell it doesn't even know that X exists
18:22 skeggsb: i can't see how that'd happen tbh.. and, X on channel 7 sounds unlikely too
18:22 skeggsb: hrm
18:23 imirkin: and compute :)
18:24 imirkin: and a bug that was only in the version of mesa that i was testing
18:24 imirkin: it's clearly from deqp
18:24 skeggsb: yep, no argument from me there :) just wondering if somehow X opened the fd mesa is using - but - that seems unlikely given what you've told me
18:24 imirkin: hmmmmm
18:25 imirkin: i did just try to run it with DISPLAY=
18:25 imirkin: and it failed to init
18:25 imirkin: so... something somewhere knows about X
18:58 imirkin: alrighty... 80% pass rate on the deqp ssbo tests (with compute shaders)
19:09 airlied: imirkin: you have compute shaders as well?
19:10 imirkin: airlied: yeah... using hakzsam's work
19:10 imirkin: fixed it up a bunch so that it actually works
19:10 imirkin: arb_compute_shader branch on my tree
19:10 imirkin: fixing up some stupid boolean thing now with ssbo's
19:13 imirkin: airlied: only on nvc0 though, not kepler... neither hakzsam nor i have one handy atm
19:13 imirkin: although the ssbo fixes i'm making are pretty generally applicable ones. it's just that the deqp gles31 tests require compute.
19:17 imirkin: ah nice. looks like now i'm closer to 95% pass rate
19:24 imirkin: [2159/2159] skip: 3, pass: 2062, dmesg-warn: 1, fail: 78, dmesg-fail: 1, crash: 14
19:24 imirkin: that's much better.
19:27 imirkin: and a bunch of the failures are related to images, shared memory (not yet piped through), etc
19:28 imirkin: can't seem to get compare-and-swap working...
19:29 imirkin: will have to see what the blob does
21:15 Tom^: imirkin: i have kepler so just ping me when you want things piglit tested.
21:15 imirkin: Tom^: it's not about testing... it's about developing
21:15 Tom^: oh :P
21:15 imirkin: there's tons of iteration
21:16 imirkin: and actually i have a GK208 which should be a lot more similar to your kepler than the GK10x's
21:16 imirkin: it's just at work, and i'm at home
21:16 imirkin: and i tend to do work at work, not nouveau :)