01:53pmoreau: Stupid me… Of course I might have an incredibly high amount of temporaries: I never initialised `prog->tlsSize` since I thought it would be done automatically… --"
02:55hakzsam: karolherbst, git grep GALLIUM_HUD docs
07:00pmoreau: imirkin: Ping, did you see my comments from yesterday about the mysterious reg with one def but no insn?
09:24imirkin: pmoreau: ok, so it has a def because it's an input, but it's not an actual instruction =/
09:26pmoreau: Do you think this can happen for regular unused inputs of a function, or will one of the pass before RA remove them?
09:31pmoreau: imirkin: Should we add some additional pass to remove those unused inputs, or simply tell the RA pass that having defs does not always give you an insn?
09:36karolherbst: mhh can anybody tell me how the process stuff works on those falcons?
09:37karolherbst: is this like all virtual or is there a real context switch and stuff like that?
10:00mwk: karolherbst: there are no special context switching features in falcon hw, but it's rather easy to make an interrupt handler that does a software switch
10:00mwk: so... pretty much the usual
10:04karolherbst: mwk: yeah but I was thinking how those "processes" work on the pmu
10:04karolherbst: is a process just a virtual number and some magic inside call:?
10:09mwk: what kind of process are you talking about?
10:09mwk: the blob ones?
11:46karolherbst: mwk: well I meant the stuff like memx, perf and so on
13:04tacchinotacchi: i'm curious
13:05tacchinotacchi: when you guys try to reverse engineer reclocking
13:05tacchinotacchi: do you just watch the messages sent to the gpu by the driver, or do you also disassemble it?
13:06karolherbst: what do you mean by "disassemble" it?
13:06imirkin: tacchinotacchi: none of the above
13:06tacchinotacchi: look at the opcodes of the nvidia blob, try to see what functions move the clock
13:06tacchinotacchi: what do you do?
13:06karolherbst: imirkin: well falcon could be disassembled though :/
13:06tacchinotacchi: yes, why don't you disassemble
13:07karolherbst: actually we do disassemble falcon binaries, but that only helps for memory reclocking on gt215+ cards
13:07tacchinotacchi: nvidia surely doesn't share specs, but i hope at least they didn't dive into the assembly and deliberately obfuscate them
13:07karolherbst: and by "binaries" I mean stuff sent to the gpu through mmio
13:07tacchinotacchi: no i mean the module
13:07karolherbst: and no, we don't disassemble the nvidia binaries
13:07karolherbst: I guess for legal reasons?
13:08tacchinotacchi: how would they know :D
13:08karolherbst: tacchinotacchi: there is this thing called mmiotrace
13:08karolherbst: because their lawyer gets more money
13:08tacchinotacchi: you shouldn't tell that in irc though
13:08karolherbst: and how can we tell them how we found it out?
13:09karolherbst: anyway, disassembling stuff is tough and would actually need more time than looking at mmiotraces
13:09tacchinotacchi: you already do mmiotrace
13:09tacchinotacchi: i knew about it, i didn't think it would be easier than disassembling
13:10tacchinotacchi: i sent one once, maybe i don't remember exactly but it was various megabytes
13:10tacchinotacchi: pretty big
13:10karolherbst: yeah well, you have to mark or extract those important parts
13:10imirkin: tacchinotacchi: a lot of it is looking at traces, a lot of it is changing the vbios to see what the blob does differently as a result
13:10imirkin: tacchinotacchi: and yes, we also look at the "high level" opcodes being sent by the blob to the falcon units
13:11karolherbst: "high level" :D
13:11tacchinotacchi: high level opcodes?
13:11imirkin: well, it's not like falcon isa or anything -- look for the SEQ isa
13:11tacchinotacchi: well i'm lost with this low level stuff anyway
13:12imirkin: so we're able to see what reclocking script is uploaded for any particular situation, all in the mmiotrace
13:13tacchinotacchi: why is it so hard to implement then?
13:13tacchinotacchi: sorry for the dumb question
13:13imirkin: well, you have to know what to put into the reclocking script
13:14imirkin: where to find the various values, how to compute them, etc
13:15imirkin: and given that we're not individually privy to having ALL the hardware ever produced, we tend to resort to vbios fuzzing to see what the blob will generate differently as a result
13:15karolherbst: tacchinotacchi: it isn't like 5 "commands" to execute but more like 100+
13:15karolherbst: and each of these has to do the "right" thing
13:16tacchinotacchi: wow i can't even think what such work looks like
13:16tacchinotacchi: i should stop calling myself a programmer
13:18karolherbst: tacchinotacchi: subdev/fb/ram* and s/g ddr files
13:18karolherbst: tacchinotacchi: that's for kepler memory stuff: https://github.com/karolherbst/nouveau/blob/master_4.3/drm/nouveau/nvkm/subdev/fb/ramgk104.c :D
13:19tacchinotacchi: it's not the first time they send me a piece of reclocking code
13:19karolherbst: just that you get a feeling how that works :D
13:19tacchinotacchi: i don't get it
13:19karolherbst: no problem, I don't get it either
13:19tacchinotacchi: i actually don't know how a linux driver works
13:20karolherbst: usually you use APIs
13:20tacchinotacchi: if i have to read a default program, i look for main or an entry point
13:20karolherbst: like for any other application or library, just that you program inside kernel space
13:20karolherbst: and usually do some I/O stuff
13:20karolherbst: tacchinotacchi: well SDL based applications usually don't have their own main
13:20karolherbst: and "main" is also just an ABI thing you use
13:21tacchinotacchi: yes, but that's an ABI thing almost everybody uses
13:21karolherbst: yeah well, you have to start the application somehow though and glibc handles that for GNU based systems
13:21tacchinotacchi: so when someone uses an API i don't know that puts its own entry point i'm also lost, like for Qt apps if i understand well
13:22karolherbst: it's all about APIs in general
13:22karolherbst: tacchinotacchi: well in enterprise java application you also have no main ;)
13:22karolherbst: so this isn't a kernel thing at all
13:23karolherbst: main afaik is pretty much a C/C++ thing, maybe some older languages also have that
13:23karolherbst: no idea though
13:24RSpliet: public static void main(String[] args)
13:24karolherbst: RSpliet: well and if you have like 20 of them?
13:24RSpliet: ^ that's your Java equivalent
13:24karolherbst: RSpliet: yeah, but in java that stuff doesn't work like a main in C/C++
13:25RSpliet: it probably works exactly the same, it's an agreed-on entry point for a linked binary
13:25karolherbst: not really
13:25karolherbst: in java it is the entry point for this _class_
13:25karolherbst: not binary
13:25RSpliet: sure, and C doesn't have classes
13:25karolherbst: ever wondered why you start a java application with a class argument when there is no default one defined in the jar?
13:25tacchinotacchi: well, java program are just a bunch of classes
13:26karolherbst: yeah and each class can have its own "main" function
13:26tacchinotacchi: it just happens there is a class with a main method which is the first called by default
13:26tacchinotacchi: or a default entry class in a jar
13:26RSpliet: conceptually there's no difference
13:26karolherbst: no, there are classes with main functions
13:26karolherbst: not "a class"
13:26tacchinotacchi: android apps have no main method
13:26karolherbst: RSpliet: yeah, from one point of view you are right, but they differ quite much though
13:27tacchinotacchi: ELF executables do have an entry point
13:27tacchinotacchi: it's not the main function, but they have one
13:27karolherbst: from a static libc file usually
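The distinction drawn here, that an ELF executable has an entry point which is not `main`, can be checked by reading the `e_entry` field of the ELF header. The following is a minimal sketch, assuming a well-formed header; `elf_entry_point` is a hypothetical helper name, not part of any tool mentioned in the channel. The field layout (class byte at offset 4, data-encoding byte at offset 5, `e_entry` at offset 24) follows the ELF specification.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file.

    e_entry is the address the loader jumps to; on glibc systems this is
    normally the C runtime start-up code, which later calls main().
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]   # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]    # 1 = little endian, 2 = big endian
    endian = "<" if ei_data == 1 else ">"
    width = "I" if ei_class == 1 else "Q"
    # e_entry follows e_ident (16 bytes), e_type, e_machine and e_version.
    (entry,) = struct.unpack_from(endian + width, header, 24)
    return entry


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(hex(elf_entry_point(f.read(64))))
```

Run against a dynamically or statically linked binary, the printed address can be compared with the address of `main` in the symbol table; the two normally differ, which is the point made above.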
13:29karolherbst: RSpliet: but if I think about it, they are closer than I first thought actually, because both are being called from the application runtime
13:36tacchinotacchi: i'll set your driver as my ultimate goal for some time
13:55karolherbst: mupuf: so now I also disabled all tmr interrupts and guess what, still some get lost from the pmu :/
13:56karolherbst: and nvkm_mc_intr doesn't get it
14:37karolherbst: the IRQ gets lost inside nouveau somewhere
14:39karolherbst: nvkm_rd32(device, 0x10a008) & disp & ~(disp >> 16) is just 0 for that IRQ :/
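The expression quoted above is plain bit arithmetic, so its behaviour can be modelled outside the kernel. This is only a sketch of the masking itself: `pending_bits` is a hypothetical name, and what the `disp` value actually encodes on the hardware is not stated in the log, so the reading "low bits select, high half shifted down excludes" is an assumption taken from the shape of the expression.

```python
def pending_bits(status: int, disp: int) -> int:
    """Model of `status & disp & ~(disp >> 16)` on 32-bit values.

    A bit survives only if it is set in both status and disp and is
    not set in the upper half of disp shifted down by 16.
    """
    return status & disp & ~(disp >> 16) & 0xFFFFFFFF


if __name__ == "__main__":
    # A status of 0 can never yield a pending bit, whatever disp holds.
    print(pending_bits(0x0, 0xFFFF))
    # The same status bit is kept or dropped depending on disp's upper half.
    print(pending_bits(0x2, 0x00000002), pending_bits(0x2, 0x00020002))
```

This matches the symptom described: if the read of 0x10a008 returns 0, the result is 0 regardless of the mask, so the handler sees nothing to service.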
15:03karolherbst: skeggsb: found the pmu issue
15:04karolherbst: skeggsb: nvkm_mc_intr_mask returns 0
15:06karolherbst: skeggsb: and then nvkm_rd32(device, 0x10a008); also returns 0
15:06karolherbst: like if the interrupt isn't configured, but there is one we expect
15:11karolherbst: ohhhhhh wait
15:11karolherbst: actually this only happens sometimes
15:11karolherbst: sometimes the mask is 0, sometimes it has the right value
15:12karolherbst: but then the 0x10a008 can be still 0
15:12karolherbst: maybe this is just a stupid timing issue
15:22tacchinotacchi: wonder why nobody's working on fermi
15:23tacchinotacchi: well i'll be off to sleep
15:23tacchinotacchi: enjoy your superior intelligence
16:03skeggsb: karolherbst: i've been looking at it a bit so far this morning, and can't find a good reason for it so far either...
16:03karolherbst: at least we know now that the hardware sends the IRQ
16:03karolherbst: and that nouveau gets it
16:03karolherbst: this is at least _something_
16:04karolherbst: skeggsb: I tried reading the 0x10a008 reg inside a timeout loop
16:04karolherbst: and later I get the value 2 out of it
16:04karolherbst: well sometimes
16:04karolherbst: or I saw it only once
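Reading a register "inside a timeout loop" is a generic polling pattern. A minimal sketch follows, assuming a caller-supplied `read` function standing in for the register read; `poll_nonzero` and its parameters are hypothetical names and not nouveau API.

```python
import time
from typing import Callable, Optional


def poll_nonzero(read: Callable[[], int], timeout_s: float,
                 interval_s: float = 0.0) -> Optional[int]:
    """Call read() until it returns a non-zero value or the timeout expires.

    Returns the first non-zero value, or None if none showed up in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = read()
        if value != 0:
            return value
        if time.monotonic() >= deadline:
            return None
        if interval_s:
            time.sleep(interval_s)


if __name__ == "__main__":
    samples = iter([0, 0, 0, 2])
    print(poll_nonzero(lambda: next(samples), timeout_s=1.0))
```

A value that only appears after several reads, as reported here, points at timing rather than at a missing interrupt configuration, which is consistent with the "stupid timing issue" guess earlier in the log.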
16:06airlied: skeggsb: do I have a -next to find somewhere yet?
16:06skeggsb: airlied: ah, right. i'll do that now before i continue with other things
16:07karolherbst: what about the pcie stuff? :D
16:07karolherbst: didn't get any reply from you :p
16:07skeggsb: i merged it, i think..
16:09skeggsb: apparently i didn't push it though
16:10karolherbst: skeggsb: when you are done with the pushing and -next thing: nv_iowr(NV_PPWR_INTR_TRIGGER, ...) <= is this all that has to be done to fully configure those IRQs or is there something else needed on the pmu?
16:12karolherbst: skeggsb: k, so I got 2 again by the way
16:14karolherbst: okay, so three times in a row I could recover the 0 to a 2
16:14karolherbst: I bet something is messing with it somehow
16:18karolherbst: and then it changes back into a 0 :/
16:19skeggsb: oh, hangon, i have an idea
16:20karolherbst: I love ideas :)
16:32karolherbst: oh yeah nice, working 361 driver :)
16:33skeggsb: there's usually a simple explanation :)
16:34karolherbst: testing this out
16:35karolherbst: I looked at this code though :O
16:36karolherbst: skeggsb: at which rate would you consider this stable?
16:37karolherbst: one error in 1M or in 1G requests?
16:37skeggsb: zero errors?
16:37imirkin_: karolherbst: just do infinity :)
16:37skeggsb: i just quickly tested with "while (true);do cat current_load; done"
16:38karolherbst: I have a variable inside that
16:38karolherbst: so I know how many runs I did
16:38karolherbst: but it seems better now
16:38karolherbst: 60k calls without issues
16:38skeggsb: yes, it lasted far longer than any previous attempt while debugging it this morning
16:38skeggsb: (ie. it didn't fail before i decided to post the patch)
16:38karolherbst: I tested this and had like 20 failures in 1M calls
16:39karolherbst: so this case was hit pretty rarely already :/
16:39karolherbst: if that works, then we can debug the other error case
16:39karolherbst: maybe it is the same
16:39skeggsb: it depends on a falcon-routed interrupt occurring at the right (wrong?) time
16:39karolherbst: yeah I guess
16:39karolherbst: so 200k calls without issues
16:40karolherbst: this is already much better
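The numbers quoted here (about 20 failures in 1M calls before the patch, then 60k and 200k clean calls after it) can be put in perspective with a little arithmetic. The sketch below assumes failures are independent with a fixed per-call rate, which is a simplification and not something the log establishes; the function names are hypothetical.

```python
import math


def observed_failure_rate(failures: int, calls: int) -> float:
    """Failures per call, e.g. 20 in 1_000_000."""
    return failures / calls


def chance_of_clean_run(rate: float, calls: int) -> float:
    """Probability of zero failures in `calls` independent tries."""
    # log1p keeps precision when the rate is very small.
    return math.exp(calls * math.log1p(-rate))


if __name__ == "__main__":
    rate = observed_failure_rate(20, 1_000_000)
    for calls in (60_000, 200_000):
        print(calls, round(chance_of_clean_run(rate, calls), 3))
```

Under that assumption, an unfixed bug would still survive 60k calls roughly 30% of the time, but 200k calls only about 2% of the time, so the longer run is the more convincing evidence.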
16:40karolherbst: do we still want to have such a workaround as mine though? For dynamic reclocking this might come in handy maybe
16:40karolherbst: k, now the more aggressive stress test
16:41karolherbst: and died
16:41karolherbst: after 16 calls :D
16:41karolherbst: skeggsb: do this: i=0; while true; do echo $((i=$i+1)); cat current_load >/dev/null; echo 07 > pstate; echo 0f > pstate ; done
16:41skeggsb: the board i have plugged at the moment is fermi, so, that's going to be fail for sure
16:42karolherbst: ohhh meh
16:42karolherbst: but the situation is different here
16:42karolherbst: there is no reply queued
16:43karolherbst: skeggsb: 4k loops and 14 replies lost
16:43skeggsb: well, hunt for the reason why :P
16:44karolherbst: yeah will do
16:48karolherbst: skeggsb: aha!
16:48karolherbst: you won't believe that thing :D
16:48karolherbst: skeggsb: pmu: data 0:1000000 1:0
16:48karolherbst: this looks very wrong somehow :p
16:49karolherbst: or does it?
16:49karolherbst: ohhh no
16:49karolherbst: something else should be wrong
16:49karolherbst: it is still a weird answer
16:50karolherbst: ohh wait no
16:50karolherbst: this is the current_load stuff
16:50karolherbst: the 1 just means there is some measured pcie load
16:52karolherbst: k, so the IRQ is lost in this case for sure
16:52karolherbst: even nvkm_pci_intr doesn't get it
16:54karolherbst: skeggsb: soooo now I need to know how that process stuff works.
16:55karolherbst: basically current_load calls something inside perf and the pstate changes inside memx
16:55karolherbst: as long as I only do one of these things at the time, everything is fine now
16:55karolherbst: but when I mix it, there is a rather high chance it messes up
17:27karolherbst: skeggsb: I think this one is easy: the order of the requests changes for whatever reasons :/
17:34memleak: Hi all, I just wanted to say thank you for the nouveau driver, it's faster than the closed driver for windows (Quadro NVS 140M)
17:35memleak: your work is highly appreciated, take care!
17:36karolherbst: skeggsb: can you tell me why sometimes the order of the pmu stuff gets odd when doing "i=0; while true; do echo $((i=$i+1)); cat current_load >/dev/null; echo 07 > pstate; echo 0f > pstate ; done"?
17:36karolherbst: skeggsb: https://gist.github.com/karolherbst/a66ea5843010cac12027
17:37karolherbst: there are always 4 requests per pstate change
17:37karolherbst: and sometimes the very last comes after the current_load requests
17:37karolherbst: and I have no clue why that happens
17:39karolherbst: ohhhh I guess the second request per pstate change is done asynchronously after the debugfs call returned?
17:39karolherbst: but why...
17:57karolherbst: skeggsb: uhh yeah k, so I think I fixed also this one
17:58karolherbst: skeggsb: https://github.com/karolherbst/nouveau/commit/098b4e6c23d36f0e64bd0abb5dac5d556b913d16
17:59karolherbst: imagine the kernel schedules stuff a bit messy
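The failure mode described here, two kinds of requests interleaving so that replies arrive out of the expected order, is the classic case for holding one lock across both the send and the reply read. The following is a toy model only: `FakePmu` is a hypothetical class with a single shared reply slot, and it does not reproduce the linked commit or the real PMU message protocol.

```python
import threading


class FakePmu:
    """Toy device with a single reply slot shared by all callers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reply = None

    def _submit(self, request):
        # Stand-in for writing the request and the firmware answering it.
        self._reply = ("reply", request)

    def send(self, request):
        # The lock covers submit *and* reply read, so no other caller
        # can overwrite the slot between the two steps.
        with self._lock:
            self._submit(request)
            reply = self._reply
            self._reply = None
            return reply


if __name__ == "__main__":
    pmu = FakePmu()
    print(pmu.send(("perf", 1)), pmu.send(("memx", 2)))
```

If the lock were released between the submit and the reply read, a thread scheduled in that gap could consume or overwrite the other caller's reply, which is one way a "messy" scheduler could produce lost or misordered replies.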
18:03karolherbst: skeggsb: yep, 100k loops and no lockup :)
18:03imirkin: [1233713.775940] nouveau 0000:02:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 7 [007f996000 X] subc 1 class 90c0 mthd 1694 data 00000011
18:04imirkin: skeggsb: there's something wrong there... the errors are being reported against the wrong channel
18:06imirkin: skeggsb: note that it's the compute class, but X doesn't do any compute things.
18:06skeggsb: imirkin: umm, yes, that's... interesting
18:07imirkin: and in fact it's very directly due to something i messed up and was running deqp against
18:07imirkin: (stupid CB_BIND shift changes to 8 for nvc0 compute vs nvc0 3d... gr.)
18:09karolherbst: skeggsb: any idea why the old locking is wrong? Because I don't see it
18:11skeggsb: it's actually stupidly wrong for a few reasons, your change makes sense
18:11karolherbst: ohh k
18:11karolherbst: it works now
18:11karolherbst: 220k requests and going
18:12karolherbst: so now I can go back to dynamic reclocking stuff, because the pmu stuff is stable now :D
18:14imirkin: skeggsb: fwiw that's on kernel 4.3.0
18:22skeggsb: imirkin: hrm, i don't suppose you got indirect rendering somehow?
18:22imirkin: skeggsb: i built deqp with the "drm" platform. from what i can tell it doesn't even know that X exists
18:22skeggsb: i can't see how that'd happen tbh.. and, X on channel 7 sounds unlikely too
18:23imirkin: and compute :)
18:24imirkin: and a bug that was only in the version of mesa that i was testing
18:24imirkin: it's clearly from deqp
18:24skeggsb: yep, no argument from me there :) just wondering if somehow X opened the fd mesa is using - but - that seems unlikely given what you've told me
18:25imirkin: i did just try to run it with DISPLAY=
18:25imirkin: and it failed to init
18:25imirkin: so... something somewhere knows about X
18:58imirkin: alrighty... 80% pass rate on the deqp ssbo tests (with compute shaders)
19:09airlied: imirkin: you have compute shaders as well?
19:10imirkin: airlied: yeah... using hakzsam's work
19:10imirkin: fixed it up a bunch so that it actually works
19:10imirkin: arb_compute_shader branch on my tree
19:10imirkin: fixing up some stupid boolean thing now with ssbo's
19:13imirkin: airlied: only on nvc0 though, not kepler... neither hakzsam nor i have one handy atm
19:13imirkin: although the ssbo fixes i'm making are pretty generally applicable ones. it's just that the deqp gles31 tests require compute.
19:17imirkin: ah nice. looks like now i'm closer to 95% pass rate
19:24imirkin: [2159/2159] skip: 3, pass: 2062, dmesg-warn: 1, fail: 78, dmesg-fail: 1, crash: 14
19:24imirkin: that's much better.
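The summary line pasted above can be turned into a pass rate mechanically, which confirms the "closer to 95%" estimate. This is a small sketch; `parse_summary` and `pass_rate` are hypothetical helpers, and the regular expression only assumes the `name: count` layout visible in that one line.

```python
import re


def parse_summary(line: str) -> dict:
    """Extract `name: count` pairs from a test summary line."""
    return {name: int(count)
            for name, count in re.findall(r"([a-z-]+): (\d+)", line)}


def pass_rate(counts: dict) -> float:
    """Fraction of all counted results that are plain passes."""
    return counts["pass"] / sum(counts.values())


if __name__ == "__main__":
    line = ("[2159/2159] skip: 3, pass: 2062, dmesg-warn: 1, "
            "fail: 78, dmesg-fail: 1, crash: 14")
    counts = parse_summary(line)
    print(sum(counts.values()), f"{pass_rate(counts):.1%}")
```

The six categories add up to the 2159 tests in the bracket, so the rate is computed over every test that ran, skips included.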
19:27imirkin: and a bunch of the failures are related to images, shared memory (not yet piped through), etc
19:28imirkin: can't seem to get compare-and-swap working...
19:29imirkin: will have to see what all blob does
21:15Tom^: imirkin: i have kepler so just ping me when you want things piglit tested.
21:15imirkin: Tom^: it's not about testing... it's about developing
21:15Tom^: oh :P
21:15imirkin: there's tons of iteration
21:16imirkin: and actually i have a GK208 which should be a lot more similar to your kepler than the GK10x's
21:16imirkin: it's just at work, and i'm at home
21:16imirkin: and i tend to do work at work, not nouveau :)