04:18 karolherbst: mupuf_: sometimes at readout my core counter (only set to 0x1) is higher than the ticks counter :/
04:18 karolherbst: although I read the ticks counter out last
04:18 karolherbst: and reset it first
04:19 karolherbst: any idea?
04:20 karolherbst: it seems I have to workaround this a bit, but I would rather not :D
04:35 karolherbst: best way to do reg = max(reg, 0xff) on the falcon?
04:36 karolherbst: cmp, bra, set? or is there something better
04:37 karolherbst: and with set I mean imm32
04:53 mupuf_: yes, I would suggest you just clamp the value
05:39 karolherbst: mupuf_: yeah, but this is kind of ugly in the code, because I have to do that for 4 counters now :/
05:40 karolherbst: ohh wait
05:40 karolherbst: I could skip the div then
05:40 karolherbst: still 4 cmps, meh
05:54 mupuf_: make a function?
05:55 karolherbst: add a min function to arith.fuc?
06:05 RSpliet: does it not already exist?
06:09 karolherbst: didn't find it
06:10 RSpliet: to my surprise, it doesn't seem to no :-P
06:10 karolherbst: min is really ugly to write in assembly code :/
06:12 karolherbst: isn't there an instruction to just conditionally skip the next one
06:12 RSpliet: I take it falcon is fairly risc
06:14 karolherbst: RSpliet: it seems there isn't even a min/max instruction on x86
06:14 karolherbst: there is some SSE4 stuff though :D
06:17 RSpliet: ARM I think was able to do conditional execution of instructions
06:18 karolherbst: mhh
06:18 karolherbst: it just bothers me, that I need to add an extra label just for that :/
06:33 karolherbst: ohhh wait
06:33 karolherbst: mupuf_: I think the issue is not the counter being greater than the ticks, but nearly equal and I get 0x100 as the result :/
06:34 mupuf_: hehe, clamp it
06:34 mupuf_: make a function for it
06:35 karolherbst: yeah, min is also used for this, so I can use it there
06:39 karolherbst: yay, now it works
06:40 karolherbst: mhhh
06:40 karolherbst: I could also divide the ticks by 0xff instead of doing a shr 8
06:40 karolherbst: :/
06:42 karolherbst: yeah, this may make more sense in the end
06:44 karolherbst: anyway I cleaned up the implementation and use 3 registers less, so this is also nice :D
06:57 karolherbst: here is my fuc code: https://github.com/karolherbst/nouveau/blob/3b0404ea0f647b84ca3ae0306a05fe4ee266265d/drm/nouveau/nvkm/subdev/pmu/fuc/perf.fuc
06:57 karolherbst: do you want to check if I did something terribly wrong regarding how fuc works?
06:58 karolherbst: the ld(b8, $r11, #perf_eng_pcie) shl b32 $r11 8 looks dodgy to me, but it works
06:58 karolherbst: so I don't know
07:04 mupuf_: karolherbst: was $r11 cleared before?
07:04 mupuf_: otherwise, I do not see what is dodgy here
07:05 karolherbst: does it matter if it's cleared? ohhh it does on tesla :O
07:05 mupuf_: as long as you never read more than 16 bits of r11, I do not see the problem here
07:05 karolherbst: I write all bits of it on fermi+ that is
07:29 mupuf_: why would tesla be different?
07:30 mupuf_: I would say clear a reg before using it if you do not want to remember which parts of the reg got init where
07:30 mupuf_: and, always clear a reg you are going to use at the beginning of the function
07:55 karolherbst: mupuf_: on tesla I only shift 3 8 bit values into $r11, because it isn't possible to read the pcie load
07:55 karolherbst: so only core, memory, video
07:57 karolherbst: mupuf_: also I am not quite sure when to use these pushed regs and when these "high" ones. I somehow got that the lower ones are pushed (cleared?) before and then reset from stack at the end. The high ones are usually used for functions parameters
07:57 karolherbst: but do I have to reset those high regs to their old values at the end of the function, too?
07:58 mupuf_: ok, so this is all down to conventions
07:58 karolherbst: like if I am using $r13 somewhere, does the caller expect $r13 to have its old value after the call, when I do not document it?
07:58 karolherbst: yeah I figured
07:58 mupuf_: r0 should always be 0
07:58 karolherbst: in perf_counter_readout I use $r14
07:58 mupuf_: r1-r9 are callee-saved, which means that if a function replaces them, it should restore them afterwards
07:58 mupuf_: r10-r15 are caller-saved
07:59 mupuf_: which means that if you want to keep useful data in it, you need to save them before calling a function
07:59 karolherbst: okay
07:59 karolherbst: so after a function call r10-r15 can be whatever
07:59 mupuf_: yes
07:59 karolherbst: okay
07:59 mupuf_: that's how I remember it
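[editor's note] The register convention described above can be sketched as a small lookup, written in Python for illustration only. The helper names are hypothetical; the ranges are the ones mupuf_ gives from memory (r0 always 0, r1-r9 callee-saved, r10-r15 caller-saved).

```python
ZERO = "zero"
CALLEE_SAVED = "callee-saved"
CALLER_SAVED = "caller-saved"


def reg_class(n):
    """Classify register $rN according to the convention quoted in the log."""
    if not 0 <= n <= 15:
        raise ValueError("register index out of range")
    if n == 0:
        return ZERO
    if n <= 9:
        return CALLEE_SAVED
    return CALLER_SAVED


def must_push(regs_used):
    """Registers a function has to push on entry and pop before returning."""
    return sorted(r for r in regs_used if reg_class(r) == CALLEE_SAVED)


if __name__ == "__main__":
    print(must_push({1, 2}))    # using $r1/$r2 needs push/pop
    print(must_push({12, 13}))  # using $r12/$r13 needs none
```

This is why a leaf function that calls nothing can use $r12 and $r13 instead of $r1 and $r2 and drop the push/pop pairs.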
07:59 karolherbst: yeah, it figures a bit that way
07:59 mupuf_: I wrote a commit to really enforce this
08:00 karolherbst: so instead of doing push $r1...
08:00 karolherbst: I could also use $r12 for example
08:00 karolherbst: as long as I don't call functions
08:00 mupuf_: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=e38efaee0b89c50979c52b52dbe7df5dc943bf1a
08:00 mupuf_: yes
08:00 karolherbst: okay
08:01 karolherbst: which I guess is preferred or doesn't it matter
08:01 mupuf_: well, less instructions == good
08:01 karolherbst: yeah I figured
08:02 karolherbst: okay, then I can replace $r1 and $r2 in perf_counter_readout with $r13 and $r12
08:02 karolherbst: because I don't call anything
08:02 karolherbst: or is nv_iord considered a call?
08:02 karolherbst: or other macros
08:02 mupuf_: macros are not calls
08:03 mupuf_: and they usually use r0 to do everything
08:03 karolherbst: okay
08:03 mupuf_: and then clear r0 again
08:03 mupuf_: but check the code
08:04 mupuf_: example: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=eabe8bca383c91286db70e5aad2d33ac44a9595a
08:04 karolherbst: what is this nv_iord thingy all about?
08:05 karolherbst: ohh
08:05 karolherbst: I get it
08:05 karolherbst: :D
08:06 karolherbst: ohh the falcons prior to gk208 can only mov 16bit?
08:06 karolherbst: good to know
08:11 mupuf_: hence the function :p
08:12 mupuf_: I really like the fuc ISA
08:12 mupuf_: it is pretty simple and useful
08:12 mupuf_: oh, and in your code, you could have used the instructions to add the value straight at the right place in the register
08:12 mupuf_: without needing to shift
08:12 mupuf_: that would save one instruction
08:12 mupuf_: but meh
08:27 mwk: heh
08:27 mwk: I have no idea why they went with variable-length opcodes
08:27 mwk: otherwise, it's quite nice
08:28 mwk: maybe xtensa overdose
08:34 mupuf_: :p
08:34 mupuf_: mwk: I am fixing the platform code before pushing it
08:34 mupuf_: not that I had to do much
08:34 mupuf_: I added a comment to tell that there is no risk of exposing gpus older than the tk1
08:35 mupuf_: I checked the device trees to be sure
08:35 mupuf_: I still need to figure out how to map the damn bar1
09:10 mupuf_: mwk: hmm, I really don't know how to get the address for bar1...
09:10 mupuf_: I found the info in the device tree, but the exposed version in sysfs does not seem to contain it
09:10 mupuf_: that can't be right
09:26 karolherbst: mupuf_: actually that would save like 3 shifts
09:26 mupuf_: yes, I meant one instruction per value
09:26 karolherbst: okay
09:26 karolherbst: it makes sense nevertheless
09:26 karolherbst: which instruction is it?
09:26 mupuf_: but you are right, the first one does not need the shift
09:26 mupuf_: look for extr
09:26 mupuf_: you'll find the opposite one :D
09:27 karolherbst: ins?
09:35 mupuf_: sounds good
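[editor's note] A rough model, in Python, of the two ways to pack 8-bit values into one 32-bit register that are compared above: shifting before each new byte versus inserting each byte directly at its bit position. The `ins` function here is a generic bitfield insert written for illustration; it is an assumption and not a statement of the exact Falcon `ins` operand encoding.

```python
MASK32 = 0xffffffff


def ins(dst, src, pos, size):
    """Generic bitfield insert: overwrite `size` bits of dst at bit `pos` with src."""
    mask = ((1 << size) - 1) << pos
    return ((dst & ~mask) | ((src << pos) & mask)) & MASK32


def pack_shift(values):
    """Shift-and-or packing: one shift per value after the first."""
    reg = 0
    for v in values:
        reg = ((reg << 8) | (v & 0xff)) & MASK32
    return reg


def pack_ins(values):
    """Insert each byte at its final position, no shifts of the whole register."""
    reg = 0
    for i, v in enumerate(reversed(values)):
        reg = ins(reg, v, 8 * i, 8)
    return reg


if __name__ == "__main__":
    vals = [0x11, 0x22, 0x33, 0x44]
    print(hex(pack_shift(vals)), hex(pack_ins(vals)))
```

Note that an insert only overwrites the selected field, so bits that are never written keep whatever the register held before; that is the case where clearing the register first matters.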
09:35 karolherbst: mhh
10:10 karlmag: oh.. tegra
10:12 karlmag: I did see some patches for GM206 too recently, didn't I?
10:23 karolherbst: mupuf_: can I access stuff from the data segment directly with instructions?
10:23 karolherbst: I thought I always have to read them into registers first
10:24 mupuf_: nope, you will need to load it first
10:24 karolherbst: mhh
10:24 karolherbst: then I would need one instruction more using ins
10:24 karolherbst: because I already load into the return register directly
10:30 karolherbst: mupuf_: want to look over the patches before I send them to the ML?
10:40 mupuf_: oh right!
10:40 mupuf_: and yes, I guess I can have a look
10:49 karolherbst: mupuf_: thanks https://github.com/karolherbst/nouveau/commits/pdaemon_counters
10:49 karolherbst: just the four newest one
11:33 imirkin: who's wwa? that username sounds really familiar...
11:38 mwk: imirkin: he work[ed?] for pathscale
11:40 imirkin: ahh ok
12:02 mupuf_: imirkin: I gave him the push rights when moving envytools out of pathscale's repo
12:02 mupuf_: of course, the CTO was pissed that he lost the exclusivity in choosing who gets in or not, but hey, it is open source stuff
12:03 mupuf_: and we were not ok with the model
12:07 karolherbst: sounds like fun
12:08 mupuf_: karolherbst: yeah, I guess...
12:09 karolherbst: though I am also totally unaware of pathscale, I just noticed they have like a modified version of nouveau or something, but I don't really know what they are up to or anything
12:16 mupuf_: well
12:17 mupuf_: the idea is that they wanted to propose a new GPGPU approach
12:17 mupuf_: a new programming model
12:17 mupuf_: but at first, it required a modified version of nouveau
12:17 mupuf_: then, nouveau could be used
12:18 mupuf_: and then mwk reverse engineered the IOCTLs of the blob and that allowed them to do all they wanted
12:18 mupuf_: and we do not hear too much from them anymore
12:18 mupuf_: as in, 0 communication actually
12:18 mupuf_: but they employed calim and mwk for some time, and gave them hw
12:18 mupuf_: they also got me started
12:19 mupuf_: but they were promising stuff they could achieve though
12:19 karolherbst: mhh
12:20 karolherbst: these envytools commit kind of sounded like they actually know a thing about how to schedule those instructions
12:24 tacchinotacchi: meanwhile
12:24 tacchinotacchi: anybody's working on fermi's reclocking?
12:24 tacchinotacchi: it just seems to have been forgotten by everyone
12:25 karolherbst: I already added it on my todo list, but there is stuff more important for me I want to do first
12:26 karolherbst: tacchinotacchi: you could help though if you would like :)
12:27 tacchinotacchi: i would really,really like to help
12:27 tacchinotacchi: but i'm too noob for reclocking
12:27 tacchinotacchi: actually i have trouble understanding any of the code of nouveau
12:27 karolherbst: mhh :/
12:28 tacchinotacchi: i have some experience reading and playing around with other people's projects, but nouveau among all is the one i understand less
12:28 karolherbst: I always thought it is one of the more clean projects overall :D
12:28 tacchinotacchi: well, among the ones i tried
12:29 tacchinotacchi: for example i tried to push a new feature to obs-studio
12:29 tacchinotacchi: which of course is a lot easier than nouveau, just for the fact that obs studio is a simple tool and nouveau a reverse engineered kernel module
12:29 karolherbst: well, I would say the kernel module is simplier in design
12:30 karolherbst: nobody cares about bad software quality in userspace, kernelspace on the other hand ....
12:30 karolherbst: you don't want to include bad quality there :p
12:30 mwk: wish that was true...
12:31 karolherbst: well, I saw so much crappy software
12:31 karolherbst: :D
12:31 mwk: heh
12:31 mwk: I suppose there are lot of projects that fare worse than the kernel
12:31 tacchinotacchi: the last time i tried to read nouveau, i took 8 hours to find one of the functions that was doing the actual "sending data to registers"
12:32 karolherbst: mwk: know the omg/sec metric? :D
12:32 tacchinotacchi: to find out i had no idea when that function got called and what it was for
12:32 imirkin: tacchinotacchi: roy has a couple patches... nothing close to working though: https://github.com/RSpliet/kernel-nouveau-nv50-pm/
12:32 tacchinotacchi: i already talked with rspliet more than once
12:32 imirkin: tacchinotacchi: nouveau has recently been rewritten... there's a bit less indirection now and a *lot* more types
12:33 tacchinotacchi: a lot more types? weren't they enough? :D
12:33 karolherbst: there are _never_ enough types :D
12:33 tacchinotacchi: i'll give it a shot but i'll never do anything useful
12:33 tacchinotacchi: francly the little knowledge i have is more c++ stylish
12:33 karolherbst: memory reclocking code is though though
12:34 karolherbst: *tough
12:34 tacchinotacchi: i don't even know what does it mean when a function starts with __
12:34 karolherbst: internal use
12:34 karolherbst: do _not_ use
12:34 mwk: I've had the pleasure of working on gdb recently, and I'm currently at the 3rd recursion level of debugging the debugger after hitting a showstopper
12:34 karolherbst: :D
12:34 karolherbst: once I was thinking about debugging gdb
12:34 mwk: if you ever want to try the reverse debugging feature... think of the good old printf as debugging tool
12:34 karolherbst: but then my mind stopped me
12:35 karolherbst: yeah
12:35 karolherbst: printf helps fixing more bugs than gdb
12:35 karolherbst: :)
12:35 tacchinotacchi: i agree
12:35 imirkin: there's a time and place for each
12:35 karolherbst: I heard about a kernel debugger somewhere :D
12:35 imirkin: i hate it when people refuse to do one or the other
12:35 karolherbst: never used it though
12:36 karolherbst: mh I really rarely use gdb directly, I like some graphical debugger though in C++ projects
12:36 tacchinotacchi: i feel like even before studying the code i should study the structure of the gpus
12:36 karolherbst: mhhh
12:36 tacchinotacchi: i mean the nvidia ones specifically
12:36 karolherbst: I don't think this is really necessary
12:37 tacchinotacchi: like rspliet's got a commit on his git
12:37 tacchinotacchi: Enable reclocking for GDDR3 G94-G200
12:37 tacchinotacchi: what the sht does it mean
12:37 imirkin: ?
12:37 karolherbst: tesla
12:37 imirkin: seems pretty obvious...
12:37 imirkin: what part of that commit message do you not understand?
12:37 tacchinotacchi: how does he distinguish individual gddr3 models
12:37 karolherbst: tacchinotacchi: http://nouveau.freedesktop.org/wiki/CodeNames/
12:37 tacchinotacchi: there it is karolherbst
12:38 karolherbst: well, the vbios tells you which memory type the card has, maybe there are other ways as well, don't know
12:39 karolherbst: the commit message is a bit wrong by the way
12:39 karolherbst: should be G94-GT200
12:39 karolherbst: :D
12:39 tacchinotacchi: oh that's why i didn't find g200
12:39 tacchinotacchi: xD
12:39 tacchinotacchi: i understand now
12:40 imirkin: it's unclear whether it's G200 or GT200
12:40 imirkin: the chips are marked G200
12:40 tacchinotacchi: i'd think that's stuff that usually happens in companies
12:40 tacchinotacchi: they know it's the same, nobody cares and they keep it like that
12:41 karolherbst: or nobody has time to actually fix it :D
12:41 tacchinotacchi: anyway, i'd have no idea how to check in the vbios
12:41 karolherbst: tacchinotacchi: install envytools
12:41 karolherbst: there is a tool called nvbios
12:41 karlmag: maybe it should be called NVA0(G200/GT200) on the page then?
12:41 tacchinotacchi: so that's why i say i should study the (little) documentations before the code
12:41 karolherbst: tacchinotacchi: but dmesg should tell you which memory type you have
12:42 tacchinotacchi: i can read most C code, but it's useless if i don't understand what it's for
12:42 tacchinotacchi: karolherbst: i have systemd, should i be able to find that in journald?
12:42 karolherbst: don't know, I have systemd and dmesg works
12:42 tacchinotacchi: because i think i have no messages from nouveau
12:42 tacchinotacchi: i have some from i915
12:42 tacchinotacchi: no nouveau
12:42 karolherbst: ohhh, laptop fun
12:42 karolherbst: sounds like, fun
12:43 tacchinotacchi: yea you see
12:43 karolherbst: is nouveau loaded?
12:43 tacchinotacchi: haha
12:43 tacchinotacchi: not now, i have bumblebee now
12:43 tacchinotacchi: going to eat, hit you when i come back
13:02 karolherbst: mupuf_: the 0x100 bit is really strange
13:03 karolherbst: but also helpful on kepler cards with memory load
13:03 karolherbst: or core waiting on memory stuff
13:03 karolherbst: the thing is, on some cards it produces some high values at idle
13:03 karolherbst: around 65/255 of the ticks
13:03 karolherbst: but on load it goes higher (depending on what exactly you do)
13:04 karolherbst: and when you upclock memory (leaving core as it is) it decreases
13:06 karolherbst: same goes for the PMFB bit
13:07 mupuf_: :)
13:08 karolherbst: it will be fun figuring a nice bit combination out
13:08 karolherbst: :D
13:08 karolherbst: the thing currently in my branch is good enough for core, pcie and video
13:09 karolherbst: but useless for memory
13:09 karolherbst: but it at least works somehow for all cards
13:09 karolherbst: thing is, I don't get anything useful with the config the blob uses
13:10 karolherbst: mupuf_: we need to create benchmarks which are targeting one engine directly, like one which does a lot of memory stuff
13:10 karolherbst: while running this, the memory counter should be at full load
13:11 mupuf_: not sure I get your point
13:11 karolherbst: we need to verify that our counter configuration does, what we think it does
13:12 karolherbst: otherwise benchmarking dyn reclocking is useless when the counters produce false reports
13:12 tacchinotacchi: ok i'm here
13:13 karolherbst: mupuf_: 7, 8, 9 and 12 are kind of useful here, but I don't know for what exactly
13:14 mupuf_: oh, ok, but this is for REing
13:14 mupuf_: I guess it should be trivial to write a shader that does a shitton of memory reads
13:14 karolherbst: yes
13:14 karolherbst: we need also one with a stalling core waiting on memory operations
13:14 karolherbst: because this is more troubling
13:15 karolherbst: there are situations where the core load counter decreases after upclocking the memory only
13:16 mupuf_: Yeah, more fun! :D
13:19 tacchinotacchi: i have no idea what you're talking about :D i'll just let you do your work and not disturb
13:19 tacchinotacchi: here i am like the silly kid who wants to play in the team thinkin he's going to help but in reality he is useless
13:21 karolherbst: there are easy task everybody can do though
13:22 karolherbst: mupuf_: how did you read the load in your pdaemon tool? read the regs or something else?
13:22 imirkin: tacchinotacchi: best not to start with huge unsolved problems
13:28 mupuf_: read the regs
13:29 karolherbst: mupuf_: ever got like a high memory load? Or don't you remember
13:29 karolherbst: and with high I mean above 80%
13:43 tacchinotacchi: can you give me a good entry point?
13:43 tacchinotacchi: after this i'll try not asking anything
13:44 imirkin: tacchinotacchi: what are you interested in?
13:45 imirkin: whole bunch of tasks listed at https://trello.com/b/ZudRDiTL/nouveau but they won't make much sense without further explanation
13:46 mupuf_: no, I struggled to get it higher than 36% IIRC
13:47 tacchinotacchi: imirkin: i would like to say i'm interested in reclocking, but i really need a general insight before. i can't start from there
13:48 imirkin: why are you interested in reclocking?
13:48 tacchinotacchi: because i want to use nouveau myself
13:49 imirkin: does it already work fine for you except for the clock speeds?
13:50 tacchinotacchi: what other problem could i have? except for a single game not finding the opengl version, rendering works fine
13:50 tacchinotacchi: i've been forced to use the proprietary all along, so even if i get an amd before finishing any code i want to help other people
13:53 tacchinotacchi: https://trello.com/c/WKKbOXeI/45-auto-downclock-when-overheating this looks about right
13:54 mupuf_: tacchinotacchi: sure, that should be kind of trivial to implement
13:55 mupuf_: you may have to rework the code handling the temperature though
13:55 mupuf_: but that really is a simple project to understand, so it is a good starting point
13:55 mupuf_: ping me if you do not understand the documentation in envytools
13:55 mupuf_: rnndb
13:56 karolherbst: mupuf_: I already thought so
13:56 karolherbst: mhh, well we could actually do better than the blob here :D
13:56 mupuf_: what are you talking about?
13:57 karolherbst: better memory load counter
13:57 mupuf_: ah
13:57 mupuf_: well, before doing better, we need a way to quantify that :D
13:57 karolherbst: I think I found some good configuration on my card, but it didn't work on other kepler
13:57 karolherbst: yeah
13:58 karolherbst: tacchinotacchi: I will probably do the downclock on overheat thingy :/
13:59 karolherbst: mupuf_: that should be also done on the pmu, right?
13:59 karolherbst: at least for gt215+ that is
13:59 tacchinotacchi: karolherbst: i can checkout to this commit and do it myself in a worst way
13:59 mupuf_: karolherbst: no you won't, there are two separate things
13:59 karolherbst: imirkin: I don't think we should do that in the kernel at all
13:59 mupuf_: and the one I was referring to in this trello entry is the hw-based downclocking
13:59 karolherbst: mupuf_: ohh right, there are alarm events
13:59 mupuf_: yes, and the FSRM
13:59 tacchinotacchi: even if you do that before (and you will) it'll still be an exercise for me
13:59 karolherbst: I see
14:00 mupuf_: tacchinotacchi: don't worry, no-one is going to steal your work, it is not something we need right now, we need good reclocking and good performance to actually overheat a GPU :D
14:00 karolherbst: :D
14:00 karolherbst: right
14:00 tacchinotacchi: what has envytools got to do with downclocking?
14:01 mupuf_: I suggest you read all the doc we have on the FSRM, found in the rnndb/pm/therm.xml file
14:01 mupuf_: oh, and you are in luck, I have written a paper on this
14:01 mupuf_: more doc
14:01 tacchinotacchi: that trello entry refers to activating an already done reclocking function when the card overheats, i don't have to communicate with the card
14:01 tacchinotacchi: i'm too tired now, imma watch tbbt
14:02 tacchinotacchi: get back to you tomorrow if i have time
14:02 karolherbst: mupuf_: how does this hw based downclock on overheat work in the end then? Is there some saved emergency clock level the card puts itself into when a special temp threshold was hit?
14:03 mupuf_: tacchinotacchi: http://phd.mupuf.org/files/xdc2013-nvidia_pm.pdf <-- section IV-C and D
14:03 mupuf_: karolherbst: Ah ah, you really never read what I sent you :D
14:03 mupuf_: they do not reclock any PLL
14:03 mupuf_: they basically have a divider by a power of two on the clock
14:03 mupuf_: that you can set
14:04 karolherbst: ahhhh
14:04 mupuf_: and then, you can select how long you want to spend on the divided clock vs the normal clock
14:04 karolherbst: so some kind of emergency switch :D
14:04 mupuf_: yes
14:04 karolherbst: then it starts to make sense
14:04 mupuf_: read the damn section C and D and be done with it, it is just one page
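[editor's note] A back-of-the-envelope sketch of the mechanism described above: the clock is not reclocked through a PLL but divided by a power of two for a configurable share of the time. The function and parameter names are hypothetical, and the linear time-weighted average is an assumption made for illustration, not something taken from the paper or the hardware documentation.

```python
def effective_clock(base_mhz, div_log2, divided_fraction):
    """Average clock when `divided_fraction` of the time is spent on base / 2**div_log2."""
    if not 0.0 <= divided_fraction <= 1.0:
        raise ValueError("divided_fraction must be between 0 and 1")
    divided_mhz = base_mhz / (1 << div_log2)
    return (1.0 - divided_fraction) * base_mhz + divided_fraction * divided_mhz


if __name__ == "__main__":
    # Half of the time at 800 MHz, half of the time at 800 / 2 MHz.
    print(effective_clock(800, 1, 0.5))
```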
14:04 karolherbst: but what about the dynamic reclocking code in the meantime?
14:06 mupuf_: I am reviewing your patches now
14:07 tacchinotacchi: meanwhile
14:07 tacchinotacchi: i remember once nouveau would expose a thermal sensor
14:07 tacchinotacchi: i just loaded nouveau and can't see it anymore
14:07 karolherbst: tacchinotacchi: it still does
14:07 karolherbst: at least for me
14:07 karolherbst: tacchinotacchi: what does "sensors" give you?
14:08 tacchinotacchi: http://pastebin.com/iLNm4Bwb
14:08 tacchinotacchi: it doesn't change if i load nouveau
14:08 karolherbst: weird
14:08 mupuf_: tacchinotacchi: well well well
14:09 mupuf_: so, what chipset do you have?
14:09 tacchinotacchi: i see
14:09 tacchinotacchi: the card is shut down
14:09 mupuf_: ah, that may explain, indeed
14:09 tacchinotacchi: optimus
14:09 mupuf_: runpm=0 will help
14:09 mupuf_: but it is bad for the battery life
14:09 tacchinotacchi: imma read the nouveau page and turn it on
14:10 karolherbst: tacchinotacchi: you may want to power on the card before loading nouveau
14:10 karolherbst: otherwise nouveau might not find your gpu
14:11 karolherbst: I have the same problem, so I always power the gpu on before loading nouveau
14:11 tacchinotacchi: i thought nouveau had control of power management
14:11 tacchinotacchi: ok i'll do
14:11 karolherbst: well it works for me the first time
14:11 tacchinotacchi: i'll load bbswitch, turn it on, unload bbswitch and load nouveau
14:12 karolherbst: but after that, the pci config is a bit screwed
14:12 mupuf_: tacchinotacchi: the paper is not really up to date, but I checked the sections C and D, they are OK
14:12 imirkin: bumblebee messes everything up... you have to be very careful when combining it with nouveau.
14:12 mupuf_: E is almost fully understood now
14:12 mupuf_: and almost implemented
14:12 karolherbst: imirkin: though this is a bbswitch issue, and the pci device class get screwed up for whatever reason :/
14:13 tacchinotacchi: shit
14:13 tacchinotacchi: nouveau crashed Xorg
14:13 karolherbst: well well well
14:14 tacchinotacchi: but now it's loaded
14:14 karolherbst: I know that
14:14 tacchinotacchi: anyway thank you for the paper
14:14 karolherbst: tacchinotacchi: this bug: https://bugs.freedesktop.org/show_bug.cgi?id=91388
14:14 tacchinotacchi: you could have told me before :D
14:15 karolherbst: I forget about it
14:15 karolherbst: I have a little patch to prevent it though
14:15 karolherbst: if you want to compile your X server yourself, I will gladly give it to you :D
14:15 karolherbst: but now the temperature should show up
14:16 tacchinotacchi: yay
14:17 tacchinotacchi: the card is still shut down, but nouveau's got the control over its power state
14:17 tacchinotacchi: so sensors shows the nvidia card, but the temp is N/A
14:17 mupuf_: ack
14:17 mupuf_: karolherbst: why did you write the first patch?
14:18 mupuf_: https://github.com/karolherbst/nouveau/commit/01704fe6cc760d39e64825d087dc325a7458d387 --> I do not see any difference with gf119's
14:18 karolherbst: mupuf_: because kepler has PCOPY2
14:18 karolherbst: ohh wait
14:19 karolherbst: mupuf_: the #define GK104 0xe4 thing should have been enough, shouldn't it?
14:19 mupuf_: no
14:19 tacchinotacchi: ok started glxgears, got card temp, all OK
14:20 tacchinotacchi: except it's running at 200 mhz
14:20 karolherbst: ahh right, no, it is fine then
14:20 karolherbst: I was worried for a second I did something useless :D
14:20 karolherbst: no, I need to check against GK104 at some points
14:20 mupuf_: karolherbst: please explain why you make the commit though, because it was a big WTF otherwise
14:20 karolherbst: k
14:25 tacchinotacchi: btw let's say there's a chip for which reclocking works
14:25 tacchinotacchi: is it still manual? or it depends on load?
14:26 karolherbst: tacchinotacchi: manual, but we are working on it currently
14:26 karolherbst: will take time though
14:26 karolherbst: mupuf_: If I make it b8 I get alignment warnings/errors, what should I do about those?
14:27 tacchinotacchi: ok thank you all
14:27 mupuf_: karolherbst: are you doing a st b32 ?
14:28 karolherbst: no, in the data segment itself
14:28 karolherbst: in the end I had b8 on kepler for all
14:28 karolherbst: and b16 for perf_eng_mc on earlier
14:28 karolherbst: I meant fermi
14:29 mupuf_: maybe it is because you need to add an align instruction at the end of the data segment
14:29 mupuf_: that is weird though
14:29 karolherbst: on tesla I have only three fields
14:29 karolherbst: 3x 8bit
14:29 karolherbst: and it only complained there
14:30 karolherbst: "stdin:198.1-198.16: Warning: Unaligned data .b32; section 'gt215_pmu_data', offset 0xd13"
14:31 mupuf_: align .b32 or something like that after your 8 bits values
14:31 mupuf_: it will automatically insert the necessary padding
14:34 karolherbst: k
14:35 mupuf_: would be nice if there were a method that would return the format of your output
14:35 mupuf_: but it's ok I guess
14:36 mupuf_: well, the entire thing looks pretty good, good job!
14:39 karolherbst: thanks
14:40 mupuf_: now, I think I will just go to bed and try to adjust my sleep schedule a tiny bit :D
14:40 karolherbst: :D
14:47 hakzsam: imirkin, Hi, I'm back for ... two minutes. :) I don't have time to have a look at your "upload-time fixups" patch right now (even if I'm really interested in), but I'll have a look tomorrow or so.
14:52 imirkin: hakzsam: don't worry about it, i don't really expect meaningful reviews for the nouveau patches i send
14:53 imirkin: hakzsam: if you like, i can explain what's going on
14:53 imirkin: hakzsam: also, i probably have a new version of that patch i haven't posted... check my tree, master branch
14:53 imirkin: hakzsam: allllso, i'm trying to get it all (mostly) working on nv50, but running into some trouble. that code is nasty.
14:56 karolherbst: mupuf_: .align 2 helps :/
14:56 mupuf_: karolherbst: do .align 4
14:56 karolherbst: okay, this is also good
14:56 karolherbst: same code
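[editor's note] The padding arithmetic behind the warning above can be checked with a few lines of Python. The offset 0xd13 comes from the quoted warning; `align_up` is a hypothetical helper, not part of the assembler.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    offset = 0xd13  # from "Unaligned data .b32; ... offset 0xd13"
    for alignment in (2, 4):
        aligned = align_up(offset, alignment)
        print(alignment, hex(aligned), aligned - offset)
```

From 0xd13, both an alignment of 2 and an alignment of 4 land on 0xd14 with one byte of padding, which is consistent with the two directives producing the same code here. Only the 4-byte alignment is guaranteed to stay correct for a .b32 if the preceding data changes size.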
14:58 karolherbst: mupuf_: should I remove these polling interval stuff?
14:59 karolherbst: I don't know if there is a _good_ reason to ever change it
15:00 hakzsam: imirkin, okay, we'll discuss about it tomorrow then. I really need to go to my bed because I have to wake up in less than 6 hours. see you!
15:06 imirkin: karolherbst: hey, mind tracing a few piglits for me on blob + kepler?
15:07 imirkin: karolherbst: the ones that airlied added in this branch: http://cgit.freedesktop.org/~airlied/piglit/log/?h=arb_bindless_texture
15:07 karolherbst: what shall I do?
15:10 karolherbst: imirkin: give me the command I shall run and I will
15:10 imirkin: fetch airlied's branch
15:10 imirkin: and build the new tests
15:10 imirkin: then do ...
15:10 karolherbst: already built it
15:11 imirkin: bin/arb_bindless_texture-bindless-texture -fbo -auto
15:11 imirkin: (a) does that pass?
15:12 imirkin: oh heh, looks like it doesn't actually test anything. great. anyways, mmt log of that would be great
15:12 karolherbst: http://filebin.ca/2KJ2kPJ2SKZY/arb_bindless_texture.mmt.xz
15:12 airlied: yeah I just noticed it doesn't pass
15:12 karolherbst: :O
15:12 airlied: it should draw a nice checkerboard
15:12 karolherbst: it passes here
15:12 airlied: or it doesn't test rather
15:13 karolherbst: with the blob
15:13 airlied: does it draw?
15:13 karolherbst: yes
15:13 airlied: drop the -fbo -auto that is
15:13 karolherbst: 4 squares
15:13 karolherbst: colors
15:13 airlied: cool
15:13 airlied: that should let imirkin work things out the
15:13 airlied: then
15:14 karolherbst: blue white
15:14 airlied: karolherbst: does the layout one pass?
15:14 karolherbst: red green
15:14 karolherbst: what shall I run?
15:14 karolherbst: and which branch :D
15:14 airlied: bin/arb_bindless_texture-layout -fbo -auto
15:14 airlied: same one
15:15 karolherbst: nope
15:15 karolherbst: fail
15:15 karolherbst: well
15:15 karolherbst: "fail"
15:15 karolherbst: Unexpected GL error: GL_NO_ERROR 0x0
15:15 karolherbst: Expected GL error: GL_INVALID_OPERATION 0x502
15:15 airlied: what text number?
15:15 imirkin: 00000028: fc001f06 8013c008 tex p lauto live dfp $r0:$r1:$r2:$r3 t2d $t8 $s0 $r0:$r1 ()
15:15 airlied: test number
15:16 imirkin: well, that's the non-bindless version
15:16 karolherbst: airlied: (Error at /home/karol/Dokumente/repos/piglit/tests/spec/arb_bindless_texture/bindless_texture_layout.c:137) this?
15:16 karolherbst: ohh don't know
15:16 airlied: karolherbst: it should print failed test x
15:16 karolherbst: ohhh
15:16 karolherbst: 0
15:16 karolherbst: failed test 0
15:16 airlied: sounds like nvidia got the default wrong possibly
15:17 imirkin: well, it could also be smart and notice that it can just do it this way
15:17 airlied: imirkin: it shouldn't be able to be that smart :)
15:17 airlied: or rather being that smart will use the CPU you are trying to save by having bindless :)
15:18 imirkin: what it's saying is that it's using a fixed "8" index into the (texture, sampler) binding list
15:18 airlied: I should probably dump the handle in the test
15:18 airlied: karolherbst: can you printf texhandle in tests/spec/arb_bindless_texture/bindless_texture.c
15:19 airlied: and rebuild/rerun the squares test
15:19 karolherbst: when?
15:19 airlied: after it gets assigned
15:20 airlied: imirkin: that's fixed at compile time?
15:20 airlied: maybe the binding list has different semantics or something
15:21 karolherbst: airlied: for whatever reasons it doesn't print :/
15:21 karolherbst: printf("%li\n", texhandle); after texhandle = GetTextureHandle(texid);
15:23 imirkin: forgot to rebuild?
15:23 karolherbst: I rebuilt
15:24 karolherbst: "[ 34%] Linking C executable ../../../../../bin/arb_bindless_texture-bindless-texture"
15:24 imirkin: forgot to save? :)
15:24 karolherbst: ohh wait
15:24 karolherbst: this is another test
15:25 karolherbst: mhhh
15:25 karolherbst: airlied: maybe you got me the wrong file?
15:26 karolherbst: airlied: maybe you meant tests/spec/arb_bindless_texture/bindless_texture_layout.c ?
15:26 karolherbst: 4294969856
15:26 karolherbst: got this
15:27 imirkin: interesting. 0x100000a00.
15:27 imirkin: i see that address uploaded
15:27 imirkin: PB: 0x00000020 GK104_3D.CB_POS = 0x20
15:27 imirkin: PB: 0x00000a00 GK104_3D.CB_DATA[0] = 0xa00
15:27 imirkin: PB: 0x00000001 GK104_3D.CB_DATA[0] = 0x1
15:28 imirkin: and 0x20 corresponds to $t8
15:28 imirkin: neat.
15:30 imirkin: bbbbuuuttt... i don't see anything get stored there.
15:34 imirkin: gah. this indirection is killing me.
15:35 airlied: imirkin: so maybe it does some math on the handle
15:36 imirkin: that's not an address
15:36 imirkin: it just looks like one
15:36 imirkin: in actuality it's the (sampler, texture) values
15:37 imirkin: so it does the remapping from handle -> texture as well
15:38 airlied: so it must just give out handles that are texture, sampler combined, and when you make things resident it puts the values in the correct place
15:39 imirkin: yep
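
The arithmetic behind the handle discussion above can be checked with a short sketch. It shows that the printed value 4294969856 is 0x100000a00 and that its two 32-bit halves match the two CB_DATA words in the trace (0xa00 and 0x1). The helper names split_handle and slot_from_offset are made up for illustration. The 4-byte slot size is only an assumption inferred from "0x20 corresponds to $t8", and the log does not say which half is the sampler and which is the texture.

```python
def split_handle(handle):
    """Split a 64-bit handle into its (low, high) 32-bit words."""
    low = handle & 0xFFFFFFFF
    high = (handle >> 32) & 0xFFFFFFFF
    return low, high


def slot_from_offset(byte_offset, slot_size=4):
    """Map a byte offset to a slot index (slot_size=4 is an assumption)."""
    return byte_offset // slot_size


handle = 4294969856           # value printed by the piglit test in the log
low, high = split_handle(handle)

print(hex(handle))            # 0x100000a00
print(hex(low), hex(high))    # 0xa00 0x1
print(slot_from_offset(0x20)) # 8
```

The low word equals the first CB_DATA write and the high word equals the second, which is consistent with the handle being a packed pair of values and not an address.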
15:39 airlied: sounds like the gallium interface should be fine for that as well
15:40 imirkin: wellllll
15:40 imirkin: yeah probably
15:40 imirkin: as long as i can rely on some sort of consistent mapping between the argument to the TEX instruction
15:40 imirkin: and ... the resident thing
15:40 imirkin: or something
15:41 airlied: well the constant value is the resident thing handle
15:41 airlied: the args to the TEX are just going to be CONST[x]
15:41 imirkin: but what is x?
15:42 imirkin: i.e. can i use it for something useful?
15:42 airlied: offset into the standard constant buffer
15:42 imirkin: bleh that won't work
15:42 airlied: which you read from to get the handle
15:42 imirkin: too much indirection
15:42 airlied: yes welcome to bindless
15:43 airlied: it's all about indirection
15:43 imirkin: well, nvidia blob seems to be able to get rid of a lot of it
15:43 airlied: well they pass the handle in a const buffer don't they?
15:43 imirkin: that's how texturing is done in the first place
15:43 imirkin: and it's a special constbuf
15:43 imirkin: one dedicated for texture handles
15:44 airlied: oh so we would need a new BINDLESSCONST buffer
15:44 airlied: at gallium instead of using CONST[x]
15:44 imirkin: right.
15:44 imirkin: i mean, it's a regular constbuf
15:44 airlied: and GLSL level also
15:44 imirkin: but
15:44 imirkin: it has to be predeclared
15:44 imirkin: sticking it into a UBO would work just fine
15:45 airlied: but you need the index into the constbuf to be meaningful
15:45 imirkin: PB: 0x80020982 GK104_3D.TEX_CB_INDEX = 2
15:45 imirkin: and then you do something like this:
15:45 imirkin: PB: 0x00000021 GK104_3D.CB_BIND[0] = { VALID | INDEX = 2 }
15:45 imirkin: (which binds a particular address to cb2 for a particular shader stage)
15:45 airlied: I should write a test mixing a bunch of bindless types
15:46 imirkin: and then you just upload handle pairs into that cb
15:46 imirkin: and the tex instruction reads them
15:46 airlied: I suppose on radeonsi we don't have limits on const buffers anymore
15:46 imirkin: so it can be a perfectly generic constbuf, but it can't be the global constbuf
15:46 airlied: so it would work there as well
15:46 imirkin: i.e. if you have st/mesa declare a UBO and stuff the handle values into it, that'd be perfectly fine for nvc0
15:46 airlied: probably should have this conversation in front of marek :)
15:47 imirkin: he tends to ask questions i don't have answers to :)
15:47 airlied: I'll finish the API bits off anyways, and we can look at that then
15:48 airlied:hotel food time &
15:49 karlmag:perks up, then realizes that he's not at a hotel..
15:49 imirkin: i don't think airlied delivers
15:49 karlmag: bummer
16:00 pmoreau: Ha! stack_addr + user_param[0] == local_addr
16:01 pmoreau: Finally managed to get some use of the pointer stored in user_param[0]
16:02 pmoreau: But… Is the stack in shared memory? Cause IIRC, local isn't.
16:02 imirkin: pmoreau: stack, or s[]?
16:02 imirkin: s[] = shared
16:03 imirkin: afaik there's no way to access the stack memory [easily]
16:04 pmoreau: I am getting a pointer from shared, but I don't find any mention of shared in the mmt trace, apart from its size
16:04 imirkin: i forget exactly how it works
16:04 imirkin: iirc you specify a window of gmem that acts as shared memory
16:04 imirkin: [and has to be accessed via s[] ]
16:06 pmoreau: Oh wait, I'm mixing things: the pointer I get from s[] doesn't have to be local to shared! :D
16:08 imirkin: the pointer you get out of s[] is to gmem
16:10 pmoreau: gmem is global memory, right?
16:10 imirkin: sssort of
16:10 imirkin: it's a 32-bit window into regular VM memory
16:11 pmoreau: :/
17:25 imirkin: pmoreau: also there's some sort of index -- i think you can have up to 15 separate windows into the gpu vm
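
As a rough mental model of the "32-bit window into regular VM memory" description above, a pointer can be thought of as a 32-bit offset added to the base of one of a small number of windows. The sketch below is a toy model only. The function name resolve, the base addresses, and the idea that translation is a plain base-plus-offset are all assumptions for illustration; the limit of 15 windows comes from imirkin's "i think" remark and is not confirmed.

```python
WINDOW_SIZE = 1 << 32   # each window spans 32 bits of address space
MAX_WINDOWS = 15        # upper bound mentioned in the log, unconfirmed


def resolve(window_bases, index, offset):
    """Translate (window index, 32-bit offset) to a VM address in this toy model."""
    if len(window_bases) > MAX_WINDOWS:
        raise ValueError("too many windows")
    if not 0 <= index < len(window_bases):
        raise ValueError("window index out of range")
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("offset does not fit in 32 bits")
    return window_bases[index] + offset


# Hypothetical window bases, chosen only for the example.
bases = [0x1_0000_0000, 0x20_0000_0000]
print(hex(resolve(bases, 1, 0x1000)))  # 0x2000001000
```

The point of the model is that a 32-bit pointer read out of s[] cannot name an arbitrary VM address by itself; it only makes sense relative to whichever window is selected.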
18:21 airlied: imirkin: I think you can also pass the handles via vertex attributes
18:22 airlied: otherwise I'm not sure why VertexAttribL1ui64ARB is required
18:23 imirkin: err.... how would they make it to the shader?
18:25 imirkin: oh i see
18:26 imirkin: any sampler type(uvec2) // Converts a pair of 32-bit unsigned integers to
18:26 imirkin: // a sampler type
18:26 airlied: you can have sampler inputs
18:26 imirkin: i'll have to trace code that does that :)
18:26 airlied: I'll have to write code to do it :)
18:33 imirkin: i wonder if there's a way to load the tic/tsc numbers directly
21:27 Dan39: hi \o
21:28 Dan39: had a game running on 1 monitor (the dark mod) and was looking at irc on other. i clicked a link and as soon as firefox popped up everything froze. got these errors in dmesg: http://dpaste.com/3N4PV0T eventually it restarted, after a few attempts to kill -9 Xorg which did nothing :|
21:28 Dan39: i fear my gfx card may need another reflow... any comment based on the errors?
21:31 Dan39: tbh, the dmesg output colors are quite nice, so here's a screenshot: http://i.imgur.com/u4weiZP.png
21:32 Dan39: and im sure someone will enjoy that being their team's colors :P
21:32 imirkin: the new dmesg builds like to do that... not a big fan myself
21:33 imirkin: those errors aren't overly specific as to what's going on... those invalid_cmd things are errors that pop up every so often for various people, no idea why
21:33 Dan39: i never get them
21:33 Dan39: those messages are all from when it froze
21:34 Dan39: the rest of logs from several days has no nouveau messages
21:34 Dan39: and i was surprised i couldnt even switch to a console with ctrl+alt+F1
21:35 imirkin: cmd submission was probably hosed
21:35 imirkin: surprised it recovered
21:35 Dan39: yea
21:35 Dan39: i think X and wm restarted completely, it went back into lightdm
21:35 imirkin: yeah, often these sorts of situations lead to overall hangs
21:37 Dan39: the part that i am interested about is "trapped read at .."
21:37 Dan39: but i have absolutely no idea what any of it means
21:37 imirkin: that's most likely a result of bogus command submission
21:39 Dan39: got this in xsession-errors...
21:39 Dan39: XIO: fatal IO error 4 (Interrupted system call) on X server ":0"
21:39 Dan39: after 441113 requests (441113 known processed) with 0 events remaining.
21:39 Dan39: xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0"
21:39 imirkin: coz X exited
21:39 imirkin: all running clients got the shaft
21:40 Dan39: heh
21:40 Dan39: i see now, xterm
21:40 Dan39: and whatever XIO is
21:41 Dan39: thanks for the info
21:41 Dan39: will idle but g2g. peace
23:35 imirkin: gnurou: so does upstream nouveau work with the TX1? [with your patches]
23:36 imirkin: i assume that there's a bunch of shader work that needs to be completed for that...
23:39 gnurou: imirkin: you're correct on both points
23:39 gnurou: imirkin: my priority now is to enable secure boot on dGPU, then we can concentrate on shader issues
23:40 gnurou: hopefully we can get some cooperation from the people who know
23:40 imirkin: cool
23:40 gnurou: but getting FECS/GPCCS signed firmwares to load is on the way to anything else anyway
23:40 imirkin: sure
23:40 gnurou: I hope you guys will like this version of secure boot... it took a while to put together
23:41 imirkin: well, it's all up to skeggsb ... i don't really understand any of that stuff
23:42 gnurou: it's down to ~1750 lines now, and almost reads like a story book, so hopefully he will like it :P
23:43 gnurou: we could probably make things simpler, but we have little control over the format of the different firmwares, and I prefer to use them as-is if possible
23:44 gnurou: for instance, PMU firmware comes from different people, and in a different format than FECS/GPCCS, requiring a different loading function... I'm trying to find a way around this, but everything may not end up completely consistent
23:47 imirkin: what does the PMU firmware do again? reclocking?
23:48 gnurou: imirkin: on GM20B I am not sure it does that yet - I suppose we will need to document its interface when we release it officially anyway
23:54 imirkin: gnurou: what is it good for?
23:57 gnurou: imirkin: giving me headaches, apart from that I'm not too sure... :P
23:57 gnurou: I'm sure mupuf_ knows more about it than I do
23:58 imirkin: hehehe
23:58 imirkin: i'm sure the headaches bit is mentioned as the main purpose in its design doc :)
23:58 gnurou: but apparently it is involved with some host-triggered power-management functions, like ELPG
23:59 gnurou: therm and fan control as well it seems?
23:59 gnurou: from what I know more and more PM functions will be moved to it
23:59 mupuf_: imirkin: pdaemon does power gating, load monitoring, fan management
23:59 mupuf_: hw scheduling (executing scripts with the card off bus)
23:59 mupuf_: it also talks to the power sensor
23:59 imirkin: right, i knew about the reclocking bit, wasn't 100% sure about the rest