{"id":2614,"date":"2010-12-08T16:16:50","date_gmt":"2010-12-08T07:16:50","guid":{"rendered":"http:\/\/yasu2.prosou.nu\/blog\/index.php\/2010\/12\/08\/fpt_10_day_1\/"},"modified":"2010-12-08T16:16:50","modified_gmt":"2010-12-08T07:16:50","slug":"fpt_10_day_1","status":"publish","type":"post","link":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/2010\/12\/08\/2614\/","title":{"rendered":"FPT &#8217;10, Day 1"},"content":{"rendered":"<p>Friday pickup bus at 8:30.<br \/>\n[ Opening ]<br \/>\n&#8211; Received total 163<br \/>\n&#8211; Regular oral 32<br \/>\n&#8211; Special oral 5<br \/>\n&#8211; Poster 52<br \/>\n&#8211; Demo 9<br \/>\n&#8211; Accepted total 98<br \/>\n&#8211; JP submission 23, regular 6, poster 7, demo 1.<br \/>\n&#8211; Asia submission 63, accept 37<br \/>\n&#8211; Design competition: reversi opponent.<br \/>\n[ Keynote 1: Reconfigurable Computing &#8211; Evolution of Von Neumann Architecture ]<br \/>\nProf. ShaoJun Wei at Tsinghua University.<br \/>\nHe reportedly took his master&#8217;s degree here, earned his doctorate in Belgium, and now teaches at his alma mater. Cool.<br \/>\n&#8211; Gordon Moore: semiconductor, scaling-down rule<br \/>\n&#8211; Von Neumann: computer, Von Neumann architecture<br \/>\nPower density was supposed to stay roughly constant, but it grew rapidly as process scaling advanced, because leakage current can no longer be ignored. The era in which scaling down meant cost down is also ending: cost\/gate changes by only 3% from 32nm \u2192 22nm.<br \/>\nVon Neumann architecture 
has advanced in many ways, but the Von Neumann bottleneck (fetch an instruction, fetch the operands, compute, store) has not been fundamentally solved.<br \/>\nThe proposal: solve this by separating the datapath from the controller and building a reconfigurable processor. Topics ranged from the operating system (multitasking is the problem!) and high-level synthesis all the way to power gating&#8230;<br \/>\nThe datapath is built as an ALU array, and the controller as a RISC-based programmable FSM.<br \/>\nDesign partitioning matters for this kind of dynamically reconfigurable array: partition the datapath carelessly and you can end up with deadlock.<br \/>\nChina&#8217;s semiconductor imports are 138 billion USD\/year. Seriously&#8230; That figure includes parts that are assembled and re-exported; domestic consumption seems to be about 1\/6.<br \/>\n[ Keynote 2: FPGA Platforms leading the way in the apps of &#8216;More than Moore&#8217;s&#8217; technology ]<br \/>\nDr. 
Ivo Bolsens, Senior Vice President and CTO, Xilinx.<br \/>\nDesign cost challenge \u2192 building a chip means taking on risk.<br \/>\nInvestment around LSI seems to be shrinking rapidly.<br \/>\nAs a result, the application boundary between conventional ASIC\/ASSP and FPGA shifts, widening the range where an FPGA is the better choice.<br \/>\nMore than Moore: stacked silicon interconnect.<br \/>\n&#8211; Chip-to-chip via standard I\/Os and serdes: more gates but&#8230; \ud83d\ude41<br \/>\n&#8211; Xilinx&#8217;s device is built by placing FPGA slices side by side on a silicon interposer, which performs far better than connecting through standard I\/O. The interposer appears to be made by TSMC.<br \/>\n&#8211; Chip-package co-design. 
In-package power plane, on chip decoupling caps&#8230;<br \/>\nProgrammable platform<br \/>\n&#8211; legacy: CPU &#8211; north bridge &#8211; south bridge &#8211; PCI &#8211; FPGA = I\/O extension<br \/>\n&#8211; current: CPU &#8211; south bridge &#8211; PCI &#8211; FPGA = co-processing<br \/>\n&#8211; new: CPU &#8211; HT\/QPI &#8211; FPGA = peer-computing (cache coherent!)<br \/>\nI guess the world won&#8217;t change as long as we keep hand-writing DMA code for everything.<br \/>\n[ An FPGA architecture supporting dynamically controlled power gating ]<br \/>\nUniversity of British Columbia, Canada.<br \/>\nTurn off regions at run-time with on-chip control.<br \/>\nASIC designers do this regularly.<br \/>\nBut in FPGA:<br \/>\n&#8211; routing for control signals<br \/>\n&#8211; handling rush current in a programmable way.<br \/>\nPower is becoming a real burden in high-end FPGAs (no doubt about that), so something has to be done.<br \/>\nProposed architecture:<br \/>\n&#8211; Divide FPGA device into power-controlled regions<br \/>\n&#8211; Use general-purpose routing fabric for control signals<br \/>\nLogic blocks and routing channels (where an LB drives onto the wires) can be power-controlled. Switches are a problem (apparently not handled).<br \/>\nHow large a region should share one sleep transistor? A larger region saves area, but too large a region makes the design hard.<br \/>\nRush 
current: limit how much can be turned on at once.<br \/>\n1) expose it to the user: usual ASIC way<br \/>\n2) expose it to the CAD tools<br \/>\n3) dedicated architectural support: i.e., programmable delay elements in turn-on circuits so they don&#8217;t turn on at once.<br \/>\nCurrent solution is (1).<br \/>\nThe evaluation appears to be done with SPICE.<br \/>\n&#8211; Area overhead: static gating &gt; dynamic gating by 33%, but less than 1% overhead compared to ungated version.<br \/>\n&#8211; Leakage: dynamic gating &gt; static gating by 11%; both dynamic and static &lt; ungated by 40+%.<br \/>\n&#8211; Delay overhead is 10%.<br \/>\nQ: Isn&#8217;t an isolation block needed? Is its cost included in the numbers? \u2192 It is handled at the output buffers: the outputs of a powered-off block are forcibly cut off (is that OK?)<br \/>\nWhat about the switches? 
\u2192 They would need a complete redesign.<br \/>\n[ A tiled programmable fabric for quantum-dot cellular automata ]<br \/>\nA student from IIT Delhi. Quantum dots, really!?<br \/>\n4 quantum dots in each cell, 2 mobile electrons &#8211; these represent binary 0, 1, and NULL. Wires and various gates can be built.<br \/>\nThe clock is expressed in 4 cycles.<br \/>\n&#8211; LUTs, CLBs, Switches &#8211; NOT NECESSARY<br \/>\n&#8211; Selective clocking: let the unused cells relax<br \/>\n&#8211; Reduce defects: use clock based scheme<br \/>\nAh, I think I get it: gates and wires are built from the same mechanism&#8230;<br \/>\nI don&#8217;t really understand the programming \/ clocking part.<br \/>\nQ: How long until this is realized in silicon? \u2192 Still quite a while&#8230;<br \/>\n[ Phase-change-memory-based storage elements for configurable logic ]<br \/>\nNon-volatile FPGA is expensive&#8230; New technological opportunities?<br \/>\nPhase-change RAM principle:<br \/>\n&#8211; Material with 2 stable phases: polycrystal (high conductivity) and amorphous (low conductivity).<br \/>\n&#8211; requires heater electrode + contact<br \/>\n&#8211; non-volatile, small size, low delay and cost friendly!<br \/>\nWrite time is around 50ns, I think.<br \/>\nArea: SRAM 115 &gt; Flash 46 &gt; PCM 30.<br \/>\nArea reduction up to 13%, delay reduction up to 51%!<br \/>\nPCM, i.e. a resistive memory element, is 4k\u03a9, whereas a pass transistor 
is 9k\u03a9. Lower resistance means lower delay.<br \/>\nThere are apparently no major concrete obstacles to manufacturing.<br \/>\nWrite cycles are the problem: it cannot be used the way today&#8217;s SRAM cells are, so it is best regarded as a replacement for Flash, they said.<br \/>\n[Dynamic Reconfigurable Bit-Parallel architecture for large-scale regular expression matching]<br \/>\nYusaku Kaneta @ Hokkaido University (graduate school)<br \/>\nMassive regex matching in apps such as NIDS (network intrusion detection systems) etc.<br \/>\nStatic compilation approach: fast but hard to change regex at runtime.<br \/>\nDynamic reconfiguration approach: regexes can be changed at runtime, but worst-case performance is not guaranteed.<br \/>\nProposal:<br \/>\n&#8211; Dynamic BP-NFA architecture<br \/>\n&#8211; Dynamic reconfiguration by bit-parallel NFA simulation<br \/>\n&#8211; Extended patterns<br \/>\nDynamic BP-NFA on Virtex-5 FPGA.<br \/>\n&#8211; BP-NFA for string pattern: 54 slices, 2.9Gbps<br \/>\n&#8211; BP-NFA for extended patterns: 123 slices, 1.6Gbps.<br \/>\n&#8211; It&#8217;s FAST!<br \/>\n&#8211; Worst-case performance is GUARANTEED, while others are not.<br \/>\n&#8211; Fast reconfiguration.<br \/>\nImpressive.<br \/>\nCan process 256 patterns in parallel.<br \/>\n[ Impact of Reconfigurable Hardware on Accelerating MPI_Reduce() ]<br \/>\nAlready implemented MPI_Barrier() in previous research and got promising results.<br \/>\nTestbed: Xilinx ML410 Board x 64 + bidirectional SATA cable.<br \/>\nPowerPC 300MHz + reduce core + 16 local link interfaces.<br \/>\nWhen many small messages are flying around, a commodity 
cluster is reportedly outperformed. (Only for small messages because RDMA shines for large ones?)<br \/>\nThe improved scalability looks like a good point.<br \/>\n[Accelerating HMMER on FPGA using Parallel Prefixes and Reductions]<br \/>\nWriting Viterbi and DP.<br \/>\n[Multiple dataset reduction on FPGAs ]<br \/>\nNo show?<br \/>\n[ Accelerating FPGA Development Through the Automatic Parallel Application of Standard Implementation Tools ]<br \/>\nPain for large-scale FPGA implementations:<br \/>\n&#8211; No software-like linkage allowing concurrent module implementation<br \/>\n&#8211; Global implementation changes when adding or changing a signal probe<br \/>\n&#8211; P&#038;R algorithm is mostly single-threaded and memory-hungry<br \/>\nImplement each major block as a partial module<br \/>\n&#8211; Simplified PR design flow without reconfiguration<br \/>\n&#8211; Automatic floorplanning, including bus macro insertion<br \/>\nSo each module is placed and routed separately, and only the inter-module net delay is considered when stitching them together.<br \/>\nThe automatic floorplanning part is cool.<br \/>\nBy making good use of incremental design, P&#038;R time and the like appear to be cut substantially.<br \/>\nDesign verification should get shorter too!<br \/>\n[Parallelizing FPGA Placement Using Transactional Memory]<br \/>\nParallelizing CAD is important &#8211; simulated annealing based placement<br \/>\n1. start with random placement of blocks<br \/>\n2. 
randomly pick a pair of blocks to swap<br \/>\n3. evaluate and loop<br \/>\nThere are many trials, so that part can be parallelized.<br \/>\nIt works out nicely if accepting or rejecting a swap can be expressed as executing or aborting a transaction.<br \/>\nSTM (software transactional memory) has high overhead, but there is no HTM (hardware TM) yet.<br \/>\n&#8211; New software transactional memory (tinySTM)<br \/>\n&#8211; potentially easier parallelization with TM.<br \/>\n&#8211; based on VPR (Versatile P&#038;R) 5.0<br \/>\n&#8211; Platform: 8 CPUs<br \/>\nA student pulled it off in one month, so the implementation is fairly easy. P&#038;R speeds up linearly, but the QoR degradation is severe (30%), and the abort rate is also high at 60%.<br \/>\nVPR itself contains code for redoing work partway through, and after improving that part (among other things), QoR degradation went from 35% to 8% in the worst case and from 7% to 2% on average, a considerable improvement.<br \/>\n[A Message-Passing Multi-Softcore Architecture on FPGA for Breadth-First Search]<br \/>\nBreadth-first search in a graph.<br \/>\nA global buffer and barrier sync are needed. I didn&#8217;t quite follow.<br \/>\n[Deterministic Multi-Core Parallel Routing for FPGAs]<br \/>\nA talk on parallelizing routing.<br \/>\nPathFinder: VPR and the Xilinx\/Altera 
routers are all based on it. It uses maze routing.<br \/>\n1. Route all signals (allow shorts)<br \/>\n2. Increase penalties for shorts<br \/>\n3. Route all signals<br \/>\n3.1 rip-up and re-route next signal<br \/>\n3.2 update congestion<br \/>\n3.3 return to 3.1 if more signals remain<br \/>\n4. return to 2 if shorts remain<br \/>\n&#8211; Fine-grained: maze routing of a single net in parallel<br \/>\n&#8212; using pthreads, parallelize calculation of forward cost &#038; adding corresponding nodes to the priority queue<br \/>\n&#8212; for N procs, maintain N separate priority queues to avoid the need for locks<br \/>\n&#8211; Coarse-grained: each node routes a different net<br \/>\n&#8212; step 3 is parallelized as a whole and connected via MPI<br \/>\nFine-grained is slow on a Core2Quad (shared FSB) but works on a Core i5 (shared L3). Coarse-grained works on either.<br \/>\n[The TransC Process Model and Interprocess Communication]<br \/>\nTransC language<br \/>\n&#8211; C-like<br \/>\n&#8211; Supports parallel processes: communication via data streams<br \/>\n&#8211; Multiple return values (!)<br \/>\n[Comparing Performance and Energy Efficiency of FPGAs and GPUs for High Productivity Computing]<br \/>\nA comparative evaluation across several applications.<br \/>\nFFT on an FPGA is fast. Its flops\/W is overwhelming.<br \/>\nFor Monte Carlo, the FPGA, compared with the GPU, 
seems to be faster, but hmm. I&#8217;m curious what architecture they used.<br \/>\n[Local-and-Global Stall Mechanism for Systolic Computational-Memory Array on Extensible Multi-FPGA System]<br \/>\nWang from Tohoku University.<br \/>\nA talk about a system that synchronizes PEs arranged like a systolic array across different clock domains. It uses local stall signals generated from FIFO empty signals and the like, plus a global stall signal formed by OR-ing all of them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Friday pickup bus at 8:30. [ Opening ] &#8211; Received total 163 &#8211; Regular oral 32 &#8211; Special oral &hellip; <a href=\"https:\/\/yasu2.prosou.nu\/blog\/index.php\/2010\/12\/08\/2614\/\" class=\"more-link\"><span class=\"screen-reader-text\">&#8220;FPT &#8217;10, Day 1&#8221; 
\u306e<\/span>\u7d9a\u304d\u3092\u8aad\u3080<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[10],"tags":[],"class_list":["post-2614","post","type-post","status-publish","format-standard","hentry","category-conference-logs"],"_links":{"self":[{"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/2614","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=2614"}],"version-history":[{"count":0,"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/2614\/revisions"}],"wp:attachment":[{"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=2614"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=2614"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/yasu2.prosou.nu\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=2614"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}